Another observation:
Something prevents full optimisation of the following intermediate code
#(#block 1 #(#push #ebp) #(#mov #esp #ebp)) #(#block 3 #(#mov #(#mem #(#add #ebp -4)) 't1') #(#mov #(#mem #(#add #ebp -8)) 't2')) #(#block 5 #(#mov #(#add 't1' 't2') 't3') #(#jmp #block4)) #(#block 4) #(#block 6 #(#mov 't3' #eax) #(#jmp #block2)) #(#block 2 #(#mov #ebp #esp) #(#pop #ebp) #(#ret))
into just: #(#(#block 1 #(#push #ebp) #(#mov #esp #ebp)) #(#block 3 #(#mov #(-8 #ebp) #eax) #(#add #(-4 #ebp) #eax)) #(#block 2 #(#mov #ebp #esp) #(#pop #ebp) #(#ret)))
Instead, it generates the following: #(#(#block 1 #(#push #ebp) #(#mov #esp #ebp)) #(#block 3 #(#mov #(-4 #ebp) #ebx) #(#mov #(-8 #ebp) #eax)) #(#block 5 #(#add #ebx #eax)) #(#block 4) #(#block 6) #(#block 2 #(#mov #ebp #esp) #(#pop #ebp) #(#ret)))
which uses 1 extra register and 1 more instruction compared to the desired output above. It's not very important, I'm just curious what prevents the optimisation. Is it because the movs and the add are placed in separate blocks, or because there are jumps (which are removed later, but still prevent the code from being fully optimised)? Maybe it would be better to run the jump remover before register allocation?
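
To make the last suggestion concrete, here is a rough sketch in Python of what a jump-removal and block-merging pass could look like if it ran before register allocation. It is not the actual compiler code: the tuple representation of instructions, the function name `simplify_blocks`, and the assumption that the only control-flow instructions are an unconditional `jmp` and `ret` are all mine, chosen just for illustration.

```python
def simplify_blocks(blocks):
    """Hypothetical pre-allocation cleanup pass (a sketch, not the real one).

    blocks: list of (label, [instr, ...]) in layout order, where each
    instr is a tuple such as ("mov", src, dst), ("jmp", label) or ("ret",).
    Assumes control flow consists only of unconditional jmp and ret.
    """
    blocks = [(label, list(instrs)) for label, instrs in blocks]

    # Step 1: a trailing jump to the block that immediately follows
    # in layout order is redundant, so drop it.
    for i in range(len(blocks) - 1):
        instrs = blocks[i][1]
        if instrs and instrs[-1] == ("jmp", blocks[i + 1][0]):
            instrs.pop()

    # Labels that some remaining jump still targets.
    targets = {ins[1] for _, instrs in blocks for ins in instrs
               if ins[0] == "jmp"}

    # Step 2: fold a block into its predecessor when nothing jumps to it
    # and the predecessor falls through (does not end in jmp or ret).
    # Empty untargeted blocks disappear as a side effect.
    merged = []
    for label, instrs in blocks:
        falls_in = merged and not (
            merged[-1][1] and merged[-1][1][-1][0] in ("jmp", "ret"))
        if merged and label not in targets and falls_in:
            merged[-1][1].extend(instrs)
        else:
            merged.append((label, instrs))
    return merged


# The intermediate code from this post, transcribed into the tuple form.
blocks = [
    (1, [("push", "ebp"), ("mov", "esp", "ebp")]),
    (3, [("mov", ("mem", ("add", "ebp", -4)), "t1"),
         ("mov", ("mem", ("add", "ebp", -8)), "t2")]),
    (5, [("mov", ("add", "t1", "t2"), "t3"), ("jmp", 4)]),
    (4, []),
    (6, [("mov", "t3", "eax"), ("jmp", 2)]),
    (2, [("mov", "ebp", "esp"), ("pop", "ebp"), ("ret",)]),
]
result = simplify_blocks(blocks)
```

On this input both jumps go to the next block in layout order, so they are dropped, and everything folds into one straight-line block. With the movs and the add in the same block, the allocator would at least see t1, t2 and t3 together. Whether that alone is enough to get the shorter output is exactly what I don't know.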