<div dir="ltr"><div><div><div><div><div><div><div><div><div><div>Hi,<br></div>I get a strange result with a 64-bit FFI function returning a single-precision float.<br></div>Here is an example:<br><br>(LapackSGEMatrix rows: #((2.3))) absMax.<br><br></div>This matrix has a single element, 2.3 rounded to a single-precision float<br>(2.299999952316284 when printed as a double-precision value)<br><br></div>absMax is supposed to take the maximum of the absolute values in the matrix.<br></div>It does so through the LAPACK function slange:<br>"<br>*  Purpose<br>*  =======<br>*  SLANGE  returns the value of the one norm,  or the Frobenius norm, or<br>*  the  infinity norm,  or the  element of  largest absolute value  of a<br>*  real matrix A.<br>"<br>    <cdecl: float 'slange_'( char * long * long * float * long * float * long )><br><br></div>Unfortunately, the above snippet returns 3.6893488147419103e19<br><br></div>It correctly calls this:<br>            floatRet = dispatchFunctionPointerwithwithwithwithwithwith(((float (*)(sqIntptr_t, sqIntptr_t, sqIntptr_t, sqIntptr_t, sqIntptr_t, sqIntptr_t)) procAddr), ((calloutState->integerRegisters))[0], ((calloutState->integerRegisters))[1], ((calloutState->integerRegisters))[2], ((calloutState->integerRegisters))[3], ((calloutState->integerRegisters))[4], ((calloutState->integerRegisters))[5]);<br><br></div>which translates into something like:<br><br>    0x10833c537 <+2615>: movq   -0x28(%rbp), %rax<br>    0x10833c53b <+2619>: movq   -0xe8(%rbp), %rcx<br>    0x10833c542 <+2626>: movq   0xd8(%rcx), %rdi<br>    0x10833c549 <+2633>: movq   -0xe8(%rbp), %rcx<br>    0x10833c550 <+2640>: movq   0xe0(%rcx), %rsi<br>    0x10833c557 <+2647>: movq   -0xe8(%rbp), %rcx<br>    0x10833c55e <+2654>: movq   0xe8(%rcx), %rdx<br>    0x10833c565 <+2661>: movq   -0xe8(%rbp), %rcx<br>    0x10833c56c <+2668>: movq   0xf0(%rcx), %rcx<br>    0x10833c573 <+2675>: movq   -0xe8(%rbp), %r8<br>    0x10833c57a <+2682>: movq   0xf8(%r8), %r8<br>    0x10833c581 <+2689>: movq   
-0xe8(%rbp), %r9<br>    0x10833c588 <+2696>: movq   0x100(%r9), %r9<br>->  0x10833c58f <+2703>: callq  *%rax<br>    0x10833c591 <+2705>: cvtss2sd %xmm0, %xmm0<br>    0x10833c595 <+2709>: movsd  %xmm0, -0x150(%rbp)<br><br></div>If I print $xmm0 just after the callq, then<br>(lldb) nexti<br>(lldb) print $xmm0<br>(unsigned char __attribute__((ext_vector_type(16)))) $212 = (0x00, 0x00, 0x00, 0x60, 0x66, 0x66, 0x02, 0x40, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00)<br><br></div>and just after the conversion to double precision:<br>(lldb) nexti<br>(lldb) print $xmm0<br>(unsigned char __attribute__((ext_vector_type(16)))) $213 = (0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x44, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00)<br><div><div><div><div><div><br></div><div>Let's see:<br><br>tmp := #[16r00   16r00   16r00   16r60   16r66   16r66   16r02   16r40 ].<br>{tmp doubleAt: 1.<br>tmp floatAt: 1}.<br> #(2.299999952316284 3.6893488147419103e19)<br><br></div><div>Bingo! That means that the value returned in xmm0 was already in double precision.<br></div><div>When we then treat it as a single-precision value and widen it to double (cvtss2sd reads only the 4 least significant bytes of the double, as if they were a single-precision float), we get the incorrect value...<br><br></div><div>So why was the slange result promoted to double?<br></div><div>I can reproduce this on Mac OS X with the pre-installed vecLib, and on Win64 compiling LAPACK 3.3.1 from sources (translated by f2c) with MSVC10.</div><div><br></div><div>Ah, ah, f2c! Don't you promote float return values to double? YES<br>But why does this not happen with the 32-bit VM?<br></div><div>That's what kept me away from the solution for a while...<br></div><div>It's the IA32 ABI... 
the return value is stored in ST0 (so it is always promoted to at least double precision).<br></div><div>So converting it to a double again like we do is a no-op and just works in 32 bits.<br><br></div><div>That's going to be a problem for FORTRAN functions in 64 bits.<br></div><div>If compiled with g77 or f2c conventions, then float results are promoted to double!<br></div><div>If compiled with gfortran, then float results just remain float results.<br></div><div>This is a major source of incompatibility: how to guess how a given binary was compiled? (for example vecLib...)<br><br></div><div>And how to adapt my FFI source code?<br></div><div>Last thing, f2c might also be non-standard when returning a complex value.<br></div><div>Big ball of mud...<br><br></div></div></div></div></div></div>
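<div>A quick way to double-check the byte-level reasoning outside the image is the following minimal sketch. It is Python rather than Smalltalk, it assumes a little-endian x86-64 machine like the one in the lldb session, and the variable names are mine, not taken from the VM or the FFI plugin:<br></div>

```python
import struct

# 2.3 rounded to single precision, held as a Python float (which is a double).
single = struct.unpack("<f", struct.pack("<f", 2.3))[0]

# What an f2c-style slange_ leaves in the low 8 bytes of xmm0:
# the result already promoted to double precision.
returned = struct.pack("<d", single)

# What the callout then does: read only the low 4 bytes as a float,
# and widen that float to double (the cvtss2sd step).
misread = struct.unpack("<f", returned[:4])[0]
after_widening = struct.pack("<d", misread)

print(returned.hex(" "))        # compare with the first lldb dump of $xmm0
print(misread)                  # compare with the value absMax answered
print(after_widening.hex(" "))  # compare with the second lldb dump of $xmm0
```

<div>The low 4 bytes of that double are 00 00 00 60, which read as a single-precision float give exactly 2 raised to the 65th power, i.e. the 3.6893488147419103e19 above.<br></div>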