I'm wondering if anyone has favorite Intel GCC compile-flag tips they would like to share for compiling a VM.
Tossing -Os at GCC 4.0 for the Mac Intel port seems so primitive...
--
John M. McIntosh <johnmci@smalltalkconsulting.com> 1-800-477-2659
Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com
John M McIntosh wrote:
> I'm wondering if anyone has favorite Intel GCC compile-flag tips they would like to share for compiling a VM.
> Tossing -Os at GCC 4.0 for the Mac Intel port seems so primitive...
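Yes, we've found that adding -mcpu=your_cpu can make a huge difference. (On x86, GCC treats -mcpu as a deprecated synonym for -mtune; -march=your_cpu additionally lets it use that CPU's instructions.)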
Use -fomit-frame-pointer to free up another register (should help a lot on x86 CPUs).
Using profile-directed optimisation can also make a big difference, but you'd need a set of tests that represent your typical usage to run in Squeak to generate the profiling data. You build first with -fprofile-generate, run your tests, and then rebuild with -fprofile-use. Since Squeak can be told to run a .st file at startup, it's easy to script this whole process.
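To make the three steps concrete, here is a minimal sketch on a toy C file rather than the VM itself. The file names (pgo_demo.c, driver.c), the function count_rare, and the use of -O2 are all made up for illustration; only -fprofile-generate and -fprofile-use come from the description above.

```c
/* pgo_demo.c (hypothetical file name; driver.c is an assumed file that
 * supplies main() and feeds count_rare() with representative input).
 *
 * The build/run/rebuild cycle, assuming -O2 as the base optimisation level:
 *   gcc -O2 -fprofile-generate pgo_demo.c driver.c -o demo   (instrumented build)
 *   ./demo                                                   (run writes .gcda profile data)
 *   gcc -O2 -fprofile-use pgo_demo.c driver.c -o demo        (rebuild using the profile)
 */
#include <assert.h>
#include <stddef.h>

/* Counts bytes with the high bit set. If the training input is mostly
 * ASCII, the profile records that the branch below is rarely taken, and
 * the compiler can lay out the loop for the common case. */
size_t count_rare(const unsigned char *buf, size_t n)
{
    size_t rare = 0;
    size_t i;

    assert(buf != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (buf[i] >= 0x80)
            rare++;
    }
    return rare;
}
```

The profile is only as good as the training run, so the input used in the middle step should look like real usage.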
Use -fwhole-program -funit-at-a-time to build all the source files in one go, which allows inter-module analysis to take place. This requires changes to the build system, however, so I haven't tried it.
Allowing GCC to do type-based alias analysis (TBAA) can also give a very nice speedup, but when I tried it (vm-3.6.something), the VM code contained lots of pointer aliasing. I seem to remember that the build used -fno-strict-aliasing at the time. I don't know if the situation has changed since then.
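For anyone unsure what kind of code trips up TBAA, here is a small sketch. It is not taken from the VM sources; the function name float_bits is hypothetical, and the example assumes float and uint32_t are the same size.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The problematic pattern reads an object through a pointer of an
 * unrelated type, for example:
 *
 *     uint32_t bits = *(uint32_t *)&f;    (f is a float)
 *
 * Under strict aliasing the compiler may assume a uint32_t lvalue never
 * refers to a float object, so this is only dependable when building
 * with -fno-strict-aliasing.
 *
 * Copying the bytes instead is well defined, so it stays correct with
 * TBAA enabled. */
uint32_t float_bits(float f)
{
    uint32_t bits;

    assert(sizeof bits == sizeof f);
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

Compilers generally turn a small fixed-size memcpy like this into a plain register move, so the safe form need not cost anything; as always, measure to be sure.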
Using the latest officially-released GCC is a good starting point. And of course measuring before, during, and after is vital to convince yourself that any speedups are "for real".
Let us know how it goes, Andrew.
squeak-dev@lists.squeakfoundation.org