[Vm-dev] Re: ARM Cog progress

Eliot Miranda eliot.miranda at gmail.com
Sat Jun 6 17:03:15 UTC 2015


On Sat, Jun 6, 2015 at 9:33 AM, tim Rowledge <tim at rowledge.org> wrote:

>
> On 06-06-2015, at 8:15 AM, Eliot Miranda <eliot.miranda at gmail.com> wrote:
> >     so yesterday I finally switched on the Raspberry Pi Doug gave me as
> an xmas present, built the Spur ARM Cog VM and ... we definitely have a
> working VM.
>
> It’s really nice to get to this. There are still some ‘exciting’ parts to
> get working though… floating point for example.
>
> >  I was able to update a Spur image from mid February all the way to tip
> and run tests.  3751 run, 3628 passes, 24 expected failures, 89 failures,
> 10 errors, 0 unexpected passes
>
> Did this include the FloatMathPluginTests? Because on my Pi2 that
> segfaults in all versions of the vm - interpreter, stack, cog. Then again
> my Pi2 is segfaulting on any vm compiled with -O2 right now whereas Eliot’s
> PiB is just fine with that. Good old GCC strikes again.
>
> > Fun!  So I want to revisit the literal load question.
> > In ARMv6T2 and later, MOV can load any 16-bit number, giving a range of
> 0x0-0xFFFF (0-65535).
> > The following table shows the range of 8-bit values that can be loaded
> in a single ARM MOV or MVN instruction (for data processing operations).
> The value to load must be a multiple of the value shown in the Step column.
> >
> Sadly the Pi B/+ are NOT 6T2 cpus. I checked this with Eben a while back.
> One of the side-effects of the flexibility ARM provides to actual
> manufacturers is a fairly complex range of possible features within any
> particular architecture level.
>

Damn, you're right.  gcc with the -march=armv6t2 option will generate
16-bit literal loads, e.g.

long it() { return 0x1A2B3C4D; }

=>

.arch armv6t2
...
.text
.align 2
.global it
.type it, %function
it:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
movw r0, #15437
movt r0, 6699
bx lr



but compiling, linking and running does indeed signal Illegal instruction.
That's /my/ weekend ruined ;-)
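
For reference, the two immediates in the listing above are just the low and high 16-bit halves of 0x1A2B3C4D. A minimal C sketch of that split follows; the helper names movw_imm, movt_imm and movw_movt are made up for illustration and are not part of gcc or the Cog sources.

```c
#include <stdint.h>

/* Hypothetical helper: the immediate a MOVW would carry.
   MOVW writes the low 16 bits and zeroes the upper half. */
static inline uint16_t movw_imm(uint32_t value)
{
    return (uint16_t)(value & 0xFFFFu);
}

/* Hypothetical helper: the immediate a MOVT would carry.
   MOVT writes the high 16 bits and leaves the low half untouched. */
static inline uint16_t movt_imm(uint32_t value)
{
    return (uint16_t)(value >> 16);
}

/* Model of running the movw/movt pair on one register. */
static inline uint32_t movw_movt(uint16_t lo, uint16_t hi)
{
    uint32_t reg = lo;                             /* movw r0, #lo */
    reg = (reg & 0xFFFFu) | ((uint32_t)hi << 16);  /* movt r0, #hi */
    return reg;
}
```

With these, movw_imm(0x1A2B3C4D) is 15437 (0x3C4D) and movt_imm(0x1A2B3C4D) is 6699 (0x1A2B), matching the operands gcc emitted.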


> That doesn’t mean we can’t do tricks to make the Pi_2_ use the nice v7
> features whilst using out of line data loads on the older machines. In the
> best case, where the data is already in the cache (we can use  PLD to help
> with that) an LDR takes 2 cycles as opposed to the 4 currently used by our
> mov/orr^3 unit. Using the v7 MOVW/MOVT pair is also two instructions but
> *always* two cycles, with no possibility of an out-of-cache delay, so I
> still think it is probably better.
>
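
To make the mov/orr^3 idea concrete, here is a rough C sketch. It assumes the classic ARM data-processing immediate (an 8-bit constant rotated right by an even amount) and splits an arbitrary 32-bit literal into four byte-aligned chunks, one for the MOV and three for the ORRs. The function names are invented for this illustration; this is not the Cog code generator itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Rotate left by n bits; n is reduced modulo 32. */
static uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31u;
    return n ? (x << n) | (x >> (32u - n)) : x;
}

/* True if value is an 8-bit constant rotated right by an even amount,
   i.e. it fits a single ARM MOV/ORR immediate operand. */
static bool is_arm_immediate(uint32_t value)
{
    for (unsigned rot = 0; rot < 32; rot += 2) {
        /* Undo a rotate-right by rot; an encodable value lands in bits 0..7. */
        if ((rotl32(value, rot) & ~0xFFu) == 0)
            return true;
    }
    return false;
}

/* Split a 32-bit literal into four byte-aligned chunks.
   chunk[0] would be the MOV operand, chunk[1..3] the ORR operands. */
static void split_mov_orr3(uint32_t value, uint32_t chunk[4])
{
    for (int i = 0; i < 4; i++)
        chunk[i] = value & (0xFFu << (8 * i));
}
```

Each chunk is a byte sitting at an even bit offset, so each one passes is_arm_immediate, and ORing the four chunks together gives back the original literal. That is why four instructions are always enough, and why a general 32-bit value such as 0x1A2B3C4D cannot be done in one.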

Except that in 64 bits don't we end up with 6 cycles (2 x MOVW/MOVT pairs
plus a shift and an add, or maybe 5 cycles if the moves leave other bits
undisturbed) vs 2 for the out-of-line literal load?  In which case, the
out-of-line load is a clear win for 64 bits, and that's likely our most
important target, given the ubiquity of smart phones.
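
A back-of-envelope model of that count, written in C. It assumes one cycle per instruction, as the arithmetic above does, and it assumes a 64-bit constant is built from two 32-bit halves, each via a pair of 16-bit moves, then combined with a shift and an add. It counts operations only; it is not a measurement and not a description of any real 64-bit instruction encoding. The names pair32 and build64 are invented for the sketch.

```c
#include <stdint.h>

/* Model one pair of 16-bit moves building a 32-bit half.
   Adds 2 to *ops, one per move. */
static uint64_t pair32(uint32_t half, int *ops)
{
    uint64_t reg = half & 0xFFFFu;           /* low 16 bits  */
    *ops += 1;
    reg |= (uint64_t)(half >> 16) << 16;     /* high 16 bits */
    *ops += 1;
    return reg;
}

/* Model the 6-operation synthesis: two pairs, a shift and an add. */
static uint64_t build64(uint64_t value, int *ops)
{
    uint64_t hi = pair32((uint32_t)(value >> 32), ops);
    uint64_t lo = pair32((uint32_t)value, ops);
    hi <<= 32;                               /* shift */
    *ops += 1;
    *ops += 1;                               /* add   */
    return hi + lo;
}
```

Starting from ops = 0, build64 reproduces the constant and leaves ops at 6, against the 2 cycles quoted for a cached out-of-line load.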


>
>
> tim
> --
> tim Rowledge; tim at rowledge.org; http://www.rowledge.org/tim
> Strange OpCodes: EIV: Erase IPL Volume
>
>
>

-- 
best,
Eliot