[Vm-dev] VM Maker: VMMaker.oscog-eem.3309.mcz

commits at source.squeak.org commits at source.squeak.org
Sun Mar 12 01:38:20 UTC 2023


Eliot Miranda uploaded a new version of VMMaker to project VM Maker:
http://source.squeak.org/VMMaker/VMMaker.oscog-eem.3309.mcz

==================== Summary ====================

Name: VMMaker.oscog-eem.3309
Author: eem
Time: 11 March 2023, 5:37:57.868763 pm
UUID: 4212f0ab-e6e3-44f8-9b1e-f4639143b303
Ancestors: VMMaker.oscog-eem.3308

ThrreadedFFIPlugin.
Fix ThreadedFFIPlugin>>ffiCall:ArgArrayOrNil:NumArgs:'s use of ffiCheckReturn:With:in: so ffiCheckReturn:With:in: can be inlined.

ThreadedARM64FFIPlugin. pass structures as defined by the ARM Procedure Call Standard for the ARM 64-bit Architecture, IHI0055B_aapcs64.pdf, 22nd May 2013. Fixes the crashes guarded for by FFIPluginTests>>#arm64Check.

=============== Diff against VMMaker.oscog-eem.3308 ===============

Item was removed:
- ----- Method: ThreadedARM64AppleFFIPlugin>>ffiPushStructure:ofSize:typeSpec:ofLength:in: (in category 'marshalling') -----
- ffiPushStructure: pointer ofSize: structSize typeSpec: argSpec ofLength: argSpecSize in: calloutState
- 	<var: #pointer type: #'void *'>
- 	<var: #argSpec type: #'unsigned int *'>
- 	<var: #calloutState type: #'CalloutState *'>
- 	<inline: #always>
- 	| availableRegisterSpace stackPartSize roundedSize |
- 	"Stage B, step B.4 -- composite type larger than 16 bytes copied to caller-allocated memory and replaced by pointer"
- 	self flag: #todo.
- 
- 	"See IHI0055B_aapcs64.pdf sections 4.3.5 & 5.4.2 Stage C; we don't yet support HVA's"
- 	(self structIsHomogenousFloatArrayOfSize: structSize typeSpec: argSpec ofLength: argSpecSize)
- 		ifTrue:
- 			[availableRegisterSpace := (NumFloatRegArgs - calloutState floatRegisterIndex) * self wordSize.
- 			 structSize <= availableRegisterSpace ifTrue: "Stage C, step C.2, all in floating-point registers (!!!!)"
- 				[self 
- 					memcpy: (self cCoerceSimple: (self addressOf: (calloutState floatRegisters at: calloutState floatRegisterIndex)) to: #'void *') 
- 					_: pointer 
- 					_: structSize.
- 					"Round structSize up and divide by 8 ( NB: _not_ 4 !!)"
- 				 calloutState floatRegisterIndex: calloutState floatRegisterIndex + (structSize + 7 bitShift: -3).
- 				 ^0].
- 			 "Stage C, step C.3"
- 			 availableRegisterSpace := 0.
- 			 calloutState floatRegisterIndex: 8]
- 
- 		ifFalse:
- 			[availableRegisterSpace := (NumIntRegArgs - calloutState integerRegisterIndex) * self wordSize].
- 	stackPartSize := structSize.
- 	availableRegisterSpace > 0 ifTrue: 
- 		[structSize <= availableRegisterSpace ifTrue:"all in integer registers"
- 			[self 
- 				memcpy: (self cCoerceSimple: (self addressOf: (calloutState integerRegisters at: calloutState integerRegisterIndex)) to: #'void *') 
- 				_: pointer 
- 				_: structSize.
- 				"Round structSize up and divide by 8 ( NB: _not_ 4 !!)"
- 			 calloutState integerRegisterIndex: calloutState integerRegisterIndex + (structSize + 7 bitShift: -3).
- 			 ^0].
- 		 "If no previous co-processor candidate arg has already been pushed on the stack, then split the struct between registers and stack.
- 		  Otherwise push entire struct on stack."
- 		 calloutState currentArg = calloutState argVector
- 			ifTrue: 
- 		 		[stackPartSize := structSize - availableRegisterSpace.
- 		 		self 
- 					memcpy: (self cCoerceSimple: (self addressOf: (calloutState integerRegisters at: calloutState integerRegisterIndex)) to: 'void *') 
- 					_: pointer 
- 					_: availableRegisterSpace]
- 			ifFalse:
- 				[availableRegisterSpace := 0].
- 		"Stage C, step C.11"
- 		calloutState integerRegisterIndex: NumIntRegArgs].
- 
- 	stackPartSize > 0 ifTrue: 
- 		[roundedSize := stackPartSize + 3 bitClear: 3.
- 		 calloutState currentArg + roundedSize > calloutState limit ifTrue:
- 			 [^FFIErrorCallFrameTooBig].
- 		 self alignCurrentArgOf: calloutState to: 8.
- 		 self memcpy: calloutState currentArg _: (self addressOf: ((self cCoerceSimple: pointer to: #'char *') at: availableRegisterSpace)) _: stackPartSize.
- 		 calloutState currentArg: calloutState currentArg + roundedSize].
- 	^0!

Item was changed:
  ----- Method: ThreadedARM64FFIPlugin>>ffiPushStructure:ofSize:typeSpec:ofLength:in: (in category 'marshalling') -----
  ffiPushStructure: pointer ofSize: structSize typeSpec: argSpec ofLength: argSpecSize in: calloutState
  	<var: #pointer type: #'void *'>
  	<var: #argSpec type: #'unsigned int *'>
  	<var: #calloutState type: #'CalloutState *'>
  	<inline: #always>
+ 	| availableRegisterSpace roundedSize |
+ 	"Stage B
+ 		B.1 If the argument type is a Composite Type whose size cannot be statically determined by both the caller
+ 			and the callee, the argument is copied to memory and the argument is replaced by a pointer to the copy.
+ 			(There are no such types in C/C++ but they exist in other languages or in language extensions).
+ 		B.2 If the argument type is an HFA or an HVA, then the argument is used unmodified.
+ 		B.3 If the argument type is a Composite Type that is larger than 16 bytes, then the argument is copied to
+ 			memory allocated by the caller and the argument is replaced by a pointer to the copy.
+ 		B.4 If the argument type is a Composite Type then the size of the argument is rounded up to the nearest
+ 			multiple of 8 bytes."
- 	| availableRegisterSpace stackPartSize roundedSize |
- 	"Stage B, step B.4 -- composite type larger than 16 bytes copied to caller-allocated memory and replaced by pointer"
- 	self flag: #todo.
  
+ 	"See IHI0055B_aapcs64.pdf sections 4.3.5 & 5.4.2 Stage C"
- 	"See IHI0055B_aapcs64.pdf sections 4.3.5 & 5.4.2 Stage C; we don't yet support HVA's"
  	(self structIsHomogenousFloatArrayOfSize: structSize typeSpec: argSpec ofLength: argSpecSize)
  		ifTrue:
  			[availableRegisterSpace := (NumFloatRegArgs - calloutState floatRegisterIndex) * self wordSize.
  			 structSize <= availableRegisterSpace ifTrue: "Stage C, step C.2, all in floating-point registers (!!!!)"
  				[self 
  					memcpy: (self cCoerceSimple: (self addressOf: (calloutState floatRegisters at: calloutState floatRegisterIndex)) to: #'void *') 
  					_: pointer 
  					_: structSize.
  					"Round structSize up and divide by 8 ( NB: _not_ 4 !!)"
  				 calloutState floatRegisterIndex: calloutState floatRegisterIndex + (structSize + 7 bitShift: -3).
  				 ^0].
  			 "Stage C, step C.3"
  			 availableRegisterSpace := 0.
  			 calloutState floatRegisterIndex: 8]
  
  		ifFalse:
  			[availableRegisterSpace := (NumIntRegArgs - calloutState integerRegisterIndex) * self wordSize].
+ 
+ 	"If it's small (16 bytes or less) and will fit in registers it is passed in registers, otherwise it is copied to memory.
+ 	 If it is a Homogenous Short Vector (HVA) (up to 32 bytes long) and will fit it is passed in registers."
+ 	(structSize <= availableRegisterSpace "all in integer registers; we have no way of getting to SIMD registers"
+ 	 and: [structSize <= 16
+ 		or: [self structIsHomogenousIntegerArrayOfSize: structSize typeSpec: argSpec ofLength: argSpecSize]]) ifTrue: 
- 	stackPartSize := structSize.
- 	availableRegisterSpace > 0 ifTrue: 
  		[structSize <= availableRegisterSpace ifTrue:"all in integer registers"
  			[self 
  				memcpy: (self cCoerceSimple: (self addressOf: (calloutState integerRegisters at: calloutState integerRegisterIndex)) to: #'void *') 
  				_: pointer 
  				_: structSize.
  				"Round structSize up and divide by 8 ( NB: _not_ 4 !!)"
  			 calloutState integerRegisterIndex: calloutState integerRegisterIndex + (structSize + 7 bitShift: -3).
+ 			 ^0]].
- 			 ^0].
- 		 "If no previous co-processor candidate arg has already been pushed on the stack, then split the struct between registers and stack.
- 		  Otherwise push entire struct on stack."
- 		 calloutState currentArg = calloutState argVector
- 			ifTrue: 
- 		 		[stackPartSize := structSize - availableRegisterSpace.
- 		 		self 
- 					memcpy: (self cCoerceSimple: (self addressOf: (calloutState integerRegisters at: calloutState integerRegisterIndex)) to: 'void *') 
- 					_: pointer 
- 					_: availableRegisterSpace]
- 			ifFalse:
- 				[availableRegisterSpace := 0].
- 		"Stage C, step C.11"
- 		calloutState integerRegisterIndex: NumIntRegArgs].
  
+ 	"If small and won't fit in registers, copy to the stack.
+ 	 N.B. my (eem) reading of IHI0055B_aapcs64.pdf  is that unlike the 32-bit PCS, aggregates are never split between memory and registers."
+ 	structSize <= 16 ifTrue: 
+ 		[roundedSize := structSize + 7 bitClear: 7.
- 	stackPartSize > 0 ifTrue: 
- 		[roundedSize := stackPartSize + 3 bitClear: 3.
  		 calloutState currentArg + roundedSize > calloutState limit ifTrue:
  			 [^FFIErrorCallFrameTooBig].
+ 		 self alignCurrentArgOf: calloutState to: 8.
+ 		 self memcpy: calloutState currentArg _: pointer _: structSize.
- 		 self memcpy: calloutState currentArg _: (self addressOf: ((self cCoerceSimple: pointer to: #'char *') at: availableRegisterSpace)) _: stackPartSize.
  		 calloutState currentArg: calloutState currentArg + roundedSize].
+ 
+ 	"If it is not small it is passed as a pointer. N.B. Spur guarantees only 8-byte alignment. IHI0055B_aapcs64.pdf is vague on the memory's alignment.
+ 	 Arguably the memory should be pinned in case of a callback. Don't bother for now. eem 3/11/2023"
+ 	^self ffiPushPointer: pointer in: calloutState!
- 	^0!

Item was changed:
  ----- Method: ThreadedARM64FFIPlugin>>structIsHomogenousFloatArrayOfSize:typeSpec:ofLength: (in category 'marshalling') -----
  structIsHomogenousFloatArrayOfSize: structSize typeSpec: argSpec ofLength: argSpecSize
+ 	"See IHI0055B_aapcs64.pdf 
+ 	4.3.5.1 Homogeneous Floating-point Aggregates (HFA)
+ 		An Homogeneous Floating-point Aggregate (HFA) is an Homogeneous Aggregate with a
+ 		Fundamental Data Type that is a Floating-Point type and at most four uniquely addressable members."
  	<var: #argSpec type: #'unsigned int *'>
  	| firstField typeOfFirstField |
  	(structSize <= (4 * (self sizeof: #double))
  	 and: [argSpecSize <= 5]) "header plus up to four fields" ifFalse:
  		[^false].
  	typeOfFirstField := self atomicTypeOf: (firstField := argSpec at: 1).
  	(typeOfFirstField ~= FFITypeSingleFloat and: [typeOfFirstField ~= FFITypeDoubleFloat]) ifTrue:
  		[^false].
  	2 to: argSpecSize - 1 do:
  		[:idx|
  		firstField ~= (argSpec at: idx) ifTrue:
  			[^false]].
  	^true!

Item was added:
+ ----- Method: ThreadedARM64FFIPlugin>>structIsHomogenousIntegerArrayOfSize:typeSpec:ofLength: (in category 'marshalling') -----
+ structIsHomogenousIntegerArrayOfSize: structSize typeSpec: argSpec ofLength: argSpecSize
+ 	"See IHI0055B_aapcs64.pdf 
+ 	4.1.2 Short Vectors
+ 		A short vector is a machine type that is composed of repeated instances of one fundamental integral
+ 		or floating- point type. It may be 8 or 16 bytes in total size. A short vector has a base type that is the
+ 		fundamental integral or floating-point type from which it is composed, but its alignment is always the
+ 		same as its total size. The number of elements in the short vector is always such that the type is fully
+ 		packed. For example, an 8-byte short vector may contain 8 unsigned byte elements, 4 unsigned half-word
+ 		elements, 2 single-precision floating-point elements, or any other combination where the product of
+ 		the number of elements and the size of an individual element is equal to 8. Similarly, for 16-byte short
+ 		vectors the product of the number of elements and the size of the individual elements must be 16.
+ 
+ 		Elements in a short vector are numbered such that the lowest numbered element (element 0) occupies
+ 		the lowest numbered bit (bit zero) in the vector and successive elements take on progressively
+ 		increasing bit positions in the vector. When a short vector transferred between registers and memory
+ 		it is treated as an opaque object. That is a short vector is stored in memory as if it were stored with
+ 		a single STR of the entire register; a short vector is loaded from memory using the corresponding LDR
+ 		instruction. On a little-endian system this means that element 0 will always contain the lowest
+ 		addressed element of a short vector; on a big-endian system element 0 will contain the highest-addressed
+ 		element of a short vector.
+ 
+ 		A language binding may define extended types that map directly onto short vectors. Short vectors
+ 		are not otherwise created spontaneously (for example because a user has declared an aggregate
+ 		consisting of eight consecutive byte-sized objects)."
+ 	<var: #argSpec type: #'unsigned int *'>
+ 	| firstField typeOfFirstField sizeOfType |
+ 	structSize <= (4 * (self sizeof: #long)) ifFalse:
+ 		[^false].
+ 	typeOfFirstField := self atomicTypeOf: (firstField := argSpec at: 1).
+ 	(typeOfFirstField between: FFITypeUnsignedInt8 and: FFITypeSignedInt64) ifFalse:
+ 		[^false].
+ 	"N.B. This is as of pluginVersion: 1, c.f. initializeTypeConstants_v1.
+ 	 pluginVersion: 2, c.f. initializeTypeConstants_v2 needs different code"
+ 	sizeOfType := 1 << (typeOfFirstField >> 1 - 1).
+ 		"{	FFITypeUnsignedInt8. FFITypeUnsignedInt16. FFITypeUnsignedInt32. FFITypeUnsignedInt64.
+ 			FFITypeSignedInt8. FFITypeSignedInt16. FFITypeSignedInt32. FFITypeSignedInt64} collect:
+ 				[:typeOfFirstField| 1 << (typeOfFirstField >> 1 - 1)]"
+ 	sizeOfType * argSpecSize > (4 * (self sizeof: #long)) ifTrue:
+ 		[^false].
+ 	2 to: argSpecSize - 1 do:
+ 		[:idx|
+ 		firstField ~= (argSpec at: idx) ifTrue:
+ 			[^false]].
+ 	^true!

Item was changed:
  ----- Method: ThreadedFFIPlugin>>ffiCall:ArgArrayOrNil:NumArgs: (in category 'callout support') -----
  ffiCall: externalFunction ArgArrayOrNil: argArrayOrNil NumArgs: nArgs
  	"Generic callout. Does the actual work.  If argArrayOrNil is nil it takes args from the stack
  	 and the spec from the method.  If argArrayOrNil is not nil takes args from argArrayOrNil
  	 and the spec from the receiver."
  	| flags argTypeArray address argType oop argSpec argClass err theCalloutState calloutState requiredStackSize stackSize allocation result primNumArgs |
  	<inline: #always>
  	<var: #theCalloutState type: #'CalloutState'>
  	<var: #calloutState type: #'CalloutState *'>
  	<var: #allocation type: #'char *'>
  
  	primNumArgs := interpreterProxy methodArgumentCount.
  	(interpreterProxy is: externalFunction KindOfClass: interpreterProxy classExternalFunction) ifFalse:
  		[^self ffiFail: FFIErrorNotFunction].
  	"Load and check the values in the externalFunction before we call out"
  	flags := interpreterProxy fetchInteger: ExternalFunctionFlagsIndex ofObject: externalFunction.
  	interpreterProxy failed ifTrue:
  		[^self ffiFail: FFIErrorBadArgs].
  
  	"This must come early for compatibility with the old FFIPlugin.  Image-level code
  	 may assume the function pointer is loaded eagerly.  Thanks to Nicolas Cellier."
  	address := self ffiLoadCalloutAddress: externalFunction.
  	interpreterProxy failed ifTrue:
  		[^0 "error code already set by ffiLoadCalloutAddress:"].
  	
  	argTypeArray := interpreterProxy fetchPointer: ExternalFunctionArgTypesIndex ofObject: externalFunction.
  	"must be array of arg types"
  	((interpreterProxy isArray: argTypeArray)
  	and: [(interpreterProxy slotSizeOf: argTypeArray) = (nArgs + 1)]) ifFalse:
  		[^self ffiFail: FFIErrorBadArgs].
  	"check if the calling convention is supported"
  	self cppIf: COGMTVM
  		ifTrue:
  			[(self ffiSupportsCallingConvention: (flags bitAnd: FFICallTypesMask)) ifFalse:
  				[^self ffiFail: FFIErrorCallType]]
  		ifFalse: "not masking causes threaded calls to fail, which is as they should if the plugin is not threaded."
  			[(self ffiSupportsCallingConvention: flags) ifFalse:
  				[^self ffiFail: FFIErrorCallType]].
  		
  	requiredStackSize := self externalFunctionHasStackSizeSlot
  							ifTrue: [interpreterProxy
  										fetchInteger: ExternalFunctionStackSizeIndex
  										ofObject: externalFunction]
  							ifFalse: [-1].
  	interpreterProxy failed ifTrue:
  		[^interpreterProxy primitiveFailFor: (argArrayOrNil isNil
  												ifTrue: [PrimErrBadMethod]
  												ifFalse: [PrimErrBadReceiver])].
  	stackSize := requiredStackSize < 0 ifTrue: [DefaultMaxStackSize] ifFalse: [requiredStackSize].
  	self cCode: [] inSmalltalk: [theCalloutState := self class calloutStateClass new].
  	calloutState := self addressOf: theCalloutState.
  	self cCode: [self memset: calloutState _: 0 _: (self sizeof: #CalloutState)].
  	calloutState callFlags: flags.
  	"Fetch return type and args"
  	argType := interpreterProxy fetchPointer: 0 ofObject: argTypeArray.
  	argSpec := interpreterProxy fetchPointer: 0 ofObject: argType.
  	argClass := interpreterProxy fetchPointer: 1 ofObject: argType.
+ 	"Witten this way to allow Slang to inline ffiCheckReturn:With:in:"
+ 	err := self ffiCheckReturn: argSpec With: argClass in: calloutState.
+ 	err ~= 0 ifTrue:
- 	(err := self ffiCheckReturn: argSpec With: argClass in: calloutState) ~= 0 ifTrue:
  		[^self ffiFail: err]. "cannot return"
  	"alloca the outgoing stack frame, leaving room for marshalling args, and including space for the return struct, if any.
  	Additional space reserved for saving register args like mandated by Win64 X64 or PPC ABI, will be managed by the call itself"
  	allocation := self alloca: stackSize + calloutState structReturnSize + self cStackAlignment.
  	self mustAlignStack ifTrue:
  		[allocation := self cCoerce: (allocation asUnsignedIntegerPtr bitClear: self cStackAlignment - 1) to: #'char *'].
  	calloutState
  		argVector: allocation;
  		currentArg: allocation;
  		limit: allocation + stackSize.
  	(self nonRegisterStructReturnIsViaImplicitFirstArgument
  	 and: [calloutState structReturnSize > 0
  	 and: [(self returnStructInRegisters: calloutState) not]]) ifTrue:
  		[err := self ffiPushPointer: calloutState limit in: calloutState.
  		 err ~= 0 ifTrue:
  			[self cleanupCalloutState: calloutState.
  			 self cppIf: COGMTVM ifTrue:
  			 [err = PrimErrObjectMayMove negated ifTrue:
  				[^PrimErrObjectMayMove]]. "N.B. Do not fail if object may move because caller will GC and retry."
  			 ^self ffiFail: err]].
  	1 to: nArgs do:
  		[:i|
  		argType := interpreterProxy fetchPointer: i ofObject: argTypeArray.
  		argSpec := interpreterProxy fetchPointer: 0 ofObject: argType.
  		argClass := interpreterProxy fetchPointer: 1 ofObject: argType.
  		oop := argArrayOrNil
  				ifNil: [interpreterProxy stackValue: nArgs - i]
  				ifNotNil: [interpreterProxy fetchPointer: i - 1 ofObject: argArrayOrNil].
  		err := self ffiArgument: oop Spec: argSpec Class: argClass in: calloutState.
  		err ~= 0 ifTrue:
  			[self cleanupCalloutState: calloutState.
  			 self cppIf: COGMTVM ifTrue:
  			 [err = PrimErrObjectMayMove negated ifTrue:
  				[^PrimErrObjectMayMove]]. "N.B. Do not fail if object may move because caller will GC and retry."
  			 ^self ffiFail: err]]. "coercion failed or out of stack space"
  	"Failures must be reported back from ffiArgument:Spec:Class:in:.
  	 Should not fail from here on in."
  	self assert: interpreterProxy failed not.
  	self ffiLogCallout: externalFunction.
  	(requiredStackSize < 0
  	 and: [self externalFunctionHasStackSizeSlot]) ifTrue:
  		[stackSize := calloutState currentArg - calloutState argVector.
  		 interpreterProxy storeInteger: ExternalFunctionStackSizeIndex ofObject: externalFunction withValue: stackSize].
  	"Go out and call this guy"
  	result := self ffiCalloutTo: address SpecOnStack: argArrayOrNil notNil in: calloutState.
  	self cleanupCalloutState: calloutState.
  	"Can not safely use argumentCount (via e.g. methodReturnValue:) since it may have been changed by a callback."
  	interpreterProxy pop: primNumArgs + 1 thenPush: result. 
  	^result!



More information about the Vm-dev mailing list