MagmaCollectionReader behavior


Re: MagmaCollectionReader behavior

Miguel Enrique Cobá Martínez

1 Apr 2009, 2:04 a.m.

It is not clear from the magmaseaside tutorial, but the code from http://wiki.squeak.org/squeak/6021:

    initialize
        | users |
        users := MagmaCollection new.
        users addIndex: (MaSearchStringIndex attribute: #email) beAscii.
        self at: #users put: users

    findUserByEmail: anEmail
        ^ (self users where: [:each | each email equals: anEmail]) firstOrNil

strongly suggests that the where: send with the clause

    each email equals: anEmail

gives an *exact* or *equal* match, but that is not the case. In fact, the where: send returns a MagmaCollectionReader that stands for the *set* or *collection* of objects that matched the equals: clause, and what counts as a match depends directly on the index created for the MagmaCollection.

In this example, the index is created with the default key size (no keySize: specified) of 32 bits, which gives you only 4 meaningful characters when searching for a string. For example, if you have users with emails like:

    user   email
    1      'miguel@domain1.com'
    2      'miguel@domain2.com'
    3      'miguel.coba@domain3.com'

a message send like:

findUserByEmail: 'miguel@domain1.com'

will give you a MagmaCollectionReader that represents all 3 users in the database, because their emails share the same 4 initial characters. After that, the firstOrNil message ensures that user #1 will *always* be returned, no matter what argument you pass to findUserByEmail:. So the answer from each of

    findUserByEmail: 'miguel@domain1.com'
    findUserByEmail: 'miguel@domain2.com'
    findUserByEmail: 'miguel.coba@domain3.com'

will always be user #1.
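The collision can be reproduced outside Magma with plain Squeak collections. This is only a sketch: it assumes the index key is simply the first 4 characters of the string, which is a simplification and not Magma's actual key computation. The names keyOf and candidates are hypothetical.

```smalltalk
| emails keyOf candidates |
emails := #('miguel@domain1.com' 'miguel@domain2.com' 'miguel.coba@domain3.com').

"Assumption: a 32-bit key keeps only the first 4 characters of the string."
keyOf := [:aString | aString first: (4 min: aString size)].

"Every email whose truncated key equals the key of the search argument."
candidates := emails select: [:each |
    (keyOf value: each) = (keyOf value: 'miguel@domain2.com')].

candidates size.    "all three emails share the key 'migu'"
candidates first.   "the first candidate, not necessarily the one asked for"
```

Evaluated in a workspace, the search for the second address still puts the first address at the head of the candidates, which is what firstOrNil then answers.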

In summary, the method does not behave correctly, because it can't be used to find a specific user, which is its intended purpose.

After reading the Index documentation from the magma site, it was clear that this MagmaCollectionReader can't give accurate and exact results and by itself it can't be used for finding objects. You *always* have to apply some kind of searching over the already reduced collection represented by MagmaCollectionReader in order to find the *exact* match you are trying to locate.

So the code should be something like:

    findUserByEmail: anEmail
        | user |
        "Here you are working over the entire magma repo"
        user := (self users where: [:each | each email equals: anEmail])
            "Here you are working over the reduced set returned by the where: and represented by a MagmaCollectionReader"
            detect: [:each |
                "Here you are working on a plain Collection"
                each email = anEmail]
            ifNone: [nil].
        ^ user

After changing the code this way, the example correctly finds the users with emails 'miguel@domain1.com', 'miguel@domain2.com' and 'miguel.coba@domain3.com'.
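The two-step lookup can be sketched with plain Squeak collections, without a Magma repository. As an assumption for illustration only, a select: over a 4-character prefix stands in for the indexed where:, and findByEmail and keyOf are hypothetical names.

```smalltalk
| emails keyOf findByEmail |
emails := #('miguel@domain1.com' 'miguel@domain2.com' 'miguel.coba@domain3.com').

"Assumption: the index key is just the first 4 characters of the string."
keyOf := [:aString | aString first: (4 min: aString size)].

findByEmail := [:anEmail |
    "Step 1: coarse match on the truncated key (stands in for where:)."
    (emails select: [:each | (keyOf value: each) = (keyOf value: anEmail)])
        "Step 2: exact comparison over the few remaining candidates."
        detect: [:each | each = anEmail]
        ifNone: [nil]].

findByEmail value: 'miguel@domain2.com'.    "the second email, not the first"
findByEmail value: 'miguel@nowhere.org'.    "nil: the key matches but no email is equal"
```

The exact comparison only runs over the candidates that survived the key match, so it stays cheap as long as the key narrows the collection well.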

Can someone confirm that this is the correct way to use a MagmaCollectionReader?

P.S. I tried a larger keySize: at index creation (I even tried 400 bits), but this only postponed the point where the string matching stops working. Also, it is not efficient, and with 400 Squeak throws an error. So that was not the way to go.

Thanks for your comments,
Miguel Cobá


Chris Muller

1 Apr, 4:55 a.m.

Hi Miguel, MagmaCollections can absolutely give accurate and exact results and, by themselves, *can* be used for finding objects. You *don't* always have to apply some kind of searching over the already reduced collection represented by MagmaCollectionReader in order to find the *exact* match you are trying to locate.

The key concept you've stumbled on here is that MagmaCollectionIndexes do only provide a *finite* key space. But how you decide to utilize that key-space, e.g., convert your objects into an integral key value, as well as the size of the key-space in bits, determines whether duplicate keys will occur or not.

If you want to use a big, fat, e-mail address as a "unique identifier", you will be better served to use most of a 64 or 128-bit key-space than a small percentage of a 400-bit key-space. Using only the alpha range:

(MaSearchStringIndex attribute: #email) keySize: 128; beAlpha; yourself

provides 27 meaningful characters, enough for probably 99% of e-mail addresses. A 256-bit alpha index would provide 54 meaningful characters, but using the post-detect: on a 128-bit index is a better choice (since even that will probably only detect through one element 99% of the time).
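These character counts can be checked with a little integer arithmetic. The sketch below assumes that the alpha map has 26 symbols and that a string is packed into the key as a base-26 number; the real MaSearchStringIndex character map may differ, so treat the result as a plausibility check only. maxChars is a hypothetical helper.

```smalltalk
| maxChars |
"Largest n such that every string of n symbols fits in a key of the given bit size."
maxChars := [:symbols :bits | | n |
    n := 0.
    [(symbols raisedTo: n + 1) <= (2 raisedTo: bits)] whileTrue: [n := n + 1].
    n].

maxChars value: 26 value: 128.    "27"
maxChars value: 26 value: 256.    "54"
maxChars value: 256 value: 32.    "4, assuming one byte per character"
```

Under these assumptions the arithmetic agrees with the 27 and 54 characters quoted for the alpha index and with the 4 characters of the 32-bit default.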

Please don't say "but alpha does not support the @ or . character." To maximize efficiency, you really need to make your own MagmaEmailIndex subclass which defines its own character map and uses it appropriately and efficiently.

Or, another thing you could do is break apart the email into three separate entries and small key-space indexes for all three:

miguel@gmail.com

becomes the entries #('miguel' 'gmail' 'com') and index each user at all three. Then, to find your user you could simply perform an appropriately conjuncted where:

    myUsersMagmaCollection where: [:each | (each first = 'miguel') & (each second = 'gmail') & (each third = 'com')]
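Breaking the address apart could look like the following sketch, which uses only standard Squeak String protocol. splitEmail is a hypothetical helper, and splitting at the first @ and at the last dot of the domain is an assumption about what the three entries should be; how each part is then indexed in Magma is not shown here.

```smalltalk
| splitEmail |
"Split at the first @, then split the domain at its last dot."
splitEmail := [:email | | domain |
    domain := email copyAfter: $@.
    Array
        with: (email copyUpTo: $@)
        with: (domain copyUpToLast: $.)
        with: (domain copyAfterLast: $.)].

splitEmail value: 'miguel@gmail.com'.          "#('miguel' 'gmail' 'com')"
splitEmail value: 'miguel.coba@domain3.com'.   "#('miguel.coba' 'domain3' 'com')"
```

Each of the three parts is short, so a small key space per part has a much better chance of covering it completely than one key over the whole address.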

There are other solutions to be sure...

- Chris

_______________________________________________
Magma mailing list
Magma@lists.squeakfoundation.org
http://lists.squeakfoundation.org/mailman/listinfo/magma

Miguel Enrique Cobá Martínez

8:06 a.m.

Chris Muller wrote:

First, I didn't mean to be rude. I hold Magma in very high regard; it is one of the most interesting projects and codebases I have ever learned from by reading the code. I just think that the example in the cited wiki page can lead users to expect behavior very different from the actual one.

See it this way. Most developers, including me, have used RDBMSs for almost all our professional work. The change to an ODB can be at first very appealing and, at the same time, very difficult to grasp.

To the point. When we hear about indexes we, without exception, think about RDBMS indexes, which we create for a column in a table; then, when querying the table, if there is only one record with the value we are looking for, the query returns only that record. There is no need to specify other parameters for this to work. This is the way we expect them to work. Call it inertia, if you want.

Now, suppose that one of these developers (think of me again) wants to know what this ODB thing is about, finds the wiki page mentioned, follows it and reads that the collection has an index over the #email instance variable. Without knowing the conceptual differences between indexes in an RDBMS and in Magma, he (me) will assume that a search for a value that occurs once in the collection will give a single result.

That is not the case, and that is the source of my confusion. I think that the wiki page (which is surely one of the most visited of the Magma-related ones) should warn the developer that:

- he needs to create the index with a greater keySize in order to alleviate the problem with the finite key space and the impact that a low keySize has on it.
- he needs to read the index wiki page in more depth to understand the features and the differences from RDBMS indexes.
- he needs to think about queries in a very different way when dealing with ODBs than with RDBMSs, and/or he needs to massage or prepare the data to index in order to get correct results (like the query for the 3-part email index you propose below).

> Hi Miguel, MagmaCollections can absolutely give accurate and exact results

I don't doubt it, but in this case it depends on the keySize.

> and, by themselves, *can* be used for finding objects. You *don't* always have to apply some kind of searching over the already reduced collection represented by MagmaCollectionReader in order to find the *exact* match you are trying to locate.
>
> The key concept you've stumbled on here is that MagmaCollectionIndexes do only provide a *finite* key space. But how you decide to utilize that key-space, e.g., convert your objects into an integral key value, as well as the size of the key-space in bits, determines whether duplicate keys will occur or not.

I think that this is a point where the requirements on developers can be a little too much. See it this way: we are only indexing a string. It shouldn't be necessary to implement a hash code to convert strings to numbers. Besides, most of us, or at least me, are not experts in hashing techniques, and we can't be certain that the hash is correct or that it doesn't clash.

> If you want to use a big, fat, e-mail address as a "unique identifier", you will be better served to use most of a 64 or 128-bit key-space than a small percentage of a 400-bit key-space. Using only the alpha range:
>
>     (MaSearchStringIndex attribute: #email) keySize: 128; beAlpha; yourself

This line is perfect for the wiki page, in place of the current one, because it will give the expected results most of the time. Of course, a note about the limit cases should be included.

> provides 27 meaningful characters, enough for probably 99% of e-mail addresses. A 256-bit alpha index would provide 54 meaningful characters, but using the post-detect: on a 128-bit index is a better choice (since even that will probably only detect through one element 99% of the time).
>
> Please don't say "but alpha does not support the @ or . character." To maximize efficiency, you really need to make your own MagmaEmailIndex subclass which defines its own character map and uses it appropriately and efficiently.

Same case here. If you are only searching on one attribute, the code should be easy. I agree that there will be cases when a subclass of MagmaIndex is unavoidable, but not in the common use case.

> Or, another thing you could do is break apart the email into three separate entries and small key-space indexes for all three:
>
>     miguel@gmail.com
>
> becomes the entries #('miguel' 'gmail' 'com') and index each user at all three. Then, to find your user you could simply perform an appropriately conjuncted where:
>
>     myUsersMagmaCollection where: [:each | (each first = 'miguel') & (each second = 'gmail') & (each third = 'com')]
>
> There are other solutions to be sure...
>
> - Chris

Thank you very much for your explanation; it has enlightened me a lot, and I think others too. Again, your work is amazing and I will be using it without a single doubt. So far it has been a pleasure, and I haven't achieved this development boost with any other database.

Miguel Cobá


Brent Pinkney

9:51 a.m.

New subject: MagmaCollectionReader behavior - try Lava

Hi,

> First, I didn't mean to be rude.

Don't worry, welcome to Magma. I am one of the other Magma developers who works with Chris to develop Magma. We use Magma in a telco environment and it has surpassed our expectations.

> See it this way. Most developers, including me, have used RDBMSs for almost all our professional work. The change to an ODB can be at first very appealing and, at the same time, very difficult to grasp.

May I suggest you install the 'Lava' and 'Lava testing' packages from SqueakSource. Lava is a (beta) extension of Magma to provide a SQL interface to Magma. That is, Magma collections can be wrapped in a SQL table schema and then queried using SQL. The SQL is converted into Magma queries.

Now, obviously Magma offers more sophisticated collections and indices than a rectangular RDBMS table, but the SUnit tests in the lava package have proven to be a useful tutorial for those coming from a RDBMS as it shows how irregular data can be stored, indexed and retrieved.

Feel free to ask the list.

Brent

Chris Muller

5:58 p.m.

Hi again! I'd just like to say one more thing: in Smalltalk, Strings do not have any upper-bound size limit, so they are as flexible as CLOBs in an RDBMS. The premise of your indexing comparison would seem to require an RDBMS to index and search entire CLOB values, which I don't think they can do without application-developer help. (Please correct me if I'm wrong, I haven't used an RDBMS in years.)

OTOH, if you define an indexed VARCHAR column, you are specifying an upper bound on the size of the strings it can index (not to mention, store!). To me, this is analogous to specifying a key size.

If you are willing, in your application, to restrict the length of e-mail addresses to the length of whatever you would make the VARCHAR column, then you can choose an appropriate key size to handle that length and not have to do the post-detect:.

Finally,

> I think that this is a point where the requirements on developers can be a little too much. See it this way: we are only indexing a string. It shouldn't be necessary to implement a hash code to convert strings to numbers.

As you get deeper into it, I hope you'll find the MaByteSequenceIndex hierarchy included in the base Magma package is flexible enough for use "out of the box" (but if not, it isn't hard to define your own custom index type).

But yes, the key point of this whole thread is the post-detect: required to index "CLOB" values (read: Smalltalk Strings), which is what I would do rather than restricting the user..

Regards, Chris


Miguel Cobá

6:53 p.m.

2009/4/1 Chris Muller ma.chris.m@gmail.com:

> Hi again! I'd just like to say one more thing: in Smalltalk, Strings do not have any upper-bound size limit, so they are as flexible as CLOBs in an RDBMS. The premise of your indexing comparison would seem to require an RDBMS to index and search entire CLOB values, which I don't think they can do without application-developer help. (Please correct me if I'm wrong, I haven't used an RDBMS in years.)

Oh, yes, that explains the problem. I was comparing a string in Smalltalk to a varchar with a fixed length. And indeed, reading the documentation for indexes in MySQL, for example, creating an index over text fields (the string counterpart of blobs) *requires* an index prefix. That is analogous to the Magma indexes for strings.

> OTOH, if you define an indexed VARCHAR column, you are specifying an upper bound on the size of the strings it can index (not to mention, store!). To me, this is analogous to specifying a key size.
>
> If you are willing, in your application, to restrict the length of e-mail addresses to the length of whatever you would make the VARCHAR column, then you can choose an appropriate key size to handle that length and not have to do the post-detect:.

I don't want to create a new class for the index, so my options are:

- limit the size of the input string for the #email field and choose a keySize for the Magma index that gives enough meaningful characters
- use the post-detect: to obtain the wanted value from the collection.

> Finally,
>
>> I think that this is a point where the requirements on developers can be a little too much. See it this way: we are only indexing a string. It shouldn't be necessary to implement a hash code to convert strings to numbers.
>
> As you get deeper into it, I hope you'll find the MaByteSequenceIndex hierarchy included in the base Magma package is flexible enough for use "out of the box" (but if not, it isn't hard to define your own custom index type).
>
> But yes, the key point of this whole thread is the post-detect: required to index "CLOB" values (read: Smalltalk Strings), which is what I would do rather than restricting the user.

I agree, I prefer not to limit the user and to do a post-detect:.

> Regards, Chris

Regards, Miguel Cobá

Miguel Cobá

6:54 p.m.

On Wed, Apr 1, 2009 at 10:53 AM, Miguel Cobá miguel.coba@gmail.com wrote:


> Oh, yes, that explains the problem. I was comparing a string in Smalltalk to a varchar with a fixed length. And indeed, reading the documentation for indexes in MySQL, for example, creating an index over text fields (the string counterpart of blobs) *requires* an index prefix. That is analogous to the Magma indexes for strings.

I forgot the link to the MySQL documentation of indexes:

http://dev.mysql.com/doc/refman/5.0/en/indexes.html

