I've been playing with the newer release of Magma. So far my tests have gone well. However, I have a few questions of applicability that I hope someone can help me answer.
First, I need to deploy an application for a small call center where every call is recorded. Currently, the system creates an entry in a MySQL table with the source of the call (the caller ID for an incoming call, or the agent's extension for an outbound call), the destination (the number dialed for an outbound call, or the agent's extension for an incoming call), a timestamp, and a file name.
When a call comes in or is made, the system makes an entry in this table and stores an MP3 file in a directory on the system. What I was wondering is whether it would make sense to store all this information in a Magma database, meaning the call metadata as well as the actual audio content. At first glance, I don't think there would be a problem. However, I looked at the current system and it holds approximately 200GB of audio content in over 100,000 files (accumulated over a period of 6 months).
As additional requirements, the system should allow for the "easy" search of call records by either date, agent, or phone number. Also, the actual call "object" should be stored in the customer profile so that when reviewing the customer record, historical call recordings can be pulled.
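To make the shape of the data concrete, here is a minimal Python sketch of the call record and the lookups described above. It is only an illustration, not Magma code: the names `CallRecord` and `CallLog` are hypothetical, and the in-memory dictionaries stand in for whatever indexing the database would provide. Because an agent's extension and a customer's phone number both appear as either source or destination, one index keyed by number covers both searches.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CallRecord:
    """One recorded call, mirroring the columns of the existing MySQL table."""
    source: str          # caller ID (incoming) or agent extension (outbound)
    destination: str     # agent extension (incoming) or dialed number (outbound)
    timestamp: datetime
    file_name: str       # name of the MP3 file holding the audio


class CallLog:
    """Hypothetical in-memory stand-in for the searchable call collection."""

    def __init__(self):
        self._by_number = defaultdict(list)  # extension or phone -> records
        self._by_date = defaultdict(list)    # calendar date -> records

    def add(self, record):
        # Index under both ends of the call so agent and phone searches work.
        for number in {record.source, record.destination}:
            self._by_number[number].append(record)
        self._by_date[record.timestamp.date()].append(record)

    def for_number(self, number):
        return list(self._by_number.get(number, []))

    def on_date(self, day):
        return list(self._by_date.get(day, []))
```

A customer profile could then hold the list returned by `for_number` for that customer's phone number, which is one way to read the "call object stored in the customer profile" requirement.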
Is this too much to ask for? Will searches be too slow for such a volume of stored objects? The way I see it, there really isn't a complex object hierarchy being stored. It's mostly a flat list.
The second question of applicability I have is something that came to mind when reading the book "The Data Model Resource Book" by Len Silverston. This two-volume book shows universal data models applicable to most industries. Their target is relational databases, but their models are more easily represented as complex object models. As a matter of fact, they go through very complex relational data models in order to represent things I think would be much easier to represent with object models. Anyway, as I went through the book, I figured it would be interesting to model some of their ideas using Squeak and Magma, which brought me to the following questions based on some generalized ideas of an object model:
The book "models" people, companies, entities, etc. as a general object called a Party. A Party can be "sub-typed" as a Person, Organization, etc., where a Person can be further "sub-typed" into entities like Employee, Contractor, Family Member, Contact, Customer, Prospect, etc. Organizations can also be further sub-typed into things like Internal Organizations, Competitors, Agents, Distributors, Suppliers, Customers, etc. You get the idea. Basically, each of these entities can be specialized into more and more specific models as the need arises, and the application's business rules should know how these objects should/could interact. Not that I'm planning on using these types of models for real-life applications, but it is certainly joyful reading about them. They definitely made me think about how I could have modeled some of the apps I've written in the past.
Keeping the above general description of such a tiny portion of the object model in mind, I thought, well, I could definitely create a Party class with many different subclasses that can each be further subclassed, and so on. The Party class could store general attributes and methods common to all, including knowledge of how to persist itself, search, etc. At first it sounds interesting, but once the need for searching surfaces, there may be a choke point in performance. Assuming a busy company uses this system, they could end up with several hundred thousand or even millions of Party objects relatively quickly. A user wanting to search for a customer could specify a search criterion such as last name begins with "XXXX". How do you think Magma would perform? The way I think I would do this is to store all Party objects in a large collection, with additional collections acting as "indices" for customers, etc. However, reading up on Magma's docs, I came across the MagmaCollectionReader and the ability to create customer indices. Would this work better? Could I create a customer index in Magma that would only store sub-classes of Party? For example, an index that would allow me to search by phone number within the class Customer, which is a sub-class of Party?
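The "additional collections acting as indices" idea can be sketched in a few lines of Python. This is an assumption-laden illustration, not Magma's API: `Party`, `Customer`, `Supplier`, and `LastNameIndex` are hypothetical names, and a sorted in-memory list stands in for a persistent index. It shows two things from the question: restricting an index to one sub-class of Party, and answering "last name begins with ..." by binary search instead of scanning every object.

```python
import bisect
from dataclasses import dataclass


@dataclass
class Party:
    last_name: str
    phone: str


class Customer(Party):
    pass


class Supplier(Party):
    pass


class LastNameIndex:
    """Sorted index over last names, limited to one Party sub-type."""

    def __init__(self, accepted_type=Party):
        self._accepted_type = accepted_type
        self._keys = []    # lowercase last names, kept sorted
        self._values = []  # parties, parallel to _keys

    def add(self, party):
        # Ignore objects outside the sub-type this index is meant for.
        if not isinstance(party, self._accepted_type):
            return False
        key = party.last_name.lower()
        pos = bisect.bisect_right(self._keys, key)
        self._keys.insert(pos, key)
        self._values.insert(pos, party)
        return True

    def starting_with(self, prefix):
        prefix = prefix.lower()
        # All keys sharing the prefix are contiguous in sorted order,
        # starting at the first key that is >= the prefix.
        lo = bisect.bisect_left(self._keys, prefix)
        hi = lo
        while hi < len(self._keys) and self._keys[hi].startswith(prefix):
            hi += 1
        return self._values[lo:hi]
```

Locating the start of the matching range costs a logarithmic number of comparisons, so the lookup cost depends mainly on how many parties match rather than on how many exist. Inserting into a Python list is linear, so a real index over millions of objects would need a tree-like structure on disk, which is what a database index provides.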
Sorry for the long post. I just have a lot in my mind and was hoping someone with more experience could either answer my questions or at least point me in the right direction.
Thanks, Daniel
Hi Daniel,
... When a call comes in or is made, the system makes an entry in this table and stores an MP3 file in a directory of the system. What I was wondering if it would be applicable and make sense, is to store all this information in a Magma database. Meaning, store the call meta information as well as the actual audio content. At first, I don't think there would be a problem. However, I was looking at the current system and they have approximately 200GB of audio content stored in over 100,000 files (in a period of 6 months).
As additional requirements, the system should allow for the "easy" search of call records by either date, agent, or phone number. Also, the actual call "object" should be stored in the customer profile so that when reviewing the customer record, historical call recordings can be pulled. ...
I'm glad you brought this up because I've contemplated this very question before. While Magma's ability to build and provide access to an arbitrarily large object model includes large-and-flat collections of objects, it does not include large streams of bytes. This severely limits Magma's ability to work with large multimedia files.
While you could, theoretically, commit and retrieve a byte object of up to 16 MB, it was never really intended to do "files" like this. A streaming protocol directly to the Magma server would be needed, i.e., something that could nextPut: bytes chunk-by-chunk *during the call* rather than requiring the client to buffer the whole thing and dump it all at once on the network to the server as soon as you hung up.
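The difference between buffering a whole recording and streaming it chunk-by-chunk can be sketched as follows. This is plain Python against generic file-like objects, not a Magma protocol; `copy_in_chunks` and its `chunk_size` default are hypothetical choices for illustration.

```python
import io


def copy_in_chunks(source, sink, chunk_size=64 * 1024):
    """Copy bytes from source to sink without holding them all in memory.

    Both arguments are binary file-like objects. Returns the number of
    bytes copied.
    """
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # an empty read means the source is exhausted
            break
        sink.write(chunk)
        total += len(chunk)
    return total
```

With this approach the memory in use at any moment is bounded by `chunk_size`, regardless of how long the call lasts.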
However, I was looking at the current system and they have approximately 200GB of audio content stored in over 100,000 files (in a period of 6 months).
Squeak's maximum file-size addressability is currently 2GB, I think. So any potential implementation to support this would need to keep each file separate, rather than embedding it into the objects file (which contains the serialized bytes for every object in the database).
Perhaps Magma could emulate a "file-system" and provide access to certain objects as files too? I.e., what if the repository could be seen as a "drive" and you could create "directories" and save arbitrary files in them? The VTOC tree would be kept in the system-area of the repository.
On the other hand, I am hard-pressed to believe you would want to use this over the (surely faster) standard file-system provided by the OS. I could add a bunch of code to support this and it might still end up not meeting your needs exactly. For example, perhaps the mp3 files would need to be accessible by another application in the company? It would then be accessing files inside the directory where the Magma database resides, which could cause contention.
So from that argument, it seems appropriate for each application to set up its own location in the real file-system and access the files there, referenced by location-objects in the application's own object model.
I hope others on the list will share comments.
- Chris
Chris,
Thanks for the response. I guess I could always create some sort of hashing algorithm to store the mp3 files in a multi-level directory structure (instead of a single directory as it is now) and then I could simply store the metadata information in Magma. This would give me two benefits:
1) The application that creates the mp3 files could work independently of Squeak/Magma and simply use the same hashing algorithm to store the media file in the OS file system directly, logging somewhere the new media file it created. This would allow that part of the system to work independently and guarantee the media files will always be there regardless of the status of the Squeak VM and/or application.
2) This would also allow me to overcome the stream transfer limitation of Magma as well as the 2GB file limit (if any) of Squeak. Then I could simply serve the media file over HTTP when requested.
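A minimal sketch of such a hashing scheme, in Python, is below. Everything in it is an assumption for illustration: the function name `media_path`, the choice of SHA-1, and the two-level layout with two hex characters per level are not from the existing system. Any writer and any reader that apply the same function to the same file name arrive at the same location, which is what lets the recorder run independently of Squeak/Magma.

```python
import hashlib
from pathlib import PurePosixPath


def media_path(root, file_name, levels=2, width=2):
    """Map a recording's file name to a nested directory under root.

    The directory names are taken from the leading hex digits of the
    SHA-1 digest of the file name, so files spread evenly across
    16**width sub-directories at each level.
    """
    digest = hashlib.sha1(file_name.encode("utf-8")).hexdigest()
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return PurePosixPath(root, *parts, file_name)
```

With the defaults, 100,000 files spread over 65,536 leaf directories, so each directory holds only a handful of files. Only the file name needs to be stored with the call metadata, since the full path can be recomputed on demand.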
I know that in general OODBs are targeted for both complex object structures as well as large objects. I would hope the latter is something Magma can improve in the future.
In the meantime, I'm happy with Magma and will continue using it. Hopefully, I can put it all in production very soon.
BTW, any comments on the other issue from my original post?
Thanks, Daniel
On Aug 15, 2005, at 2:00 PM, Chris Muller wrote:
...
BTW, any comments on the other issue from my original post?
Brent called it. Use MagmaCollections.
Here's a little info:
http://minnow.cc.gatech.edu/squeak/2639
http://minnow.cc.gatech.edu/squeak/2668
http://minnow.cc.gatech.edu/squeak/2985
Let me know if you have questions.
- Chris
magma@lists.squeakfoundation.org