whateverblog.
And the winner is... Prevayler
Thursday, February 06, 2003 12:25 PM
(This post has some redundancy with the "Mailstore..." post I made in the wee hours of the morning. Sorry 'bout that.)

I think the IMAP server is going to use Prevayler as the persistence mechanism for message/mailbox metadata. RFC 2683 encourages the server developer to make metadata querying very responsive, while searching against content (i.e. message header values and message bodies) can be much slower. That's nice because the metadata tends to be very small and well-structured and the content tends to be large and a mess.

The messages themselves will be kept in plain text, each in its own file, with the filename derived from the message UID that IMAP requires you to maintain. So even if your installation of Poorman kicks the bucket, your messages are still fully viewable. Heck, if I give those files the .eml extension, double-clicking them will open them right up in Outlook Express! :)
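A minimal sketch of how a UID could map to a filename, assuming one directory per mailbox. The class name `MessageFiles`, the zero-padding, and the directory layout are all assumptions for illustration, not decisions made in this post.

```java
import java.nio.file.Path;
import java.util.Locale;

// Hypothetical helper: maps an IMAP message UID to the plain-text file holding it.
final class MessageFiles {
    private MessageFiles() {}

    // IMAP UIDs are non-zero unsigned 32-bit values, so a long holds them safely.
    static Path fileFor(Path mailboxDir, long uid) {
        if (uid < 1 || uid > 0xFFFFFFFFL) {
            throw new IllegalArgumentException("UID out of range: " + uid);
        }
        // Zero-padding to 10 digits (assumed convention) keeps a directory
        // listing sorted in UID order.
        return mailboxDir.resolve(String.format(Locale.ROOT, "%010d.eml", uid));
    }
}
```

With this layout, UID 42 in a mailbox directory `inbox` would live at `inbox/0000000042.eml`.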

Anyway, each mailbox (or each user... not sure yet) will have its own prevaylent system (an analog to "database"). At folder selection (or user login, whichever), the prevaylent system will be initialized (i.e. the serialized snapshot is read into memory). Whenever the user leaves the folder (or logs off) or issues the CHECK command, the command log will be rolled into the snapshot.
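To make that lifecycle concrete, here is a hand-rolled sketch of the snapshot-plus-command-log idea in plain Java serialization. It deliberately does not use Prevayler's own API; every name in it (`MailboxState`, `MailboxCommand`, `SetFlags`, `MailboxStore`, the file names) is hypothetical, and rewriting the whole log on each command is a simplification.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.TreeMap;

// All names are hypothetical; this is not Prevayler's API.
final class MailboxState implements Serializable {
    private static final long serialVersionUID = 1L;
    final TreeMap<Long, Integer> flagsByUid = new TreeMap<>(); // UID -> flag bits
}

interface MailboxCommand extends Serializable {
    void applyTo(MailboxState state);
}

final class SetFlags implements MailboxCommand {
    private static final long serialVersionUID = 1L;
    private final long uid;
    private final int flags;
    SetFlags(long uid, int flags) { this.uid = uid; this.flags = flags; }
    @Override public void applyTo(MailboxState state) { state.flagsByUid.put(uid, flags); }
}

final class MailboxStore {
    private final File snapshotFile;
    private final File logFile;
    private MailboxState state;
    private ArrayList<MailboxCommand> log;

    MailboxStore(File dir) {
        this.snapshotFile = new File(dir, "mailbox.snapshot");
        this.logFile = new File(dir, "mailbox.log");
    }

    // Folder selection (or login): read the snapshot, then replay logged commands.
    @SuppressWarnings("unchecked")
    void open() throws IOException, ClassNotFoundException {
        state = snapshotFile.exists() ? (MailboxState) read(snapshotFile) : new MailboxState();
        log = logFile.exists()
                ? (ArrayList<MailboxCommand>) read(logFile)
                : new ArrayList<MailboxCommand>();
        for (MailboxCommand command : log) command.applyTo(state);
    }

    // Log first, then apply. Rewriting the whole log keeps the sketch short;
    // a real command log would append instead.
    void execute(MailboxCommand command) throws IOException {
        log.add(command);
        write(logFile, log);
        command.applyTo(state);
    }

    // CHECK, folder close, or logoff: roll the command log into a new snapshot.
    void checkpoint() throws IOException {
        write(snapshotFile, state);
        log.clear();
        if (logFile.exists() && !logFile.delete()) {
            throw new IOException("cannot delete " + logFile);
        }
    }

    int flagsOf(long uid) { return state.flagsByUid.getOrDefault(uid, 0); }

    private static Object read(File file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file))) {
            return in.readObject();
        }
    }

    private static void write(File file, Object value) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(value);
        }
    }
}
```

If the process dies between two checkpoints, the next `open()` rebuilds the in-memory state from the last snapshot plus whatever is in the log. Replay is only safe here because `SetFlags` is idempotent; commands that are not would need more care.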

Since I will make sure the metadata has a very small footprint (for example, implementing message flags as bitmasks on an int or long), and only one prevaylent subsystem will be active per user at a time, memory usage should be perfectly acceptable. For example, loading up 10,000 messages' worth of metadata (that's a lot of messages for one folder, don't you think?), at say 50 bytes per message, comes to about 500KB of raw data, so even allowing for object overhead it would conservatively take on the order of 1MB. That's more than acceptable for me, considering the phenomenal metadata querying performance that ought to result.
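A small sketch of the bitmask idea together with the back-of-the-envelope memory estimate. The flag names follow the IMAP system flags, but the bit assignments and the `MessageFlags` class are assumptions for illustration only.

```java
// Hypothetical flag layout; the bit positions are illustrative.
final class MessageFlags {
    static final int SEEN     = 1 << 0; // \Seen
    static final int ANSWERED = 1 << 1; // \Answered
    static final int FLAGGED  = 1 << 2; // \Flagged
    static final int DELETED  = 1 << 3; // \Deleted
    static final int DRAFT    = 1 << 4; // \Draft
    static final int RECENT   = 1 << 5; // \Recent

    private MessageFlags() {}

    static int set(int flags, int mask) { return flags | mask; }

    static int clear(int flags, int mask) { return flags & ~mask; }

    static boolean isSet(int flags, int mask) { return (flags & mask) == mask; }

    // Raw payload only; JVM object headers and collection overhead come on top.
    static long estimateBytes(int messageCount, int bytesPerMessage) {
        return (long) messageCount * bytesPerMessage;
    }
}
```

All six system flags fit in a single int with room to spare, and `MessageFlags.estimateBytes(10_000, 50)` gives 500,000 bytes of raw metadata for the folder described above.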

I am also a little concerned that the snapshot deserialization (i.e. a prevaylent subsystem starting up), which will have to occur while the system is live (as opposed to during startup), could be a problem. If it does take on the order of seconds to load up each mailbox, it may be necessary to load all the mailboxes for the user at login time. I guess, as always, it's the time/space tradeoff.

One thing I have not yet considered is how and when to deal with incoming messages. If snapshot deserialization is a factor I could store them in a queue until the user logs in next time. Once the snapshot is loaded it should be almost instantaneous (in terms of user-perceived system responsiveness) to process even hundreds of e-mails.
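One way the queueing idea might look, as a sketch only: deliveries are parked per user and drained once that user's snapshot is loaded. The `IncomingQueue` class and its method names are hypothetical, and an in-memory queue like this would lose its contents on a server restart, so a real version would need to persist it.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Hypothetical holding area for messages that arrive while a user's
// metadata is not loaded.
final class IncomingQueue {
    private final Map<String, Queue<String>> pendingByUser = new HashMap<>();

    // Called on delivery: remember the message file, touch no metadata yet.
    synchronized void deliver(String user, String messageFileName) {
        pendingByUser.computeIfAbsent(user, u -> new ArrayDeque<String>()).add(messageFileName);
    }

    // Called after login, once the snapshot is in memory: returns the queued
    // messages in arrival order and empties the user's queue.
    synchronized List<String> drain(String user) {
        Queue<String> pending = pendingByUser.remove(user);
        if (pending == null) {
            return new ArrayList<String>();
        }
        return new ArrayList<String>(pending);
    }
}
```

After login, each drained entry would be turned into a metadata command against the loaded system, which is the step expected to be nearly instantaneous.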

Suh-weet. ;)