-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

=pod

There was an article on news.yc today called L<"Gay marriage: the database engineering perspective"|http://news.ycombinator.com/item?id=371987>. It was some politics mixed with relational database schemas for modeling marriage.

As I read through the article, I remembered that I wanted to write an article about object databases, since we now have a really good one for Perl called L<KiokuDB>. I have used Kioku instead of a relational database for my last few applications, and it has changed my life. (Relational theory is interesting from a mathematical standpoint, but it just doesn't fit with how computer programs are written these days. Mixing objects and relations is a hack, and I don't feel the need to build my application on a bunch of hacks.) Anyway, I'll show you what I mean in the rest of this article.

I am actually going to use Elephant, an object database for Common Lisp, in this article, since the code is a lot smaller and looks nicer on a blog. Perl classes are kind of wordy, you have to create a bunch of files... not fun. Elephant's semantics are the same as Kioku's, though, so the concepts you learn here can easily be applied to Kioku. (The real reason for using Lisp is that social news sites automatically upvote anything that contains the word "Lisp", and I want some of that sweet sweet Reddit karma. I can taste it already!)

I am also going to omit some of the very early examples from the news.yc article. I don't know what would be going through the mind of someone who thinks that enums should be modeled with separate tables (i.e. one table for males, one for females; modeling the enum C), so I am not even going to talk about those. I am going to jump right to a model that won't make people cry.

But before we do anything fun, we need to do a bit of housekeeping. We need to load Elephant. Since this app is a toy, I just ran the C command in SLIME to do that.
You can also write C<(asdf:operate 'asdf:load-op :elephant)>, or create a system that loads it.

We also need to connect to a database. I wrote a small function to do that:

    lang:Common Lisp
    (defun connect ()
      (let ((database-directory #P"/tmp/people.db/"))
        (ensure-directories-exist database-directory)
        (elephant:open-store (list :BDB database-directory))))

Type C<(connect)> in your REPL and you are ready to experiment.

Now on to our model. The author of the relational database article eventually argues that marriages should be possible between two or more non-identical people. That is fine with me, so let's model that. In normal CLOS, we would say something like:

    (defclass person ()
      ((name :initarg :name :reader name)
       (relationships :initform '() :reader relationships)))

    (defclass relationship ()
      ((members :initform '() :reader relationship-members)))

When using Elephant, we make a few changes. We create classes with C<defpclass> (for "define persistent class") instead of C<defclass>, and we will use "psets" instead of lists. (In Lisp, we normally model sets with plain old lists and use functions like C to interact with them. Psets have a similar API, but are loaded lazily from the database and are a lot more efficient than normal lists. The Elephant documentation goes into detail about this, if you care. On the Perl side, we use Cs instead of ArrayRefs or C objects.)

So, here's the Elephant-ized version:

    (elephant:defpclass person ()
      ((name :initarg :name :reader name)
       (relationships :initform (elephant:make-pset) :reader relationships)))

    (elephant:defpclass relationship ()
      ((members :initform (elephant:make-pset) :reader relationship-members)))

Now when we make an instance of person or relationship, it will be stored in the database, transparently.

At this point, we have two classes: a person class for modeling people, and a relationship class for modeling things like marriages and divorces.
Each person has a set of relationships that they are in, and each relationship has a set of people that are members of the relationship.

Let's flesh out the various types of relationships that people can have:

    (elephant:defpclass marriage (relationship)
      ((marriage-date :initform (get-universal-time) :reader marriage-date)))

    (elephant:defpclass divorce (marriage)
      ((divorce-date :initform (get-universal-time) :reader divorce-date)
       (reason :initarg :reason :reader divorce-reason)))

This gives us all the data structures we need to model marriage and divorce between an arbitrary number of people.

You'll notice that there are no "hacks" here. If we were using a relational database, we'd probably shove all the relationships into one table with columns like "marriage_date" and "divorce_date" that determine the type of the row. If divorce_date is null, then we know the record is a marriage. If both are non-null, then we know the record is a divorce. If marriage_date is null and divorce_date is non-null, then... well, our database is corrupted. We don't have that problem with an OO model because there is no way a divorce couldn't have been a marriage. Polymorphism is where OO excels and where the relational model fails, and we exploit that to make our model robust.

Anyway, now we need to add some behavior to these objects. Let's add an operation to marry an arbitrary number of people:

    (defgeneric marry (&rest people))

    (defmethod marry (&rest people)
      (declare (type list people))
      (setf people (remove-duplicates people)) ; can't marry yourself
      (when (< (length people) 2)
        (error "A marriage without any people is pretty boring."))
      (elephant:with-transaction ()
        (let ((relationship (make-instance 'marriage)))
          (loop for person in people
                do (progn
                     (elephant:insert-item person (relationship-members relationship))
                     (elephant:insert-item relationship (relationships person))))
          relationship))) ; return the new marriage so callers can hold on to it

All this does is add every person to the marriage, add the marriage to every person's set of relationships, and return the marriage.
We can use it like:

    (defparameter alice (make-instance 'person :name "Alice"))
    (defparameter bob (make-instance 'person :name "Bob"))
    (marry alice bob)

If we wanted to marry more people, that would be fine; the model doesn't care how many people are in a marriage -- if you happen to have some weird religious beliefs against that, you can enforce that in the code. We'll see an example later.

After an unfortunate incident involving Eve and some unencrypted email, however, Alice and Bob no longer love each other. We will need to implement the divorce function. This is actually very easy:

    ;; Look up the marriage through Alice; it is her only relationship so far.
    (defparameter marriage (first (elephant:pset-list (relationships alice))))
    (change-class marriage 'divorce :reason "death threats")

Since divorce is a subclass of marriage, we preserve all the information about the marriage, but also add some information about the divorce. And since we are changing an existing instance, everything that had a reference to the marriage now sees that it's a divorce.

That's basically all there is to this object database thing. You don't really see the database, you just interact with your objects and they persist. The model your program uses is the same as the data model.

Now let's see how we can extend this model to deal with marriages in California. Over there, the voters were brainwashed into making gay marriage illegal. (Supposedly to "protect traditional family values". If you want to protect the family values, shouldn't you make I illegal?) Unfortunately, object databases make it easy to enforce this constraint.

We'll start by redefining our person class to have a gender. The law says that a person can be either male or female ("not male"), and that a marriage consists of exactly one of each.
Here are our new classes:

    (elephant:defpclass person ()
      ((name :initarg :name :reader name)
       (relationships :initform (elephant:make-pset) :reader relationships)
       (malep :initarg :malep :initform NIL :reader malep)))

    ;; The empty slot list is required, even when a subclass adds no slots.
    (elephant:defpclass fundamentalist-marriage (marriage)
      ())

Then we can write a function to marry two fundamentalists:

    (defmethod fundamentally-marry (guy girl)
      (when (eq (malep guy) (malep girl)) ; same value on both sides, i.e. not xor
        (error "OMG YOU ARE AN ABOMINATION AGAINST GOD"))
      (elephant:with-transaction ()
        (let ((m (make-instance 'fundamentalist-marriage)))
          (elephant:insert-item guy (relationship-members m))
          (elephant:insert-item girl (relationship-members m))
          (elephant:insert-item m (relationships guy))
          (elephant:insert-item m (relationships girl)))))

That's basically it. When people marry with this method, the information is encoded as the type of the object in the relationships set.

The key is that when laws change, we don't need to change the schema. We just create a subclass of something that already exists, and start using that instead. (You can migrate or delete old data if you want to, but you don't have to. The database and the data model don't care.)

The interesting thing is that we never really noticed the database. We spelled defclass differently, and now our classes persist and can be queried. Nice!

So, if you haven't tried an object database yet, why not try one out in your next application? The advantages are numerous -- you express your data as the fundamental types in your language (lists, hash tables, sets, objects), instead of as tables, constraints, and relationships. This makes the data easier to model correctly, and you don't write any code to glue your database to your object-oriented app; with an OODB, you never even think about the database. If you want to add constraints to your data, you just write code. If your program is never allowed to have an object that doesn't make sense, the database can never have data that doesn't make sense.
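To make "you just write code" a bit more concrete, here is a minimal sketch of a constraint that lives in the class itself. It is only an illustration: C<checked-person>, C<checked-name>, and the non-empty-name rule are made-up names that do not appear in the model above, and the sketch uses plain C<defclass> so that it runs without a database. I am assuming, not demonstrating, that the same C<:after> method would work on a class defined with C<elephant:defpclass>.

```lisp
;; Hypothetical sketch in plain CLOS; no Elephant store is needed to run it.
;; CHECKED-PERSON and CHECKED-NAME are illustrative names, not part of the
;; article's model.
(defclass checked-person ()
  ((name :initarg :name :reader checked-name)))

;; Runs after the slots are filled in by MAKE-INSTANCE and signals an error
;; unless the name is a non-empty string, so an invalid person never
;; escapes the constructor.
(defmethod initialize-instance :after ((person checked-person) &key)
  (unless (and (slot-boundp person 'name)
               (stringp (checked-name person))
               (plusp (length (checked-name person))))
    (error "A person needs a non-empty name.")))

;; A valid person is created as usual.
(defparameter *carol* (make-instance 'checked-person :name "Carol"))
```

Calling C<(make-instance 'checked-person)> without a name signals an error instead of creating the object, so there is never an invalid instance around to be stored.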
A common misconception is that relational databases somehow perform better than object databases. There is no fundamental reason for that to be true. In the end, a database's speed comes from knowing what object you want, and quickly looking it up. Fast lookups come from balanced trees (which guarantee O(log n) lookups -- for a database with a record for every observable particle in the Universe, you would only need to do roughly 270 comparisons to find the object you're looking for. You could do that I pretty quickly.) The hard part is determining which object you want to look up. In our database, that usually means looking at a set of objects and inflating them -- fast. In a relational database, it means looking at the index -- equally fast.

With an object database, you even have an opportunity to optimize things simply by writing application code. If you wanted to track how many people were in the system, for example, you could just write a special metaclass to do that. When you create an instance of the person class, the count in the person metaclass is incremented. When you destroy an instance, the count is decremented. Now you can count instances in O(1) time, transparently. With a relational database, the solution would be much less elegant, and it couldn't track in-memory objects.

The point is, you have the flexibility to make things really fast if you need to. There's really no reason to reject object databases for performance reasons. If your object database is performing slowly, it's a bug in your implementation -- not a fundamental limitation of object databases.

Another complaint is that there isn't a way to enforce integrity at the database level. This is true; the actual database on disk is just a dumb object store. If you want integrity, make sure your application doesn't create invalid objects. An object-oriented application shouldn't rely on the database to validate the objects, anyway.
If you allow in-memory objects to be invalid and only enforce validity when storing to the database, you are Doing It Wrong. Make sure your application doesn't have invalid objects, and your database won't have them either.

If you want to share a database between many applications written in many languages (such that you can't just share the library code), slap an RPC server in front of your classes, and make all accesses go through that. Quick and easy.

And oh yeah, if you have to ask your government for permission to marry... are you really free?

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)

iEYEARECAAYFAkkoDscACgkQ2rw+dVvzZm1vzQCfV31r5HGSDdUWxOhn+dNwnnKl
wGUAniNdkjlw7l5+Bt7jVaQtzug7r/sK
=CWHv
-----END PGP SIGNATURE-----