Try-Catch-FAIL

Failure is inevitable.

Announcing Esenterate – a clean .NET API for ESENT

clock September 22, 2009 09:24 by author Matt

I have blogged previously about my disdain for relational databases, and recently I’ve been looking harder and harder at ESENT.  In my two previous posts on the ESENT, I covered how you could use ESENT from .NET as well as why you might want to.  Because of the ugliness of the API, today you are probably better off using a different data store than ESENT.

Tomorrow may be a different story though (not THE tomorrow, I mean just some “tomorrow” in the future).  I just opened the Esenterate project on Google Code.  The purpose of Esenterate is to provide a clean, .NET-friendly API around ESENT that allows developers to focus on their application instead of persistence.  Eventually I plan for the API to support all major ESENT functionality, but the first release will target simple key/value storage.  No code has been committed yet, but I am working on designing the API.  I’ll post more snippets as I make progress and need feedback, but here’s one to get you started:

EsenterateFactory.AddClass<Widget>(w => w.ID,
    indexes =>
    {
        indexes.AddIndex(w => w.Name);
        indexes.AddIndex(w => w.Count);
    }
).CreateAt(@"path\to\esent");

The above snippet is used to create a new repository (you won’t have to call this code each time, only once to create the initial repository structure).  It simply says “Hey, make a repository that can store widgets, Widget.ID is the primary key, and I want to add indexes to Widget.Name and Widget.Count so that I can query across those columns quickly.” 

If you have any suggestions, please feel free to leave them here in the comments or on the project site at Google Code.

Share or Bookmark this post…
  • del.icio.us
  • DotNetKicks
  • Digg
  • msdn Social
  • Reddit
  • StumbleUpon

Be the first to rate this post

  • Currently 0/5 Stars.
  • 1
  • 2
  • 3
  • 4
  • 5


When to consider ESENT

clock September 14, 2009 03:42 by author Matt

Rob pointed out that my last post didn’t really make a strong case for using ESENT.  Why would someone want to jump through the extra hoops when they could just use NHibernate to store their data?  In it’s current form, I would have to agree that NHibernate is probably a better solution than using the very low-level ESENT API.  In its current form, there are two times when I would consider using it:

When I need blazing-fast access

Relational database operate in the millisecond range (if you are lucky).  That sounds fast, but compared to most operations in your application, it’s probably the slowest thing you are doing.  On the other hand, ESENT operates in the microsecond range, and it does so while still providing a lot of the core functionality you would expect from a data store: ACID compliance, transactions, etc.  When milliseconds are unacceptable, ESENT might be a good fit.

When I need to store large objects

Relational databases suck at storing blobs.  It doesn’t matter if they’re blobs of  binary data or blobs of text data, they aren’t good at it.  People have resorted to a variety of hacks, the most popular of which is to serialize the data to the file system and store only the path in the DB.  I don’t like this approach because you lose some of the benefits of a relational DB and now have to make two requests to load your data: one to the DB, and then one to the filesystem. 

ESENT is quite good at storing large objects.  Individual columns can have values of up to 2 GB, and it has been used in applications that have terabyte-sized datasets. 

Moving forward

For me, the major limiting factor in using ESENT for other applications is the API.  If the API was more abstracted, I would be much more likely to use it.  I’d like to see an API that supports a fluent or convention-based approach for creating the database, and I’d like to see operations generalized to the point that storing an object is nothing more than Put(id, object).  In what little free time I’ve had lately, I’ve been thinking about building such an API.  Features I’m considering are:

  • Fluent database specification
  • Implement the generic repository pattern
  • Basic LINQ support

As always, time is not something I have lots of, but I’m very interested in this topic right now, so hopefully that will motivate me to at least get something basic out there for use.  I’ve already done the most important step in any software project, and that’s picked a clever name: Jufresto, derived from the goal of this project, which is to Just Freakin’ Store it (it being anything you want to throw at it).

Share or Bookmark this post…
  • del.icio.us
  • DotNetKicks
  • Digg
  • msdn Social
  • Reddit
  • StumbleUpon

Be the first to rate this post

  • Currently 0/5 Stars.
  • 1
  • 2
  • 3
  • 4
  • 5


Alternatives to Relational DBs - ESENT

clock September 11, 2009 05:12 by author Matt

As I have ranted about previously, I’m not a big fan of relational databases.  They have their uses, but they also have some very serious drawbacks, and I think they should be employed selectively.  There are lots of other ways you can handle persistence for your application.  This post kicks off what I hope will be a short series about some of the alternatives that I’ve used.  We’ll see how long it is before I get derailed into something more interesting.

Today, we’re going to look at ESENT, an embedded database that actually ships with every recent version of Windows (dating back, I believe, to Windows Server 2000, though ). ESENT is an extremely performant engine that is used by Microsoft Exchange and Active Directory (and probably other things, too).  Ayende is using it to power DivanDB, a .NET document-oriented database.  Since Ayende first blogged about it, there’s been a considerable rise in interest for ESENT.  There’s even talk of a LINQ to ESENT API, though I would agree that the underlying API needs a lot of work first.

ESENT has a fairly solid feature set:

  • ACID transactions with savepoints, lazy commits and robust crash recovery.
  • Snapshot isolation.
  • Record-level locking — multi-versioning provides non-blocking reads.
  • Highly concurrent database access.
  • Flexible meta-data (tens of thousands of columns, tables and indexes are possible).
  • Indexing support for integer, floating point, ASCII, Unicode and binary columns.
  • Sophisticated index types including conditional, tuple and multi-valued.
  • Individual columns can be up to 2GB in size. A database can be up to 16TB in size.
  • Can be configured for high performance or low resource usage.
  • No administration required (even the database cache size can adjust itself automatically).
  • No download. Your application uses the esent.dll which comes with the operating system.

One thing relational database lovers will notice is the lack of SQL support.  Indeed, you won’t be writing “SELECT * FROM…” with ESENT.  The API it exposes is much lower-level.  Another thing that will probably cause many developers to turn and run away screaming is the word “Jet”.  Yes, ESENT is indeed also known as JET Blue, but this is not the Jet that Microsoft Access uses.  Access actually uses a completely separate thing, JET Red.  The two are not related as far as code base goes. 

Using ESENT from .NET is very easy.  There’s a managed API available here.  Unfortunately, the API does resemble the underlying C API very closely, but that appears to be by design.  Still, the API is quite usable, just a bit verbose. 

Creating a database

Before you can use a database, you have to create it.  For this example, I’m going to be making a very simple document database that simply stores an XML-serialized Document object.  I’m just using ESENT as a key-value repository and not taking advantage of many of its other features.

using (Instance instance = new Instance("createdb"))
{
    instance.Init();

    using (Session session = new Session(instance))
    {
        JET_DBID dbid;

        //Overwrite the existing DB.
        Api.JetCreateDatabase(session, DATABASE_NAME, null, out dbid, CreateDatabaseGrbit.OverwriteExisting);

        using (Transaction transaction = new Transaction(session))
        {
            JET_TABLEID tableid;

            Api.JetCreateTable(session, dbid, TABLE, 16, 100, out tableid);

            JET_COLUMNID columnid;
            
            JET_COLUMNDEF idColumn = new JET_COLUMNDEF { coltyp = JET_coltyp.Binary, cp = JET_CP.None};
            JET_COLUMNDEF docColumn = new JET_COLUMNDEF {coltyp = JET_coltyp.LongText, cp = JET_CP.Unicode};

            Api.JetAddColumn(session, tableid, DOC_ID_COLUMN, idColumn, null, 0, out columnid);
            Api.JetAddColumn(session, tableid, DOC_BODY_COLUMN, docColumn, null, 0, out columnid);

            string keyDescriptor = string.Format("+{0}\0\0", DOC_ID_COLUMN);

            Api.JetCreateIndex(session, tableid, "DocumentID", CreateIndexGrbit.IndexPrimary, keyDescriptor,
                               keyDescriptor.Length, 100);

            Api.JetCloseTable(session, tableid);

            transaction.Commit(CommitTransactionGrbit.None);
        }

    }

}

Yeah, like I said, the API is very verbose.  You have your instance creation (the string parameter there can be anything you want to name your session), your session, the creation of the physical database file (note in this case that I’m overwriting the DB if it already exists), and then a transaction.  Inside the transaction, a single table is created with two columns.  A primary key is then created across the ID column. 

Inserting a document

Assuming our database exists, the next thing we will probably want to do is put some data into it.  (Side note: a subcontractor once said, and this is an exact quote, “So you put data in your database?” in response to me describing how we persisted objects in the project.  It was one of the funniest things I’ve ever heard anyone say.  Anyway…  )  This method assumes that you have an Instance object as a field that’s been initialized previously.  Instances are thread-safe, but sessions are not, so be sure to create a new session even if you are reusing an instance. 

protected override void InsertDocument(Document document)
{
    using (Session session = new Session(mInstance))
    {
        JET_DBID dbid;

        Api.JetAttachDatabase(session, DATABASE_NAME, AttachDatabaseGrbit.None);
        Api.JetOpenDatabase(session, DATABASE_NAME, null, out dbid, OpenDatabaseGrbit.None);

        using (Table table = new Table(session, dbid, TABLE, OpenTableGrbit.None))
        {
            var columnIds = Api.GetColumnDictionary(session, table);

            using (Transaction transaction = new Transaction(session))
            {
                using (Update update = new Update(session, table, JET_prep.Insert))
                {
                    Api.SetColumn(session, table, columnIds[DOC_ID_COLUMN], document.DocumentID);
                    Api.SetColumn(session, table, columnIds[DOC_BODY_COLUMN], document.ToXml(), Encoding.Unicode);

                    update.Save();
                }

                transaction.Commit(CommitTransactionGrbit.None);
            }
        }
    }
}

Again, we open a session, then we both attach and open the database.  After that, it’s safe to request access to the table, where we can create a transaction to perform the actual insert.  You need the underlying JET_COLUMNID objects when you are populating columns, which can be obtained using Api.GetColumnDictionary.  The basic pattern is still the same as you would use for a relational DB though: open a connection, begin a transaction, make a command (in this case, an insert), set some properties, save and commit.  Did I mention that the ESENT API was verbose?

Retrieving documents

You will find no SQL in ESENT.  In the unlikely event that you want your data back (<----- JOKE), you will have to use a lower-level API than you are probably used to.

protected override Document SelectDocument(Guid documentId)
{
    using (Session session = new Session(mInstance))
    {
        JET_DBID dbid;

        Api.JetAttachDatabase(session, DATABASE_NAME, AttachDatabaseGrbit.None);
        Api.JetOpenDatabase(session, DATABASE_NAME, null, out dbid, OpenDatabaseGrbit.None);

        using (Table table = new Table(session, dbid, TABLE, OpenTableGrbit.None))
        {
            using (Transaction transaction = new Transaction(session))
            {
                Api.JetSetCurrentIndex(session, table, null);

                Api.MakeKey(session, table, documentId, MakeKeyGrbit.NewKey);

                //NOTE: This will throw an exception if the key isn't in the index.  It might
                //be better to use JetTrySeek instead.
                Api.JetSeek(session, table, SeekGrbit.SeekEQ);

                string xml = Api.RetrieveColumnAsString(session, table, Api.GetColumnDictionary(session, table)[DOC_BODY_COLUMN]);

                return XmlSerialization.CreateInstance<Document>(xml);
            }
        }
    }
}

Again, we start by opening a session, attaching and opening the database, then requesting access to the table.  To do a look-up, we have to use an index, which is what Api.JetSetCurrentIndex does.  The last parameter is the name of the index; passing in null tells it to use the primary index.  Next, you have to make a key for the lookup, which I’m doing based on the document ID field.  After that, you have to seek to the record.  JetSeek will throw an exception if it can’t find a match, so use JetTrySeek if you aren’t positive the record is going to be present.  After jumping through all those hoops, you can finally get your data back using one of the Api.RetrieveColumnAs methods.  The final step there just uses a helper class I’ve made to go from the XML-serialized representation of my document back to an actual Document object.

Gotchas

I ran into a threading problem while I was prototyping with ESENT.  I decided to be clever, and I started trying to use the various OpenTableGrbit and OpenDatabaseGrbit enum members to tell the database what kind of access I needed (read-only, write, etc).  It turns out that I fail at being clever though, because apparently that causes it to go from record-level locking to table-level locking.  That’s not a problem if you have a single-threaded application, but it’s going to bite you hard if you have a few threads trying to work.  Instead, just play it safe and use the default settings (None) unless you know what you’re doing.

The End… FOR NOW

I really like ESENT.  I’m running some tests now to assess its feasibility on a project that we’re working on where we need a super-fast key-value storage engine.  Depending on how that goes, you may be hearing more about ESENT in the future.  In any case, I’d *love* to see a more generic API created around ESENT.  Maybe I’ll tackle that in my spare time.  Oh right, I don’t have any spare time. :(  Oh well, back to work…

Share or Bookmark this post…
  • del.icio.us
  • DotNetKicks
  • Digg
  • msdn Social
  • Reddit
  • StumbleUpon

Be the first to rate this post

  • Currently 0/5 Stars.
  • 1
  • 2
  • 3
  • 4
  • 5


Relational databases – the hammer-to-a-screw of software development?

clock September 2, 2009 04:43 by author Matt

Relational databases make me sad.  They are slow, they pose a serious impedance mismatch when used with an object-oriented language, they are heavy, they have non-trivial hardware requirements, and they can cost (a lot) of money.  Did I mention that they’re SLOW?  Unfortunately, some 95%* of software applications use relational databases for persistence, which means 95%* of applications are dealing with the problems I just mentioned.  Research has shown that nearly 40% of effort on most software development projects goes towards persistence-related issues.** 

*yes, that statistic is made up, but I would be surprised if it wasn’t close to accurate.

**Not a made up statistic, see Keene’s “Data Services for Next-Generation SOAs” article in SOA Web Services Journal.

As developers, I think we’ve basically come to accept that persistence is a hard problem, and that the solution is the relational database.  But wait, we’re spending up to 40% of effort on persistence?!?  That doesn’t sound like we solved it very well.  It’s taken me many years, but I’ve come to realize that while persistence is indeed a solved problem, the answer isn’t always the relational database.  I dare say that in most cases, the relational database is the wrong solution. 

There is the old saying that you shouldn’t use a hammer to drive a screw is quite true.  You need to pick the right tool for the right job.  I wish I’d seen it sooner, but I’ve actually been doing it *wrong* for a number of years.  This is a bold statement, but using a relational database, in most applications, is akin to driving a screw with a hammerIt’s simply not the right tool for the job.

“YOU ARE DUMB, MATT, RELATIONAL DATABASES PROVIDE FEATURES A-Z!!!  YOU HAVE TO HAVE THEM?  WHAT AM I SUPPOSED TO DO, JUST WRITE TO A FILE?”

Relational databases do indeed have a lot of features, and I will discuss the pros of relational databases in a future article (maybe), but I bet you actually need far fewer of them in practice than you think.   And no, having your app write directly to the filesystem for its persistence needs is probably not the right solution (though it might be).  The thing is that relational databases are far from the only tools that you can use to solve the persistence problem.  You should also be looking at:

In future posts, I plan to go more in depth with some of the alternatives (particularly the solution we’re going with here at my day job).  Right now, we’re trying to decide between Berkeley DB-XML and MongoDB, two very powerful (and very fast) alternatives to our old approach, which involved mapping everything to a relational database.

Share or Bookmark this post…
  • del.icio.us
  • DotNetKicks
  • Digg
  • msdn Social
  • Reddit
  • StumbleUpon

Currently rated 5.0 by 1 people

  • Currently 5/5 Stars.
  • 1
  • 2
  • 3
  • 4
  • 5


The pleasure of ActiveRecord-based data access code

clock July 24, 2009 07:46 by author Matt

At the beginning of the year, I described the horrible experience I had while removing *TWO* columns from the database of a legacy system here at the day job.  For those too lazy to go back and read, I basically spent nearly an entire day working on this “trivial” task and ended up with a 15,563 line SVN diff spread across 40 files.

Yesterday I was tasked with making major changes to the object model we’re persisting.  The change consisted of *nine* new entities and one modified entity.  Compare this to the simple deletion of two columns/properties, and you might think that this single task would take me the rest of the year.  And you would probably be right, except something is different this time.  This time, I’m working on a project that uses Castle ActiveRecord for its data access.

Again, for those who are too lazy to read, what does that really mean compared to the hand-rolled DAL I described 6 months ago?  The hand-rolled DAL consisted of several layers: a generic (massive) interface, two huge, concrete implementations of that interface (one for unit testing and one that actually talked to SQL Server), stored procedures, and the actual tables.  Oh, and a web service layer on top, just for good measure.

On the ActiveRecord (AR) side, I have my AR objects, which inherit from generic AR base classes, and the database… and that’s it.  All CRUD and querying capabilities are inherited from the generic base classes.  LINQ is supported through the generic base classes.  Unit tests are seamlessly handled by AR via SQLite. I have a simple ad-hoc method that I can run with TestDriven.NET that generates the database schema from my object definitions, and the application itself currently has the capability to upgrade its own schema as the objects change (such functionality is dangerous, but this app is still in the rapid prototype/evaluation stages). 

Clearly, the AR approach sounds a lot simpler with no stored procedures, no massive DAL objects, and automatic schema maintenance, but how much time did it *really* save?

I did the math.  I added things up, and there were over 50 database additions as part of this update.  Looking at my old timesheets, it took me 8 hours to complete the removal of *two* columns.  So, it should have taken me about 200 hours to implement these changes. :D  How long did it really take? 

Less than a day.  That includes time spent designing the new objects on paper, creating the classes, and figuring out how to handle one tricky mapping that involved an interface and a one-to-one relationship (which cost me about 2 hours).  I don’t even want to think about how long this would have taken if I’d done it by hand.

I think ORMs have advanced from “cool tool” to “required knowledge” for software developers.  If you consider yourself a developer but don’t know how to use at least one ORM, now would be a great time to acquaint yourself with one. My (biased) opinion is that Castle ActiveRecord is a great place to start. There are also others checking out, such as NHibernate (which ActiveRecord is built on) and Nvigorate.  At this point, I can’t imagine I will ever build another data-driven application that doesn’t use an ORM.  The alternative is just too painful.

Share or Bookmark this post…
  • del.icio.us
  • DotNetKicks
  • Digg
  • msdn Social
  • Reddit
  • StumbleUpon

Be the first to rate this post

  • Currently 0/5 Stars.
  • 1
  • 2
  • 3
  • 4
  • 5


The pain and horror of hand-coded data access code

clock January 27, 2009 07:51 by author Matt

At my day job, we finally decided to cull out a couple of columns that were no longer needed in our database schema.  We could have left them in, but since they were in a table that typically ends up with several hundred million rows, we thought it *might* actually save us some space in the long run.  Besides, taking out a couple of columns should be easy, right?

WRONG

Let's see what all this change actually entailed.  Obviously we had some code that was still wrapping these properties in our object model, so I had to remove the properties.  Then I had to update their tests (no, this does not mean that Evil Rob was right).  Then I had to update the data access code, and that's when things took a turn for the hilariously painful. 

Our data access code is very, very old-school.  Our Data Access Layer (DAL) is a single, massive repository that provides access to *everything* in the database.  The interface for the DAL is 1,342 lines long!  THE INTERFACE!  When the decision was made five or six years ago to consolidate all the data access code, we failed to realize that the simple schema we started with was likely to balloon, and that every single module in the system was going to want slightly different views of things... fast forward to today, and yeah, our data access interface is 1,342 lines of code and comments.

So, I had to update the interface.  Do you know how well Visual Studio works on files that are 1,342 lines long?  NOT VERY WELL.  Unfortunately, it gets worse: we have *two* classes that implement the DAL interface.  One is a stub that is used in testing.  The stub is 4,153 lines long.  Yeah, it's a complicated stub.  It tries to mimic a lot of the things that the database does, that way developers working on modules in the system can't do things that the real DAL won't allow.  So, I had to update the stub.  Again, how well do you think Visual Studio works with a file of that size?  I'll give you a hint, there is about a 10 second delay between my fingers hitting the keyboard and a burst of text appearing on my screen. Sidenote: some of this may be due to Resharper, but since using Visual Studio without Resharper is like rubbing a cheese grater on your brain, I refuse to disable it.

Alright, so I update the stub DAL.  Then, I update it's test fixture, which is 4,107 lines.  /scream

That's out of the way, so now I'm ready to hit the real DAL, a 11,882 line monster.  Again, we did things old-school: nothing goes directly to the tables, everything goes through stored procedures.  Recall that I just removed two columns/properties.  Well, I have to change things in four places in the DAL!  I have to change the code (removing references to the deleted columns/properties, removing all the 100% obsolete methods, etc).  Then I have to change all the stored procedures, deleting those that are completely obsolete and altering those that just referenced the deleted columns.  Then I can actually delete the columns from the table.  THEN I have to update the unit tests (which was actually the easy part, thank God we write decent unit tests). 

Finally, it is done, and I'm ready to commit.

Oh, wait, I'm NOT finished, there's a third class that implements the DAL interface: a web service wrapper class!  See, web services were all the rage five or six years ago, and we had to provide a mechanism for people to hook in to our application and access its data, so we stuck a web service on top of it.  This seemed like a great idea when we had about 5 tables.  It doesn't seem like such a good idea now that we have about 30...

Fortunately, updating the web service was the easy part.  The end result?  My SVN commit is going to be a 15,563 line diff that changes 40 files (some classes were rendered obsolete by the removal of these properties; it's kinda complicated).  All of this so that I could take two columns out of the database. 

The moral of this story is simple: DO NOT HAND-CODE YOUR DATA ACCESS LAYER.  It is error-prone, it is hard to maintain, and it will come back to bite you.  Even if you think "meh, I only have four tables, I don't need an ORM or code generation or anything", DO NOT DO IT.  Use some sort of tool that will make your life easy when it comes time to make changes down the road.

I think I'll try repeating the same process on another project here that uses ActiveRecord, and post my experiences with that.  I suspect that it will be a very different story.

Share or Bookmark this post…
  • del.icio.us
  • DotNetKicks
  • Digg
  • msdn Social
  • Reddit
  • StumbleUpon

Be the first to rate this post

  • Currently 0/5 Stars.
  • 1
  • 2
  • 3
  • 4
  • 5


Alternatives to Relational Storage, Part 1 - Introduction

clock January 26, 2009 03:07 by author Matt

As you might can tell from Friday's post, my recurring obsession with persistence has flared up again.  In all my years of development, I have still yet to find a solution that feels "perfect".  Perhaps the holy grail of storage is impossible to obtain.  Perhaps there is no perfect solution.  Maybe it's some Matrix-style philosophical issue, and the idea of a perfect persistence solution is a dream that us primitive developers refuse to accept.  In any case, I thought it was time to (once again) survey the landscape of solutions to see what's changed since I went on this persistence trip a year ago.  Seriously, there must be something about January, because I think I go on this tear every year about this time.  Anyway...

Storage for Developers 101

In the beginning, there were files.  And files were good, we still use the crap out of them today.  But they were lacking something (several somethings, in fact).  How do you read/write to files quickly?  How do you maintain some degree of integrity and consistency in the data?  How do you handle concurrency?  It turns out that these cross-cutting concerns were not application specific, they were universal requirements that needed to be addressed, and most developers just weren't very good at it.  Enter the database.

Early databases were not much more than flat files with some API abstractions built on top of them.  There was no SQL, no joins, nothing of that sort.  It wasn't until the 1970s that the method that would dominate our industry for the next 30 years (and counting) was introduced: the relational database.  By the 1980s, these things had really taken off, and we had commercial systems being rolled out by a variety of vendors.  The Structured Query Language (better known as SQL or "Sequel") became a standard of sorts and was supported by most vendors, and heavyweights IBM, Microsoft, and Oracle rose to power to dominate the scene.

While relational databases were on the rise, alternative systems were also being proposed.  Object-oriented databases began to gain some traction (though mostly in academia) due in large part to the paradigm shift towards object-oriented programming.  There was also this little thing called XML that showed up on the scene, and with it XML databases, that provided the flexible schema and hierarchical data support that relational databases are not very good at.  Recently, there has also been a shift towards document-oriented databases, which have an even less strict structure than XML databases. 

Despite all these advances, most of us in the .NET world still immediately fall back to relational databases as soon as we need to store something.  For many (myself included, for the most part), the alternatives are too untested, immature, and unproven to be considered.  Many "architects" even start building new applications from the database up, beginning with an ER diagram and only creating the object-oriented view once all the entities are well-defined.

Why the focus on storage?

If it often seems like a disproportionate amount of development time is spent on database access code, that's because it is.  Studies have shown that as much as 40% of total effort in a software project is spent on writing and maintaining data access code.  That's 40% of your effort that's not meeting requirements, that's not adding useful features, that's not adding value to the application.  "But Matt, the database drives everything, without it we have no system, so our data access code is actually adding more value than anything else!"  WRONG.  Do you really think that the customer cares whether you use NHibernate or ActiveRecord or SubSonic or MyHomeGrownDAL or whatever?  No, they don't, what they care about is whether the application meets their needs.  Sure, there are exceptions, and sometimes the underlying database itself actually fulfills business requirements, but often times it doesn't.  So why spend 40% of your time on something that isn't actually improving your application?  If you are using a relational database, the answer is because you have to.

The Impedance Mismatch

One of the reasons that data access code takes up so much of the development effort is because there is a gigantic mismatch between the view of relational databases and the object-oriented view that our application code takes.  This square-peg-in-a-round-hole scenario is referred to as the impedance mismatch, and it is the bane of modern development.  Translating between the relational world and the object world is expensive, painful, and time consuming.  The sheer number of tools that exist to help overcome this barrier is a testament to the scope of this problem.

The Alternatives

So if relational databases are such a pain, what's the alternative?  I'm still not convinced that there is one, but there are a couple of areas that I'm looking in to further.  The first is object-oriented database systems, such as db4Objects.  With an object-oriented database, the impedance mismatch disappears completely, you just send your object to the database as-is, and the database stores it.  There are downside to object-oriented databases though: aggregate queries, such as totaling sales by employee in a retail store database, are much, much more expensive than their relational counterparts.  They also don't have a widely adopted and understood query language like relational databases have with SQL. 

Another alternative is an XML database, like eXist.  XML databases can't natively store objects, but they can store hierarchical data easily, and many object-oriented languages (.NET included) have solid support for converting object graphs to and from XML representations.  An upside to XML databases is that there are actually somewhat widely-adopted languages for querying them, such as XQuery and XPath. 

Yet another alternative is a document-oriented database, such as the popular newcomer CouchDB.  Document-oriented storage is very (very) flexible, but it is also very immature.  I do not know of any enterprise-critical applications that are built on top of it (that doesn't mean they aren't out there, it just means I don't know about them).  There are no industry standards on how to query these things yet.

The Future

In future entries to this series, I'm going to look at as many of the alternatives to relational databases as I can.  If you have suggestions, please send them to me.  I will be using Castle ActiveRecord as the measuring-stick for the relational database in terms of ease-of-use (I have yet to see anything that matches both its flexibility and ease-of-use at the same time), but I am willing to look at alternative approaches if/when appropriate.  I'm going to focus on three main factors: feasibility (which includes performance, tool support, reliability, etc), ease-of-use (how quickly can I go from nothing to fully-working storage layer), and flexibility in terms of meeting other business requirements (such as reporting and business intelligence).  If anyone has suggestions on other criteria that should be considered, let me know.

Share or Bookmark this post…
  • del.icio.us
  • DotNetKicks
  • Digg
  • msdn Social
  • Reddit
  • StumbleUpon

Be the first to rate this post

  • Currently 0/5 Stars.
  • 1
  • 2
  • 3
  • 4
  • 5


About Matt

I am an overworked (and apparently overpaid) software developer with aspirations of acquiring a PhD in Computer Science. I started off coding in C over a decade ago.  Since then, I've migrated from C to C++ and branched out to C#, PHP, VB.NET, JavaScript, and worked with a wide assortment of other languages that I hope to never deal with again (I'm looking at you, COBOL). Oh, and yes, I've written some Java.  Does that make me a bad person?

Disclaimer

The opinions expressed herein are my own personal opinions and do not represent my employer's view in  anyway.

© Copyright 2009

Sign in