Relational databases make me sad. They are slow, they pose a serious impedance mismatch when used with an object-oriented language, they are heavy, they have non-trivial hardware requirements, and they can cost (a lot) of money. Did I mention that they’re SLOW? Unfortunately, some 95%* of software applications use relational databases for persistence, which means 95%* of applications are dealing with the problems I just mentioned. Research has shown that nearly 40% of effort on most software development projects goes towards persistence-related issues.**
*yes, that statistic is made up, but I would be surprised if it wasn’t close to accurate.
**Not a made up statistic, see Keene’s “Data Services for Next-Generation SOAs” article in SOA Web Services Journal.
As developers, I think we’ve basically come to accept that persistence is a hard problem, and that the solution is the relational database. But wait, we’re spending up to 40% of effort on persistence?!? That doesn’t sound like we solved it very well. It’s taken me many years, but I’ve come to realize that while persistence is indeed a solved problem, the answer isn’t always the relational database. I dare say that in most cases, the relational database is the wrong solution.
There is the old saying that you shouldn’t use a hammer to drive a screw is quite true. You need to pick the right tool for the right job. I wish I’d seen it sooner, but I’ve actually been doing it *wrong* for a number of years. This is a bold statement, but using a relational database, in most applications, is akin to driving a screw with a hammer. It’s simply not the right tool for the job.
“YOU ARE DUMB, MATT, RELATIONAL DATABASES PROVIDE FEATURES A-Z!!! YOU HAVE TO HAVE THEM? WHAT AM I SUPPOSED TO DO, JUST WRITE TO A FILE?”
Relational databases do indeed have a lot of features, and I will discuss the pros of relational databases in a future article (maybe), but I bet you actually need far fewer of them in practice than you think. And no, having your app write directly to the filesystem for its persistence needs is probably not the right solution (though it might be). The thing is that relational databases are far from the only tools that you can use to solve the persistence problem. You should also be looking at:
In future posts, I plan to go more in depth with some of the alternatives (particularly the solution we’re going with here at my day job). Right now, we’re trying to decide between Berkeley DB-XML and MongoDB, two very powerful (and very fast) alternatives to our old approach, which involved mapping everything to a relational database.
I think my line of work and your line of work are very different.
Almost every single application I’ve developed needs some sort of real-time statistical reporting. True, I could load up all the objects in memory, do summations that way, then show the results to the user, but in those cases – a relational DB is going to be much more performant.
Granted, persistence to a RDBMS is slower than say, serialization, but in my experience it’s never been a back-breaking speed barrier. Typically the main choke point is sending things over the wire, and with SaaS being so prevalent, that’s something we are (or will be soon) getting used to.
As you said, we’ve been dealing with RDMBS for a long time, and have pretty good solutions ready to go. We know the area. I don’t know that I believe we spend that much time dealing with persistence, at least if you use some of the popular ORM frameworks out today. I have no experience with any other non-RDBMS persistence packages, so I’m admittedly ignorant of how to handle several common day-to-day use cases.
So which will cost more – making sure our RDBMS isn’t a bottle neck for performance, or trying to learn a completely new way of handling persistence and include those features we’ve all come to rely on and take for granted?
The study was done in 2004, and ORMs have definitely seen an increase in use since then, so *hopefully* it’s no longer 40%, but I bet we’re still, as an industry, spending way too much time on persistence (I know I am). Things like Hibernate/NHibernate help alleviate the impedance mismatch, but they don’t remove it.
One of the reasons RDBMS systems are so widely used is their versatility. A good RDBMS can handle just about any persistence scenario, but its versatility is also a weakness: it doesn’t really do much of anything *well*. Storing large amounts of text or binary data? Supported, but performance will suffer. Want to search across fields? WHERE … LIKE technically "works", but again, it’s not good at it. Full-text indexing? Yep, supported on many RDBMSs, but again, most don’t do it well. Need to support thousands of operations per second with millions of rows? Yeah, you can do it, but be prepared to spend a lot of money on hardware, and be prepared to hire at least one DB administrator to keep things running and insure that you’re doing things the best way. RDBMSs are, like I said, a hammer. You can do a lot of things with them, but more specialized tools are often a much better choice.
One thing that relational DBs *are* good at is set-based operations. That’s their strength. So yeah, for applications that require heavy statistical computations, a relational DB may be necessary. But should all your data live there? Do you really need to run reports across text fields? Across binary data? All that extra junk is making your rows larger, which reduces the number that fits on a page, which hampers performance.
Also, the alternatives are getting better at projections and related functionality that’s needed for statistical reporting scenarios. XML databases have a standard querying language (XPath or XQuery), and most of the big ones can create indexes that you could use to support many reporting scenarios. Object-oriented DBs are improving here, too.