I’ve been thinking a lot about persistence and storage lately, and I think I’ve finally come to the conclusion that relational databases are almost universally being used incorrectly, and that many times an alternative persistence medium is actually a much better solution. I think relational databases are good for holding data that needs to be aggregated across (for reporting or other types of heavy analysis), not for things that are inherently object-oriented tasks. We spend so much time these days trying to work around the object-relational impedance mismatch when we could instead remove the mismatch altogether.
What’s happened? Has my brain been negatively impacted by my return to academia? Am I missing some magic explanation for why we developers seem to immediately assume that there’s going to be a SQL database when we create a new application? Why aren’t we using things like db4Objects and CouchDB instead? I’ve used db4Objects and found the performance to be nothing short of spectacular when compared to ORM solutions, and I’ve heard good things about CouchDB. Are we defaulting to relational databases because that’s just "the way things are done"?
I guess maybe it will help if you understand why I’m thinking on persistence. At my day job, our primary product is basically a combination of data warehouse, search engine, deep-web crawler, and data mining toolset. The primary underlying storage mechanism is a SQL Server database. This works pretty well for the reporting types of functions, but not so well for the information retrieval types of functions, such as full-text querying, document viewing, etc; those operations tend to be comparatively slow. And even the reporting performance is pretty slow because the schema is trying to support our object-oriented view of things instead of the data-oriented view that’s really needed for reports. So, I’m going back to the drawing board and trying to come up with a better persistence strategy. I’m thinking Lucene for indexing. I’m thinking db4Objects or something similar for the actual document object storage. I’m thinking SQL Server for storing only the aggregated information collected by processing the documents, no more of this object-relational mismatch garbage.
My question is, am I thinking clearly?