No SQL, No Class, No Thanks

“This is actually the NoSQL Conference” quipped one of the RailsConf 2010 presenters half way through his presentation, offering cutting commentary on the NoSQL fanboy aura of the week. This resonated with me since by Wednesday afternoon I was thoroughly sick of the nonsense, my daily dose of schadenfreude provided by his tale of MongoDB crapping itself unceremoniously and comprehensively upon ingesting too much data. “If you drink the KoolAid and believe the hype then you get what you deserve” he added.

Earlier that day I had sat through HashRocket’s “Million Dollar Mongo” presentation by Durran Jordan and Obie Fernandez. I was thoroughly unimpressed with their arguments in favor of MongoDB. It stunk of bad science and sleight of hand all the way through with a finale of utter juvenility.

Many of their arguments centered on the failings of MySQL. A memorable comment centered on how awesomely the system they built on MongoDB performed and how attrociously the predecessor MySQL-based system performed. It did not occur to them that comparing a poorly implemented solution built atop MySQL to a (possibly) well implemented solution atop MongoDB did not make for good science. Firstly, the folks building the original system may have had poor knowledge of how to use an RDBMS. Secondly, a rewrite of a system can benefit from the hard learned lessons of its predecessor, assuming it avoids the dreaded Second System Effect.

They went on to mention how their migration code that pulled from the existing MySQL system and placed data into MongoDB caused MySQL to saturate the CPU while MongoDB was all chill. The audience was supposed to be impressed that making a poorly designed relational database put together documents was CPU intensive while dumping files to a disk wasn’t. Oh, the insights…

Meanwhile, a common theme centered on how “schema free data makes migrations painless”. Anyone incapable of smelling the BS in this from a mile away has clearly been relieved of his olfactory senses. If one goes galavanting about in a schema free world, one of the following must be true…

  1. new code can’t operate on old data
  2. new code must be able to parse all pre-existing data formats
  3. old data must be migrated to mesh with new code

Of these, the first option would cause any business intelligence analyst to burst into flames, the second would cause massive bloat in the data access layer of the code base, and the third option sounds an awful lot like pretending not to have a schema when you clearly do. I noted that a guy a couple rows up from me was scribbling notes in his laptop, and one bullet was “painless migration? heh… what do you do with you production data?”, so apparently I wasn’t alone.

HashRocket also bemoaned how MySQL locks up the database for acceptably long periods of time when migrating the schema for a large system. This is indeed a known problem. That said, MySQL is not the end all and be all of RDBMSes. Others, such as Postgres, do not have this issue, so comparing MongoDB to MySQL with the intent of proving that document databases are better than relational databases comes across as a pile of logic fail.

I wanted to raise these issues and see how the HashRocket guys would respond, but then the talk took an unexpected turn as the presenters decided to air dirty laundry on stage. Apparently the client for the project under discussion had concerns about MongoDB and asked another firm to give a second opinion. This firm purportedly offered the opinion that HashRocket “had made the desicion to use MongoDB as an academic exercise”. I guffawed at this and then realized I was the only one in the audience laughing. In any case, the amusement of hearing that this other firm had voiced exactly the opinion I’d been holding for the whole presentation was unbearably ironic. For all the disdain coming from HashRocket about this, earlier in the talk Durran had quipped that he liked the idea of doing a MongoDB solution because it was “an opportunity to escape from the typical boring CRUD Rails app where you’ve got your gems and you’re just gluing things together” (paraphrasing as best I can).

HashRocket then went on to out the other firm, saying that they were represented at RailsConf, and naming them as Intridea. Not satisfied with this, they tried to make themselves out as having been horribly wronged by Intridea, saying that they were undermining the community spirit of the Ruby and Rails world. I wanted to barf, but they got a round of applause from the audience. By the end of this nonsense I was too disgusted to ask any technical questions so I just kept my mouth shut.

After the next session I ran into a friend in the hallway and as we walked out of the building to go to lunch I discussed the talk with him. In addition to the generally unprofessional behavior of the presenters, I raised my concern that the whole “painless migrations” thrust was utter nonsense. Riding down the escalator, I noticed that a guy behind us was leaning in close, seemingly to eavesdrop. I continued to talk, not really caring, and then this guy entered the discussion. It turned out he was HashRocket’s Tim Pope and he had voiced concerns similar to mine at the start of the project. I mentioned the three scenarios from above, one of which one must accept if living in a schema free world. “Aren’t you really just living with a schema and refusing to admit it with a document database?”, I asked. “Yeah, when you go to production that advantage pretty much evaporates.”, he responded. When I pressed him further on the issue all I got was “I don’t wish to offer further comment on this.”.

I’m frankly tired of the loads of terrible-at-SQL developers who hope this movement invalidates their weakness. Relational databases have their problems and limitations, but if you know when and how to use them they are pretty awesome. MongoDB itself readily admits that it is “less well suited” to “traditional business intelligence”, among other things.

Pushing and pulling documents may be easy and fit certain use cases just fine, but the general strategy of “embed everything” has onerous implications for many things. How do you update an embedded entity? You have to do it everywhere. How do you do analysis on embedded entities? You have to rip through all of the containing documents.

Let’s not forget that for many tasks computers are far more capable at optimization than application developers. Decades ago when relational databases were young and dinosaurs roamed the earth there was contention over whether the query language for RDBMSes ought be procedural or declarative. SQL, a declarative language, won out in the end. For the vast majority of cases, it is far better to specify the data you want and let a query planner optimize its retrieval. Yet all of the NoSQL zealots seem determined to abandon this knowledge with their document databases and procedural MapReduce jobs.

SQL isn’t going anywhere anytime soon. RDBMSes and document databases can reasonably co-exist in an architecture, possibly with the latter serving as a caching layer for the former as well as a home for very specific data. I’m cool with the idea of document databases, but I can’t wait for NoSQL to die.

— AWG

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s