Gordon Myers

Articles on Life, Truth, Love, Computers, and Music


MongoDB or: How I Learned to Stop Worrying and Love the Database

I come from a SQL background. For seven years, the primary database I worked with was Microsoft SQL Server. I did a little bit of work in MySQL occassionally as well, but in either case, I definitely came from the school of thought that SQL is the only way to store data. This is not without good reason. SQL is powerful. SQL is well-maintained and well-supported. SQL is stable and guarantees data integrity. And SQL is fast, when you know how to use it.

Since transitioning to a new job, though, I found myself flung into a strange new world: the world of NoSQL databases (aka document-oriented databases). In particular, a database called MongoDB. More on that in a second.

When it comes to programming, I consider myself a person who is more interested in solving problems than strict academic purity. With that said, I do tend to be obnoxiously meticulous with my code, but I would much rather throw out the super-careful attention to detail and just get something done, and subsequently tidy up/refactor the code, if it really comes down to it. And I think it is this desire to get something done that has really increased my appreciation for Mongo. Although it is not without its pitfalls, and it is definitely not the right choice in a lot of situations. However, I've been discovering that it is the right choice in a few situations where SQL is not.

My initial reaction to using Mongo was one of disgust. Reading through the design patterns, it seemed like it encouraged huge duplication of data, restricted or impaired  your access to the data, and didn't handle date-math very well at all. Unfortunately, all three of those initial impressions are true. But that doesn't mean you shouldn't use it. I won't get into the document-style approach to data because you can already find plenty of material on that yourself. It does involve duplicating data, which would otherwise just get joined in SQL. And that leads into the next point: there are no JOINs in Mongo. They do not exist; you can only query one table at a time. This is what I meant by restricting your access to data. But actually, as I've been learning, this really won't slow you down if you understand Big O notation and how to write Mongo queries effectively. And lastly, all dates in Mongo are stored in UTC time, no matter what. Mongo offers a way to group by dates, but no good way to transform that data into different timezones before grouping it. Which means your data is probably off by 5 hours.

With those downsides out of the way, let me explain why it's actually pretty cool. One of the applications I worked on at my last job was something called "Form Builder." It was actually a pretty slick application that let you create "forms." Forms could have multiple pages, and each page could have multiple questions. In other words, think Wufoo, but without all the pretty graphics. I created a few tables in SQL to build this: one for the forms themselves, another for the pages contained within those forms, a third for the questions on each page, and then one for receiving answers from users. Of course there are more tables involved than just these four, but you get the basic idea.

The "questions" table contained, among other things, the text of the question itself, an ENUM question type (represented in data as a TINYINT), and a number of different fields such as the maximum number of characters a user might enter. In the end, we came up with probably about 20 different question types, ranging from text input, to multiple choice as a drop down menu, multiple choice as radio buttons, multi-line text, asking the user to upload an image, asking the user to upload a file, a preformatted block of address fields, an email field... the list goes on. The point is, with all these disparate types of questions, each one is going to be configured slightly differently. It makes sense to limit the number of characters for text inputs. It doesn't make sense to limit the characters when it's a dropdown menu, however. It makes sense to configure an allowed list of file extensions when you're asking the user to upload a file, but not when they don't. It makes sense to configure a list of multiple choice options when you're asking a multiple choice question, but not otherwise.

What this means, when designing a database table in SQL, is that you really have two choices. Either you can have a bunch of columns that are sometimes relevant, depending on the particular row, or you can sort of abuse SQL by having some "general purpose" columns that get re-used for different purposes. These types of columns are usually either VARCHAR(8000) or TEXT, and they're considered bad practice by DBAs because you are essentially forfeiting the power of SQL by storing it in a different way. I ended up doing a mix of the two.

Here's a snapshot of the questions table:

As you can see, there's a character limit column on every field, even though not every field actually enforces a character limit. But then there's also this column called extra_info which looks like it contains a bunch of gobbledygook. That's really just a serialized PHP array, which is kind of like JSON, but not as elegant. (It predates JSON.) It's a way of storing any amount of arbitrary information in a quickly-usable-in-code format. Within SQL, storing data that way is a black box. There is no way to search it, but you store it that way because you don't need to search for it; you only need to access it. Lots of people do this, but it is considered an abuse of the database, because you're no longer really using the database.

What I didn't realize at the time, was that there was a different way of thinking about this problem entirely. This type of project was an absolutely perfect candidate for using Mongo, and I'll tell you why.

Mongo has different nomenclature than SQL. "Collections" are the equivalent of tables, "documents" are the equivalent of rows, and "fields" are the equivalent of columns. Every document in a collection is simply a JSON object. And a collection does not have any defined structure to it whatsoever. You read that correctly. It has no structure. There are no column definitions. One row in the table may have an entirely different set of columns than the next row. They're just arbitrary documents. It's all up to you. If you want one row to have one set of columns, fine. The documents are whatever you tell you them to be. They don't really care about each other.

With that said, you will end up having most of the same columns present on every document within a given collection, because otherwise there would be no point. But in Mongo, I could represent the same "questions" table above like this:

So you see here, one row doesn't enforce a character limit, so the field isn't present at all. Another row has an options array, whereas the rest don't have that column present in the first place. The difference between this and SQL is huge. In SQL, you can accomplish the same thing as I described above, but you cut off your own access to the data in the process. Mongo on the other hand is designed for this.

In Mongo, I could run a query like this, to find all the questions that have a character limit:

db.Questions.find({char_limit: {$exists: true}})

Or I could run a query like this, which will locate any questions that have an options array present, with one of those options being the word Red:

db.Questions.find({options: "Red"})

Mongo queries themselves are an off-shoot of JavaScript, which means you can use regular expressions too, which is awesome. (Although those are better suited for ad-hoc queries than actual production ones.)

From one SQL programmer to another, I would recommend giving Mongo a try. It has its annoyances, believe me. But after using it, you start to realize that SQL does too, and the realization that SQL isn't the only approach to anything and everything can be eye-opening. Having more tools in the toolbox helps make you a better programmer. Understanding when to use them, and when not to use them, is key as well. But I have to say, it's worth giving it a shot. I think you'll find there are some applications where it actually is better suited.

And by the way, if you're going to use MongoDB, you are going to need a client to access it as well. The one I've been using, the one that is pictured in the screenshot above, and the one that is hands down the best out there, is called Robomongo. The screenshot shows one of three possible ways of viewing the data (in tabular format). But it also lets you view documents in a hierarchical list, or as straight JSON text.


Post a Comment

Name:
Email:
(Will not be published)
Website:
(Optional)
Comment: