Why Startups Should Not Choose NoSQL

The NoSQL hype is omnipresent, and many startups are tempted to go for Cassandra, MongoDB, HBase, Redis and the like. Here I’ll argue why they should rather stick to a SQL solution – MySQL or PostgreSQL.

In my previous post about Cassandra I detailed why I decided not to use it. Now, a dozen presentations watched and several dozen articles read later, I can detail why I think it is not generally a good idea.

NoSQL is great for “web-scale”. That is the mantra of NoSQL evangelists. But an important downside of NoSQL solutions, mentioned by most sources (Twitter, Facebook, Rackspace), is that in NoSQL (at least for Cassandra and HBase) you must know upfront which questions you will be asking. You can’t just query anything you like. The relational model, on the other hand, allows you to define your model and then ask whatever question (query) comes to mind. And I can bet that a startup does not yet know all the questions it is about to ask its data store.
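As a rough illustration of that difference, here is a minimal sketch using Python's built-in sqlite3 module. The tables (users, orders), the sample rows and the function names are all made up for this example. The point is only that a question nobody planned for ("revenue per country") can be answered with a new query, without touching the stored data.

```python
import sqlite3

def build_demo_db():
    """Create an in-memory database with two small, hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, country TEXT);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            user_id INTEGER REFERENCES users(id),
            amount REAL
        );
        INSERT INTO users VALUES (1, 'ann', 'BG'), (2, 'bob', 'US'), (3, 'eve', 'BG');
        INSERT INTO orders VALUES (1, 1, 10.0), (2, 1, 5.0), (3, 2, 7.5), (4, 3, 20.0);
    """)
    return conn

def revenue_by_country(conn):
    """An ad-hoc question asked after the model was designed: just write the query."""
    return conn.execute("""
        SELECT u.country, SUM(o.amount) AS revenue
        FROM orders o
        JOIN users u ON u.id = o.user_id
        GROUP BY u.country
        ORDER BY revenue DESC
    """).fetchall()

if __name__ == "__main__":
    print(revenue_by_country(build_demo_db()))
```

In a store laid out around one access path (say, "all orders for user X" under a single key), the same question would typically mean scanning everything in application code or maintaining a second, denormalized copy of the data.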

Another thing is usability. Nearly all developers are familiar with SQL and the relational model. And startups must get their product in front of the public fast. Why bother learning a new paradigm, a new platform, and new tools (if you are lucky enough to have tools)?

Now let’s get back to web-scale. A startup does not need web-scale. Really. You are not getting a million users overnight. Twitter didn’t. Facebook didn’t. If things work out, you can gradually upgrade your data model to meet the new demands. That’s how Twitter and Facebook did it. They started with MySQL. Oh, by the way – Twitter is still using MySQL for the most important thing – tweets. Now, if you have more data than them, you are... Facebook?

So to summarize – don’t sacrifice flexibility and ease of work for some fictional “trillions of petabytes”. If it happens that you need to handle huge amounts of data, it will happen gradually enough that you will be able to restructure your data model, and at a point when you will know what questions you want to ask.

31 thoughts on “Why Startups Should Not Choose NoSQL”

  1. It defends the opposite. And I don’t agree. For example, for me, with a Java background, the SQL injection point is rather .. pointless. You never worry about SQL injections in Java (unless you are coding your first JDBC hello world).

    The fact that NoSQLs (CouchDB) don’t break, and don’t need more people to hire – well, I don’t think you need to hire people for MySQL either?

  2. You talk about startups here, not the average middle-size company.
    Startups are different: they need to differentiate themselves from the others, and very quickly.
    If, and only if, NoSQL database systems give you a substantial advantage over the competition, then why not use them?
    I think it is best described in this essay by Paul Graham: http://www.paulgraham.com/avg.html
    If NoSQL is your secret weapon, then use it. Otherwise you will do what other companies do, and you will soon disappear…

  3. No, the difference should not come from the technology used. The difference must come from the user experience and the perfectionism. I think it is not wise to use something because no one else is using it. You should rather evaluate which solution best fits your needs.

  4. Maybe I wasn’t clear enough :).
    Of course you have to use the technology that fits your needs. But if this technology is not “mainstream” and if that clearly gives you an advantage over the others, why not?
    User experience is clearly not related to the technologies used underneath. A webapp is still a webapp, no matter what programming language, appserver and database are used to power it.
    A good user experience is related to the way you communicate with customers, the way you analyze their needs and the way you respond to them.
    It’s my personal opinion though.

  5. Yup, and my point was that NoSQL (generally) doesn’t give you an advantage over the others. It takes your time (learning curve) and limits your flexibility.

  6. Good post.

    There seems to be a forming consensus that you cannot get away from schema. When you use a “schema-less” database you simply move responsibility into code.

    And I like your point about scale. We all *wish* our idea was going to require Google’s data centers in containers, but there will be plenty of time to worry about that after you get something working.

  7. Thanks for the post, it adds another voice to counter the hype. I wrote an article for php|Architect in September that said approximately the same thing, that relational data modeling is driven by data, whereas nonrelational data modeling is driven by queries. Either you define your schema, or else you define your usage of data, or else you set yourself up for a lot of laborious database refactoring. There ain’t no such thing as a free lunch!

  8. It’s an interesting vision and I agree with you.

    I’m not going to use NoSQL solutions just because Facebook and Twitter use them. The argument about startups is very convincing.

  9. agree on most. a startup won’t get millions of users overnight. however, nosql provides a flexibility that sql dbs can’t. with sql you need to design the db structure and map it in java, whereas with nosql you just do it in java.

    in recent usage of nosql (mongodb), i found out it’s very hard to accomplish some complicated queries. in addition, since it’s not relational, sometimes extra requests need to be made. thank god most of these were happening in memory….

    computer science is about balancing the trade-offs. so, understand what you are asking for, and i believe that’s not enough. you also need to understand what you are not asking for! 🙂

  10. I strongly disagree. I believe the opposite of what you believe.

    – NoSQL is easier than SQL, for everybody, not only for developers.
    – If you are smart, you can build beautiful structures on top of the basic key-value structure. You can do nice queries, and you can also change your data model more easily than in the SQL model.
    – Scalability (like other qualities) is something it is good to think about as early as possible, because a migration is always painful and very expensive.

  11. really? easier? do you mean 100%?

    how do you do a group by and sort?

    let’s say i have a sql query:

    SELECT OrderID, SUM(unitPrice * quantity) AS total
    FROM Order_details
    GROUP BY OrderID
    ORDER BY total DESC;

    would you like to prove your point of “easier than sql”?

    again, nothing provides the full set of solutions. balance! 🙂

  12. You do group by and count via MapReduce. It’s nice, but not easy. I strongly support the balance point. And btw, one can use two data stores in one app, if it’s worth it.
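To make the "nice, but not easy" point concrete, here is a minimal sketch of the group-by-and-sort from comment 11 written in map/reduce style in plain Python. It is not the API of MongoDB or any other store. The sample rows and function names are invented for illustration, and only the field names (OrderID, unitPrice, quantity) come from the query above.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical sample rows standing in for the Order_details table.
ORDER_DETAILS = [
    {"OrderID": 1, "unitPrice": 10.0, "quantity": 2},
    {"OrderID": 2, "unitPrice": 5.0, "quantity": 1},
    {"OrderID": 1, "unitPrice": 3.0, "quantity": 5},
    {"OrderID": 3, "unitPrice": 50.0, "quantity": 1},
]

def map_phase(rows):
    """Emit one (key, value) pair per row: (OrderID, line total)."""
    for row in rows:
        yield row["OrderID"], row["unitPrice"] * row["quantity"]

def shuffle(pairs):
    """Group emitted values by key, as the framework would do between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Sum the values for each key."""
    return {key: reduce(lambda a, b: a + b, values) for key, values in groups.items()}

def totals_sorted(rows):
    """GROUP BY OrderID, then ORDER BY total DESC, done by hand."""
    totals = reduce_phase(shuffle(map_phase(rows)))
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

if __name__ == "__main__":
    print(totals_sorted(ORDER_DETAILS))
```

The four-line SQL query becomes three functions plus an explicit sort, which is roughly the trade-off being argued about in this thread.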

  13. It seems like you are saying “You shouldn’t use a technology that can handle web scale because your piddly little startup doesn’t need it”, and that is silly.

    As a user of MongoDB, I will say that there is definitely a learning curve, but the speed of iterative development that comes along with not having a fixed schema is absolutely a positive, as long as you keep in mind the tradeoffs involved. Anyone who is smart and driven enough to launch a startup should be willing to take the time to evaluate whether a technology is good for them without some blanket statement about whether or not startups should use them.

    I’m not sure where the ‘ease-of-use’ argument re: SQL datastores comes from. It’s generally the most painful part, especially when it comes to complex deployments. With nosql, you have some level of flexibility as far as avoiding downtimes by iteratively adapting your schema.

  14. I’m far from the idea that someone should read this and drop nosql as an option. But he should be careful not to choose nosql for the wrong reasons.

    Btw, my way of working with MySQL and Hibernate is almost schemaless. My objects define the structure, and the schema is automatically created from them. Sometimes a bit of cleanup is required, and extra caution is needed when deploying to production, but that’s all. I haven’t written a single DDL statement in my 3 latest projects.
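The idea of letting objects define the schema can be sketched in a few lines. This is a toy Python analogue, not Hibernate's actual mechanism or API. The User class, the type mapping and the create_table_sql helper are hypothetical names chosen for the example.

```python
import sqlite3
from dataclasses import dataclass, fields

# Assumed mapping from Python type names to SQLite column types.
SQL_TYPES = {"int": "INTEGER", "str": "TEXT", "float": "REAL"}

@dataclass
class User:
    id: int
    name: str
    score: float

def create_table_sql(cls):
    """Derive a CREATE TABLE statement from a dataclass definition."""
    columns = []
    for field in fields(cls):
        # field.type is normally the type object; it can be a string when
        # annotations are postponed, so handle both cases.
        type_name = field.type if isinstance(field.type, str) else field.type.__name__
        columns.append(f"{field.name} {SQL_TYPES[type_name]}")
    return f"CREATE TABLE {cls.__name__.lower()} ({', '.join(columns)})"

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(create_table_sql(User))  # no hand-written DDL
    conn.execute("INSERT INTO user VALUES (?, ?, ?)", (1, "ann", 9.5))
    print(conn.execute("SELECT * FROM user").fetchall())
```

A real ORM also handles keys, relations and schema updates, which is where the "bit of cleanup" and the caution on production come in.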

  15. exactly! “as long as you keep in mind the tradeoffs involved”

    Bozho is right. two solutions can always coexist. one solves one set of problems and another solves the rest. if there are some other problems, bring in a third solution.

    to me, being a software engineer is not much different from being a lawyer. never favor anything based on your own emotion, always be objective. whatever works the best for your client/case/project, it will be the solution.

  16. The NoSQL mob does not like normalization, data integrity, modeling, structural updates and all that ‘SQL stuff’. So any excuse is welcome to avoid it and jump straight into coding.

    And I am the guy they hire two years later to migrate that junk to SQL 🙂

  17. denormalization is definitely one of the best features of nosql if it’s not the best!

    if you don’t need relational stuff, nosql outperforms sql solutions for sure. for a twitter/blog type of system, nosql is def a good solution.

    one thing i really like about mongodb (nosql) is that it’s running in memory (not all, but part), which works like memcached + persistence ability

  18. I have built a new web startup using ObjectDB, which is a pure OO DB – so not sure if it fits into the NoSQL sphere.

    Yes, it is a risk, but if you really want flexibility, then doing away with the relational layer and embedding the DB natively in the app is one way to go. Recommended if you want to distribute your software as an enterprise app with little fuss.

  19. Riak with Riak Search answers the ad-hoc querying concern just as well as an RDBMS without forcing every single record in a data set (table/bucket) to have an identical shape.

    Schema is not entirely an app responsibility in Riak if you use XML or JSON, since you can read/write what you know and leave the rest as incidental.

    Riak focuses on operational ease of use and flexibility for growth and contraction as demand requires if you need it. Consider Riak, try it and post your thoughts 🙂

  20. Seriously, the biggest evil to Java ever is Hibernate. (Which means we are bound to disagree about everything concerning data storage…) Object-relational mapping is the Vietnam of computer science. We all should know that. All projects using Hibernate (or whatever) are plain wrong.

    One of the biggest advantages of NoSQL databases is that they don’t require such mapping layers. ORM works well for a lot of cases, so if you are a startup in the field of enterprise systems, sure, go with SQL and ORM. If you are a startup in the field of web and related, chances are that NoSQL databases (be they document-oriented or graph-oriented) will provide much better ways of modelling your domain than tabular-oriented SQL DBs. And _this_ is the biggest advantage of NoSQL, not that scalability stuff (which I agree almost no one really needs).

    So: being schema-less, NoSQL DBs actually let you evolve much faster than SQL, despite the fact that ad-hoc querying is quite poor. At least that is my experience.

  21. I suggest you investigate graph databases such as InfoGrid (disclosure: I’m involved), another branch of the NoSQL family.

    Much of your reasoning not only does not apply, but works in the opposite way. Perhaps the better title for your post would have been “Why startups should not choose the wrong NoSQL”.

  22. Recently I delivered a project where we had an SLA measured in milliseconds: we had to pull data from more than 200 tables in 2 different locales, process it, enrich the content by applying some business logic and present it to the user, all in 40 ms or less. We had close to a million hits on the cloud for the set of services we were building (assured, as this was an old SOA cloud being upgraded).

    We used an Oracle 11g relational DB and wrote something on the order of 60+ queries with a lot of joins (Oracle 11g works on RAC so, I am guessing, that’s a lot of distributed joins). Ultimately, to meet the time constraints, we had to batch-cache everything. This pre-caching is expensive, very expensive; forget data grid cache license costs, we are talking about 1 hr of downtime to populate the cache grid (I know other solutions address this downtime problem).

    Looking back, if you ask me, in dynamic systems like ours, where the backing data model can change very frequently as the business desires and added/updated/deleted attributes must be reflected on the site immediately, this requires a butt load of work. In the real world, this calls for looking at CAP. I learned the lesson the hard way. If you have a million hits or more, if you have more than a million members, and if you are doing a lot of writes and a lot of reads on massive data (horizontally as well as vertically), think again: use NoSQL if plausible, or if you cannot, plan for caching all content. Otherwise it’s going to be a nightmare.

    Anirudh
    HeuristicS2
    Innovate, Experience and Discover
    http://www.heuristics2.com
