Comments on “Putting NoSQL In Its Place”


There is an argument for “Putting NoSQL In Its Place”.

The example given for selecting NoSQL over an RDBMS is:

A document = A page. As simple as that. Instead of crawling through nested loops and bringing data together from a multitude of locations, logically storing a chunk of application data in its native format is appealing. Is it always the right way to do things? Of course not. But it’s extremely intuitive, easy to develop against, and can be configured for concurrency, portability, redundancy, and nearly anything else you might want depending on the NoSQL engine you’ve selected.

I think the author has confused presentation with the data model. What is seen on the screen is not the data model.

A data model is how the organisation sees the world. The data model includes rules about how the world behaves, and what is worth knowing about the world. The data model is about how facts are constructed in the world-view of the organisation.

And the relational data model is agile because it allows facts to be combined in different and interesting ways. How do I do that with documents without teasing them apart and examining them?
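To make the contrast concrete, here is a minimal sketch using Python’s built-in sqlite3 (the tables, columns, and data are invented for illustration, not taken from the article). Once facts are stored relationally, a question nobody anticipated when the “page” was laid out becomes a single declarative query, with no teasing apart of documents:

    import sqlite3

    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE pages   (id INTEGER PRIMARY KEY, title TEXT,
                              author_id INTEGER REFERENCES authors(id));
    """)
    db.executemany("INSERT INTO authors VALUES (?, ?)",
                   [(1, "Alice"), (2, "Bob")])
    db.executemany("INSERT INTO pages VALUES (?, ?, ?)",
                   [(1, "Home", 1), (2, "About", 1), (3, "Blog", 2)])

    # A question that cuts across "documents": how many pages per author?
    query = """
        SELECT a.name, COUNT(*)
        FROM pages p JOIN authors a ON a.id = p.author_id
        GROUP BY a.name
    """
    for name, page_count in db.execute(query):
        print(name, page_count)   # -> Alice 2, then Bob 1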

The author’s conclusion is:

NoSQL is not THE answer, but it is absolutely AN answer. In fact, I am a strong advocate of using it together with applications that use or should use a relational model. For transactional use requiring ACID compliance (most notably atomic, sequential operations) data can be stored in the relational database. Metadata, session data, logging, metrics, and other non-transactional information that is not required for the actual end user experience can be easily stored in one of many forms offered by the various NoSQL engines.

I am aghast at this. The data in the database has to be accurate. You cannot be so cavalier as to suggest that any data in the database is not transactional. Either a logical unit of work was completed, or it was not. Otherwise, you are going to have bits of information hanging around until you run a data integrity check.
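As a minimal sketch of what “either it completed, or it was not” means in practice (again using Python’s sqlite3, with an invented accounts table): the two halves of a transfer commit together or not at all, so a failure part-way through leaves nothing to clean up:

    import sqlite3

    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
    db.executemany("INSERT INTO accounts VALUES (?, ?)",
                   [("alice", 100), ("bob", 0)])
    db.commit()

    try:
        with db:  # one logical unit of work: both rows change, or neither does
            db.execute("UPDATE accounts SET balance = balance - 50 "
                       "WHERE name = 'alice'")
            # Simulated crash before the matching credit is written.
            raise RuntimeError("failure mid-transaction")
    except RuntimeError:
        pass

    # The debit was rolled back automatically; no integrity check required.
    print(dict(db.execute("SELECT name, balance FROM accounts")))
    # -> {'alice': 100, 'bob': 0}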

Non-transactional databases are not new. I worked on one back in the 1980s called CA-1, the tape management catalogue used on IBM mainframes. It was a pointer-based database, and without transactions we had dangling pointers and tapes that were allocated but never referenced.

So, if there was a failure (software or hardware), I would have to try to detect the corrupted pointer chains and manually update them after reviewing reams of reports.
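As a toy illustration of that failure mode (this is nothing like CA-1’s actual format, just an invented pointer-chain catalogue with no transactions), a crash between the two writes of one logical unit of work leaves a pointer referencing a record that was never written:

    # Records point at the next slot by number, with no transaction around
    # the two writes that make up "add a tape to the chain".
    catalogue = {0: {"tape": "A001", "next": None}}

    def add_tape(cat, slot, tape, crash=False):
        cat[0]["next"] = slot            # write 1: link the new slot in
        if crash:
            raise RuntimeError("power failure between the two writes")
        cat[slot] = {"tape": tape, "next": None}   # write 2: the record itself

    try:
        add_tape(catalogue, 7, "B014", crash=True)
    except RuntimeError:
        pass

    print(catalogue)
    # -> {0: {'tape': 'A001', 'next': 7}}
    # Slot 7 is referenced but absent: a dangling pointer that, without
    # automatic recovery, had to be found and repaired by hand.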

DBAs of today are so spoilt with automatic transaction recovery.

Believe me, I am a firm believer in ACID because I know what life is like without it.


5 thoughts on “Comments on ‘Putting NoSQL In Its Place’”

  1. Hello! I wanted to clarify something based on a point you made:

    “The data in the database has to be accurate. You cannot be so cavalier as to suggest that any data in the database is not transactional. Either a logical unit of work was completed, or it was not.”

    For data that is part of the linear workflow of an application I absolutely agree with you; the clicking of a “submit” button on a purchase, for instance, is 100% required to be confirmed as complete before moving on.

    But I would still say that “eventual consistency” is palatable for other asynchronous data storage: a “like” button clicked, a comment posted, a dump of the completed (and guaranteed safe) transaction data into a storage ground for analytics. I would never suggest throwing away guaranteed immediate consistency for an entire application or any required part of a transaction workflow.

    Thank you for the critique. You make some excellent points and I respect your data architecture experience! I will have to chew over the “presentation vs data model” bit as I was trying to give an example of what developers find appealing about key:value or document storage and this is a frequent argument for it in content-heavy applications.

    • Steve,

      Thank you for your reply.

      I would have to disagree with you about how palatable “eventual consistency” is. To me, it is like saying I do not care about the user. The user has taken the effort to click the “like” button, or write a comment. Not to acknowledge that gesture seems to me an insult to their dignity. It is as if I do not really care what they think.

      “Eventual consistency” seems to be about putting my needs first. It might make my operational life easier. But, I think, it sends the message to the user that they do not really matter.

      As for analytics, my experience with Google Analytics seems to show that they only care about the big players. For the big players, a few million missing hits do not really matter, as long as the trends are evident.

      For sites like my Wiki with its very low hit rate, missing hits can distort the picture of what is happening there. It is not really important to me that the statistics are accurate because I do not monetise my site. But for other small sites, misleading statistics could lead to misplaced effort.

      Douglas

  2. Douglas,

    On a personal level and as a lover of data and user experience I agree wholeheartedly with you. No piece of data should go unreported; it is tantamount to data manipulation. You know what I think the schism is? Just as you said “a data model is how the organisation sees the world”, I think consistency has become a matter of how the organization sees the world as well.

    A data professional like you or me looks at the wide array of data and how it fits into our organization and adopts an almost militant “never leave a row behind” attitude about it. It is our sacred duty to keep the data safe, secure, and available. But organizations are more and more seeing people — and data — as a means to an end. If a few chunks of it are lost or take time to catch up along the way then so be it. In that regard I am as aghast as you are, yet I think at least for some types of data the move in that direction is unstoppable.

    I like your comparison of small and large organizations. To a large company that embarks on a multi-million or billion dollar big data initiative a few small false positives won’t result in much harm; however, to the small business trying to move into the market and reach critical goals it could be disastrous.

  3. Pingback: A List of Cognitive Revolutions in my IT Career – Yet Another OCM

  4. Interesting discussion. I think the argument for eventual consistency or even data loss (sacrilege!) goes back to how critical the data really is and what is at stake if user data is lost. If I post a tweet and later find out it never made it to my timeline or if I leave a comment on Facebook and it is lost I’m not going to stop using either web site. Now, if my bank fails to record my paycheck or if a vendor fails to deliver a product because the purchase order was lost, those mistakes are much harder to forgive.

    Mega sites like Twitter and Facebook accept the risk of losing data in exchange for optimal performance. I think most people don’t care if a few items are lost every once in a while as long as they can use a web site that has excellent response time. People will favor a fast site that may lose data over a slow site that doesn’t. And from a company’s point of view, fast performance = money.
