Notes on #ACM Webinar by Michael Stonebraker on Fast Data


Notes on Michael Stonebraker’s webinar for the ACM: The Fast Data Challenge and Picking the Right Database: Why One Size Doesn’t Fit All.

The Expert Guide to Fast Data

Characteristics of Fast Data

High message rate:

  1. 1,000 transactions per second – a traditional RDBMS handles this comfortably
  2. 100,000 transactions per second – this is where fast data gets interesting

In this context, transactions = messages.

According to the TPC, these are the top results (as of 2015/05/02):

| Benchmark | Top Result (System) | Database Manager | Performance | TPS | Report Date |
|-----------|---------------------|------------------|-------------|-----|-------------|
| TPC-C | SPARC T5-8 Server | Oracle 11g Release 2 Enterprise Edition with Oracle Partitioning | 8,552,523 tpmC | 142,542 | 2013/03/26 |
| TPC-E | System x3950 X6 | Microsoft SQL Server 2014 Enterprise Edition | 9,145.01 tpsE | 9,145 | 2014/11/25 |
| TPC-H | Dell PowerEdge R720xd using EXASolution 5.0 | EXASOL EXASolution 5.0 | 11,612,395 QphH@100000GB | 3,225 | 2014/09/23 |
| TPC-VMS | HP Proliant DL380 Gen8 Intel® Xeon® E5-2697v2 C/S with 1 Proliant DL360 G7 | Microsoft SQL Server 2014 Enterprise Edition | 718.12 VMStpsE | 718 | 2014/04/14 |

A rate of 100,000 transactions per second (tps) is equivalent to 6,000,000 transactions per minute (tpm), which the Oracle TPC-C result easily surpassed back in 2013. That configuration, however, was not clustered, so there was no protection against a server failure.
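Just to make that arithmetic explicit, here is a trivial sketch of my own (not from the webinar):

```python
# Convert the fast-data threshold from tps to tpm and compare it with the
# 2013 TPC-C top result quoted in the table above.
fast_data_tps = 100_000
fast_data_tpm = fast_data_tps * 60        # 6,000,000 tpm
oracle_tpc_c_tpmc = 8_552_523             # SPARC T5-8 / Oracle 11g R2 result

print(f"{fast_data_tps:,} tps = {fast_data_tpm:,} tpm")
print("TPC-C top result exceeds it:", oracle_tpc_c_tpmc > fast_data_tpm)
```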

Requirements for Fast Data Applications

  • Scale-out is preferable to scale-up.
  • A high-level language such as SQL with windowing aggregates (see the sketch after this list).
  • High availability, as in Tandem-style systems.
  • Zero data loss.
  • Data consistency, which requires ACID transactions.
  • Eventual consistency does not work!
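To make the windowing-aggregate requirement concrete, here is a minimal sketch using Python's built-in sqlite3 module. It assumes a SQLite build with window-function support (3.25 or later); the table and column names are invented for illustration and are not from the webinar.

```python
import sqlite3

# In-memory database standing in for a fast-data store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ticks (ts INTEGER, symbol TEXT, price REAL)")
conn.executemany(
    "INSERT INTO ticks VALUES (?, ?, ?)",
    [(t, "ACME", 100.0 + 0.5 * t) for t in range(10)],
)

# Windowing aggregate: moving average of the last five ticks per symbol,
# the kind of continuous computation a fast-data pipeline performs.
rows = conn.execute("""
    SELECT ts, price,
           AVG(price) OVER (
               PARTITION BY symbol
               ORDER BY ts
               ROWS BETWEEN 4 PRECEDING AND CURRENT ROW
           ) AS moving_avg
    FROM ticks
    ORDER BY ts
""").fetchall()

for ts, price, moving_avg in rows:
    print(ts, price, round(moving_avg, 2))
```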

Non-Solutions for Fast Data

A traditional RDBMS is not a solution: roughly 90% of the work it does is overhead, coming from:

  • Buffer pool overhead
  • Locking overhead
  • Write-ahead log overhead
  • Threading overhead

Disk-based systems are slow.

NoSQL systems are not a solution because:

  • Low-level query language
  • No ACID guarantees
  • Buffer pool and threading overhead are still present
  • Low performance and low function

Solutions for Fast Data

  • High-performance main-memory SQL-ACID DBMS (VoltDB, Hekaton, HANA, …)
  • Complex event processing (CEP) engine (Storm, StreamBase, …)

I wonder how the in-memory option for Oracle Database 12c would stack up under these fast-data conditions.

Characterisation

A CEP engine is a natural fit for “little state, big patterns”.

A main-memory SQL-ACID DBMS is a natural fit for “big state, little patterns”.
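To make that distinction concrete, here is a toy sketch of my own (not from the webinar) of the “little state, big patterns” side: a CEP-style matcher that keeps only a handful of recent events while watching for a pattern over an unbounded stream. The “big state, little patterns” case is the inverse: the full data set lives in memory, as in the windowed SQL example above, and queries touch it with simple predicates.

```python
from collections import deque

def detect_rising_run(ticks, window=3):
    """CEP-style pattern matcher: retains only the last `window` prices
    (little state) while scanning an unbounded stream for a pattern,
    here a run of `window` strictly increasing prices (big patterns)."""
    recent = deque(maxlen=window)
    for ts, price in ticks:
        recent.append(price)
        if len(recent) == window and all(
            recent[i] < recent[i + 1] for i in range(window - 1)
        ):
            yield ts  # emit an alert; the stream itself is never stored

# Hypothetical stream of (timestamp, price) events.
stream = [(0, 10.0), (1, 10.2), (2, 10.1), (3, 10.3), (4, 10.4), (5, 10.6)]
print(list(detect_rising_run(stream)))  # -> [4, 5]
```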
