Database performance benchmarks are useless

Can you name the fastest data analytics (databases, log analytics, etc.) product?

That’s a trick question. If you start Googling, you will inevitably stumble upon various companies claiming their product is the fastest. Sometimes you can even grab popcorn and watch two vendors fight it out in the court of Internet opinion, each defending its claim. But marketing teams sometimes forget that there are no winners in that court. Or perhaps they know it very well, and getting more website hits is the point?

So, what should you do if you are looking for a fast analytics product?
That’s another trick question. The speed of a query should never be the only criterion for picking an analytics product. Don’t choose a product just because it is fast. Evaluate it holistically:

  • How easy is it to get data in? Do I need to create tables and manage schema? Or is the product schema-less?
  • What does the query experience look like? Does it provide a SQL interface? What if I need to query nested JSON? What about JSON arrays?
  • What does it take to operationalize it?
    • What’s the backup and restore experience like?
    • What metrics does it expose that you can use for optimizing queries/schema?
  • What’s the RBAC story?

Those are just a few of the questions to ask when selecting a product.
Performance benchmarks are useless for another vital reason: they don’t reflect performance on your datasets. There are plenty of standardized datasets available, from the NYC taxi data to the TPC benchmarks, but I can’t recommend enough that you benchmark with your own dataset. Much like every living being has a unique personality, every dataset has its own personality too. Datasets grow (or shrink) at different paces. Cardinality is different for every dataset. They age differently. Some people are corrupt, and much like that, datasets sometimes get corrupted.
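To make "use your own dataset" a little more concrete, here is a minimal sketch of what a first pass could look like: count the distinct values per column, then time a query you actually care about. Everything in it is an assumption for illustration. The `logs` table, the sample rows, and the helper names `column_cardinality` and `median_query_seconds` are hypothetical, and in-memory SQLite is only a stand-in for whichever product you are evaluating.

```python
import sqlite3
import statistics
import time
from collections import defaultdict


def column_cardinality(rows):
    """Count distinct values per column for a list of dict records."""
    distinct = defaultdict(set)
    for row in rows:
        for column, value in row.items():
            distinct[column].add(value)
    return {column: len(values) for column, values in distinct.items()}


def median_query_seconds(conn, sql, repeats=5):
    """Run the same query several times and return the median wall-clock time."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        conn.execute(sql).fetchall()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


# Hypothetical stand-in sample; replace it with a representative slice of your own data.
sample_rows = [
    {"service": "api", "status": 200, "region": "us-east-1"},
    {"service": "api", "status": 500, "region": "us-east-1"},
    {"service": "worker", "status": 200, "region": "eu-west-1"},
    {"service": "api", "status": 200, "region": "us-east-1"},
]

# In-memory SQLite is only a placeholder for the product under evaluation.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (service TEXT, status INTEGER, region TEXT)")
conn.executemany(
    "INSERT INTO logs VALUES (:service, :status, :region)", sample_rows
)

cardinality = column_cardinality(sample_rows)
elapsed = median_query_seconds(
    conn,
    "SELECT service, COUNT(*) FROM logs WHERE status >= 500 GROUP BY service",
)
print(cardinality)
print(f"median query time: {elapsed:.6f}s")
```

The numbers from a four-row sample mean nothing, of course. The point is the shape of the exercise: your own rows, your own cardinality, and your own queries, not someone else's.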

And yes, this applies to my start-up Dassana too. You should never pick the Dassana cloud logging solution just because it is fast and cheap; there are other reasons to use it. There are reasons for not using it too, and that’s something I will blog about next time.