Actually I’m playing with you here, as the title of this post is just a rip-off of Luigi Tiano‘s post: Why I think Big Data is NOT “the new style of IT”. In essence Luigi tells us that, although the commercial value of “collecting, analysing and mining data” the Big Data vendors are selling us is without a doubt really important, within the whole of IT it’s merely a marginal use case. Not everything depends on Big Data analytics.
It’s not the use case, stupid
Laughing at myself now, as my previous blog post literally said “it’s the use case, stupid”. This time I want to talk about how technology bleeds through different use cases. My baseline here is that the technology we had to develop just to be able to do Big Data crunching helps in all other aspects of IT.
If we hadn’t hit the boundaries of structured databases, we wouldn’t really have needed NoSQL or Hadoop clusters. SQL as a language and RDBMSes as a whole have been the basis of storing and using data for ages now. But for these very large use cases they weren’t sufficient. Did you know, for example, that Nutanix uses Cassandra, a scale-out database that scales to hundreds of terabytes and cluster nodes, as a backend for their data protection model? That wasn’t really what it was designed for, but it makes one hell of a new use case.
There is a second type of Big Data, which is the unstructured big-files type. It has nothing to do with the analytics type, but it had the same kind of significant development needs. Here we are storing millions of big files like movies, pictures, MRI scans, … that can easily add up to petabytes of storage in a single namespace. Sometimes even with a need for more nines of durability (99.99999…%) than was physically possible to deliver at that point.
For those use cases we have seen that, for example, storing this data in RAID protection models was not going to make it. People started building Object Storage with erasure coding instead of RAID. And although erasure coding is far more interesting from a footprint point of view than RAID, it’s not up to speed yet performance-wise. But the development doesn’t stand still!
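To make the footprint argument concrete, here is a minimal sketch comparing the raw-storage overhead of full replication against an erasure-coded layout. The 10+4 split is a hypothetical parameter choice for illustration, not any specific vendor’s scheme:

```python
# Toy comparison of storage overhead: 3-way replication vs. a
# hypothetical 10+4 erasure-coded layout (10 data + 4 parity fragments).

def overhead(data_fragments: int, parity_fragments: int) -> float:
    """Raw bytes stored per byte of user data."""
    return (data_fragments + parity_fragments) / data_fragments

replication = overhead(1, 2)    # 3 full copies of every object -> 3.0x
erasure_10_4 = overhead(10, 4)  # 10 data + 4 parity fragments -> 1.4x

print(replication)   # 3.0
print(erasure_10_4)  # 1.4
```

Both layouts survive the loss of any two fragments, yet the erasure-coded one stores less than half the raw bytes. The trade-off is the encode/decode work on every read and write, which is exactly the performance gap mentioned above.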
The rise of flash
Five years ago deduplication was only possible post-process because it took too much processing power. The result was that we could only use it for backup targets and saving footprint. Today, thanks to the evolution of flash, sometimes combined with an FPGA (e.g. SimpliVity), we can do inline deduplication on production workloads. So we are not only saving on footprint but on bandwidth as well, since we can use deduplicated data for active replication.
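The core idea behind inline deduplication can be sketched in a few lines: fingerprint each incoming block as it is written, and only store blocks whose fingerprint hasn’t been seen before. This is a simplified illustration, not any vendor’s actual pipeline (real systems add reference counting, collision handling, and fast fingerprint indexes, often in flash or FPGA):

```python
# Minimal sketch of inline block deduplication:
# blocks are hashed on the write path; only unseen content is stored.
import hashlib

store = {}   # fingerprint -> block bytes (the "physical" storage)
volume = []  # logical view of the volume: ordered list of fingerprints

def write_block(block: bytes) -> None:
    fp = hashlib.sha256(block).hexdigest()
    if fp not in store:   # new content: store it once
        store[fp] = block
    volume.append(fp)     # the logical write is always recorded

for b in [b"aaaa", b"bbbb", b"aaaa", b"aaaa"]:
    write_block(b)

print(len(volume), len(store))  # 4 logical blocks, 2 unique blocks stored
```

The same fingerprints explain the bandwidth win for replication: if the remote site already holds a fingerprint, only the reference needs to cross the wire, not the block itself.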
To come back to Luigi’s post: he is absolutely right that the use case of Big Data analytics is just a corner case of IT. But… the technology we develop today to solve corner cases is the foundation of tomorrow’s IT solutions!