Continuing our ad hoc series of Hadoop Summit speaker interviews. This short interview is with Paul Groom, Chief Innovation Officer, Kognitio. You can register for Hadoop Summit here, and see the detailed schedule here.
HS: You’ve likened the current state of Hadoop adoption to the old “Wild West.” Why?
PG: It’s largely based on the pervasive awareness we’re seeing in this space. We’ve moved well past the “cool!” phase, in which Apache Hadoop was seen as the latest toy for gearheads to play with, to its being a key component in the next-generation data architecture. Business users are now seeking to implement Hadoop as part of their infrastructure, for new insights and new applications that weren’t possible or practical before.
But that’s not necessarily a good thing if they’re doing it only to be seen as running with the pack. It’s almost like 20 years ago, when everyone insisted they needed a website, but few could articulate why they needed one.
So, we see a wide variety of localized approaches to implementation, with business users leading the way out of need and trying to create their own destiny – a new frontier.
HS: From Kognitio’s perspective, how should they be approaching it?
PG: There are a number of factors in play here, and when properly considered, they all converge.
For example, new data types and new applications are putting pressure on data warehouses as we know them, especially among smaller mid-tier organizations. They often struggle to adapt quickly enough to the three V’s of Big Data. By contrast, Hadoop says, “We’ll grab the data and sort it out later, at far less cost.” At the same time, even its proponents…and I’m one of them…acknowledge that interactive speed in a native Hadoop environment is not where it yet needs to be, especially in Big Data settings, where you’re dealing with large numbers of terabytes and ever more billions of rows of information coming from numerous sources.
In general, BI vendors must improve and deepen their integration with the Hadoop platform so that it plays well within the BI ecosystem. In principle, they connect, but not with the attributes and capabilities that business users expect, such as rapid access and the freedom to drill down on demand.
That’s where we at Kognitio have been advocating true in-memory computing for advanced analytics. Merely taking a copy of data from an existing disk-based store and placing it into another disk-based store for analysis does not make sense. We advocate pulling the required data directly into RAM and keeping all of the queries and analytics in memory at all times. That gives BI users the raw performance they want, and it serves the need for low-latency, high-frequency ad hoc access.
The good news is that Kognitio is a full scale-out technology that can easily manage multiple terabytes of RAM, yet utilizes the same infrastructure Hadoop users require. (It runs best on a set of independent nodes alongside the Hadoop cluster, but can be installed on, and run on, RAM-intensive Hadoop nodes.)
From the beginning, we’ve taken the view that data pinned into memory is not just a crude page cache; treating it as a cache carries its own overheads. We believe you have to lay the data out in RAM in structures that truly take advantage of its random-access capabilities and modern CPU instructions. By doing that, you can significantly cut access and processing times. In today’s Big Data world…in which more and more companies are taking a serious look at Hadoop as a back end for their BI infrastructure…this is an imperative.
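The idea Groom describes can be illustrated with a minimal Python sketch. This is not Kognitio’s actual internals, just a generic contrast between a row-oriented layout and a columnar, contiguous in-RAM layout: scanning one contiguous column keeps the data cache-friendly, which is what lets an engine exploit the CPU’s sequential-access behavior and vector instructions.

```python
# Hypothetical illustration only: row-oriented vs. columnar in-memory layout.
from array import array

N = 10_000

# Row-oriented: each row is a Python dict; a column scan must touch
# every row object and look up the field each time.
rows = [{"id": i, "amount": float(i % 100)} for i in range(N)]
row_total = sum(r["amount"] for r in rows)

# Column-oriented: the whole column is one contiguous block of
# machine doubles, so aggregation is a single linear scan over memory.
amount_col = array("d", (float(i % 100) for i in range(N)))
col_total = sum(amount_col)

assert row_total == col_total  # same answer; the layouts differ in access cost
```

Real columnar engines go much further (compression, vectorized execution, NUMA-aware placement), but the layout principle is the same.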
HS: You’re not suggesting, however, that companies are going to re-engineer their existing BI environments, are you?
PG: Not in the least. They’ve made significant investments in those infrastructures, and they still deliver significant benefits. Kognitio’s approach fits between the BI applications and Hadoop, providing all the well-known data interfaces to BI applications – they just change the connection. DBAs and power users can then create data model definitions that map straight onto the underlying data in the Hadoop data store.
Simple commands in Kognitio can then pull that data into RAM using full multi-threaded access, with the data remaining in RAM for as long as it’s needed. This allows users to define models as they need them, and obtain data from many supporting systems, including the data warehouse. This flexibility is perfect for the rapidly changing world of the data scientists. This is why we say that Kognitio is the perfect analytical accelerator capable of supporting complex SQL and NoSQL processing.
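The workflow described above can be sketched generically. In this illustration, Python’s built-in sqlite3 in-memory mode stands in for an in-memory analytical engine, and a plain list stands in for the Hadoop data store; none of the names here are Kognitio’s. The pattern is the one Groom outlines: copy the needed data into RAM once, then serve repeated ad hoc queries entirely from memory.

```python
# Illustrative sketch only: "pull into RAM once, query many times."
import sqlite3

# Stand-in for a large, slow backing store (e.g. files in Hadoop).
slow_store = [("web", 120.0), ("mobile", 80.0), ("web", 45.5)]

conn = sqlite3.connect(":memory:")  # everything below stays in RAM
conn.execute("CREATE TABLE sales (channel TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", slow_store)  # one-time pull

# Repeated ad hoc queries now run at memory speed, with no trip
# back to the slow store; discard the image when it's no longer needed.
total = conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
by_channel = dict(conn.execute(
    "SELECT channel, SUM(amount) FROM sales GROUP BY channel"))
conn.close()
```

From the BI tool’s point of view, only the connection changes; the queries remain ordinary SQL against the in-memory image.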
This can be of particular value when they’re using Hadoop. Unlike the less flexible data warehouse model, Hadoop data models are built on the concept of meeting the demand for data within minutes, using that data for a period of time and then discarding it. When you need it again, you take a copy of it again, moving it from a large, slow store to a much more nimble store, and build another model. I’m biased, but I think Kognitio meets that requirement to a “T.”
In that context, we provide what we believe to be the perfect bridge between the traditional BI world and Hadoop stores.
HS: With speed and enhanced analytical capabilities being the end result?
PG: Precisely. The world of business intelligence is evolving from queries to complex analytics. It’s moving from computing traditional aggregates to forecasting. Put another way, it’s like driving the car looking out the windshield to see where you’re headed, instead of looking in the rear-view mirror to see where you’ve been. That requires computational effort and more than just plain SQL. Kognitio has put a lot of recent engineering effort into making any other language fully MPP, while using SQL as the data management language – the language of choice for all BI users.
Going back to the analogy that we mentioned off the top…keep in mind that once the railroads came in, they established a sense of order that meant the Wild West’s days were numbered. In the same fashion, you can think of Hadoop as establishing a similar sense of forward-looking order, democratizing BI and making it available to far more people and firms.
We’ll be at the Hadoop Summit, where Kognitio is a Platinum Sponsor. Feel free to drop by our booth. We’ll be talking about the Wild West, but we’ll leave the six-shooters at home.
HS: Thanks! And best of luck with your session.
Paul Groom is a Big Data Advocate (he hates the word “Guru”) with 20+ years working with MPP database design, business intelligence, data warehousing, and what were once called simply ‘VLDBs’ (Very Large Databases), going back to the days of Britton Lee, a foundational technology that became part of Teradata. A cartographer by education, he has worked for the British Government and has broken the Amazon Cloud (no major damage). Paul brings practical client experience from tens of thousands of hours building new analytical solutions with MPP and in-memory databases, plus ingenuity, to his role as Chief Innovation Officer at Kognitio.