HS: Tell us a little about your session.
SK: In my session I will present Genie, which is Netflix’s Hadoop Platform as a Service in the cloud. Genie abstracts away the physical details of various (potentially transient) Hadoop resources in the cloud, and provides RESTful APIs to submit and monitor Hadoop, Hive and Pig jobs without having to install any Hadoop clients. It is being used in production at Netflix to process hundreds of terabytes of data every day, running thousands of ETL jobs and hundreds of ad-hoc analytics jobs.
HS: What made you want to talk about Genie?
SK: It is exciting for a number of reasons. From an engineering perspective, it is unique in the sense that there are no other open source alternatives that have been proven to work at such scale. From a user’s perspective, it is cool because it is simple – users simply submit Hive or Pig scripts (or Hadoop jars) to Genie, and don’t have to worry about where or how the job is run. And finally, it is exciting because we are planning to open source Genie soon – so hopefully it will be of use to the whole community.
HS: What other sessions are most exciting to you?
SK: Apart from the other Netflix session (“Watching Pigs Fly with the Netflix Hadoop Toolkit”), there are many other sessions of interest. I am always interested in the Applications & Data Science track to see how people are using such tools in real life. And with YARN, Stinger, Impala and the like, the Hadoop landscape is changing quite rapidly – hence, I am also very interested in the Future of Apache Hadoop track.
HS: What has changed in the world of Hadoop compared to last year?
SK: A lot seems to be changing in the Hadoop world. There is a big push towards real-time or near real-time performance, with initiatives such as Stinger and Impala. I am looking forward to hearing about the latest, greatest and fastest at the Hadoop Summit.
HS: Thanks! And best of luck with your session.
Sriram Krishnan is a Senior Software Engineer at Netflix in the Data Science & Engineering Platform team, working on the next-generation service-oriented ETL and big data analytics infrastructure in the cloud. Prior to Netflix, he was a Senior Distributed Systems Researcher and Group Leader at the San Diego Supercomputer Center. He holds a Ph.D. in Computer Science from Indiana University, where he focused on grid computing, and distributed component and web service technologies.