Continuing our series of interviews and viewpoints from speakers at Hadoop Summit Europe, this interview is with Allen Wittenauer, who will be speaking on “Let’s Talk Operations!” as part of the ‘Deployment & Operations’ track on Day 2, at 2:20pm. You can see the detailed schedule here.
HS: Tell us a little about your session.
It is going to be a bit of an experiment! At a lot of other conferences, there are sessions where someone with senior-level experience in that field steps forward and says “Practitioners! We need to get in a room and talk about the common issues we have.” Many people have talked about Hadoop Summit having a similar session around operations. So this is the year it is going to happen. For this first one of hopefully many, I get the honor of handling the petri dish.
Like those other conferences with similar sessions, this one will not be recorded. (If you are not in the room, then you may just be stuck reading someone’s live posting!) This should give people who are shy a bit more confidence, knowing that there will not be a permanent record of their question or answer. Someone once told me that “if you have the question, so do five other people.” Want to know why Hadoop has so many configurations? Or the history of the memory limit capabilities? Perhaps you are interested in good practices for working with distcp? How about secret commands like distch? This is the time to ask! There are a lot of these loose ends that do not really fit an organized talk.
HS: What made you want to talk about Hadoop operations?
I’ve been wanting to do this type of session for a while. For a long time, I did not think the community was quite ready yet. But two events happened that sparked my desire to do it this year.
First was the Q&A that happened after the presentation I gave at Hadoop Summit last year. We went over time with a room that was still standing room only! All the questions were great! It proved that people are engaged but struggling to get the most out of their systems. The community is ready to start talking about what their setups look like and the mutual questions we all share.
Second was a blog post by a company that I will refrain from naming. They did a lot of relatively low-level OS configuration work across hundreds of nodes to handle a particularly common problem… one that could have been solved with a single setting on one node coupled with a minor job configuration change to one job. I was shocked!
The good/better practices for the Hadoop space are still unknown and mysterious to a large portion of people, despite blogs, books, online forums, mailing lists, etc. We need to get that information out there! My hope is that sessions such as these will open the communication channels to lift everyone up.
HS: What sessions are you most interested in seeing?
I usually attend all the other operations talks that I can. It is always interesting to see the deployment problems and the follow-on solutions that teams have developed. It is important that we learn from each other and expand the state of the art.
My second priority is to attend the security talks. That topic is always fascinating. The security industry is just starting to really look at Hadoop so there is still a lot of fumbling while they find their way. It is always fun and challenging to see if the speaker truly thought of everything. Is the information being presented smoke and mirrors or is it transformational? Currently there is a lot of misunderstanding around some of the core components and usual expected behavior. As usual, the devil is in the details.
HS: Thanks! Good luck with your session, and we’ll see you in Amsterdam.
Allen Wittenauer has been involved with Apache Hadoop since May 2007, when he was hired by Yahoo! to bring large-scale operational experience to the fledgling project. His work there helped create the basic blueprints that almost all Hadoop deployments follow today. At LinkedIn, his experience provided key insight and a foundation to its award-winning data science team.