Continuing our series of interviews and viewpoints from speakers at Hadoop Summit Europe, this interview is with Adam Kawa, Data Engineer at Spotify, who will be speaking on “Hadoop operations powered by … Hadoop” as part of the ‘Deployment & Operations’ track on Day 1, at 4:20pm. You can see the detailed schedule here.
HS: Tell us a little about your session.
Simply put, I will show a couple of examples of how to analyze the various metrics, logs and files generated by Hadoop. Interestingly, because these metrics and files can be huge, we can use Hadoop itself to process this data, which is kind of cool. Thanks to that, we can learn Hadoop better, avoid guesstimates and make data-driven decisions when changing configuration, expanding the cluster and optimizing our workload.
HS: What made you want to talk about using Hadoop to analyze Hadoop?
It is exciting for a number of reasons. First of all, Hadoop sends us many important signals. If we interpret them correctly, we will be able to understand Hadoop’s behavior better. Very often Hadoop is right there in the room, saying something to us, complaining about something, but sometimes no one even acknowledges it!
Many companies realize that analyzing Hadoop’s behavior is also a business problem, and solving it brings multiple benefits. For example, we can save money by removing useless datasets, iterate and provide insights faster by processing data more quickly, and avoid downtime that interrupts all data analysts by making data-driven decisions when changing configuration. And I know that many people like analyzing Hadoop!
HS: What sessions are you most interested in seeing?
I am mostly interested in the “Deployment and Operations” track, and I would especially like to attend the “Capacity Planning in Multi-tenant Hadoop Deployments”, “7 Deadly Hadoop Misconfigurations” and “Let's Talk Operations!” talks. Because there seem to be many great talks delivered at the same time, I will have a really hard nut to crack!
HS: Thanks! Good luck with your session, and we’ll see you in Amsterdam.
Adam Kawa works as a Data Engineer at Spotify, where his main responsibility is to maintain one of the largest Hadoop-YARN clusters in Europe. Every so often, he implements and troubleshoots Python MapReduce, Hive and Pig jobs. Adam is a frequent speaker at Hadoop conferences and Hadoop User Group meetups. He co-organizes the Stockholm and Warsaw Hadoop User Groups. He regularly blogs about the Hadoop ecosystem at HakunaMapData.com.