
Internet of Things Crash Course Workshop at Hadoop Summit

April 14, 2:30pm - 5:30pm

Room: Ecocem, Level 2
Introduction: This workshop will provide a hands-on introduction to the Hadoop stack powering the Internet of Things (IoT) using the Hortonworks Sandbox on students' personal machines.
Format: A short introductory lecture about the IoT components used in the lab, followed by a demo, lab exercises and a Q&A session. After the lecture there will be lab time to work through the exercises and ask questions.
Objective: To provide a short hands-on introduction to IoT. In the lab, you will use the following IoT components: NiFi, Storm, Kafka, HDFS, Hive and HBase. You will learn how to consume streaming sensor data into HDFS, explore the data, apply real-time processing to the streaming data and then issue SQL queries to analyze historical data (a small illustrative sketch follows this listing).
Pre-requisites: Registrants will receive an email one week before the event with prerequisites and lab setup instructions. You must bring a machine that can run the Hortonworks Sandbox.
Seating is limited for these sessions. Register now.
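
To give a flavor of the streaming side of the lab, here is a minimal Python sketch, not part of the official lab material, that publishes simulated sensor readings to a Kafka topic on the Sandbox; the broker address, topic name and message fields are assumptions you would adapt to your own setup.

```python
# Minimal sketch (not official lab material): publish simulated sensor
# readings to a Kafka topic on the Sandbox. The broker address, topic
# name and message fields below are assumptions -- adjust to your setup.
import json
import random
import time

from kafka import KafkaProducer  # pip install kafka-python

producer = KafkaProducer(
    bootstrap_servers="sandbox.hortonworks.com:6667",  # assumed Sandbox broker
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

for _ in range(10):
    reading = {
        "sensor_id": random.randint(1, 5),
        "temperature": round(random.uniform(15.0, 35.0), 2),
        "timestamp": int(time.time()),
    }
    producer.send("sensor-readings", reading)  # assumed topic name
    time.sleep(1)

producer.flush()
```

In the lab itself, NiFi and Storm would sit around a topic like this, handling ingestion into HDFS and real-time processing respectively.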

 

Hadoop Crash Course Workshop at Hadoop Summit

April 13, 11am - 1:30pm and April 14, 11am - 1:30pm

Room: Ecocem, Level 2
Introduction: This workshop will provide a hands-on introduction to Hadoop using the HDP Sandbox on students' personal machines.
Format: A short introductory lecture about the Hadoop components used in the lab, followed by a demo, lab exercises and a Q&A session.
Objective: To provide a short hands-on introduction to Hadoop. This lab will use the following Hadoop components: HDFS, YARN, Pig, Hive, Spark, and Ambari User Views. You will learn how to move data into HDFS, explore the data, clean the data, issue SQL queries and then build a report with Zeppelin (a small illustrative sketch follows this listing).
Pre-requisites: Registrants will receive an email one week before the event with prerequisites and lab setup instructions. You must bring a machine that can run the Hortonworks Sandbox.
Seating is limited for these sessions. Register now.
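
As a rough illustration of the "move data into HDFS, then query it with SQL" flow (a sketch only, not the official lab steps, which use the Ambari User Views and Hive), here is a minimal Python example; the file name, table name and JDBC URL are placeholders for the Sandbox.

```python
# Minimal sketch (assumptions, not the official lab steps): copy a local
# CSV into HDFS and run a Hive query against it via beeline. The file
# name, table name and JDBC URL are placeholders for the Sandbox.
import subprocess

# Stage the raw data in HDFS.
subprocess.run(["hdfs", "dfs", "-mkdir", "-p", "/tmp/crashcourse"], check=True)
subprocess.run(["hdfs", "dfs", "-put", "-f", "trucks.csv", "/tmp/crashcourse/"],
               check=True)

# Query it with Hive; assumes an external table named `trucks` was
# created over the CSV beforehand.
query = "SELECT driverid, COUNT(*) AS events FROM trucks GROUP BY driverid LIMIT 10"
subprocess.run(
    ["beeline", "-u", "jdbc:hive2://localhost:10000/default", "-e", query],
    check=True,
)
```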

Spark Crash Course Workshop at Hadoop Summit

April 13, 3pm - 6pm

Room: Ecocem, Level 2
Introduction: This workshop will provide a hands-on introduction to Spark using the HDP Sandbox on students' personal machines.
Format: A short introductory lecture about the Spark components used in the lab, followed by a demo, lab exercises and a Q&A session. After the lecture there will be lab time to work through the exercises and ask questions.
Objective: To provide a short hands-on introduction to Spark. This lab will use the following Spark and Hadoop components: Spark, Spark SQL, HDFS, YARN, ORC, and Ambari User Views. You will learn how to move data into HDFS using Spark APIs, create Hive tables, explore the data with Spark and Spark SQL, transform the data and then issue some SQL queries (a small illustrative sketch follows this listing).
Pre-requisites: Registrants will receive an email one week before the event with prerequisites and lab setup instructions. You must bring a machine that can run the Hortonworks Sandbox.
Seating is limited for these sessions. Register now.
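
For a rough idea of the Spark flow (a sketch only, not the lab manual), the snippet below loads a CSV from HDFS, saves it as an ORC-backed Hive table and queries it with Spark SQL. It assumes the Spark 2.x SparkSession API (on a Spark 1.6 Sandbox you would use sqlContext instead), and the file path, table and column names are placeholders.

```python
# Minimal sketch, under assumptions: Spark 2.x SparkSession API; file
# path, table name and column names are placeholders.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("spark-crash-course-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Load a CSV that was previously copied into HDFS.
df = spark.read.csv("hdfs:///tmp/crashcourse/trucks.csv",
                    header=True, inferSchema=True)

# Persist it as an ORC-backed Hive table, then query it with Spark SQL.
df.write.mode("overwrite").format("orc").saveAsTable("trucks_orc")
spark.sql("""
    SELECT driverid, AVG(mpg) AS avg_mpg
    FROM trucks_orc
    GROUP BY driverid
    ORDER BY avg_mpg DESC
    LIMIT 10
""").show()
```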

 

MeetUps

Group: BDOOP: Big Data Operations on Performance Barcelona
Title: Automating Big Data Benchmarking and Performance Analysis with open source tools
Registration Link: http://www.meetup.com/BDOOP-BigData-Operations-On-Perfomance-Barcelona/events/229695026/
Room: Ecocem, Level 2

Optimizing Big Data execution environments often requires extensive benchmarking, manually fine-tuning configuration parameters to the underlying hardware, and hours of analyzing results. This workshop will give hands-on experience with the different aspects of fully automating Big Data benchmarking and analysis of Hadoop and ecosystem applications using open source tools, so you can save tedious hours of manual data processing, work more efficiently, and at the same time get the most value out of Big Data infrastructures.

The tools and results for the workshop come from the ALOJA open source project (http://aloja.bsc.es), an initiative of the Barcelona Supercomputing Center and Microsoft Research. ALOJA provides tools to automate the benchmarking-to-knowledge process, as well as an online service to explore over 50k ready results featuring different applications, software configurations, data sizes, and more than 100 deployment options. Using a combination of slides and an online demo, the talk will first guide Big Data practitioners through the benchmark repository, where users can quickly search for already-performed benchmarks that resemble their infrastructures, and then show how to implement new benchmarks in the system or run custom jobs. The talk will end by briefly presenting the project's Predictive Analytics research features for modeling applications and predicting the best deployment configurations to further automate the optimization process.

Speaker: Nicolas Poggi (@ni_po) is an IT researcher focused on the performance and scalability of data-intensive applications and infrastructures. He is currently leading a research project on upcoming architectures for Big Data at the Barcelona Supercomputing Center (BSC) and Microsoft Research joint center. Nicolas received his PhD in Distributed Systems and Computer Architecture at UPC/BarcelonaTech, where he is part of the HPC and Data Centric Computing research groups. He has also been a Research Scholar at IBM Watson, working on Big Data and system performance topics. Nicolas can usually be found speaking at and organizing local IT meetup events.


Group: Future of Data: Dublin
Title: Hands-on Introduction to Spark & Zeppelin
Registration Link: http://www.meetup.com/futureofdata-dublin/events/229793869/
Room: Wicklow Hall 2A, Level 2

Join us for an intro and overview of Apache Spark, Spark SQL and Spark Streaming using Apache Zeppelin notebooks. We will cover Spark RDDs, DataFrames and Datasets, use Spark SQL to explore and visualize data in a Zeppelin notebook, and write and run a simple Spark Streaming application. To participate in the hands-on labs, you will need to bring your own laptop with the Hortonworks HDP Sandbox pre-loaded.
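
For those who have not seen Spark Streaming before, the classic word-count example below gives a feel for what the streaming part of the session covers. It is a minimal sketch that assumes you run it with spark-submit on the Sandbox; inside a Zeppelin %pyspark paragraph the SparkContext is already provided as sc.

```python
# Minimal sketch (assumption: submitted with spark-submit on the Sandbox;
# in Zeppelin the SparkContext already exists, so skip creating one).
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext(appName="streaming-sketch")
ssc = StreamingContext(sc, batchDuration=5)  # 5-second micro-batches

# Read lines from a local test socket (e.g. started with `nc -lk 9999`).
lines = ssc.socketTextStream("localhost", 9999)
counts = (lines.flatMap(lambda line: line.split())
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))
counts.pprint()  # print each batch's word counts to the console

ssc.start()
ssc.awaitTermination()
```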

Speaker: Robert Hryniewicz


Group: HUG Ireland
Title: Winning Against All Odds: Big Data for the Budget Travel Industry
Registration link: http://www.meetup.com/BUILD-Business-Networking-Group-Dublin/events/229766430/
Room: Wicklow Hall 2B, Level 2

Travel is one of the most competitive industries in terms of online advertising and digital marketing strategies. It is also a battlefield of giant players, against which carving out and defending a niche is beyond difficult. Can a small company gain and retain market share in this difficult landscape? The answer is yes: through fast iteration, data-driven decisions and the will to experiment in a pragmatic manner with the Hadoop ecosystem.

Speaker: Silviu Preoteasa, Head of Marketing Technology, Hostelworld.com


Group: Future of Data: London
Title: Data Flow using Apache NiFi
Registration Link: http://www.meetup.com/futureofdata-london/events/229827779/
Room: Liffey Hall 2, Level 1

Come learn and discuss the Apache NiFi project and how it works. Apache NiFi is an extensible data processing and integration framework. NiFi can construct highly structured data flows with connectors into many traditional and Hadoop-related technologies. https://nifi.apache.org/

Speakers: Bryan Bende & Simon Ball

 

Birds of a Feather Sessions

Hortonworks will sponsor several Birds of a Feather (BoF) sessions, hosted by Hortonworks architects, tech leads, committers, and engineers. Come share your experiences, challenges, interests and requirements on key Apache projects, and discuss what's on the roadmap and future design options. These sessions are not restricted to conference attendees; they are open to everyone.

Date: Thursday April 14, 2016
Time: 5:50pm - 7:00pm
Venue: Convention Centre Dublin


Topic: Spark & Data Science
Apache Spark is a fast, in-memory data processing engine with elegant and expressive development APIs to allow data workers to efficiently execute streaming, machine learning or SQL workloads that require fast iterative access to datasets. Come learn and discuss Spark and Data Science innovations and future directions.
Hosts: Owen O’Malley (Hadoop Committer), Vinay Shukla (Hortonworks Product Manager) and Robert Hryniewicz (Hortonworks Data Science Advocate)
Room: Liffey B, Level 1


Topic: Hive
Hive is the de facto standard for SQL queries in Hadoop. Through the Stinger.next initiative, the Apache community has greatly improved Hive's speed, scale and SQL semantics. Come learn and discuss Hive 2.0.
Hosts: Alan Gates (Hive Committer) and Carter Shanklin (Hortonworks Product Manager)
Room: Liffey A, Level 1


Topic: Cloud & Operations
Ambari is a completely open source management platform for provisioning, managing, monitoring and securing Apache Hadoop clusters. Cloudbreak facilitates provisioning Hadoop in the cloud. Come learn and discuss the latest cloud & operations innovations and future directions.
Hosts: Tim Hall (Hortonworks Product Manager), Sanjay Radia (Hadoop Committer) and Janos Matyas (Cloudbreak Architect)
Room: Liffey Hall 2, Level 1


Topic: Streaming & Data Flow
Real-time data processing with NiFi, Kafka, Storm and Spark Streaming provides the foundation for the Internet of Anything (IoAT). Come learn and discuss the latest streaming & data flow innovations and future directions.
Host: Bryan Bende (NiFi Committer)
Room: Wicklow Hall 2, Level 2


Topic: YARN
YARN is the architectural center of Hadoop that allows multiple data processing engines to handle data stored in a single platform, unlocking an entirely new approach to analytics. Come learn and discuss the latest YARN innovations and future directions.
Host: Arun Murthy (Hadoop Committer)
Room: Wicklow Hall 2B, Level 2


Topic: HDFS
HDFS is a distributed Java-based file system for storing large volumes of data. Come learn and discuss the latest HDFS innovations and future directions.
Host: Jitendra Pandey (Hadoop Committer)
Room: Wicklow Hall 1, Level 2


Topic: HBase
HBase is the NoSQL store for Hadoop. Come learn and discuss HBase 2.0, Phoenix, Spark integration and more.
Host: Enis Soztutar (HBase Committer)
Room: Liffey Hall 1, Level 1


Topic: Security & Governance
Knox and Ranger provide Hadoop security while Atlas provides a Hadoop metadata store and enterprise compliance. Come learn and discuss security & governance innovations and future directions.
Hosts: Balaji Ganesan (Ranger Committer) and Andrew Ahn (Atlas Committer)
Room: Ecocem, Level 2


 
