Training

HDP Analyst: Data Science

Overview: Learn Data Science techniques and best practices leveraging the Hadoop ecosystem and tools.

Duration: Two Days - Sunday, June 26 - Monday, June 27

Objectives:

  • Recognize use cases for data science
  • Describe the architecture of Hadoop and YARN
  • Explain the differences between supervised and unsupervised learning
  • List the six machine learning tasks
  • Recognize use cases for clustering, outlier detection, affinity analysis, classification, regression, and recommendation
  • Use Mahout to run a machine learning algorithm on Hadoop 
  • Write Pig scripts to transform data on Hadoop
  • Use Pig to prepare data for a machine learning algorithm
  • Write a Python script 
  • Use NumPy to analyze big data
  • Use the data structure classes in the pandas library
  • Write a Python script that invokes a SciPy machine learning algorithm
  • Explain the options for running Python code on a Hadoop cluster
  • Write a Pig User Defined Function in Python (a brief sketch follows this list)
  • Use Pig streaming on Hadoop with a Python script
  • Write a Python script that invokes a scikit-learn machine learning algorithm
  • Use the k-nearest neighbor algorithm to predict values based on a data set
  • Run the k-means clustering algorithm on a distributed data set on Hadoop
  • Describe use cases for Natural Language Processing (NLP)
  • Run an NLP algorithm on a Hadoop cluster
  • Run machine learning algorithms on Hadoop using Spark MLlib
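
To give a flavor of the Pig-and-Python objective above, here is a minimal sketch of a Pig User Defined Function written in Python and run under Jython; the function name, schema, and registering Pig statements are illustrative assumptions, not course material:

    # upper_udf.py - a minimal Python UDF for Pig, run under Jython.
    # A Pig script would register and call it roughly like this:
    #   REGISTER 'upper_udf.py' USING jython AS myfuncs;
    #   upper_names = FOREACH names GENERATE myfuncs.to_upper(name);

    @outputSchema("name_upper:chararray")  # decorator injected by Pig's Jython
    def to_upper(name):                    # engine; declares the return schema
        if name is None:                   # Pig may pass null fields through
            return None
        return name.upper()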

Labs:

  • Describe the architecture of Hadoop and YARN
  • Explain the differences between supervised and unsupervised learning
  • Recognize use cases for clustering, outlier detection, affinity analysis, classification, regression, and recommendation
  • Write Pig scripts to transform data on Hadoop
  • Use Pig to prepare data for a machine learning algorithm
  • Write a Python script using NumPy, SciPy, Matplotlib, pandas, and scikit-learn to analyze big data (a brief scikit-learn sketch follows this list)
  • Explore the options for running Python code on a Hadoop cluster
  • Write a Pig User Defined Function in Python
  • Use Pig streaming on Hadoop with a Python script
  • Run a Hadoop Streaming job
  • Understand some key tasks in Natural Language Processing (NLP)
  • Run NLP algorithms in IPython
  • Run machine learning algorithms on Hadoop using Spark MLlib
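
As a small taste of the scikit-learn work mentioned above, a minimal k-nearest-neighbor sketch; the bundled iris dataset and k=5 are illustrative choices, not course data:

    # Fit a k-nearest-neighbor classifier and predict a few labels.
    from sklearn.datasets import load_iris
    from sklearn.neighbors import KNeighborsClassifier

    iris = load_iris()
    model = KNeighborsClassifier(n_neighbors=5)  # k = 5 neighbors
    model.fit(iris.data, iris.target)            # "training" stores the samples
    print(model.predict(iris.data[:3]))          # predict labels for 3 samples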

Target Audience: Developers and Analysts who would like to learn more about developing data products using Hadoop tools such as Pig and Spark, and how to use common Data Science tools like Python on their Hadoop system.

Pre-requisites: No previous Hadoop or programming knowledge is required. It is helpful to have some college-level mathematics (such as linear algebra and statistics). Students will need to bring their own Wi-Fi-enabled laptop pre-loaded with the Chrome or Firefox browser in order to complete the hands-on labs.


HDP Developer: Apache Pig and Hive

Overview: This course is designed for developers who need to create applications to analyze Big Data stored in Apache Hadoop using Pig and Hive. Introductory Spark content will also be presented.

Duration:  Two Days - Sunday, June 26 - Monday, June 27

Objectives:

  • Describe Hadoop ecosystem tools and frameworks
  • Describe the HDFS and YARN architectures
  • Use the Hadoop client to input data into HDFS
  • Transfer data between Hadoop and a relational database
  • Use Pig to explore and transform data in HDFS
  • Understand how Hive tables are defined and implemented
  • Use Hive to explore and analyze data sets
  • Explain and use the various Hive file formats
  • Use Hive to run SQL-like queries to perform data analysis
  • Explain the uses and purpose of HCatalog
  • Present the Spark ecosystem and high-level architecture

Labs:

  • Use HDFS commands to add/remove files and folders
  • Explore, transform, split and join datasets using Pig
  • Use Pig to transform and export a dataset for use with Hive
  • Use HCatLoader and HCatStorer
  • Perform a join of two datasets with Hive
  • Use advanced Hive features: windowing, views, ORC files
  • Use Hive analytics functions
  • Use Spark Core to read files and perform data analysis
  • Create and join DataFrames with Spark SQL (a brief sketch follows this list)
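
To give a flavor of the final lab, a minimal DataFrame-join sketch using the Python Spark API of that era (SQLContext); the file paths and column names are hypothetical:

    # Join two DataFrames with Spark SQL and count orders per customer.
    from pyspark import SparkContext
    from pyspark.sql import SQLContext

    sc = SparkContext(appName="DataFrameJoin")
    sqlContext = SQLContext(sc)

    customers = sqlContext.read.json("hdfs:///tmp/customers.json")  # made-up path
    orders = sqlContext.read.json("hdfs:///tmp/orders.json")        # made-up path

    # Join on a shared key, then count orders per customer name
    joined = customers.join(orders, customers.id == orders.customer_id)
    joined.groupBy(customers.name).count().show()
    sc.stop()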

Target Audience: Software developers who need to understand and develop applications for Hadoop

Pre-requisites: No previous Hadoop knowledge is required, though any prior exposure will be useful. Students should be familiar with programming principles and have experience in software development. SQL knowledge is also helpful. Students will need to bring their own Wi-Fi-enabled laptop pre-loaded with the Chrome or Firefox browser in order to complete the hands-on labs.


HDP Operations: Hadoop Administration

Overview: This course is designed for administrators who will be managing the Hortonworks Data Platform (HDP) 2.3 with Ambari. It covers installation, configuration, and other typical cluster maintenance tasks.

Duration: Two Days - Sunday, June 26 - Monday, June 27

Objectives:

  • Install HDP
  • Add, Remove, Replace Cluster Nodes
  • Configure Rack Awareness
  • Configure High Availability NameNode and YARN Resource Manager
  • Manage Hadoop Services
  • Manage HDFS Storage
  • Manage YARN
  • Configure Capacity Scheduler
  • Monitor Cluster

Labs:

  • Install HDP
  • Managing Ambari Users and Groups
  • Manage Hadoop Services
  • Using Hadoop Storage
  • Managing Hadoop Storage
  • Managing YARN Service using Ambari Web UI
  • Managing YARN Service using CLI
  • Setting Up the Capacity Scheduler
  • Managing YARN Containers and Queues
  • Managing YARN ACLs and User Limits
  • Adding, Decommissioning and Recommissioning Worker Nodes
  • Configuring Rack Awareness
  • Configuring NameNode HA
  • Configuring ResourceManager HA

Target Audience: IT administrators and operators responsible for installing, configuring and supporting an HDP 2.3 deployment in a Linux environment using Ambari.

Pre-requisites: No previous Hadoop knowledge is required, though any prior exposure will be useful. Attendees should be familiar with data center operations and Linux system administration. Students will need to bring their own Wi-Fi-enabled laptop pre-loaded with the Chrome or Firefox browser in order to complete the hands-on labs.


HDP Developer: Apache Spark using Python

Overview: This course is designed for developers who need to create applications to analyze Big Data stored in Apache Hadoop using Spark. The focus will be on utilizing the Spark API from Python.

Duration: Two Days - Sunday, June 26 - Monday, June 27

Objectives:

  • Describe Spark and Spark-specific use cases
  • Explain the differences between Spark and MapReduce
  • Explore data interactively through the Spark shell utility
  • Explain the RDD concept
  • Use the Python Spark APIs
  • Create all types of RDDs: Pair, Double, and Generic
  • Use RDD type-specific functions
  • Explain the interaction of the components of a Spark application
  • Explain the creation of the DAG schedule
  • Build and package Spark applications
  • Use application configuration items
  • Deploy applications to the cluster using YARN
  • Use data caching to increase the performance of applications (a brief sketch follows this list)
  • Implement advanced features of Spark
  • Learn general application optimization guidelines/tips
  • Create/transform data using DataFrames
  • Read, use, and save to different Hadoop file formats
  • Understand the concepts of Spark Streaming
  • Create a streaming application
  • Use Spark MLlib to gain insights from data
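
To illustrate the data-caching objective above, a minimal sketch; the file path and filter terms are hypothetical:

    # cache() keeps an RDD in memory after the first action computes it,
    # so subsequent actions reuse it instead of re-reading from HDFS.
    from pyspark import SparkContext

    sc = SparkContext(appName="CachingDemo")
    logs = sc.textFile("hdfs:///tmp/access.log")                # made-up path
    errors = logs.filter(lambda line: "ERROR" in line).cache()

    print(errors.count())                                    # computes and caches
    print(errors.filter(lambda l: "timeout" in l).count())   # served from cache
    sc.stop()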

Labs:

  • Create a Spark "Hello World" word count application (a sketch follows this list)
  • Use advanced RDD programming to perform sort, join, pattern matching and regex tasks
  • Explore partitioning and the Spark UI
  • Increase performance using data caching
  • Build/package a Spark application using Maven
  • Use a broadcast variable to efficiently join a small dataset to a massive dataset
  • Use an accumulator for reporting data quality issues
  • Create a dataframe and perform analysis
  • Load/transform/store data using Spark with Hive tables
  • Create a point-in-time Spark Streaming application
  • Create a Spark Streaming application using window functions
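
For reference, the "Hello World" word count from the first lab might look roughly like this in PySpark (input and output paths are placeholders):

    # Count word occurrences in a text file and save the results to HDFS.
    from pyspark import SparkContext

    sc = SparkContext(appName="WordCount")
    counts = (sc.textFile("hdfs:///tmp/input.txt")    # one record per line
                .flatMap(lambda line: line.split())   # split lines into words
                .map(lambda word: (word, 1))          # pair each word with 1
                .reduceByKey(lambda a, b: a + b))     # sum the counts per word
    counts.saveAsTextFile("hdfs:///tmp/wordcount-out")
    sc.stop()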

Target Audience: Developers, Architects, and Admins who would like to learn more about developing data applications in Spark, how it will affect their environment, and ways to optimize their applications.

Pre-requisites: No previous Hadoop knowledge is required, though any prior exposure will be useful. Basic knowledge of Python is required. Previous exposure to SQL is helpful, but not required. Students will need to bring their own Wi-Fi-enabled laptop pre-loaded with the Chrome or Firefox browser in order to complete the hands-on labs.


HDP Operations: Security

Overview: This course is designed for experienced administrators who will be implementing secure Hadoop clusters using authentication, authorization, auditing and data protection strategies and tools.

Duration: Two Days - Sunday, June 26 - Monday, June 27

Objectives:

  • Describe the five pillars of security
  • Choose the appropriate security tool for a given use case
  • Security Prerequisites
  • Ambari Server Security
  • Apache Ranger
  • Apache Ranger KMS
  • Using Ranger to Secure Access
  • Perimeter Security - Apache Knox

Labs:

  • Accessing Your Cluster
  • Configure Name Resolution and Certificates for Active Directory
  • Set Up Ambari-to-Active Directory Sync
  • Kerberize the Cluster
  • Set Up AD/OS Integration via SSSD
  • Configure Ambari Server for Kerberos
  • Ranger Prerequisites
  • Ranger Install
  • Ranger KMS/Data Encryption Setup
  • Ranger KMS/Data Encryption Exercise
  • Knox Configuration

Target Audience: Experienced IT administrators who will be implementing security on an existing HDP 2.3 cluster using Ambari.

Pre-requisites: Students should be experienced in the management of Hadoop using Ambari and in Linux environments. Completion of the Hadoop Administration I course is highly recommended. Students will need to bring their own Wi-Fi-enabled laptop pre-loaded with the Chrome or Firefox browser in order to complete the hands-on labs.


HDP Developer: Apache Spark using Scala

Overview: This course is designed for developers who need to create applications to analyze Big Data stored in Apache Hadoop using Spark. The focus will be on utilizing the Spark API from Scala.

Duration: Two Days - Sunday, June 26 - Monday, June 27

Objectives:

  • Understand core Spark concepts
  • Leverage the Spark Core API for developing applications
  • Use Spark shared variables
  • Use Spark + Hive effectively together

Format: 50% Lecture/Discussion, 50% Hands-on Labs.

Target Audience: Developers, Architects, and Admins who would like to learn more about developing data applications in Spark, how it will affect their environment, and ways to optimize their applications.

Pre-requisites: No previous Hadoop knowledge is required, though any prior exposure will be useful. Basic knowledge of Scala is required. Previous exposure to SQL is helpful, but not required. Students will need to bring their own Wi-Fi-enabled laptop pre-loaded with the Chrome or Firefox browser in order to complete the hands-on labs.


HDF Operations: Hortonworks DataFlow

Overview: This condensed course is designed for "Data Stewards" or "Data Flow Managers" who are looking to automate the flow of data between systems.

Duration: Two Days - Sunday, June 26 - Monday, June 27

Objectives:

  • Understand what HDF and NiFi are, including core concepts and use cases
  • Understand the NiFi architecture and key features
  • Learn the NiFi user interface in depth and how to build a data flow
  • Understand NiFi processors, connections, process groups, and remote process groups
  • Get a basic overview of data flow optimization and data provenance
  • Understand the NiFi Expression Language
  • Install and configure a NiFi cluster
  • Understand security and monitoring options for HDF
  • Integrate HDF and HDP
  • Apply HDF system and NiFi best practices

Labs:

  • Installing and Starting NiFi
  • Building a NiFi Data Flow
  • Working with Process Groups
  • Working with Remote Process Groups [Site-to-Site]
  • NiFi Expression Language
  • Using Templates
  • Working With NiFi Cluster
  • NiFi Monitoring
  • HDF Integration with HDP [Spark, Kafka, HBase]
  • Securing HDF with 2-way SSL
  • NiFi User Authentication with LDAP
  • End-of-course project

Students are required to bring their own laptop.


HDP Overview: Apache Hadoop Essentials

Overview: This course provides Business users and Decision makers with a technical understanding and overview of Apache Hadoop. It includes high-level information about the concepts, architecture, operation, and uses of the Hortonworks Data Platform (HDP) and the Hadoop ecosystem. The course provides an optional primer for those who plan to attend a hands-on, instructor-led course.

Duration: One Day - Sunday, June 26 OR Monday, June 27

Course Objectives:

  • Describe what makes data "Big Data"
  • List data types stored and analyzed in Hadoop
  • Describe how Big Data and Hadoop fit into your current infrastructure and environment
  • Describe the fundamentals of the Hadoop Distributed File System (HDFS), YARN, and MapReduce
  • Describe Hadoop frameworks: Pig, Hive, HCatalog, Storm, Solr, Spark, HBase, Oozie, Ambari, ZooKeeper, Sqoop, Flume, and Falcon
  • Recognize use cases for Hadoop
  • Describe the business value of Hadoop
  • Describe new technologies like Tez and the Knox Gateway

Format: Lecture + Demos

Target Audience: Data architects, data integration architects, managers, C-level executives, decision makers, technical infrastructure team, and Hadoop administrators or developers who want to understand the fundamentals of Big Data and the Hadoop ecosystem

Pre-requisites: No previous Hadoop or programming knowledge is required. Students are encouraged to bring their Wi-Fi-enabled laptop pre-loaded with the Hortonworks Sandbox should they want to duplicate the demonstrations on their own machine.


Hortonworks Certified Professional Exam

Monday, June 27 at 1:00 pm

Register now at a special price of $199 and take one of our Certification exams at the Summit Pre-training & Certification event.

Hortonworks' certification program is now offering hands-on, performance-based exams. This new approach to Hadoop certification is designed to allow individuals an opportunity to prove their Hadoop skills in a way that is recognized in the industry as meaningful and relevant to on-the-job performance.

As a special offer for attendees of Hadoop Summit, you can take any of our Hortonworks certification exams for $199. In addition, you have the unique opportunity to take an exam with a live proctor in the room. This special offer is only available for candidates who take the exam in person at Summit on June 27.

Please visit our website at http://hortonworks.com/training/certification/ for a list of available exams.

Earn Digital Badges: Hortonworks Certified Professionals receive a digital badge for each certification earned. Display your badges proudly on your resume, LinkedIn profile, email signature, etc. Each badge you earn is issued and verified by BadgeCert, a third-party digital badge authentication provider.

Certification candidates MUST bring their own laptops.

Hortonworks Spark Certification Exam Bootcamp

Register now at the special price of $500 and take advantage of this exclusive opportunity to prepare for the Spark certification exam with the author of the exam, as well as attempt the exam at Summit or receive a voucher to take the exam at a later date.

Overview: The Spark Certification Exam Bootcamp is an exclusive opportunity to prepare for the new Hortonworks Spark Certification with the exam author. This hands-on workshop concentrates on the exam objectives for the Spark certification. This unique opportunity is your chance to participate in a “cram session” with the author and other certification candidates before attempting the exam.

You have the option of taking the exam at Summit on Monday, June 27, at 1:00 pm or receiving a voucher to attempt the exam at a later date.

Because of the focused nature of the Spark Certification Exam Bootcamp, it should not be regarded as a substitute for the Hortonworks Spark training course, but rather as a means of enhancing your chances of passing the exam.

Duration: Half Day - Sunday, June 26, 1:00 pm - 5:00 pm

Pre-requisite: The Spark Certification Exam Bootcamp is designed for software engineers who have completed Hortonworks Spark training and already have experience and a good working knowledge of Spark Core and Spark SQL.

Bootcamp participants are required to bring their own laptop.
