Cloudera Administrator Training for Apache Hadoop (CATAH)

Detailed Course Outline

Introduction

The Case for Apache Hadoop

  • Why Hadoop?
  • A Brief History of Hadoop
  • Core Hadoop Components
  • Fundamental Concepts

HDFS

  • HDFS Features
  • Writing and Reading Files
  • NameNode Considerations
  • Overview of HDFS Security
  • Using the NameNode Web UI
  • Using the Hadoop File Shell

Getting Data into HDFS

  • Ingesting Data from External Sources with Flume
  • Ingesting Data from Relational Databases with Sqoop
  • REST Interfaces
  • Best Practices for Importing Data

MapReduce

  • What Is MapReduce?
  • Features of MapReduce
  • Basic Concepts
  • Architectural Overview
  • MapReduce Version 2
  • Failure Recovery
  • Using the JobTracker Web UI

Planning Your Hadoop Cluster

  • General Planning Considerations
  • Choosing the Right Hardware
  • Network Considerations
  • Configuring Nodes
  • Planning for Cluster Management

Hadoop Installation and Initial Configuration

  • Deployment Types
  • Installing Hadoop
  • Specifying the Hadoop Configuration
  • Performing Initial HDFS Configuration
  • Performing Initial MapReduce Configuration
  • Log File Locations

Installing and Configuring Hive, Impala, and Pig

  • Hive
  • Impala
  • Pig

Hadoop Clients

  • What Is a Hadoop Client?
  • Installing and Configuring Hadoop Clients
  • Installing and Configuring Hue
  • Hue Authentication and Configuration

Cloudera Manager

  • The Motivation for Cloudera Manager
  • Cloudera Manager Features
  • Standard and Enterprise Versions
  • Cloudera Manager Topology
  • Installing Cloudera Manager
  • Installing Hadoop Using Cloudera Manager
  • Performing Basic Administration Tasks
  • Using Cloudera Manager

Advanced Cluster Configuration

  • Advanced Configuration Parameters
  • Configuring Hadoop Ports
  • Explicitly Including and Excluding Hosts
  • Configuring HDFS for Rack Awareness
  • Configuring HDFS High Availability

Hadoop Security

  • Why Hadoop Security Is Important
  • Hadoop’s Security System Concepts
  • What Kerberos Is and How It Works
  • Securing a Hadoop Cluster with Kerberos

Managing and Scheduling Jobs

  • Managing Running Jobs
  • Scheduling Hadoop Jobs
  • Configuring the FairScheduler

Cluster Maintenance

  • Checking HDFS Status
  • Copying Data Between Clusters
  • Adding and Removing Cluster Nodes
  • Rebalancing the Cluster
  • NameNode Metadata Backup
  • Cluster Upgrading

Cluster Monitoring and Troubleshooting

  • General System Monitoring
  • Managing Hadoop’s Log Files
  • Monitoring Hadoop Clusters
  • Common Troubleshooting Issues

Conclusion

 
