
Cloudera Developer Training for MapReduce (CDTMR)


Detailed Course Outline

  • The Motivation for Hadoop
    • Problems with Traditional Large-Scale Systems
    • Requirements for a New Approach
    • Introducing Hadoop
  • Hadoop: Basic Concepts
    • The Hadoop Project and Hadoop Components
    • The Hadoop Distributed File System
    • Hands-On Exercise: Using HDFS
    • How MapReduce Works
    • Hands-On Exercise: Running a MapReduce Job
    • How a Hadoop Cluster Operates
    • Other Hadoop Ecosystem Projects
  • Writing a MapReduce Program
    • The MapReduce Flow
    • Basic MapReduce API Concepts
    • Writing MapReduce Drivers, Mappers and Reducers in Java
    • Writing Mappers and Reducers in Other Languages Using the Streaming API
    • Speeding Up Hadoop Development by Using Eclipse
    • Hands-On Exercise: Writing a MapReduce Program
    • Differences Between the Old and New MapReduce APIs
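As a taste of the module above, here is a hypothetical pure-Python sketch of the MapReduce flow for word count; in the course itself the Mapper and Reducer are Java classes (or streaming scripts), and Hadoop performs the shuffle and sort between the two phases. All names here (`mapper`, `reducer`, `run_job`) are illustrative, not part of the Hadoop API.

```python
from itertools import groupby
from operator import itemgetter

def mapper(line):
    """Map phase: emit a (word, 1) pair for each word in an input line."""
    for word in line.split():
        yield (word.lower(), 1)

def reducer(word, counts):
    """Reduce phase: sum all counts emitted for one word."""
    return (word, sum(counts))

def run_job(lines):
    """Simulate the full flow: map, shuffle/sort by key, then reduce."""
    pairs = [kv for line in lines for kv in mapper(line)]
    pairs.sort(key=itemgetter(0))  # the "shuffle and sort" Hadoop does for you
    return dict(reducer(key, (v for _, v in group))
                for key, group in groupby(pairs, key=itemgetter(0)))

print(run_job(["the cat sat", "the dog sat"]))
# {'cat': 1, 'dog': 1, 'sat': 2, 'the': 2}
```

The same mapper and reducer logic, written as scripts reading stdin and writing stdout, is how the Streaming API lets you use languages other than Java.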
  • Unit Testing MapReduce Programs
    • Unit Testing
    • The JUnit and MRUnit Testing Frameworks
    • Writing Unit Tests with MRUnit
    • Hands-On Exercise: Writing Unit Tests with the MRUnit Framework
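MRUnit is a Java framework for driving Mappers and Reducers in isolation; as a language-agnostic sketch of the same idea, the snippet below unit-tests a word-count mapper function with Python's standard `unittest`, no cluster required. The `wordcount_mapper` function is hypothetical, defined here only to have something to test.

```python
import unittest

def wordcount_mapper(line):
    """Illustrative mapper under test: emit (word, 1) per word."""
    return [(w.lower(), 1) for w in line.split()]

class WordCountMapperTest(unittest.TestCase):
    def test_emits_one_pair_per_word(self):
        self.assertEqual(wordcount_mapper("Cat cat dog"),
                         [("cat", 1), ("cat", 1), ("dog", 1)])

    def test_empty_input_emits_nothing(self):
        self.assertEqual(wordcount_mapper(""), [])
```

In MRUnit the equivalent is a `MapDriver` configured with the Mapper class, an input record, and the expected output records.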
  • Delving Deeper into the Hadoop API
    • Using the ToolRunner Class
    • Hands-On Exercise: Writing and Implementing a Combiner
    • Setting Up and Tearing Down Mappers and Reducers by Using the Configure and Close Methods
    • Writing Custom Partitioners for Better Load Balancing
    • Optional Hands-On Exercise: Writing a Partitioner
    • Accessing HDFS Programmatically
    • Using The Distributed Cache
    • Using the Hadoop API’s Library of Mappers, Reducers and Partitioners
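To illustrate the partitioner topic above: a partitioner maps an intermediate key to one of the reducers. The hypothetical Python sketch below mirrors the default hash-based scheme and a custom alternative; in Java you would instead extend `org.apache.hadoop.mapreduce.Partitioner` and override `getPartition(key, value, numPartitions)`.

```python
def hash_partition(key, num_reducers):
    """Default-style partitioner: hash of the key modulo reducer count.
    Gives an even spread but no control over which reducer gets which key."""
    return hash(key) % num_reducers

def first_letter_partition(key, num_reducers):
    """Custom partitioner (illustrative): route keys by their first letter,
    so all words starting with the same letter reach the same reducer."""
    return ord(key[0].lower()) % num_reducers
```

A custom partitioner like this is useful when related keys must land in the same output partition, at the cost of possibly uneven reducer load.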
  • Practical Development Tips and Techniques
    • Strategies for Debugging MapReduce Code
    • Testing MapReduce Code Locally by Using LocalJobRunner
    • Writing and Viewing Log Files
    • Retrieving Job Information with Counters
    • Determining the Optimal Number of Reducers for a Job
    • Creating Map-Only MapReduce Jobs
    • Hands-On Exercise: Using Counters and a Map-Only Job
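Tying together two of the topics above, here is a hypothetical sketch of a map-only job that cleans records and uses counters to report how many were kept or skipped. In Hadoop you would call `context.getCounter(group, name).increment(1)` inside the Mapper, set the number of reducers to zero, and read the counter totals from the job client afterwards; the `Counter` dictionary below only stands in for that mechanism.

```python
from collections import Counter

counters = Counter()  # stand-in for Hadoop's job counters

def map_only_clean(record):
    """Map-only job: emit valid records unchanged, count the bad ones.
    Returning None models a mapper that simply emits nothing."""
    if not record.strip():
        counters["BadRecords"] += 1
        return None
    counters["GoodRecords"] += 1
    return record.strip()

output = [r for r in map(map_only_clean, ["a", "  ", "b", ""]) if r is not None]
# output == ["a", "b"]; counters report 2 good and 2 bad records
```

Because no reduce phase runs, the mapper's output is the job's output, which is exactly what the hands-on exercise combines with counters.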
  • Data Input and Output
    • Creating Custom Writable and WritableComparable Implementations
    • Saving Binary Data Using SequenceFile and Avro Data Files
    • Implementing Custom Input Formats and Output Formats
    • Issues to Consider When Using File Compression
    • Hands-On Exercise: Using SequenceFiles and File Compression
  • Common MapReduce Algorithms
    • Sorting and Searching Large Data Sets
    • Performing a Secondary Sort
    • Indexing Data
    • Hands-On Exercise: Creating an Inverted Index
    • Computing Term Frequency-Inverse Document Frequency (TF-IDF)
    • Calculating Word Co-Occurrence
    • Hands-On Exercise: Calculating Word Co-Occurrence (Optional)
    • Hands-On Exercise: Implementing Word Co-Occurrence with a Custom WritableComparable (Optional)
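As a sketch of the inverted-index exercise in this module: the map phase emits a (word, document id) pair for each word, and the reduce phase gathers, per word, the set of documents containing it. This hypothetical Python version collapses both phases into one function for readability.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """docs: mapping of document id -> text.
    Returns word -> sorted list of ids of documents containing the word."""
    index = defaultdict(set)
    for doc_id, text in docs.items():     # "map": emit (word, doc_id) pairs
        for word in text.lower().split():
            index[word].add(doc_id)       # "reduce": gather doc ids per word
    return {word: sorted(ids) for word, ids in index.items()}

index = build_inverted_index({1: "to be or not to be", 2: "not a problem"})
# index["not"] == [1, 2]; index["be"] == [1]
```

The same emit-then-group shape underlies the other algorithms in this module, including word co-occurrence (emit word pairs as keys) and TF-IDF (count per-document and per-corpus frequencies).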
  • Joining Data Sets in MapReduce Jobs
    • Writing a Map-Side Join
    • Writing a Reduce-Side Join
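The reduce-side join above can be sketched as follows: mappers tag each record with its source data set and emit it under the join key, and the reducer pairs up records sharing a key. This hypothetical Python version uses customer and order data invented for illustration; a map-side join would instead load the smaller data set into each mapper's memory, for example via the distributed cache.

```python
from collections import defaultdict

def reduce_side_join(customers, orders):
    """customers: {customer_id: name}; orders: [(customer_id, item)].
    Returns the (name, item) pairs a reducer would emit."""
    by_key = defaultdict(lambda: {"customer": None, "orders": []})
    for cid, name in customers.items():   # map: tag record as "customer"
        by_key[cid]["customer"] = name
    for cid, item in orders:              # map: tag record as "order"
        by_key[cid]["orders"].append(item)
    return [(rec["customer"], item)       # reduce: join records per key
            for rec in by_key.values() if rec["customer"] is not None
            for item in rec["orders"]]

pairs = reduce_side_join({1: "Ada", 2: "Lin"}, [(1, "disk"), (1, "cpu"), (3, "ram")])
# pairs == [("Ada", "disk"), ("Ada", "cpu")] -- orders without a customer drop out
```

Note the trade-off this sketch illustrates: a reduce-side join handles data sets of any size but pays for a full shuffle of both inputs.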
  • Integrating Hadoop into the Enterprise Workflow
    • Integrating Hadoop into an Existing Enterprise
    • Loading Data from an RDBMS into HDFS by Using Sqoop
    • Hands-On Exercise: Importing Data with Sqoop
    • Managing Real-Time Data Using Flume
    • Accessing HDFS from Legacy Systems with FuseDFS and HttpFS
  • Machine Learning and Mahout
    • Introduction to Machine Learning
    • Using Mahout
    • Hands-On Exercise: Using a Mahout Recommender
  • An Introduction to Hive and Pig
    • The Motivation for Hive and Pig
    • Hive Basics
    • Hands-On Exercise: Manipulating Data with Hive
    • Pig Basics
    • Hands-On Exercise: Using Pig to Retrieve Movie Names from Our Recommender
    • Choosing Between Hive and Pig
  • An Introduction to Oozie
    • Introduction to Oozie
    • Creating Oozie Workflows
    • Hands-On Exercise: Running an Oozie Workflow
 
