
Cloudera Developer Training for MapReduce (CDTMR)


Detailed Course Outline

  • The Motivation for Hadoop
    • Problems with Traditional Large-Scale Systems
    • Requirements for a New Approach
    • Introducing Hadoop
  • Hadoop: Basic Concepts
    • The Hadoop Project and Hadoop Components
    • The Hadoop Distributed File System
    • Hands-On Exercise: Using HDFS
    • How MapReduce Works
    • Hands-On Exercise: Running a MapReduce Job
    • How a Hadoop Cluster Operates
    • Other Hadoop Ecosystem Projects
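The map, shuffle-and-sort, and reduce phases covered in this module can be sketched in plain Python (a conceptual model only, not the Hadoop API), using word count as the running example:

```python
from collections import defaultdict

def map_phase(records, mapper):
    """Apply the mapper to every input record, collecting (key, value) pairs."""
    pairs = []
    for record in records:
        pairs.extend(mapper(record))
    return pairs

def shuffle_and_sort(pairs):
    """Group values by key and sort, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())

def reduce_phase(groups, reducer):
    """Apply the reducer to each key and its list of values."""
    return [reducer(key, values) for key, values in groups]

# Word count expressed in this model
def wc_mapper(line):
    return [(word, 1) for word in line.split()]

def wc_reducer(word, counts):
    return (word, sum(counts))

lines = ["the quick brown fox", "the lazy dog"]
result = reduce_phase(shuffle_and_sort(map_phase(lines, wc_mapper)), wc_reducer)
# result is a sorted list of (word, count) tuples
```

In a real cluster, each phase runs distributed across many nodes; the point of the sketch is only the data flow between phases.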
  • Writing a MapReduce Program
    • The MapReduce Flow
    • Basic MapReduce API Concepts
    • Writing MapReduce Drivers, Mappers and Reducers in Java
    • Writing Mappers and Reducers in Other Languages Using the Streaming API
    • Speeding Up Hadoop Development by Using Eclipse
    • Hands-On Exercise: Writing a MapReduce Program
    • Differences Between the Old and New MapReduce APIs
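The Streaming API mentioned above lets mappers and reducers be written in any language that reads stdin and writes stdout. A minimal Python word-count pair might look like this (the mapper emits `key<TAB>value` lines; the framework delivers them to the reducer sorted by key):

```python
from itertools import groupby

def streaming_mapper(stdin, stdout):
    """Emit one 'word<TAB>1' line per word: the Streaming contract for mappers."""
    for line in stdin:
        for word in line.split():
            stdout.write(f"{word}\t1\n")

def streaming_reducer(stdin, stdout):
    """Input lines arrive sorted by key; sum the counts for each word."""
    parsed = (line.rstrip("\n").split("\t", 1) for line in stdin)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        total = sum(int(v) for _, v in group)
        stdout.write(f"{word}\t{total}\n")
```

In practice each function would be its own script (reading `sys.stdin`, writing `sys.stdout`) submitted via the `hadoop-streaming` jar; the exact jar path varies by distribution.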
  • Unit Testing MapReduce Programs
    • Unit Testing
    • The JUnit and MRUnit Testing Frameworks
    • Writing Unit Tests with MRUnit
    • Hands-On Exercise: Writing Unit Tests with the MRUnit Framework
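MRUnit itself is a Java framework, but the pattern it provides — drive a mapper or reducer with known input and assert on its exact output, no cluster required — is language-neutral. A rough Python `unittest` analogue of MRUnit's `MapDriver.withInput()/withOutput()` style:

```python
import unittest

def wordcount_map(line):
    """Mapper under test: emit (word, 1) for each word in the line."""
    return [(word, 1) for word in line.split()]

class WordCountMapTest(unittest.TestCase):
    """Feed the mapper one record and check its exact output, the same
    pattern MRUnit's MapDriver gives you in Java."""

    def test_splits_on_whitespace(self):
        self.assertEqual(wordcount_map("cat sat"), [("cat", 1), ("sat", 1)])

    def test_empty_input_emits_nothing(self):
        self.assertEqual(wordcount_map(""), [])
```

Run with `python -m unittest` against the file containing the test case.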
  • Delving Deeper into the Hadoop API
    • Using the ToolRunner Class
    • Hands-On Exercise: Writing and Implementing a Combiner
    • Setting Up and Tearing Down Mappers and Reducers by Using the Configure and Close Methods
    • Writing Custom Partitioners for Better Load Balancing
    • Optional Hands-On Exercise: Writing a Partitioner
    • Accessing HDFS Programmatically
    • Using the Distributed Cache
    • Using the Hadoop API’s Library of Mappers, Reducers and Partitioners
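The partitioner's job — deciding which reducer receives each key — can be sketched outside the Java API. The first function mirrors the default hash-partitioning behavior; the second is a hypothetical custom scheme (routing by first letter) of the kind the optional exercise asks for:

```python
def default_partition(key, num_reducers):
    """Hadoop's HashPartitioner in miniature: hash the key, mod the reducer count."""
    return hash(key) % num_reducers

def first_letter_partition(key, num_reducers):
    """A custom partitioner: route keys by first letter so each reducer sees
    a contiguous alphabetical range (hypothetical scheme, for illustration)."""
    if not key:
        return 0
    return (ord(key[0].lower()) - ord("a")) % num_reducers

# Every key must map to a partition in [0, num_reducers)
keys = ["apple", "banana", "cherry", "date"]
parts = [first_letter_partition(k, 4) for k in keys]
```

A custom partitioner improves load balancing only if it spreads the actual key distribution evenly; a skewed scheme concentrates work on a few reducers.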
  • Practical Development Tips and Techniques
    • Strategies for Debugging MapReduce Code
    • Testing MapReduce Code Locally by Using LocalJobRunner
    • Writing and Viewing Log Files
    • Retrieving Job Information with Counters
    • Determining the Optimal Number of Reducers for a Job
    • Creating Map-Only MapReduce Jobs
    • Hands-On Exercise: Using Counters and a Map-Only Job
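Counters and map-only jobs often go together: a cleaning or filtering pass needs no reduce phase, but you still want per-job statistics. A plain-Python sketch of the idea (counter names here are illustrative, not Hadoop built-ins):

```python
from collections import Counter

def parse_record(line, counters):
    """Map-only task: validate one comma-separated record, counting what we saw.
    Counters surface per-job statistics without needing a reduce phase."""
    fields = line.strip().split(",")
    if len(fields) != 3:
        counters["Malformed Records"] += 1
        return None          # emit nothing, like a mapper that skips bad input
    counters["Good Records"] += 1
    return fields

counters = Counter()
data = ["1,alice,engineer", "2,bob", "3,carol,analyst"]
output = [r for r in (parse_record(l, counters) for l in data) if r]
```

In Hadoop, the framework aggregates each task's counters and reports the totals with the job status; setting the number of reducers to zero makes the job map-only.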
  • Data Input and Output
    • Creating Custom Writable and WritableComparable Implementations
    • Saving Binary Data Using SequenceFile and Avro Data Files
    • Implementing Custom Input Formats and Output Formats
    • Issues to Consider When Using File Compression
    • Hands-On Exercise: Using SequenceFiles and File Compression
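A custom `Writable` in Java serializes its fields to a binary stream in `write()` and reads them back in the same order in `readFields()`. The contract can be sketched with Python's `struct` module (fixed-width big-endian fields, as Java's `DataOutput` writes them):

```python
import struct

def write_point(x, y):
    """Serialize an (x, y) int pair the way a custom Writable's write()
    would: fixed-width big-endian fields on a binary stream."""
    return struct.pack(">ii", x, y)

def read_point(data):
    """The matching readFields(): decode the same fixed layout, in the same order."""
    return struct.unpack(">ii", data)

blob = write_point(3, -7)   # 8 bytes: two big-endian 32-bit ints
```

The symmetry matters: any mismatch between the write and read order silently corrupts every record that follows.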
  • Common MapReduce Algorithms
    • Sorting and Searching Large Data Sets
    • Performing a Secondary Sort
    • Indexing Data
    • Hands-On Exercise: Creating an Inverted Index
    • Computing Term Frequency — Inverse Document Frequency
    • Calculating Word Co-Occurrence
    • Hands-On Exercise: Calculating Word Co-Occurrence (Optional)
    • Hands-On Exercise: Implementing Word Co-Occurrence with a Custom WritableComparable (Optional)
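The inverted-index exercise above has a compact conceptual form: the map phase emits (term, document-id) pairs, and the reduce phase collects each term's document list. In plain Python:

```python
from collections import defaultdict

def inverted_index(documents):
    """Build term -> sorted list of doc ids: the map phase emits
    (term, doc_id) pairs, the reduce phase collects each term's list."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

docs = {1: "to be or not to be", 2: "to do"}
idx = inverted_index(docs)
# idx maps each term to the documents containing it, e.g. "to" -> [1, 2]
```

This is the core of a search index: lookups become a dictionary access instead of a scan over every document.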
  • Joining Data Sets in MapReduce Jobs
    • Writing a Map-Side Join
    • Writing a Reduce-Side Join
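A reduce-side join works by tagging each record with its source table, letting the shuffle bring all records for a key to one reducer, and arranging (via secondary sort) for the smaller table's record to arrive first. A plain-Python sketch of that flow, joining hypothetical user and order tables:

```python
from itertools import groupby

def reduce_side_join(users, orders):
    """Reduce-side join sketch: tag each record with its source so that,
    after the sort, each key's user record arrives before its orders
    (the secondary-sort trick: 'A' sorts before 'B')."""
    tagged = [(uid, "A", name) for uid, name in users]
    tagged += [(uid, "B", item) for uid, item in orders]
    tagged.sort()                         # stands in for the shuffle and sort
    joined = []
    for uid, group in groupby(tagged, key=lambda r: r[0]):
        records = list(group)
        if records[0][1] != "A":
            continue                      # orders with no matching user
        name = records[0][2]
        for _, _, item in records[1:]:
            joined.append((uid, name, item))
    return joined

users = [(1, "alice"), (2, "bob")]
orders = [(1, "book"), (1, "pen"), (3, "lamp")]
result = reduce_side_join(users, orders)
```

A map-side join avoids the shuffle entirely but requires one input to fit in memory (or both inputs to be identically partitioned and sorted); the reduce-side version trades that constraint for a full sort of both data sets.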
  • Integrating Hadoop into the Enterprise Workflow
    • Integrating Hadoop into an Existing Enterprise
    • Loading Data from an RDBMS into HDFS by Using Sqoop
    • Hands-On Exercise: Importing Data with Sqoop
    • Managing Real-Time Data Using Flume
    • Accessing HDFS from Legacy Systems with FuseDFS and HttpFS
  • Machine Learning and Mahout
    • Introduction to Machine Learning
    • Using Mahout
    • Hands-On Exercise: Using a Mahout Recommender
  • An Introduction to Hive and Pig
    • The Motivation for Hive and Pig
    • Hive Basics
    • Hands-On Exercise: Manipulating Data with Hive
    • Pig Basics
    • Hands-On Exercise: Using Pig to Retrieve Movie Names from Our Recommender
    • Choosing Between Hive and Pig
  • An Introduction to Oozie
    • Introduction to Oozie
    • Creating Oozie Workflows
    • Hands-On Exercise: Running an Oozie Workflow
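An Oozie workflow is an XML file defining a directed graph of actions. A minimal sketch of a `workflow.xml` with a single MapReduce action (all names, paths, and the schema version are placeholders; real workflows parameterize them via `job.properties`):

```xml
<!-- Minimal workflow sketch: one MapReduce action, then done or failed. -->
<workflow-app xmlns="uri:oozie:workflow:0.4" name="wordcount-wf">
    <start to="wordcount"/>
    <action name="wordcount">
        <map-reduce>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.input.dir</name>
                    <value>${inputDir}</value>
                </property>
                <property>
                    <name>mapred.output.dir</name>
                    <value>${outputDir}</value>
                </property>
            </configuration>
        </map-reduce>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Job failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
```

The `ok`/`error` transitions are what make Oozie a workflow engine rather than a scheduler: each action declares where control flows on success and on failure.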