Taught by a 4-person team including 2 Stanford-educated ex-Googlers and 2 ex-Flipkart Lead Analysts. This team has decades of practical experience working with Java and with billions of rows of data.

This course is a zoom-in, zoom-out, hands-on workout involving Hadoop, MapReduce and the art of thinking parallel. Let’s parse that.

Zoom-in, Zoom-out: This course is both broad and deep. It covers the individual components of Hadoop in great detail, and also gives you a higher-level picture of how they interact with each other.

Hands-on workout involving Hadoop, MapReduce: This course will get you hands-on with Hadoop very early on. You’ll learn how to set up your own cluster using both VMs and the Cloud. All the major features of MapReduce are covered, including advanced topics like Total Sort and Secondary Sort.

The art of thinking parallel: MapReduce completely changed the way people thought about processing Big Data. Breaking down any problem into parallelizable units is an art. The examples in this course will train you to “think parallel”.

What’s Covered: Lots of cool stuff.
Using MapReduce to recommend friends in a Social Networking site: Generate Top 10 friend recommendations using a Collaborative Filtering algorithm.
Build an Inverted Index for Search Engines: Use MapReduce to parallelize the humongous task of building an inverted index for a search engine.
Generate Bigrams from text: Generate bigrams and compute their frequency distribution in a corpus of text.
Build your Hadoop cluster: Install Hadoop in Standalone, Pseudo-Distributed and Fully Distributed modes. Set up a Hadoop cluster using Linux VMs. Set up a cloud Hadoop cluster on AWS with Cloudera Manager. Understand HDFS, MapReduce and YARN and their interaction.
Customize your MapReduce Jobs: Chain multiple MR jobs together. Write your own customized Partitioner. Total Sort: globally sort a large amount of data by sampling input files. Secondary sorting. Unit tests with MRUnit. Integrate with Python using the Hadoop Streaming API.
And of course, all the basics:
MapReduce: Mapper, Reducer, Sort/Merge, Partitioning, Shuffle and Sort.
HDFS & YARN: Namenode, Datanode, Resource Manager, Node Manager, the anatomy of a MapReduce application, YARN Scheduling, and configuring HDFS and YARN to performance-tune your cluster.
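The basic phases listed above (map, partition, shuffle and sort, reduce) can be illustrated with one of the course's example problems, building an inverted index. What follows is a minimal single-process Python sketch, not Hadoop code and not the course's own implementation. The function names `mapper`, `partition`, `reducer` and `build_inverted_index` are hypothetical, and `zlib.crc32` is an assumed stand-in for the key hash that Hadoop's default HashPartitioner derives from the key's Java `hashCode()`.

```python
import re
import zlib
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def mapper(doc_id, text):
    """Map phase: emit one (word, doc_id) pair per distinct word in a document."""
    for word in set(re.findall(r"[a-z0-9]+", text.lower())):
        yield word, doc_id


def partition(key, num_reducers):
    """Choose a reducer for a key (crc32 is an assumed stand-in for hashCode())."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def reducer(word, doc_ids):
    """Reduce phase: collapse all doc ids for one word into a sorted posting list."""
    return word, sorted(set(doc_ids))


def build_inverted_index(docs, num_reducers=2):
    # Map + partition: route every intermediate pair to exactly one reducer bucket.
    buckets = defaultdict(list)
    for doc_id, text in docs.items():
        for word, d in mapper(doc_id, text):
            buckets[partition(word, num_reducers)].append((word, d))

    index = {}
    for r in range(num_reducers):
        # Shuffle and sort: each reducer sees its keys in sorted order,
        # with all values for the same key grouped together.
        pairs = sorted(buckets[r], key=itemgetter(0))
        for word, group in groupby(pairs, key=itemgetter(0)):
            w, postings = reducer(word, (d for _, d in group))
            index[w] = postings
    return index


if __name__ == "__main__":
    docs = {
        "d1": "Hadoop runs MapReduce",
        "d2": "MapReduce and YARN",
        "d3": "hadoop on YARN",
    }
    index = build_inverted_index(docs)
    print(index["mapreduce"])  # ['d1', 'd2']
```

Because every key is routed to exactly one reducer, the final index is the same whatever the number of reducers; only the split of work changes. On a real cluster, the mapper and reducer logic would run as separate tasks (for example as Java classes, or as scripts via the Hadoop Streaming API), and the framework itself would handle partitioning, shuffling and sorting.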

Learn By Example: Hadoop, MapReduce for Big Data problems
