   Performance Techniques in MapReduce 
-  The Google File System (GFS) stores multiple copies (typically 3)
of data files on different computers for redundancy and availability.
 
-  Master assigns each Map task to a worker that has a copy of the task's
input data on its own disk or, failing that, to a worker in the same rack
as a copy.  This reduces network communication; network bandwidth is scarce.
 
-  Combiner functions can perform partial reductions on a Map worker's
output before it is written out to disk (e.g., summing the "1" values
emitted for each word in a word count), reducing both I/O and network traffic.
 
-  Master can start redundant workers to process the same data
as a dead or "slacker" worker.  Master will use the result from
the worker that finishes first; results from later workers will
be ignored.
 
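-  Reduce workers can start fetching intermediate data as soon as some
Map workers have finished their data; the reduce function itself runs
only after all Map tasks are done, since any Map task may emit values
for any key.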
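Replica placement from the first point can be sketched as follows. This is a minimal illustration, not GFS's actual policy: the machine and rack names, the `place_replicas` helper, and the round-robin rule are all hypothetical. The idea shown is only that the copies go to distinct machines, spread over as many racks as possible, so that one failed machine or rack does not take out every copy.

```python
from typing import Dict, List

def place_replicas(machine_racks: Dict[str, str], copies: int = 3) -> List[str]:
    """Hypothetical policy: pick `copies` distinct machines, spreading racks first."""
    if copies > len(machine_racks):
        raise ValueError("not enough machines for the requested number of copies")
    # Group machines by rack (sorted so the choice is deterministic).
    by_rack: Dict[str, List[str]] = {}
    for machine in sorted(machine_racks):
        by_rack.setdefault(machine_racks[machine], []).append(machine)
    chosen: List[str] = []
    # Round-robin over racks: each pass takes at most one machine per rack,
    # so copies land on different racks before any rack gets a second copy.
    while len(chosen) < copies:
        for rack in sorted(by_rack):
            if by_rack[rack] and len(chosen) < copies:
                chosen.append(by_rack[rack].pop(0))
    return chosen

machines = {"m1": "rack1", "m2": "rack1", "m3": "rack2", "m4": "rack2"}
print(place_replicas(machines))  # ['m1', 'm3', 'm2']
```

A real placement policy would also weigh factors such as free disk space and current load; round-robin over racks is just the simplest rule that shows the redundancy goal.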
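The locality preference in the second point (local disk first, then same rack, then anywhere) can be sketched as a small scheduling function. The `pick_worker` name, the set-based inputs, and the tie-breaking by name are assumptions made for illustration; the real master's scheduler is not described here.

```python
from typing import Dict, Optional, Set

def pick_worker(replica_hosts: Set[str], idle_workers: Set[str],
                rack_of: Dict[str, str]) -> Optional[str]:
    """Hypothetical locality-aware choice of a worker for one input split."""
    # 1. Best: an idle worker that already holds a replica on its own disk.
    local = sorted(idle_workers & replica_hosts)
    if local:
        return local[0]
    # 2. Next best: an idle worker in the same rack as some replica,
    #    so the data only crosses that rack's switch.
    replica_racks = {rack_of[h] for h in replica_hosts}
    same_rack = sorted(w for w in idle_workers if rack_of[w] in replica_racks)
    if same_rack:
        return same_rack[0]
    # 3. Fallback: any idle worker; the data must cross racks.
    return min(idle_workers) if idle_workers else None

racks = {"a": "r1", "b": "r1", "c": "r2", "d": "r3"}
print(pick_worker({"a", "c"}, {"b", "d"}, racks))  # b (same rack as replica on a)
```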
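The combiner point can be illustrated with a word count. In this sketch (function names are hypothetical), the combiner is the same summation as the reducer, applied to each mapper's output before it leaves the machine. This is valid because addition is associative and commutative, so summing partial sums gives the same totals.

```python
from collections import Counter
from typing import Iterable, List, Tuple

def map_words(text: str) -> List[Tuple[str, int]]:
    # Map: emit (word, 1) for every word occurrence.
    return [(word, 1) for word in text.split()]

def combine(pairs: Iterable[Tuple[str, int]]) -> List[Tuple[str, int]]:
    # Combiner: partial sum on the map side, before output is written or sent.
    totals: Counter = Counter()
    for word, count in pairs:
        totals[word] += count
    return sorted(totals.items())

def reduce_counts(pairs: Iterable[Tuple[str, int]]) -> List[Tuple[str, int]]:
    # Reduce: the same summation, applied to the combined output of all mappers.
    return combine(pairs)

split_a = map_words("the cat sat on the mat the end")
split_b = map_words("the dog sat")
combined_a, combined_b = combine(split_a), combine(split_b)
print(len(split_a) + len(split_b), "pairs without combiner")   # 11
print(len(combined_a) + len(combined_b), "pairs with combiner")  # 9
print(reduce_counts(combined_a + combined_b))
```

The savings grow with repetition: a split containing one word a million times would send a single pair instead of a million.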
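The redundant-worker point ("first result wins, later results ignored") can be mimicked on one machine with threads. This is only an analogy: threads stand in for workers, and the `process_split` task with its `delay` parameter is a hypothetical model of a slow ("slacker") worker versus a fast backup copy.

```python
import concurrent.futures as cf
import time
from typing import Callable, List, Tuple

def process_split(delay: float) -> str:
    # Hypothetical stand-in for a Map task; `delay` models how slow the worker is.
    time.sleep(delay)
    return "output of split 7"

def run_with_backups(task: Callable[[float], str], delays: List[float]) -> Tuple[str, int]:
    """Start one copy of `task` per entry in `delays`; keep whichever finishes first."""
    with cf.ThreadPoolExecutor(max_workers=len(delays)) as pool:
        futures = [pool.submit(task, d) for d in delays]
        done, _ = cf.wait(futures, return_when=cf.FIRST_COMPLETED)
        winner = next(iter(done))
        # Results from the other copies are never read. Python threads cannot be
        # killed, so leaving the `with` block still waits for the slower copies.
        return winner.result(), futures.index(winner)

result, which = run_with_backups(process_split, [1.0, 0.05])  # slacker vs. backup
print(result, "from worker", which)
```

Because every copy runs the same deterministic task on the same input, it does not matter which one wins; that is what makes discarding the later results safe.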
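The last point can be sketched with a queue. Each (hypothetical) map task posts its output when it finishes, and the reduce worker copies each output as it arrives. The reducer can only sort and reduce once every map task has reported, because an unfinished map might still produce values for any key. Duplicate completions, such as a backup copy of a task that already finished, are ignored. The thread-and-queue setup is an assumption for illustration, not how a real cluster transfers data.

```python
import queue
import threading
from collections import defaultdict
from typing import Dict, List

def map_task(task_id: int, words: List[str], out: queue.Queue) -> None:
    # Each map task emits (word, 1) pairs and announces completion with its output.
    out.put((task_id, [(w, 1) for w in words]))

def reduce_worker(inbox: queue.Queue, num_maps: int) -> Dict[str, int]:
    fetched = []
    finished = set()
    while len(finished) < num_maps:
        task_id, pairs = inbox.get()   # copy output as soon as this map finishes
        if task_id in finished:        # duplicate (e.g. from a backup task): ignore
            continue
        finished.add(task_id)
        fetched.extend(pairs)
    # All maps are done, so every value for every key is here: sort by key, then sum.
    totals: Dict[str, int] = defaultdict(int)
    for word, count in sorted(fetched):
        totals[word] += count
    return dict(totals)

splits = [["a", "b"], ["b", "c"], ["a", "a"]]
inbox: queue.Queue = queue.Queue()
threads = [threading.Thread(target=map_task, args=(i, s, inbox)) for i, s in enumerate(splits)]
for t in threads:
    t.start()
result = reduce_worker(inbox, len(splits))
for t in threads:
    t.join()
print(result)  # {'a': 3, 'b': 2, 'c': 1}
```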