
Map Worker

A Map Worker runs the Map program on its assigned data. The Map program receives (inputkey, inputvalue) pairs as input; for example, inputkey could be the URL of a web page (as a string) and inputvalue could be the contents of that web page (all as one string).

The Map Worker emits (outputkey, list(mapvalue)) pairs. outputkey could be the same as inputkey, but is often different. For example, to count links to a web page, outputkey could be the URL of a page that is linked to by the page being processed.
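As a minimal sketch of the link-counting example, the Map program below takes one (inputkey, inputvalue) pair and returns (outputkey, list(mapvalue)) pairs. The function name map_links, the regular expression used to find links, and the choice of 1 as the mapvalue are assumptions made for illustration; a real Map program would use a proper HTML parser.

```python
import re
from collections import defaultdict

# Assumption: links appear as href="..." attributes; this is a
# simplification, not a full HTML parser.
HREF = re.compile(r'href="([^"]+)"')

def map_links(inputkey, inputvalue):
    """Hypothetical Map program for counting links.

    inputkey   -- address of the page being processed (unused here)
    inputvalue -- contents of that page as one string
    Returns a list of (outputkey, list(mapvalue)) pairs, where outputkey
    is the address of a linked-to page and each mapvalue is 1.
    """
    grouped = defaultdict(list)
    for target in HREF.findall(inputvalue):
        grouped[target].append(1)
    return list(grouped.items())
```

Here outputkey differs from inputkey: the input is keyed by the page being read, while the output is keyed by the pages it links to.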

If there are R Reduce Workers, the outputkey is hashed modulo R to determine which Reduce Worker will get it. Hashing spreads the keys approximately uniformly across the Reduce Workers, providing load balancing, while ensuring that all pairs with the same outputkey go to the same Reduce Worker.
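The hash-modulo-R rule can be sketched as follows. The function name reduce_worker_for is hypothetical, and CRC-32 is chosen here only because it gives the same result on every machine; Python's built-in hash() for strings is randomized per process by default, so it would not be suitable when several Map Workers must agree on the assignment.

```python
import zlib

def reduce_worker_for(outputkey, num_reducers):
    """Return the index (0 .. num_reducers-1) of the Reduce Worker
    that should receive all pairs with this outputkey.

    Assumption: outputkey is a string; CRC-32 stands in for whatever
    hash function a real system would use.
    """
    return zlib.crc32(outputkey.encode("utf-8")) % num_reducers
```

Because the function is deterministic, every Map Worker sends a given outputkey to the same Reduce Worker, which is what lets that Reduce Worker see all values for the key.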

The Map Worker has R output buffers corresponding to the R files that it is producing as output, one for each Reduce Worker. Each (outputkey, list(mapvalue)) pair is put into the output buffer selected by the hash of its outputkey.
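A rough sketch of the R output buffers is shown below, under the assumption that each buffer is an in-memory list that would later be written to its own file. The class name MapWorkerBuffers and its methods are hypothetical, and CRC-32 is again an assumed stand-in for the hash function.

```python
import zlib

class MapWorkerBuffers:
    """Hypothetical sketch: one output buffer per Reduce Worker."""

    def __init__(self, num_reducers):
        self.num_reducers = num_reducers
        # buffers[i] holds the pairs destined for Reduce Worker i.
        self.buffers = [[] for _ in range(num_reducers)]

    def partition(self, outputkey):
        # Hash the key modulo R to choose the buffer.
        return zlib.crc32(outputkey.encode("utf-8")) % self.num_reducers

    def emit(self, outputkey, mapvalues):
        # Put the (outputkey, list(mapvalue)) pair into its buffer.
        index = self.partition(outputkey)
        self.buffers[index].append((outputkey, mapvalues))
```

For example, after `w = MapWorkerBuffers(3)` and a series of `w.emit(key, [1])` calls, `w.buffers[i]` holds exactly the pairs whose key hashes to i, ready to be written to the file for Reduce Worker i.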