MapReduce (paper)
Dean & Ghemawat (Google), 2004. The paper that launched the big-data era. Distilled large-scale data processing into two functional primitives, map and reduce, and showed how a framework could handle the hard parts (parallelism, fault tolerance, data locality, retries) automatically. Hadoop was the open-source clone; eventually superseded by Spark, Flink, and distributed SQL engines. The original framework is largely dead, but the *abstraction* (treat data processing as a DAG of stateless operators) is now universal.
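The paper's running example is word count: map emits `(word, 1)` pairs, reduce sums the counts per word. A minimal single-process sketch of that shape in Python (the `map_fn`/`reduce_fn` names and the toy `run` driver are illustrative, not the paper's actual API):

```python
from collections import defaultdict
from typing import Iterator

# map: for each input record, emit intermediate (key, value) pairs.
def map_fn(doc_id: str, text: str) -> Iterator[tuple[str, int]]:
    for word in text.split():
        yield (word, 1)

# reduce: for each key, fold all intermediate values into one result.
def reduce_fn(word: str, counts: list[int]) -> tuple[str, int]:
    return (word, sum(counts))

# Toy in-process "framework": group map output by key, then reduce.
# A real runtime would shard the map calls across machines, shuffle
# intermediate pairs to reducers by key, and retry failed tasks.
def run(inputs: dict[str, str]) -> dict[str, int]:
    groups: dict[str, list[int]] = defaultdict(list)
    for doc_id, text in inputs.items():
        for key, value in map_fn(doc_id, text):
            groups[key].append(value)
    return dict(reduce_fn(k, vs) for k, vs in groups.items())

if __name__ == "__main__":
    docs = {"a.txt": "the quick brown fox", "b.txt": "the lazy dog"}
    print(run(docs))  # {'the': 2, 'quick': 1, 'brown': 1, ...}
```

The key design point: because `map_fn` and `reduce_fn` are stateless and deterministic, the framework is free to run them anywhere, in any order, and simply re-execute them on failure. That is what makes the automatic parallelism and fault tolerance possible.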