GFS (paper)
Ghemawat, Gobioff, Leung (Google), 2003. The distributed file system underneath MapReduce, Bigtable, and most of Google's early infrastructure. Optimized for large files, append-mostly workloads, and commodity hardware, where component failure is treated as the norm rather than the exception. Single-master architecture: the master holds the metadata (namespace, file-to-chunk mapping, chunk locations), while chunkservers hold the data as fixed-size 64 MB chunks, each replicated three times by default. HDFS is the open-source counterpart modeled on it. The paper that taught the industry that 'tolerate failures via replication and a coordinator' was a viable alternative to 'don't fail.'
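
The split between a metadata master and data-holding chunkservers can be illustrated with a toy in-memory sketch. This is not the paper's implementation: the names `Master`, `ChunkServer`, `allocate_chunk`, `write_chunk`, and `read_chunk` are hypothetical, placement is a naive least-loaded choice, and leases, the operation log, checksums, and re-replication are all left out. It only shows the core idea that the client asks the master where a chunk lives and then talks to the chunkservers directly, falling back to another replica when one is down.

```python
from dataclasses import dataclass, field

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB chunk size, as reported in the paper
REPLICATION = 3                # default replica count, as reported in the paper


@dataclass
class ChunkServer:
    """Hypothetical chunkserver: stores chunk bytes keyed by chunk handle."""
    name: str
    alive: bool = True
    chunks: dict = field(default_factory=dict)  # handle -> bytes


@dataclass
class Master:
    """Hypothetical master: metadata only, never the file contents."""
    servers: list
    files: dict = field(default_factory=dict)      # path -> [handle, ...]
    locations: dict = field(default_factory=dict)  # handle -> [ChunkServer, ...]
    next_handle: int = 0

    def allocate_chunk(self, path):
        """Append a new chunk to `path` and pick REPLICATION live servers for it."""
        live = [s for s in self.servers if s.alive]
        if len(live) < REPLICATION:
            raise RuntimeError("not enough live chunkservers")
        handle = self.next_handle
        self.next_handle += 1
        # Assumption: naive placement on the least-loaded live servers.
        replicas = sorted(live, key=lambda s: len(s.chunks))[:REPLICATION]
        self.files.setdefault(path, []).append(handle)
        self.locations[handle] = replicas
        return handle, replicas

    def lookup(self, path, offset):
        """Translate (path, byte offset) into (chunk handle, replica list)."""
        handle = self.files[path][offset // CHUNK_SIZE]
        return handle, self.locations[handle]


def write_chunk(master, path, data):
    """Client write: get placement from the master, push bytes to each replica."""
    handle, replicas = master.allocate_chunk(path)
    for server in replicas:
        server.chunks[handle] = data
    return handle


def read_chunk(master, path, offset):
    """Client read: get locations from the master, fetch from any live replica."""
    handle, replicas = master.lookup(path, offset)
    for server in replicas:
        if server.alive:
            return server.chunks[handle]
    raise IOError("all replicas unavailable")
```

In this sketch, marking one replica's `alive` flag as `False` after a write still lets `read_chunk` return the data from a surviving replica, which is the 'tolerate failures via replication and a coordinator' idea in miniature.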