recall


data lake term

raw storage for all your data (structured, semi-structured, or unstructured), with structure applied later at query time

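Cheap object storage (e.g. S3, GCS) holding files, typically in open formats such as Parquet, ORC, or Avro. Schema-on-read: structure is applied when the data is queried, not when it is written. Compared to warehouses, more flexible about what can be stored, but harder to query well, since schema and data quality are not enforced at write time.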
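A minimal sketch of schema-on-read, under loud assumptions: a local temporary directory stands in for an object store bucket, JSON Lines files stand in for Parquet/ORC/Avro, and the helper names (`write_raw`, `read_with_schema`, `demo`) are hypothetical. Records land as-is, and the reader decides which fields and types it wants.

```python
import json
import tempfile
from pathlib import Path


def write_raw(lake: Path, name: str, records: list[dict]) -> None:
    """Land records as-is; no schema is checked at write time."""
    (lake / name).write_text("\n".join(json.dumps(r) for r in records))


def read_with_schema(lake: Path, schema: dict[str, type]) -> list[dict]:
    """Apply a schema at read time: select the fields and coerce the types."""
    rows = []
    for path in sorted(lake.glob("*.jsonl")):
        for line in path.read_text().splitlines():
            raw = json.loads(line)
            row = {}
            for field, cast in schema.items():
                value = raw.get(field)
                # Missing fields become None instead of failing the read.
                row[field] = cast(value) if value is not None else None
            rows.append(row)
    return rows


def demo() -> list[dict]:
    # Assumption: a temp directory plays the role of the lake.
    with tempfile.TemporaryDirectory() as tmp:
        lake = Path(tmp)
        # The two files disagree on types and fields; both are accepted.
        write_raw(lake, "2024-01.jsonl", [{"user": "a", "amount": "12.5"}])
        write_raw(lake, "2024-02.jsonl", [{"user": "b", "amount": 3, "coupon": "X1"}])
        # The reader imposes its own view of the data.
        return read_with_schema(lake, {"user": str, "amount": float})
```

The same files could be read again with a different schema (for example one that includes `coupon`), which is the flexibility a lake offers. The cost is that every reader has to cope with inconsistencies a warehouse would have rejected on load.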

topics: data-pipelines
