Well-Architected architecture study
Governed Data Lake for Analytics
A data lake design that separates raw, curated, and governed zones while controlling access, retention, encryption, and query cost.
[Architecture diagram. Components: Application exports, S3 raw zone, S3 curated zone, Lake Formation, Athena + dashboards, CloudWatch]
Problem
Analytics platforms fail when every team reads raw data directly, sensitive fields are not classified, and query cost is treated as someone else's problem.
Design
- S3 prefixes separate raw, curated, and governed datasets.
- Glue crawlers and jobs catalog and transform data into partitioned Parquet.
- Lake Formation grants table and column access based on roles.
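- Separate KMS keys encrypt sensitive datasets and general analytics data, so key access acts as a second boundary on top of table permissions.
- Macie scans S3 objects to discover and classify sensitive data, which helps surface exposure in the wrong zone.
- Athena workgroups enforce per-query data-scan limits and fixed query-result output locations.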
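The prefix separation and partitioning described above can be sketched as a small path builder. This is only an illustration under stated assumptions: the bucket name `example-analytics-lake`, the dataset name `orders`, and the `dt=` date partition key are hypothetical and do not come from the original design.

```python
from dataclasses import dataclass
from datetime import date

# Zone names taken from the design; everything else here is illustrative.
ZONES = ("raw", "curated", "governed")


@dataclass(frozen=True)
class LakeLayout:
    bucket: str  # hypothetical bucket name

    def prefix(self, zone: str, dataset: str, day: date) -> str:
        """Return the S3 prefix for one dataset partition in a given zone."""
        if zone not in ZONES:
            raise ValueError(f"unknown zone: {zone!r}")
        # Hive-style key=value partition folder (assumed partition key: dt).
        return f"s3://{self.bucket}/{zone}/{dataset}/dt={day.isoformat()}/"


layout = LakeLayout(bucket="example-analytics-lake")
print(layout.prefix("curated", "orders", date(2024, 1, 15)))
```

Keeping the zone as the first path segment means bucket policies, lifecycle rules, and Lake Formation data locations can each be scoped to a single prefix, while the `dt=` folders let query engines skip partitions outside the requested date range.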
Well-Architected lens
- Security: column permissions, encryption boundaries, sensitive-data discovery, and audit events.
- Cost optimization: Parquet, partitioning, lifecycle rules, and Athena workgroup limits.
- Operational excellence: data-quality checks and catalog ownership.
- Reliability: immutable raw data allows rebuilding curated datasets.
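To make the cost-optimization point concrete, here is a rough sketch of how scan volume drives Athena spend and how a per-query scan cutoff would behave. The price of 5 USD per terabyte is an assumption passed in as a parameter (actual pricing varies by region and over time), a terabyte is assumed to be 2^40 bytes, and minimum charges and rounding are ignored. The function names and example sizes are hypothetical.

```python
TB = 1024 ** 4  # assumed definition of a terabyte for this estimate
GB = 1024 ** 3


def scan_cost_usd(bytes_scanned: int, usd_per_tb: float = 5.0) -> float:
    """Estimate query cost from bytes scanned (assumed price, no rounding)."""
    return bytes_scanned / TB * usd_per_tb


def within_cutoff(bytes_scanned: int, cutoff_bytes: int) -> bool:
    """Model a workgroup-style per-query scan limit: True if the query fits."""
    return bytes_scanned <= cutoff_bytes


# Hypothetical sizes: a full scan of raw exports vs. a partition-pruned
# read of a few Parquet columns.
full_scan = 2 * TB
pruned_scan = 20 * GB
cutoff = 100 * GB  # hypothetical per-query limit

for label, size in (("full scan", full_scan), ("pruned scan", pruned_scan)):
    print(label, round(scan_cost_usd(size), 4), within_cutoff(size, cutoff))
```

Under these assumptions the full scan would be blocked by the cutoff while the pruned query passes, which is the behavior the workgroup limit is meant to enforce: expensive queries fail fast instead of silently accumulating cost.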
Why it is not live here
A real governed lake needs meaningful datasets and access roles. Without them, a running demo would be mostly empty buckets and catalog metadata.