Well-Architected architecture study

Governed Data Lake for Analytics

A data lake design that separates raw, curated, and governed zones while controlling access, retention, encryption, and query cost.

Status: Architecture study
Focus: AWS
Services: S3, Glue, Lake Formation, Athena, Macie
Architecture diagram (components in the order shown): Application exports, S3 raw zone, S3 curated zone, Lake Formation, Athena + dashboards, CloudWatch.

Problem

Analytics platforms fail when every team reads raw data directly, sensitive fields are not classified, and query cost is treated as someone else's problem.

Design

  • S3 prefixes separate raw, curated, and governed datasets.
  • Glue crawlers catalog the data, and Glue jobs transform it into partitioned Parquet.
  • Lake Formation grants table and column access based on roles.
  • KMS keys separate sensitive datasets from general analytics data.
  • Macie helps identify sensitive data exposure.
  • Athena workgroups enforce query limits and output locations.
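
As a rough illustration of the first and third design points, the sketch below builds a zone-prefixed, Hive-style partitioned object key and the keyword arguments for a column-level Lake Formation grant. The zone names come from the design above; the dataset, database, table, column names, role ARN, and the year/month/day partition layout are all hypothetical choices made for this example. The grant dictionary follows the request shape of boto3's Lake Formation `grant_permissions` call, but nothing here contacts AWS.

```python
from datetime import date

# Zone names taken from the design; everything else below is illustrative.
ZONES = ("raw", "curated", "governed")


def partitioned_key(zone: str, dataset: str, day: date, filename: str) -> str:
    """Build a Hive-style partitioned object key under a zone prefix."""
    if zone not in ZONES:
        raise ValueError(f"unknown zone: {zone}")
    return (
        f"{zone}/{dataset}/"
        f"year={day.year}/month={day.month:02d}/day={day.day:02d}/"
        f"{filename}"
    )


def column_grant_request(role_arn: str, database: str, table: str,
                         columns: list[str]) -> dict:
    """Return keyword arguments shaped for Lake Formation grant_permissions.

    Grants SELECT on the listed columns only, so columns left out of the
    list stay hidden from the role.
    """
    return {
        "Principal": {"DataLakePrincipalIdentifier": role_arn},
        "Resource": {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                "ColumnNames": columns,
            }
        },
        "Permissions": ["SELECT"],
    }


# Hypothetical dataset and role, for illustration only.
key = partitioned_key("curated", "orders", date(2024, 3, 7), "part-0000.parquet")
grant = column_grant_request(
    "arn:aws:iam::123456789012:role/analyst",
    "analytics",
    "orders",
    ["order_id", "order_total"],
)
print(key)
print(grant["Resource"]["TableWithColumns"]["ColumnNames"])
```

In a real account the dictionary could be passed as `boto3.client("lakeformation").grant_permissions(**grant)`. Keeping the request construction separate from the call makes the access rules easy to review and unit-test before anything is applied.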

Well-Architected lens

  • Security: column permissions, encryption boundaries, sensitive-data discovery, and audit events.
  • Cost optimization: Parquet, partitioning, lifecycle rules, and Athena workgroup limits.
  • Operational excellence: data-quality checks and catalog ownership.
  • Reliability: immutable raw data allows rebuilding curated datasets.
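
The cost controls named above can be sketched as plain configuration builders. The example assumes a hypothetical workgroup name, results bucket, 10 GiB per-query scan cap, and a 90-day transition of the raw prefix to an archival storage class; none of these values come from the study itself. The dictionaries follow the request shapes of boto3's Athena `create_workgroup` and S3 `put_bucket_lifecycle_configuration` calls, and the code only constructs them.

```python
def workgroup_request(name: str, output_location: str, max_scan_bytes: int) -> dict:
    """Return keyword arguments shaped for Athena create_workgroup.

    EnforceWorkGroupConfiguration makes the workgroup settings override
    client-side settings, so the output location and scan cap apply to
    every query run in the workgroup.
    """
    return {
        "Name": name,
        "Configuration": {
            "ResultConfiguration": {"OutputLocation": output_location},
            "EnforceWorkGroupConfiguration": True,
            "PublishCloudWatchMetricsEnabled": True,
            "BytesScannedCutoffPerQuery": max_scan_bytes,
        },
    }


def archive_lifecycle(prefix: str, archive_after_days: int) -> dict:
    """Return a lifecycle configuration that archives objects under a prefix.

    Objects are moved to a cheaper storage class rather than expired, so
    raw data remains available for rebuilding curated datasets.
    """
    return {
        "Rules": [
            {
                "ID": f"archive-{prefix.strip('/')}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": archive_after_days, "StorageClass": "GLACIER"}
                ],
            }
        ]
    }


# Hypothetical values, for illustration only.
GIB = 1024 ** 3
workgroup = workgroup_request(
    "analysts",
    "s3://example-athena-results/analysts/",
    10 * GIB,
)
lifecycle = archive_lifecycle("raw/", 90)
print(workgroup["Configuration"]["BytesScannedCutoffPerQuery"])
print(lifecycle["Rules"][0]["ID"])
```

With credentials in place, these could be applied as `boto3.client("athena").create_workgroup(**workgroup)` and `boto3.client("s3").put_bucket_lifecycle_configuration(Bucket=..., LifecycleConfiguration=lifecycle)`, where the bucket name is left to the deployment.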

Why it is not live here

A real governed lake needs meaningful datasets and access roles. Without that, the running demo would be mostly empty buckets and catalog metadata.