Amazon Redshift supports querying data stored in Apache Iceberg tables managed by Amazon S3 Tables. The post focuses on production environments and on centralized governance of data access and permissions.
The post demonstrates how to set up an Apache Iceberg data lake catalog using Amazon S3 Tables, enabling fine-grained access controls and unified analytics with Amazon Redshift.
It covers steps like creating an S3 Table bucket, loading data using Amazon EMR, granting permissions with Lake Formation, and running SQL analytics on the data.
Prerequisites include adding a Redshift service-linked role and creating an Amazon EC2 key pair. The walkthrough uses Amazon Redshift Serverless, Amazon S3 Tables, the AWS Glue Data Catalog, AWS Lake Formation, and Apache Spark on Amazon EMR.
Readers are guided to create resources using an AWS CloudFormation template, load sample datasets into S3 buckets, and connect Amazon Redshift to query Apache Iceberg data stored in Amazon S3 Tables.
Detailed steps are provided for creating S3 Tables, loading data, granting permissions to IAM users, and querying the data in both Redshift and S3 Tables.
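The permission-granting step can be pictured as a Lake Formation grant of SELECT on one table to one IAM principal. The following is a minimal sketch, not the post's own code: the account ID, user ARN, bucket, namespace, and table names are hypothetical, and the `<account-id>:s3tablescatalog/<table-bucket>` catalog ID format is an assumption about how S3 table buckets appear in the Glue Data Catalog. The function only builds the request and does not call AWS.

```python
def build_select_grant(principal_arn, account_id, table_bucket, namespace, table):
    """Build keyword arguments for a Lake Formation GrantPermissions request."""
    return {
        # The IAM user or role that should be able to read the table.
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "Table": {
                # Assumption: S3 table buckets are addressed through a
                # nested catalog ID of this shape.
                "CatalogId": f"{account_id}:s3tablescatalog/{table_bucket}",
                "DatabaseName": namespace,
                "Name": table,
            }
        },
        # Read-only access is enough for running SQL analytics.
        "Permissions": ["SELECT"],
    }


# Hypothetical values, for illustration only.
request = build_select_grant(
    principal_arn="arn:aws:iam::111122223333:user/analyst",
    account_id="111122223333",
    table_bucket="example-table-bucket",
    namespace="sales",
    table="orders",
)
```

With credentials configured, a request like this could be passed to `boto3.client("lakeformation").grant_permissions(**request)`. Keeping the request builder separate from the call makes the grant easy to review before it is applied.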
The post concludes by showing how a single query can combine data from S3 Tables with local Amazon Redshift tables for a seamless analytics experience.
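A combined query of this kind might look like the join assembled below. This is a sketch under stated assumptions rather than the post's actual query: every identifier is hypothetical, and the three-part `"<table-bucket>@s3tablescatalog"."<namespace>"."<table>"` reference is an assumed naming pattern for how Redshift exposes the S3 Tables catalog.

```python
def quote_ident(name):
    """Double-quote a SQL identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_join_query(catalog_db, namespace, iceberg_table, local_table, join_key):
    """Return SQL joining a local Redshift table to an Iceberg table.

    local_table is a (schema, table) pair inside the Redshift database.
    """
    iceberg_ref = ".".join(
        quote_ident(part) for part in (catalog_db, namespace, iceberg_table)
    )
    local_ref = ".".join(quote_ident(part) for part in local_table)
    key = quote_ident(join_key)
    return (
        "SELECT c.*, o.*\n"
        f"FROM {local_ref} AS c\n"
        f"JOIN {iceberg_ref} AS o\n"
        f"  ON c.{key} = o.{key};"
    )


# Hypothetical names, for illustration only.
query = build_join_query(
    catalog_db="example-table-bucket@s3tablescatalog",
    namespace="sales",
    iceberg_table="orders",
    local_table=("public", "customers"),
    join_key="customer_id",
)
print(query)
```

The generated statement can be pasted into the Redshift query editor. Quoting each identifier matters here because the assumed catalog database name contains characters such as `@` and `-` that are not valid in unquoted identifiers.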
It also lists cleanup steps for deleting the deployed resources with AWS CloudFormation and invites feedback on the features presented.
The post's authors include Satesh Sonti, a Sr. Analytics Specialist Solutions Architect with expertise in data platforms, and Jonathan Katz, a Principal Product Manager on the Amazon Redshift team and a member of the PostgreSQL Core Team.