Amazon SageMaker Lakehouse enables a unified, open, and secure lakehouse platform on your existing data lakes and warehouses.
SageMaker Lakehouse supports interoperability by exposing the open source Apache Iceberg REST Catalog APIs, which you can use to access data in the lakehouse.
SageMaker Lakehouse now provides secure and fine-grained access controls on data in both data warehouses and data lakes.
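To make "fine-grained" concrete, the following Python sketch builds a column-level SELECT grant in the request shape of the Lake Formation GrantPermissions API, ready to pass to boto3's `lakeformation` client. The function name, role ARN, account ID, database, table, and column names are hypothetical placeholders. The function only assembles the request and makes no AWS calls.

```python
"""Build a Lake Formation column-level SELECT grant request (sketch)."""

from typing import Any


def column_select_grant(
    principal_arn: str,
    catalog_id: str,
    database: str,
    table: str,
    columns: list[str],
) -> dict[str, Any]:
    """Return keyword arguments for Lake Formation GrantPermissions.

    Intended use (not executed here):
        boto3.client("lakeformation").grant_permissions(**request)
    """
    if not columns:
        raise ValueError("at least one column is required")
    return {
        # The IAM principal (for example, the role your compute assumes).
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        # Restrict the grant to specific columns of one table.
        "Resource": {
            "TableWithColumns": {
                "CatalogId": catalog_id,
                "DatabaseName": database,
                "Name": table,
                "ColumnNames": list(columns),
            }
        },
        "Permissions": ["SELECT"],
    }


# Hypothetical example values.
request = column_select_grant(
    "arn:aws:iam::111122223333:role/databricks-glue-access",
    "111122223333",
    "sales_db",
    "orders",
    ["order_id", "order_date", "amount"],
)
```

Queries from that principal would then see only the listed columns; any column not named in `ColumnNames` stays hidden.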
In this post, we show how tables cataloged in the AWS Glue Data Catalog and stored in Amazon S3 can be consumed from Databricks compute through the AWS Glue Iceberg REST Catalog, with data access secured by AWS Lake Formation.
To follow along with the solution presented in this post, you need the following AWS prerequisites.
Create a Databricks cluster and configure it to connect to the AWS Glue Iceberg REST Catalog endpoint.
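The following Python sketch assembles Apache Iceberg Spark catalog properties for the Glue Iceberg REST endpoint and renders them as "key value" lines, the format used in a Databricks cluster's Spark config. It is a minimal sketch, assuming the cluster has the Iceberg Spark runtime and `iceberg-aws-bundle` libraries installed, that the default Glue Data Catalog is used (so the warehouse is the account ID), and that Lake Formation vends temporary credentials through the `X-Iceberg-Access-Delegation` header. The catalog name `glue_rest`, region, and account ID are hypothetical placeholders; check the property names against the Iceberg and AWS Glue documentation for your versions.

```python
"""Spark catalog properties for the AWS Glue Iceberg REST endpoint (sketch)."""


def glue_iceberg_rest_conf(catalog: str, region: str, account_id: str) -> dict[str, str]:
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        # Register an Iceberg catalog backed by a REST catalog service.
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "rest",
        f"{prefix}.uri": f"https://glue.{region}.amazonaws.com/iceberg",
        # Assumption: default Glue Data Catalog, identified by the account ID.
        f"{prefix}.warehouse": account_id,
        # Sign REST requests with AWS SigV4 for the Glue service.
        f"{prefix}.rest.sigv4-enabled": "true",
        f"{prefix}.rest.signing-name": "glue",
        f"{prefix}.rest.signing-region": region,
        f"{prefix}.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
        # Request temporary, Lake Formation-scoped S3 credentials from the catalog.
        f"{prefix}.header.X-Iceberg-Access-Delegation": "vended-credentials",
    }


def to_cluster_spark_config(conf: dict[str, str]) -> str:
    """Render properties as one 'key value' pair per line."""
    return "\n".join(f"{key} {value}" for key, value in conf.items())


conf = glue_iceberg_rest_conf("glue_rest", "us-east-1", "111122223333")
print(to_cluster_spark_config(conf))
```

With this configuration in place, a notebook on the cluster would reference tables by their three-part name, for example `glue_rest.<database>.<table>`.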
In this post, we showed you how to manage a dataset centrally in the AWS Glue Data Catalog and make it accessible to Databricks compute using the Iceberg REST Catalog API.
The solution also lets you apply your existing Lake Formation access controls to data accessed from Databricks.
Srividya Parthasarathy is a Senior Big Data Architect on the AWS Lake Formation team.
Venkatavaradhan (Venkat) Viswanathan is a Global Partner Solutions Architect at Amazon Web Services.