Amazon DataZone helps organizations catalog, discover, share, and govern data stored across AWS, on-premises systems, and third-party sources.
Customers can extend the streamlined data discovery and subscription workflows in Amazon DataZone to unstructured data, according to a recent AWS blog post.
The post provides a step-by-step tutorial for implementing a custom subscription workflow with Amazon DataZone, Amazon EventBridge, and AWS Lambda. The workflow automates fulfillment for unmanaged data assets, such as unstructured data stored in Amazon S3, which strengthens governance and simplifies access to those assets across the organization.
The solution builds on the event-driven architecture of Amazon DataZone: an AWS Lambda function handles the relevant EventBridge events and creates, cancels, or revokes bucket policies for subscribed S3 assets.
The function ensures that the bucket policies for unmanaged S3 assets reflect the subscription requests and access controls specified for the custom environment and its subscription target.
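The bucket policy bookkeeping described above might look roughly like the following Python sketch. It is not the code from the AWS post: the function names, the `DataZoneSub` statement prefix, and the choice to grant only `s3:GetObject` on a prefix are assumptions made for illustration. The functions work on plain policy dictionaries, so the actual S3 calls (for example boto3's `get_bucket_policy` and `put_bucket_policy`) are left to the caller.

```python
import copy
import re


def statement_id(subscription_id):
    """Build a Sid for one subscription (hypothetical naming scheme).

    Non-alphanumeric characters are stripped to keep the Sid conservative.
    """
    return "DataZoneSub" + re.sub(r"[^A-Za-z0-9]", "", subscription_id)


def _statements(policy):
    """Return the policy's statements as a list (a lone dict is also valid JSON policy)."""
    stmts = policy.get("Statement", [])
    return [stmts] if isinstance(stmts, dict) else list(stmts)


def grant_read(policy, bucket, prefix, role_arn, sid):
    """Return a copy of `policy` that lets `role_arn` read objects under `prefix`."""
    updated = copy.deepcopy(policy) if policy else {"Version": "2012-10-17"}
    # Drop any earlier statement with the same Sid so a replayed event is idempotent.
    kept = [s for s in _statements(updated) if s.get("Sid") != sid]
    kept.append({
        "Sid": sid,
        "Effect": "Allow",
        "Principal": {"AWS": role_arn},
        "Action": "s3:GetObject",
        "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
    })
    updated["Statement"] = kept
    return updated


def revoke_read(policy, sid):
    """Return a copy of `policy` without the statement identified by `sid`."""
    updated = copy.deepcopy(policy)
    updated["Statement"] = [s for s in _statements(updated) if s.get("Sid") != sid]
    # If no statements remain, the caller would likely delete the bucket
    # policy rather than write back an empty statement list.
    return updated
```

A Lambda handler could read the current policy, pass it through `grant_read` or `revoke_read`, and write the result back. Keying each statement on the subscription makes a cancel or revoke a simple removal by `Sid`.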
Amazon DataZone publishes events to EventBridge that describe key activities within a user's data portal, such as subscription requests, updates, comments, and system events.
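As a rough illustration of how such events could be filtered and routed, the Python sketch below defines an EventBridge-style event pattern and a small dispatcher. The `aws.datazone` source is the standard naming for AWS service events. The three `detail-type` strings and the action names are assumptions that should be checked against the Amazon DataZone events documentation before use.

```python
# Event pattern an EventBridge rule could use to select subscription events.
# The detail-type values are assumed, not confirmed by the summary above.
EVENT_PATTERN = {
    "source": ["aws.datazone"],
    "detail-type": [
        "Subscription Created",
        "Subscription Cancelled",
        "Subscription Revoked",
    ],
}

# Hypothetical mapping from event type to the policy action to perform.
ACTIONS = {
    "Subscription Created": "grant",
    "Subscription Cancelled": "revoke",
    "Subscription Revoked": "revoke",
}


def action_for(event):
    """Return "grant", "revoke", or None for an EventBridge event dict."""
    if event.get("source") not in EVENT_PATTERN["source"]:
        return None  # not a DataZone event
    return ACTIONS.get(event.get("detail-type"))


def lambda_handler(event, context):
    """Minimal Lambda entry point that only reports the chosen action."""
    action = action_for(event)
    # A full implementation would look up the asset and the subscribing
    # environment from event["detail"], then update the bucket policy.
    return {"action": action}
```

Filtering in the rule keeps unrelated portal activity, such as comments, from invoking the function. The check inside `action_for` is a second guard in case the rule is later broadened.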
Users search for assets in the custom environment, request a subscription, and access the data in Amazon SageMaker through their IAM roles.
The tutorial begins by publishing an unstructured S3-based data asset to Amazon DataZone as an S3ObjectCollectionType asset, creating an AWS service environment, and setting up an IAM role attached to a SageMaker notebook instance.
Once the custom workflow is implemented and a subscription is approved, the environment has access to the unstructured S3 asset.
Organizations using this custom workflow can extend the streamlined data discovery and subscription workflows of Amazon DataZone to their unstructured S3 data while maintaining governance and data access control across the enterprise.