Amazon

AWS Glue Data Catalog supports automatic optimization of Apache Iceberg tables through your Amazon VPC

  • The AWS Glue Data Catalog supports automatic optimization of Apache Iceberg tables, including compaction, snapshot retention, and orphan file deletion.
  • The compaction optimizer continuously monitors table partitions and starts compaction when the number of files and their sizes cross the configured thresholds.
  • Compaction starts, and keeps running, while the table or any of its partitions has more than the configured number of files (five by default), each smaller than 75% of the target file size.
  • The snapshot retention process runs periodically (default daily) to identify and remove snapshots that are older than the specified retention configuration from the table properties.
  • Similarly, the orphan file deletion process scans the table metadata and the actual data files, identifies the unreferenced files, and deletes them to reclaim storage space.
  • For workloads that must access data from within a specific VPC, the Data Catalog can now run Iceberg table optimization in that VPC.
  • By default, a table optimizer is not associated with any of your VPCs and subnets.
  • With this new capability of supporting data access from VPCs, you can associate a table optimizer with an AWS Glue network connection to run in a specific VPC, subnet, and security group.
  • This feature is available in all AWS Regions where AWS Glue is supported.
  • The post includes a sample AWS CloudFormation template that enables a quick setup of the solution resources.
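
The three optimizer checks described above can be sketched in plain Python. This is a minimal illustration of the decision logic only, not the Data Catalog's implementation: the function names, the 512 MB target file size, and the in-memory inputs are assumptions made for the example. Only the five-file default and the 75% ratio come from the summary.

```python
from datetime import datetime, timedelta

# Assumed for illustration; the summary does not state a target file size.
TARGET_FILE_SIZE_BYTES = 512 * 1024 * 1024
# From the summary: default of five files, each under 75% of the target size.
DEFAULT_MIN_FILES = 5
SMALL_FILE_RATIO = 0.75


def needs_compaction(file_sizes, target_size=TARGET_FILE_SIZE_BYTES,
                     min_files=DEFAULT_MIN_FILES):
    """Return True if a partition has more than `min_files` small files."""
    small_limit = SMALL_FILE_RATIO * target_size
    small_files = [size for size in file_sizes if size < small_limit]
    return len(small_files) > min_files


def expired_snapshots(snapshots, now, max_age):
    """Return IDs of snapshots committed earlier than `now - max_age`.

    `snapshots` maps a snapshot ID to its commit time (hypothetical shape).
    """
    cutoff = now - max_age
    return sorted(sid for sid, committed in snapshots.items()
                  if committed < cutoff)


def orphan_files(files_in_storage, files_in_metadata):
    """Return files present in storage that table metadata never references."""
    return set(files_in_storage) - set(files_in_metadata)


if __name__ == "__main__":
    mb = 1024 * 1024
    # Six 10 MB files exceed the five-file default, so compaction is due.
    print(needs_compaction([10 * mb] * 6))
    # Only snapshots older than the retention window are selected.
    print(expired_snapshots(
        {"s1": datetime(2024, 1, 1), "s2": datetime(2024, 1, 9)},
        now=datetime(2024, 1, 10),
        max_age=timedelta(days=5),
    ))
    # Files that exist in storage but are unreferenced are orphans.
    print(orphan_files({"a.parquet", "b.parquet"}, {"a.parquet"}))
```

In the managed service these checks run against Iceberg table metadata and Amazon S3 listings rather than in-memory collections; the sketch only shows how each threshold or comparison selects work to do.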
