Amazon

3w

Introducing AWS Glue Data Catalog automation for table statistics collection for improved query performance on Amazon Redshift and Amazon Athena

  • AWS Glue Data Catalog now automatically generates statistics for new tables; the statistics feed the cost-based optimizers (CBO) in Amazon Redshift Spectrum and Amazon Athena.
  • Table statistics help the optimizer plan queries over large datasets, especially join operations across multiple datasets.
  • Data Catalog already supported collecting table statistics for file formats such as Parquet, ORC, JSON, ION, CSV, and XML, as well as for Apache Iceberg tables.
  • The latest update lets administrators configure weekly statistics collection across all databases and tables with a catalog-level setting, helping keep the data platform cost-efficient.
  • Per-table controls remain available, so individual data owners can manage statistics for their own tables as needed.
  • Catalog-level statistics collection can be enabled via the Lake Formation console or the AWS CLI.
  • With this feature enabled, AWS Glue automatically updates column statistics for all columns in each table, sampling 20% of records to calculate them.
  • Individual data owners can also set up scheduled collection at the table level and customize the settings for individual tables.
  • The feature keeps column-level statistics up to date with little manual effort, which helps optimize query processing and cost.
  • Try this feature for your use case, and share your feedback in the comments.
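
For readers who want to experiment with per-table statistics, the following is a minimal sketch, not the catalog-level automation described above. It assumes boto3 and the Glue `StartColumnStatisticsTaskRun` operation, which starts a one-off statistics run for a single table. The function names, the database and table names, and the role ARN are hypothetical; the 20% default only mirrors the sampling rate mentioned in the summary.

```python
from typing import Any, Dict, List, Optional


def build_stats_run_request(
    database: str,
    table: str,
    role_arn: str,
    sample_percent: float = 20.0,
    columns: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for Glue's StartColumnStatisticsTaskRun.

    sample_percent is the percentage of rows to sample; 20.0 mirrors the
    rate quoted in the summary and is an assumption, not a Glue default.
    """
    if not 0 < sample_percent <= 100:
        raise ValueError("sample_percent must be in (0, 100]")
    request: Dict[str, Any] = {
        "DatabaseName": database,
        "TableName": table,
        "Role": role_arn,
        "SampleSize": sample_percent,
    }
    if columns:
        # Leaving ColumnNameList out asks Glue to cover every column.
        request["ColumnNameList"] = columns
    return request


def start_stats_run(request: Dict[str, Any]) -> str:
    """Submit the request; requires boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the builder works without boto3

    glue = boto3.client("glue")
    response = glue.start_column_statistics_task_run(**request)
    return response["ColumnStatisticsTaskRunId"]
```

A call such as `start_stats_run(build_stats_run_request("sales_db", "orders", role_arn))` would start a single run for one table. Weekly, catalog-wide collection is configured separately through the Lake Formation console or the AWS CLI, as noted in the summary.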
