Cloudblog

Introducing BigQuery metastore, a unified metadata service with Apache Iceberg support

  • Google has introduced BigQuery metastore, a serverless, unified metadata service that provides a single source of truth for all data processing and supports BigQuery, Apache Spark, Apache Flink, and Apache Hive.
  • BigQuery metastore offers near-unlimited scalability and works with multiple analytics engines through the Apache Iceberg table format, enabling query processing and DML on data stored in open and proprietary formats across object stores, BigQuery storage, and analytics runtimes.
  • The metastore operates in a no-ops environment, reducing total cost of ownership and democratizing data for analysts, engineers, and scientists.
  • A common industry challenge in metadata management is fragmentation across storage systems. BigQuery metastore addresses it by supporting open data formats such as Apache Iceberg that a variety of processing engines can access, making it easier for users to find and use data.
  • The metastore also includes key built-in governance features: automated cataloguing, universal search, business metadata, data profiling, data quality, fine-grained access controls, data masking, sharing, data lineage, and audit logging.
  • BigQuery metastore targets teams that want to move from legacy data lakes to a modern lakehouse architecture, which combines the benefits of data lakes and data warehouses without requiring both to be managed separately.
  • The introduction of the BigQuery metastore means that applications no longer have to maintain multiple copies of data and metadata persisted in various metastores across different processing engines.
  • The PySpark script in the full article shows how to use BigQuery metastore: with it, Spark can interact with a BigQuery storage table, a BigQuery table for Apache Iceberg, and a BigQuery external table.
  • BigQuery metastore is a serverless approach to metadata management that offers cross-engine capabilities and built-in governance; the documentation describes how to try it out.
  • Migration tooling is provided for those interested in migrating from Dataproc Metastore to BigQuery metastore.
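
The original PySpark script is not reproduced in this summary. As a rough illustration of how Spark might be pointed at BigQuery metastore through an Iceberg catalog, the sketch below builds the relevant Spark properties. The catalog class and the gcp_project, gcp_location, and warehouse property names are assumptions based on Google's published Iceberg catalog configuration and should be checked against the current documentation. The catalog name, project, location, and bucket are hypothetical placeholders.

```python
from typing import Dict


def metastore_catalog_conf(
    catalog: str, project: str, location: str, warehouse: str
) -> Dict[str, str]:
    """Build Spark properties for an Iceberg catalog backed by BigQuery metastore.

    The property names and catalog class are assumptions; verify them against
    the BigQuery metastore documentation before use.
    """
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.catalog-impl": "org.apache.iceberg.gcp.bigquery.BigQueryMetastoreCatalog",
        f"{prefix}.gcp_project": project,
        f"{prefix}.gcp_location": location,
        f"{prefix}.warehouse": warehouse,
    }


def build_session(conf: Dict[str, str], app_name: str = "bq-metastore-demo"):
    """Create a SparkSession carrying the given properties.

    Requires pyspark plus the Iceberg and BigQuery catalog jars on the classpath.
    """
    # Imported lazily so the helper above can be used without Spark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


# Hypothetical placeholder values, not taken from the article.
conf = metastore_catalog_conf(
    "demo_catalog", "my-project", "us-central1", "gs://my-bucket/warehouse"
)
for key, value in sorted(conf.items()):
    print(f"{key}={value}")
```

With a session from build_session(conf), tables registered under demo_catalog would be addressed in Spark SQL as demo_catalog.<namespace>.<table>. Keeping the properties in a plain dictionary makes it possible to inspect or reuse them (for example as --conf flags to spark-submit) without starting Spark.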
