Amazon

Stream real-time data into Apache Iceberg tables in Amazon S3 using Amazon Data Firehose

  • Engineering teams are increasingly replacing batch data processing pipelines with real-time streaming and building data lakes on open data formats such as Parquet and Apache Iceberg.
  • The trend spans many industries, including online media, gaming companies, factories that monitor equipment for maintenance and failures, and theme parks that provide wait times for popular attractions.
  • Apache Iceberg is becoming popular among customers who store data in Amazon S3 data lakes because it lets them read and write data concurrently using different frameworks.
  • Amazon Data Firehose simplifies the process of streaming data by allowing users to configure a delivery stream, select a data source, and set Iceberg tables as the destination.
  • Firehose is integrated with over 20 AWS services and can route data to different Iceberg tables to achieve data isolation or improve query performance.
  • This post describes how to set up Firehose to deliver data streams into Iceberg tables on Amazon S3, and addresses different scenarios for routing data into Iceberg tables.
  • For instance, records can be routed to different tables based on the content of the incoming data by specifying JSON Query expressions in the 'Database expression' and 'Table expression' fields.
  • Alternatively, the same content-based routing can be implemented with a Lambda function, which the full post describes as use case 4.
  • All of the AWS services used in these examples are serverless, and no infrastructure management is required.
  • Users can query the data written to Iceberg tables with processing engines such as Apache Spark, Apache Flink, or Trino, or with Amazon Athena.
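
The Lambda-based routing mentioned above can be sketched roughly as follows. This is a minimal illustration, not the code from the original post: the `event_type` field, the database name, and the table names are all hypothetical, and the `otfMetadata` keys reflect one reading of the Firehose documentation for Iceberg destinations, so they should be checked against the current AWS docs before use. The function decodes each incoming record, picks a destination table from its content, and returns the record with routing metadata attached.

```python
import base64
import json

# Hypothetical names, chosen only for illustration.
DATABASE_NAME = "streaming_db"
DEFAULT_TABLE = "other_events"
TABLE_BY_EVENT_TYPE = {
    "click": "click_events",
    "purchase": "purchase_events",
}


def lambda_handler(event, context):
    """Firehose transformation handler that tags each record with a target table."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            if not isinstance(payload, dict):
                raise ValueError("expected a JSON object")
        except ValueError:
            # Undecodable or non-object records are reported as failed.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
            continue

        # Route on a hypothetical "event_type" field in the record body.
        table = TABLE_BY_EVENT_TYPE.get(payload.get("event_type"), DEFAULT_TABLE)
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": record["data"],  # payload passed through unchanged
            # Assumed metadata shape for Iceberg routing; verify against AWS docs.
            "metadata": {
                "otfMetadata": {
                    "destinationDatabaseName": DATABASE_NAME,
                    "destinationTableName": table,
                    "operation": "insert",
                }
            },
        })
    return {"records": output}
```

Keeping the mapping in a plain dictionary makes the routing rule easy to audit; records with an unrecognized type fall through to a default table rather than being dropped.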
