Apache Iceberg, the database table format project, is providing full database functionality on top of cloud object stores with open-source tools.
Created by two Netflix engineers to overcome issues with the company's Amazon S3-stored media streaming files, Iceberg demonstrates how the separation of storage from compute in modern data stacks has allowed scalable, cost-effective computing and improved interaction between systems.
Because Iceberg lets different compute engines work against the same underlying data, users can pick whichever analytics engine suits them.
Meanwhile, the open-source Trino distributed query engine, originally developed to run interactive queries over Facebook's 300-petabyte Hive data warehouse, responds to the need for fast, ad-hoc analytics queries over big data file systems.
Starburst, the commercial developer of Trino, is trying to make it simpler to build apps on data lake architectures by offering a fully managed Icehouse data lake, which supports near-real-time data ingestion into managed Iceberg tables at petabyte scale.
Iceberg's modern table format continues to attract enterprise interest, with Google, Snowflake and Databricks among the companies that have announced support for it, according to its co-creator Ryan Blue.
In its earnings call at the end of February, Snowflake noted that customer adoption of Iceberg tables could create “revenue headwinds” for the firm.
Iceberg provides a strong technical foundation and an open community, as it is owned and controlled by the Apache Software Foundation. Blue wants the project to become a foundational layer in data architecture on which other data problems can be solved.
Assembling the best analytical engines into cost-effective solutions matters because people look for cheaper options once there is too much data to process, according to Dain Sundstrom, chief technology officer of Starburst.
Icehouse combines the Trino query engine with Iceberg storage, which in Sundstrom's opinion is the best data lake format.