Databricks has open-sourced its declarative ETL framework as Apache Spark Declarative Pipelines at the Data + AI Summit, contributing the technology to the broader Apache Spark community.
The move reinforces Databricks' commitment to open ecosystems and positions the company against Snowflake's recently announced Openflow data integration service.
Spark Declarative Pipelines simplifies data pipeline authoring, automates operations, and supports both batch and streaming workloads.
Engineers define pipelines using SQL or Python, allowing Apache Spark to handle execution, dependency tracking, and operational tasks.
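For illustration, the sketch below shows what declarative authoring can look like in Python. The module and decorator names (`pyspark.pipelines`, `@dp.table`) follow the Delta Live Tables style and are assumptions for this example, not confirmed identifiers from the open-source release; `spark` is the session supplied by the pipeline runtime.

```python
# Illustrative sketch only: module and decorator names are assumptions in the
# style of Delta Live Tables, not confirmed Spark Declarative Pipelines APIs.
from pyspark import pipelines as dp  # assumed module name
from pyspark.sql import functions as F

@dp.table(comment="Raw orders ingested from cloud storage")
def raw_orders():
    # Declares *what* the table should contain; the engine decides how and
    # when to materialize it.
    return spark.read.json("/data/orders/")

@dp.table(comment="Completed orders aggregated per day")
def daily_revenue():
    # Reading the upstream table by name lets the engine infer the dependency
    # graph and run the two steps in the correct order.
    return (
        spark.read.table("raw_orders")
        .where(F.col("status") == "completed")
        .groupBy("order_date")
        .agg(F.sum("amount").alias("revenue"))
    )
```

The engineer never schedules the two steps explicitly; the framework derives the ordering from the table references and handles retries and other operational concerns.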
The framework supports a range of data sources and formats, and a single unified API covers both real-time (streaming) and scheduled (batch) processing, with pipeline definitions validated before execution so errors surface early.
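Under the same assumptions as the sketch above, a streaming source plugs into the identical API: only the reader changes, and the whole pipeline graph is still validated before anything runs. The Kafka endpoint and topic below are hypothetical.

```python
from pyspark import pipelines as dp  # assumed module name, as above

@dp.table(comment="Events ingested continuously from a Kafka topic")
def raw_events():
    # Streaming and batch tables are declared the same way; the engine
    # recognizes the streaming reader and keeps this table continuously updated.
    return (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical endpoint
        .option("subscribe", "events")                     # hypothetical topic
        .load()
    )
```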
With this declarative approach, Databricks aims to streamline the Apache Spark experience and simplify end-to-end pipeline development.
Numerous enterprises have already adopted the framework, reporting shorter development times, lower maintenance effort, and better pipeline performance.
Databricks' open-source strategy makes Spark Declarative Pipelines accessible to a broader user base beyond its existing customers.
The rollout aligns with Databricks' past contributions to the open-source community and coincides with the commercial version's availability.
Apache Spark Declarative Pipelines will be integrated into the Apache Spark codebase soon, while the commercial version offers additional enterprise features.