Dzone

The Modern Era of Data Orchestration: From Data Fragmentation to Collaboration

  • Data engineering and software engineering have long been at odds, each with its own tools and best practices. In this article, we'll explore the role data orchestrators play and how recent industry trends may be bringing the two disciplines closer together than ever before.
  • Data orchestration provides a principled approach to composing systems whose complexity comes from several directions: the many sources and destinations of data, the stakeholders and use cases for data products, and the heterogeneous tools and processes involved in creating the end product.
  • Orchestration is required to coordinate three high-level capabilities: ingestion, transformation, and serving.
  • Workflow engines enable data engineers to specify explicit orderings between tasks, run scheduled tasks much like cron, and watch for external events that should trigger a run.
  • The future of data orchestration is moving toward composable data systems. Standardizing on open data formats, such as Apache Parquet and Apache Arrow, enables native "data sharing" without all the glue code.
  • Apache Iceberg and other open table formats are building upon the success of Parquet by defining a layout for organizing files so that they can be interpreted as tables, providing governance controls to build an authoritative source of truth while benefiting from the zero-copy sharing that the underlying formats enable.
  • In a closed system, the data warehouse maintains its own table structure and query engine internally, while an open, deconstructed system standardizes its lowest-level details, allowing businesses to pick and choose the best vendor for a service while having the seamless experience that was previously only possible in a closed ecosystem.
  • Orchestration is the backbone of modern data systems and is tasked with untangling complex and interconnected processes. By embracing composability, organizations can simplify governance and benefit from the greatest advances happening in the industry.
  • Cloud providers have been adding compatibility with open data system standards, which is helping pave the way for the best-of-breed solutions of tomorrow.
  • New trends in open standards offer a fresh take on how the dependencies between interconnected processes can be coordinated: systems are built from the ground up to share data collaboratively, leading organizations to rethink how data is orchestrated and to build the data products of the future.
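
The explicit task orderings that workflow engines manage can be illustrated with a minimal sketch. It is not taken from the article or from any particular orchestrator: the task names (`ingest_orders`, `transform_sales`, and so on) and the helpers `run_order` and `run_pipeline` are hypothetical, and the ordering uses Python's standard-library `graphlib` rather than a real workflow engine. It assumes a small pipeline covering the three capabilities named above: ingestion, transformation, and serving.

```python
from graphlib import TopologicalSorter

# Hypothetical task names; each key maps to the set of tasks it depends on.
dependencies = {
    "ingest_orders": set(),
    "ingest_customers": set(),
    "transform_sales": {"ingest_orders", "ingest_customers"},
    "serve_dashboard": {"transform_sales"},
}


def run_order(deps):
    """Return task names in an order that respects every dependency."""
    return list(TopologicalSorter(deps).static_order())


def run_pipeline(deps, tasks):
    """Call each task once, only after all of its upstream tasks have run."""
    completed = []
    for name in run_order(deps):
        tasks[name]()
        completed.append(name)
    return completed


if __name__ == "__main__":
    # Stand-in task bodies that only print; real tasks would move or reshape data.
    tasks = {name: (lambda n=name: print(f"running {n}")) for name in dependencies}
    run_pipeline(dependencies, tasks)
```

Production orchestrators add scheduling, event triggers, retries, and parallel execution on top of this kind of dependency graph; the sketch only shows the ordering step.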
