How does the platform data pipeline work?

Integration is the process of combining multiple data sources into a unified data format. The platform integrates data by importing it, running it through a series of cleaning and standardization steps, and writing the result to a standard data format. The data pipeline orchestrates this process.


A pipeline has four steps:


  • Generate
  • Process
  • Index
  • Validate
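
As a rough illustration of how the four steps chain together, here is a minimal sketch in Python. Every function name, field name, and the in-memory dictionary standing in for the database is a hypothetical placeholder, not the platform's actual code.

```python
# Illustrative only: each function is a toy stand-in for a real pipeline step.

def generate() -> list[dict]:
    """Generate: pretend to copy raw rows out of a source system."""
    return [{"facility": " central clinic ", "cases": "12"}]


def process(raw_rows: list[dict]) -> list[dict]:
    """Process: clean and normalize raw rows into one standard shape."""
    return [
        {"facility": row["facility"].strip().title(), "cases": int(row["cases"])}
        for row in raw_rows
    ]


def index(rows: list[dict]) -> dict:
    """Index: load rows into a dict that stands in for the database."""
    return {row["facility"]: row["cases"] for row in rows}


def validate(db: dict, expected: dict) -> list[str]:
    """Validate: return the keys whose stored value differs from the expected one."""
    return [key for key, value in expected.items() if db.get(key) != value]


def run_pipeline(expected: dict) -> list[str]:
    """Run the four steps in order and return any validation failures."""
    raw = generate()
    unified = process(raw)
    db = index(unified)
    return validate(db, expected)


failures = run_pipeline({"Central Clinic": 12})
print("validation failures:", failures)
```

Each step consumes only the output of the step before it, which is why the steps are described separately below.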


In the generate step of the pipeline, data is "generated" by running queries against the source data system or by copying data from it.


The process step takes the data from each source and transforms it into a standard unified data format. This step is complex and varied because its behavior depends heavily on the data source. Processing scripts are often casually referred to as "integrations" and contain logic to clean and normalize data from a given source. At the end of the process step, matching and deduplication take place, joining records from different data sources into a single unified dataset. Matching and deduplication rely on algorithms and heuristics to solve the complicated and error-prone problem of matching datasets that have no shared primary keys.
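
To make the matching problem concrete, the sketch below joins two small record lists that share no primary key by comparing normalized names. It uses `difflib.SequenceMatcher` from the Python standard library as one possible similarity measure; the field names, sample records, and the 0.85 threshold are assumptions chosen for illustration, and the platform's own algorithms and heuristics may differ.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase, drop periods, and collapse repeated whitespace."""
    return " ".join(name.lower().replace(".", "").split())


def similarity(a: str, b: str) -> float:
    """Similarity of two names after normalization, between 0.0 and 1.0."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match_records(left: list[dict], right: list[dict], threshold: float = 0.85):
    """Pair each left record with its most similar right record, if similar enough."""
    matches = []
    for l_rec in left:
        best = max(
            right,
            key=lambda r_rec: similarity(l_rec["name"], r_rec["name"]),
            default=None,
        )
        if best is not None and similarity(l_rec["name"], best["name"]) >= threshold:
            matches.append((l_rec, best))
    return matches


# Hypothetical sample data from two sources with no shared identifier.
source_a = [
    {"name": "St. Mary Hospital", "beds": 40},
    {"name": "Riverside Clinic", "beds": 5},
]
source_b = [
    {"name": "st mary  hospital", "staff": 12},
    {"name": "Hilltop Health Post", "staff": 3},
]

matches = match_records(source_a, source_b)
# Merge each matched pair into one record; source A's spelling of the name wins.
merged = [{**b_rec, **a_rec} for a_rec, b_rec in matches]
print(merged)
```

A threshold that is too low merges distinct facilities, while one that is too high leaves duplicates behind, which is part of why this step is error-prone.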


The index step takes the unified dataset and puts it into a database. This step is usually straightforward, but database insertion and indexing can take a while.


The validate step compares the indexed dataset to a "source of truth" control dataset, otherwise known as the validation dataset. For each row in the validation dataset, a query is run against the live database. If the query result does not match the expected result, the row is flagged and the dataset fails validation. The result of validation is known as a validation report.
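
The row-by-row check described above can be sketched as follows. The dictionary stands in for the live database, and the field names and report layout are hypothetical; the platform's actual validation report format is not shown in this article.

```python
def run_query(db: dict, row: dict):
    """Stand-in for a query against the live database."""
    return db.get((row["facility"], row["indicator"]))


def build_validation_report(db: dict, validation_rows: list[dict]) -> dict:
    """Compare every validation row with the database and collect mismatches."""
    flagged = []
    for row in validation_rows:
        actual = run_query(db, row)
        if actual != row["expected"]:
            # Keep the expected value and record what the database returned.
            flagged.append({**row, "actual": actual})
    return {
        "passed": not flagged,  # a single flagged row fails the dataset
        "checked": len(validation_rows),
        "flagged": flagged,
    }


# Hypothetical indexed data and control ("source of truth") rows.
db = {
    ("Central Clinic", "malaria_cases"): 12,
    ("Riverside Clinic", "malaria_cases"): 7,
}
validation_rows = [
    {"facility": "Central Clinic", "indicator": "malaria_cases", "expected": 12},
    {"facility": "Riverside Clinic", "indicator": "malaria_cases", "expected": 9},
]

report = build_validation_report(db, validation_rows)
print(report)
```

In this sample the second row is flagged because the database holds 7 where the control dataset expects 9, so the dataset fails validation.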


Each time a pipeline runs, data is fetched from its original source, processed, and stored in a Druid database. When users later run queries on the Zenysis platform, the data comes from the Druid database, not from the original source.

Updated on: 25/09/2025
