Data Observability
Soda

Declarative data‑quality checks as code for CI/CD pipelines to catch bad data early.

Product details

Overview

Soda Pipeline Testing lets data teams embed data quality checks early in data pipelines and CI/CD workflows. Using SodaCL, a declarative YAML-based checks language, engineers can define rules, keep them in version control (e.g. Git), and automate scans. Data issues are then caught before they reach production, which limits business impact and builds trust in downstream systems.
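
As a rough illustration of what "checks as code" looks like, here is a minimal SodaCL sketch. The dataset name `orders`, the column names, and the file name are hypothetical placeholders, not taken from this page; the three check types shown (row count, missing values, duplicates) are common SodaCL built-ins.

```yaml
# checks.yml (hypothetical file name)
# "orders", "customer_id" and "order_id" are placeholder names.
checks for orders:
  - row_count > 0                    # dataset must not be empty
  - missing_count(customer_id) = 0   # no NULL customer IDs
  - duplicate_count(order_id) = 0    # order IDs must be unique
```

A file like this can be committed next to the pipeline code so that changes to the rules go through the same review process as any other change.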

Features and Capabilities

  • Core language (SodaCL): A human-readable, declaration-first DSL for writing and versioning data checks as code in CI/CD and pipelines.
  • Built‑in checks & metrics: Ready‑to‑use quality validations (e.g. row count, null thresholds) that scale quickly across datasets.
  • Pipeline staging support: Insert checks at multiple stages—post-ingestion, post-transformation—to validate data before it propagates downstream.
  • CI/CD integration: Hook into GitHub Actions (and similar) to run scans on pull requests, automatically annotate PRs with results, and block merges that would introduce bad data.
  • Alerts & incident routing: Integrates with Slack, Microsoft Teams, email, webhooks, ServiceNow, Jira, and PagerDuty to notify teams when checks fail.
  • Incident management: Surfaces issues early, routing them to the right stakeholders for fast resolution.
  • Integration ecosystem: Supports major orchestration tools and platforms including Airflow, Databricks, Dagster, Azure Data Factory, dbt, Snowflake, Alation, Atlan, Collibra, and BI tools like Looker and Power BI.
  • Anomaly detection: Detects unexpected data issues using dynamic thresholds and statistical profiling beyond static rules.
  • Version control & collaboration: Uses Git-managed configuration and checks files to enable collaborative development and governance of quality rules.
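
To make the idea behind the built-in checks and the CI/CD gate above more concrete, here is a small Python sketch of how threshold checks can decide whether a pipeline step passes. It is not Soda's API or implementation: `CheckResult`, `row_count`, `missing_percent`, `run_checks`, and `gate` are hypothetical names invented for this illustration, and the `email` column and 5% threshold are assumptions.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

Row = Dict[str, Any]


@dataclass
class CheckResult:
    name: str      # human-readable description of the check
    value: float   # measured metric value
    passed: bool   # whether the metric satisfied its threshold


def row_count(rows: List[Row]) -> int:
    """Metric: number of rows in the dataset."""
    return len(rows)


def missing_percent(rows: List[Row], column: str) -> float:
    """Metric: percentage of rows where `column` is absent or None."""
    if not rows:
        return 0.0
    missing = sum(1 for row in rows if row.get(column) is None)
    return 100.0 * missing / len(rows)


def run_checks(rows: List[Row]) -> List[CheckResult]:
    """Evaluate two illustrative checks: a row count and a null threshold."""
    count = row_count(rows)
    pct = missing_percent(rows, "email")  # "email" is an assumed column
    return [
        CheckResult("row_count > 0", count, count > 0),
        CheckResult("missing_percent(email) < 5", pct, pct < 5.0),
    ]


def gate(results: List[CheckResult]) -> int:
    """Return a process-style exit code: 0 if all checks pass, else 1."""
    return 0 if all(result.passed for result in results) else 1
```

In a CI job, passing the value returned by `gate(...)` to `sys.exit` would fail the job on a non-zero code, which is the general mechanism by which a failed scan can block a merge.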