Column-Level Lineage Diff

Overview

The Column-Level Lineage Diff Analysis feature in Paradime enables users to understand the blast radius of their changes directly within pull requests (PRs). By leveraging field-level lineage, this CI check identifies changes to columns in your dbt™ models and creates a report for all impacted downstream objects. This includes renaming or removing columns and changes to the underlying logic of columns in your dbt™ models.

When a PR is opened it GitHub, an automated comment is generated listing all downstream nodes. This allows users to understand the changes introduced at a column level and assess the potential impact on downstream dbt™ models. BI dashboards, and other downstream elements.

Key Features

  • Field-Level Lineage: Identify changes to columns in your dbt™ models and generate a detailed report of all impacted downstream objects.

  • Automated Comments: Receive automated comments in your PRs listing all downstream dbt™ models and BI nodes affected by the changes.

  • Impact Assessment: Understand what nodes and other elements might be impacted by the changes introduced in the PR.

Use cases

  • Assess all downstream nodes nodes impacted by changes both within a dbt project and in downstream application (example: BI)

  • For Data Mesh architectures, see how your current project's changes impact other project changes.

Tutorial

Prerequisites

To use theColumn-Level Lineage Diff Analysis features, ensure the following prerequisites are met:

  1. Git Integration: Install the Paradime GitHub app and authorize access to the dbt™ repository used in Paradime or use alternative methods based on your Git Provider. See setup instructions.

  2. Production Connection: Add a production connection with access to your sources and models generated when running production jobs. This allows Paradime to run information schema queries and build field-level lineage. See connection guide for instructions based on your data warehouse provider.

  3. Have at least one Bolt Schedule configured. This is required to generate field-level lineage for your dbt™ project. See Bolt Scheduler for configuration.

To get the most value out of Lineage Diff Analysis, connect you BI tools (Tableau, Thoughtspot, Looker, etc.) to see all downstream nodes impacted.

Setup Instructions


GitHub Integration: Install the Paradime GitHub app and authorize access to the dbt™ repository used in Paradime. See installation guide for instructions.

Troubleshooting

Unable to find public GitHub email address

If a user GitHub is not configured correctly when opening a PR the user will see the below comment in the Pull Request:

To fix this issue, make sure the user opening the Pull Request completed the GitHub in Paradime.

Lineage Diff Feature - Supported Use Cases

The lineage diff feature analyzes changes in dbt models to track structural modifications that affect downstream dependencies.

The lineage diff feature focuses on structural changes to SELECT statements that affect the schema and column availability for downstream models. It does not track logic changes, data transformations, or modifications to non-SELECT clauses.

Supported Changes

SQL Structural Changes

The lineage diff feature detects and tracks the following structural modifications:

  • Column renaming: When a column is renamed in a SELECT statement

  • Column removal: When a column is removed from a SELECT statement

  • Column addition: When a new column is added to a SELECT statement

Example - Supported Changes

-- Before
SELECT 
    customer_id,
    customer_name,
    email
FROM customers

-- After (column renamed)
SELECT 
    customer_id,
    full_name,  -- renamed from customer_name
    email
FROM customers

Non-Supported Changes

Structural Changes to Non-SELECT Statements

  • WHERE clause modifications: Changes to filtering conditions

  • JOIN modifications: Adding, removing, or changing JOIN conditions

  • GROUP BY changes: Modifications to grouping logic

  • ORDER BY changes: Changes to sorting logic

Structural Changes Used in Non-SELECT Contexts

Even if a change is structural (like renaming a column), the lineage diff feature does not track usage in:

  • WHERE clauses: Column references in filtering conditions

  • JOIN conditions: Column references in table joins

  • GROUP BY clauses: Column references in grouping logic

  • ORDER BY clauses: Column references in sorting logic

Data Changes

  • Column calculation changes: Modifications to how a column value is computed

  • NULL handling changes: Changes in NULL value treatment

  • Data type transformations: Changes that affect data representation but not structure

Example - Non-Supported Changes

-- Before
SELECT 
    customer_id,
    customer_name,
    revenue * 1.1 as adjusted_revenue
FROM customers
WHERE status = 'active'
ORDER BY customer_name

-- After (non-supported changes)
SELECT 
    customer_id,
    customer_name,
    revenue * 1.2 as adjusted_revenue  -- calculation change (not detected)
FROM customers
WHERE status IN ('active', 'pending')  -- WHERE clause change (not detected)
ORDER BY full_name  -- ORDER BY with renamed column (not detected)

Note: Even though customer_name was structurally renamed to full_name, the lineage diff feature only tracks this change in the SELECT statement itself, not its usage in the ORDER BY clause.

Last updated

Was this helpful?