Lineage Diff

Overview

The Lineage Diff Analysis feature in Paradime shows the blast radius of your changes directly within pull requests (PRs). Using field-level lineage, this CI check detects column-level changes in your dbt™ models and reports every downstream object they affect. Detected changes include renamed or removed columns as well as changes to the underlying logic of a column.

When a PR is opened in GitHub, an automated comment is generated that lists all downstream nodes. This lets users understand the changes introduced at the column level and assess their potential impact on downstream dbt™ models, BI dashboards, and other downstream elements.
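To make the idea of a column-level blast radius concrete, here is a minimal sketch of how downstream impact can be computed from field-level lineage: starting from the changed columns, walk the lineage graph breadth-first and collect every column (and therefore every node) that depends on them. This is an illustration of the general technique only, not Paradime's implementation. The `LINEAGE` graph, the `"node.column"` naming scheme, and the helper names `downstream_impact` and `impacted_nodes` are all hypothetical.

```python
from collections import deque

# Hypothetical column-level lineage: each key is a "node.column" and each
# value lists the downstream "node.column" entries derived from it.
LINEAGE: dict[str, list[str]] = {
    "stg_orders.amount": ["fct_orders.amount"],
    "fct_orders.amount": ["fct_revenue.total_revenue", "dashboard_sales.revenue_chart"],
    "fct_revenue.total_revenue": ["dashboard_finance.kpi_tile"],
    "stg_orders.status": ["fct_orders.status"],
}


def downstream_impact(changed_columns: list[str], lineage: dict[str, list[str]]) -> set[str]:
    """Return every column reachable downstream of the changed columns (breadth-first)."""
    impacted: set[str] = set()
    queue = deque(changed_columns)
    while queue:
        column = queue.popleft()
        for child in lineage.get(column, []):
            # Visit each downstream column once, even if several paths lead to it.
            if child not in impacted:
                impacted.add(child)
                queue.append(child)
    return impacted


def impacted_nodes(impacted_columns: set[str]) -> list[str]:
    """Collapse "node.column" entries to the sorted list of affected node names."""
    return sorted({col.split(".", 1)[0] for col in impacted_columns})


if __name__ == "__main__":
    # Suppose a PR renames the hypothetical column stg_orders.amount.
    impacted = downstream_impact(["stg_orders.amount"], LINEAGE)
    print(impacted_nodes(impacted))
    # prints ['dashboard_finance', 'dashboard_sales', 'fct_orders', 'fct_revenue']
```

Because the traversal follows column edges rather than model-level dependencies, a change to `stg_orders.status` in this toy graph would flag only `fct_orders`, while a change to `stg_orders.amount` reaches both the dbt™ models and the two hypothetical BI dashboards.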

Key Features

  • Field-Level Lineage: Identify changes to columns in your dbt™ models and generate a detailed report of all impacted downstream objects.

  • Automated Comments: Receive automated comments in your PRs listing all downstream dbt™ models and BI nodes affected by the changes.

  • Impact Assessment: Understand which nodes and other elements might be impacted by the changes introduced in the PR.

Setup Instructions

Prerequisites

To use the Lineage Diff Analysis feature, ensure the following prerequisites are met:

  1. GitHub Integration: Install the Paradime GitHub app and authorize access to the dbt™ repository used in Paradime. See the installation guide for instructions.

  2. Production Connection: Add a production connection with access to your sources and to the models built by your production jobs. This allows Paradime to run information schema queries and build field-level lineage. See the connection guide for instructions specific to your data warehouse provider.

  3. Bolt Schedule: Configure at least one Bolt Schedule. This is required to generate field-level lineage for your dbt™ project. See Bolt Scheduler for configuration instructions.

To get the most value out of Lineage Diff Analysis, connect your BI tools (Tableau, ThoughtSpot, Looker, etc.) so that all impacted downstream nodes are visible.

Use cases

  • Assess all downstream nodes impacted by changes, both within a dbt™ project and in downstream applications (for example, BI tools).

  • For Data Mesh architectures, see how changes in your current project impact other projects.

Troubleshooting

Unable to find public GitHub email address

If a user's GitHub account is not configured correctly when they open a PR, the user will see the following comment in the pull request:
