Column-Level Lineage Diff


Overview

The Column-Level Lineage Diff Analysis feature in Paradime enables users to understand the blast radius of their changes directly within pull requests (PRs). By leveraging field-level lineage, this CI check identifies changes to columns in your dbt™ models and creates a report for all impacted downstream objects. This includes renaming or removing columns and changes to the underlying logic of columns in your dbt™ models.

When a PR is opened in GitHub, an automated comment is generated listing all downstream nodes. This allows users to understand the changes introduced at a column level and assess the potential impact on downstream dbt™ models, BI dashboards, and other downstream elements.

Key Features

  • Field-Level Lineage: Identify changes to columns in your dbt™ models and generate a detailed report of all impacted downstream objects.

  • Automated Comments: Receive automated comments in your PRs listing all downstream dbt™ models and BI nodes affected by the changes.

  • Impact Assessment: Understand what nodes and other elements might be impacted by the changes introduced in the PR.

Use cases

  • Assess all downstream nodes impacted by changes, both within a dbt™ project and in downstream applications (e.g., BI tools)

  • For Data Mesh architectures, see how changes in your current project impact other projects.

Tutorial

Prerequisites

To use the Column-Level Lineage Diff Analysis feature, ensure the following prerequisites are met:

  • Git Integration: Install the Paradime GitHub app and authorize access to the dbt™ repository used in Paradime, or use the alternative method for your Git provider. See the installation guide and the setup instructions for your provider.

  • Production Connection: Add a production connection with access to the sources and models generated when running production jobs. This allows Paradime to run information schema queries and build field-level lineage. See the connection guide for instructions based on your data warehouse provider.

  • Bolt Schedule: Have at least one Bolt schedule configured. This is required to generate field-level lineage for your dbt™ project. See the Bolt Scheduler documentation for configuration.

  • BI Tools (optional): To get the most value out of Lineage Diff Analysis, connect your BI tools (Tableau, ThoughtSpot, Looker, etc.) to see all downstream nodes impacted.

Setup Instructions

Setup differs by Git provider. Follow the steps below for GitLab CI/CD, Azure Pipelines, or BitBucket Pipelines.

GitLab CI/CD

1. Generate API keys

API keys are generated at the workspace level.

To call the Paradime API from your pipeline, you will first need to generate API keys for your workspace. Go to your account settings, generate your API keys, and save the following in your password manager:

  • API key

  • API secret

  • API Endpoint

You will need these later when setting up the secrets in GitLab.

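Optionally, sanity-check the keys locally before wiring them into CI. Below is a minimal sketch, assuming Python 3.11 and the same paradime-io version pinned in the pipeline; the placeholder values are your own credentials:

# Install the Paradime SDK and confirm the client accepts your credentials.
pip install paradime-io==4.5.0

export PARADIME_API_KEY="<your-api-key>"
export PARADIME_API_SECRET="<your-api-secret>"
export PARADIME_API_ENDPOINT="<your-api-endpoint>"

python3 - <<'EOF'
import os
from paradime import Paradime

# Construct the client the same way the CI job does; a missing value
# fails fast here instead of inside the pipeline.
paradime = Paradime(
    api_key=os.environ["PARADIME_API_KEY"],
    api_secret=os.environ["PARADIME_API_SECRET"],
    api_endpoint=os.environ["PARADIME_API_ENDPOINT"],
)
print("Paradime client configured.")
EOF
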
2. Create a Pipeline

Now you will need to create a new .gitlab-ci.yml file in your dbt™️ repository. Copy the code below.

stages:
  - lineage_diff_report

lineage_diff_report:
  stage: lineage_diff_report
  image: python:3.11
  tags:
    - gitlab-org
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
      when: always
  variables:
    PARADIME_API_KEY: ${PARADIME_API_KEY}
    PARADIME_API_SECRET: ${PARADIME_API_SECRET}
    PARADIME_API_ENDPOINT: ${PARADIME_API_ENDPOINT}
    GITLAB_TOKEN: ${GITLAB_TOKEN}
    GIT_DEPTH: 0
  before_script:
    - pip install paradime-io==4.5.0
  script: |
    # Echo the relevant environment variables
    echo "Head Commit Hash: $CI_COMMIT_SHA"
    echo "Base Commit Hash: $CI_MERGE_REQUEST_DIFF_BASE_SHA"
    echo "Pull Request Number: $CI_MERGE_REQUEST_IID"
    echo "Repository Full Name: $CI_PROJECT_PATH"
    echo "Pull Request Author Email: $GITLAB_USER_EMAIL"

    # Get the list of changed files in this MR
    export CHANGED_FILES=$(git diff --name-only $CI_MERGE_REQUEST_DIFF_BASE_SHA $CI_COMMIT_SHA)
    echo "Files changed in this MR:"
    echo "$CHANGED_FILES"

    # Generate a random report file name
    export REPORT_FILE="report_$(openssl rand -hex 12).json"
    echo "Report file: $REPORT_FILE"

    # Install jq
    apt-get update -y && apt-get install -y jq

    # Run the Paradime lineage diff analysis
    python3 -u <<EOF
    import os
    import json
    from pathlib import Path
    from paradime import Paradime

    paradime = Paradime(
      api_key=os.environ['PARADIME_API_KEY'],
      api_secret=os.environ['PARADIME_API_SECRET'],
      api_endpoint=os.environ['PARADIME_API_ENDPOINT']
    )

    report = paradime.lineage_diff.trigger_report_and_wait(
      base_commit_sha=os.environ['CI_MERGE_REQUEST_DIFF_BASE_SHA'],
      head_commit_sha=os.environ['CI_COMMIT_SHA'],
      changed_file_paths=os.environ['CHANGED_FILES'].split('\\n'),
      pull_request_number=int(os.environ['CI_MERGE_REQUEST_IID']),
      repository_name=os.environ['CI_PROJECT_PATH'],
      user_email=os.environ['GITLAB_USER_EMAIL']
    )

    print("Lineage Diff Report Message: ", report.message)

    if report.result_markdown:
      data = json.dumps({'body': report.result_markdown})
      Path(os.environ['REPORT_FILE']).write_text(data)
    EOF

    # Write GitLab MR comment
    if [ -f "$REPORT_FILE" ]; then
      COMMENT_BODY=$(cat "$REPORT_FILE")
      curl --request POST --header "PRIVATE-TOKEN: ${GITLAB_TOKEN}" \
        --header "Content-Type: application/json" \
        --data "$COMMENT_BODY" \
        "https://gitlab.com/api/v4/projects/${CI_PROJECT_ID}/merge_requests/${CI_MERGE_REQUEST_IID}/notes"
    else
      echo "Report file not found, skipping comment."
    fi

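With the file committed, opening a merge request against your default branch will trigger the job. For example (the branch name and commit message are illustrative):

# Commit the pipeline definition and push a branch to open an MR from.
git checkout -b add-lineage-diff-check
git add .gitlab-ci.yml
git commit -m "Add Paradime column-level lineage diff check"
git push -u origin add-lineage-diff-check
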
3. Create a GitLab Access Token

You will need to create a GitLab access token with the api scope so that comments can be posted when a Merge Request is opened.

Create a group access token (Only available on GitLab Premium Tier or Higher)

Navigate to your GitLab group, then:

  1. Go to Settings > Access Tokens in the left sidebar

  2. Enter a name for your token (e.g., "API Comment Access")

  3. Select the api scope (general API access), which includes the ability to write comments

Create a GitLab personal access token

If using a GitLab personal access token, we suggest creating a new user called "Paradime" and attaching the access token to that user.

Log in to your GitLab account, then:

  1. Go to your user profile > Preferences > Access Tokens (or navigate to https://gitlab.com/-/profile/personal_access_tokens)

  2. Enter a name for your token (e.g., "API Comment Access")

  3. Select the api scope (general API access), which includes the ability to write comments

4. Add the API key and credentials in the GitLab variables

Finally, add the API key and credentials generated in the previous step, along with the GitLab token, as CI/CD variables in GitLab. Set the corresponding values for these variable names:

  • PARADIME_API_KEY

  • PARADIME_API_SECRET

  • PARADIME_API_ENDPOINT

  • GITLAB_TOKEN

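You can add these under Settings > CI/CD > Variables in the GitLab UI, or script it against the GitLab API. A sketch, with placeholder project ID and token values:

# Create one masked CI/CD variable per credential (repeat for each name).
curl --request POST \
  --header "PRIVATE-TOKEN: <your-gitlab-token>" \
  "https://gitlab.com/api/v4/projects/<project-id>/variables" \
  --form "key=PARADIME_API_KEY" \
  --form "value=<your-api-key>" \
  --form "masked=true"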

Azure Pipelines

1. Generate API keys

API keys are generated at the workspace level.

To call the Paradime API from your pipeline, you will first need to generate API keys for your workspace. Go to your account settings, generate your API keys, and save the following in your password manager:

  • API key

  • API secret

  • API Endpoint

You will need these later when setting up the secrets in Azure Pipelines.

2. Create an Azure Pipeline

Now you will need to create a new azure-pipeline.yml file in your dbt™️ repository. Copy the code below.

trigger:
  - none

pr:
  - "*"

steps:
  - task: UsePythonVersion@0
    inputs:
      versionSpec: "3.11"
      addToPath: true
    displayName: Use Python 3.11

  - checkout: self
    fetchDepth: 0
    displayName: Clone repository

  - script: |  
      # Install paradime python sdk
      pip install paradime-io==4.5.0

      # Export Paradime API Credentials
      export PARADIME_API_ENDPOINT=$(PARADIME_API_ENDPOINT)
      export PARADIME_API_KEY=$(PARADIME_API_KEY)
      export PARADIME_API_SECRET=$(PARADIME_API_SECRET)

      # Set the relevant environment variables
      export HEAD_COMMIT_SHA=$(System.PullRequest.SourceCommitId)
      export BASE_COMMIT_SHA=$(git rev-parse origin/$(System.PullRequest.TargetBranchName))
      export PULL_REQUEST_NUMBER=$(System.PullRequest.PullRequestId)
      export REPO_NAME=$(Build.Repository.Name)
      export USER_EMAIL=$(Build.RequestedForEmail)
      
      # Get the list of changed files in this PR
      export CHANGED_FILES=$(git diff --name-only $BASE_COMMIT_SHA $HEAD_COMMIT_SHA)

      # Echo env variables
      echo "Head Commit Hash: $HEAD_COMMIT_SHA"
      echo "Base Commit Hash: $BASE_COMMIT_SHA"
      echo "Pull Request Number: $PULL_REQUEST_NUMBER"
      echo "Repository Full Name: $REPO_NAME"
      echo "Pull Request Author Email: $USER_EMAIL"
      echo "Files changed in this PR: \n $CHANGED_FILES"

      # Generate a random report file name
      export REPORT_FILE="report_$(openssl rand -hex 12).json"
      echo "Report file: $REPORT_FILE"

      python3 -u <<EOF
      import os
      import json
      from pathlib import Path
      from paradime import Paradime

      paradime = Paradime(
        api_key=os.environ['PARADIME_API_KEY'],
        api_secret=os.environ['PARADIME_API_SECRET'],
        api_endpoint=os.environ['PARADIME_API_ENDPOINT']
      )

      report = paradime.lineage_diff.trigger_report_and_wait(
        base_commit_sha=os.environ['BASE_COMMIT_SHA'],
        head_commit_sha=os.environ['HEAD_COMMIT_SHA'],
        changed_file_paths=os.environ['CHANGED_FILES'].split('\\n'),
        pull_request_number=int(os.environ['PULL_REQUEST_NUMBER']),
        repository_name=os.environ['REPO_NAME'],
        user_email=os.environ['USER_EMAIL']
      )

      print("Lineage Diff Report Message: ", report.message)

      if report.result_markdown:
        Path(os.environ['REPORT_FILE']).write_text(report.result_markdown)
      EOF
      if [ -f "$REPORT_FILE" ]; then
        COMMENT=$(cat "$REPORT_FILE")
        echo "$COMMENT"

        ADO_API=$(echo "$(System.CollectionUri)$(System.TeamProject)/_apis/git/repositories/$(Build.Repository.Name)/pullRequests/$(System.PullRequest.PullRequestId)/threads?api-version=7.1-preview.1")
        PR_COMMENT=$(jq --arg comment "$COMMENT" '.comments[0].content = $comment' <<< '{"comments": [{"parentCommentId": 0,"content": "","commentType": 1}],"status": 1}')
  
        curl --request POST "$ADO_API" \
        --header "Content-Type: application/json" \
        --header "Accept: application/json" \
        --header "Authorization: Bearer $(System.AccessToken)" \
        --data "$PR_COMMENT" \

      else
        echo "Report file not found, skipping comment."
      fi

    displayName: "Generate report and comment on PR"

3. Add the API keys and credentials in the Azure Pipeline variables

Finally, add the API key and credentials generated in the previous step as pipeline variables in Azure Pipelines. Set the corresponding values for these variable names:

  • PARADIME_API_KEY

  • PARADIME_API_SECRET

  • PARADIME_API_ENDPOINT

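You can set these in the pipeline's Variables pane in the Azure DevOps UI, or script it with the Azure DevOps CLI extension. A sketch, assuming az devops is already configured for your organization and project, and using a placeholder pipeline name:

# Requires the extension: az extension add --name azure-devops
az pipelines variable create \
  --pipeline-name "<your-pipeline-name>" \
  --name PARADIME_API_KEY \
  --value "<your-api-key>" \
  --secret true
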
4. Update Build Service User Permissions

You will need to grant the Contribute to pull requests permission so that comments can be posted when the Pull Request is opened.

Go to Project Settings in Azure DevOps

  • Navigate to "Repositories" under Repos

  • Select your repository

  • Click "Security"

  • Find "Project Collection Build Service" or "[Project Name] Build Service"

  • Set "Contribute to pull requests" to allowed


BitBucket Pipelines

1. Generate API keys

API keys are generated at the workspace level.

To call the Paradime API from your pipeline, you will first need to generate API keys for your workspace. Go to your account settings, generate your API keys, and save the following in your password manager:

  • API key

  • API secret

  • API Endpoint

You will need these later when setting up the secrets in BitBucket.

2. Create a BitBucket Pipeline

Now you will need to create a new bitbucket-pipelines.yml file in your dbt™️ repository. Copy the code block below.

image: python:3.11

definitions:
  steps:
    - step: &paradime-lineage-diff-report
        name: Generate Lineage Diff Report
        size: 2x
        max-time: 30
        script:
          - apt-get update && apt-get install -y curl jq git
          - pip install paradime-io==4.7.1
          - export PARADIME_API_KEY=$PARADIME_API_KEY
          - export PARADIME_API_SECRET=$PARADIME_API_SECRET
          - export PARADIME_API_ENDPOINT=$PARADIME_API_ENDPOINT
          - git fetch origin $BITBUCKET_PR_DESTINATION_BRANCH
          # BITBUCKET_COMMIT is the PR's source (head) commit; the destination commit is the diff base
          - export HEAD_COMMIT_SHA=$BITBUCKET_COMMIT
          - export BASE_COMMIT_SHA=$(git rev-parse $BITBUCKET_PR_DESTINATION_COMMIT)
          - export PULL_REQUEST_NUMBER=$BITBUCKET_PR_ID
          - export REPO_NAME=$BITBUCKET_REPO_FULL_NAME
          - export USER_EMAIL=$(git log -1 --pretty=format:"%ae" $BITBUCKET_COMMIT)
          - export CHANGED_FILES=$(git diff --name-only $BASE_COMMIT_SHA $HEAD_COMMIT_SHA)
          - echo "Head Commit Hash $HEAD_COMMIT_SHA"
          - echo "Base Commit Hash $BASE_COMMIT_SHA"
          - echo "Pull Request Number $PULL_REQUEST_NUMBER"
          - echo "Repository Full Name $REPO_NAME"
          - echo "Pull Request Author Email $USER_EMAIL"
          - echo "Files changed in this PR $CHANGED_FILES"
          - export REPORT_FILE="report_$(openssl rand -hex 12).json"
          - echo "Report file $REPORT_FILE"
          - |
            cat > generate_report.py << 'EOL'
            import os
            import json
            from pathlib import Path
            from paradime import Paradime

            def format_json_to_markdown(report_json):
                """
                Convert the JSON report into formatted markdown for Bitbucket comments.
                """
                formatted_output = []
                
                # 1. Title and Warning
                formatted_output.append("# Paradime Column-Level Lineage Analysis\n")
                formatted_output.append("⚠️ The changes included in this pull request may have downstream impacts:\n")
                
                # 2. Table headers and data
                formatted_output.append("| File changed | dbt nodes | dbt projects | Tableau | Looker | ThoughtSpot |")
                formatted_output.append("| :- | -: | -: | -: | -: | -: |")
                
                # Process each file's impact summary
                for file_path, file_data in report_json["changed_files"].items():
                    summary = file_data["impact_summary"]
                    row = [
                        file_path,
                        f"**{summary['dbt_nodes_count']}**",
                        f"**{summary['dbt_projects_count']}**",
                        f"**{summary['tableau_nodes_count']}**",
                        f"**{summary['looker_nodes_count']}**",
                        f"**{summary['thoughtspot_nodes_count']}**"
                    ]
                    formatted_output.append(f"| {' | '.join(row)} |")
                
                # 3. Impact Details Section
                formatted_output.append("\n## 📁 Impact by Changed Files\n")
                formatted_output.append("Expand to view the potential downstream impact created by each changed file:\n")
                
                # Process each file's detailed impacts
                for file_path, file_data in report_json["changed_files"].items():
                    # File header
                    formatted_output.append(f"\n### {file_path}\n")
                    
                    impacted_nodes = file_data["impacted_nodes"]
                    
                    # DBT impacts
                    if impacted_nodes["dbt"]["nodes"]:
                        formatted_output.append(f"**🔶 dbt nodes impacted**: \n")
                        for node in impacted_nodes["dbt"]["nodes"]:
                            formatted_output.append(f"- {node['model_name']}.*{node['column_name']}*\n")
                    
                    # Looker impacts
                    if any([
                        impacted_nodes["looker"]["views"],
                        impacted_nodes["looker"]["explores"],
                        impacted_nodes["looker"]["dashboards"]
                    ]):
                        formatted_output.append(f"**📊 Looker nodes impacted**\n")
                        
                        if impacted_nodes["looker"]["views"]:
                            formatted_output.append("**Views:**\n")
                            for view in impacted_nodes["looker"]["views"]:
                                node_text = f"{view['model_name']}.{view['view_id']}.**{view['field_id']}**"
                                if 'url' in view and view['url']:
                                    formatted_output.append(f"- [{node_text}]({view['url']})\n")
                                else:
                                    formatted_output.append(f"- {node_text}\n")
                        
                        if impacted_nodes["looker"]["explores"]:
                            formatted_output.append("**Explores:**\n")
                            for explore in impacted_nodes["looker"]["explores"]:
                                node_text = f"{explore['model_name']}.{explore['explore_id']}.**{explore['field_id']}**"
                                if 'url' in explore and explore['url']:
                                    formatted_output.append(f"- [{node_text}]({explore['url']})\n")
                                else:
                                    formatted_output.append(f"- {node_text}\n")
                        
                        if impacted_nodes["looker"]["dashboards"]:
                            formatted_output.append("**Dashboards:**\n")
                            for dashboard in impacted_nodes["looker"]["dashboards"]:
                                if 'url' in dashboard and dashboard['url']:
                                    formatted_output.append(f"- [{dashboard['name']}]({dashboard['url']})\n")
                                else:
                                    formatted_output.append(f"- {dashboard['name']}\n")

                    # Tableau impacts
                    if any([
                        impacted_nodes["tableau"]["dashboards"],
                        impacted_nodes["tableau"]["datasources"],
                        impacted_nodes["tableau"]["worksheets"]
                    ]):
                        formatted_output.append(f"**📊 Tableau nodes impacted**\n")
                        
                        if impacted_nodes["tableau"]["dashboards"]:
                            formatted_output.append("**Dashboards:**\n")
                            for dashboard in impacted_nodes["tableau"]["dashboards"]:
                                if 'url' in dashboard and dashboard['url']:
                                    formatted_output.append(f"- [{dashboard['name']}]({dashboard['url']})\n")
                                else:
                                    formatted_output.append(f"- {dashboard['name']}\n")
                        
                        if impacted_nodes["tableau"]["datasources"]:
                            formatted_output.append("**Datasources:**\n")
                            for datasource in impacted_nodes["tableau"]["datasources"]:
                                if 'url' in datasource and datasource['url']:
                                    formatted_output.append(f"- [{datasource['name']}]({datasource['url']})\n")
                                else:
                                    formatted_output.append(f"- {datasource['name']}\n")
                        
                        if impacted_nodes["tableau"]["worksheets"]:
                            formatted_output.append("**Worksheets:**\n")
                            for worksheet in impacted_nodes["tableau"]["worksheets"]:
                                if 'url' in worksheet and worksheet['url']:
                                    formatted_output.append(f"- [{worksheet['name']}]({worksheet['url']})\n")
                                else:
                                    formatted_output.append(f"- {worksheet['name']}\n")

                    # ThoughtSpot impacts
                    if any([
                        impacted_nodes["thoughtspot"]["answers"],
                        impacted_nodes["thoughtspot"]["liveboards"],
                        impacted_nodes["thoughtspot"]["visualizations"],
                        impacted_nodes["thoughtspot"]["worksheets"]
                    ]):
                        formatted_output.append(f"**📊 ThoughtSpot nodes impacted**\n")
                        
                        if impacted_nodes["thoughtspot"]["answers"]:
                            formatted_output.append("**Answers:**\n")
                            for answer in impacted_nodes["thoughtspot"]["answers"]:
                                if 'url' in answer and answer['url']:
                                    formatted_output.append(f"- [{answer['name']}]({answer['url']})\n")
                                else:
                                    formatted_output.append(f"- {answer['name']}\n")
                        
                        if impacted_nodes["thoughtspot"]["liveboards"]:
                            formatted_output.append("**Liveboards:**\n")
                            for liveboard in impacted_nodes["thoughtspot"]["liveboards"]:
                                if 'url' in liveboard and liveboard['url']:
                                    formatted_output.append(f"- [{liveboard['name']}]({liveboard['url']})\n")
                                else:
                                    formatted_output.append(f"- {liveboard['name']}\n")
                        
                        if impacted_nodes["thoughtspot"]["visualizations"]:
                            formatted_output.append("**Visualizations:**\n")
                            for viz in impacted_nodes["thoughtspot"]["visualizations"]:
                                if 'url' in viz and viz['url']:
                                    formatted_output.append(f"- [{viz['name']}]({viz['url']})\n")
                                else:
                                    formatted_output.append(f"- {viz['name']}\n")
                        
                        if impacted_nodes["thoughtspot"]["worksheets"]:
                            formatted_output.append("**Worksheets:**\n")
                            for worksheet in impacted_nodes["thoughtspot"]["worksheets"]:
                                if 'url' in worksheet and worksheet['url']:
                                    formatted_output.append(f"- [{worksheet['name']}]({worksheet['url']})\n")
                                else:
                                    formatted_output.append(f"- {worksheet['name']}\n")
                
                return '\n'.join(formatted_output)

            paradime = Paradime(
                api_key=os.environ["PARADIME_API_KEY"],
                api_secret=os.environ["PARADIME_API_SECRET"],
                api_endpoint=os.environ["PARADIME_API_ENDPOINT"]
            )

            report = paradime.lineage_diff.trigger_report_and_wait(
                base_commit_sha=os.environ["BASE_COMMIT_SHA"],
                head_commit_sha=os.environ["HEAD_COMMIT_SHA"],
                changed_file_paths=os.environ["CHANGED_FILES"].split("\n"),
                pull_request_number=int(os.environ["PULL_REQUEST_NUMBER"]),
                repository_name=os.environ["REPO_NAME"],
                user_email=os.environ["USER_EMAIL"]
            )

            print("Lineage Diff Report Message: ", report.message)

            if report.result_json:
                # Convert JSON string to Python dict and format as markdown
                report_data = json.loads(report.result_json)
                formatted_markdown = format_json_to_markdown(report_data)
                Path(os.environ["REPORT_FILE"]).write_text(formatted_markdown)
            EOL
          - python3 generate_report.py
          - |
            if [ -f "$REPORT_FILE" ]; then 
              # Create the JSON payload with properly escaped content
              FORMATTED_CONTENT=$(cat "$REPORT_FILE")
              JSON_PAYLOAD=$(jq -n \
                --arg content "$FORMATTED_CONTENT" \
                '{content:{raw:$content}}')
              
              # Send to Bitbucket API
              curl -X POST \
                "https://api.bitbucket.org/2.0/repositories/$BITBUCKET_REPO_FULL_NAME/pullrequests/$BITBUCKET_PR_ID/comments" \
                -H "Authorization: Bearer $BITBUCKET_ACCESS_TOKEN" \
                -H "Content-Type: application/json" \
                -d "$JSON_PAYLOAD"
            else 
              echo "Report file not found, skipping comment."
            fi

pipelines:
  pull-requests:
    '**':
      # Workspace/repository pipeline variables (PARADIME_API_KEY, etc.) are injected automatically.
      - step: *paradime-lineage-diff-report

3. Generate BitBucket Access Token

You will need to create a BitBucket access token with the pullrequest:write scope so that comments can be posted when a Pull Request is opened.

Go to bitbucket.org and sign in, then:

  1. Select your BitBucket repository

  2. Click "Repository Settings" in the left sidebar

  3. In Settings, scroll to the "Security" section

  4. Select "Access tokens"

  5. Click "Create repository access token"

  6. Enter the following details:

    • Name: Paradime

    • Permissions: select Pull requests: Write

4. Add the API keys and credentials in the BitBucket Pipeline variables

Finally, add the API key and credentials generated in the previous step, along with the BitBucket access token, as repository pipeline variables. Set the corresponding values for these variable names:

  • PARADIME_API_KEY

  • PARADIME_API_SECRET

  • PARADIME_API_ENDPOINT

  • BITBUCKET_ACCESS_TOKEN

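You can add these under Repository settings > Pipelines > Repository variables in the BitBucket UI, or script it against the Bitbucket API. A sketch, with placeholder workspace, repository slug, and token values:

# Create a secured repository pipeline variable (repeat for each name).
curl -X POST \
  "https://api.bitbucket.org/2.0/repositories/<workspace>/<repo-slug>/pipelines_config/variables/" \
  -H "Authorization: Bearer <admin-access-token>" \
  -H "Content-Type: application/json" \
  -d '{"key": "PARADIME_API_KEY", "value": "<your-api-key>", "secured": true}'
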
Troubleshooting

Unable to find public GitHub email address

If a user's GitHub account is not configured correctly when opening a PR, Paradime posts a comment in the Pull Request reporting that it was unable to find a public GitHub email address. To fix this, make sure the user opening the Pull Request has completed the GitHub user-level OAuth setup in Paradime.
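
To check what email address GitHub exposes publicly for a user, you can query the public GitHub API (the username is a placeholder):

# Prints the user's public email, or null if none is set on the profile.
curl -s https://api.github.com/users/<your-github-username> | jq .email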