Metadata

Overview

The Paradime Python SDK Metadata module helps you query and analyze dbt™ metadata from Bolt runs.

Use it to monitor model health, analyze dbt test results, track source freshness, and explore upstream and downstream lineage. The SDK loads metadata into a local DuckDB database. The helper methods return typed objects such as ModelHealth and TestResult, and query_sql() returns a polars DataFrame.

  • Query dbt artifacts like manifest.json, run_results.json, and source freshness results

  • Build reliability checks around failing models, failing tests, and stale sources

  • Run ad-hoc SQL with query_sql() against the DuckDB metadata schema

The Metadata SDK is available in Python SDK version 4.18.0+ and requires a Paradime account with access to dbt metadata.

Quick Start

Basic Setup

from paradime.client.paradime_client import Paradime

client = Paradime(
    api_endpoint="your-api-endpoint",
    api_key="your-api-key",
    api_secret="your-api-secret"
)

metadata = client.metadata

Simple Health Check

Model Health Monitoring

Get Model Health Status

Monitor the health of all models in a dbt run, including execution status, test results, and performance metrics.
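As a minimal sketch of such a check, the helper below relies only on the method names and signatures listed in the API Reference (get_model_health, get_failing_models, get_slowest_models). The function name summarize_health and the layout of the returned dict are illustrative assumptions. The code deliberately avoids reading fields on ModelHealth, since they are not documented in this section.

```python
from typing import Any, Dict


def summarize_health(metadata: Any, schedule_name: str, slowest: int = 5) -> Dict[str, Any]:
    """Collect headline health numbers for one Bolt schedule.

    `metadata` is expected to be `client.metadata`. The function name and the
    returned dict layout are illustrative and not part of the SDK.
    """
    models = metadata.get_model_health(schedule_name)
    failing = metadata.get_failing_models(schedule_name)
    slow = metadata.get_slowest_models(schedule_name, limit=slowest)
    return {
        "schedule": schedule_name,
        "total_models": len(models),
        "failing_models": len(failing),
        "slowest_models": slow,  # list of ModelHealth objects, returned as-is
    }
```

With the client from Basic Setup, a call might look like summarize_health(client.metadata, "your-schedule-name"), where the schedule name is a placeholder.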

Filter by Health Status

Test Results
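A small sketch of reading test results, assuming only the get_test_results(schedule_name, failed_only=False) signature from the API Reference. The helper name count_test_results is hypothetical, and it counts results rather than reading TestResult fields, which this section does not document.

```python
from typing import Any, Tuple


def count_test_results(metadata: Any, schedule_name: str) -> Tuple[int, int]:
    """Return (total_tests, failed_tests) for a schedule.

    Calls get_test_results twice: once for every result and once with
    failed_only=True. The helper name is illustrative, not part of the SDK.
    """
    all_results = metadata.get_test_results(schedule_name)
    failed = metadata.get_test_results(schedule_name, failed_only=True)
    return len(all_results), len(failed)
```

A CI step could call this helper and exit non-zero when the second value is greater than zero.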

Source Freshness
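One way to list stale sources is to query the dbt_source_freshness_results table described under Database Schema. The sketch below assumes the column names from that table and the query_sql(sql, schedule_name) signature. The constant and function names are hypothetical.

```python
from typing import Any, List

# Columns and status values are taken from the dbt_source_freshness_results
# table documented in the Database Schema section.
STALE_SOURCES_SQL = """
SELECT source_name, table_name, freshness_status, hours_since_load
FROM dbt_source_freshness_results
WHERE freshness_status IN ('warn', 'error')
ORDER BY hours_since_load DESC
"""


def stale_sources(metadata: Any, schedule_name: str) -> List[str]:
    """Return 'source.table' labels for sources in warn or error state."""
    df = metadata.query_sql(STALE_SOURCES_SQL, schedule_name)
    # iter_rows(named=True) yields one dict per row on a polars DataFrame.
    return [f"{row['source_name']}.{row['table_name']}" for row in df.iter_rows(named=True)]
```

get_source_freshness(schedule_name) returns the same information as SourceFreshness objects. The SQL route is used here only because the table columns are documented.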

Custom SQL Queries

The metadata SDK exposes a DuckDB database you can query directly using SQL. Results are returned as polars DataFrames.

query_sql returns a polars.DataFrame. Use .iter_rows(named=True) to iterate with dict-style row access.
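The row iteration described above can be sketched as follows. The query uses columns from the dbt_run_results table in the Database Schema section. The constant name, the function name, and the grouping logic are illustrative assumptions.

```python
from collections import defaultdict
from typing import Any, Dict, List

# status values 'error' and 'fail' come from the dbt_run_results schema.
FAILED_RESOURCES_SQL = """
SELECT name, resource_type, status, error_message
FROM dbt_run_results
WHERE status IN ('error', 'fail')
ORDER BY resource_type, name
"""


def failed_by_type(metadata: Any, schedule_name: str) -> Dict[str, List[str]]:
    """Group failed resource names by resource_type (model, test, ...)."""
    df = metadata.query_sql(FAILED_RESOURCES_SQL, schedule_name)
    grouped: Dict[str, List[str]] = defaultdict(list)
    # named=True makes each row a dict keyed by column name.
    for row in df.iter_rows(named=True):
        grouped[row["resource_type"]].append(row["name"])
    return dict(grouped)
```

Because the result is a regular polars DataFrame, the same grouping could also be done with polars expressions. Plain iteration is used here to show the dict-style access.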

Dependency Analysis
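A hedged sketch of a combined lineage lookup, assuming only the get_upstream_health and get_downstream_impact signatures from the API Reference. The helper name dependency_snapshot and the returned dict are hypothetical. The DependencyImpact object is passed through untouched because its fields are not documented in this section.

```python
from typing import Any, Dict


def dependency_snapshot(
    metadata: Any, model_name: str, schedule_name: str, max_depth: int = 3
) -> Dict[str, Any]:
    """Fetch upstream dependencies and downstream impact for one model."""
    upstream = metadata.get_upstream_health(model_name, schedule_name, max_depth=max_depth)
    impact = metadata.get_downstream_impact(model_name, schedule_name, max_depth=max_depth)
    return {
        "model": model_name,
        "upstream_count": len(upstream),  # upstream is a List[ModelDependency]
        "downstream_impact": impact,      # DependencyImpact, returned as-is
    }
```

A smaller max_depth than the documented default of 10 is used here on the assumption that it keeps the lookup cheap on wide DAGs.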

Upstream Dependencies

Downstream Impact

Health Dashboard

Advanced Features

Performance Summary

Streaming Large Datasets

For large metadata datasets, stream results in batches to avoid memory pressure.
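Batch streaming might look like the sketch below. It assumes the get_model_health_stream(schedule_name, batch_size=100) signature from the API Reference, which yields lists of ModelHealth objects. The helper name count_models_streaming is hypothetical.

```python
from typing import Any, Tuple


def count_models_streaming(
    metadata: Any, schedule_name: str, batch_size: int = 100
) -> Tuple[int, int]:
    """Walk the model health stream and return (total_models, batches_seen).

    Only one batch is held in memory at a time, so peak memory is bounded
    by batch_size rather than by the size of the project.
    """
    total = 0
    batches = 0
    for batch in metadata.get_model_health_stream(schedule_name, batch_size=batch_size):
        batches += 1
        total += len(batch)
    return total, batches
```

In a real job, the loop body would process or export each batch instead of only counting it.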

Cache Management
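The cache-related methods in the API Reference (clear_cache, refresh_metadata, close) can be combined roughly as follows. This is a sketch under assumptions: the helper names are hypothetical, and whether clear_cache must be called before refresh_metadata is not stated in this document, so the order below is a cautious guess.

```python
from contextlib import closing
from typing import Any


def refresh_and_count(metadata: Any, schedule_name: str) -> int:
    """Drop cached data, reload from Bolt, then count the models."""
    metadata.clear_cache(schedule_name)       # clear the in-memory cache for this schedule
    metadata.refresh_metadata(schedule_name)  # force a refresh from Bolt
    return len(metadata.get_model_health(schedule_name))


def run_once(metadata: Any, schedule_name: str) -> int:
    """Run a one-off refresh and always close the DuckDB connection."""
    # contextlib.closing calls metadata.close() on exit, even on error.
    with closing(metadata):
        return refresh_and_count(metadata, schedule_name)
```

run_once suits short-lived scripts. A long-running service would more likely keep the connection open and call refresh_and_count on a timer.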

Configuration

Custom Database Connection

By default the SDK uses an in-memory DuckDB instance. Use a file path for persistent storage across sessions.

Complete Example

Database Schema

The Metadata SDK loads dbt artifacts into a local DuckDB database. You can query any of these tables directly via query_sql.

Core Tables

dbt_run_results

Main table containing execution results for all dbt resources (models, tests, seeds, snapshots).

| Column | Type | Description |
| --- | --- | --- |
| unique_id | VARCHAR | Unique identifier for the dbt resource |
| name | VARCHAR | Resource name |
| resource_type | VARCHAR | model, test, seed, snapshot |
| status | VARCHAR | success, error, fail, skipped |
| execution_time | DOUBLE | Execution time in seconds |
| executed_at | TIMESTAMP | When the resource was executed |
| schedule_name | VARCHAR | Schedule identifier |
| depends_on | VARCHAR[] | Array of upstream dependency IDs |
| error_message | TEXT | Error details if execution failed |
| schema_name | VARCHAR | Database schema name |
| database_name | VARCHAR | Database name |
| materialized_type | VARCHAR | Model materialization type |
| description | TEXT | Resource description from dbt |
| tags | VARCHAR[] | Array of dbt tags |
| meta | JSON | dbt meta configuration |
| owner | VARCHAR | Resource owner |
| compiled_sql | TEXT | Compiled SQL |
| raw_sql | TEXT | Raw SQL from dbt files |
| columns | JSON | Column information |
| children_l1 | VARCHAR[] | Direct child dependencies |
| parents_models | VARCHAR[] | Parent model dependencies |
| parents_sources | VARCHAR[] | Parent source dependencies |

dbt_source_freshness_results

| Column | Type | Description |
| --- | --- | --- |
| unique_id | VARCHAR | Unique source identifier |
| source_name | VARCHAR | dbt source name |
| table_name | VARCHAR | Source table name |
| schedule_name | VARCHAR | Schedule identifier |
| freshness_status | VARCHAR | pass, warn, error |
| max_loaded_at | TIMESTAMP | Last data load timestamp |
| snapshotted_at | TIMESTAMP | When freshness was checked |
| hours_since_load | DOUBLE | Hours since last data load |
| error_after_hours | INTEGER | Error threshold in hours |
| warn_after_hours | INTEGER | Warning threshold in hours |
| database | VARCHAR | Source database |
| schema_name | VARCHAR | Source schema |
| description | TEXT | Source description |
| meta | JSON | Source metadata |
| tags | VARCHAR[] | Source tags |

model_metadata

Extended model metadata and lineage information from manifest.json.

| Column | Type | Description |
| --- | --- | --- |
| unique_id | VARCHAR | Model unique identifier |
| name | VARCHAR | Model name |
| resource_type | VARCHAR | Always model |
| depends_on | VARCHAR[] | Direct dependencies |
| config | JSON | dbt model configuration |
| tags | VARCHAR[] | Model tags |
| meta | JSON | Model metadata |
| description | TEXT | Model description |
| columns | JSON | Model column definitions |
| parents | VARCHAR[] | All parent resources |
| children | VARCHAR[] | All child resources |
| original_file_path | VARCHAR | Path to dbt model file |

Specialized Tables

| Table | Description |
| --- | --- |
| dbt_test_data | Detailed test execution results with column-level info |
| dbt_seed_data | Seed file load results |
| dbt_snapshot_data | Snapshot execution results |
| dbt_exposure_data | dbt exposure metadata |

Optimized Views

Pre-calculated model health with test counts.

Example Queries
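The queries below are illustrative sketches written against the tables and columns documented in this section. They are not taken from the SDK itself. The EXAMPLE_QUERIES dict and the run_example helper are hypothetical names, and the SQL sticks to standard constructs that DuckDB supports.

```python
from typing import Any, Dict

EXAMPLE_QUERIES: Dict[str, str] = {
    # Ten slowest models by execution time (seconds).
    "slowest_models": """
        SELECT name, execution_time
        FROM dbt_run_results
        WHERE resource_type = 'model'
        ORDER BY execution_time DESC
        LIMIT 10
    """,
    # Count of resources per type and status.
    "status_counts": """
        SELECT resource_type, status, COUNT(*) AS resource_count
        FROM dbt_run_results
        GROUP BY resource_type, status
        ORDER BY resource_type, status
    """,
    # Models that have no description in manifest.json.
    "undocumented_models": """
        SELECT name, original_file_path
        FROM model_metadata
        WHERE description IS NULL OR description = ''
        ORDER BY name
    """,
}


def run_example(metadata: Any, schedule_name: str, query_name: str) -> Any:
    """Run one named query via query_sql and return the polars DataFrame."""
    return metadata.query_sql(EXAMPLE_QUERIES[query_name], schedule_name)
```

For instance, run_example(client.metadata, "your-schedule-name", "status_counts") would return one row per resource type and status, with the schedule name as a placeholder.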

API Reference

MetadataClient

| Method | Description | Returns |
| --- | --- | --- |
| get_model_health(schedule_name) | Health status of all models | List[ModelHealth] |
| get_failing_models(schedule_name) | Models with failed tests | List[ModelHealth] |
| get_slowest_models(schedule_name, limit=10) | Slowest running models | List[ModelHealth] |
| get_test_results(schedule_name, failed_only=False) | Test results | List[TestResult] |
| get_source_freshness(schedule_name) | Source freshness status | List[SourceFreshness] |
| get_health_dashboard(schedule_name) | Aggregated health metrics | HealthDashboard |
| get_performance_summary(schedule_name, days=7) | Performance metrics | PerformanceMetrics |
| get_upstream_health(model_name, schedule_name, max_depth=10) | Upstream dependencies | List[ModelDependency] |
| get_downstream_impact(model_name, schedule_name, max_depth=10) | Downstream impact | DependencyImpact |
| get_model_health_stream(schedule_name, batch_size=100) | Stream models in batches | Iterator[List[ModelHealth]] |
| query_sql(sql, schedule_name, parameters=None) | Custom SQL query | polars.DataFrame |
| refresh_metadata(schedule_name) | Force refresh from Bolt | None |
| clear_cache(schedule_name=None) | Clear in-memory cache | None |
| close() | Close the DuckDB connection | None |

Data Models

Troubleshooting

Common Issues

Error Handling
