PII Anonymization with dbt™ Mesh Setup
Overview
This document demonstrates how to set up a dbt mesh architecture using Paradime where a parent repository contains PII (Personally Identifiable Information) models, and a child dbt project consumes anonymized subsets of these models.
Architecture
Parent Repo (customer-data-platform)
├── PII Models (private)
├── Anonymized Models (public via mesh)
└── Data transformations

Child Repo (analytics-workspace)
├── Consumes anonymized models from parent
├── Creates analytics models
└── Business intelligence layer
Parent Repository Setup
1. Project Structure
# dbt_project.yml (Parent)
name: 'customer_data_platform'
version: '1.0.0'
config-version: 2

model-paths: ["models"]
analysis-paths: ["analyses"]
test-paths: ["tests"]
seed-paths: ["seeds"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]

models:
  customer_data_platform:
    # Private PII models - not exposed
    staging:
      +materialized: table
      +group: private_data
    # Public anonymized models - exposed via mesh
    marts:
      anonymized:
        +materialized: table
        +group: public_analytics
        +access: public
2. Model Groups Configuration
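A minimal sketch of the group definitions, assuming the two group names referenced in the parent dbt_project.yml (private_data and public_analytics). The file name and the owner details are hypothetical placeholders; dbt requires each group to declare an owner.

```yaml
# models/_groups.yml (hypothetical file name; any .yml file under model-paths works)
groups:
  - name: private_data
    owner:
      # Placeholder owner, not taken from the original setup
      name: Data Platform Team
      email: data-platform@example.com

  - name: public_analytics
    owner:
      # Placeholder owner, not taken from the original setup
      name: Analytics Engineering
      email: analytics-engineering@example.com
```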
3. Private PII Models
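As an illustration, the properties file for a PII model might look like the sketch below. The model name, file path, and columns are hypothetical. It assumes access: protected (the dbt default) rather than private, because a private model can only be referenced from within its own group, and the anonymized marts that read from it belong to a different group. Protected models can still be referenced inside the parent project but not from other projects.

```yaml
# models/staging/_staging__models.yml (hypothetical path)
version: 2

models:
  - name: stg_customers            # hypothetical model holding raw PII
    description: "Raw customer records including PII. Not exposed via mesh."
    config:
      group: private_data
      access: protected            # usable inside this project only, never by child projects
    columns:
      - name: customer_id
        description: "Natural customer identifier."
      - name: email
        description: "PII: customer email address."
      - name: full_name
        description: "PII: customer full name."
```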
4. Public Anonymized Models (Exposed via Mesh)
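A sketch of how an anonymized model could be declared public, assuming a hypothetical model named anonymized_customers. The group and access values mirror the marts.anonymized block in the parent dbt_project.yml; setting them per model as well is optional and shown here only to make the contract explicit.

```yaml
# models/marts/anonymized/_anonymized__models.yml (hypothetical path)
version: 2

models:
  - name: anonymized_customers     # hypothetical model name
    description: "Customer attributes with direct identifiers hashed or removed."
    config:
      group: public_analytics
      access: public               # makes the model referenceable from child projects
    columns:
      - name: customer_key
        description: "Hashed surrogate for the customer identifier."
      - name: signup_month
        description: "Signup date truncated to month to reduce re-identification risk."
```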
Child Repository Setup
1. Paradime Mesh Dependencies Configuration
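The consumer project declares its producer in dbt_loom.config.yml, the file named in the Troubleshooting section. The sketch below follows the Paradime manifest layout used by dbt-loom as far as it is known here. The environment variable names are assumptions and should match whatever names are set in the workspace and Code IDE settings.

```yaml
# dbt_loom.config.yml (root of the consumer project)
manifests:
  - name: customer_data_platform          # producer project name
    type: paradime
    config:
      # Bolt schedule in the producer project whose metadata is fetched
      schedule_name: daily_production_run
      # Environment variable names below are assumptions, not confirmed names
      api_key: ${PARADIME_API_KEY}
      api_secret: ${PARADIME_API_SECRET}
      api_endpoint: ${PARADIME_API_ENDPOINT}
```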
2. Project Configuration
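A possible dbt_project.yml for the child project, mirroring the layout of the parent file. The project name analytics_workspace and the folder names are assumptions derived from the repository name in the Architecture section.

```yaml
# dbt_project.yml (Child) - hypothetical project and folder names
name: 'analytics_workspace'
version: '1.0.0'
config-version: 2

model-paths: ["models"]
analysis-paths: ["analyses"]
test-paths: ["tests"]
seed-paths: ["seeds"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]

models:
  analytics_workspace:
    # Thin layer over the anonymized models consumed from the parent
    staging:
      +materialized: view
    # Analytics and business intelligence models
    marts:
      +materialized: table
```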
3. Consuming Parent Models
4. Business Intelligence Models
Paradime Configuration
1. Producer Project Setup
Prerequisites:
dbt version 1.7 or greater in both projects
At least one successful Bolt schedule run in the producer project
Models configured with access: public
Producer project requirements:
Ensure you have a Bolt schedule running (e.g., daily_production_run). This is required for Paradime to fetch model metadata.
2. Consumer Project API Credentials Setup
Step 1: Generate API credentials in the producer project
Navigate to the producer project (customer_data_platform)
Go to Settings → API Keys
Generate API credentials with "Bolt schedules metadata viewer" capability
Note down: API Key, API Secret, and API Endpoint
Step 2: Set Workspace-level Environment Variables (for Bolt schedules)
In the consumer project workspace settings, add the API Key, API Secret, and API Endpoint from Step 1 as environment variables.
Step 3: Set User-level Environment Variables (for Code IDE)
Each developer in the consumer project must set the same environment variables in their Code IDE settings.
3. Model Referencing in Consumer Project
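Always use the two-argument form of the ref function when referencing models from the producer project. The first argument is the producer project name and the second is the model name, for example {{ ref('customer_data_platform', '<model_name>') }}.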
Security Considerations
Access Control
PII models are in the private_data group with no public access
Only anonymized models in the public_analytics group are exposed
Child projects can only access explicitly exposed models
Testing Strategy
1. Parent Project Tests
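A minimal sketch of generic tests on a public model in the parent project, assuming a hypothetical model anonymized_customers with a hashed customer_key column. If the model already has a properties entry, the tests belong in that entry rather than in a second one. The tests key is used because it works on dbt 1.7; on dbt 1.8 and later, data_tests is the preferred name.

```yaml
# models/marts/anonymized/_anonymized__tests.yml (hypothetical path)
version: 2

models:
  - name: anonymized_customers     # hypothetical model name
    columns:
      - name: customer_key
        tests:
          # The hashed key must identify exactly one row and never be empty
          - not_null
          - unique
      - name: signup_month
        tests:
          - not_null
```

Checks that raw PII columns such as email never appear in a public model usually need a custom generic or singular test, which is not shown here.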
2. Child Project Tests
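On the child side, tests can guard the models built on top of the anonymized data. The sketch below assumes a hypothetical child model customer_analytics that carries the hashed key through from the parent.

```yaml
# models/marts/_marts__tests.yml in the child project (hypothetical path)
version: 2

models:
  - name: customer_analytics       # hypothetical child model
    columns:
      - name: customer_key
        tests:
          # Catches fan-out or dropped rows introduced by joins in the child project
          - not_null
          - unique
```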
Best Practices
Regular Security Audits: Review anonymized models quarterly
Change Management: Use PR reviews for any changes to public models
Documentation: Keep anonymization logic well-documented
Testing: Implement comprehensive tests for PII detection
Monitoring: Set up alerts for mesh model failures
Version Control: Tag releases when exposing new models
Troubleshooting
Common Issues
Model not found in child: Check access configuration and group assignment
PII exposure: Review anonymization logic and add tests
Stale data: Monitor upstream model runs in parent project
Permission errors: Verify Paradime project dependency configuration
Debug Commands
Common Issues and Solutions:
"Model not found" errors
Verify dbt_loom.config.yml configuration
Check that environment variables are set correctly
Ensure the Bolt schedule has run successfully in producer project
Confirm model has access: public in producer project
API authentication errors
Verify API credentials are correctly set at both workspace and user levels
Check API key permissions include "Bolt schedules metadata viewer"
Ensure API endpoint URL is correct
Stale metadata
Producer project must have successful Bolt schedule runs
Paradime fetches metadata from the specified schedule name
If producer models change, wait for next Bolt schedule run
Model access denied
Check model access configuration in producer project
Only public models are available through mesh
Verify model is in correct group with appropriate access level
This setup ensures that sensitive PII remains secure in the parent repository while providing rich, anonymized datasets for analytics in the child projects through Paradime's dbt mesh capabilities.