Python Data Pipeline using dltHub - Google Sheets to Snowflake
This guide will walk you through setting up automated data pipelines that extract data from Google Sheets and load it into Snowflake using Paradime and dltHub (a Python data loading tool).
By the end, you'll be able to:
Set up a Python pipeline project alongside your dbt project
Extract data from Google Sheets automatically
Load that data into Snowflake
Run pipelines both in development and production
Prerequisites
Before starting, make sure you have:
Paradime account with access to a workspace
Google Cloud Platform (GCP) account for Google Sheets API access
Snowflake credentials with permissions to create schemas and tables
Google spreadsheet with some data
Part 1: Understanding the Project Structure
Your project will look like this:
```
your-project/
├── dbt_project.yml        # Your existing dbt project configuration
├── pyproject.toml         # Python dependency management (we'll create this)
├── python_dlt/            # Folder for your Python pipelines
│   ├── gsheet_pipeline.py # Your pipeline script (we'll create this)
│   └── ...                # dltHub scaffolding (auto-generated)
└── models/                # Your existing dbt models
```
Part 2: Set Up Google Sheets API Access
Why do we need this?
To read data from Google Sheets programmatically, you need API credentials from Google Cloud Platform.
Steps:
Go to Google Cloud Console
Visit console.cloud.google.com
Sign in with your Google account
Create a Service Account (if you don't have one)
In the left menu, go to IAM & Admin → Service Accounts
Click + CREATE SERVICE ACCOUNT
Give it a name like "paradime-sheets-reader"
Click CREATE AND CONTINUE
Skip the optional steps and click DONE
Enable Google Sheets API
In the search bar at the top, type "Google Sheets API"
Click on it and press ENABLE
Create API Credentials
Go back to IAM & Admin → Service Accounts
Find your service account and click the three dots (⋮) under Actions
Select Manage Keys
Click ADD KEY → Create new key
Choose JSON format
Click CREATE - a JSON file will download automatically
Share Your Google Sheet
Open the JSON file you just downloaded
Find the client_email field (an address ending in .iam.gserviceaccount.com) and copy this email address
Go to your Google Sheet and click Share
Paste the service account email and give it Viewer access
Extract Credentials from JSON
Open the downloaded JSON file. You'll need four values from it: project_id, private_key, client_email, and token_uri (these feed the environment variables in Part 4).
Encode the Private Key
The private key needs to be encoded in Base64 format.
On Mac/Linux:
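For example, assuming you have saved the private_key value from the JSON to a file named private_key.pem (with the \n escape sequences replaced by real line breaks):

```bash
# macOS
base64 -i private_key.pem

# Linux (disable line wrapping so the output is a single line)
base64 -w 0 private_key.pem
```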
On Windows (PowerShell):
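Assuming the same private_key.pem file:

```powershell
# Read the key file as bytes and emit a single Base64 string
[Convert]::ToBase64String([IO.File]::ReadAllBytes("private_key.pem"))
```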
Save this encoded value; you'll use it as B64_BIGQUERY_PRIVATE_KEY.
Part 3: Prepare Snowflake Credentials
Why RSA Key Authentication?
For security, we use key-pair authentication instead of passwords for automated pipelines.
Steps:
Generate RSA Key Pair (if you don't have one)
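The OpenSSL commands commonly used for Snowflake key-pair authentication are shown below. The file names rsa_key.p8 and rsa_key.pub match those used in the following steps, and the passphrase prompt is what the optional passphrase mentioned later refers to:

```bash
# Generate an encrypted private key in PKCS#8 format (you'll be prompted for a passphrase)
openssl genrsa 2048 | openssl pkcs8 -topk8 -v2 des3 -inform PEM -out rsa_key.p8

# Derive the matching public key
openssl rsa -in rsa_key.p8 -pubout -out rsa_key.pub
```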
Register Public Key in Snowflake
To get the key value, open rsa_key.pub and copy everything between the header/footer lines.
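In a Snowflake worksheet, run something like the following; YOUR_USER and the key value shown are placeholders:

```sql
-- Attach the public key to the user that will run the pipeline
ALTER USER YOUR_USER SET RSA_PUBLIC_KEY = 'MIIBIjANBgkq...';
```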
Format Private Key for Paradime
Open rsa_key.p8 in a text editor, remove the BEGIN/END header and footer lines, and join the remaining lines into a single line with no line breaks, as illustrated below.
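For illustration only (the key material here is a placeholder):

```
Original (rsa_key.p8):
-----BEGIN ENCRYPTED PRIVATE KEY-----
MIIFHDBOBgkq...
...more lines...
-----END ENCRYPTED PRIVATE KEY-----

Formatted (single line, no header/footer):
MIIFHDBOBgkq......more lines joined......
```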
Gather Snowflake Connection Details
You'll need:
Account identifier: Found in your Snowflake URL (e.g., xy12345.us-east-1)
Database name: The database where data will be loaded
Warehouse name: The compute warehouse to use
Role: Your Snowflake role (e.g., ACCOUNTADMIN, SYSADMIN)
Username: Your Snowflake username
Passphrase (optional): The passphrase set when generating the private key, if any
Part 4: Configure Environment Variables in Paradime
Environment variables are secure ways to store credentials without hardcoding them in your scripts.
Where to Set Them:
You need to set these in TWO places in Paradime:
Code IDE (for development)
Click Settings → Environment Variables
Scroll to "Code IDE"
Bolt Scheduler (for production runs)
Click Settings → Environment Variables
Scroll to "Bolt"
Variables to Set:
Google Sheets Credentials

| Variable | Description | Example |
| --- | --- | --- |
| SOURCES__GOOGLE_SHEETS__CREDENTIALS__PROJECT_ID | From JSON: project_id | my-project-12345 |
| SOURCES__GOOGLE_SHEETS__CREDENTIALS__TOKEN_URI | From JSON: token_uri | https://oauth2.googleapis.com/token |
| B64_BIGQUERY_PRIVATE_KEY | Base64-encoded private key from Step 2.7 | LS0tLS1CRUdJTi... |

Snowflake Credentials

| Variable | Description | Example |
| --- | --- | --- |
| DESTINATION__SNOWFLAKE__CREDENTIALS__HOST | Snowflake account identifier | xy12345.us-east-1 |
| DESTINATION__SNOWFLAKE__CREDENTIALS__DATABASE | Target database | ANALYTICS |
| DESTINATION__SNOWFLAKE__CREDENTIALS__WAREHOUSE | Warehouse name | COMPUTE_WH |
| DESTINATION__SNOWFLAKE__CREDENTIALS__ROLE | Your role | TRANSFORMER |
| DESTINATION__SNOWFLAKE__CREDENTIALS__PRIVATE_KEY | Formatted private key (from Step 3.3) | MIIFHDBOBgkq... |
| DESTINATION__SNOWFLAKE__CREDENTIALS__PRIVATE_KEY_PASSPHRASE | Passphrase you created | MySecurePass123! |

Development Variable

| Variable | Description | Example |
| --- | --- | --- |
| DEV_SCHEMA_PREFIX | Your initials or identifier | JD |
Part 5: Initialize Your Python Project
Now we'll set up the project structure and dependencies.
Step 5.1: Create pyproject.toml
In Paradime's Code IDE, at the root level (same folder as dbt_project.yml), create a new file called pyproject.toml:
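A minimal sketch of what this file can contain, assuming Poetry; the project name, author placeholder, and unpinned dependency versions are illustrative, so adjust them to your setup:

```toml
[tool.poetry]
name = "python-dlt-pipelines"
version = "0.1.0"
description = "dltHub pipelines that run alongside the dbt project"
authors = ["Your Name <you@example.com>"]

[tool.poetry.dependencies]
python = ">=3.11,<3.13"
dlt = { version = "*", extras = ["snowflake"] }
google-api-python-client = "*"

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"
```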
What this does:
Poetry is a Python dependency manager (like npm for JavaScript)
We're specifying we need Python 3.11 or 3.12
We're installing dlt with Snowflake support and the Google API client
Step 5.2: Install Dependencies
Open the terminal in Paradime's Code IDE and run:
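Run this from the project root (the folder containing pyproject.toml); the standard Poetry command is:

```bash
poetry install
```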
This will:
Create a virtual environment
Install dltHub and all required packages
Generate a poetry.lock file (commit this to git!)
Step 5.3: Initialize dltHub
Create the python_dlt folder and initialize the Google Sheets pipeline:
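Assuming the layout from Part 1, the standard dltHub initialization for this source/destination pair looks like this:

```bash
mkdir -p python_dlt
cd python_dlt
poetry run dlt init google_sheets snowflake
```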
What this command does:
Creates a .dlt/ folder with configuration files
Downloads the Google Sheets source code
Creates an example pipeline script
Adds a .gitignore file
Note: We're using environment variables instead of .dlt/secrets.toml for security, so you can ignore that file.
Part 6: Create Your Pipeline Script
Replace the example google_sheets_pipeline.py or create a new file called gsheet_pipeline.py:
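Here is a sketch consistent with the breakdown below; the google_spreadsheet function and its spreadsheet_url_or_id / range_names / get_named_ranges parameters come from the dlt verified source downloaded in Step 5.3 (check the generated code for the exact names in your version), and the environment-variable handling matches Parts 4 and 7:

```python
import base64
import os

import dlt

# `google_spreadsheet` is provided by the verified source that `dlt init`
# downloaded into python_dlt/google_sheets/ in Step 5.3.
from google_sheets import google_spreadsheet


def setup_credentials() -> None:
    """Decode the Base64 private key and expose it where dltHub looks for it."""
    encoded_key = os.environ["B64_BIGQUERY_PRIVATE_KEY"]
    decoded_key = base64.b64decode(encoded_key).decode("utf-8")
    # dlt reads source credentials from SOURCES__GOOGLE_SHEETS__CREDENTIALS__*
    # variables, so hand it the decoded key alongside the values set in Part 4.
    os.environ["SOURCES__GOOGLE_SHEETS__CREDENTIALS__PRIVATE_KEY"] = decoded_key


def get_dataset_name() -> str:
    """Return the target schema: prefixed in development, clean in production."""
    if os.environ.get("PARADIME_SCHEDULE_RUN_ID"):  # set automatically by Bolt runs
        return "GOOGLE_SHEETS_LOAD"
    prefix = os.environ.get("DEV_SCHEMA_PREFIX", "DEV")
    return f"{prefix}_GOOGLE_SHEETS_LOAD"


def load_pipeline(spreadsheet_url_or_id: str, range_names: list[str]) -> None:
    """Extract the given sheets/ranges and load them into Snowflake."""
    setup_credentials()
    pipeline = dlt.pipeline(
        pipeline_name="google_sheets_pipeline",
        destination="snowflake",
        dataset_name=get_dataset_name(),
    )
    source = google_spreadsheet(
        spreadsheet_url_or_id=spreadsheet_url_or_id,
        range_names=range_names,
        get_named_ranges=False,
    )
    load_info = pipeline.run(source)
    print(load_info)


if __name__ == "__main__":
    # Configuration: point this at YOUR Google Sheet and the sheets/ranges to extract.
    load_pipeline(
        spreadsheet_url_or_id="https://docs.google.com/spreadsheets/d/<YOUR_SHEET_ID>",
        range_names=["Sheet1"],
    )
```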
Understanding the Code
Function Breakdown:
setup_credentials(): Handles the Google API authentication
- Retrieves the encoded private key
- Decodes it from Base64
- Sets it in the format dltHub expects

get_dataset_name(): Smart schema naming
- In development: creates {YOUR_INITIALS}_GOOGLE_SHEETS_LOAD
- In production: creates GOOGLE_SHEETS_LOAD
- Prevents dev data from mixing with prod data

load_pipeline(): The main extraction and loading logic
- Sets up credentials
- Creates a dltHub pipeline object
- Configures the Google Sheets source
- Runs the extraction and loading

if __name__ == "__main__": Configuration section
- This is where YOU specify which Google Sheet to load
- And which specific sheets/ranges to extract
Part 7: Run Your First Pipeline
Development Run (Testing)
Open Paradime's Code IDE terminal
Navigate to the pipeline folder:
Run the pipeline using Poetry:
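Assuming the layout from Part 1 and the script name gsheet_pipeline.py, the two steps above look like this:

```bash
cd python_dlt
poetry run python gsheet_pipeline.py
```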
What happens:
The script executes in the Poetry virtual environment
Data is extracted from your Google Sheet
Tables are created in Snowflake under the {YOUR_PREFIX}_GOOGLE_SHEETS_LOAD schema
You'll see progress logs in the terminal
Expected Output: dlt progress logs in the terminal, followed by a load summary showing the destination and the target dataset name.
Production Run (Scheduled)
In Paradime, go to Bolt → Schedules
Create a new schedule named "Load Google Sheets Data"
Add the commands to run (see the example after this list)
Schedule: Choose frequency (e.g., "Daily at 6 AM")
Save and enable the schedule
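As an example of the schedule commands, assuming the same project layout and that the Bolt runner needs to install dependencies first:

```bash
poetry install
cd python_dlt && poetry run python gsheet_pipeline.py
```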

What happens:
Paradime automatically sets PARADIME_SCHEDULE_RUN_ID
Data loads into the GOOGLE_SHEETS_LOAD schema (without your prefix)
Your dbt models can now reference this data
The pipeline runs automatically on your chosen schedule
Part 8: Using the Loaded Data in dbt
Now that data is in Snowflake, you can create dbt models to transform it:
Define the source in sources.yml:
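A sketch of the source definition; the source name, database, and table entries are assumptions and should match whichever database and sheets your pipeline loads:

```yaml
version: 2

sources:
  - name: google_sheets_load
    database: ANALYTICS           # the database from Part 3
    schema: GOOGLE_SHEETS_LOAD    # production schema created by the pipeline
    tables:
      - name: sheet1              # one entry per loaded sheet or range
```

Downstream models can then reference the data with {{ source('google_sheets_load', 'sheet1') }}.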
Troubleshooting Common Issues
Best Practices
1. Schema Organization
Development:
Use personal prefixes (JD_, SARAH_, etc.)
Prevents conflicts when multiple developers test
Easy to identify and clean up dev data
Production:
Use clean, standard names (GOOGLE_SHEETS_LOAD)
Apply proper access controls
Document schema purposes
2. Pipeline Configuration
Keep it flexible:
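For example, one way to avoid hardcoding the spreadsheet in the script (the GSHEET_URL variable name is hypothetical):

```python
import os

# Read the spreadsheet URL from an environment variable, with a placeholder fallback
SPREADSHEET_URL = os.environ.get(
    "GSHEET_URL",
    "https://docs.google.com/spreadsheets/d/<YOUR_SHEET_ID>",
)
```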
Version control your changes:
Commit pyproject.toml and poetry.lock
Track pipeline scripts in git
Document any configuration changes
3. Error Handling
Add try-except blocks for robustness:
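A minimal sketch that wraps the load_pipeline() call from the script above so failures are visible in the logs and the job exits non-zero (so Bolt marks the run as failed):

```python
import sys

try:
    load_pipeline(
        spreadsheet_url_or_id="https://docs.google.com/spreadsheets/d/<YOUR_SHEET_ID>",
        range_names=["Sheet1"],
    )
except Exception as exc:
    # Surface the error in the run logs and fail the job
    print(f"Pipeline failed: {exc}", file=sys.stderr)
    sys.exit(1)
```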
4. Monitoring
Things to track:
Pipeline run duration
Number of rows loaded
Data freshness in Snowflake
Failed run alerts
Example logging:
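A sketch using Python's standard logging module to record run duration and the dlt load summary; the pipeline and source arguments are the objects created in the script above:

```python
import logging
import time

import dlt

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
logger = logging.getLogger("gsheet_pipeline")


def run_with_logging(pipeline: dlt.Pipeline, source) -> None:
    """Run the pipeline and log how long the load took plus dlt's own summary."""
    start = time.time()
    load_info = pipeline.run(source)
    logger.info("Load finished in %.1f seconds", time.time() - start)
    logger.info("Load summary: %s", load_info)
```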
5. Incremental Loading
For large sheets, configure incremental loading to only extract new data:
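One way this can look is with dlt resource hints, which merge reloaded rows on a primary key so repeated runs update rows instead of duplicating them; the sheet1 resource attribute and the id column are assumptions about your sheet, so check the dltHub docs for the incremental options in your version:

```python
# Building on the objects from gsheet_pipeline.py
source = google_spreadsheet(
    spreadsheet_url_or_id="https://docs.google.com/spreadsheets/d/<YOUR_SHEET_ID>",
    range_names=["Sheet1"],
)

# Merge on a key instead of appending, so re-runs don't duplicate data
source.sheet1.apply_hints(
    write_disposition="merge",
    primary_key="id",
)

pipeline.run(source)
```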
Next Steps
Now that you have a working pipeline:
Add More Data Sources
Check dltHub documentation for other verified sources
Initialize additional pipelines:
```bash
dlt init [source] snowflake
```
Build Transformations
Create dbt™ models that use your loaded data
Add tests and documentation
Build downstream analytics models
Automate Everything
Schedule your pipeline in Bolt
Set up dbt™ to run after data loads
Create alerts for pipeline failures
Explore Advanced Features
Incremental loading strategies
Schema evolution handling
Custom data transformations in Python
Multiple destination support
Additional Resources
dltHub Documentation: dlthub.com/docs
dltHub Verified Sources: dlthub.com/docs/verified-sources
Google Sheets API: developers.google.com/sheets
Poetry Documentation: python-poetry.org/docs