# BigQuery cost connection

{% hint style="warning" %}
**IP RESTRICTIONS**

Make sure your firewall allows traffic from the Paradime IP addresses for the data location you selected.

👉 See also: [Paradime IP addresses](https://docs.paradime.io/app-help/developers/ip-restrictions).
{% endhint %}

## BigQuery setup guide

### 1. Creating a Service Account User in BigQuery

Follow these steps to create a new service account user for Paradime and grant the required permissions at the project level:

1. **Go to the Google Cloud Console and** [**navigate to IAM & Admin**](https://console.cloud.google.com/iam-admin/serviceaccounts):
   * Select your project from the project selector at the top.
   * Go to the "IAM & Admin" section.
2. **Create Service Account**:
   * Click on "Service Accounts".
   * Click "Create Service Account".
   * Enter the Service Account name (e.g., `paradime-radar-user`).
   * Click "Create and Continue".
3. **Assign Roles**:
   * Assign the necessary roles to the service account:
     * **`BigQuery Resource Viewer`**
     * **`BigQuery Job User`**
4. **Complete and Save**:
   * Click "Done" after assigning the roles.
   * The service account will now be created with the assigned permissions.
5. **Generate Key**:
   * Go to the "Keys" tab and click "Add Key" -> "Create New Key".
   * Choose JSON format and save the key file securely. (We will need this later to connect BigQuery to Paradime.)

{% @arcade/embed flowId="He26zpXuXQ5hOiK5eAgo" url="<https://app.arcade.software/share/He26zpXuXQ5hOiK5eAgo>" %}

***

### 2. Create the BigQuery Dataset

{% hint style="danger" %}
Make sure to name the BigQuery Dataset as **`paradime_cost_analytics`**
{% endhint %}

To set up your BigQuery Project for Paradime, follow these steps:

1. **Go to the Google Cloud Console,** [**navigate to BigQuery**](https://console.cloud.google.com/bigquery)**, and select your project.**
2. **Create a Dataset**:
   * Name: `paradime_cost_analytics`
   * Select the appropriate Dataset region (this will be needed for the Paradime connection setup).
3. **Grant Access**:
   * Assign the `BigQuery User` and `BigQuery Data Editor` roles to the service account user created previously, ensuring these permissions are at the Dataset level.

{% @arcade/embed flowId="ria8cQB7SqmLezvqKUjq" url="<https://app.arcade.software/share/ria8cQB7SqmLezvqKUjq>" %}

### 3. Setting up a Logs Routing Sink

{% hint style="warning" %}
To create a log sink in Google Cloud, you need the "**Logs Configuration Writer**" (roles/logging.configWriter) IAM role on the project where you're creating the sink.
{% endhint %}

{% hint style="danger" %}
Make sure to use **`paradime_cost_analytics`** as the destination dataset for the Logs Routing Sink.
{% endhint %}

To set up your Logs Routing Sink for Paradime, follow these steps:

1. In the Google Cloud console, go to the [Log Router page](https://console.cloud.google.com/logs/router). If you use the search bar to find this page, then select the result whose subheading is **Logging**.
2. Select the Google Cloud project in which the log entries that you want to route originate.
3. Select **Create sink**.
4. In the **Sink details** panel, enter the following details:
   * **Sink name**: Provide an identifier for the sink. You can't rename a sink after creating it, but you can delete it and create a new one.
   * **Sink description** (optional): Describe the purpose or use case for the sink.
5. In the **Sink destination** panel, select the sink service and destination using the **Select sink service** menu.
   1. Select **BigQuery dataset**: choose the dataset **`paradime_cost_analytics`** (created in the previous step) to receive the routed log entries.
   2. Select the option to use partitioned tables.
6. Go to the **Choose logs to include in sink** panel and specify the log entries to include:

```sql
resource.type="bigquery_resource"
protoPayload.methodName="jobservice.jobcompleted"
```

7. Select **Create sink**.

{% @arcade/embed flowId="RCwlT3KRAIAjqE7oTRZt" url="<https://app.arcade.software/share/RCwlT3KRAIAjqE7oTRZt>" %}

{% hint style="info" %}

### Set the partition expiration <a href="#partition-expiration" id="partition-expiration"></a>

After the Logs Router Sink table has been created, you can set an expiration so that partitions older than 7 days are removed automatically.

```sql
ALTER TABLE paradime_cost_analytics.cloudaudit_googleapis_com_data_access
  SET OPTIONS (
    -- Sets partition expiration to 7 days
    partition_expiration_days = 7);
```

{% endhint %}

<details>

<summary>Connecting multiple BigQuery Projects to Paradime?</summary>

If you plan to connect multiple BigQuery Projects to Paradime, follow these steps for each project:

### 1. Grant Access to the BigQuery Project

Using the existing service account you created in step 1, [Creating a Service Account](#id-1.-creating-a-service-account-user-in-bigquery), complete the following:

1. **Go to Google Cloud Console** and [navigate to **IAM & Admin**](https://console.cloud.google.com/projectselector2/iam-admin/iam?supportedpurview=project).
2. **Select your project** from the project selector at the top.
3. Go to the **"IAM & Admin"** section and select **"IAM"**.
4. **Assign Roles**:
   1. Select **"Grant Access"**.
   2. Assign the following roles to the service account created initially:
      1. **`BigQuery Resource Viewer`**
      2. **`BigQuery Job User`**

### 2. Create the BigQuery Dataset in Each Project

Similar to step 2, [Create the BigQuery Dataset](#id-2.-create-the-bigquery-dataset), complete the following:

1. **Go to Google Cloud Console**, [navigate to **BigQuery**](https://console.cloud.google.com/bigquery), and select your project.
2. **Create a Dataset**:
   1. Name: `paradime_cost_analytics`
   2. Ensure the Paradime Dataset region is the same across all BigQuery projects.
3. **Grant Access**:
   1. Assign the **BigQuery User** and **BigQuery Data Editor** roles to the service account user created previously.
   2. Ensure these permissions are granted at the **Dataset level**.

### 3. Setting up a Logs Routing Sink for each BigQuery Project

As in step 3, [Setting up a Logs Routing Sink](#id-3.-setting-up-a-logs-routing-sink), repeat the same process for each BigQuery Project, routing its logs to that project's `paradime_cost_analytics` dataset.

</details>

***

### 4. Create and configure a GCS Bucket

{% hint style="warning" %}
Leave the rest of the Bucket configuration at the default options unless your organization requires otherwise. **Ensure the GCS Bucket region is the same as the Dataset region.**
{% endhint %}

1. **Navigate to Cloud Storage and select Buckets**
   1. Click "Create".
   2. Name your [GCS bucket](https://console.cloud.google.com/storage/create-bucket) (e.g., `paradime-bucket`).
   3. Select your Bucket region (e.g., `US`).
   4. Click on "Create" to complete creating your GCS Bucket.
2. **Set lifecycle policy**
   1. Go to the "Lifecycle" tab for your GCS Bucket.
   2. Select "Add Rule" -> "Delete Object" action.
   3. Set "Age" to **2 days** and click "Create" to set the policy.
3. **Grant the Paradime service account user access to the GCS Bucket**
   1. Go to the "Permissions" tab for your GCS Bucket.
   2. Add the Paradime Service Account user email as a "New Principal" and assign the **`Storage Admin`** role.

{% @arcade/embed flowId="E3h5wTGT6mojit3Qu8gT" url="<https://app.arcade.software/share/E3h5wTGT6mojit3Qu8gT>" %}

***

### 5. dbt™ project setup guide

To enable Paradime to enrich your BigQuery queries with additional metadata, you will need to create a new dbt™️ macro called `get_query_comment.sql` in the macros folder of your project.

1. Create the Macro (`get_query_comment.sql`)

<details>

<summary>get_query_comment.sql dbt™️ macro</summary>

{% code title="get\_query\_comment.sql " lineNumbers="true" %}

```sql
{% macro get_query_comment(node, extra = {}) %}
    {%- set comment_dict = extra -%}
    {%- do comment_dict.update(
        app='dbt',
        dbt_version=dbt_version,
        project_name=project_name,
        target_name=target.name,
        target_database=target.database,
        target_schema=target.schema,
        invocation_id=invocation_id
    ) -%}

    {%- if node is not none -%}
        {%- do comment_dict.update(
            node_name=node.name,
            node_alias=node.alias,
            node_package_name=node.package_name,
            node_original_file_path=node.original_file_path,
            node_database=node.database,
            node_schema=node.schema,
            node_id=node.unique_id,
            node_resource_type=node.resource_type,
            node_meta=node.config.meta,
            node_tags=node.tags,
            full_refresh=flags.FULL_REFRESH,
            which=flags.WHICH,
        ) -%}

        {%- if flags.INVOCATION_COMMAND -%}
            {%- do comment_dict.update(
                invocation_command=flags.INVOCATION_COMMAND
            ) -%}
        {%- endif -%}

        {%- if node.resource_type != ('seed') -%} {# Otherwise this throws an error saying 'Seeds cannot depend on other nodes.' #}
            {%- if node.refs is defined -%}
                {% set refs = [] %}
                {% for ref in node.refs %}
                    {%- if (dbt_version.split('.')[0] | int > 1)
                       or (dbt_version.split('.')[0] | int == 1 and dbt_version.split('.')[1] | int >= 5) -%}
                        {%- do refs.append(ref.name) -%}
                    {%- else -%}
                        {%- do refs.append(ref[0]) -%}
                    {%- endif -%}
                {% endfor %}
                {%- do comment_dict.update(
                    node_refs=refs | unique | list
                ) -%}
            {%- endif -%}
        {%- endif -%}
        {%- if node.resource_type == 'model' -%}
            {%- do comment_dict.update(
                materialized=node.config.materialized,
            ) -%}
        {%- endif -%}
    {%- endif -%}

    {# The fields below are only populated when running in dbt Cloud #}

    {%- if env_var('DBT_CLOUD_PROJECT_ID', False) -%}
    {%- do comment_dict.update(
        dbt_cloud_project_id=env_var('DBT_CLOUD_PROJECT_ID')
    ) -%}
    {%- endif -%}

    {%- if env_var('DBT_CLOUD_JOB_ID', False) -%}
    {%- do comment_dict.update(
        dbt_cloud_job_id=env_var('DBT_CLOUD_JOB_ID')
    ) -%}
    {%- endif -%}

    {%- if env_var('DBT_CLOUD_RUN_ID', False) -%}
    {%- do comment_dict.update(
        dbt_cloud_run_id=env_var('DBT_CLOUD_RUN_ID')
    ) -%}
    {%- endif -%}

    {%- if env_var('DBT_CLOUD_RUN_REASON_CATEGORY', False) -%}
    {%- do comment_dict.update(
        dbt_cloud_run_reason_category=env_var('DBT_CLOUD_RUN_REASON_CATEGORY')
    ) -%}
    {%- endif -%}

    {%- if env_var('DBT_CLOUD_RUN_REASON', False) -%}
    {%- do comment_dict.update(
        dbt_cloud_run_reason=env_var('DBT_CLOUD_RUN_REASON')
    ) -%}
    {%- endif -%}

    {# The fields below are only populated when running on the Paradime Bolt scheduler #}
    {%- if env_var('PARADIME_SCHEDULE_NAME', False) -%}
    {%- do comment_dict.update(
        paradime_schedule_name=env_var('PARADIME_SCHEDULE_NAME')
    ) -%}
    {%- endif -%}

    {%- if env_var('PARADIME_SCHEDULE_RUN_ID', False) -%}
    {%- do comment_dict.update(
        paradime_schedule_run_id=env_var('PARADIME_SCHEDULE_RUN_ID')
    ) -%}
    {%- endif -%}

    {%- if env_var('PARADIME_SCHEDULE_RUN_START_DTTM', False) -%}
    {%- do comment_dict.update(
        paradime_schedule_run_start_dttm=env_var('PARADIME_SCHEDULE_RUN_START_DTTM')
    ) -%}
    {%- endif -%}

    {%- if env_var('PARADIME_SCHEDULE_TRIGGER', False) -%}
    {%- do comment_dict.update(
        paradime_schedule_trigger=env_var('PARADIME_SCHEDULE_TRIGGER')
    ) -%}
    {%- endif -%}

    {%- if env_var('PARADIME_SCHEDULE_GIT_SHA', False) -%}
    {%- do comment_dict.update(
        paradime_schedule_git_sha=env_var('PARADIME_SCHEDULE_GIT_SHA')
    ) -%}
    {%- endif -%}

    {{ return(tojson(comment_dict)) }}
{% endmacro %}
```

{% endcode %}

</details>

2. Update your `dbt_project.yml` file

This step ensures that with each dbt™️ run, the query comment is appended to the query running in BigQuery.

{% code title="dbt\_project.yml" %}

```yaml
query-comment:
  comment: '{{ get_query_comment(node) }}'
  append: true
```

{% endcode %}
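
With `append: true`, dbt™️ renders the macro's JSON output and appends it as a block comment to every query it submits to BigQuery. Purely as an illustration (the model name and field values below are made up), a compiled query ends up looking roughly like this:

```sql
select id, status, amount
from `analytics-project.analytics.stg_orders`
/* {"app": "dbt", "dbt_version": "1.7.4", "project_name": "analytics",
    "target_name": "prod", "node_id": "model.analytics.stg_orders",
    "node_resource_type": "model", "materialized": "table"} */
```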

***

### 6. Configure Cost connection in Paradime

To connect Paradime to BigQuery for cost tracking, follow these steps:

#### **How to Add a BigQuery Cost Connection**

1. Click **"Settings"** in the **top menu bar** of the Paradime interface.
2. Click **"Connections"** in the **left sidebar**.
3. Click **"Add New"** next to the **Radar Environment**.
4. Select **"BigQuery"** as the connection type.

#### **Enter the Required Details**

Once you have selected **BigQuery**, provide the following details to complete the setup:

* **BigQuery Service Account JSON**
* **BigQuery Project IDs**
* **Dataset Location** for the dataset used by Paradime
* **GCS Bucket Name** used by Paradime

{% hint style="info" %}
**Important Note:**

If connecting **multiple projects**, separate **each BigQuery Project ID** using **commas (`,`)** without spaces:\
✅ `dev-project,staging-project,production-project`\
❌ `dev-project, staging-project, production-project`
{% endhint %}

{% @arcade/embed flowId="kA9x21sAKnKfbFVvN95J" url="<https://app.arcade.software/share/kA9x21sAKnKfbFVvN95J>" %}
