Security model
Last updated
In this section we describe the data security model of the Paradime application and its infrastructure components. This document is intended for practitioners seeking to understand the architecture of the Paradime application, as well as for administrators and security professionals.
The diagram below shows the critical components of the application and we will explain the security considerations for each component in more detail below.
Any traffic from the internet first reaches AWS edge locations and is then routed through the AWS network using Global Accelerator. This reduces latency and application load times for our users all over the world. Once within the AWS Cloud, the traffic passes through the Web Application Firewall (WAF), which filters out malicious traffic, DDoS attacks and bots trying to attack the Paradime application. The filtered traffic then reaches our load balancers, which route it to the Kubernetes ingress controller in our EKS cluster. This four-stage network control helps us provide low-latency, globally available and secure network traffic to our application inside the EKS cluster. In addition, all compute and storage components within the VPC run inside private subnets and are never exposed to the internet.
Where customers need additional security, we provide AWS PrivateLink between Paradime and the customer's VPC so that network traffic stays within the AWS network without traversing the internet. This means Paradime can be connected to the customer's VPC within the same subnet IP range as the customer.
The Paradime app infrastructure lives in an AWS VPC managed by Paradime Labs. The VPC is shared by all customers within a region, but upon request we can provide a single-tenant deployment in a dedicated VPC at an extra cost.
The Paradime app uses AWS RDS for Postgres to store application data. Every company has its own Postgres database with its own randomly generated credentials, so each company's data is effectively isolated in single-tenant mode. No secrets or user credentials are stored in the database. The names and email addresses of users within a company are stored with our identity provider (Auth0), and we only store a unique alphanumeric identifier in our database. This identifier allows us to fetch user details from Auth0 into memory at application runtime over an HTTPS API call, without storing them in the backend.
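The lookup described above can be sketched as follows. This is a minimal illustration, not Paradime's implementation: it assumes the Auth0 Management API endpoint `GET /api/v2/users/{id}`, and the tenant domain, token and function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def build_user_lookup_request(
    tenant_domain: str, user_id: str, api_token: str
) -> urllib.request.Request:
    """Build the HTTPS request for an Auth0 Management API user lookup."""
    # Auth0 user IDs can contain characters such as "|", so percent-encode them.
    encoded_id = urllib.parse.quote(user_id, safe="")
    url = f"https://{tenant_domain}/api/v2/users/{encoded_id}"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {api_token}"},
        method="GET",
    )


def fetch_user_profile(tenant_domain: str, user_id: str, api_token: str) -> dict:
    """Fetch name and email into memory; nothing is written to disk or a database."""
    request = build_user_lookup_request(tenant_domain, user_id, api_token)
    with urllib.request.urlopen(request, timeout=10) as response:
        profile = json.load(response)
    # Only the stored identifier was needed to get here; the profile is transient.
    return {"name": profile.get("name"), "email": profile.get("email")}
```

Only the opaque identifier needs to live in the application database; the returned dictionary is discarded once the request that needed it completes.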
We run Hashicorp Vault on our own infrastructure in AWS as our trusted storage for secrets and credentials, which are encrypted at rest and in transit. This allows us to store sensitive information with all the power and security of Vault. Specifically, we store the following in Vault:
User-specific credentials to access the data warehouse connected to a user's company
User-specific environment variable keys and values
Company specific credentials to access the company’s warehouse, git repository and third party integrations that the admin of the company has connected.
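One way to picture the separation between user-specific and company-specific secrets is as distinct paths in a Vault KV version 2 secrets engine. The sketch below is illustrative only: the path layout, mount name and function names are assumptions and do not come from Paradime. The `/v1/<mount>/data/<path>` URL shape and the `X-Vault-Token` header are those of Vault's KV v2 HTTP API.

```python
from typing import Optional
import urllib.request


def secret_path(company_id: str, scope: str, user_id: Optional[str] = None) -> str:
    """Return a hypothetical Vault path for a company-level or user-level secret."""
    if user_id is not None:
        # User-specific secrets, e.g. warehouse credentials or environment variables.
        return f"companies/{company_id}/users/{user_id}/{scope}"
    # Company-wide secrets, e.g. git repository or third-party integration credentials.
    return f"companies/{company_id}/{scope}"


def build_vault_read_request(
    vault_addr: str, mount: str, path: str, token: str
) -> urllib.request.Request:
    """Build a KV v2 read request; the secret comes back under data.data in the JSON."""
    url = f"{vault_addr}/v1/{mount}/data/{path}"
    return urllib.request.Request(url, headers={"X-Vault-Token": token}, method="GET")
```

With a layout like this, a Vault policy could grant each workload access to a single company prefix, though the actual policies used are not described in this document.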
For every company, we provision an AWS FSx file system with its own subpath, and only users of that company have access to that subpath. In the FSx shares, we hold:
A clone of the company's dbt™️ git repository for every developer so that changes to code made by one developer do not affect others - each developer/admin has access to only their own folder, and their dbt™️ profile.yml credentials also live in their own folder that nobody else has access to
The SSH key to access the company-specific dbt™️ repository
The profile.yml file that dbt™️ uses to execute dbt™️ CLI commands and connect to the data warehouse, which is also sandboxed inside each developer/admin's cloned dbt™️ git repo
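The per-developer sandboxing described above can be illustrated with a small path check. This is a sketch under stated assumptions: the directory layout (`<company root>/<developer id>/...`) and the function name are hypothetical, and in practice isolation would be enforced by file system permissions rather than by application code alone. The check is purely lexical and does not resolve symlinks.

```python
import posixpath


def resolve_in_sandbox(company_root: str, developer_id: str, relative_path: str) -> str:
    """Resolve a path inside one developer's folder, rejecting anything outside it."""
    sandbox = posixpath.normpath(posixpath.join(company_root, developer_id))
    # normpath collapses ".." segments; join discards the sandbox if the
    # requested path is absolute, which the prefix check below then rejects.
    candidate = posixpath.normpath(posixpath.join(sandbox, relative_path))
    if candidate != sandbox and not candidate.startswith(sandbox + "/"):
        raise PermissionError(f"{relative_path!r} is outside the developer sandbox")
    return candidate
```

For example, a request for `../bob/profile.yml` from another developer's sandbox raises `PermissionError` instead of resolving into the neighbouring folder.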
The application uses AWS Elastic Kubernetes Service (EKS) to manage our application resources. EKS provides a high degree of reliability and scalability for the Paradime application. Within our kubernetes application, we have two relevant units:
Paradime controller - runs the overall Paradime application
Company-specific controller - runs company specific workloads including requesting company-specific secrets from Vault
Both controllers above are designed to request secrets from Vault on demand without caching them. The only exceptions are the Postgres database credentials and the git repository SSH key, which are cached for speed and performance reasons.
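The caching rule above can be sketched as a thin wrapper around a secret fetcher. The class name, the secret kind labels and the `fetch` callable are all hypothetical; the sketch only illustrates the policy of caching two kinds of secret and fetching everything else fresh on every request.

```python
from typing import Callable, Dict

# Assumed labels for the two secret kinds that the text says are cached.
CACHEABLE_KINDS = frozenset({"postgres_credentials", "git_ssh_key"})


class SecretClient:
    """Fetch secrets on demand, caching only the explicitly allowed kinds."""

    def __init__(self, fetch: Callable[[str], str]) -> None:
        self._fetch = fetch  # e.g. a function that reads from Vault
        self._cache: Dict[str, str] = {}

    def get(self, kind: str) -> str:
        if kind in CACHEABLE_KINDS:
            if kind not in self._cache:
                self._cache[kind] = self._fetch(kind)
            return self._cache[kind]
        # Everything else is requested again each time and never retained.
        return self._fetch(kind)
```

Keeping the allow-list explicit makes it easy to audit which secrets can ever sit in controller memory between requests.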
To identify and authenticate users, we use Auth0. Auth0 provides a secure authorization and identification platform and it allows us to provide:
The ability to log in with your G-Suite or other SSO credentials such as Okta and Google SAML
Storage of user credentials in a secure environment outside of Paradime
Storage of user credentials in GDPR and SOC 2 compliant storage - more can be found here: https://auth0.com/security
We use Sendgrid to send emails to notify our users when needed, e.g. during onboarding, email confirmation and password resets. Sendgrid is a GDPR and SOC 2 compliant service that is used by the world’s biggest companies to send email. All Sendgrid emails are triggered through an authenticated HTTPS REST API. Sendgrid's security information and compliance status is available here: https://sendgrid.com/policies/security/.
All the connection options to third party integrations are optional.
Company administrators can connect Paradime to their Slack workspace - this allows Paradime users to invite others to the app seamlessly and collaborate. The Slack integration requires administrators to use OAuth 2.0 to authorize the Paradime Slack App for the company's Slack workspace, and the resulting OAuth token is stored in our Vault. Whenever a user wishes to invite others using Slack, we fetch a list of users from Slack using the pre-authorized token. However, we don't store any list of users on our infrastructure; the list is only fetched into memory at the time a user chooses to invite others.
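As an illustration of handling the user list purely in memory, the sketch below reduces a Slack `users.list` response to the fields an invite picker would need. The function name and the choice of fields are assumptions; the response keys (`ok`, `members`, `deleted`, `is_bot`, `profile.real_name`) follow Slack's Web API.

```python
from typing import Any, Dict, List


def invitable_members(users_list_response: Dict[str, Any]) -> List[Dict[str, str]]:
    """Reduce a Slack `users.list` response to id and display name, in memory only."""
    if not users_list_response.get("ok"):
        raise RuntimeError(f"Slack API error: {users_list_response.get('error')}")
    members = []
    for member in users_list_response.get("members", []):
        # Deactivated accounts and bots are skipped in this sketch.
        if member.get("deleted") or member.get("is_bot"):
            continue
        profile = member.get("profile", {})
        members.append({"id": member["id"], "name": profile.get("real_name", "")})
    # The result is handed back to the caller and never written to storage.
    return members
```

The list exists only for the lifetime of the call that builds it, which mirrors the fetch-on-demand behaviour described above.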
Company administrators can connect Paradime to their Looker workspace - this allows us to build the lineage from data sources to dashboards, where the customer is using Looker. The Looker integration requires:
adding an SSH key generated by us to the git repo where the LookML files are stored
an API Key and API Secret generated for the Looker user that connects through Looker API 3
Company administrators can connect Paradime to their Tableau site - this allows us to build the lineage from data sources to dashboards, where the customer is using Tableau Online. The Tableau integration requires:
a Token Name and a Token Value generated for the Tableau Site we are connecting to
Data Security Model
Using AWS RDS, EKS and Hashicorp Vault, we have built a logically separate single-tenant architecture with the following features:
data belonging to different companies is kept separate using Kubernetes namespaces, so each company's resources are logically separate
within a company, dbt™️ code, operational data and sensitive credentials are all stored separately
for dbt™️ code, each developer is allocated an isolated pod so that no developer can gain access to another developer's folder
access to FSx and RDS is managed through randomly generated usernames and passwords that are stored in Vault
Together, these measures ensure there is no single point of failure that a potential attacker can use to compromise the security of the platform.
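Randomly generated usernames and passwords of the kind mentioned above can be produced with Python's `secrets` module, which draws from the operating system's cryptographically secure random source. The function name, prefix convention and lengths below are assumptions for illustration; the document does not specify how Paradime generates its credentials.

```python
import secrets
import string

USERNAME_ALPHABET = string.ascii_lowercase + string.digits
PASSWORD_ALPHABET = string.ascii_letters + string.digits


def generate_credentials(prefix: str, username_len: int = 12, password_len: int = 32) -> dict:
    """Generate a random username and password for one company's resource."""
    suffix = "".join(secrets.choice(USERNAME_ALPHABET) for _ in range(username_len))
    password = "".join(secrets.choice(PASSWORD_ALPHABET) for _ in range(password_len))
    # In the architecture described here, the result would be written to Vault
    # and never hard-coded or shared between companies.
    return {"username": f"{prefix}_{suffix}", "password": password}
```

Because each company gets its own generated pair, a leaked credential for one company's database or file share does not grant access to another's.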
Only company administrators can add a data warehouse to the Paradime company account. Currently Snowflake, BigQuery, Redshift and Firebolt are supported. This connection is ONLY necessary if the users within an account want to run dbt™️ CLI commands such as “dbt run” from within the application.
The Paradime application, when connected to a data warehouse, enables users to dispatch dbt™️ commands, which in turn dispatch SQL to the connected warehouse for transformation purposes. It is also possible for users to dispatch SQL that returns customer data into the Paradime application. This data is never persisted and only exists in memory on the Kubernetes pod attached to the currently logged-in user. Paradime's role here is therefore always that of a data processor and not a data store.
In order to properly lock down data, the administrator should connect Paradime to the dev warehouse environment and apply proper data warehouse permissions in that environment, outside of Paradime, to prevent improper access to or storage of sensitive data. Individual users only have access to the dev environment. Only admins can set up the connection to the production warehouse, which is only required to run scheduled jobs and not for day-to-day use of Paradime.
The specific access requirements for each warehouse type are identical to what one would need to run dbt™️, and are set out in the dbt™️ help docs.
Note that for Snowflake and Redshift, we support per-user access credentials, including database, schema, username and password.
For Snowflake, we support OAuth using Snowflake Security Integration. Using Snowflake OAuth, users will be able to connect Paradime to their Snowflake without sharing their username and password.
To conclude, the main points of the Paradime.io application security model are:
In our AWS infrastructure, we store each company's data separately. Within a company, we store sensitive information in Hashicorp Vault, non-sensitive application data in AWS Postgres protected by a randomly generated password, and code in a company-specific protected AWS FSx subpath and per-developer Kubernetes pods.
As a result, in our application design there is no single point of failure.
We don’t store or manage user credentials and we use Auth0, who provide secure and compliant storage of user information.
Whenever we use a third-party data processor, we use GDPR and SOC 2 compliant applications so that our users' security is not compromised.
For our data warehouse connection, we operate as a data processor and don’t store any data.
Having a warehouse connection is not mandatory, but it provides significant productivity advantages to developer workflows when using Paradime. A connection to the production data warehouse is not strictly needed to use Paradime.
We never access or need to access the customer's own data in production. For data catalog & cost analytics, we need access to the information schema and cost analytics permissions on Snowflake and NOT to actual customer data. In Workbench, when the user queries their own customer data, the information is only held in memory in the location where the Paradime workspace is set up and is erased from memory upon page refresh.
Our connection requirements for warehouses are identical to what dbt™️ requires to operate.
We have been independently audited and our SOC 2 report is available upon request. We use Drata, Inc. (https://drata.com) to continuously monitor our infrastructure, policies and personnel against control tests. The real-time state of our compliance and monitoring can be found in our Trust Center.