How to secure a BigQuery data warehouse that stores confidential data

This document is intended for data engineers and security administrators who deploy and secure data warehouses using BigQuery. It’s part of a security blueprint that’s made up of the following:

  • A GitHub repository that contains a set of Terraform configurations and scripts. The Terraform configuration sets up an environment in Google Cloud that supports a data warehouse that stores confidential data.
  • A guide to the architecture, design, and security controls that you use this blueprint to implement (this document).

This document discusses the following:

  • The architecture and Google Cloud services that you can use to help secure a data warehouse in a production environment.
  • Best practices for data governance when creating, deploying, and operating a data warehouse in Google Cloud, including data de-identification, differential handling of confidential data, and column-level access controls.
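In BigQuery, column-level access controls are enforced by attaching policy tags to individual columns and granting read access on those tags. The idea can be modeled in plain Python. This is a minimal sketch, not the BigQuery API. The tag name, the `Column` and `Principal` types, and the `read_columns` helper are all hypothetical. Requesting a restricted column raises an error instead of silently dropping it, which roughly mirrors a query failing when it selects a column the caller cannot read.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

# Hypothetical policy tag name. In BigQuery, a policy tag belongs to a
# taxonomy and is attached to individual columns of a table schema.
CONFIDENTIAL_TAG = "confidential"


@dataclass(frozen=True)
class Column:
    name: str
    policy_tag: Optional[str] = None  # None: no column-level restriction


@dataclass
class Principal:
    email: str
    # Tags this principal may read; a stand-in for a fine-grained reader
    # grant on the corresponding policy tag.
    readable_tags: Set[str] = field(default_factory=set)


def can_read(column: Column, who: Principal) -> bool:
    """Untagged columns are readable by anyone who can read the table."""
    return column.policy_tag is None or column.policy_tag in who.readable_tags


def read_columns(row: Dict[str, object], columns: List[str],
                 schema: List[Column], who: Principal) -> Dict[str, object]:
    """Return the requested columns, failing if any of them is restricted."""
    by_name = {c.name: c for c in schema}
    denied = [name for name in columns if not can_read(by_name[name], who)]
    if denied:
        raise PermissionError(f"{who.email} may not read columns {denied}")
    return {name: row[name] for name in columns}


# Example: an analyst without a grant on the confidential tag.
schema = [Column("customer_id"), Column("ssn", CONFIDENTIAL_TAG)]
row = {"customer_id": "c-001", "ssn": "123-45-6789"}
analyst = Principal("analyst@example.com")
print(read_columns(row, ["customer_id"], schema, analyst))  # {'customer_id': 'c-001'}
```

The analyst can read `customer_id`, but asking for `ssn` raises `PermissionError` until the principal is granted the `confidential` tag.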

This document assumes that you have already configured a foundational set of security controls as described in the Google Cloud security foundations guide. It helps you layer additional controls onto your existing security controls to help protect confidential data in a data warehouse.

Architecture

To create a confidential data warehouse, you need to categorize data as either confidential or non-confidential, and then store each category in a separate perimeter. The following image shows how ingested data is categorized, de-identified, and stored. It also shows how you can re-identify confidential data on demand for analysis.

The confidential data warehouse architecture.
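The flow in the architecture (classify fields, de-identify them before storage, re-identify on demand) can be sketched in plain Python. This is a minimal sketch under stated assumptions. It uses HMAC-based deterministic tokenization with an in-memory lookup table instead of the managed de-identification service (such as Cloud DLP) that a production deployment would use. The field list, the `TokenVault` class, and the `deidentify`/`reidentify` helpers are hypothetical.

```python
import hashlib
import hmac
import secrets
from typing import Dict, Optional

# Hypothetical classification: fields treated as confidential. In practice
# this list would come from your data governance policy.
CONFIDENTIAL_FIELDS = {"name", "ssn"}


class TokenVault:
    """Deterministic tokenization plus a lookup table for re-identification.

    Illustrative stand-in only: the key and the reverse mapping would need to
    live on the restricted, confidential side of the architecture.
    """

    def __init__(self, key: Optional[bytes] = None) -> None:
        self._key = key if key is not None else secrets.token_bytes(32)
        self._reverse: Dict[str, str] = {}

    def tokenize(self, value: str) -> str:
        # The same value and key always yield the same token, so joins and
        # aggregations on the tokenized column still work.
        digest = hmac.new(self._key, value.encode("utf-8"), hashlib.sha256)
        token = "tok_" + digest.hexdigest()[:32]
        self._reverse[token] = value
        return token

    def detokenize(self, token: str) -> str:
        return self._reverse[token]


def deidentify(record: Dict[str, str], vault: TokenVault) -> Dict[str, str]:
    """Replace confidential fields with tokens; other fields pass through."""
    return {k: vault.tokenize(v) if k in CONFIDENTIAL_FIELDS else v
            for k, v in record.items()}


def reidentify(record: Dict[str, str], vault: TokenVault) -> Dict[str, str]:
    """Restore original values for an authorized, on-demand request."""
    return {k: vault.detokenize(v) if k in CONFIDENTIAL_FIELDS else v
            for k, v in record.items()}
```

Deterministic tokens keep the de-identified data joinable in the non-confidential store, at the cost of revealing when two records share a value. The reverse mapping is what makes re-identification possible, so it must be protected at least as strictly as the raw data.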

Organization structure

You group your organization’s resources so that you can manage them and separate your testing environments from your production environment. Resource Manager lets you logically group resources by project, folder, and organization.

The following diagram shows a resource hierarchy with folders that represent different environments: bootstrap, common, production, non-production (or staging), and development. You deploy most of the projects in the blueprint into the production folder, and you deploy the data governance project into the common folder, which holds the resources used for governance.

The resource hierarchy for a confidential data warehouse.
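The hierarchy in the diagram can be modeled as a small tree to make the placement rule explicit. This is an illustrative sketch, not the Resource Manager API. Only the folder names come from the diagram description. The organization name and the project IDs (`data-governance`, `data-ingestion`, `confidential-data`, `non-confidential-data`) are hypothetical. The `add_project` check reflects the fact that a Google Cloud project has exactly one parent.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Environment folders named in the resource hierarchy diagram.
ENVIRONMENT_FOLDERS = ("bootstrap", "common", "production",
                       "non-production", "development")


@dataclass
class Folder:
    name: str
    projects: List[str] = field(default_factory=list)


@dataclass
class Organization:
    name: str
    folders: Dict[str, Folder] = field(default_factory=dict)

    def add_project(self, folder: str, project: str) -> None:
        # A project has exactly one parent, so refuse to place it twice.
        if self.folder_of(project) is not None:
            raise ValueError(f"{project} already has a parent folder")
        self.folders[folder].projects.append(project)

    def folder_of(self, project: str) -> Optional[str]:
        """Return the name of the folder that contains `project`, if any."""
        for folder in self.folders.values():
            if project in folder.projects:
                return folder.name
        return None


def blueprint_hierarchy() -> Organization:
    org = Organization("example.com")  # hypothetical organization
    for name in ENVIRONMENT_FOLDERS:
        org.folders[name] = Folder(name)
    # Hypothetical project IDs, for illustration only.
    org.add_project("common", "data-governance")
    for project in ("data-ingestion", "confidential-data", "non-confidential-data"):
        org.add_project("production", project)
    return org
```

For example, `blueprint_hierarchy().folder_of("data-governance")` returns `"common"`, while the workload projects resolve to `"production"`.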

References:

  1. https://github.com/GoogleCloudPlatform/terraform-google-secured-data-warehouse
  2. https://cloud.google.com/architecture/confidential-data-warehouse-blueprint?hl=en
