MyLab: Database replication to BigQuery using change data capture

Databases such as MySQL, Oracle, and SAP are the most commonly discussed sources for change data capture (CDC). However, any system can serve as a data source if it captures and exposes changes to data elements that are identified by primary keys. If a system has no built-in CDC mechanism, such as a transaction log, you can deploy an incremental batch reader to collect the changes.
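To make the incremental batch reader idea concrete, here is a minimal sketch in Python. It assumes the source table carries a column that increases on every change (called `updated_at` in the usage example) and that the reader remembers the highest value it has seen so far, often called a watermark. The function name `read_changes`, the table name, and the column name are all hypothetical, and SQLite is used only so the sketch is self-contained.

```python
import sqlite3


def read_changes(conn, table, watermark_column, last_watermark):
    """Return rows changed since last_watermark, plus the new watermark.

    Hypothetical helper: table and watermark_column are assumed to be
    trusted identifiers, because SQL identifiers cannot be bound as
    query parameters.
    """
    query = (
        f"SELECT * FROM {table} "
        f"WHERE {watermark_column} > ? "
        f"ORDER BY {watermark_column}"
    )
    cursor = conn.execute(query, (last_watermark,))
    columns = [description[0] for description in cursor.description]
    rows = [dict(zip(columns, values)) for values in cursor.fetchall()]
    # If nothing changed, keep the old watermark so the next run starts
    # from the same point.
    new_watermark = rows[-1][watermark_column] if rows else last_watermark
    return rows, new_watermark
```

Each run would call `read_changes(conn, "customers", "updated_at", last_watermark)` and store the returned watermark for the next run. Note that a reader like this only sees rows that still exist, so hard deletes in the source go unnoticed unless the source marks them some other way.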

This document discusses CDC processes that meet the following criteria:

  1. Data replication captures changes for each table separately.
  2. Every table has a primary key or a composite primary key.
  3. Every emitted CDC event is assigned a monotonically increasing change ID, usually a numeric value like a transaction ID or a timestamp.
  4. Every CDC event contains the complete state of the row that changed.
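These criteria are what make it possible to rebuild the current state of a table from its change events alone: for each primary key, the event with the highest change ID holds the full, latest row. The sketch below illustrates that idea in Python for a single table. The `CdcEvent` class, its field names, and the `deleted` flag are assumptions made for illustration; they are not taken from the referenced articles or repository.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Tuple


@dataclass
class CdcEvent:
    key: Tuple[Any, ...]   # primary key values; several elements for a composite key
    change_id: int         # monotonically increasing, e.g. a transaction ID
    row: Dict[str, Any]    # complete state of the row after the change
    deleted: bool = False  # hypothetical marker for a delete event


def latest_state(events: Iterable[CdcEvent]) -> Dict[Tuple[Any, ...], Dict[str, Any]]:
    """Return the current rows of one table, keyed by primary key."""
    latest: Dict[Tuple[Any, ...], CdcEvent] = {}
    for event in events:
        current = latest.get(event.key)
        # Keep only the event with the highest change ID per key, so the
        # result does not depend on the order in which events arrive.
        if current is None or event.change_id > current.change_id:
            latest[event.key] = event
    # Rows whose most recent event is a delete are left out.
    return {key: event.row for key, event in latest.items() if not event.deleted}
```

Because every event carries the complete row, no earlier event has to be replayed: picking the newest event per key is enough. If events held only the changed columns, the sketch would instead need to apply them in change ID order.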

For more detail, please see the references below.

References:
– https://github.com/GoogleCloudPlatform/bq-mirroring-cdc
– https://cloud.google.com/architecture/database-replication-to-bigquery-using-change-data-capture
– https://cloud.google.com/architecture/database-migration-concepts-principles-part-1
– https://cloud.google.com/architecture
