Databases such as MySQL, Oracle, and SAP are the most commonly discussed CDC data sources. However, any system can serve as a data source if it captures and exposes changes to data elements that are identified by primary keys. If a system doesn't provide a built-in CDC mechanism, such as a transaction log, you can deploy an incremental batch reader to retrieve the changes.
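To make the idea of an incremental batch reader concrete, here is a minimal sketch using Python's built-in `sqlite3` module. It assumes a hypothetical `customers` table with an `updated_at` column that the application bumps on every write; the table, the column names, and the `read_changes` helper are illustrative assumptions, not part of any particular product. Each poll fetches only the rows modified since the last high-water mark.

```python
import sqlite3

def read_changes(conn, last_seen):
    """Return rows modified after last_seen, plus the new high-water mark."""
    rows = conn.execute(
        "SELECT id, name, updated_at FROM customers "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_seen,),
    ).fetchall()
    # Advance the mark only if this batch actually contained rows.
    new_mark = rows[-1][2] if rows else last_seen
    return rows, new_mark

# Hypothetical source table, created in memory for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, updated_at INTEGER)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Ada", 100), (2, "Bob", 105)],
)

# First poll picks up everything; the second poll sees only the later update.
batch1, mark = read_changes(conn, 0)
conn.execute("UPDATE customers SET name = 'Bobby', updated_at = 110 WHERE id = 2")
batch2, mark2 = read_changes(conn, mark)
```

Note that a reader like this only sees rows that still exist, so hard deletes in the source go unnoticed unless the source records them some other way (for example, with a soft-delete flag).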
This document discusses CDC processes that meet the following criteria:
- Data replication captures changes for each table separately.
- Every table has a primary key or a composite primary key.
- Every emitted CDC event is assigned a monotonically increasing change ID, usually a numeric value like a transaction ID or a timestamp.
- Every CDC event contains the complete state of the row that changed.
For more details, see the references below.
References:
- https://github.com/GoogleCloudPlatform/bq-mirroring-cdc
- https://cloud.google.com/architecture/database-replication-to-bigquery-using-change-data-capture
- https://cloud.google.com/architecture/database-migration-concepts-principles-part-1
- https://cloud.google.com/architecture