CDC Ingestion

CDC stands for Change Data Capture, which is a broad concept, as long as it can capture the change data, it can be called CDC. Flink CDC is a Log message-based data capture tool, all the inventory and incremental data can be captured. Taking MySQL as an example, it can easily capture Binlog data through Debezium、Flink CDC and process the calculations in real time to send them to the data lake. The data lake can then be queried by other engines.

This section will show how to ingest one table or multiple tables into the data lake for both Iceberg format and Mixed-Iceberg format.

Apache Flink CDC

Apache Flink CDC is a distributed data integration tool for real time data and batch data. Flink CDC brings the simplicity and elegance of data integration via YAML to describe the data movement and transformation.

Amoro provides the relevant code case reference how to complete cdc data to different lakehouse table format, see flink-cdc-ingestion doc

At the same time, we provide Mixed-Iceberg format, which you can understand as STREAMING For iceberg, which will enhance your real-time processing scene for you

Debezium

Debezium is an open source distributed platform for change data capture. Start it up, point it at your databases, and your apps can start responding to all of the inserts, updates, and deletes that other apps commit to your databases. Debezium is durable and fast, so your apps can respond quickly and never miss an event, even when things go wrong.

Demo

Coming Soon

Airbyte

Airbyte is Data integration platform for ELT pipelines from APIs, databases & files to databases, warehouses & lakes

Demo

Coming Soon