CDC Pipelines with JBang and Debezium

About Giovanni

Change Data Capture

Change data capture (CDC) is a data integration pattern that captures changes made to data in a source system, such as inserts, updates, and deletes.

Use Cases

  • Data Synchronization
  • Event Driven Architectures
  • Materialized Views & Caches
  • Audit Trails

Debezium

Debezium is an open-source distributed platform for Change Data Capture (CDC). It streams real-time changes from databases into event-driven systems by monitoring database transaction logs and publishing change events.

Debezium + Kafka Connect

Kafka Connect provides the runtime framework, handling connector lifecycle management, fault tolerance, offset storage, and scalability, while Debezium supplies source connectors that read database transaction logs.

The Ecosystem

Debezium offers a wide range of solutions to address CDC in different contexts.

  • Debezium Connectors
  • Debezium Engine
  • Debezium Server
  • Debezium Operator
  • Debezium Platform

Quarkus Extensions

In addition to the Outbox Extension, there is a series of new Quarkus extensions for capturing changes from different data sources inside your Quarkus application.

Benefits

  • CDC in Quarkus
  • Kafka-free Option
  • Lightweight & Fast
  • Developer Friendly

Extension Use Cases

The general CDC use cases still apply, but the extensions fit especially well for:

  • Monolith-to-microservices patterns
    • Strangler fig pattern
  • Cache Invalidation
  • Index Update

Debezium supports JBang!

JBang is a lightweight tool that lets you run Java as easily as scripting languages.

Demo

What's next

  • Debezium Server Native
  • Shutdown strategy for Debezium Engine
  • Source Connectors for Milvus & Qdrant
  • and much more!

Thank you!

Thank you to JUG Amsterdam for making this happen. I know how difficult it is to organize meetups and events, so again, thank you. I promised Geertjan that my talk would be light and fast, so I'll go quickly through the next slide.

My name is Giovanni. I am a software engineer with 10 years of experience, and the most important part is that I am a Debezium core contributor. Apart from that, I am also a contributor to the Zig OpenTelemetry SDK and an organizer of Ticino Software Craft. In the photo you see me in my natural form, drinking beer. OK, after this embarrassing moment, let's jump to today's main topic.

Before going to the practice, we need to talk about the theory behind it. CDC is a data integration pattern. There are a lot of data integration patterns, but this one focuses on capturing data in one system and transferring it to another, with the data expressed as inserts, updates, and deletes, which until recently were mostly associated with databases.

For this reason, the pattern addresses multiple use cases:

  • Data synchronization: keeping multiple databases or systems in sync in real time
  • Event-driven architectures: reacting to data changes and triggering business processes
  • Materialized views & caches: maintaining derived data structures and invalidating caches
  • Audit trails: tracking changes for compliance and debugging
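
The data synchronization case can be sketched in plain Java. This is only a minimal illustration, not Debezium's API: `ChangeEvent` and `ReplicaSync` are hypothetical names, and the one-letter operation codes merely mirror the `c`/`u`/`d` convention that Debezium uses in its change events.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified change event: operation code, row key,
// and the row state after the change (null for deletes).
record ChangeEvent(String op, String key, String after) {}

class ReplicaSync {
    // In-memory stand-in for the target system that must stay in sync.
    private final Map<String, String> replica = new HashMap<>();

    // Apply one change event to the replica.
    void apply(ChangeEvent event) {
        switch (event.op()) {
            case "c", "u" -> replica.put(event.key(), event.after()); // insert or update
            case "d" -> replica.remove(event.key());                  // delete
            default -> throw new IllegalArgumentException("Unknown op: " + event.op());
        }
    }

    // Read-only copy of the current replica state.
    Map<String, String> snapshot() {
        return Map.copyOf(replica);
    }
}
```

Replaying the events in the order they were captured is what keeps the replica consistent with the source.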

In this context, Debezium is a platform for CDC. In particular, it streams real-time changes from databases to event-driven systems by monitoring the database transaction logs. This point is really important: there are many kinds of CDC, but reading the transaction log is the most efficient way to capture data from a database.
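
One commonly cited advantage of reading a log rather than polling tables is that no intermediate change is lost. The toy sketch below illustrates only that point and is not a benchmark. `LogVsPolling` and its `Table` class are hypothetical, and the `log` list is just a stand-in for a real transaction log.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class LogVsPolling {
    // Hypothetical toy table that appends every write to a log,
    // loosely imitating a database transaction log.
    static final class Table {
        final Map<String, Integer> rows = new HashMap<>();
        final List<String> log = new ArrayList<>();

        void upsert(String key, int value) {
            log.add((rows.containsKey(key) ? "u:" : "c:") + key + "=" + value);
            rows.put(key, value);
        }

        void delete(String key) {
            rows.remove(key);
            log.add("d:" + key);
        }
    }

    // Polling approach: compare two snapshots and report only the
    // differences that are still visible at the second poll.
    static List<String> diff(Map<String, Integer> before, Map<String, Integer> after) {
        List<String> changes = new ArrayList<>();
        for (var e : after.entrySet()) {
            if (!e.getValue().equals(before.get(e.getKey()))) {
                changes.add("changed:" + e.getKey() + "=" + e.getValue());
            }
        }
        for (String key : before.keySet()) {
            if (!after.containsKey(key)) changes.add("removed:" + key);
        }
        return changes;
    }
}
```

If a row is inserted, updated, and deleted between two polls, `diff` reports nothing for it, while the log still holds all three entries.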

Many of you probably already know this, but since version 0.1 the natural fit for Debezium has been inside Kafka Connect, as a connector. Kafka Connect provides the runtime, which handles lifecycle, reliability, and scalability; Debezium, for its part, supplies the knowledge of the data sources by reading from their transaction logs.

But this is only part of the story. Today Debezium is an ecosystem made up of many components, all of them related to CDC. Here is a brief overview:

  • Connectors: the oldest part of Debezium. The connectors are responsible for knowing how to follow the transaction log
  • Engine: born as an embedded testing runtime, it lets you run a connector as in Kafka Connect and emit events
  • Debezium Server: our flagship CDC solution, which captures data and sinks it to many supported systems
  • Debezium Operator: the best way to deploy Debezium Server on Kubernetes
  • Debezium Platform: lets you manage the Debezium Operator through a friendly UI
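
To make the Engine item a little more concrete, here is a sketch of the kind of configuration an embedded connector needs when there is no Kafka Connect runtime around it. The property keys follow the Debezium documentation as far as I know them. Every value (host, credentials, file path, prefix) is a placeholder assumption, and `EngineConfig` is a hypothetical helper name.

```java
import java.util.Properties;

class EngineConfig {
    // Build an example configuration for an embedded PostgreSQL connector.
    // All values below are placeholders chosen for illustration only.
    static Properties postgresProps() {
        Properties props = new Properties();
        props.setProperty("name", "demo-engine");
        props.setProperty("connector.class",
                "io.debezium.connector.postgresql.PostgresConnector");
        // Without Kafka Connect, offsets must be stored elsewhere; here, a local file.
        props.setProperty("offset.storage",
                "org.apache.kafka.connect.storage.FileOffsetBackingStore");
        props.setProperty("offset.storage.file.filename", "/tmp/offsets.dat");
        // Connection details of the source database (assumed values).
        props.setProperty("database.hostname", "localhost");
        props.setProperty("database.port", "5432");
        props.setProperty("database.user", "postgres");
        props.setProperty("database.password", "postgres");
        props.setProperty("database.dbname", "inventory");
        // Logical name used as a prefix for the emitted change event streams.
        props.setProperty("topic.prefix", "demo");
        return props;
    }
}
```

With the `debezium-embedded` artifact and the PostgreSQL connector on the classpath, properties like these are what the Engine builder is given. Check the Debezium Engine documentation for the exact builder calls in your version.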

And the last component in the Debezium ecosystem: the Quarkus extensions. Many of you already know Quarkus: it is a Java framework for building web applications, with the main purpose of creating native Java applications. Debezium now offers a series of Quarkus extensions that let you do CDC inside your Quarkus applications.

There are many benefits. The main one is that you can avoid a full Kafka infrastructure and work only with your Quarkus application. At the same time, Debezium supports native builds for fast startup and a low memory footprint. Last but not least, you get a set of APIs and annotations that help you develop in a friendly way.

There are many use cases, such as the ones you saw a few slides ago, like data migration. But for sure the natural fit for the extensions is monolith-to-microservices patterns such as the strangler fig pattern, in which you move responsibilities from one service to another. That implies not only moving data from one system to another, but also keeping it aligned in different forms, such as data, caches, and indexes.
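
Keeping a cache aligned with the source data can be sketched as a small change handler. This is an assumption-laden illustration in plain Java, not the extension's API: `CacheInvalidator` and `onChange` are hypothetical names, and the operation codes again follow the `c`/`u`/`d` convention.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class CacheInvalidator {
    // Stand-in for an application-level cache keyed by row id.
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    void put(String key, String value) {
        cache.put(key, value);
    }

    boolean contains(String key) {
        return cache.containsKey(key);
    }

    // Called once per captured change event.
    void onChange(String op, String key) {
        if (op.equals("u") || op.equals("d")) {
            // The cached value is stale: evict it so the next read reloads from the source.
            cache.remove(key);
        }
        // "c" (insert) needs no eviction: a brand-new row cannot be cached yet.
    }
}
```

Evicting instead of rewriting the entry keeps the handler simple, at the cost of one extra read from the source after each change.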

To show you a quick demo of the Debezium Quarkus extension, I'll use JBang, a lightweight tool that lets you run Java as easily as a scripting language. With JBang you can run Java code as scripts, with automatic dependency and JDK management.
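
For readers who have not seen one, a JBang script is an ordinary Java file with a few special comment lines at the top. The sketch below is not the demo from the talk. `CdcDemo` and `describe` are hypothetical names, and the script uses only the JDK, so it declares no dependencies; in a real script those go on `//DEPS group:artifact:version` lines.

```java
///usr/bin/env jbang "$0" "$@" ; exit $?
//JAVA 17+

import java.util.List;

class CdcDemo {
    // Hypothetical stand-in for real work: format a captured change for printing.
    static String describe(String op, String table) {
        return op + " on " + table;
    }

    public static void main(String... args) {
        // Print one line per kind of change a CDC pipeline would capture.
        for (String op : List.of("insert", "update", "delete")) {
            System.out.println(describe(op, "customers"));
        }
    }
}
```

Saved as `CdcDemo.java`, this can be run with `jbang CdcDemo.java`. The first line also lets it run directly as an executable script on Unix-like systems.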