When to Use Debezium for Database Replication

March 4, 2021

What Is Debezium?

Debezium is open source change data capture (CDC) software that records the changes committed to your database and publishes them as messages to a Kafka cluster. From there, Kafka Connect is typically used to propagate those changes to supported sinks (targets). Another implementation, currently under development, is Debezium Server, which can deliver messages to a variety of pub/sub targets.
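To make the shape of those messages concrete, here is a minimal Python sketch that reads one change event and summarizes it. It assumes the JSON envelope Debezium emits (a payload with before, after, source, op and ts_ms fields); the sample event, the table and column names, and the describe_event helper are all hypothetical and exist only for illustration.

```python
import json

# Hypothetical change event shaped like a Debezium envelope.
# All values (table, columns, lsn, timestamp) are made up for illustration.
RAW_EVENT = json.dumps({
    "payload": {
        "before": None,
        "after": {"id": 42, "email": "a@example.com"},
        "source": {"connector": "postgresql", "table": "customers", "lsn": 1001},
        "op": "c",
        "ts_ms": 1614816000000,
    }
})

# Debezium operation codes: create, update, delete, snapshot read.
OP_NAMES = {"c": "insert", "u": "update", "d": "delete", "r": "snapshot read"}


def describe_event(raw: str) -> dict:
    """Summarize one change event: which table, which operation, which row."""
    event = json.loads(raw)
    # Depending on converter settings the envelope may be nested under
    # "payload" or sit at the top level; accept both.
    payload = event.get("payload", event)
    op = payload["op"]
    # Deletes carry the old row in "before"; other operations use "after".
    row = payload["before"] if op == "d" else payload["after"]
    return {
        "table": payload["source"]["table"],
        "operation": OP_NAMES.get(op, op),
        "row": row,
    }


summary = describe_event(RAW_EVENT)
print(summary)
```

In a real pipeline the raw string would come from a Kafka consumer rather than a constant; the parsing step is the part this sketch is meant to show.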

Debezium With Kafka

As mentioned above, the recommended implementation and current standard practice for Debezium is to send change events to a Kafka cluster, which brokers the messages for downstream consumers. This setup requires technical teams on hand who can implement these technologies and understand how to use Kafka to apply your transactions. A common technical challenge is working out how to apply transactions to your sink/target database. Kafka preserves message order within a partition, but it only partially supports idempotence, so recovery from an unexpected connection failure is only partially covered. After a failure, Debezium attempts to resume from the state it saved before the crash, but it errs on the side of capturing every record; the Debezium documentation indicates that these failures can produce duplicate records, which downstream consumers must account for.
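One common way to account for duplicate records is to make the sink idempotent, so that applying the same event twice has no further effect. The Python sketch below is one possible approach, not Debezium's own mechanism: it assumes each event carries a primary key named id and a monotonically increasing log position in source.lsn, and it uses an in-memory dictionary as a stand-in for the target table. The IdempotentSink class and the sample events are hypothetical.

```python
from typing import Dict


class IdempotentSink:
    """In-memory stand-in for a target table keyed by primary key."""

    def __init__(self) -> None:
        self.rows: Dict[int, dict] = {}
        # Highest log position already applied for each key.
        self.last_lsn: Dict[int, int] = {}

    def apply(self, event: dict) -> bool:
        """Apply one change event; return False if it was a duplicate."""
        op = event["op"]
        lsn = event["source"]["lsn"]
        # Deletes carry the old row in "before"; other operations use "after".
        row = event["before"] if op == "d" else event["after"]
        key = row["id"]  # assumption: the primary key column is "id"

        # Skip anything at or below the position already applied for this key.
        if lsn <= self.last_lsn.get(key, -1):
            return False

        if op == "d":
            self.rows.pop(key, None)
        else:
            self.rows[key] = row  # upsert: inserts and updates look the same
        self.last_lsn[key] = lsn
        return True


# Hypothetical event stream in which the last two events are redelivered
# after a simulated connection failure.
events = [
    {"op": "c", "before": None, "after": {"id": 1, "status": "pending"}, "source": {"lsn": 100}},
    {"op": "u", "before": None, "after": {"id": 1, "status": "shipped"}, "source": {"lsn": 101}},
    {"op": "u", "before": None, "after": {"id": 1, "status": "shipped"}, "source": {"lsn": 101}},
    {"op": "c", "before": None, "after": {"id": 1, "status": "pending"}, "source": {"lsn": 100}},
]

sink = IdempotentSink()
applied = [sink.apply(e) for e in events]
print(applied)
print(sink.rows)
```

Against a real database, the same idea is usually expressed as an upsert guarded by a stored log position, so the check and the write happen in one transaction.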

Debezium for Postgres and Other Databases

Debezium is a workable solution if you have the technical bandwidth to manage the implementation, configuration and downstream application of parsed database events. In many cases, however, your teams may not have the bandwidth to contribute to the open source software, develop customized support for your use cases, and reconfigure the solution whenever routine production changes alter the schema. In addition to not handling schema drift, Debezium wasn’t created to migrate databases, so your team will need to evaluate alternative solutions to bulk load your existing data without interrupting ongoing production processes.

What Is Fivetran?

Fivetran automated data integration delivers zero-configuration connectors that dynamically adapt as schemas and APIs change, ensuring reliable data access. Fivetran continuously synchronizes data from source to warehouse, and accelerates data analysis by programmatically managing ready-to-query schemas and automating in-warehouse transformations.

Fivetran natively supports database sources and popular data warehouses, which drastically reduces setup time and shortens time-to-value. You’ll never need to manage any hosting infrastructure or configuration beyond user permissioning, and events such as unexpected crashes and schema drift are handled automatically.

Fivetran also comes with a free 14-day trial, and the trial counter doesn’t start until you’ve completed your initial historical sync, so we recommend signing up as soon as possible to familiarize yourself with our application.

Alternatively, if you’d like to explore best practices with us prior to starting your trial, we’re always available to present a demo that includes how to make the best use of your trial.