-
Notifications
You must be signed in to change notification settings - Fork 16.8k
Description
Description
Using Airflow 3.1.8
Hi all,
I've been experimenting with Kafka Message Queue Trigger. As in CDC Debezium -> Kafka -> AssetWatcher + Kafka Message Queue Trigger -> AssetEvent -> DAG
It's somewhat disheartening. One day in, I noticed that AssetEvent table and Kafka's _consumer_offsets are starting to diverge. Stale AssetEvent rows in Airflow's Postgres Metastore were separate from Kafka offsets. Resetting Kafka consumer offsets had no effect on these already-queued events. Each stale event consumed a DAG run, blocking the next actual event from being processed within the timeout. It's likely caused by a network partition.
Is this a low probability scenario, or is this a practical concern? Most business would require Strict Exactly-Once. My understanding is that this is not a matter of If, but how to handle the fallout when it happens. Is there a way to improve upon this, any ideas?
Use case/motivation
A way to detect drift or divergence and a contingency/approach to re-align with Kafka.
Related issues
No response
Are you willing to submit a PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct