Skip to content

KafkaMessageQueueTrigger resulting AssetEvent diverges from _consumer_offsets #64205

@spencerhuang

Description

@spencerhuang

Description

Using Airflow 3.1.8

Hi all,

I've been experimenting with Kafka Message Queue Trigger. As in CDC Debezium -> Kafka -> AssetWatcher + Kafka Message Queue Trigger -> AssetEvent -> DAG

It's somewhat disheartening. One day in, I noticed that AssetEvent table and Kafka's _consumer_offsets are starting to diverge. Stale AssetEvent rows in Airflow's Postgres Metastore were separate from Kafka offsets. Resetting Kafka consumer offsets had no effect on these already-queued events. Each stale event consumed a DAG run, blocking the next actual event from being processed within the timeout. It's likely caused by a network partition.

Is this a low probability scenario, or is this a practical concern? Most business would require Strict Exactly-Once. My understanding is that this is not a matter of If, but how to handle the fallout when it happens. Is there a way to improve upon this, any ideas?

Use case/motivation

A way to detect drift or divergence and a contingency/approach to re-align with Kafka.

Related issues

No response

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!

Code of Conduct

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions