Add retry logic for transient login failures#918

Open
naushkorai wants to merge 1 commit into docker:master from naushkorai:retry_auth

Conversation


@naushkorai naushkorai commented Jan 30, 2026

Adds a configurable retry mechanism with basic exponential backoff to handle intermittent failures when authenticating to container registries, particularly GCP (GAR/GCR), where I'm seeing intermittent errors.

- Add `retry-attempts` input (default: `0` for backward compatibility, making it opt-in)
- Add `retry-delay` input (default: `5000` ms)
- Implement exponential backoff retry logic in `docker login`
  - Chose to write a simple retry function rather than pull in a library
- Retry all errors except 5xx responses
  - I'm seeing intermittent 401 failures
- Add tests for retry behavior
- Update README with the new input parameters

Attempting to address #885
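For reviewers, the shape of the retry described above can be sketched as a small helper. This is a hypothetical sketch, not the PR's actual code: the names `withRetry`, `retryAttempts`, and `retryDelayMs` are assumptions, and the 5xx filter is simplified to a caller-supplied predicate.

```typescript
// Hypothetical sketch of a simple retry helper with exponential backoff.
// `shouldRetry` lets the caller skip retries for some errors (e.g. 5xx).
async function withRetry<T>(
  fn: () => Promise<T>,
  retryAttempts: number,
  retryDelayMs: number,
  shouldRetry: (err: unknown) => boolean = () => true
): Promise<T> {
  let lastError: unknown;
  // One initial attempt plus up to `retryAttempts` retries;
  // retryAttempts = 0 preserves the current single-attempt behavior.
  for (let attempt = 0; attempt <= retryAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt === retryAttempts || !shouldRetry(err)) {
        break;
      }
      // Exponential backoff: base delay doubles on each retry.
      const delay = retryDelayMs * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  throw lastError;
}
```

With `retry-attempts: 3` and `retry-delay: 5000`, this would wait roughly 5 s, 10 s, then 20 s between attempts before giving up.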


Signed-off-by: Naush Korai <naush.korai@mixpanel.com>
@GabrielBianconi

I'd love this supported as well.

@GabrielBianconi

@thaJeztah Sorry for pinging directly: we're having a lot of transient failures from Docker Hub on this action, which is degrading our CI. Retry logic like this would be super helpful, and I'm sure many other users would benefit.

Here's an example from our open-source repo. We're having flakiness due to timeouts today:

https://github.com/tensorzero/tensorzero/actions/runs/22406824746/job/64868963743?pr=6569

I'm not sure if this is exactly the solution you'll want. Our team is happy to contribute as well if you'd like a different approach.

