fix: improve e2e observability test resilience and diagnostics #641

Draft
Hweinstock wants to merge 1 commit into aws:main from Hweinstock:fix/e2e-observability-flakiness

@Hweinstock
Contributor

Description

The observability e2e tests (logs, logs --level, traces list) are failing consistently in CI across all suites. Root cause: the commands return exit code 1 when CloudWatch log groups or transaction search data are not yet available, but the tests either lack retry or have insufficient retry windows for the CI environment (12 parallel suites hitting CloudWatch simultaneously).

Three fixes:

  1. Error messages now include stdout — Ink renders errors to stdout, not stderr. Previously error messages only showed stderr (empty), making CI failures impossible to diagnose.
  2. The "logs supports level filtering" test now has retry. It was the only observability test without retry, so it failed immediately if the log group was not yet available.
  3. Retry window increased from 3×15s to 5×20s — the previous 45s max was too tight for CloudWatch propagation with 12 parallel deploys. New window is 100s max, within the 120s test timeout.
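The retry behaviour in fixes 2 and 3 can be sketched roughly as follows. This is a minimal illustration, not the helper used in this repository: `retry`, `RetryOptions`, `retryWindowMs`, and the injectable `sleep` are hypothetical names. It assumes the window is counted as attempts × delay, as in the description above.

```typescript
// Hypothetical retry helper; names and option shape are illustrative only.
interface RetryOptions {
  attempts: number; // total number of tries
  delayMs: number; // pause between tries
  sleep?: (ms: number) => Promise<void>; // injectable so tests need not wait
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Conservative upper bound on waiting time (attempts x delay).
// Time spent running the command itself is not included.
function retryWindowMs(attempts: number, delayMs: number): number {
  return attempts * delayMs;
}

// Runs fn until it succeeds or the attempts are used up,
// then rethrows the last error so the test still fails loudly.
async function retry<T>(fn: () => Promise<T>, opts: RetryOptions): Promise<T> {
  const sleep = opts.sleep ?? defaultSleep;
  let lastError: unknown;
  for (let attempt = 1; attempt <= opts.attempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // No point waiting after the final failed attempt.
      if (attempt < opts.attempts) {
        await sleep(opts.delayMs);
      }
    }
  }
  throw lastError;
}

// Old window: 3 x 15s = 45s. New window: 5 x 20s = 100s, under the 120s timeout.
const oldWindowMs = retryWindowMs(3, 15_000);
const newWindowMs = retryWindowMs(5, 20_000);
```

Because this sketch skips the wait after the last attempt, the actual waiting time is at most (attempts − 1) × delay. The attempts × delay figure is therefore a safe bound to compare against the test timeout.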

Related Issue

Closes #

Documentation PR

N/A

Type of Change

  • Bug fix
  • New feature
  • Breaking change
  • Documentation update
  • Other (please describe):

Testing

How have you tested the change?

  • I ran npm run test:unit and npm run test:integ
  • I ran npm run typecheck
  • I ran npm run lint
  • If I modified src/assets/, I ran npm run test:update-snapshots and committed the updated snapshots

The observability tests pass consistently in local dev account runs. The CI failures should be resolved by the increased retry window and the addition of retry to the level filter test. If failures persist, the improved error messages will now show the actual CLI error output for diagnosis.
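One way to build an assertion message that includes both streams is sketched below. `CliResult` and `describeFailure` are hypothetical names, and the sample error text is invented for illustration. The real test helper in this repository may be shaped differently.

```typescript
// Hypothetical shape of a captured CLI run; not the project's actual type.
interface CliResult {
  exitCode: number;
  stdout: string;
  stderr: string;
}

// Builds a failure message that shows stdout as well as stderr, since
// Ink-rendered errors land on stdout and stderr is often empty.
function describeFailure(command: string, result: CliResult): string {
  const section = (label: string, text: string): string => {
    const trimmed = text.trim();
    return `${label}:\n${trimmed === "" ? "(empty)" : trimmed}`;
  };
  return [
    `${command} exited with code ${result.exitCode}`,
    section("stdout", result.stdout),
    section("stderr", result.stderr),
  ].join("\n");
}

// Illustrative input only: the error text is made up for this example.
const sampleMessage = describeFailure("logs", {
  exitCode: 1,
  stdout: "Error: log group not found\n",
  stderr: "",
});
```

Labelling an empty stream as `(empty)` makes it clear in CI output that nothing was written there, rather than leaving the reader to wonder whether the output was dropped.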

Checklist

  • I have read the CONTRIBUTING document
  • I have added any necessary tests that prove my fix is effective or my feature works
  • I have added an appropriate example to the documentation to outline the feature, or no new docs are needed
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the
terms of your choice.

@Hweinstock Hweinstock requested a review from a team March 25, 2026 13:46
@github-actions github-actions bot added the size/s PR size: S label Mar 25, 2026
@Hweinstock Hweinstock marked this pull request as draft March 25, 2026 14:23
@Hweinstock Hweinstock force-pushed the fix/e2e-observability-flakiness branch from 3fd15aa to 421a4e8 Compare March 25, 2026 14:26
@github-actions github-actions bot added size/s PR size: S and removed size/s PR size: S labels Mar 25, 2026