Fix StateChangedAt mismatch during container recovery from storage#14482
Conversation
…At mismatch during container recovery
There was a problem hiding this comment.
Pull request overview
Fixes intermittent ContainerRecoveryFromStorage failures by aligning StateChangedAt between live stop/event transitions and recovery by sourcing the timestamp from Docker’s FinishedAt when available.
Changes:
- Extended
WSLAContainerImpl::Transition()to accept an optionalstateChangedAtoverride. - Updated stop and stop-event paths to
InspectContainer()and pass Docker’sFinishedAtintoTransition(). - Preserved existing behavior by defaulting to
std::time(nullptr)when Docker’s timestamp is unavailable/unparseable.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
src/windows/wslasession/WSLAContainer.h |
Adds optional stateChangedAt parameter to Transition() to allow caller-provided timestamps. |
src/windows/wslasession/WSLAContainer.cpp |
Populates stateChangedAt from Docker FinishedAt in stop paths; updates Transition() to use the override. |
You can also share your feedback on Copilot code review. Take the survey.
There was a problem hiding this comment.
Pull request overview
This PR aims to eliminate a flaky ContainerRecoveryFromStorage test failure by making StateChangedAt use a consistent time source between the “live” stop path and the recovery (Open()) path.
Changes:
- Extend
ContainerEventTrackercallbacks to include aneventTimeextracted from Docker events. - Add an optional
stateChangedAtparameter toWSLAContainerImpl::Transition()and introduceGetDockerFinishedAt()to source Docker’sFinishedAt. - Update container/process event handlers to accept the new callback signature and propagate timestamps into state transitions.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| src/windows/wslasession/WSLAProcessControl.h | Updates process control event handler signatures to accept event time. |
| src/windows/wslasession/WSLAProcessControl.cpp | Wires updated callbacks through std::bind and ignores unused eventTime. |
| src/windows/wslasession/WSLAContainer.h | Extends Transition() to accept an optional timestamp; adds helper declaration. |
| src/windows/wslasession/WSLAContainer.cpp | Uses Docker FinishedAt on explicit Stop(); uses event-provided time on stop notifications. |
| src/windows/wslasession/ContainerEventTracker.h | Extends callback type to include a timestamp. |
| src/windows/wslasession/ContainerEventTracker.cpp | Extracts time from Docker event JSON and forwards it to callbacks. |
…hangedat-timestamp-consistency
| } | ||
|
|
||
| Transition(WslaContainerStateExited); | ||
| Transition(WslaContainerStateExited, GetDockerFinishedAt()); |
There was a problem hiding this comment.
Hmm doing this might be a bit risky, because if the session is terminating, this call could fail and then we wouldn't call ReleaseRuntimeResources().
Here's an alternative idea: We could rethink Stop() to always wait for the stop notification to be triggered, that way we always get the correct stop timestamp. We might need to rethink the locking story a bit though, because OnEvent currently acquires m_lock as well
The easiest way I can think of would be have a waitable "stop state". Something like this:
std::mutex m_stopStateLock;
std::optional<std::promise<uint64_t>> m_stopState;
Then in Stop() we could do something like this:
auto lock = m_lock.lock_exclusive();
{
std::mutex m_stopStateLock;
m_stopState.emplace();
}
// Perform the stop
[..]
auto stopTimestamp = m_stopState.get_future().wait_for(...)
Transition(WslaContainerStateExited, topTimestamp);
With this, OnEvent() can check if m_stopState is set and just set the value if so (since there's a Stop call() in progress) without having to acquire m_lock
Let me know if that makes sense
Summary of the Pull Request
Fixed a flaky test failure in ContainerRecoveryFromStorage where StateChangedAt differed by 1 second between the live and recovery paths. The root cause was that Transition() used std::time(nullptr) (host clock) while Open() restored from Docker's FinishedAt (VM clock), producing different values.
PR Checklist
Detailed Description of the Pull Request / Additional comments
When a container is stopped, Transition() recorded m_stateChangedAt using std::time(nullptr) (the Windows host clock). When the container was later recovered via Open(), the timestamp was restored from Docker's FinishedAt field (the VM clock). These two clocks could differ by ~1 second, causing the ContainerRecoveryFromStorage test assertion AreEqual(containers[0].StateChangedAt, originalStateChangedAt) to fail intermittently.
Validation Steps Performed
Ran ContainerRecoveryFromStorage test repeatedly, no longer fails.