Skip to content

Commit e1d59ff

Browse files
dahongliDahong Li
andauthored
Fix race between two phase algorithm preSuspendBarrierand commit_begin
There is race between two phase algorithm preSuspendBarrier() and two phase commit commit_begin(). T1 thread #1: setCkptPending() T2 thread mpickpt#2: checks isCkptPending() then stop(comm) T3 thread mpickpt#2: stop(comm) sets state to PHASE_1 T4 thread mpickpt#2: while isCkptPending() loop T5 thread #1 calls waitForNewStateAfter(PHASE_1) This change adds a timeout when waiting state change. When the preSuspend wait for PHASE_1 state change times out, it would set response to FREE_PASS. Co-authored-by: Dahong Li <root@cori.nersc.gov>
1 parent e3118a6 commit e1d59ff

2 files changed

Lines changed: 20 additions & 7 deletions

File tree

mpi-proxy-split/two-phase-algo.cpp

Lines changed: 19 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -230,8 +230,10 @@ TwoPhaseAlgo::preSuspendBarrier(query_t query)
230230
case INTENT:
231231
setCkptPending();
232232
if (getCurrState() == PHASE_1) {
233-
// If in PHASE_1, wait for us to finish doing IN_CS
234-
if (waitForNewStateAfter(PHASE_1) != IN_CS) {
233+
st = waitForNewStateAfter(PHASE_1, 100 /* timeout ms*/);
234+
if (st == PHASE_1) break;
235+
// Wait for us to finish doing IN_CS
236+
if (st != IN_CS) {
235237
break;
236238
} else {
237239
while (waitForNewStateAfter(IN_CS) != IN_CS);
@@ -299,24 +301,35 @@ TwoPhaseAlgo::stop(MPI_Comm comm)
299301
// INVARIANT: Ckpt should be pending when we get here
300302
JASSERT(isCkptPending());
301303
setCurrState(PHASE_1);
302-
303304
while (isCkptPending() && !phase1_freepass) {
304305
sleep(1);
305306
}
306307
phase1_freepass = false;
307308
}
308309

309310
phase_t
310-
TwoPhaseAlgo::waitForNewStateAfter(phase_t oldState)
311+
TwoPhaseAlgo::waitForNewStateAfter(phase_t oldState, int timeout_ms)
311312
{
312313
lock_t lock(_phaseMutex);
313-
// The user thread will notify us if transition to any of these states
314-
_phaseCv.wait(lock, [this, oldState]
314+
if (timeout_ms) {
315+
// Since _phaseCv is a condition var, it should be prepared for spurious wakeups.
316+
// So, waking it up on a timeout is harmless.
317+
_phaseCv.wait_for(lock, std::chrono::milliseconds(timeout_ms),
318+
[this, oldState]
315319
{ return _currState != oldState &&
316320
( _currState == IN_TRIVIAL_BARRIER ||
317321
_currState == PHASE_1 ||
318322
_currState == IN_CS ||
319323
_currState == IS_READY); });
324+
} else {
325+
// The user thread will notify us if transition to any of these states
326+
_phaseCv.wait(lock, [this, oldState]
327+
{ return _currState != oldState &&
328+
( _currState == IN_TRIVIAL_BARRIER ||
329+
_currState == PHASE_1 ||
330+
_currState == IN_CS ||
331+
_currState == IS_READY); });
332+
}
320333
return _currState;
321334
}
322335

mpi-proxy-split/two-phase-algo.h

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -162,7 +162,7 @@ namespace dmtcp_mpi
162162
void stop(MPI_Comm);
163163

164164
// Wait until the state is changed to a new state
165-
phase_t waitForNewStateAfter(phase_t oldState);
165+
phase_t waitForNewStateAfter(phase_t oldState, int timeout_ms=0);
166166

167167
// Sends the given message 'msg' (along with the given 'extraData') to
168168
// the coordinator

0 commit comments

Comments
 (0)