2020-09-30

Why/When does calling a context cancel function from another goroutine cause a deadlock?

I'm having trouble understanding context cancel functions, and in particular under what circumstances calling a cancel func causes a deadlock.

I have a main function that creates a cancellable context and passes its cancel function to two goroutines:

ctx := context.Background()
ctx, cancel := context.WithCancel(ctx)
go runService(ctx, wg, cancel, apiChan)
go api.Run(cancel, wg, apiChan, aviorDb)

I use this context in a service function (an infinite loop that stops once the context is cancelled), and I stop the loop by calling the cancel function from another goroutine. runService is a long-running operation and looks similar to this:

func runService(ctx context.Context, wg *sync.WaitGroup, cancel context.CancelFunc, apiChan chan string) {
MainLoop:
    for {
        // this is the long running operation
        worker.ProcessJob(dataStore, client, job, resumeChan)

        select {
        case <-ctx.Done():
            _ = glg.Info("service stop signal received")
            break MainLoop
        default:
        }
        select {
        case <-resumeChan:
            continue
        default:
        }
        waitCtx, cancel := context.WithTimeout(context.Background(), time.Duration(sleepTime)*time.Minute)
        globalstate.WaitCtxCancel = cancel
        <-waitCtx.Done()

    }
    _ = dataStore.SignOutClient(client)
    apiChan <- "stop"
    wg.Done()
    cancel()
}

The api package has a package-level variable for the context cancel function:

var appCancel context.CancelFunc

It is set at startup by the api.Run function like so:

func Run(cancel context.CancelFunc, wg *sync.WaitGroup, stopChan chan string, db *db.DataStore) {
    ...
    appCancel = cancel
    ...
}

The api package has a stop handler which calls the cancel function:

func requestStop(w http.ResponseWriter, r *http.Request) {
    _ = glg.Info("endpoint hit: shut down service")
    if globalstate.WaitCtxCancel != nil {
        globalstate.WaitCtxCancel()
    }
    state := globalstate.Instance()
    state.ShutdownPending = true

    appCancel()
    encoder := json.NewEncoder(w)
    encoder.SetIndent("", " ")
    _ = encoder.Encode("stop signal received")
}

When the requestStop function is called and the context is thus cancelled, the long-running operation (worker.ProcessJob) immediately halts and the entire program deadlocks. Before its next line of code is executed, the goroutine ends up in gopark with reason waitReasonSemAcquire.

The context cancel function is only called in these two locations, so it seems like the other goroutine (runService) can't acquire a lock for some reason.

My understanding up to now was that a cancel function can be passed around to different goroutines and that calling it raises no synchronization issues.

For example, the WaitCtxCancel function never causes a deadlock when I call it.

I could

  • replace the context with a 1-buffered channel and send a message to break out of the loop
  • use my global state struct and a boolean

to determine whether the loop should keep running.

However, I want to understand what's happening here and why. Also, is there any solution or approach I could use using contexts? It seemed like the "correct" thing to use for use cases like mine.



from Recent Questions - Stack Overflow https://ift.tt/3jj0uim
