Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-58029

Exceptions thrown while processing notified cookies can leave mcbp connection "stuck"

    XMLWordPrintable

Details

    • Untriaged
    • 0
    • Unknown
    • KV 2023-4

    Description

      If an exception is thrown during Connection::processNotifiedCookie() while running Cookie::execute(), then the Connection can fail to be notified (via libevent) and hence can end up in a "stuck" state if there is not already read event enabled. For example, this was observed occurring via STAT("dcp") command when one of the DCP connections had a too long connection name:

      In this scenario, running dcp stats in a background task combined with MB-34280 is causing KV to repeatedly attempt dcp stats.

      The background task gathers the stats, but does not write the actual responses to the network. MB-34280 will not be hit yet, as nothing is checking the key len.

      The task completes, and informs the frontend that it should execute the cookie again.

      It reaches:

      void Connection::processNotifiedCookie(Cookie& cookie, cb::engine_errc status) {
          try {
              cookie.setAiostat(status);
              cookie.setEwouldblock(false);
              if (cookie.execute()) {
                  // completed!!! time to clean up after it and process the
                  // command pipeline? / schedule more?
                  if (cookies.front().get() == &cookie) {
                      cookies.front()->reset();
                  } else {
                      cookies.erase(
                              std::remove_if(cookies.begin(),
                                             cookies.end(),
                                             [ptr = &cookie](const auto& cookie) {
                                                 return ptr == cookie.get();
                                             }),
                              cookies.end());
                  }
                  triggerCallback();
              }
          } catch (const std::exception& e) {
              LOG_CRITICAL("Connection::processNotifiedCookie got exception: {}",
                           e.what());
          }
      }
      

      And throws an exception in execute() when it now actually tries to write the responses out. Note that it likely succeeds in writing some number of responses before reaching the problematic ones.

      This leaves a valid, non-empty, non-ewouldblock cookie - and note it also skips calling triggerCallback() which would normally trigger a libevent read event to cause the Connection to be scheduled when a new request comes in.

      (Credit to James Harrison for the original description from linked CBSE).

      I believe we should change the catch above to force closure of the connection in this case, given the state machine is in an unknown state.

      Attachments

        Issue Links

          No reviews matched the request. Check your Options in the drop-down menu of this sections header.

          Activity

            People

              owend Daniel Owen
              drigby Dave Rigby (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Gerrit Reviews

                  There are no open Gerrit changes

                  PagerDuty