Couchbase Server / MB-31495

KVBucket::getRandomKey will hang 0.1% of the time for a node with active, empty VBs


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: master, 4.6.5, 5.5.0, 5.5.1, 5.1.2, 5.5.2
    • Fix Version/s: 6.0.0
    • Component/s: couchbase-bucket
    • Labels:
      None
    • Triage:
      Untriaged
    • Is this a Regression?:
      No

      Description

      Summary

      There is a bug in the getRandomKey functionality that can cause it to get stuck and never return a response to the consumer.
      The bug has existed since 2013; however, since the introduction of the query workbench, which uses the getRandomKey functionality, it has been appearing more often.

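      The bug only appears when the node has vbuckets that are not active (for example, replica vbuckets) and none of its active vbuckets can return a key, for example because they are all empty.
      Furthermore, the vbucket at which the search for a random key starts must be vbucket 0. Since start = random() % max and max is 1024, this happens with probability 1/1024 (about 0.1%).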

      The main user-visible symptom is one (or more) memcached front-end threads spinning at 100% CPU utilisation.

      Details
      The bug is in KVBucket::getRandomKey:

      GetValue KVBucket::getRandomKey() {
          VBucketMap::id_type max = vbMap.getSize();

          const long start = random() % max;
          long curr = start;
          std::unique_ptr<Item> itm;

          while (itm == NULL) {
              VBucketPtr vb = getVBucket(curr++);
              while (!vb || vb->getState() != vbucket_state_active) {
                  if (curr == start) {
                      return GetValue(NULL, ENGINE_KEY_ENOENT);
                  }
                  if (curr == max) {
                      curr = 0;
                  }

                  vb = getVBucket(curr++);
              }

              if ((itm = vb->ht.getRandomKey(random()))) {
                  GetValue rv(std::move(itm), ENGINE_SUCCESS);
                  return rv;
              }

              if (curr == max) {
                  curr = 0;
              }

              if (curr == start) {
                  return GetValue(NULL, ENGINE_KEY_ENOENT);
              }
              // Search next vbucket
          }

          return GetValue(NULL, ENGINE_KEY_ENOENT);
      }
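
      The loop above can be exercised in isolation to see which start positions hang. The sketch below is a simplified model, not kv_engine code: FakeVBucket, Outcome, originalSearch and countHangingStarts are hypothetical names, a missing vbucket (!vb) is modelled as a non-active one, start is passed in instead of coming from random(), and a call budget turns a hang into Outcome::Hung rather than a spinning thread. Apart from the budget checks, it follows the original control flow.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a vbucket, reduced to the two properties the
// search looks at. A missing vbucket (!vb in the real code) is modelled as
// a non-active one.
struct FakeVBucket {
    bool active;
    bool hasItems;
};

enum class Outcome { Found, NotFound, Hung };

// Same control flow as the original KVBucket::getRandomKey, except that
// `start` is passed in rather than drawn from random(), and the search
// gives up after `budget` getVBucket calls so that a hang is observable.
Outcome originalSearch(const std::vector<FakeVBucket>& vbs, long start,
                       long budget) {
    const long max = static_cast<long>(vbs.size());
    assert(start >= 0 && start < max);
    long curr = start;
    long calls = 0;
    auto getVBucket = [&](long id) -> const FakeVBucket& {
        ++calls;
        return vbs[id];
    };

    while (calls < budget) {
        const FakeVBucket* vb = &getVBucket(curr++);
        while (!vb->active) {
            if (curr == start) {  // checked BEFORE the wrap (the bug)
                return Outcome::NotFound;
            }
            if (curr == max) {
                curr = 0;
            }
            if (calls >= budget) {
                return Outcome::Hung;  // the real loop would keep spinning
            }
            vb = &getVBucket(curr++);
        }
        if (vb->hasItems) {
            return Outcome::Found;
        }
        if (curr == max) {
            curr = 0;
        }
        if (curr == start) {
            return Outcome::NotFound;
        }
    }
    return Outcome::Hung;
}

// Counts the start positions for which the search never terminates.
long countHangingStarts(const std::vector<FakeVBucket>& vbs) {
    const long max = static_cast<long>(vbs.size());
    long hung = 0;
    for (long start = 0; start < max; ++start) {
        if (originalSearch(vbs, start, 4 * max) == Outcome::Hung) {
            ++hung;
        }
    }
    return hung;
}
```

      With 1024 vbuckets where the even-numbered ones are active and empty and the odd-numbered ones are replicas, countHangingStarts returns 1 (only start == 0 hangs), i.e. 1 in 1024 searches, in line with the roughly 0.1% in the title. With every vbucket active and empty it returns 0: the hang also needs the wrap from max back to 0 to happen inside the inner loop, which only occurs when the last vbucket is not active.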
      

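      The bug is that, in the inner loop, we check for curr == start before checking if curr == max. When curr wraps to 0, the next getVBucket(curr++) immediately advances it to 1, so the curr == start check never sees the value 0.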

      The issue shows up when we have an active but empty vbucket and start == 0.
      Focusing on the inner while loop:

              VBucketPtr vb = getVBucket(curr++);
              while (!vb || vb->getState() != vbucket_state_active) {
                  if (curr == start) {
                      return GetValue(NULL, ENGINE_KEY_ENOENT);
                  }
                  if (curr == max) {
                      curr = 0;
                  }

                  vb = getVBucket(curr++);
              }
      

      Recalling that the bucket is empty, and taking the case where vbucket 0
      is not active, the first time we enter the inner while loop curr == 1.
      curr != start and curr != max, so we call getVBucket and increment curr
      to 2. We repeat until curr == 1024 (an active but empty vbucket along
      the way only hands control to the outer loop, which finds no key and
      carries on from the next vbucket). On going round the while loop again,
      curr != start but curr == max, so we set curr = 0.

      We then call getVBucket and increment curr to 1. On going round the
      while loop again, curr != start (it is 1, not 0), and hence we loop
      indefinitely.
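
      Since the problem is the order of the two checks in the inner loop, one minimal remedy is to wrap curr back to 0 before comparing it with start. The sketch below applies that reordering to a simplified model of the loop (FakeVBucket, Outcome and reorderedSearch are hypothetical names, and a missing vbucket is treated as non-active); it illustrates the idea and is not the change that was actually committed to kv_engine.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a vbucket: a missing vbucket (!vb in the real
// code) is modelled as a non-active one.
struct FakeVBucket {
    bool active;
    bool hasItems;
};

enum class Outcome { Found, NotFound };

// The original search with the two inner-loop checks swapped: curr is
// wrapped back to 0 *before* it is compared with start, so a search that
// began at vbucket 0 is recognised as complete.
Outcome reorderedSearch(const std::vector<FakeVBucket>& vbs, long start) {
    const long max = static_cast<long>(vbs.size());
    assert(start >= 0 && start < max);
    long curr = start;

    while (true) {
        const FakeVBucket* vb = &vbs[curr++];
        while (!vb->active) {
            if (curr == max) {
                curr = 0;  // wrap first...
            }
            if (curr == start) {
                return Outcome::NotFound;  // ...then test for completion
            }
            vb = &vbs[curr++];
        }
        if (vb->hasItems) {
            return Outcome::Found;
        }
        if (curr == max) {
            curr = 0;
        }
        if (curr == start) {
            return Outcome::NotFound;
        }
    }
}
```

      Every vbucket fetch is now followed by a wrap-then-compare step, so the search returns after visiting at most max vbuckets whatever the start position. A counted loop over (start + i) % max for i from 0 to max - 1 is an alternative that removes the special-case wrap checks entirely.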


            Activity

            build-team Couchbase Build Team added a comment -

            Build couchbase-server-6.5.0-1407 contains kv_engine commit df99171 with commit message:
            Refactor: Simplify KVBucket::getRandomKey

            build-team Couchbase Build Team added a comment -

            Build couchbase-server-5.5.3-4002 contains kv_engine commit 298bd42 with commit message:
            [BP] MB-31548: Fix bug in getRandomKey

            build-team Couchbase Build Team added a comment -

            Build couchbase-server-6.0.0-1694 contains kv_engine commit 298bd42 with commit message:
            [BP] MB-31548: Fix bug in getRandomKey

            build-team Couchbase Build Team added a comment -

            Build couchbase-server-6.5.0-1447 contains kv_engine commit 298bd42 with commit message:
            [BP] MB-31548: Fix bug in getRandomKey

            lynn.straus Lynn Straus added a comment -

            reopened to remove 5.5.x candidate label as this is already fixed in 5.5.3.


              People

              • Assignee:
                owend Daniel Owen
                Reporter:
                owend Daniel Owen
              • Votes:
                0
                Watchers:
                10

                Dates

                • Created:
                  Updated:
                  Resolved:

                  Gerrit Reviews

                  There are no open Gerrit changes
