Couchbase Server / MB-30553

'Hash' memcached stat collection causes significant intra-cluster replication delay


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 3.1.6, 4.1.2, 4.5.1, 4.6.5, 5.0.1, 5.1.1, 5.5.0
    • Fix Version/s: 6.5.0
    • Component/s: couchbase-bucket
    • Security Level: Public
    • Triage: Untriaged
    • Flagged: Release Note
    • Is this a Regression?: No

      Description

      Collecting the hash statistic from memcached causes significant replication delay, which severely affects the response times of replicateTo requests.

      The suspected cause is the hash stat's use of HashTable::visitDepth() (the hash stat is the only caller of this method in the codebase), which locks inefficiently:

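      void HashTable::visitDepth(HashTableDepthVisitor &visitor) {
          if (valueStats.getNumItems() == 0 || !isActive()) {
              return;
          }
          size_t visited = 0;
          VisitorTracker vt(&visitors);

          for (int l = 0; l < static_cast<int>(mutexes.size()); l++) {
              // Mutex l is acquired once here and held while every bucket in
              // its stripe (l, l + mutexes.size(), ...) is walked below.
              LockHolder lh(mutexes[l]);
              for (int i = l; i < static_cast<int>(size); i+= mutexes.size()) {
                  size_t depth = 0;
                  StoredValue* p = values[i].get().get();
                  // ... (remainder of the excerpt not included in this report)
              }
          }
      }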

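      In this code, each mutex guards a stripe of hash 'buckets' (bucket i is protected by mutexes[i % mutexes.size()]). visitDepth() acquires a stripe's mutex once and holds it until every bucket in that stripe has been walked, rather than releasing it between buckets. Any other operation on a key that hashes into that stripe must therefore wait for the whole stripe to be visited.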

      This is a significant problem because, although the hash statistics are rarely needed, they are requested as part of every cbcollect_info run.

      Reproduction
      Below is a minimal asynchronous Java application that performs upserts with ReplicateTo.ONE (replicateTo=1):

      package com.matt;
       
      import com.couchbase.client.java.*;
      import com.couchbase.client.java.document.JsonDocument;
      import com.couchbase.client.java.document.json.JsonArray;
      import com.couchbase.client.java.document.json.JsonObject;
      import rx.Observable;
       
      import java.text.DateFormat;
      import java.text.SimpleDateFormat;
      import java.util.Date;
      import java.util.TimeZone;
      import java.util.concurrent.TimeUnit;
      import java.util.concurrent.TimeoutException;
       
       
      public class Main {
       
          public static void main(String... args) {
       
              // Initialize the Connection
              Cluster cluster = CouchbaseCluster.create("localhost");
              cluster.authenticate("matt.carabine", "correcthorsebatterystaple");
              AsyncBucket bucket = cluster.openBucket("default").async();
              // Create a JSON Document
              JsonObject arthur = JsonObject.create()
                      .put("name", "Arthur")
                      .put("email", "kingarthur@couchbase.com")
                      .put("interests", JsonArray.from("Holy Grail", "African Swallows"))
                      .put("lorem_ipsum", "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.");
       
              for (int i = 0; i < 100000000; i++) {
                  JsonDocument doc = JsonDocument.create("Doc::" + i, arthur);
                  Observable
                          .just(doc)
                          .flatMap(v -> bucket.upsert(v, ReplicateTo.ONE).timeout(1, TimeUnit.SECONDS))
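                          .forEach(document -> {
                                      // Success: nothing to do for this test
                                  }, error -> {
                                      if (error instanceof TimeoutException) {
                                          // Print the time of each timeout in UTC; the unquoted 'Z' pattern letter prints the offset (+0000)
                                          TimeZone tz = TimeZone.getTimeZone("UTC");
                                          DateFormat df = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSZ");
                                          df.setTimeZone(tz);
                                          String nowAsISO = df.format(new Date());
                                          System.out.println(nowAsISO);
                                      } else {
                                          error.printStackTrace();
                                      }
                                  }
                          );
                  // Sleep about 2 ms between submissions to limit the request rate
                  try {
                      Thread.sleep(2);
                  } catch (InterruptedException e) {
                      // Restore the interrupt flag and stop issuing upserts
                      Thread.currentThread().interrupt();
                      break;
                  }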
              }
          }
      }
      

      Running the following command while the application is executing causes the upserts to time out:

      /opt/couchbase/bin/cbstats -u matt.carabine -p correcthorsebatterystaple localhost:11210 -b default hash
      

      As soon as the cbstats command finishes, the timeouts stop.


            Activity

            Couchbase Build Team added a comment:

            Build couchbase-server-6.5.0-1170 contains kv_engine commit 9f22eee with commit message:
            MB-30553: visitDepth: Re-acquire HashTable mutex on each HashBucket

            Couchbase Build Team added a comment:

            Build couchbase-server-6.0.0-1559 contains ns_server commit 6de08ab with commit message:
            MB-30665: Remove cbstats hash task from cbcollect_info

            Couchbase Build Team added a comment:

            Build couchbase-server-6.5.0-1286 contains ns_server commit 6de08ab with commit message:
            MB-30665: Remove cbstats hash task from cbcollect_info

            Couchbase Build Team added a comment:

            Build couchbase-server-6.5.0-1358 contains ns_server commit 0334a9e with commit message:
            MB-31264: Restore 'cbstats hash' to cbcollect_info

            Daniel Owen added a comment (edited):

            Hi Amarantha Kulkarni

            Suggested release note:

            Fixed an issue where requesting the hash statistic severely affected the response times of replicateTo requests.


              People

              Assignee: Arunkumar Senthilnathan
              Reporter: Matt Carabine
              Votes: 0
              Watchers: 10

