Couchbase Server / MB-19781

GSI: incorrect handling of expired documents causes index fragmentation and enormous growth of used disk space


Details

    • Type: Task
    • Resolution: Won't Fix
    • Priority: Major
    • Fix Version/s: 6.5.0
    • Affects Version/s: 4.1.1
    • Component/s: secondary-index
    • Labels: None
    • Environment: Ubuntu, 64-bit, 3 nodes, 4 CPU x 4 GB RAM

    Description

      There is a problem with how GSI handles expired documents: it causes index fragmentation (up to 98%) and results in enormous growth of used disk space ([index data size]:[index disk size] =~ [1]:[1/(1-fragmentation rate)]). Disk space usage returns to normal only after a Couchbase Server restart.
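
      For illustration, a minimal sketch of the arithmetic behind that ratio (the sizes below are hypothetical, not measured values from this run):

      public class FragmentationRatio {
          public static void main(String[] args) {
              final double dataSizeMb = 100.0;     // hypothetical index data size
              final double fragmentation = 0.98;   // 98% fragmentation, as reported by the UI
              // Expected on-disk size = data size / (1 - fragmentation rate)
              final double expectedDiskMb = dataSizeMb / (1.0 - fragmentation);
              System.out.printf("data = %.0f MB, expected disk = %.0f MB (%.0fx)%n",
                      dataSizeMb, expectedDiskMb, expectedDiskMb / dataSizeMb);
          }
      }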
      How to reproduce:
      1. 4.1.1-EE GA (build 5914), 3 nodes (4 CPU x 4 GB RAM)
      2. All nodes have all services enabled
      3. Establish a cluster with a default bucket (256 MB quota, full eviction, no password, 1 replica, view index replicas enabled, i/o priority = high, flush enabled); see the sketch after the code listing below for one way to script these settings
      4. Run the following code (or write your own: 20 threads, each making synchronous insertions of 100,000 documents with indexed fields; the indexes need to be created as well):

      package highcpuafterload;
       
      import com.couchbase.client.java.Bucket;
      import com.couchbase.client.java.Cluster;
      import com.couchbase.client.java.CouchbaseCluster;
      import com.couchbase.client.java.document.JsonDocument;
      import com.couchbase.client.java.document.json.JsonObject;
      import com.couchbase.client.java.env.CouchbaseEnvironment;
      import com.couchbase.client.java.env.DefaultCouchbaseEnvironment;
      import com.couchbase.client.java.query.N1qlQuery;
      import java.util.LinkedList;
      import java.util.concurrent.Phaser;
       
      public class BombardaMaxima extends Thread {
       
          private final int tid;
          
          // configure here
          private static final int threads = 20;
          private static final int docsPerThread = 100000;
          private static final int docTTLms = 30 * 1000;
          private static final int dumpToConsoleEachNDocs = 1000;
          
          private static final Phaser phaser = new Phaser(threads + 1);
          private static final CouchbaseEnvironment ce;
          private static final Cluster cluster;
          private static final String bucket = "default";
          
          static {
              ce = DefaultCouchbaseEnvironment.create();
              // Placeholder node hostnames; replace with the three cluster nodes.
              final LinkedList<String> nodes = new LinkedList<>();
              nodes.add("A.node");
              nodes.add("B.node");
              nodes.add("C.node");
              cluster = CouchbaseCluster.create(ce, nodes);
              final Bucket b = cluster.openBucket(bucket);
       
              
              final String iQA = "CREATE INDEX iQA ON `default`(a, b) WHERE a is valued USING GSI";
              final String iQX = "CREATE INDEX iQX ON `default`(a, c) WHERE a is valued USING GSI";
              
              b.query(N1qlQuery.simple(iQA));        
              b.query(N1qlQuery.simple(iQX));        
              
          }
          public BombardaMaxima(final int tid) {
              this.tid = tid;
          }
          public final void run() {
              try {
                  Bucket b = null;
                  synchronized(cluster) { b = cluster.openBucket(bucket); }
                  final long stm = System.currentTimeMillis();
                  final JsonObject jo = JsonObject
                          .empty()
                          .put("a", stm)
                          .put("b", stm)
                          .put("c", stm);
                  for (int i = 0; i < docsPerThread; i++) {
                      // Key: thread id + current timestamp (ms). Expiry: absolute Unix time
                      // in seconds, docTTLms after the write.
                      b.upsert(JsonDocument.create(
                              tid + ":" + System.currentTimeMillis(),
                              (int) ((System.currentTimeMillis() + docTTLms) / 1000),
                              jo)
                      );
                      if (i % dumpToConsoleEachNDocs == 0) System.out.println("T[" + tid + "] = " + i);
                  }
              } catch(final Exception e) {
                  e.printStackTrace();
              } finally {
                  phaser.arriveAndAwaitAdvance();
              }   
          }
          public static void main(String[] args) {
              for (int i = 0; i < threads; i++) new BombardaMaxima(i).start();
              phaser.arriveAndAwaitAdvance();
              System.out.println("DONE");
          }
          
      }
      
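      For step 3, a minimal sketch of scripting the bucket settings with the Java SDK 2.x ClusterManager, assuming placeholder admin credentials ("Administrator"/"password") and that the bucket-management API is available in your environment; full eviction and the high i/o priority are not covered by this builder, so set those in the UI:

      import com.couchbase.client.java.CouchbaseCluster;
      import com.couchbase.client.java.cluster.BucketSettings;
      import com.couchbase.client.java.cluster.ClusterManager;
      import com.couchbase.client.java.cluster.DefaultBucketSettings;
       
      public class CreateDefaultBucket {
          public static void main(String[] args) {
              // Placeholder node address and admin credentials.
              final ClusterManager cm = CouchbaseCluster
                      .create("A.node")
                      .clusterManager("Administrator", "password");
              final BucketSettings settings = DefaultBucketSettings.builder()
                      .name("default")
                      .quota(256)              // RAM quota in MB
                      .password("")            // no bucket password
                      .replicas(1)
                      .indexReplicas(true)     // view index replicas
                      .enableFlush(true)
                      .build();
              cm.insertBucket(settings);
          }
      }
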

      5. Watch the growing index fragmentation rate via the UI stats.
      6. Wait for the code run to finish.
      7. Wait a little longer (you can also force compaction) until all documents in the bucket have expired (do not open "Documents" because of https://issues.couchbase.com/browse/MB-19758).
      8. Finally, go to the UI stats. You will see a high fragmentation rate, and [index data size]:[index disk size] =~ [1]:[1/(1-fragmentation rate)]; the sketch below shows one way to read the same numbers outside the UI.
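
      For steps 5 and 8, a minimal sketch of dumping the GSI indexer stats over REST, assuming the default indexer HTTP port 9102, placeholder admin credentials, and that the per-index data_size / disk_size / frag_percent fields appear in the response (check your build's stats output):

      import java.io.BufferedReader;
      import java.io.InputStreamReader;
      import java.net.HttpURLConnection;
      import java.net.URL;
      import java.util.Base64;
       
      public class DumpIndexerStats {
          public static void main(String[] args) throws Exception {
              // Placeholder node address and admin credentials; 9102 is the default indexer HTTP port.
              final URL url = new URL("http://A.node:9102/stats");
              final HttpURLConnection conn = (HttpURLConnection) url.openConnection();
              final String auth = Base64.getEncoder()
                      .encodeToString("Administrator:password".getBytes("UTF-8"));
              conn.setRequestProperty("Authorization", "Basic " + auth);
              try (BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
                  // The response is one JSON object; look at the per-index
                  // data_size, disk_size and frag_percent entries.
                  String line;
                  while ((line = in.readLine()) != null) System.out.println(line);
              }
          }
      }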

      My final results for this run are attached (see images below).

      Related forum thread: https://forums.couchbase.com/t/index-data-size-index-disk-size-1-50-is-it-normal/8467/2

      Attachments

        1. if-1.png (21 kB)
        2. if-2.png (69 kB)
        3. if-3.png (49 kB)
        4. if-4.png (21 kB)


          People

            Assignee: Prathibha Bisarahalli (prathibha) (Inactive)
            Reporter: Georgy Repin (egrep)
            Votes: 0
            Watchers: 6


