Couchbase Server / MB-57813

[System Test on cloud] Indexing seems to be stuck for a period of more than 15 hours


Details

    • Untriaged
    • 0
    • Unknown

    Description

      A 5-node cluster with the following configuration:

      3 KV + 2 GSI/Query nodes (n2-standard-8 with 200 GB disk + n2-standard-8 with 450 GB disk)

      After an initial data load, indexes are created and an incremental data load is then run to reach the required document counts (listed below). A query workload also runs during this incremental load. Index Mutations Remaining has stayed roughly the same (around 150 million) over the last 15 or so hours (screenshot attached). CPU utilisation has stayed above 85% during this period, so the machines are not idle. Because the remaining-mutations count was not dropping, I killed all the data and query workload Docker containers about 2 or 3 hours into the incremental workload, but the count still has not gone down.
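One way to make "has stayed the same" precise is to compare the current backlog with its value at the start of a trailing time window. The sketch below is a hypothetical helper, not a Couchbase tool: it assumes Index Mutations Remaining samples are already available as time-sorted (hours, value) pairs (how they are collected is out of scope), and the 1% minimum-drop threshold is an arbitrary illustrative choice.

```python
from typing import Sequence, Tuple

# Hypothetical sample format: (hours_since_start, mutations_remaining), sorted by time.
Sample = Tuple[float, int]


def is_stalled(samples: Sequence[Sample], window_hours: float,
               min_drop_fraction: float = 0.01) -> bool:
    """Return True if the backlog shrank by less than `min_drop_fraction`
    of its baseline value over the trailing `window_hours`.

    The threshold and window are illustrative assumptions, not Couchbase defaults.
    """
    if len(samples) < 2:
        return False
    latest_t, latest_v = samples[-1]
    window_start = latest_t - window_hours
    # Baseline: the most recent sample taken at or before the window start.
    baseline = [v for t, v in samples if t <= window_start]
    if not baseline:
        return False  # history does not yet cover the whole window
    baseline_v = baseline[-1]
    if baseline_v == 0:
        return False  # nothing was pending, so nothing can be stuck
    # Stalled if the backlog dropped by less than the required fraction.
    return (baseline_v - latest_v) < min_drop_fraction * baseline_v


# Illustrative data only: a flat ~150M backlog vs. one draining 10M per hour.
flat = [(float(h), 150_000_000) for h in range(16)]
draining = [(float(h), 150_000_000 - 10_000_000 * h) for h in range(16)]
print(is_stalled(flat, window_hours=15))      # True
print(is_stalled(draining, window_hours=15))  # False
```

Comparing against a baseline sample, rather than looking for any decrease at all, keeps small fluctuations in the stat from masking a backlog that is effectively not draining.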

      Another point to note: there have been quite a few auto-failovers. Since the auto-failover interval is set to 10 seconds for cloud instances, nodes keep getting auto-failed over and added back.
      I am not sure whether this is related to indexing being stuck, but I have filed a separate ticket to investigate why the nodes are being auto-failed over so often.

      cbcollect ->

      https://cb-engineering.s3.amazonaws.com/SysTest_11Jul_Slow_Indexing/collectinfo-2023-07-12T040332-ns_1%40svc-d-node-001.r45djlf-eb-k-m7.sandbox.nonprod-project-avengers.com.zip
      https://cb-engineering.s3.amazonaws.com/SysTest_11Jul_Slow_Indexing/collectinfo-2023-07-12T040332-ns_1%40svc-d-node-002.r45djlf-eb-k-m7.sandbox.nonprod-project-avengers.com.zip
      https://cb-engineering.s3.amazonaws.com/SysTest_11Jul_Slow_Indexing/collectinfo-2023-07-12T040332-ns_1%40svc-d-node-003.r45djlf-eb-k-m7.sandbox.nonprod-project-avengers.com.zip
      https://cb-engineering.s3.amazonaws.com/SysTest_11Jul_Slow_Indexing/collectinfo-2023-07-12T040332-ns_1%40svc-qi-node-004.r45djlf-eb-k-m7.sandbox.nonprod-project-avengers.com.zip
      https://cb-engineering.s3.amazonaws.com/SysTest_11Jul_Slow_Indexing/collectinfo-2023-07-12T040332-ns_1%40svc-qi-node-005.r45djlf-eb-k-m7.sandbox.nonprod-project-avengers.com.zip
      

      Info for QE ->

      Script used -> cmd/cp-cli/scenarios/system_tests/provisioned/provisioned_gsi_system_test.yaml
       
      10 buckets with the following document counts: 7500000, 1800000, 5000000, 2000000, 4000000, 600000, 60000, 600000, 6001, 9001.
      The total number of indexes is 584.
      Wait for the initial data load, index creation, and incremental data load steps to complete.
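As a quick check on the scale of the test, the per-bucket counts above can be totalled. This is plain arithmetic on the numbers in the report; the per-bucket index figure is only a mean, since the report does not say how the 584 indexes are distributed across the buckets.

```python
# Per-bucket document counts as listed in the report (10 buckets).
bucket_doc_counts = [7_500_000, 1_800_000, 5_000_000, 2_000_000, 4_000_000,
                     600_000, 60_000, 600_000, 6_001, 9_001]
TOTAL_INDEXES = 584

total_docs = sum(bucket_doc_counts)
# Simple mean only; the actual per-bucket index distribution is not stated.
avg_indexes_per_bucket = TOTAL_INDEXES / len(bucket_doc_counts)

print(f"buckets: {len(bucket_doc_counts)}")                    # buckets: 10
print(f"total docs: {total_docs:,}")                           # total docs: 21,575,002
print(f"avg indexes per bucket: {avg_indexes_per_bucket:.1f}") # avg indexes per bucket: 58.4
```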
      

      Attachments

        1. 005_CPU_usage.png (308 kB)
        2. Index_mutations_remaining.png (233 kB)
        3. 004_CPU_usage.png (525 kB)


          People

            pavan.pb Pavan PB
            Votes: 0
            Watchers: 4
