  1. Couchbase Server
  2. MB-49319

High run-to-run variations in analytics drop tests


Details

    • Untriaged
    • 1
    • Unknown
    • CX Sprint 269

    Description

      We see high run-to-run variation in the analytics drop tests. The build used is 7.1.0-1250. After a run drops a collection, the time it takes to drop the index differs from run to run.

      Avg. drop rate (items/sec), 4 nodes, BigFUN 20M users (320M docs), 3 indexes, SSD, s=1 c=3

      http://showfast.sc.couchbase.com/#/runs/bigfun_20M_drop_4n_1s_3c_oceanus/7.1.0-1250

      http://perf.jenkins.couchbase.com/job/oceanus/7317/

      2021-11-02T15:07:24 [INFO] Dropping collection bucket-1:scope-1.ChirpMessages-1

      2021-11-02T15:07:37 [INFO] Number of items in dataset ChirpMessages-1: 200002740

      2021-11-02T15:07:50 [INFO] Number of items in dataset ChirpMessages-1: 49996749

      2021-11-02T15:08:02 [INFO] Number of items in dataset ChirpMessages-1: 49996749

      2021-11-02T15:08:05 [INFO] Number of items in dataset ChirpMessages-1: 0

       

      http://perf.jenkins.couchbase.com/job/oceanus/6886/

      2021-09-09T12:38:38 [INFO] Dropping collection bucket-1:scope-1.ChirpMessages-1

      2021-09-09T12:38:51 [INFO] Number of items in dataset ChirpMessages-1: 200002740

      2021-09-09T12:38:53 [INFO] Number of items in dataset ChirpMessages-1: 0
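To make the gap between the two runs concrete, the elapsed time can be read off the log timestamps quoted above. The following is only a minimal sketch: the function name `drop_duration_seconds` is hypothetical, and treating the first zero item count as the end of the drop is an assumption about how the test measures completion. Because the item count is polled every few seconds, the result is only as precise as the polling interval.

```python
from datetime import datetime

TS_FORMAT = "%Y-%m-%dT%H:%M:%S"


def drop_duration_seconds(log_lines):
    """Seconds from the 'Dropping collection' line to the first zero item count.

    Returns None if the log never reports a zero count after the drop starts.
    """
    start = None
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        # Each log line begins with an ISO-like timestamp followed by a space.
        stamp, _, message = line.partition(" ")
        if "Dropping collection" in message:
            start = datetime.strptime(stamp, TS_FORMAT)
        elif start is not None and "Number of items" in message:
            count = int(message.rsplit(":", 1)[1])
            if count == 0:
                end = datetime.strptime(stamp, TS_FORMAT)
                return (end - start).total_seconds()
    return None


# Log lines copied from the two runs quoted in this issue.
run_7317 = [
    "2021-11-02T15:07:24 [INFO] Dropping collection bucket-1:scope-1.ChirpMessages-1",
    "2021-11-02T15:07:37 [INFO] Number of items in dataset ChirpMessages-1: 200002740",
    "2021-11-02T15:07:50 [INFO] Number of items in dataset ChirpMessages-1: 49996749",
    "2021-11-02T15:08:02 [INFO] Number of items in dataset ChirpMessages-1: 49996749",
    "2021-11-02T15:08:05 [INFO] Number of items in dataset ChirpMessages-1: 0",
]
run_6886 = [
    "2021-09-09T12:38:38 [INFO] Dropping collection bucket-1:scope-1.ChirpMessages-1",
    "2021-09-09T12:38:51 [INFO] Number of items in dataset ChirpMessages-1: 200002740",
    "2021-09-09T12:38:53 [INFO] Number of items in dataset ChirpMessages-1: 0",
]

print(drop_duration_seconds(run_7317))  # 41.0
print(drop_duration_seconds(run_6886))  # 15.0
```

By this measure the 2021-11-02 run took 41 seconds to reach a zero count and the 2021-09-09 run took 15 seconds, for the same 200002740 items.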


        Activity

          Hi Bo-Chun Wang,

          When the KV collection is dropped, KV sends a DCP event to Analytics. As a result, ingestion is stopped, and after that we run a job that performs a full rollback (truncate) on the corresponding Analytics collections. The rollback is performed by removing the on-disk b-trees one by one. Because the test is running queries that access those same b-trees, the queries interfere with the rollback process. In the 2021-11-02 run, the 2nd and 3rd queries interfered with the rollback on one of the nodes, which delayed the completion of the rollback job.

          Not sure if this has been discussed before, but we shouldn't be using queries to check if the rollback completed. You can grep the analytics_info.log as we log when the rollback completes on each Analytics collection. What's even better is to use the newly introduced System Events logs in 7.1. We have added an event log with a unique identifier that is emitted every time a rollback is performed.
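The log-based check suggested above could be sketched roughly as follows. This is not the test's actual implementation: the function name `find_rollback_completions` is hypothetical, and the exact wording of the rollback-completion message in analytics_info.log is not given in this issue, so the pattern is left as a caller-supplied regular expression rather than guessed.

```python
import re


def find_rollback_completions(log_path, pattern):
    """Return the lines of the log at `log_path` that match `pattern`.

    `pattern` is a regular expression chosen by the caller. The real
    rollback-completion message format is an unknown here and must be
    taken from an actual analytics_info.log.
    """
    matcher = re.compile(pattern)
    matches = []
    with open(log_path, encoding="utf-8", errors="replace") as log_file:
        for line in log_file:
            if matcher.search(line):
                matches.append(line.rstrip("\n"))
    return matches


# Example call (the pattern below is a placeholder, not the real message):
# lines = find_rollback_completions("analytics_info.log", r"<rollback message regex>")
```

Reading the completion time from the log avoids issuing queries against the b-trees that the rollback is removing, which is the interference described in the comment above.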

          Also, I'm not sure what exactly we are trying to capture by this test since we aren't really dropping items but rather deleting full b-trees. Depending on the storage layout at the time of the rollback, we could be deleting a single or multiple b-trees. What we are testing right now is something similar to executing a 'drop analytics collection' rather than a 'delete' keys operation. So, maybe we don't need this test at all.

          murtadha.hubail Murtadha Hubail added the comment above.

          I am closing the issue. We will review the test and decide whether we still need it.

          bo-chun.wang Bo-Chun Wang added the comment above.

          People

            bo-chun.wang Bo-Chun Wang
            bo-chun.wang Bo-Chun Wang
            Votes: 0
            Watchers: 2

