Couchbase Server / MB-41786

[System Test] : Indexer crashed with error "panic: KV Rollback Received For Initial Build Request" after a memcached OOM kill on a KV node

Details

    Description

      Build : 7.0.0-3154
      Test : -test tests/2i/test_idx_steady_state_cheshire_cat.yml -scope tests/2i/scope_idx_cheshire_cat_moi.yml
      GSI Storage Mode : MOI
      Scale : 2

      Memcached was OOM-killed on KV node 172.23.96.18 because of the projector memory consumption issue (MB-41422).

      This caused a KV rollback. The memcached logs on 172.23.96.18 have messages like this:

      2020-09-29T19:40:56.334689-07:00 WARNING 92: (bucket3) DCP (Producer) eq_dcpq:secidx:proj-bucket3-MAINT_STREAM_TOPIC_aab3f073fd2d8bff0bf3ca584ad910ad-7555551877666133025/0 - (vb:256) Stream request requires rollback to seqno:29974 because consumer ahead of producer - producer upper at 29974. Client requested seqnos:{29975,18446744073709551615} snapshot:{29968,29975} uuid:255571248687441
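
To make the rollback condition in the message above concrete, here is a minimal sketch of the single comparison it describes: the consumer's requested start seqno (29975) is higher than the producer's upper seqno (29974). The `streamRequest` type and the `rollbackSeqno` function are hypothetical names for illustration, not the memcached or DCP API, and the real producer most likely takes more into account (for example the vbucket UUID and the snapshot range).

```go
package main

import "fmt"

// streamRequest is a hypothetical, simplified view of the values a DCP
// consumer sends in a stream request. Field names are illustrative only.
type streamRequest struct {
	StartSeqno uint64
	EndSeqno   uint64
	SnapStart  uint64
	SnapEnd    uint64
}

// rollbackSeqno returns the seqno to roll back to and true when the
// consumer asks to start beyond what the producer holds ("consumer ahead
// of producer"). Assumption: only this one comparison is modelled here.
func rollbackSeqno(req streamRequest, producerUpper uint64) (uint64, bool) {
	if req.StartSeqno > producerUpper {
		return producerUpper, true
	}
	return 0, false
}

func main() {
	// Values copied from the memcached log line for vb:256.
	req := streamRequest{
		StartSeqno: 29975,
		EndSeqno:   18446744073709551615,
		SnapStart:  29968,
		SnapEnd:    29975,
	}
	if seqno, ok := rollbackSeqno(req, 29974); ok {
		fmt.Printf("rollback to seqno:%d\n", seqno)
	}
}
```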
      

      In the indexer log on 172.23.120.77, the following message can be seen:

      2020-09-29T19:40:56.040-07:00 [Info] Timekeeper::sendRestartMsg Received KV Repair Msg For Stream MAINT_STREAM KeyspaceId bucket1. Attempting Stream Repair.
      

      A few minutes later, the indexer on 172.23.120.77 crashed with the following error:

      2020-09-29T19:45:22.450-07:00 [Error] Indexer::sendStreamUpdateForBuildIndex Unexpected Rollback from Projector during Initial Stream Request &{3 bucket2:scope_6:coll_-2 0xc0294a6dc0 0 0}
      panic: KV Rollback Received For Initial Build Request
       
      goroutine 135288670 [running]:
      github.com/couchbase/indexing/secondary/common.CrashOnError(...)
      	/home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/indexing/secondary/common/util.go:413
      github.com/couchbase/indexing/secondary/indexer.(*indexer).sendStreamUpdateForBuildIndex.func1(0xc000077180, 0x3, 0xc00578d960, 0x17, 0xc000188780, 0x5, 0x8, 0x4, 0x1411e80, 0xc01919e240, ...)
      	/home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/indexing/secondary/indexer/indexer.go:4168 +0xf1e
      created by github.com/couchbase/indexing/secondary/indexer.(*indexer).sendStreamUpdateForBuildIndex
      	/home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/indexing/secondary/indexer/indexer.go:4116 +0xf63
      

      The indexer also crashed on other nodes around the same time. The following excerpt is from eagle-eye:

      2020-09-29 20:08:00,866 - systestmon - WARNING - *** 1 occurences of exited with status keyword found on 172.23.120.77 ***
      2020-09-29 20:08:00,872 - systestmon - DEBUG - [user:info,2020-09-29T19:45:22.864-07:00,ns_1@172.23.120.77:<0.11559.0>:ns_log:crash_consumption_loop:69]Service 'indexer' exited with status 2. Restarting. Messages:
      2020-09-29 20:08:02,215 - systestmon - WARNING - *** 1 occurences of exited with status keyword found on 172.23.121.77 ***
      2020-09-29 20:08:02,225 - systestmon - DEBUG - [user:info,2020-09-29T19:45:08.852-07:00,ns_1@172.23.121.77:<0.15443.0>:ns_log:crash_consumption_loop:69]Service 'indexer' exited with status 2. Restarting. Messages:
      2020-09-29 20:08:02,900 - systestmon - WARNING - *** 1 occurences of exited with status keyword found on 172.23.123.24 ***
      2020-09-29 20:08:02,907 - systestmon - DEBUG - [user:info,2020-09-29T19:45:17.881-07:00,ns_1@172.23.123.24:<0.13282.0>:ns_log:crash_consumption_loop:69]Service 'indexer' exited with status 2. Restarting. Messages:
      2020-09-29 20:08:03,577 - systestmon - WARNING - *** 1 occurences of exited with status keyword found on 172.23.123.25 ***
      2020-09-29 20:08:03,581 - systestmon - DEBUG - [user:info,2020-09-29T19:45:15.660-07:00,ns_1@172.23.123.25:<0.12149.0>:ns_log:crash_consumption_loop:69]Service 'indexer' exited with status 2. Restarting. Messages:
      2020-09-29 20:08:04,254 - systestmon - WARNING - *** 1 occurences of exited with status keyword found on 172.23.123.26 ***
      2020-09-29 20:08:04,263 - systestmon - DEBUG - [user:info,2020-09-29T19:46:01.083-07:00,ns_1@172.23.123.26:<0.12405.0>:ns_log:crash_consumption_loop:69]Service 'indexer' exited with status 2. Restarting. Messages:
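
To line up the crash times across nodes, the DEBUG lines above can be reduced to node, ns_server timestamp and exit status. The following is a rough sketch that assumes the line layout shown in this excerpt; `crashRe`, `crash` and `parseCrashes` are hypothetical helpers and are not part of eagle-eye or systestmon.

```go
package main

import (
	"fmt"
	"regexp"
)

// crashRe captures the ns_server timestamp, the node IP and the exit status
// from a line shaped like the DEBUG lines in the excerpt (assumed layout).
var crashRe = regexp.MustCompile(
	`\[user:info,([^,]+),ns_1@([0-9.]+):.*Service 'indexer' exited with status (\d+)`)

// crash holds the fields extracted from one matching log line.
type crash struct {
	Time   string
	Node   string
	Status string
}

// parseCrashes returns one entry per line that matches crashRe and
// silently skips every other line (such as the WARNING summary lines).
func parseCrashes(lines []string) []crash {
	var out []crash
	for _, line := range lines {
		if m := crashRe.FindStringSubmatch(line); m != nil {
			out = append(out, crash{Time: m[1], Node: m[2], Status: m[3]})
		}
	}
	return out
}

func main() {
	// Two lines copied from the excerpt: one WARNING, one DEBUG.
	lines := []string{
		"2020-09-29 20:08:00,866 - systestmon - WARNING - *** 1 occurences of exited with status keyword found on 172.23.120.77 ***",
		"2020-09-29 20:08:00,872 - systestmon - DEBUG - [user:info,2020-09-29T19:45:22.864-07:00,ns_1@172.23.120.77:<0.11559.0>:ns_log:crash_consumption_loop:69]Service 'indexer' exited with status 2. Restarting. Messages:",
	}
	for _, c := range parseCrashes(lines) {
		fmt.Printf("%s %s status=%s\n", c.Node, c.Time, c.Status)
	}
}
```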
      

          People

            deepkaran.salooja Deepkaran Salooja
            mihir.kamdar Mihir Kamdar (Inactive)