Couchbase Server / MB-28750

System test: Indexer crashes when rebalancing out another indexer node

Details

    Description

      Build : 5.5.0-2211

      In the system test for secondary indexing, the following steps were performed:
      1. 6-node cluster: 2 kv, 1 query, and 3 indexer nodes.
      2. 4 buckets with 4 indexes on each, including 1 partitioned index.
      3. Start constant kv ops.
      4. Start constant queries, including aggregate pushdown queries.
      5. Leave the system idle for a few minutes.
      6. Rebalance in another indexer node.
      7. Rebalance out another indexer node.

      The failure was observed twice at Step 7: the indexer on the node added in Step 6 crashes. The diag logs show the following error:

      Service 'indexer' exited with status 134. Restarting. Messages:
      /home/couchbase/.cbdepscache/exploded/x86_64/go-1.7.6/go/src/runtime/select.go:423 +0x1235 fp=0xc465b97b88 sp=0xc465b97928
      runtime.selectgo(0xc465b97c38)
      /home/couchbase/.cbdepscache/exploded/x86_64/go-1.7.6/go/src/runtime/select.go:238 +0x1c fp=0xc465b97bb0 sp=0xc465b97b88
      github.com/couchbase/indexing/secondary/indexer.(*Rebalancer).tokenMergeOrReady.func1(0xc4211df600, 0xc49dfcd994, 0x24, 0xc4bf0a5880, 0x20, 0xc4bf0a58a0, 0x20, 0xc4bf0a58c0, 0x20, 0xc4bf0a58e0, ...)
      /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/indexing/secondary/indexer/rebalancer.go:788 +0x480 fp=0xc465b97d00 sp=0xc465b97bb0
      runtime.goexit()
      /home/couchbase/.cbdepscache/exploded/x86_64/go-1.7.6/go/src/runtime/asm_amd64.s:2086 +0x1 fp=0xc465b97d08 sp=0xc465b97d00
      created by github.com/couchbase/indexing/secondary/indexer.(*Rebalancer).tokenMergeOrReady
      /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/indexing/secondary/indexer/rebalancer.go:829 +0x279
      [goport(/opt/couchbase/bin/indexer)] 2018/03/16 10:22:54 child process exited with status 134
      

      One observation: although the UI logs showed the message "Rebalance completed successfully" for Step 6, overall progress stayed stuck at 99.4% for more than 2 minutes after that message.

      cbcollect_info logs attached.

      The cluster is currently available for debugging if needed. It may be repurposed over the weekend.
      http://172.23.104.18:8091/

          People

            jliang John Liang
            mihir.kamdar Mihir Kamdar (Inactive)
