MB-28970 (Couchbase Server)

System Test: Query panics during indexer rebalance, causing the rebalance to fail

Details

    Description

      Build : 5.5.0-2340

      In the 2i component-level system test, the query service crashes while an indexer node is being rebalanced out. Queries are running in the background at the time.

      The diag logs in the UI show the following error for the rebalance failure:

      Rebalance exited with reason {service_rebalance_failed,index,
      {linked_process_died,<21376.3280.4>,
      {timeout,
      {gen_server,call,
      [<21376.19056.3>,
      {call,"ServiceAPI.GetTaskList",
      #Fun<json_rpc_connection.0.125340786>},
      60000]}}}}
      

      The diag logs in the UI show the following for the query crash:

      Service 'query' exited with status 134. Restarting. Messages:
      /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/query/util/sync.go:51 +0x68 fp=0xc7a0211728 sp=0xc7a0211700
      github.com/couchbase/query/execution.(*base).runConsumer(0xc5c55b2780, 0x188e840, 0xc5c55b2780, 0xc7468d9760, 0x0, 0x0)
      /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/query/execution/base.go:551 +0xaf fp=0xc7a0211780 sp=0xc7a0211728
      github.com/couchbase/query/execution.(*InitialGroup).RunOnce(0xc5c55b2780, 0xc7468d9760, 0x0, 0x0)
      /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/query/execution/group_initial.go:53 +0x5c fp=0xc7a02117c0 sp=0xc7a0211780
      runtime.goexit()
      /home/couchbase/.cbdepscache/exploded/x86_64/go-1.8.5/go/src/runtime/asm_amd64.s:2197 +0x1 fp=0xc7a02117c8 sp=0xc7a02117c0
      created by github.com/couchbase/query/execution.(*Sequence).RunOnce.func1
      /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/query/execution/sequence.go:95 +0x404
      [goport(/opt/couchbase/bin/cbq-engine)] 2018/03/29 09:29:21 child process exited with status 134
      

      The following is an excerpt from the query logs:

      2018-03-29T09:25:58.705-07:00 [Error] PickRandom: Fail to find indexer for all index partitions. Num partition 16.  Partition with instances 4
      2018-03-29T09:25:58.705-07:00 [Warn] Fail to find indexers to satisfy query request.  Terminate scan for index 17854410000405815456,  reqId:07f94c1b-581c-4a4d-afca-a29cfdcd951f :  queryport.connPoolTimeout from [172.23.105.62:9101 172.23.105.63:9101]
      2018-03-29T09:25:58.717-07:00 [Warn] scan failed: requestId 4b7d111e-ca5f-45ef-b28c-4a5a9ff9b2b7 queryport 172.23.105.62:9101 inst 12979636491335180211 partition [2 14 7]
      2018-03-29T09:25:58.717-07:00 [Warn] scan failed: requestId 4b7d111e-ca5f-45ef-b28c-4a5a9ff9b2b7 queryport 172.23.105.63:9101 inst 12979636491335180211 partition [4 1 3 5 8]
      2018-03-29T09:25:58.717-07:00 [Warn] scan failed: requestId 4b7d111e-ca5f-45ef-b28c-4a5a9ff9b2b7 queryport 172.23.104.41:9101 inst 12979636491335180211 partition [15 11 9]
      2018-03-29T09:25:58.717-07:00 [Warn] scan failed: requestId 4b7d111e-ca5f-45ef-b28c-4a5a9ff9b2b7 queryport 172.23.104.41:9101 inst 2831273699395716626 partition [12 13 16 10 6]
      2018-03-29T09:25:58.717-07:00 [Warn] Scan failed with error for index 9550062138552874306.  Trying scan again with replica, reqId:4b7d111e-ca5f-45ef-b28c-4a5a9ff9b2b7 :  queryport.connPoolTimeout from [172.23.105.62:9101 172.23.105.63:9101 172.23.104.41:9101 172.23.104.41:9101] ...
      2018-03-29T09:25:58.733-07:00 [Warn] scan failed: requestId a091f351-7158-43ef-abc4-dbab3ce8e6c7 queryport 172.23.105.63:9101 inst 12979636491335180211 partition [8 3 5 1 4]
      2018-03-29T09:25:58.733-07:00 [Warn] scan failed: requestId a091f351-7158-43ef-abc4-dbab3ce8e6c7 queryport 172.23.105.62:9101 inst 12979636491335180211 partition [2 7 14]
      2018-03-29T09:25:58.733-07:00 [Warn] scan failed: requestId a091f351-7158-43ef-abc4-dbab3ce8e6c7 queryport 172.23.104.41:9101 inst 2831273699395716626 partition [12 6 16 10 13]
      2018-03-29T09:25:58.733-07:00 [Warn] scan failed: requestId a091f351-7158-43ef-abc4-dbab3ce8e6c7 queryport 172.23.104.41:9101 inst 12979636491335180211 partition [15 9 11]
      2018-03-29T09:25:58.733-07:00 [Warn] Scan failed with error for index 9550062138552874306.  Trying scan again with replica, reqId:a091f351-7158-43ef-abc4-dbab3ce8e6c7 :  queryport.connPoolTimeout from [172.23.105.63:9101 172.23.105.62:9101 172.23.104.41:9101 172.23.104.41:9101] ...
      2018-03-29T09:25:58.750-07:00 [Warn] scan failed: requestId e8e2c541-b6ec-4dd7-80bb-521d637d1667 queryport 172.23.105.62:9101 inst 12979636491335180211 partition [2 7 14]
      2018-03-29T09:25:58.750-07:00 [Warn] scan failed: requestId e8e2c541-b6ec-4dd7-80bb-521d637d1667 queryport 172.23.105.62:9101 inst 2831273699395716626 partition [5 8 1 3 4]
      2018-03-29T09:25:58.750-07:00 [Warn] scan failed: requestId e8e2c541-b6ec-4dd7-80bb-521d637d1667 queryport 172.23.104.41:9101 inst 2831273699395716626 partition [6 12 13 10 16]
      2018-03-29T09:25:58.750-07:00 [Warn] scan failed: requestId e8e2c541-b6ec-4dd7-80bb-521d637d1667 queryport 172.23.104.41:9101 inst 12979636491335180211 partition [15 11 9]
      2018-03-29T09:25:58.750-07:00 [Warn] Scan failed with error for index 9550062138552874306.  Trying scan again with replica, reqId:e8e2c541-b6ec-4dd7-80bb-521d637d1667 :  queryport.connPoolTimeout from [172.23.105.62:9101 172.23.105.62:9101 172.23.104.41:9101 172.23.104.41:9101] ...
      2018-03-29T09:25:58.750-07:00 [Error] PickRandom: Fail to find indexer for all index partitions. Num partition 16.  Partition with instances 14
      2018-03-29T09:25:58.750-07:00 [Warn] Fail to find indexers to satisfy query request.  Trying scan again for index 9550062138552874306, reqId:e8e2c541-b6ec-4dd7-80bb-521d637d1667 :  queryport.connPoolTimeout from [172.23.105.62:9101 172.23.105.62:9101 172.23.104.41:9101 172.23.104.41:9101] ...
      _time=2018-03-29T09:25:58.757-07:00 _level=SEVERE _msg=panic: runtime error: invalid memory address or nil pointer dereference
       
      request text:
      <ud>select result,SUM(rating) from `other-1`where result is not null group by result</ud>
       
      stack:
      goroutine 76640777 [running]:
      github.com/couchbase/query/execution.(*Context).Recover(0xc68e7002c0)
              /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/query/execution/context.go:498 +0xbc
      panic(0xe2bd00, 0x1863b10)
              /home/couchbase/.cbdepscache/exploded/x86_64/go-1.8.5/go/src/runtime/panic.go:489 +0x2cf
      github.com/couchbase/query/execution.(*Sequence).SendStop(0xc474e581e0)
              /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/query/execution/sequence.go:118 +0x61
      github.com/couchbase/query/execution.(*base).notifyStop(0xc474e583c0)
              /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/query/execution/base.go:650 +0x49
      github.com/couchbase/query/execution.(*base).runConsumer.func1()
              /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/query/execution/base.go:549 +0x276
      github.com/couchbase/query/util.(*Once).Do(0xc474e584b8, 0xc71668e738)
              /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/query/util/sync.go:51 +0x68
      github.com/couchbase/query/execution.(*base).runConsumer(0xc474e583c0, 0x188e8c0, 0xc474e583c0, 0xc68e7002c0, 0x0, 0x0)
              /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/query/execution/base.go:551 +0xaf
      github.com/couchbase/query/execution.(*IntermediateGroup).RunOnce(0xc474e583c0, 0xc68e7002c0, 0x0, 0x0)
              /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/query/execution/group_intermediate.go:53 +0x5c
      created by github.com/couchbase/query/execution.(*base).runConsumer.func1
              /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/query/execution/base.go:537 +0x2f4
       
      goroutine 76640777 [running]:
      github.com/couchbase/query/execution.(*Context).Recover(0xc68e7002c0)
              /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/query/execution/context.go:498 +0xbc
      panic(0xe2bd00, 0x1863b10)
              /home/couchbase/.cbdepscache/exploded/x86_64/go-1.8.5/go/src/runtime/panic.go:489 +0x2cf
      github.com/couchbase/query/execution.(*Sequence).SendStop(0xc474e581e0)
              /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/query/execution/sequence.go:118 +0x61
      github.com/couchbase/query/execution.(*base).notifyStop(0xc474e583c0)
              /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/query/execution/base.go:650 +0x49
      github.com/couchbase/query/execution.(*base).runConsumer.func1()
              /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/query/execution/base.go:549 +0x276
      github.com/couchbase/query/util.(*Once).Do(0xc474e584b8, 0xc71668e738)
              /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/query/util/sync.go:51 +0x68
      github.com/couchbase/query/execution.(*base).runConsumer(0xc474e583c0, 0x188e8c0, 0xc474e583c0, 0xc68e7002c0, 0x0, 0x0)
              /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/query/execution/base.go:551 +0xaf
      github.com/couchbase/query/execution.(*IntermediateGroup).RunOnce(0xc474e583c0, 0xc68e7002c0, 0x0, 0x0)
              /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/query/execution/group_intermediate.go:53 +0x5c
      created by github.com/couchbase/query/execution.(*base).runConsumer.func1
              /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/query/execution/base.go:537 +0x2f4
      

      Will attach cbcollectinfo shortly.

            People

              mihir.kamdar Mihir Kamdar (Inactive)
