Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-46241

Service 'indexer' exited with status 137

    XMLWordPrintable

Details

    Description

      Was trying to reproduce MB-46206 when I hit into this: (could be the same RCA of MB-46206)
      Indexer seems to have crashed on one or more nodes. For example on indexer node ns_1@172.23.121.78:

      Service 'indexer' exited with status 137. Restarting. Messages:
      2021-05-11T04:56:12.308-07:00 [Info] watcher.processChange(): done -> key = IndexTopology/s9NLzpdOOFpz3X%--30-262000/_default/PSuxyUlPfrsh33N9-30-296000
      2021-05-11T04:56:12.371-07:00 [Info] LeaderSyncProxy.syncReceive(). Receive stream_end. Txid : 4294971431
      2021-05-11T04:56:12.372-07:00 [Info] FollowerSyncProxy.receiveAndUpdateCurrentEpoch()
      2021-05-11T04:56:12.372-07:00 [Info] WatcherServer.syncWithPeer(): Watcher done synchronization with peer (TCP 172.23.106.251:9100)
      2021-05-11T04:56:12.372-07:00 [Info] WatcherServer.runWatcher(): Start Watcher Protocol
      2021-05-11T04:56:12.372-07:00 [Info] WatcherServer.runWatcher(): Watcher is ready to process request
      2021-05-11T04:56:12.397-07:00 [Info] lifecycleMgr.dispatchRequest: op OPCODE_SERVICE_MAP elapsed 126.725088ms len(expediates) 0 len(incomings) 0 len(outgoings) 0 error <nil>
      2021-05-11T04:56:12.398-07:00 [Info] lifecycleMgr.dispatchRequest: op OPCODE_CLIENT_STATS elapsed 11.244µs len(expediates) 0 len(incomings) 0 len(outgoings) 0 error <nil>
      2021-05-11T04:56:12.398-07:00 [Info] Plasma: Adaptive memory quota tuning RSS:23647268864, freePercent:1.7754779891251544, currentQuota=4152360960

      Steps to Reproduce
      1. create 2 kv, 1 n1ql, 14 indexer nodes

      2021-05-11 01:41:19,924 | test  | INFO    | pool-2-thread-21 | [table_view:display:72] Rebalance Overview
      +----------------+-----------+-----------------------+----------------+--------------+
      | Nodes          | Services  | Version               | CPU            | Status       |
      +----------------+-----------+-----------------------+----------------+--------------+
      | 172.23.105.175 | kv        | 7.0.0-5127-enterprise | 0.162948107295 | Cluster node |
      | 172.23.106.233 | ['kv']    |                       |                | <--- IN ---  |
      | 172.23.106.236 | ['n1ql']  |                       |                | <--- IN ---  |
      | 172.23.106.238 | ['index'] |                       |                | <--- IN ---  |
      | 172.23.106.250 | ['index'] |                       |                | <--- IN ---  |
      | 172.23.106.251 | ['index'] |                       |                | <--- IN ---  |
      | 172.23.121.74  | ['index'] |                       |                | <--- IN ---  |
      | 172.23.121.78  | ['index'] |                       |                | <--- IN ---  |
      | 172.23.107.43  | ['index'] |                       |                | <--- IN ---  |
      | 172.23.107.58  | ['index'] |                       |                | <--- IN ---  |
      | 172.23.107.44  | ['index'] |                       |                | <--- IN ---  |
      | 172.23.107.45  | ['index'] |                       |                | <--- IN ---  |
      | 172.23.107.54  | ['index'] |                       |                | <--- IN ---  |
      | 172.23.107.47  | ['index'] |                       |                | <--- IN ---  |
      | 172.23.107.78  | ['index'] |                       |                | <--- IN ---  |
      | 172.23.107.84  | ['index'] |                       |                | <--- IN ---  |
      | 172.23.107.85  | ['index'] |                       |                | <--- IN ---  |
      +----------------+-----------+-----------------------+----------------+--------------+
      

      2. Create 1 bucket with 1k collections, and a total of 10K items

       +----------------------------+-----------+----------+------------+-----+-------+------------+-----------+-----------+ +----------------------------+-----------+----------+------------+-----+-------+------------+-----------+-----------+| Bucket                     | Type      | Replicas | Durability | TTL | Items | RAM Quota  | RAM Used  | Disk Used |+----------------------------+-----------+----------+------------+-----+-------+------------+-----------+-----------+| s9NLzpdOOFpz3X%--30-262000 | couchbase | 1        | none       | 0   | 10000 | 4194304000 | 266983632 | 434381289 |+----------------------------+-----------+----------+------------+-----+-------+------------+-----------+-----------+

      3. Create 10 GSI indexes per collection with no replicas, and with defer build (so a total of 10K deferred indexes)
      Indexer crashed midway into step 3 and create index requests started failing

      2021-05-11 04:56:16,259 | test  | ERROR   | MainThread | [rest_client:_http_request:748] POST http://172.23.106.236:8093/query?statement=CREATE+INDEX+%60gsi6221%60+ON+%60s9NLzpdOOFpz3X%25--30-262000%60.%60y--30-347000%60.%60JaMXMS2HcvZIp%25hXEo-30-354000%60%28%60age%60%29+WITH+%7B+%27defer_build%27%3A+true%2C+%27num_replica%27%3A+0+%7D body:  headers: {'Accept': '*/*', 'Connection': 'close', 'Authorization': 'Basic QWRtaW5pc3RyYXRvcjpwYXNzd29yZA==\n', 'Content-Type': 'application/x-www-form-urlencoded'} error: 500 reason: unknown {
      "requestID": "9e9cf0bf-cd7d-4338-ab65-4c15823440ce",
      "signature": null,
      "results": [
      ],
      "errors": [{"code":5000,"msg":"GSI CreateIndex() - cause: Index creation for index gsi6221, bucket s9NLzpdOOFpz3X%--30-262000, scope y--30-347000, collection JaMXMS2HcvZIp%hXEo-30-354000 cannot start. Reason: Create index or Alter replica cannot proceed due to network partition, node failover or indexer failure.."}],
      "status": "errors",
      "metrics": {"elapsedTime": "3.641195108s","executionTime": "3.641139289s","resultCount": 0,"resultSize": 0,"serviceLoad": 3,"errorCount": 1}
      } auth: Administrator:password

       

      Attachments

        1. consoleText.txt
          104 kB
        2. log1.png
          log1.png
          589 kB
        3. log2.png
          log2.png
          577 kB
        4. servers.png
          servers.png
          454 kB

        Issue Links

          No reviews matched the request. Check your Options in the drop-down menu of this sections header.

          Activity

            People

              akhil.mundroy Akhil Mundroy
              sumedh.basarkod Sumedh Basarkod (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Gerrit Reviews

                  There are no open Gerrit changes

                  PagerDuty