MB-44798

[Volume Test] analytics node becomes unresponsive

Details

    • Bug
    • Resolution: Not a Bug
    • Critical
    • 7.0.0
    • Cheshire-Cat
    • analytics
    • Enterprise Edition 7.0.0 build 4603 ‧ IPv4
    • Untriaged
    • 1
    • No
    • CX Sprint 240

    Description

      1. Create a cluster of 8 nodes: 3 KV + 5 CBAS.
      2. Create 200 collections and 400 datasets.
      3. Run 8 rebalance in/out steps. After every step, validate doc counts by running a count query against each dataset.
      4. The last rebalance step was the following:
      ----------------------------------------------------------------------------------
      Nodes          Services         Version                CPU            Status
      ----------------------------------------------------------------------------------
      172.23.97.217  kv               7.0.0-4603-enterprise  6.06175173283  Cluster node
      172.23.97.215  kv               7.0.0-4603-enterprise  10.6358819076  Cluster node
      172.23.107.3   cbas             7.0.0-4603-enterprise  18.812381554   Cluster node
      172.23.97.237  cbas             7.0.0-4603-enterprise  13.9769998736  Cluster node
      172.23.107.5   cbas             7.0.0-4603-enterprise  25.2538071066  Cluster node
      172.23.107.2   cbas             7.0.0-4603-enterprise  17.1496975806  Cluster node
      172.23.107.4   cbas             7.0.0-4603-enterprise  24.0794634949  Cluster node
      172.23.97.227  index, kv, n1ql  7.0.0-4603-enterprise  9.40160081311  --- OUT --->
      172.23.97.216  ['kv']                                                 <--- IN ---
      172.23.97.236  ['kv']                                                 <--- IN ---
      ----------------------------------------------------------------------------------
      This rebalance was abruptly terminated by the janitor.

      5. The next step, running a count on each dataset, fails:
      2021-03-08 01:07:50,125 | infra | ERROR | cbas_worker_19 | [Rest_Connection:_http_request:257] Socket error while connecting to http://172.23.107.4:8095/analytics/service. Error timed out
      2021-03-08 01:07:50,125 | infra | ERROR | cbas_worker_9 | [Rest_Connection:_http_request:257] Socket error while connecting to http://172.23.107.4:8095/analytics/service. Error timed out
      2021-03-08 01:07:50,125 | infra | ERROR | cbas_worker_16 | [Rest_Connection:_http_request:257] Socket error while connecting to http://172.23.107.4:8095/analytics/service. Error timed out
      2021-03-08 01:07:50,125 | infra | ERROR | cbas_worker_20 | [Rest_Connection:_http_request:257] Socket error while connecting to http://172.23.107.4:8095/analytics/service. Error timed out

      6. Running a simple query from the workbench returns no results:
      select count(*) from Metadata.`Dataset` where DataverseName != "Metadata";

      It seems the analytics service on .5 has died.
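To check which node the timeouts in step 5 point at, the error lines can be grouped by target host. This is a minimal sketch that assumes the log format shown in step 5; the function name `timeouts_by_host` and the regex are illustrative and not part of the test framework.

```python
import re
from collections import Counter

# Captures the host in "Socket error while connecting to http://<host>:<port>/..."
_TARGET = re.compile(r"Socket error while connecting to https?://([^:/\s]+)")


def timeouts_by_host(log_lines):
    """Count socket-error log lines per target host."""
    hosts = Counter()
    for line in log_lines:
        match = _TARGET.search(line)
        if match:
            hosts[match.group(1)] += 1
    return hosts
```

Feeding it the four lines from step 5 gives a single entry, for 172.23.107.4.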

      After an hour, analytics was back online:
      select count(*) from Metadata.`Dataset` where DataverseName != "Metadata";
      [
        { "$1": 400 }
      ]
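The count check can also be run directly against a single analytics node, with an explicit timeout, to tell an unresponsive node apart from a wrong count. This is a minimal sketch, not the test framework's code: it assumes the `/analytics/service` endpoint on port 8095 seen in the log lines, basic authentication, and a form-encoded `statement` parameter. The helper names and the credentials are hypothetical.

```python
import base64
import json
import urllib.parse
import urllib.request

ANALYTICS_PORT = 8095  # port seen in the step 5 log lines


def build_count_request(host, user, password, statement):
    """Build a POST to the Analytics service endpoint for one statement."""
    url = f"http://{host}:{ANALYTICS_PORT}/analytics/service"
    body = urllib.parse.urlencode({"statement": statement}).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Authorization": f"Basic {token}"},
        method="POST",
    )


def extract_count(response_body):
    """Return the single aggregate value from a body like {"results": [{"$1": 400}]}."""
    results = json.loads(response_body)["results"]
    return next(iter(results[0].values()))


def run_count(host, user, password, statement, timeout_s=30):
    """Return the count, or None if the node does not give a usable answer in time."""
    request = build_count_request(host, user, password, statement)
    try:
        with urllib.request.urlopen(request, timeout=timeout_s) as response:
            return extract_count(response.read())
    except OSError:
        # Covers socket timeouts, refused connections, and HTTP error statuses.
        return None
```

Calling `run_count` once per analytics node with the Metadata query returns `None` for a node that is timing out and the dataset count once it responds again.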

          People

            ritam.sharma Ritam Sharma