  Couchbase Server / MB-44138

Analytics service timing out even without any activity - Error with query monitoring: [{"code":23000,"msg":"Analytics Service is temporarily u


Details

    Description

      • Create the Analytics infrastructure with the following arguments:
        172.23.120.74 -b default,WAREHOUSE,NEW_ORDER,ITEM -o create_cbas_infra --dataverse_count 50 --dataset_count 200 --index_count 0 --indexed_data_source catapult --synonym_count 100 --thread_count 20 --api_timeout 3600
      • Queries then fail with an "Analytics service not available" error.

      Looking at logs:

      First set of errors/logs:
      Service 'cbas' exited with status 1. Restarting. Messages:
      2021-02-04T19:37:28.636-08:00 WARN CBAS.cbas restarting driver due to encryption setting change
      2021-02-04T19:37:28.636-08:00 INFO CBAS.cbas restart driver request rec'd; restarting analytics driver
      2021-02-04T19:37:28.637-08:00 INFO CBAS.control.StdInWatcher [Stdin Watcher] Blank line on stdin; shutting down...
      2021-02-04T19:37:28.638-08:00 FATA CBAS.bootstrap.Auditor [main] auditor bootstrap aborted due to interrupt
      2021-02-04T19:37:29.056-08:00 WARN CBAS.cbas analytics driver has exited w/ error exit status 1
      2021-02-04T19:37:29.056-08:00 INFO CBAS.cbas cbAddress: http://127.0.0.1:8091
      2021-02-04T19:37:31.182-08:00 INFO CBAS.cbas Adding replica b926bb1a9187fde23750d69615f74011 at 172.23.120.75:9120
      2021-02-04T19:37:31.182-08:00 ERRO CBAS.cbas Failed to add replica Post https://172.23.120.74:9110/analytics/node/storage/addReplica: dial tcp 172.23.120.74:9110: connect: connection refused
      2021-02-04T19:37:31.182-08:00 FATA CBAS.cbas Unexpected failure while sync'ing replicas Post https://172.23.120.74:9110/analytics/node/storage/addReplica: dial tcp 172.23.120.74:9110: connect: connection refused
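
      When triaging excerpts like the one above, it can help to pull out only the ERRO and FATA entries. The following is a minimal sketch, assuming every line follows the "timestamp level component message" shape seen in this excerpt; the function name `severe_entries` and the regular expression are illustrative and not part of any Couchbase tooling.

      ```python
      import re

      # Assumed line shape, taken from the excerpt above:
      # <ISO timestamp with UTC offset> <4-letter level> <component> <message>
      LINE_RE = re.compile(
          r"^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}[+-]\d{2}:\d{2})\s+"
          r"(?P<level>[A-Z]{4})\s+"
          r"(?P<component>\S+)\s+"
          r"(?P<message>.*)$"
      )


      def severe_entries(lines, levels=("ERRO", "FATA")):
          """Return (timestamp, level, component, message) tuples for the given levels."""
          entries = []
          for line in lines:
              match = LINE_RE.match(line.strip())
              if match and match.group("level") in levels:
                  entries.append(
                      (
                          match.group("ts"),
                          match.group("level"),
                          match.group("component"),
                          match.group("message"),
                      )
                  )
          return entries


      # Three lines copied from the excerpt above.
      sample = [
          "2021-02-04T19:37:28.636-08:00 WARN CBAS.cbas restarting driver due to encryption setting change",
          "2021-02-04T19:37:28.638-08:00 FATA CBAS.bootstrap.Auditor [main] auditor bootstrap aborted due to interrupt",
          "2021-02-04T19:37:31.182-08:00 ERRO CBAS.cbas Failed to add replica Post https://172.23.120.74:9110/analytics/node/storage/addReplica: dial tcp 172.23.120.74:9110: connect: connection refused",
      ]

      for ts, level, component, _message in severe_entries(sample):
          print(ts, level, component)
      ```

      Lines that do not match the assumed shape (for example the "Service 'cbas' exited" header) are skipped rather than reported.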

      Rebalance Failure:
      Rebalance exited with reason {service_rebalance_failed,cbas,
        {worker_died,
          {'EXIT',<0.24932.527>,
            {rebalance_failed,
              {service_error,<<"Rebalance 325d1ec0e542ba360b90412ac4d8e101 failed: CBAS0001: Datasets in different partitions have different DCP states. Mutations needed to catch up = 394301. User action: Try again later">>}}}}}.
      Rebalance Operation Id = eb00e6b58bbe3684f24fa220de7ac7b7
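
      The CBAS0001 message reports how many mutations the datasets still need to ingest. To track that backlog across repeated rebalance attempts, the number can be extracted from the error text. This is a small sketch that assumes the message wording stays exactly as shown in this report; `mutations_behind` is a hypothetical helper name.

      ```python
      import re

      # Matches the backlog figure in the CBAS0001 text quoted in this report.
      BACKLOG_RE = re.compile(r"Mutations needed to catch up = (\d+)")


      def mutations_behind(error_text):
          """Return the reported mutation backlog as an int, or None if absent."""
          match = BACKLOG_RE.search(error_text)
          return int(match.group(1)) if match else None


      # Error text copied from the rebalance failure in this report.
      message = (
          "Rebalance 325d1ec0e542ba360b90412ac4d8e101 failed: CBAS0001: "
          "Datasets in different partitions have different DCP states. "
          "Mutations needed to catch up = 394301. User action: Try again later"
      )

      print(mutations_behind(message))  # prints 394301
      ```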

      Also noticed high CPU and memory usage.

      Second set of logs:
      At this point there is no activity on the cluster, yet the Analytics service is still failing:

      Error with query monitoring: [{"code":23000,"msg":"Analytics Service is temporarily u

      Analytics Nodes:
      172.23.120.74
      172.23.120.75

            People

              ritam.sharma Ritam Sharma