Details

    • Untriaged
    • 1
    • Unknown
    • CX Sprint 235

    Description

      seen in UI logs:

       
      Service 'cbas' exited with status 1. Restarting. Messages:
      2021-01-24T20:27:16.749-08:00 ERRO CBAS.cbas error from clusterCompatibility monitor endpoint: Get http://127.0.0.1:8091/poolsStreaming/default: dial tcp 127.0.0.1:8091: i/o timeout, restarting in 5s...
      2021-01-24T20:27:16.749-08:00 ERRO CBAS.cbas error from clusterCompatibility monitor endpoint: Get http://127.0.0.1:8091/poolsStreaming/default: dial tcp 127.0.0.1:8091: i/o timeout, restarting in 5s...
      2021-01-24T20:27:16.750-08:00 ERRO CBAS.cbas error from clusterCompatibility monitor endpoint: invalid byte in chunk length, restarting in 5s...
      2021-01-24T20:27:16.751-08:00 ERRO CBAS.cbas error from clusterCompatibility monitor endpoint: Get http://127.0.0.1:8091/poolsStreaming/default: dial tcp 127.0.0.1:8091: i/o timeout, restarting in 5s...
      2021-01-24T20:27:16.751-08:00 ERRO CBAS.cbas error from clusterCompatibility monitor endpoint: Get http://127.0.0.1:8091/poolsStreaming/default: dial tcp 127.0.0.1:8091: i/o timeout, restarting in 5s...
      2021-01-24T20:27:16.751-08:00 FATA CBAS.cbas Unexpected error waiting for node ec5369e7dbf94ae356bbed5f19899413 config: Get http://127.0.0.1:8091/_metakv/cbas/config/node/?feed=continuous: dial tcp 127.0.0.1:8091: i/o timeout
       
      ns_log 000
      ns_1@172.23.97.86
      8:27:16 PM   24 Jan, 2021
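A minimal sketch for triaging the messages quoted above, assuming each log line follows the layout visible in this ticket (timestamp, four-letter level, component, message). The names `LOG_LINE`, `parse_line`, and `summarize` are hypothetical helpers for illustration and are not part of any Couchbase tooling.

```python
import re
from collections import Counter

# Layout inferred from the lines quoted in this ticket (an assumption):
# <timestamp> <LEVEL> <component> <message>
LOG_LINE = re.compile(
    r"^(?P<ts>\S+)\s+(?P<level>[A-Z]{4})\s+(?P<component>\S+)\s+(?P<message>.*)$"
)


def parse_line(line):
    """Return a dict with ts/level/component/message, or None if the line does not match."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None


def summarize(lines):
    """Count parsed log lines per level, skipping lines that do not match."""
    counts = Counter()
    for line in lines:
        record = parse_line(line)
        if record is not None:
            counts[record["level"]] += 1
    return counts


# Three of the lines from the description, copied verbatim.
SAMPLE = [
    "2021-01-24T20:27:16.749-08:00 ERRO CBAS.cbas error from clusterCompatibility monitor endpoint: Get http://127.0.0.1:8091/poolsStreaming/default: dial tcp 127.0.0.1:8091: i/o timeout, restarting in 5s...",
    "2021-01-24T20:27:16.750-08:00 ERRO CBAS.cbas error from clusterCompatibility monitor endpoint: invalid byte in chunk length, restarting in 5s...",
    "2021-01-24T20:27:16.751-08:00 FATA CBAS.cbas Unexpected error waiting for node ec5369e7dbf94ae356bbed5f19899413 config: Get http://127.0.0.1:8091/_metakv/cbas/config/node/?feed=continuous: dial tcp 127.0.0.1:8091: i/o timeout",
]

if __name__ == "__main__":
    print(dict(summarize(SAMPLE)))
    timeouts = [r for r in map(parse_line, SAMPLE) if r and "i/o timeout" in r["message"]]
    print(len(timeouts), "of", len(SAMPLE), "lines mention an i/o timeout")
```

Run against a full log file, this kind of summary can show whether the FATA line is preceded mostly by timeouts against 127.0.0.1:8091 or by other error types.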
      

      Logs: https://s3.amazonaws.com/cb-engineering/perry/timers_lost/collectinfo-2021-01-25T091645-ns_1%40172.23.97.84.zip
      https://s3.amazonaws.com/cb-engineering/perry/timers_lost/collectinfo-2021-01-25T091645-ns_1%40172.23.97.85.zip
      https://s3.amazonaws.com/cb-engineering/perry/timers_lost/collectinfo-2021-01-25T091645-ns_1%40172.23.97.86.zip

          Activity

            michael.blow Michael Blow added a comment -

            The endpoints to "do something" with Analytics are provided by the Java processes, so unfortunately, if the service is deployed (i.e., active), we don't have much choice but to stand up the JVM.

            murtadha.hubail Murtadha Hubail added a comment -

            Hi Perry Krug,

            If the cluster is still up, could you please get us the contents of the following files from node .86?

            /opt/couchbase/var/lib/couchbase/data/@analytics/v_iodevice_0/txn-log/checkpoint_120
            /opt/couchbase/var/lib/couchbase/data/@analytics/v_iodevice_0/txn-log/checkpoint_121
            /opt/couchbase/var/lib/couchbase/data/@analytics/v_iodevice_0/txn-log/checkpoint_122

            murtadha.hubail Murtadha Hubail added a comment -

            Perry Krug,

            I believe I have identified the issue. As a workaround, try to delete the following file on node .86:

            /opt/couchbase/var/lib/couchbase/data/@analytics/v_iodevice_0/txn-log/transaction_log_120

            The Analytics service should eventually recover after that.

            perry Perry Krug added a comment -

            [root@s60803-cnt7 txn-log]# pwd
            /opt/couchbase/var/lib/couchbase/data/@analytics/v_iodevice_0/txn-log
            [root@s60803-cnt7 txn-log]# cat checkpoint_120
            {"@type":"Checkpoint","@version":1,"@class":"org.apache.asterix.common.transactions.Checkpoint","id":120,"checkpointLsn":31406950368,"minMCTFirstLsn":-1,"maxTxnId":604,"sharp":true,"storageVersion":12}
            [root@s60803-cnt7 txn-log]# cat checkpoint_121
            {"@type":"Checkpoint","@version":1,"@class":"org.apache.asterix.common.transactions.Checkpoint","id":121,"checkpointLsn":31675385248,"minMCTFirstLsn":-1,"maxTxnId":604,"sharp":true,"storageVersion":12}
            [root@s60803-cnt7 txn-log]# cat checkpoint_122
            {"@type":"Checkpoint","@version":1,"@class":"org.apache.asterix.common.transactions.Checkpoint","id":122,"checkpointLsn":31943820704,"minMCTFirstLsn":-1,"maxTxnId":604,"sharp":true,"storageVersion":12}
            

            I deleted the file and will keep an eye on it, as it seems it was crashing every couple of days or so.
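A minimal sketch for comparing the three checkpoint files shown above, assuming each file holds a single JSON object with the fields visible in the `cat` output. The helper names `load_checkpoints` and `lsn_deltas` are hypothetical, and the sketch only reports the numbers; it makes no claim about what they imply for the root cause.

```python
import json

# File contents copied verbatim from the terminal output in this comment.
CHECKPOINTS = [
    '{"@type":"Checkpoint","@version":1,"@class":"org.apache.asterix.common.transactions.Checkpoint","id":120,"checkpointLsn":31406950368,"minMCTFirstLsn":-1,"maxTxnId":604,"sharp":true,"storageVersion":12}',
    '{"@type":"Checkpoint","@version":1,"@class":"org.apache.asterix.common.transactions.Checkpoint","id":121,"checkpointLsn":31675385248,"minMCTFirstLsn":-1,"maxTxnId":604,"sharp":true,"storageVersion":12}',
    '{"@type":"Checkpoint","@version":1,"@class":"org.apache.asterix.common.transactions.Checkpoint","id":122,"checkpointLsn":31943820704,"minMCTFirstLsn":-1,"maxTxnId":604,"sharp":true,"storageVersion":12}',
]


def load_checkpoints(texts):
    """Parse each checkpoint JSON document and return the records sorted by id."""
    records = [json.loads(text) for text in texts]
    return sorted(records, key=lambda record: record["id"])


def lsn_deltas(records):
    """Return (older_id, newer_id, LSN difference) for each consecutive pair."""
    return [
        (older["id"], newer["id"], newer["checkpointLsn"] - older["checkpointLsn"])
        for older, newer in zip(records, records[1:])
    ]


if __name__ == "__main__":
    records = load_checkpoints(CHECKPOINTS)
    for older_id, newer_id, delta in lsn_deltas(records):
        print(f"checkpoint_{older_id} -> checkpoint_{newer_id}: LSN advanced by {delta}")
    # maxTxnId is identical in all three files quoted above.
    print("maxTxnId values:", sorted({record["maxTxnId"] for record in records}))
```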

            umang.agrawal Umang added a comment -

            Closing this issue based on regression runs on build 6.6.2-9556 and Longevity Test on build 6.6.2-9557


            People

              umang.agrawal Umang
              perry Perry Krug