Couchbase Server / MB-31331

[System Test]: disconnect link Local failed with Analytics Service is temporarily unavailable

Details

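    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Done
    • Affects Version/s: 6.0.0
    • Fix Version/s: 6.0.0
    • Component/s: analytics
    • Labels: centos, longevity
    • Triage: Untriaged
    • Is this a Regression?: Yes
    • Sprint: CX Sprint 120, CX Sprint 121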

    Description

      Build: 6.0.0 build 1643 

      Test: Alice longevity

      Cycle: 3rd

      disconnect link Local failed with "Analytics Service is temporarily unavailable"

      [2018-09-17T09:06:16-07:00, sequoiatools/cbq:e48dad] -e=http://172.23.108.104:8095 -u=Administrator -p=password -script=disconnect link Local; -t 2m
       
      Error occurred on container - sequoiatools/cbq:[-e=http://172.23.108.104:8095 -u=Administrator -p=password -script=disconnect link Local; -t 2m]
       
      docker logs e48dad
      docker start e48dad
       
      Connected to : http://172.23.108.104:8095/. Type Ctrl-D to exit.

      ERROR 108 : N1QL: Connection failure {
          "requestID": "4c35eb03-7cca-4357-bc4a-71df5801f483",
          "errors": [{
              "code": 23000,
              "msg": "Analytics Service is temporarily unavailable"
          }],
          "status": "fatal",
          "metrics": {
              "elapsedTime": "19.876920686s",
              "executionTime": "19.870528543s",
              "resultCount": 0,
              "resultSize": 0,
              "processedObjects": 0,
              "errorCount": 1
          }
      }
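
For illustration only: a test harness could treat error code 23000 as transient and retry the statement rather than fail the step outright. The sketch below is not part of the sequoia framework or of the fix for this ticket. `run_with_retry`, its `execute` callback, and the retry counts and delays are all hypothetical names and values chosen for the example; only the response fields come from the log above.

```python
import json
import time
from typing import Callable

# Error code reported in this ticket for
# "Analytics Service is temporarily unavailable".
TEMPORARILY_UNAVAILABLE = 23000


def is_temporarily_unavailable(response: dict) -> bool:
    """Return True if any error in the response carries code 23000."""
    return any(err.get("code") == TEMPORARILY_UNAVAILABLE
               for err in response.get("errors", []))


def run_with_retry(execute: Callable[[], dict],
                   attempts: int = 5,
                   delay_s: float = 30.0,
                   sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call execute() until it stops reporting code 23000 or attempts run out.

    execute is a hypothetical caller-supplied function that submits the
    statement (for example "disconnect link Local;") and returns the parsed
    JSON response. The last response is returned either way, so the caller
    can still inspect a persistent failure.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    response: dict = {}
    for attempt in range(1, attempts + 1):
        response = execute()
        if not is_temporarily_unavailable(response):
            return response
        if attempt < attempts:
            sleep(delay_s)  # wait before the next try
    return response


# Response body from this ticket, trimmed to the fields used here.
SAMPLE = json.loads("""
{
  "requestID": "4c35eb03-7cca-4357-bc4a-71df5801f483",
  "errors": [{"code": 23000,
              "msg": "Analytics Service is temporarily unavailable"}],
  "status": "fatal"
}
""")
```

A retry like this only masks a short unavailability window. It would not have helped if the total wait were shorter than the delay the service actually needed to recover.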

        Activity

          murtadha.hubail Murtadha Hubail added a comment -

          Vikas Chaudhary,

          After the rebalance-out of 172.23.108.103 (the cluster controller, or CC, at that time), cbas was supposed to have two nodes (172.23.97.238 and 172.23.104.87). Node 172.23.97.238 was assigned as the new CC. After a CC rebalance-out, the AnalyticsDriver process is restarted:

          2018-09-25T03:42:08.063-07:00 INFO CBAS.cbas our cc node has changed (was: dc90a8ed74735d7f6e8cf3f5c2a3567b, now: 4c665130f6cd10952b333e0f6085c72f); instruct driver to restart...

          After that, it took node 172.23.104.87 about 8 minutes to connect to the new CC:

          2018-09-25T03:50:48.799-07:00 INFO CBAS.work.RegisterNodeWork [Worker:ClusterController] registering node: 8d1d7b9aced5ec3364e38765a948cb78
          2018-09-25T03:50:50.776-07:00 INFO CBAS.utils.ClusterStateManager [Executor-10:ClusterController] Cluster State is now ACTIVE

          The connect link statement was issued during those 8 minutes, which is why it failed. We need the logs from 172.23.104.87 to see why it took so long to connect to the new CC, but I couldn't find them in the attached logs. Could you please attach them?
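
The "about 8 minutes" figure in the comment above can be checked directly from the quoted log timestamps. A minimal sketch, assuming each log line starts with an ISO 8601 timestamp followed by a space; the helper names `log_time` and `gap` are made up for this example, and the log messages are shortened:

```python
from datetime import datetime, timedelta

# Log lines quoted in the comment above (messages shortened).
CC_CHANGED = "2018-09-25T03:42:08.063-07:00 INFO CBAS.cbas our cc node has changed"
NODE_REGISTERED = "2018-09-25T03:50:48.799-07:00 INFO CBAS.work.RegisterNodeWork registering node"
CLUSTER_ACTIVE = "2018-09-25T03:50:50.776-07:00 INFO CBAS.utils.ClusterStateManager Cluster State is now ACTIVE"


def log_time(line: str) -> datetime:
    """Parse the leading ISO 8601 timestamp of a log line."""
    return datetime.fromisoformat(line.split(" ", 1)[0])


def gap(earlier: str, later: str) -> timedelta:
    """Time elapsed between two log lines."""
    return log_time(later) - log_time(earlier)


if __name__ == "__main__":
    print("CC change -> node registered:", gap(CC_CHANGED, NODE_REGISTERED))
    print("CC change -> cluster ACTIVE: ", gap(CC_CHANGED, CLUSTER_ACTIVE))
```

The gap from the CC change to the node registration works out to 8 min 40.736 s, and to the ACTIVE state 8 min 42.713 s, which matches the window in which the statement failed.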
          vikas.chaudhary Vikas Chaudhary added a comment - Murtadha Hubail, here are the logs: https://s3.amazonaws.com/bugdb/jira/1661_disconnect/172.23.104.87.zip

          murtadha.hubail Murtadha Hubail added a comment - Thanks, Vikas Chaudhary. We found an issue that could intermittently delay the restart of the AnalyticsDriver process. The fix is already up and is pending review.

          murtadha.hubail Murtadha Hubail added a comment - Vikas Chaudhary, the fix for the intermittent restart delay on CC rebalance-out has been merged and should be available in the next Alice build (1667).

          mihir.kamdar Mihir Kamdar (Inactive) added a comment - Verified on 6.0.0-1673 (RC2). The longevity test has completed 4 cycles and we haven't encountered this issue. We will continue the test for 7 days and will reopen the issue if it is seen again.

          People

            Assignee: vikas.chaudhary Vikas Chaudhary
            Reporter: vikas.chaudhary Vikas Chaudhary
            Votes: 0
            Watchers: 5
