Couchbase Server / MB-59967

[System Test]: Graceful failover failed after 24 hours with {unexpected_status, <<"stream_does_not_exist">>}


Details

    • Bug
    • Resolution: Not a Bug
    • Critical
    • None
    • 7.6.0
    • couchbase-bucket, ns_server
    • Enterprise Edition 7.6.0 build 1878
    • Untriaged
    • Linux x86_64
    • 0
    • Yes

    Description

      Script to Repro

      + ./sequoia -client 172.23.110.181:2375 -provider file:debian_pine.yml -test tests/integration/7.6/test_7.6.yml -scope tests/integration/7.6/scope_7.6_magma.yml -scale 2 -repeat 0 -log_level 0 -version 7.6.0-1878 -skip_setup=false -skip_test=false -skip_teardown=true -skip_cleanup=false -continue=false -collect_on_error=false -stop_on_error=false -duration=1209600 -show_topology=true
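      Assuming sequoia's -duration flag is expressed in seconds (an assumption; the flag's unit is not stated in this report), the value 1209600 corresponds to a 14-day run. A minimal check of that arithmetic:

```python
# Assumption: sequoia's -duration flag is expressed in seconds.
DURATION_SECONDS = 1209600  # value passed as -duration=1209600

SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def duration_in_days(seconds: int) -> float:
    """Convert a duration given in seconds to days."""
    return seconds / SECONDS_PER_DAY


if __name__ == "__main__":
    print(duration_in_days(DURATION_SECONDS))  # 14.0
```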
      

      Step at which it failed

      [2023-12-04T03:59:03-08:00, sequoiatools/couchbase-cli:7.6:6f8cbe] failover -c 172.23.106.109:8091 --server-failover 172.23.97.243:8091 -u Administrator -p password
       
      Error occurred on container - sequoiatools/couchbase-cli:7.6:[failover -c 172.23.106.109:8091 --server-failover 172.23.97.243:8091 -u Administrator -p password]
       
      docker logs 6f8cbe
      docker start 6f8cbe
       
      *Unable to display progress bar on this os
      ERROR: Graceful failover failed. See logs for detailed reason. You can try again.
      

      Graceful failover was stuck on bucket7 for almost 24 hours and eventually failed with the following error:
      172.23.96.203 5:35:17 AM 5 Dec, 2023

      Graceful failover exited with reason {mover_crashed,
          {unexpected_exit,
           {'EXIT',<0.5126.1790>,
            {{dcp_wait_for_data_move_failed,
              "bucket7",815,'ns_1@172.23.97.243',
              ['ns_1@172.23.97.242'],
              {error,
               {unexpected_status,
                <<"stream_does_not_exist">>},
               "Error getting dcp stats on 'ns_1@172.23.97.243' for bucket \"bucket7\", partition 815, connection \"replication:ns_1@172.23.97.243->ns_1@172.23.97.242:bucket7\": {unexpected_status,\n <<\"stream_does_not_exist\">>}"}},
             [{ns_single_vbucket_mover,
               '-wait_dcp_data_move/5-fun-0-',5,
               [{file,
                 "src/ns_single_vbucket_mover.erl"},
                {line,453}]},
              {proc_lib,init_p,3,
               [{file,"proc_lib.erl"},
                {line,225}]}]}}}}.
      Rebalance Operation Id = 0c3e0042a2f7cf8f2de3d90c596dcb31
      ns_orchestrator 000  ns_1@172.23.96.203  5:35:17 AM 5 Dec, 2023
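      When comparing similar failures across many system-test runs, it can help to pull the key fields out of the dcp_wait_for_data_move_failed reason. The sketch below is a hypothetical triage helper, not part of ns_server or any Couchbase tooling: parse_failure, DataMoveFailure and SAMPLE are illustrative names, and the regex assumes the reason is printed exactly as in the log above. It treats the single node atom as the DCP producer and the list as the consumers, which matches the connection name "replication:ns_1@172.23.97.243->ns_1@172.23.97.242" in the error string.

```python
import re
from typing import List, NamedTuple, Optional


class DataMoveFailure(NamedTuple):
    """Fields extracted from a dcp_wait_for_data_move_failed reason (hypothetical)."""
    bucket: str
    vbucket: int
    source: str               # assumed DCP producer node
    destinations: List[str]   # assumed DCP consumer nodes
    status: str


# Assumes the Erlang term layout shown in the failover log; \s* tolerates line breaks.
PATTERN = re.compile(
    r"dcp_wait_for_data_move_failed,\s*"
    r'"(?P<bucket>[^"]+)",\s*(?P<vbucket>\d+),\s*'
    r"'(?P<source>[^']+)',\s*"
    r"\[(?P<dests>[^\]]*)\],\s*"
    r'\{error,\s*\{unexpected_status,\s*<<"(?P<status>[^"]+)">>\}'
)

# Excerpt of the failover reason from this report.
SAMPLE = """{{dcp_wait_for_data_move_failed,
"bucket7",815,'ns_1@172.23.97.243',
['ns_1@172.23.97.242'],
{error,
{unexpected_status,
<<"stream_does_not_exist">>},"""


def parse_failure(text: str) -> Optional[DataMoveFailure]:
    """Return the first matching failure in text, or None if there is none."""
    m = PATTERN.search(text)
    if m is None:
        return None
    # Node names in the destination list are quoted Erlang atoms.
    dests = re.findall(r"'([^']+)'", m.group("dests"))
    return DataMoveFailure(
        bucket=m.group("bucket"),
        vbucket=int(m.group("vbucket")),
        source=m.group("source"),
        destinations=dests,
        status=m.group("status"),
    )


if __name__ == "__main__":
    print(parse_failure(SAMPLE))
```

      Because the regex is tied to one printed layout, a log formatted differently (for example, a different error tuple) simply returns None rather than a wrong match.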
      

      cbcollect logs are attached. This issue was not seen in the run on 7.6.0-1845. However, system test runs do not reproduce issues consistently, so the last working build should be taken with a grain of salt.


            People

              Assignee: Balakumaran Gopal
              Reporter: Balakumaran Gopal
              Votes: 0
              Watchers: 8
