  Couchbase Server / MB-47633

[BP 7.0.1 MB-46945] - [System Test] : Rebalance failure due to reason service_rebalance_failed,index - Post http://127.0.0.1:9102/createIndexRebalance: EOF


Details

    Description

      Build : 7.0.0-5295
      Test : -test tests/integration/cheshirecat/test_cheshirecat_kv_gsi_coll_xdcr_backup_sgw_fts_itemct_txns_eventing_cbas_scale3.yml -scope tests/integration/cheshirecat/scope_cheshirecat_with_backup.yml
      Scale : 3
      Iteration : 5th (Day 5)

      A rebalance operation to add a new index node, 172.23.123.24, to the cluster failed.

      This appears to be intermittent: the same failure was recently reported, and fixed, under MB-46039.

      From the test console:

      [2021-06-15T18:42:18-07:00, sequoiatools/couchbase-cli:7.0:7b1dbd] server-add -c 172.23.97.74:8091 --server-add https://172.23.123.24 -u Administrator -p password --server-add-username Administrator --server-add-password password --services index
      [2021-06-15T18:42:34-07:00, sequoiatools/couchbase-cli:7.0:563a15] rebalance -c 172.23.97.74:8091 -u Administrator -p password
       
      Error occurred on container - sequoiatools/couchbase-cli:7.0:[rebalance -c 172.23.97.74:8091 -u Administrator -p password]
       
      docker logs 563a15
      docker start 563a15
       
      *Unable to display progress bar on this os
      ERROR: Rebalance failed. See logs for detailed reason. You can try again.
      [2021-06-15T19:12:31-07:00, sequoiatools/cmd:9a04e4] 60
      

      From error.log on 172.23.106.134:

      [ns_server:error,2021-06-15T19:12:23.994-07:00,ns_1@172.23.106.134:service_rebalancer-index<0.29734.2887>:service_rebalancer:run_rebalance_worker:119]Worker terminated abnormally: {'EXIT',<0.28245.2887>,
                                     {rebalance_failed,
                                      {service_error,
                                       <<"Post http://127.0.0.1:9102/createIndexRebalance: EOF">>}}}
      [user:error,2021-06-15T19:12:23.996-07:00,ns_1@172.23.106.134:<0.17645.1347>:ns_orchestrator:log_rebalance_completion:1416]Rebalance exited with reason {service_rebalance_failed,index,
                                    {worker_died,
                                     {'EXIT',<0.28245.2887>,
                                      {rebalance_failed,
                                       {service_error,
                                        <<"Post http://127.0.0.1:9102/createIndexRebalance: EOF">>}}}}}.
      Rebalance Operation Id = 6058b85561144335d30164f8c1a96327
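To decide whether a failure like this matches MB-46039 or is a new occurrence, it can help to reduce the ns_server "Rebalance exited with reason" message to a (service, error) signature. The Python sketch below does this with two regular expressions written against the error.log excerpt above. `failure_signature` is a hypothetical helper, not part of any Couchbase tooling, and the patterns assume the Erlang term layout shown here.

```python
import re

# Rebalance-exit message copied from error.log on 172.23.106.134.
ERROR_LOG = """Rebalance exited with reason {service_rebalance_failed,index,
    {worker_died,
     {'EXIT',<0.28245.2887>,
      {rebalance_failed,
       {service_error,
        <<"Post http://127.0.0.1:9102/createIndexRebalance: EOF">>}}}}}."""

# Assumed layout (taken from the excerpt above): {service_rebalance_failed,<service>,...}
# wrapping a {service_error, <<"message">>} term, possibly split across lines.
SERVICE_RE = re.compile(r"\{service_rebalance_failed,(\w+),")
MESSAGE_RE = re.compile(r'\{service_error,\s*<<"(.*?)">>')


def failure_signature(text):
    """Return (service, error message), or None if the text does not match."""
    service = SERVICE_RE.search(text)
    message = MESSAGE_RE.search(text)
    if service is None or message is None:
        return None
    return service.group(1), message.group(1)


if __name__ == "__main__":
    print(failure_signature(ERROR_LOG))
```

Grouping failures by this tuple, rather than by the full Erlang term, keeps process ids such as `<0.28245.2887>` from making identical failures look different.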
      

      From the rebalance report:

      "index":{
               "totalProgress":69.18429003021149,
               "perNodeProgress":{
                  "ns_1@172.23.97.110":0.6918429003021148,
                  "ns_1@172.23.96.243":0.6918429003021148,
                  "ns_1@172.23.123.24":0.6918429003021148,
                  "ns_1@172.23.97.105":0.6918429003021148,
                  "ns_1@172.23.120.75":0.6918429003021148,
                  "ns_1@172.23.97.148":0.6918429003021148,
                  "ns_1@172.23.120.58":0.6918429003021148
               },
               "startTime":"2021-06-15T18:42:48.907-07:00",
               "completedTime":false,
               "timeTaken":1775138
            }
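The report fields can be cross-checked against the error.log timestamp. The sketch below parses the `index` fragment (wrapped in braces so it is valid JSON) and computes a few derived values. `summarize` is a hypothetical helper; treating `timeTaken` as milliseconds and `totalProgress` as 100 times the mean per-node fraction are assumptions that happen to be consistent with the numbers in this report, not documented ns_server semantics.

```python
import json
from datetime import datetime

# "index" section of the rebalance report, wrapped in braces to form valid JSON.
REPORT = """{"index": {
  "totalProgress": 69.18429003021149,
  "perNodeProgress": {
    "ns_1@172.23.97.110": 0.6918429003021148,
    "ns_1@172.23.96.243": 0.6918429003021148,
    "ns_1@172.23.123.24": 0.6918429003021148,
    "ns_1@172.23.97.105": 0.6918429003021148,
    "ns_1@172.23.120.75": 0.6918429003021148,
    "ns_1@172.23.97.148": 0.6918429003021148,
    "ns_1@172.23.120.58": 0.6918429003021148
  },
  "startTime": "2021-06-15T18:42:48.907-07:00",
  "completedTime": false,
  "timeTaken": 1775138
}}"""

# Timestamp of the "Rebalance exited" entry in error.log on 172.23.106.134.
FAILURE_TS = "2021-06-15T19:12:23.996-07:00"


def summarize(report_json, failure_ts):
    index = json.loads(report_json)["index"]
    fractions = list(index["perNodeProgress"].values())
    start = datetime.fromisoformat(index["startTime"])
    failed = datetime.fromisoformat(failure_ts)
    return {
        "nodes": len(fractions),
        # Assumption: totalProgress is 100 x the mean per-node fraction.
        "mean_node_progress_pct": 100 * sum(fractions) / len(fractions),
        "total_progress_pct": index["totalProgress"],
        # completedTime is false in this report because the rebalance never finished.
        "completed": index["completedTime"] is not False,
        # Assumption: timeTaken is in milliseconds.
        "time_taken_min": index["timeTaken"] / 60_000,
        "start_to_failure_s": (failed - start).total_seconds(),
    }


if __name__ == "__main__":
    for key, value in summarize(REPORT, FAILURE_TS).items():
        print(f"{key}: {value}")
```

On this report the two progress figures agree, and `timeTaken` (about 29.6 minutes) is within roughly 50 ms of the interval between `startTime` and the failure entry, so the index rebalance ran for about half an hour and reached about 69% before the createIndexRebalance request failed.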
      

      The indexer logs on node 172.23.123.24 show the following:

      2021-06-15T19:12:20.974-07:00 [Info] Rebalancer::decodeTransferToken TransferToken TransferToken67:2b:33:90:cf:8e:7e:4b  MasterId: a826a4733e9644442e2517288e82a0d8 SourceId: be30ddb96b59e6b70c07733a0155e0d6 (172.23.120.58:8091) DestId: e4198e9e98e43788fae35314ade88f0a (172.23.97.105:8091) RebalId: 5c70774072149bca4d20d7ec2f0ec364 State: TransferTokenCreated BuildSource: Dcp TransferMode: Move Error: Post http://127.0.0.1:9102/createIndexRebalance: EOF InstId: 16276776005385447829 RealInstId: 14031018094979839685 Partitions: [2] Versions: [4] Inst: 
      	InstId: 14031018094979839685
      	Defn: DefnId: 7702856850988250970 Name: idx3_ftdQ Using: plasma Bucket: bucket4 Scope/Id: scope_1/9 Collection/Id: coll_2/f IsPrimary: false NumReplica: 2 InstVersion: 4 
      		SecExprs: <ud>([`free_breakfast` `free_parking` `country` `city`])</ud> 
      		Desc: [false false false false]
      		PartitionScheme: KEY 
      		HashScheme: CRC32 PartitionKeys: [(meta().`id`)] WhereExpr: <ud>()</ud> RetainDeletedXATTR: false 
      	State: INDEX_STATE_ACTIVE
      	RState: RebalActive
      	Stream: NIL_STREAM
      	Version: 3
      	ReplicaId: 1
      	PartitionContainer: <nil> 
      2021-06-15T19:12:20.974-07:00 [Error] Rebalancer::processTokenAsMaster Detected TransferToken in Error state  MasterId: a826a4733e9644442e2517288e82a0d8 SourceId: be30ddb96b59e6b70c07733a0155e0d6 (172.23.120.58:8091) DestId: e4198e9e98e43788fae35314ade88f0a (172.23.97.105:8091) RebalId: 5c70774072149bca4d20d7ec2f0ec364 State: TransferTokenCreated BuildSource: Dcp TransferMode: Move Error: Post http://127.0.0.1:9102/createIndexRebalance: EOF InstId: 16276776005385447829 RealInstId: 14031018094979839685 Partitions: [2] Versions: [4] Inst: 
      	InstId: 14031018094979839685
      	Defn: DefnId: 7702856850988250970 Name: idx3_ftdQ Using: plasma Bucket: bucket4 Scope/Id: scope_1/9 Collection/Id: coll_2/f IsPrimary: false NumReplica: 2 InstVersion: 4 
      		SecExprs: <ud>([`free_breakfast` `free_parking` `country` `city`])</ud> 
      		Desc: [false false false false]
      		PartitionScheme: KEY 
      		HashScheme: CRC32 PartitionKeys: [(meta().`id`)] WhereExpr: <ud>()</ud> RetainDeletedXATTR: false 
      	State: INDEX_STATE_ACTIVE
      	RState: RebalActive
      	Stream: NIL_STREAM
      	Version: 3
      	ReplicaId: 1
      	PartitionContainer: <nil> 
      . Abort.
      2021-06-15T19:12:20.974-07:00 [Info] Rebalancer::doFinish Cleanup Post http://127.0.0.1:9102/createIndexRebalance: EOF
      2021-06-15T19:12:20.975-07:00 [Info] Rebalancer::processDropIndexQueue Done Received
      2021-06-15T19:12:20.975-07:00 [Info] Rebalancer::observeRebalance exiting err <nil>
      2021-06-15T19:12:20.975-07:00 [Info] Rebalancer::updateProgress Done Received
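When a rebalance moves many indexes, pulling the routing fields out of the Rebalancer TransferToken lines is easier than reading them by eye. The sketch below assumes the single-line field order shown in the indexer log above (SourceId, DestId, RebalId, State, BuildSource, TransferMode, Error, InstId); `parse_transfer_token` is a hypothetical helper, and the pattern would need adjusting if other indexer builds log tokens differently.

```python
import re

# Key parts of the TransferToken entry from the indexer log on 172.23.123.24.
TOKEN_LOG = (
    "Rebalancer::processTokenAsMaster Detected TransferToken in Error state  "
    "MasterId: a826a4733e9644442e2517288e82a0d8 "
    "SourceId: be30ddb96b59e6b70c07733a0155e0d6 (172.23.120.58:8091) "
    "DestId: e4198e9e98e43788fae35314ade88f0a (172.23.97.105:8091) "
    "RebalId: 5c70774072149bca4d20d7ec2f0ec364 State: TransferTokenCreated "
    "BuildSource: Dcp TransferMode: Move "
    "Error: Post http://127.0.0.1:9102/createIndexRebalance: EOF "
    "InstId: 16276776005385447829 RealInstId: 14031018094979839685\n"
    "\tDefn: DefnId: 7702856850988250970 Name: idx3_ftdQ Using: plasma "
    "Bucket: bucket4 Scope/Id: scope_1/9 Collection/Id: coll_2/f"
)

# Assumed field order, taken from the log excerpt above.
TOKEN_RE = re.compile(
    r"SourceId: \w+ \((?P<source>[^)]+)\)\s+"
    r"DestId: \w+ \((?P<dest>[^)]+)\)\s+"
    r"RebalId: (?P<rebal_id>\w+)\s+"
    r"State: (?P<state>\w+)\s+"
    r"BuildSource: (?P<build_source>\w+)\s+"
    r"TransferMode: (?P<mode>\w+)\s+"
    r"Error: (?P<error>.*?)\s+InstId:"
)
DEFN_RE = re.compile(r"Name: (?P<index>\S+) Using: \S+ Bucket: (?P<bucket>\S+)")


def parse_transfer_token(text):
    """Return a dict of triage fields, or None if no token line is found."""
    token = TOKEN_RE.search(text)
    if token is None:
        return None
    fields = token.groupdict()
    defn = DEFN_RE.search(text)
    if defn is not None:
        fields.update(defn.groupdict())
    return fields


if __name__ == "__main__":
    for key, value in parse_transfer_token(TOKEN_LOG).items():
        print(f"{key}: {value}")
```

Run against the excerpt above, it shows that the failing token is a Move of `idx3_ftdQ` (bucket4) from 172.23.120.58 to 172.23.97.105, still in `TransferTokenCreated` state but already carrying the `createIndexRebalance: EOF` error.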
      

            People

              mihir.kamdar Mihir Kamdar (Inactive)
              jeelan.poola Jeelan Poola
              Votes: 0
              Watchers: 3