Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-46085

[System Test] 14 Index rebalance operations failed due to error "Unable to read index layout from cluster 127.0.0.1:8091. err = response status:500 cause:unexpected end of JSON input"

    XMLWordPrintable

Details

    Description

      Build : 7.0.0-5075
      Test : -test tests/2i/cheshirecat/test_idx_clusterops_cheshire_cat_moi.yml -scope tests/2i/cheshirecat/scope_idx_cheshire_cat_moi.yml
      Scale : 2
      Iteration : 4th onwards (day 2)

      14 rebalance operations in this test starting from the 4th iteration have failed due to the following type of error as seen in the error.log of the orchestrator node 172.23.97.215.

      [ns_server:error,2021-05-03T06:37:03.461-07:00,ns_1@172.23.97.215:service_rebalancer-index<0.21017.829>:service_rebalancer:run_rebalance_worker:119]Worker terminated abnormally: {'EXIT',<0.26091.829>,
                                     {rebalance_failed,
                                      {service_error,
                                       <<"Unable to read index layout from cluster 127.0.0.1:8091. err = response status:500 cause:unexpected end of JSON input\n">>}}}
      [user:error,2021-05-03T06:37:03.464-07:00,ns_1@172.23.97.215:<0.8584.0>:ns_orchestrator:log_rebalance_completion:1405]Rebalance exited with reason {service_rebalance_failed,index,
                                    {worker_died,
                                     {'EXIT',<0.26091.829>,
                                      {rebalance_failed,
                                       {service_error,
                                        <<"Unable to read index layout from cluster 127.0.0.1:8091. err = response status:500 cause:unexpected end of JSON input\n">>}}}}}.
      Rebalance Operation Id = ed7df537411873157be53916cb670554
      

      Around the same time, on 172.23.97.216, the following errors/warnings can be seen in the indexer.log

      2021-05-03T06:37:02.821-07:00 [Info] DDLServiceMgr::handleListCreateTokens Processing Request req: Method GET, Host 172.23.97.216:9102, ContentLength 0, UserAgent Go-http-client/1.1, RemoteAddr 172.23.97.216:36110
      2021-05-03T06:37:02.936-07:00 [Error] Planner::getLocalCreateTokensResp: Failed to get create tokens from node: 172.23.97.217:9102, err: response status:500 cause:unexpected end of JSON input
      2021-05-03T06:37:02.936-07:00 [Warn] Planner::restHelperNoLock LocalCreateTokensResp took 115.277205ms for addr 172.23.97.217:9102 with err response status:500 cause:unexpected end of JSON input
      2021-05-03T06:37:02.986-07:00 [Info] DDLServiceMgr::handleListCreateTokens error unexpected end of JSON input in FetchCreateCommandToken for entry /indexing/ddl/commandToken/create/2444621758783027348. req: Method GET, Host 172.23.97.216:9102, ContentLength 0, UserAgent Go-http-client/1.1, RemoteAddr 172.23.97.216:36110
      2021-05-03T06:37:02.986-07:00 [Error] Planner::getLocalCreateTokensResp: Failed to get create tokens from node: 172.23.97.216:9102, err: response status:500 cause:unexpected end of JSON input
      2021-05-03T06:37:02.986-07:00 [Warn] Planner::restHelperNoLock LocalCreateTokensResp took 165.289725ms for addr 172.23.97.216:9102 with err response status:500 cause:unexpected end of JSON input
      2021-05-03T06:37:03.011-07:00 [Error] Planner::getLocalCreateTokensResp: Failed to get create tokens from node: 172.23.107.3:9102, err: response status:500 cause:unexpected end of JSON input
      2021-05-03T06:37:03.011-07:00 [Warn] Planner::restHelperNoLock LocalCreateTokensResp took 190.601704ms for addr 172.23.107.3:9102 with err response status:500 cause:unexpected end of JSON input
      2021-05-03T06:37:03.016-07:00 [Error] Planner::getLocalCreateTokensResp: Failed to get create tokens from node: 172.23.107.5:9102, err: response status:500 cause:unexpected end of JSON input
      2021-05-03T06:37:03.016-07:00 [Warn] Planner::restHelperNoLock LocalCreateTokensResp took 195.411818ms for addr 172.23.107.5:9102 with err response status:500 cause:unexpected end of JSON input
      2021-05-03T06:37:03.018-07:00 [Error] Planner::getLocalCreateTokensResp: Failed to get create tokens from node: 172.23.107.4:9102, err: response status:500 cause:unexpected end of JSON input
      2021-05-03T06:37:03.018-07:00 [Warn] Planner::restHelperNoLock LocalCreateTokensResp took 196.989178ms for addr 172.23.107.4:9102 with err response status:500 cause:unexpected end of JSON input
      2021-05-03T06:37:03.052-07:00 [Info] janitor: Processing delete token /indexing/ddl/commandToken/delete/18047232184212857730
      2021-05-03T06:37:03.207-07:00 [Error] Planner::getLocalCreateTokensResp: Failed to get create tokens from node: 172.23.107.2:9102, err: response status:500 cause:unexpected end of JSON input
      2021-05-03T06:37:03.207-07:00 [Warn] Planner::restHelperNoLock LocalCreateTokensResp took 386.695578ms for addr 172.23.107.2:9102 with err response status:500 cause:unexpected end of JSON input
      2021-05-03T06:37:03.207-07:00 [Error] Rebalancer::initRebalAsync Planner Error Unable to read index layout from cluster 127.0.0.1:8091. err = response status:500 cause:unexpected end of JSON input
      2021-05-03T06:37:03.207-07:00 [Info] Rebalancer::doFinish Cleanup Unable to read index layout from cluster 127.0.0.1:8091. err = response status:500 cause:unexpected end of JSON input
      2021-05-03T06:37:03.207-07:00 [Info] ServiceMgr::onRebalanceDoneLOCKED Rebalance Done, cancel: false, err: Unable to read index layout from cluster 127.0.0.1:8091. err = response status:500 cause:unexpected end of JSON input
      

      Similar errors are also seen for 13 other rebalance operations at these times :
      2021-05-03T06:54:31
      2021-05-03T07:12:53
      2021-05-03T07:30:55
      2021-05-03T07:48:59
      2021-05-03T08:26:12
      2021-05-03T08:49:17
      2021-05-03T09:48:22
      2021-05-03T13:26:04
      2021-05-03T14:01:20
      2021-05-03T14:19:10
      2021-05-03T14:36:43
      2021-05-03T15:08:51
      2021-05-03T15:32:18

      Logs attached are from ~9 AM on 5/3/21. Please let me know if you need logs from before or after.

      These issues weren't seen in the run with 7.0.0-4960, hence marking it as a regression.

      Indexer nodes : 172.23.107.2, 172.23.107.3, 172.23.107.4, 172.23.107.5, 172.23.97.216, 172.23.97.217

      Attachments

        Issue Links

          For Gerrit Dashboard: MB-46085
          # Subject Branch Project Status CR V

          Activity

            People

              mihir.kamdar Mihir Kamdar (Inactive)
              mihir.kamdar Mihir Kamdar (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              9 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Gerrit Reviews

                  There are no open Gerrit changes

                  PagerDuty