Couchbase Server / MB-44085

Rebalance failure in the fts server where a worker dies.


      Description:

      Observed a rebalance failure in a backup/restore test involving the fts service, with the logs reporting 'Worker terminated abnormally'.

      Steps to reproduce:

      (I'm not 100% sure of the exact steps, as this test is preceded by many other tests in the same run.)

      Two nodes are involved in this particular test (I'm guessing 172.23.123.117 and 172.23.123.109, based on the logs).

      From the test code I can gather that:

      1. Node 2 is added to Node 1 specifying kv and fts as the services that should run on Node 2.

      2. A rebalance happens.
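
      For anyone trying to reproduce this by hand, the two steps above can be sketched against the cluster REST API. This is only a sketch under assumptions: which host is "Node 1" is a guess from the logs, port 8091 and the credentials are placeholders, and the requests are built but not sent (admin authentication for the calling cluster is omitted).

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumptions: node roles are guessed from the logs; 8091 is the default admin port.
NODE_1 = "172.23.123.117"
NODE_2 = "172.23.123.109"
PORT = 8091


def add_node_request(cluster_host, new_host, user, password, services):
    """Build (but do not send) the request that adds new_host with the given services."""
    body = urlencode({
        "hostname": new_host,
        "user": user,          # placeholder credentials for the node being added
        "password": password,
        "services": ",".join(services),
    }).encode()
    url = f"http://{cluster_host}:{PORT}/controller/addNode"
    return Request(url, data=body, method="POST")


def rebalance_request(cluster_host, known_hosts):
    """Build (but do not send) a rebalance request over all known nodes."""
    known = ",".join(f"ns_1@{host}" for host in known_hosts)
    body = urlencode({"knownNodes": known, "ejectedNodes": ""}).encode()
    url = f"http://{cluster_host}:{PORT}/controller/rebalance"
    return Request(url, data=body, method="POST")


if __name__ == "__main__":
    add = add_node_request(NODE_1, NODE_2, "Administrator", "password", ["kv", "fts"])
    reb = rebalance_request(NODE_1, [NODE_1, NODE_2])
    for req in (add, reb):
        print(req.get_method(), req.full_url, req.data.decode())
```

      Sending the requests (for example with urllib.request.urlopen plus a basic-auth header) is left out on purpose, since the test harness may drive these steps differently.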

      What happens:

      The rebalance fails with the following message present in the logs:

      (From 172.23.123.117)

      cbcollect_info_ns_1@172.23.123.117_20210202-115415/ns_server.error.log

      [ns_server:error,2021-02-02T03:47:33.824-08:00,ns_1@172.23.123.117:service_rebalancer-fts<0.31138.8>:service_rebalancer:run_rebalance_worker:125]Worker terminated abnormally: {'EXIT',<0.31152.8>,
                                     {rebalance_failed,
                                      {service_error,
                                       <<"planner: indexDefs.ImplVersion:  > version: 5.5.0">>}}}
      [user:error,2021-02-02T03:47:33.834-08:00,ns_1@172.23.123.117:<0.22240.8>:ns_orchestrator:log_rebalance_completion:1402]Rebalance exited with reason {service_rebalance_failed,fts,
                                    {worker_died,
                                     {'EXIT',<0.31152.8>,
                                      {rebalance_failed,
                                       {service_error,
                                        <<"planner: indexDefs.ImplVersion:  > version: 5.5.0">>}}}}}.
      Rebalance Operation Id = 39e9f04255cf50ee97eed090d380830c
      [ns_server:error,2021-02-02T03:47:43.921-08:00,ns_1@172.23.123.117:service_rebalancer-fts<0.31827.8>:service_rebalancer:run_rebalance_worker:125]Worker terminated abnormally: {'EXIT',<0.31841.8>,
                                     {rebalance_failed,
                                      {service_error,
                                       <<"planner: indexDefs.ImplVersion:  > version: 5.5.0">>}}}
      [user:error,2021-02-02T03:47:43.922-08:00,ns_1@172.23.123.117:<0.22240.8>:ns_orchestrator:log_rebalance_completion:1402]Rebalance exited with reason {service_rebalance_failed,fts,
                                    {worker_died,
                                     {'EXIT',<0.31841.8>,
                                      {rebalance_failed,
                                       {service_error,
                                        <<"planner: indexDefs.ImplVersion:  > version: 5.5.0">>}}}}}.
      Rebalance Operation Id = 821db8b83b00decae950c50a8a5b5933

      (From 172.23.123.109)

      cbcollect_info_ns_1@172.23.123.109_20210202-115832/ns_server.error.log

      [ns_server:error,2021-02-02T03:47:33.826-08:00,ns_1@172.23.123.109:service_agent-fts<0.27302.7>:service_agent:handle_info:287]Rebalancer <27513.31138.8> died unexpectedly: {worker_died,
                                                     {'EXIT',<27513.31152.8>,
                                                      {rebalance_failed,
                                                       {service_error,
                                                        <<"planner: indexDefs.ImplVersion:  > version: 5.5.0">>}}}}
      [ns_server:error,2021-02-02T03:47:43.922-08:00,ns_1@172.23.123.109:service_agent-fts<0.27302.7>:service_agent:handle_info:287]Rebalancer <27513.31827.8> died unexpectedly: {worker_died,
                                                     {'EXIT',<27513.31841.8>,
                                                      {rebalance_failed,
                                                       {service_error,
                                                        <<"planner: indexDefs.ImplVersion:  > version: 5.5.0">>}}}}
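
      All four entries above carry the same service_error text. A small sketch for pulling that text out of an ns_server.error.log excerpt and splitting it into its two sides is shown below. The regular expressions are my own and only assume the entry layout seen in these excerpts; the function names are hypothetical. Reading the message as "<ImplVersion> > version: <version>" is also an assumption; if it holds, the left-hand value is empty here.

```python
import re

# Matches the start of a log entry, e.g.
# [ns_server:error,2021-02-02T03:47:33.824-08:00,ns_1@172.23.123.117:...
HEADER = re.compile(r"^\s*\[(?P<component>\w+):(?P<level>\w+),(?P<ts>[^,]+),(?P<node>[^:]+):")
# Matches {service_error, <<"...">>} even when it is split across lines.
SERVICE_ERROR = re.compile(r'\{service_error,\s*<<"(?P<msg>[^"]*)">>\}')
# Assumed shape of the planner message: "<impl> > version: <version>".
PLANNER = re.compile(r"indexDefs\.ImplVersion: (?P<impl>.*?) > version: (?P<version>\S+)")


def service_errors(log_text):
    """Return (timestamp, node, message) for every entry containing a service_error."""
    entries = []
    for line in log_text.splitlines():
        if HEADER.match(line):
            entries.append([line])        # a new entry starts here
        elif entries:
            entries[-1].append(line)      # continuation of the previous entry
    found = []
    for lines in entries:
        entry = "\n".join(lines)
        head = HEADER.match(entry)
        err = SERVICE_ERROR.search(entry)
        if err:
            found.append((head.group("ts"), head.group("node"), err.group("msg")))
    return found


def split_planner_message(message):
    """Return (impl_version, version) from the planner message, or None if it does not match."""
    match = PLANNER.search(message)
    return (match.group("impl"), match.group("version")) if match else None
```

      Run over the excerpts above, this should yield the same message for every entry, with an empty string on the ImplVersion side and 5.5.0 on the version side.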

      What I expected to happen:

      I expected the rebalance to succeed.

      The logs:

      The rebalance errors were only present in these two sets of logs:

      172.23.123.109-20210202-0358-diag.zip

      172.23.123.117-20210202-0354-diag.zip

      I've also uploaded the logs for the other nodes which I presume were not involved in the test as they contain no 'rebalance failed' error messages.
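
      To double-check which archives contain the failure, something like the following sketch could be run over the attached zips. It assumes each diag zip holds an ns_server.error.log under a cbcollect_info directory, as in the paths quoted above; the function name and the marker string are my own choices.

```python
import zipfile


def archives_with_marker(zip_paths, marker=b"rebalance_failed",
                         suffix="ns_server.error.log"):
    """Return the paths of archives whose error log contains the marker bytes."""
    hits = []
    for path in zip_paths:
        with zipfile.ZipFile(path) as archive:
            for name in archive.namelist():
                # Only inspect the ns_server error log inside each archive.
                if name.endswith(suffix) and marker in archive.read(name):
                    hits.append(path)
                    break
    return hits


if __name__ == "__main__":
    import glob
    for hit in archives_with_marker(sorted(glob.glob("*-diag.zip"))):
        print(hit)
```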

       

      The following may also be of interest, as these are the first two nodes specified in the ini file:

      172.23.123.105-20210202-0347-diag.zip

      172.23.123.116-20210202-0351-diag.zip

      Attachments

        1. 172.23.100.15-20210205-0305-diag.zip (9.91 MB)
        2. 172.23.100.16-20210205-0305-diag.zip (10.78 MB)
        3. 172.23.100.17-20210205-0305-diag.zip (9.13 MB)
        4. 172.23.100.18-20210205-0305-diag.zip (9.39 MB)
        5. 172.23.105.79-20210205-0305-diag.zip (8.38 MB)
        6. 172.23.106.18-20210205-0305-diag.zip (14.47 MB)
        7. 172.23.108.239-20210205-0305-diag.zip (7.72 MB)
        8. 172.23.121.22-20210205-0305-diag.zip (13.99 MB)
        9. 172.23.123.105-20210202-0347-diag.zip (30.51 MB)
        10. 172.23.123.107-20210202-0404-diag.zip (16.06 MB)
        11. 172.23.123.108-20210202-0352-diag.zip (10.61 MB)
        12. 172.23.123.109-20210202-0358-diag.zip (19.85 MB)
        13. 172.23.123.111-20210202-0400-diag.zip (11.50 MB)
        14. 172.23.123.114-20210202-0406-diag.zip (11.32 MB)
        15. 172.23.123.115-20210202-0401-diag.zip (15.50 MB)
        16. 172.23.123.116-20210202-0351-diag.zip (19.63 MB)
        17. 172.23.123.117-20210202-0354-diag.zip (14.66 MB)
        18. 172.23.98.15-20210205-0305-diag.zip (16.83 MB)
        19. archive.zip (1.69 MB)
        20. test_logs.txt (186 kB)
          People

            asad.zaidi Asad Zaidi (Inactive)