Couchbase Server / MB-36400

[high-bucket] FTS rebalance out in high bucket density test takes really long hours (28hrs+)


Details

    • Triage: Untriaged
    • Is this a Regression?: No

    Description

      Build 6.5.0-4380

      Observed that rebalancing out an FTS node in the high bucket density test with 30 buckets takes 764.42 min and fails with the following reason.

      Each bucket has 1 FTS index.
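
      For reference, the per-bucket index setup can be approximated against the FTS REST API roughly as in the sketch below. This is a minimal illustration, assuming the default FTS port 8094, default index settings, and placeholder host, credentials, and bucket names.

      {code:python}
# Minimal sketch: create one default FTS index per bucket via the FTS REST API.
# Host, credentials, and bucket names below are hypothetical placeholders;
# mapping and plan parameters are left at their defaults.
import json
import urllib.request

FTS_NODE = "http://172.23.96.20:8094"            # assumed FTS REST endpoint
AUTH = ("Administrator", "password")             # placeholder credentials
buckets = [f"bucket-{i}" for i in range(1, 31)]  # 30 buckets, 1 index each

pw_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
pw_mgr.add_password(None, FTS_NODE, *AUTH)
opener = urllib.request.build_opener(urllib.request.HTTPBasicAuthHandler(pw_mgr))

for bucket in buckets:
    index_def = {
        "type": "fulltext-index",
        "name": f"{bucket}-fts",
        "sourceType": "couchbase",
        "sourceName": bucket,
    }
    req = urllib.request.Request(
        f"{FTS_NODE}/api/index/{bucket}-fts",
        data=json.dumps(index_def).encode(),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with opener.open(req) as resp:
        print(bucket, resp.status)
      {code}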

      Rebalance failure reason-

      Rebalance exited with reason {service_rebalance_failed,fts,
      {worker_died,
      {'EXIT',<0.13606.525>,
      {{linked_process_died,<27016.24380.381>,
      {timeout,
      {gen_server,call,
      [<27016.22501.381>,
      {call,"ServiceAPI.GetCurrentTopology",
      #Fun<json_rpc_connection.0.102434519>},
      60000]}}},
      {gen_server,call,
      [{'service_agent-fts','ns_1@172.23.97.15'},
      {if_rebalance,<0.19462.525>,
      {start_rebalance,
      <<"f81721335b42662e837612d6b366cca3">>,
      rebalance,
      [{[{node_id,
      <<"827b0f5298e3447fa469c0ebdcad3da0">>},
      {priority,0},
      {opaque,null}],
      full}],
      [[{node_id,
      <<"592321897f0f721188d778a77f4c9d9b">>},
      {priority,0},
      {opaque,null}]],
      <0.13606.525>}},
      90000]}}}}}.
      Rebalance Operation Id = f5116259df628e7619516c5b6cba1997
      

       On build 6.5.0-2082, this rebalance out took 79 min.
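
      For scale, a quick arithmetic sketch comparing this run with the 6.5.0-2082 baseline (both figures taken from this report):

      {code:python}
# Compare the rebalance-out durations reported above.
current_min = 764.42   # build 6.5.0-4380 (this run)
baseline_min = 79.0    # build 6.5.0-2082

print(f"current run : {current_min / 60:.1f} h")           # ~12.7 h
print(f"baseline run: {baseline_min / 60:.1f} h")          # ~1.3 h
print(f"slowdown    : {current_min / baseline_min:.1f}x")  # ~9.7x
      {code}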

      Job- http://perf.jenkins.couchbase.com/job/arke-multi-bucket/322/

      Logs-
      FTS node- https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-arke-multi-bucket-322/172.23.96.20.zip
      FTS node going out- https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-arke-multi-bucket-322/172.23.97.15.zip

      Other nodes-
      https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-arke-multi-bucket-322/172.23.96.16.zip
      https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-arke-multi-bucket-322/172.23.96.17.zip
      https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-arke-multi-bucket-322/172.23.96.23.zip
      https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-arke-multi-bucket-322/172.23.97.12.zip
      https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-arke-multi-bucket-322/172.23.97.13.zip
      https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-arke-multi-bucket-322/172.23.97.14.zip
      https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-arke-multi-bucket-322/172.23.97.177.zip
      https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-arke-multi-bucket-322/172.23.97.19.zip
      https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-arke-multi-bucket-322/172.23.97.20.zip
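
      For convenience, a minimal download sketch for the log archives listed above, assuming the perf-artifacts bucket is reachable over plain HTTPS:

      {code:python}
# Download every collectinfo zip listed in this ticket into ./logs/.
import pathlib
import urllib.request

BASE = "https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-arke-multi-bucket-322"
NODES = [
    "172.23.96.20", "172.23.97.15",                 # FTS node / FTS node going out
    "172.23.96.16", "172.23.96.17", "172.23.96.23",
    "172.23.97.12", "172.23.97.13", "172.23.97.14",
    "172.23.97.177", "172.23.97.19", "172.23.97.20",
]

out = pathlib.Path("logs")
out.mkdir(exist_ok=True)
for node in NODES:
    dest = out / f"{node}.zip"
    print(f"fetching {node} ...")
    urllib.request.urlretrieve(f"{BASE}/{node}.zip", str(dest))
      {code}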


        Issue Links

          relates to MB-34911

          Activity

            mahesh.mandhare Mahesh Mandhare (Inactive) created issue -
            mahesh.mandhare Mahesh Mandhare (Inactive) made changes -
            Summary: 'FTS rebalance out in high bucket density test takes huge time' → 'FTS rebalance out in high bucket density test takes 12 hours'
            mahesh.mandhare Mahesh Mandhare (Inactive) made changes -
            Description: updated the rebalance duration ('huge time' → 764.42 min)
            mahesh.mandhare Mahesh Mandhare (Inactive) made changes -
            Summary: 'FTS rebalance out in high bucket density test takes 12 hours' → 'FTS rebalance out in high bucket density test takes 12 hours and fails'
            mahesh.mandhare Mahesh Mandhare (Inactive) made changes -
            Description: added the rebalance failure reason and updated the log links
            wayne Wayne Siu made changes -
            Summary: 'FTS rebalance out in high bucket density test takes 12 hours and fails' → '[high-bucket] FTS rebalance out in high bucket density test takes 12 hours and fails'
            raju Raju Suravarjjala made changes -
            Priority: Major [ 3 ] → Critical [ 2 ]
            raju Raju Suravarjjala made changes -
            Fix Version/s Mad-Hatter [ 15037 ]
            Sreekanth Sivasankaran Sreekanth Sivasankaran made changes -
            Assignee: Keshav Murthy [ keshav ] → Sreekanth Sivasankaran [ sreekanth sivasankaran ]
            Sreekanth Sivasankaran Sreekanth Sivasankaran made changes -
            Assignee: Sreekanth Sivasankaran [ sreekanth sivasankaran ] → Mahesh Mandhare [ mahesh.mandhare ]
            lynn.straus Lynn Straus made changes -
            Labels: 'Performance high-bucket-density' → 'Performance approved-for-mad-hatter high-bucket-density'
            mahesh.mandhare Mahesh Mandhare (Inactive) made changes -
            Assignee: Mahesh Mandhare [ mahesh.mandhare ] → Sreekanth Sivasankaran [ sreekanth sivasankaran ]
            lynn.straus Lynn Straus made changes -
            Due Date 15/Nov/19
            Sreekanth Sivasankaran Sreekanth Sivasankaran made changes -
            Is this a Regression?: Yes [ 10450 ] → No [ 10451 ]
            keshav Keshav Murthy made changes -
            Fix Version/s Cheshire-Cat [ 15915 ]
            Fix Version/s Mad-Hatter [ 15037 ]
            Sreekanth Sivasankaran Sreekanth Sivasankaran made changes -
            Summary: '[high-bucket] FTS rebalance out in high bucket density test takes 12 hours and fails' → '[high-bucket] FTS rebalance out in high bucket density test takes really long hours (1'
            Sreekanth Sivasankaran Sreekanth Sivasankaran made changes -
            Summary: '[high-bucket] FTS rebalance out in high bucket density test takes really long hours (1' → '[high-bucket] FTS rebalance out in high bucket density test takes really long hours (28hrs+)'
            abhinav Abhinav Dangeti made changes -
            Link This issue relates to MB-34911 [ MB-34911 ]
            keshav Keshav Murthy made changes -
            Assignee: Sreekanth Sivasankaran [ sreekanth sivasankaran ] → Jyotsna Nayak [ jyotsna.nayak ]
            keshav Keshav Murthy made changes -
            Priority: Critical [ 2 ] → Major [ 3 ]
            lynn.straus Lynn Straus made changes -
            Fix Version/s CheshireCat.Next [ 16908 ]
            Fix Version/s Cheshire-Cat [ 15915 ]
            Labels: 'Performance approved-for-mad-hatter high-bucket-density' → 'Performance approved-for-mad-hatter deferred-from-Cheshire-Cat high-bucket-density'
            jyotsna.nayak Jyotsna Nayak made changes -
            Status: Open [ 1 ] → In Progress [ 3 ]
            jyotsna.nayak Jyotsna Nayak made changes -
            Resolution Fixed [ 1 ]
            Status: In Progress [ 3 ] → Closed [ 6 ]

            People

              Assignee: Jyotsna Nayak (jyotsna.nayak)
              Reporter: Mahesh Mandhare (mahesh.mandhare) (Inactive)
              Votes: 0
              Watchers: 8
