Couchbase Server / MB-47529

[Magma] - Multi-node rebalance out fails with "mover_crashed,{unexpected_exit,{'EXIT',<0.6756.8>,{{{{timeout,{gen_server,call,[memcached_refresh,{apply_to_file"


Details

    • Triaged
    • Centos 64-bit
    • 1
    • No

    Description

      Script to Repro

      guides/gradlew --refresh-dependencies testrunner -P jython=/opt/jython/bin/jython -P 'args=-i /tmp/win10-bucket-ops-temp_rebalance_even2-magma.ini rerun=False,get-cbcollect-info=True,quota_percent=99,crash_warning=True,retry_get_process_num=600,bucket_storage=magma,enable_dp=True -t bucket_collections.collections_drop_recreate_rebalance.CollectionsDropRecreateRebalance.test_data_load_collections_with_rebalance_out,nodes_init=5,nodes_out=2,bucket_spec=multi_bucket.buckets_1000_collections'
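The `args` value above packs four things into one line: an ini path, the global parameters, the test name, and the test's own parameters. As a reading aid, here is a minimal Python sketch that splits it apart. `describe_args` and `split_params` are hypothetical helpers written for this note. They assume the `-i INI GLOBALS -t TEST,PARAMS` layout seen in this particular command and are not testrunner's own argument parser.

```python
import shlex

# The -P 'args=...' value from the repro command, copied verbatim.
ARGS = (
    "-i /tmp/win10-bucket-ops-temp_rebalance_even2-magma.ini "
    "rerun=False,get-cbcollect-info=True,quota_percent=99,crash_warning=True,"
    "retry_get_process_num=600,bucket_storage=magma,enable_dp=True "
    "-t bucket_collections.collections_drop_recreate_rebalance."
    "CollectionsDropRecreateRebalance.test_data_load_collections_with_rebalance_out,"
    "nodes_init=5,nodes_out=2,bucket_spec=multi_bucket.buckets_1000_collections"
)


def split_params(csv):
    """Turn 'k1=v1,k2=v2' into {'k1': 'v1', 'k2': 'v2'}; values stay strings."""
    return dict(item.split("=", 1) for item in csv.split(",") if item)


def describe_args(args):
    """Hypothetical reading aid, assuming the '-i INI GLOBALS -t TEST,PARAMS' layout."""
    tokens = shlex.split(args)
    i_pos, t_pos = tokens.index("-i"), tokens.index("-t")
    ini = tokens[i_pos + 1]
    # Everything between the ini path and -t is the comma-separated global params.
    globals_csv = ",".join(tokens[i_pos + 2 : t_pos])
    # The -t value is the test name followed by its own comma-separated params.
    test_name, _, test_csv = tokens[t_pos + 1].partition(",")
    return {
        "ini": ini,
        "global_params": split_params(globals_csv),
        "test": test_name,
        "test_params": split_params(test_csv),
    }


if __name__ == "__main__":
    parsed = describe_args(ARGS)
    for key, value in parsed["test_params"].items():
        print(f"{key} = {value}")
```

When run as a script, it prints the three test parameters that shape this repro: `nodes_init = 5`, `nodes_out = 2`, and `bucket_spec = multi_bucket.buckets_1000_collections`.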
      

      Steps to Repro
      1. Create a 5-node cluster.
      2021-07-20 22:39:31,190 | test | INFO | pool-3-thread-7 | [table_view:display:72] Rebalance Overview
      -----------------------------------------------------------------------

      Nodes Services Version CPU Status

      -----------------------------------------------------------------------

      172.23.121.135 kv 7.1.0-1083-enterprise 0.150281778334 Cluster node
      172.23.121.136 None     <--- IN ---
      172.23.121.139 None     <--- IN ---
      172.23.121.140 None     <--- IN ---
      172.23.121.141 None     <--- IN ---

      -----------------------------------------------------------------------

      2. Create the bucket, scopes, collections, and data.
      2021-07-20 22:41:09,727 | test | INFO | MainThread | [table_view:display:72] Bucket statistics
      --------------------------------------------------------------------------------------------------------------------------------------------------------

      Bucket Type Storage Backend Replicas Durability TTL Items RAM Quota RAM Used Disk Used ARR

      --------------------------------------------------------------------------------------------------------------------------------------------------------

      56O1a_jkB3cCAwqd29-QSy4EmFIRMjnPi8ZnbSzRDR-YDMH-oFpTTgCT_ur-41-657000 couchbase magma 2 none 0 3000000 9.77 GiB 2.77 GiB 3.12 GiB 100

      --------------------------------------------------------------------------------------------------------------------------------------------------------

      3. Start CRUD operations on the collections; they continue until the end of the rebalance started in the next step.

      4. Start a multi-node rebalance out of 172.23.121.136 and 172.23.121.135.
      2021-07-20 22:41:10,466 | test | INFO | pool-3-thread-23 | [table_view:display:72] Rebalance Overview
      ----------------------------------------------------------------------

      Nodes Services Version CPU Status

      ----------------------------------------------------------------------

      172.23.121.140 kv 7.1.0-1083-enterprise 4.18966382338 Cluster node
      172.23.121.136 kv 7.1.0-1083-enterprise 4.37405731523 --- OUT --->
      172.23.121.139 kv 7.1.0-1083-enterprise 4.3047188755 Cluster node
      172.23.121.135 kv 7.1.0-1083-enterprise 3.75864236329 --- OUT --->
      172.23.121.141 kv 7.1.0-1083-enterprise 4.00703429217 Cluster node

      ----------------------------------------------------------------------
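Step 4 amounts to a single rebalance request against the cluster. The Python sketch below builds that request for the Couchbase Server REST endpoint `POST /controller/rebalance`. In that request, `knownNodes` lists every node currently in the cluster, including the ones leaving, and `ejectedNodes` lists the nodes to remove. Both are comma-separated `ns_1@<ip>` names. The host, the `Administrator`/`password` credentials, and the helper names are hypothetical placeholders. This is not necessarily how the test framework issues the rebalance, and the sketch only constructs the request without sending it.

```python
import base64
import urllib.request
from urllib.parse import parse_qs, urlencode

# Node names as they appear in the logs (otpNode form ns_1@<ip>).
CLUSTER_NODES = [
    "ns_1@172.23.121.135",
    "ns_1@172.23.121.136",
    "ns_1@172.23.121.139",
    "ns_1@172.23.121.140",
    "ns_1@172.23.121.141",
]
NODES_OUT = ["ns_1@172.23.121.136", "ns_1@172.23.121.135"]


def rebalance_form(known_nodes, ejected_nodes):
    """URL-encoded body for POST /controller/rebalance."""
    missing = set(ejected_nodes) - set(known_nodes)
    if missing:
        # Every ejected node must also be listed in knownNodes.
        raise ValueError(f"ejected nodes not in cluster: {sorted(missing)}")
    return urlencode({
        "knownNodes": ",".join(known_nodes),
        "ejectedNodes": ",".join(ejected_nodes),
    })


def rebalance_request(host, user, password, body):
    """Build (without sending) the request; host and credentials are placeholders."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        f"http://{host}:8091/controller/rebalance",
        data=body.encode(),
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


if __name__ == "__main__":
    body = rebalance_form(CLUSTER_NODES, NODES_OUT)
    # Hypothetical credentials; send to a node that stays in the cluster.
    req = rebalance_request("172.23.121.139", "Administrator", "password", body)
    print(req.get_method(), req.full_url)
    print(parse_qs(body))
    # urllib.request.urlopen(req) would actually start the rebalance; not called here.
```

The membership check reflects the rule that every ejected node must also appear in `knownNodes`. It catches a malformed request locally instead of leaving the server to reject it.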

      This rebalance fails as shown below.

      Latest UI logs from 172.23.121.135:

      2021-07-20 23:35:20,617 | test  | ERROR   | pool-3-thread-23 | [rest_client:print_UI_logs:2695] {u'code': 0, u'module': u'ns_orchestrator', u'type': u'critical', u'node': u'ns_1@172.23.121.135', u'tstamp': 1626849319109L, u'shortText': u'message', u'serverTime': u'2021-07-20T23:35:19.109Z', u'text': u'Rebalance exited with reason {mover_crashed,\n                              {unexpected_exit,\n                               {\'EXIT\',<0.6756.8>,\n                                {{{{timeout,\n                                    {gen_server,call,\n                                     [memcached_refresh,\n                                      {apply_to_file,\n                                       "/opt/couchbase/var/lib/couchbase/config/memcached.rbac.tmp",\n                                       "/opt/couchbase/var/lib/couchbase/config/memcached.rbac"}]}},\n                                   {gen_server,call,\n                                    [memcached_permissions,sync,infinity]}},\n                                  {gen_server,call,\n                                   [\'ns_memcached-56O1a_jkB3cCAwqd29-QSy4EmFIRMjnPi8ZnbSzRDR-YDMH-oFpTTgCT_ur-41-657000\',\n                                    {get_dcp_docs_estimate,189,\n                                     "replication:ns_1@172.23.121.135->ns_1@172.23.121.141:56O1a_jkB3cCAwqd29-QSy4EmFIRMjnPi8ZnbSzRDR-YDMH-oFpTTgCT_ur-41-657000"},\n                                    180000]}},\n                                 {gen_server,call,\n                                  [{\'janitor_agent-56O1a_jkB3cCAwqd29-QSy4EmFIRMjnPi8ZnbSzRDR-YDMH-oFpTTgCT_ur-41-657000\',\n                                    \'ns_1@172.23.121.135\'},\n                                   {if_rebalance,<0.15174.0>,\n                                    {wait_dcp_data_move,\n                                     [\'ns_1@172.23.121.141\',\n                                      \'ns_1@172.23.121.139\',\n                                      \'ns_1@172.23.121.140\'],\n                                     189}},\n                                   infinity]}}}}}.\nRebalance Operation Id = 5646fb1a94153b57ce60511c2e21f24f'}
      2021-07-20 23:35:20,621 | test  | ERROR   | pool-3-thread-23 | [rest_client:print_UI_logs:2695] {u'code': 0, u'module': u'ns_vbucket_mover', u'type': u'critical', u'node': u'ns_1@172.23.121.135', u'tstamp': 1626849319028L, u'shortText': u'message', u'serverTime': u'2021-07-20T23:35:19.028Z', u'text': u'Worker <0.3080.1> (for action {move,{189,\n                                     [\'ns_1@172.23.121.135\',\n                                      \'ns_1@172.23.121.141\',\n                                      \'ns_1@172.23.121.139\'],\n                                     [\'ns_1@172.23.121.141\',\n                                      \'ns_1@172.23.121.139\',\n                                      \'ns_1@172.23.121.140\'],\n                                     []}}) exited with reason {unexpected_exit,\n                                                               {\'EXIT\',\n                                                                <0.6756.8>,\n                                                                {{{{timeout,\n                                                                    {gen_server,\n                                                                     call,\n                                                                     [memcached_refresh,\n                                                                      {apply_to_file,\n                                                                       "/opt/couchbase/var/lib/couchbase/config/memcached.rbac.tmp",\n                                                                       "/opt/couchbase/var/lib/couchbase/config/memcached.rbac"}]}},\n                                                                   {gen_server,\n                                                                    call,\n                                                                    [memcached_permissions,\n                                                                     sync,\n                                                                      infinity]}},\n                                                                  {gen_server,\n                                                                   call,\n                                                                   [\'ns_memcached-56O1a_jkB3cCAwqd29-QSy4EmFIRMjnPi8ZnbSzRDR-YDMH-oFpTTgCT_ur-41-657000\',\n                                                                    {get_dcp_docs_estimate,\n                                                                     189,\n                                                                     "replication:ns_1@172.23.121.135->ns_1@172.23.121.141:56O1a_jkB3cCAwqd29-QSy4EmFIRMjnPi8ZnbSzRDR-YDMH-oFpTTgCT_ur-41-657000"},\n                                                                    180000]}},\n                                                                 {gen_server,\n                                                                  call,\n                                                                  [{\'janitor_agent-56O1a_jkB3cCAwqd29-QSy4EmFIRMjnPi8ZnbSzRDR-YDMH-oFpTTgCT_ur-41-657000\',\n                                                                    \'ns_1@172.23.121.135\'},\n                                                                   {if_rebalance,\n                                                                    <0.15174.0>,\n                                                                    {wait_dcp_data_move,\n                                                                     [\'ns_1@172.23.121.141\',\n                                                                      \'ns_1@172.23.121.139\',\n                                                                      \'ns_1@172.23.121.140\'],\n                                                                     189}},\n                                                                   infinity]}}}}'}
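The exit reason in these logs is a chain of nested `gen_server:call` failures. By OTP convention, a failed `gen_server:call` exits with `{Reason, {gen_server, call, Args}}`, so the chain can be peeled from the outside in.

The Python sketch below models the logged Erlang term with plain tuples, lists, and strings, then unwinds it. This encoding is a hypothetical choice for this note, with atoms and pids kept as strings. Read this way, process `<0.6756.8>` exited because its `wait_dcp_data_move` call to `janitor_agent` failed. That failure traces through a `get_dcp_docs_estimate` call to `ns_memcached` (180000 ms timeout) and a `sync` call to `memcached_permissions`, down to a `timeout` on the `apply_to_file` call to `memcached_refresh`.

That innermost call reports only `[Server, Request]`, which is how OTP reports `gen_server:call/2`, whose default timeout is 5000 ms. This suggests, but does not prove, that the default timeout applied.

```python
# Hypothetical encoding of Erlang terms for this sketch:
# atoms and pids -> str, tuples -> tuple, lists -> list, integers -> int.
BUCKET = "56O1a_jkB3cCAwqd29-QSy4EmFIRMjnPi8ZnbSzRDR-YDMH-oFpTTgCT_ur-41-657000"
CONFIG_DIR = "/opt/couchbase/var/lib/couchbase/config/"


def call_exit(reason, args):
    """Exit shape of a failed gen_server:call: {Reason, {gen_server, call, Args}}."""
    return (reason, ("gen_server", "call", args))


# Rebuild the logged exit reason from the innermost call outwards.
reason = call_exit("timeout", [
    "memcached_refresh",
    ("apply_to_file", CONFIG_DIR + "memcached.rbac.tmp", CONFIG_DIR + "memcached.rbac"),
])
reason = call_exit(reason, ["memcached_permissions", "sync", "infinity"])
reason = call_exit(reason, [
    "ns_memcached-" + BUCKET,
    ("get_dcp_docs_estimate", 189,
     "replication:ns_1@172.23.121.135->ns_1@172.23.121.141:" + BUCKET),
    180000,
])
reason = call_exit(reason, [
    ("janitor_agent-" + BUCKET, "ns_1@172.23.121.135"),
    ("if_rebalance", "<0.15174.0>",
     ("wait_dcp_data_move",
      ["ns_1@172.23.121.141", "ns_1@172.23.121.139", "ns_1@172.23.121.140"], 189)),
    "infinity",
])
EXIT_REASON = ("mover_crashed", ("unexpected_exit", ("EXIT", "<0.6756.8>", reason)))


def is_call_exit(term):
    """True if term looks like {Reason, {gen_server, call, Args}}."""
    return (isinstance(term, tuple) and len(term) == 2
            and isinstance(term[1], tuple) and term[1][:2] == ("gen_server", "call"))


def unwind(term):
    """Return (root_reason, [call args, outermost call first])."""
    # Strip {mover_crashed, {unexpected_exit, {'EXIT', Pid, Reason}}}.
    _, (_, (_, _, term)) = term
    calls = []
    while is_call_exit(term):
        term, (_, _, args) = term
        calls.append(args)
    return term, calls


def describe(args):
    """One line per call: server (bucket suffix dropped), request tag, timeout."""
    server, request = args[0], args[1]
    name = server[0] if isinstance(server, tuple) else server
    name = name.split("-", 1)[0]
    tag = request[0] if isinstance(request, tuple) else request
    # gen_server:call/2 reports only [Server, Request]; its default timeout is 5000 ms.
    timeout = args[2] if len(args) == 3 else "5000 (call/2 default)"
    return f"{name}: {tag} (timeout {timeout})"


if __name__ == "__main__":
    root, calls = unwind(EXIT_REASON)
    for args in calls:
        print(describe(args))
    print("root reason:", root)
```

When run as a script, it lists `janitor_agent`, `ns_memcached`, `memcached_permissions`, and `memcached_refresh` in that order, followed by `root reason: timeout`.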
      

      cbcollect_info attached.


          People

            Balakumaran.Gopal Balakumaran Gopal
            Votes: 0
            Watchers: 3
