Couchbase Server / MB-49515

[Windows][Magma] - Minidumps seen during Rebalance in/out + CRUD on data


Details

    • Untriaged
    • Centos 64-bit
    • 1
    • Yes

    Description

      Script to Repro

      guides/gradlew --refresh-dependencies testrunner -P jython=/opt/jython/bin/jython -P 'args=-i /tmp/win10-bucket-ops-temp_rebalance_magma_win.ini rerun=False,disk_optimized_thread_settings=True,get-cbcollect-info=True,bucket_storage=magma -t bucket_collections.collections_rebalance.CollectionsRebalance.test_data_load_collections_with_swap_rebalance,nodes_init=5,nodes_swap=2,bucket_spec=magma_dgm.5_percent_dgm.5_node_2_replica_magma_512,doc_size=512,randomize_value=True,data_load_stage=during,skip_validations=False,GROUP=swap_rebalance_P0_set0'
      

      Steps to Repro
      1. Create a 5 node cluster

      2021-11-12 01:34:25,729 | test  | INFO    | MainThread | [table_view:display:72] Cluster statistics
      +----------------+----------+-----------------+-----------+----------+---------------------+-------------------+-----------------------+
      | Node           | Services | CPU_utilization | Mem_total | Mem_free | Swap_mem_used       | Active / Replica  | Version               |
      +----------------+----------+-----------------+-----------+----------+---------------------+-------------------+-----------------------+
      | 172.23.136.106 | kv       | 0.156666666667  | 6.00 GiB  | 4.58 GiB | 1.63 GiB / 7.00 GiB | 0 / 0             | 7.1.0-1696-enterprise |
      | 172.23.136.103 | kv       | 0.991553433713  | 6.00 GiB  | 4.52 GiB | 1.68 GiB / 7.00 GiB | 0 / 0             | 7.1.0-1696-enterprise |
      | 172.23.136.104 | kv       | 0.91475119769   | 6.00 GiB  | 4.54 GiB | 1.67 GiB / 7.00 GiB | 0 / 0             | 7.1.0-1696-enterprise |
      | 172.23.136.101 | kv       | 0.910015166919  | 6.00 GiB  | 4.48 GiB | 1.72 GiB / 7.00 GiB | 713544 / 1420352  | 7.1.0-1696-enterprise |
      | 172.23.136.102 | kv       | 3.10156328122   | 6.00 GiB  | 4.55 GiB | 1.63 GiB / 7.00 GiB | 0 / 0             | 7.1.0-1696-enterprise |
      +----------------+----------+-----------------+-----------+----------+---------------------+-------------------+-----------------------+
      

      2. Create bucket/scopes/collections/data

      2021-11-12 01:38:40,154 | test  | INFO    | MainThread | [table_view:display:72] Bucket statistics
      +-----------------------------------------------------------------+-----------+-----------------+----------+------------+-----+---------+-----------+----------+-----------+-----+
      | Bucket                                                          | Type      | Storage Backend | Replicas | Durability | TTL | Items   | RAM Quota | RAM Used | Disk Used | ARR |
      +-----------------------------------------------------------------+-----------+-----------------+----------+------------+-----+---------+-----------+----------+-----------+-----+
      | L5VT2_xKXtFBcLfd_QevK4bT0zv3QiDW6IaZPG42CevEkRkGoe%jkb-26-12000 | couchbase | magma           | 2        | none       | 0   | 3000000 | 9.77 GiB  | 6.15 GiB | 3.50 GiB  | 100 |
      +-----------------------------------------------------------------+-----------+-----------------+----------+------------+-----+---------+-----------+----------+-----------+-----+
      

      3. Add one node (172.23.136.105), remove two nodes (172.23.136.106 and 172.23.136.102), and start the rebalance.

      2021-11-12 01:38:52,213 | test  | INFO    | pool-6-thread-22 | [table_view:display:72] Rebalance Overview
      +----------------+----------+-----------------------+---------------+--------------+
      | Nodes          | Services | Version               | CPU           | Status       |
      +----------------+----------+-----------------------+---------------+--------------+
      | 172.23.136.106 | kv       | 7.1.0-1696-enterprise | 10.4666666667 | --- OUT ---> |
      | 172.23.136.103 | kv       | 7.1.0-1696-enterprise | 12.5886140348 | Cluster node |
      | 172.23.136.104 | kv       | 7.1.0-1696-enterprise | 9.06166666667 | Cluster node |
      | 172.23.136.101 | kv       | 7.1.0-1696-enterprise | 10.4714921418 | Cluster node |
      | 172.23.136.102 | kv       | 7.1.0-1696-enterprise | 9.06166666667 | --- OUT ---> |
      | 172.23.136.105 | kv       | 7.1.0-1696-enterprise | 0             | Cluster node |
      +----------------+----------+-----------------------+---------------+--------------+
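For readers reproducing step 3 by hand instead of through the test framework, the sketch below builds the form fields for the `POST /controller/rebalance` REST call (`knownNodes` and `ejectedNodes`) from the node lists in the table above. The helper names are hypothetical, the `ns_1@<ip>` naming is taken from the node names in the logs in this ticket, and the sketch assumes the incoming node has already been added to the cluster. It only builds the payload and does not contact a cluster.

```python
from urllib.parse import urlencode


def otp_name(ip):
    """Return the node name as it appears in this ticket's logs (ns_1@<ip>)."""
    return "ns_1@" + ip


def swap_rebalance_form(current, nodes_in, nodes_out):
    """Build the knownNodes/ejectedNodes form fields for a rebalance.

    Hypothetical helper: knownNodes lists every node currently known to the
    cluster (including those being ejected and the one already added),
    ejectedNodes lists only the nodes being removed.
    """
    missing = [ip for ip in nodes_out if ip not in current]
    if missing:
        raise ValueError("cannot eject nodes that are not in the cluster: %s" % missing)
    known = list(current) + [ip for ip in nodes_in if ip not in current]
    return {
        "knownNodes": ",".join(otp_name(ip) for ip in known),
        "ejectedNodes": ",".join(otp_name(ip) for ip in nodes_out),
    }


# Node lists copied from the tables in this ticket.
CURRENT = ["172.23.136.106", "172.23.136.103", "172.23.136.104",
           "172.23.136.101", "172.23.136.102"]
FORM = swap_rebalance_form(CURRENT,
                           nodes_in=["172.23.136.105"],
                           nodes_out=["172.23.136.106", "172.23.136.102"])
BODY = urlencode(FORM)  # form-encoded body for the POST request
```

The resulting `BODY` string would be sent as the request body; credentials and the host/port of the orchestrating node are left out because they are not stated in this ticket.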
      

      The rebalance fails with the error shown below, following a memcached crash.
      UI log entry reported by 172.23.136.101:

      2021-11-12 01:39:35,654 | test  | ERROR   | pool-6-thread-22 | [rest_client:print_UI_logs:2784] {u'code': 0, u'module': u'ns_orchestrator', u'type': u'critical', u'node': u'ns_1@172.23.136.101', u'tstamp': 1636709967895L, u'shortText': u'message', u'serverTime': u'2021-11-12T01:39:27.895Z', u'text': u'Rebalance exited with reason {mover_crashed,\n                              {unexpected_exit,\n                               {\'EXIT\',<0.1835.1>,\n                                {{{lost_connection,\n                                   [{ns_memcached,worker_loop,3,\n                                     [{file,"src/ns_memcached.erl"},\n                                      {line,209}]},\n                                    {proc_lib,init_p_do_apply,3,\n                                     [{file,"proc_lib.erl"},{line,226}]}]},\n                                  {gen_server,call,\n                                   [\'ns_memcached-L5VT2_xKXtFBcLfd_QevK4bT0zv3QiDW6IaZPG42CevEkRkGoe%jkb-26-12000\',\n                                    {set_vbucket,994,active,\n                                     [[\'ns_1@172.23.136.106\',\n                                       \'ns_1@172.23.136.104\',\n                                       \'ns_1@172.23.136.101\']]},\n                                    180000]}},\n                                 {gen_server,call,\n                                  [{\'janitor_agent-L5VT2_xKXtFBcLfd_QevK4bT0zv3QiDW6IaZPG42CevEkRkGoe%jkb-26-12000\',\n                                    \'ns_1@172.23.136.106\'},\n                                   {if_rebalance,<0.511.1>,\n                                    {update_vbucket_state,994,active,paused,\n                                     undefined,\n                                     [[\'ns_1@172.23.136.106\',\n                                       \'ns_1@172.23.136.104\',\n                                       \'ns_1@172.23.136.101\']]}},\n                                   
infinity]}}}}}.\nRebalance Operation Id = b413500a9e7ef72f63f40ab4f1e36723'}
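The error text above is dense, so here is a minimal sketch of how the key facts (exit reason, vBucket, target state, and the nodes involved) could be pulled out of it with regular expressions. The function name and patterns are illustrative assumptions, not part of the test framework, and `SAMPLE` is an abbreviated excerpt of the message above with the middle elided.

```python
import re

# Patterns are assumptions based on the shape of the message in this ticket.
REASON = re.compile(r"Rebalance exited with reason \{(\w+),")
SET_VBUCKET = re.compile(r"\{set_vbucket,(\d+),(\w+),")
NODE = re.compile(r"ns_1@\d{1,3}(?:\.\d{1,3}){3}")


def summarize_rebalance_failure(text):
    """Extract the exit reason, vBucket id/state, and node names from the text."""
    reason = REASON.search(text)
    vb = SET_VBUCKET.search(text)
    nodes = []
    for name in NODE.findall(text):
        if name not in nodes:  # keep first-seen order, drop repeats
            nodes.append(name)
    return {
        "reason": reason.group(1) if reason else None,
        "vbucket": int(vb.group(1)) if vb else None,
        "state": vb.group(2) if vb else None,
        "nodes": nodes,
    }


# Abbreviated excerpt of the error above; "..." marks elided text.
SAMPLE = (
    "Rebalance exited with reason {mover_crashed,\n"
    " {unexpected_exit, ... {{{lost_connection, ...\n"
    " {set_vbucket,994,active,\n"
    "  [['ns_1@172.23.136.106',\n"
    "    'ns_1@172.23.136.104',\n"
    "    'ns_1@172.23.136.101']]},\n"
)
SUMMARY = summarize_rebalance_failure(SAMPLE)
```

For this excerpt the summary points at a `lost_connection` during `set_vbucket` for vBucket 994, with 172.23.136.106 first in the chain, which matches the node where the crash is logged.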
      

      Output of grep for CRITICAL in the memcached logs on 172.23.136.106:

       $ grep CRITICAL memcached.log.0000*
      memcached.log.000013.txt:2021-11-12T01:39:27.297618-08:00 CRITICAL Breakpad caught a crash (Couchbase version 7.1.0-1696). Writing crash dump to c:/Program Files/Couchbase/Server/var/lib/couchbase/crash\d754b25b-b14b-4f36-9d18-fa19f722e211.dmp before terminating.
      memcached.log.000013.txt:2021-11-12T01:39:27.297632-08:00 CRITICAL Stack backtrace of crashed thread:
      memcached.log.000013.txt:2021-11-12T01:39:27.299891-08:00 CRITICAL     #0  c:\Program Files\Couchbase\Server\bin\memcached.exe(FileOpsInterface::set_mprotect_enabled+10787324) [0x00007FF61B17D4C0]
      memcached.log.000013.txt:2021-11-12T01:39:27.299923-08:00 CRITICAL     #1  c:\Program Files\Couchbase\Server\bin\memcached.exe(FileOpsInterface::set_mprotect_enabled+11330637) [0x00007FF61B201F11]
      memcached.log.000013.txt:2021-11-12T01:39:27.299947-08:00 CRITICAL     #2  C:\Windows\System32\KERNEL32.DLL(BaseThreadInitThunk+20) [0x00007FFF53EB84D4]
      memcached.log.000013.txt:2021-11-12T01:39:27.299969-08:00 CRITICAL     #3  C:\Windows\SYSTEM32\ntdll.dll(RtlUserThreadStart+33) [0x00007FFF5483E8B1]
       
      Administrator@WIN-1T98IIFH727 /cygdrive/c/Program Files/Couchbase/Server/var/lib/couchbase/logs
      

      We see the minidump d754b25b-b14b-4f36-9d18-fa19f722e211.dmp on 172.23.136.106. We have only just started running Magma tests on Windows, so we don't have a baseline.
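To make the grep output above easier to triage, the following sketch parses the Breakpad CRITICAL lines into the minidump path and a list of stack frames. The regular expressions and function name are assumptions derived from the log lines in this ticket, not a documented log format, and `SAMPLE` contains three of those lines copied verbatim.

```python
import re

# Patterns inferred from the memcached.log lines quoted in this ticket.
DUMP = re.compile(r"Writing crash dump to (.+?\.dmp) before terminating")
FRAME = re.compile(
    r"CRITICAL\s+#(\d+)\s+(.+?)\((.+?)\+(\d+)\)\s+\[(0x[0-9A-Fa-f]+)\]"
)


def parse_crash(lines):
    """Return (dump_path, frames) parsed from Breakpad CRITICAL log lines."""
    dump = None
    frames = []
    for line in lines:
        m = DUMP.search(line)
        if m:
            dump = m.group(1)
            continue
        m = FRAME.search(line)
        if m:
            frames.append({
                "index": int(m.group(1)),
                "module": m.group(2),
                "symbol": m.group(3),
                "offset": int(m.group(4)),  # decimal offset from the symbol
                "address": m.group(5),
            })
    return dump, frames


# Lines copied from the grep output above (raw strings keep the backslashes).
SAMPLE = [
    r"memcached.log.000013.txt:2021-11-12T01:39:27.297618-08:00 CRITICAL Breakpad caught a crash (Couchbase version 7.1.0-1696). Writing crash dump to c:/Program Files/Couchbase/Server/var/lib/couchbase/crash\d754b25b-b14b-4f36-9d18-fa19f722e211.dmp before terminating.",
    r"memcached.log.000013.txt:2021-11-12T01:39:27.299891-08:00 CRITICAL     #0  c:\Program Files\Couchbase\Server\bin\memcached.exe(FileOpsInterface::set_mprotect_enabled+10787324) [0x00007FF61B17D4C0]",
    r"memcached.log.000013.txt:2021-11-12T01:39:27.299947-08:00 CRITICAL     #2  C:\Windows\System32\KERNEL32.DLL(BaseThreadInitThunk+20) [0x00007FFF53EB84D4]",
]
DUMP_PATH, FRAMES = parse_crash(SAMPLE)
```

One thing the parsed offsets make visible: frames #0 and #1 sit more than 10 MB past `FileOpsInterface::set_mprotect_enabled`, which suggests that name is only the nearest symbol available in the log and probably not the crashing function. The minidump itself is likely needed to recover the real frames.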

      cbcollect_info attached.

      Attachments

        1. MB-49515.zip
          6.72 MB
        2. test.log
          43 kB


            People

              Balakumaran.Gopal Balakumaran Gopal