Couchbase Server / MB-44009

[ASan] memcached terminated by oom-killer during collection CRUD + durability data load


Details

    • Triaged
    • Centos 64-bit
    • 1
    • Yes

    Description

      Script to Repro

      guides/gradlew --refresh-dependencies testrunner -P jython=/opt/jython/bin/jython -P 'args=-i /tmp/win10-bucket-ops.ini rerun=False,quota_percent=95,crash_warning=True,GROUP=failover_with_collection_crud_durability_MAJORITY -t bucket_collections.collections_rebalance.CollectionsRebalance.test_data_load_collections_with_graceful_failover_rebalance_out,nodes_init=5,nodes_failover=1,override_spec_params=durability;replicas,durability=MAJORITY,replicas=2,bucket_spec=multi_bucket.buckets_for_rebalance_tests_more_collections,data_load_spec=volume_test_load_with_CRUD_on_collections,data_load_stage=during,quota_percent=80,GROUP=failover_with_collection_crud_durability_MAJORITY'
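      The command above passes two comma-separated parameter strings, and some keys appear in both. The sketch below is a hypothetical helper (not testrunner's own parser) that splits the strings as copied from the command and reports keys given more than one distinct value. How testrunner resolves such duplicates is not stated in this report.

```python
from collections import defaultdict


def parse_test_params(spec):
    """Split a comma-separated spec into bare tokens and key -> [values]."""
    values = defaultdict(list)
    positional = []
    for token in spec.split(","):
        key, sep, value = token.partition("=")
        if sep:
            values[key].append(value)
        else:
            # A token without '=' (here, the dotted test name).
            positional.append(token)
    return positional, dict(values)


def conflicting_keys(values):
    """Return only the keys that were given more than one distinct value."""
    return {k: v for k, v in values.items() if len(set(v)) > 1}


# Both strings are copied verbatim from the repro command.
INI_ARGS = (
    "rerun=False,quota_percent=95,crash_warning=True,"
    "GROUP=failover_with_collection_crud_durability_MAJORITY"
)
TEST_ARGS = (
    "bucket_collections.collections_rebalance.CollectionsRebalance."
    "test_data_load_collections_with_graceful_failover_rebalance_out,"
    "nodes_init=5,nodes_failover=1,override_spec_params=durability;replicas,"
    "durability=MAJORITY,replicas=2,"
    "bucket_spec=multi_bucket.buckets_for_rebalance_tests_more_collections,"
    "data_load_spec=volume_test_load_with_CRUD_on_collections,"
    "data_load_stage=during,quota_percent=80,"
    "GROUP=failover_with_collection_crud_durability_MAJORITY"
)

positional, values = parse_test_params(INI_ARGS + "," + TEST_ARGS)
print(conflicting_keys(values))
```

      GROUP is repeated with the same value, so only quota_percent (95 and 80) is reported as conflicting.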
      

      Steps to Repro
      1) Create a 5 node cluster
      2021-01-31 08:29:33,272 | test | INFO | pool-1-thread-6 | [table_view:display:72] Rebalance Overview
      ----------------------------------------
      Nodes           Services  Status
      ----------------------------------------
      172.23.98.196   kv        Cluster node
      172.23.98.195   None      <--- IN ---
      172.23.121.10   None      <--- IN ---
      172.23.104.186  None      <--- IN ---
      172.23.120.206  None      <--- IN ---
      ----------------------------------------

      2) Create buckets/scopes/collections.
      ----------------------------------------------------------------------------------------
      Bucket   Type       Replicas  Durability  TTL  Items   RAM Quota    RAM Used   Disk Used
      ----------------------------------------------------------------------------------------
      bucket1  couchbase  2         none        0    3000    1048576000   15118121   357353918
      bucket2  ephemeral  2         none        0    3000    1048576000   198787274  170
      default  couchbase  2         none        0    500000  10485760000  339767192  602148970
      ----------------------------------------------------------------------------------------

      3) Start collection CRUD + durability data load

      4) Start graceful failover. It fails as shown below.

      2021-01-31 08:39:26,507 | test  | ERROR   | pool-1-thread-16 | [rest_client:print_UI_logs:2595] {u'code': 0, u'module': u'ns_orchestrator', u'type': u'critical', u'node': u'ns_1@172.23.98.196', u'tstamp': 1612111164914L, u'shortText': u'message', u'serverTime': u'2021-01-31T08:39:24.914Z', u'text': u'Graceful failover exited with reason {mover_crashed,\n                                      {{{{badmatch,{error,timeout}},\n                                         [{mc_client_binary,stats_recv,4,\n                                           [{file,"src/mc_client_binary.erl"},\n                                            {line,171}]},\n                                          {mc_client_binary,stats,4,\n                                           [{file,"src/mc_client_binary.erl"},\n                                            {line,482}]},\n                                          {ns_memcached,do_handle_call,3,\n                                           [{file,"src/ns_memcached.erl"},\n                                            {line,453}]},\n                                          {ns_memcached,worker_loop,3,\n                                           [{file,"src/ns_memcached.erl"},\n                                            {line,224}]},\n                                          {proc_lib,init_p_do_apply,3,\n                                           [{file,"proc_lib.erl"},\n                                            {line,249}]}]},\n                                        {gen_server,call,\n                                         [\'ns_memcached-default\',\n                                          {get_dcp_docs_estimate,285,\n                                           "replication:ns_1@172.23.120.206->ns_1@172.23.121.10:default"},\n                                          180000]}},\n                                       {gen_server,call,\n                                        [{\'janitor_agent-default\',\n                                      
    \'ns_1@172.23.120.206\'},\n                                         {if_rebalance,<0.13184.2>,\n                                          {get_vbucket_high_seqno,385}},\n                                         infinity]}}}.\nRebalance Operation Id = 765e6b6d308dcd677bb6c4ef88b8b2c3'}
      2021-01-31 08:39:26,509 | test  | ERROR   | pool-1-thread-16 | [rest_client:print_UI_logs:2595] {u'code': 0, u'module': u'ns_vbucket_mover', u'type': u'critical', u'node': u'ns_1@172.23.98.196', u'tstamp': 1612111164805L, u'shortText': u'message', u'serverTime': u'2021-01-31T08:39:24.805Z', u'text': u'Worker <0.31602.2> (for action {move,{385,\n                                      [\'ns_1@172.23.120.206\',\n                                       \'ns_1@172.23.98.196\',\n                                       \'ns_1@172.23.121.10\'],\n                                      [\'ns_1@172.23.98.196\',\n                                       \'ns_1@172.23.121.10\',\n                                       \'ns_1@172.23.120.206\'],\n                                      []}}) exited with reason {{{{badmatch,\n                                                                   {error,\n                                                                    timeout}},\n                                                                  [{mc_client_binary,\n                                                                    stats_recv,\n                                                                    4,\n                                                                    [{file,\n                                                                      "src/mc_client_binary.erl"},\n                                                                     {line,\n                                                                      171}]},\n                                                                   {mc_client_binary,\n                                                                    stats,4,\n                                                                    [{file,\n                                                                      "src/mc_client_binary.erl"},\n                                                                     {line,\n                                 
                                     482}]},\n                                                                   {ns_memcached,\n                                                                    do_handle_call,\n                                                                    3,\n                                                                    [{file,\n                                                                      "src/ns_memcached.erl"},\n                                                                     {line,\n                                                                      453}]},\n                                                                   {ns_memcached,\n                                                                    worker_loop,\n                                                                    3,\n                                                                    [{file,\n                                                                      "src/ns_memcached.erl"},\n                                                                     {line,\n                                                                      224}]},\n                                                                   {proc_lib,\n                                                                    init_p_do_apply,\n                                                                    3,\n                                                                    [{file,\n                                                                      "proc_lib.erl"},\n                                                                     {line,\n                                                                      249}]}]},\n                                                                 {gen_server,\n                                                                  call,\n                                                                  [\'ns_memcached-default\',\n             
                                                      {get_dcp_docs_estimate,\n                                                                    285,\n                                                                    "replication:ns_1@172.23.120.206->ns_1@172.23.121.10:default"},\n                                                                   180000]}},\n                                                                {gen_server,\n                                                                 call,\n                                                                 [{\'janitor_agent-default\',\n                                                                   \'ns_1@172.23.120.206\'},\n                                                                  {if_rebalance,\n                                                                   <0.13184.2>,\n                                                                   {get_vbucket_high_seqno,\n                                                                    385}},\n                                                                  infinity]}}'}
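      Both errors above wrap the same inner gen_server call that hit {error,timeout}. As a reading aid, the sketch below pulls the fields of that call out of the Erlang term with a regular expression. The excerpt is copied from the first log entry with the escaped newlines expanded; the field names (server, request, vbucket, stream, timeout_ms) are labels chosen here, and reading 285 as a vBucket id and 180000 as a millisecond timeout is an interpretation, not something the log states.

```python
import re

# Excerpt of the 'text' field from the first UI log entry, \n expanded.
EXCERPT = """{gen_server,call,
 ['ns_memcached-default',
  {get_dcp_docs_estimate,285,
   "replication:ns_1@172.23.120.206->ns_1@172.23.121.10:default"},
  180000]}"""

# Whitespace between tokens is optional so both log layouts above match.
CALL_RE = re.compile(
    r"\{gen_server,\s*call,\s*\[\s*'(?P<server>[^']+)',\s*"
    r"\{(?P<request>\w+),\s*(?P<vbucket>\d+),\s*"
    r"\"(?P<stream>[^\"]+)\"\},\s*"
    r"(?P<timeout_ms>\d+)\]\}"
)


def extract_timed_out_call(text):
    """Return the fields of the first matching gen_server call, or None."""
    match = CALL_RE.search(text)
    if match is None:
        return None
    fields = match.groupdict()
    # Assumed to be a vBucket id and a timeout in milliseconds.
    fields["vbucket"] = int(fields["vbucket"])
    fields["timeout_ms"] = int(fields["timeout_ms"])
    return fields


call = extract_timed_out_call(EXCERPT)
print(call)
```

      Under that reading, the get_dcp_docs_estimate request to 'ns_memcached-default' was given 180000 ms (180 s) before the stats call timed out.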
      

      cbcollect_info attached.

      I think this is a regression, as it was not seen on 7.0.0-4325.

            People

              Balakumaran.Gopal Balakumaran Gopal
              Balakumaran.Gopal Balakumaran Gopal