  1. Couchbase Server
  2. MB-37166

Service 'memcached' exited with status 135 on orchestrator node during bulk_set_vbucket_state_failed

Details

    Description

      Note: The failure occurs with Magma storage (magma_storage=True). The same test works fine with couchstore.

      Steps to Reproduce:

      1. Create a 3-node cluster and a default bucket with replica=1:

      +----------------+----------+--------------+
      | Nodes          | Services | Status       |
      +----------------+----------+--------------+
      | 172.23.105.220 | kv       | Cluster node |
      | 172.23.105.221 | None     | <--- IN ---  |
      | 172.23.105.223 | None     | <--- IN ---  |
      +----------------+----------+--------------+
      

      http://172.23.105.220:8091/pools/default/buckets with param: replicaIndex=1&maxTTL=0&flushEnabled=1&compressionMode=off&bucketType=membase&name=default&replicaNumber=1&ramQuotaMB=1424&threadsNumber=3&evictionPolicy=valueOnly
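For reference, the bucket settings above could be submitted to the bucket-creation endpoint roughly as follows. This is a minimal sketch, not the test framework's own code: it only builds the form-encoded POST and does not send it, and authentication (which a real cluster requires) is deliberately left out.

```python
from urllib.parse import parse_qsl, urlencode
from urllib.request import Request

# Endpoint and parameter string copied from the report.
URL = "http://172.23.105.220:8091/pools/default/buckets"
PARAMS = ("replicaIndex=1&maxTTL=0&flushEnabled=1&compressionMode=off"
          "&bucketType=membase&name=default&replicaNumber=1&ramQuotaMB=1424"
          "&threadsNumber=3&evictionPolicy=valueOnly")


def build_create_bucket_request(url, param_string):
    """Build (but do not send) a form-encoded POST for bucket creation."""
    params = dict(parse_qsl(param_string))
    body = urlencode(params).encode("ascii")
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return Request(url, data=body, headers=headers, method="POST"), params


request, params = build_create_bucket_request(URL, PARAMS)
print(request.get_method(), request.full_url)
print(params["name"], params["replicaNumber"], params["ramQuotaMB"])
```

Sending it would need credentials for the cluster (for example via an Authorization header) and a call such as urllib.request.urlopen(request); both are omitted here because they are not part of the report.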
      

      2. Load 250k items. Bucket statistics (RAM and disk figures appear to be in bytes):

      +---------+---------+----------+-----+--------+------------+-----------+-----------+
      | Bucket  | Type    | Replicas | TTL | Items  | RAM Quota  | RAM Used  | Disk Used |
      +---------+---------+----------+-----+--------+------------+-----------+-----------+
      | default | membase | 1        | 0   | 250000 | 4479516672 | 417257704 | 361104332 |
      +---------+---------+----------+-----+--------+------------+-----------+-----------+
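The RAM Quota figure can be cross-checked against the ramQuotaMB value used at bucket creation. The sketch below assumes the table reports bytes and that the per-node quota is summed over the 3 nodes in the cluster at load time; the arithmetic is consistent with both assumptions.

```python
MIB = 1024 * 1024

ram_quota_mb_per_node = 1424    # ramQuotaMB from the bucket-creation call
kv_nodes = 3                    # nodes in the cluster when the data was loaded
reported_quota = 4479516672     # "RAM Quota" column in the table
reported_ram_used = 417257704   # "RAM Used" column in the table

# Per-node quota in bytes, summed across the cluster.
cluster_quota_bytes = ram_quota_mb_per_node * MIB * kv_nodes
used_fraction = reported_ram_used / reported_quota

print("quota matches:", cluster_quota_bytes == reported_quota)
print(f"RAM used: {used_fraction:.1%} of quota")
```

Under those assumptions the bucket was using roughly 9% of its quota after the load, so the failure does not look like a memory-pressure case.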
       

      3. Rebalance in 1 more node while upserting 50% of the data (keys 0-125000) in parallel:

      +----------------+----------+--------------+
      | Nodes          | Services | Status       |
      +----------------+----------+--------------+
      | 172.23.105.220 | kv       | Cluster node |
      | 172.23.105.223 | kv       | Cluster node |
      | 172.23.105.221 | kv       | Cluster node |
      | 172.23.105.225 | None     | <--- IN ---  |
      +----------------+----------+--------------+
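As a rough illustration of this step, the form bodies for adding the new node and starting the rebalance over the REST API might look like the sketch below. It assumes the standard /controller/addNode and /controller/rebalance endpoints on the orchestrator and otpNode names of the form ns_1@<ip> (as seen in the logs). The credentials are placeholders, and the half-open key range is only one reading of "0-125000". Nothing is sent over the network.

```python
from urllib.parse import urlencode

EXISTING_NODES = ["172.23.105.220", "172.23.105.223", "172.23.105.221"]
INCOMING_NODE = "172.23.105.225"
TOTAL_ITEMS = 250000


def add_node_form(host, user, password, services="kv"):
    """Form body for POST /controller/addNode (credentials are placeholders)."""
    return urlencode({"hostname": host, "user": user,
                      "password": password, "services": services})


def rebalance_form(hosts):
    """Form body for POST /controller/rebalance with no ejected nodes."""
    known = ",".join(f"ns_1@{h}" for h in hosts)
    return urlencode({"knownNodes": known, "ejectedNodes": ""})


add_body = add_node_form(INCOMING_NODE, "Administrator", "password")
rebalance_body = rebalance_form(EXISTING_NODES + [INCOMING_NODE])

# Assumption: "50% of data (0-125000)" means item indices 0..124999.
upsert_indices = range(0, TOTAL_ITEMS // 2)

print(add_body)
print(rebalance_body)
print(len(upsert_indices))
```

In the test itself the upserts run concurrently with the rebalance, which is what exposes the failure. This sketch only shows the shape of the two requests.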
      

      4. Rebalance Failed:

      {u'code': 0, u'module': u'ns_orchestrator', u'type': u'critical', u'node': u'ns_1@172.23.105.220', u'tstamp': 1575524489479L, u'shortText': u'message', u'serverTime': u'2019-12-04T21:41:29.479Z', u'text': u'Rebalance exited with reason {mover_crashed,\n                              {unexpected_exit,\n                               {\'EXIT\',<0.4296.1>,\n                                {{bulk_set_vbucket_state_failed,\n                                  [{\'ns_1@172.23.105.225\',\n                                    {\'EXIT\',\n                                     {{{{{child_interrupted,\n                                          {\'EXIT\',<24815.15401.0>,\n                                           socket_closed}},\n                                         [{dcp_replicator,spawn_and_wait,1,\n                                           [{file,"src/dcp_replicator.erl"},\n                                            {line,266}]},\n                                          {dcp_replicator,handle_call,3,\n                                           [{file,"src/dcp_replicator.erl"},\n                                            {line,127}]},\n                                          {gen_server,try_handle_call,4,\n                                           [{file,"gen_server.erl"},\n                                            {line,636}]},\n                                          {gen_server,handle_msg,6,\n                                           [{file,"gen_server.erl"},\n                                            {line,665}]},\n                                          {proc_lib,init_p_do_apply,3,\n                                           [{file,"proc_lib.erl"},\n                                            {line,247}]}]},\n                                        {gen_server,call,\n                                         [<24815.15398.0>,get_partitions,\n                                          infinity]}},\n                                       
{gen_server,call,\n                                        [\'dcp_replication_manager-default\',\n                                         {get_replicator_pid,320},\n                                         infinity]}},\n                                      {gen_server,call,\n                                       [{\'janitor_agent-default\',\n                                         \'ns_1@172.23.105.225\'},\n                                        {if_rebalance,<0.5984.0>,\n                                         {update_vbucket_state,95,replica,\n                                          undefined,undefined}},\n                                        infinity]}}}}]},\n                                 [{janitor_agent,bulk_set_vbucket_state,4,\n                                   [{file,"src/janitor_agent.erl"},\n                                    {line,403}]},\n                                  {ns_single_vbucket_mover,\n                                   \'-cleanup_old_streams/4-fun-1-\',4,\n                                   [{file,"src/ns_single_vbucket_mover.erl"},\n                                    {line,353}]},\n                                  {proc_lib,init_p,3,\n                                   [{file,"proc_lib.erl"},{line,232}]}]}}}}.\nRebalance Operation Id = ba35eaf773a90e32c1ec0d39c45c83db'}
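When triaging an error like the one above, the failing node and vBucket can be pulled out of the message text with a couple of regular expressions. This is a minimal sketch with a hypothetical helper name. It assumes the text field has already been decoded (so `\n` is a real newline and quotes are unescaped), and the sample string is an abridged excerpt of the logged term, not the full message.

```python
import re


def summarize_bulk_set_failure(text):
    """Return the node, vBucket and target state named in the error, if any."""
    node = re.search(r"bulk_set_vbucket_state_failed,\s*\[\{'([^']+)'", text)
    state = re.search(r"update_vbucket_state,\s*(\d+),\s*(\w+)", text)
    if node is None or state is None:
        return None
    return {"node": node.group(1),
            "vbucket": int(state.group(1)),
            "state": state.group(2)}


# Abridged excerpt of the decoded rebalance error text.
sample = (
    "Rebalance exited with reason {mover_crashed,\n"
    "  {unexpected_exit,\n"
    "   {'EXIT',<0.4296.1>,\n"
    "    {{bulk_set_vbucket_state_failed,\n"
    "      [{'ns_1@172.23.105.225',\n"
    "        {'EXIT',{child_interrupted,socket_closed}}},\n"
    "       {update_vbucket_state,95,replica,undefined,undefined}]}}}}}"
)

summary = summarize_bulk_set_failure(sample)
print(summary)
```

For this report the result points at vBucket 95 being set to replica on the incoming node 172.23.105.225, with the underlying cause being a closed socket (socket_closed) on the DCP replicator connection.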
      2019-12-04 21:41:39,161 | test  | ERROR   | pool-2-thread-4 | [rest_client:print_UI_logs:2644] {u'code': 0, u'module': u'ns_vbucket_mover', u'type': u'critical', u'node': u'ns_1@172.23.105.220', u'tstamp': 1575524489475L, u'shortText': u'message', u'serverTime': u'2019-12-04T21:41:29.475Z', u'text': u'Worker <0.3643.1> (for action {move,{95,\n                                     [\'ns_1@172.23.105.220\',\n                                      \'ns_1@172.23.105.221\'],\n                                     [\'ns_1@172.23.105.221\',\n                                      \'ns_1@172.23.105.225\'],\n                                     []}}) exited with reason {unexpected_exit,\n                                                               {\'EXIT\',\n                                                                <0.4296.1>,\n                                                                {{bulk_set_vbucket_state_failed,\n                                                                  [{\'ns_1@172.23.105.225\',\n                                                                    {\'EXIT\',\n                                                                     {{{{{child_interrupted,\n                                                                          {\'EXIT\',\n                                                                           <24815.15401.0>,\n                                                                           socket_closed}},\n                                                                         [{dcp_replicator,\n                                                                           spawn_and_wait,\n                                                                           1,\n                                                                           [{file,\n                                                                             "src/dcp_replicator.erl"},\n                                                                    
        {line,\n                                                                             266}]},\n                                                                          {dcp_replicator,\n                                                                           handle_call,\n                                                                           3,\n                                                                           [{file,\n                                                                             "src/dcp_replicator.erl"},\n                                                                            {line,\n                                                                             127}]},\n                                                                          {gen_server,\n                                                                           try_handle_call,\n                                                                           4,\n                                                                           [{file,\n                                                                             "gen_server.erl"},\n                                                                            {line,\n                                                                             636}]},\n                                                                          {gen_server,\n                                                                           handle_msg,\n                                                                           6,\n                                                                           [{file,\n                                                                             "gen_server.erl"},\n                                                                            {line,\n                                                                             665}]},\n                                                              
            {proc_lib,\n                                                                           init_p_do_apply,\n                                                                           3,\n                                                                           [{file,\n                                                                             "proc_lib.erl"},\n                                                                            {line,\n                                                                             247}]}]},\n                                                                        {gen_server,\n                                                                         call,\n                                                                         [<24815.15398.0>,\n                                                                          get_partitions,\n                                                                          infinity]}},\n                                                                       {gen_server,\n                                                                        call,\n                                                                        [\'dcp_replication_manager-default\',\n                                                                         {get_replicator_pid,\n                                                                          320},\n                                                                         infinity]}},\n                                                                      {gen_server,\n                                                                       call,\n                                                                       [{\'janitor_agent-default\',\n                                                                         \'ns_1@172.23.105.225\'},\n                                                                        {if_rebalance,\n                           
                                              <0.5984.0>,\n                                                                         {update_vbucket_state,\n                                                                          95,\n                                                                          replica,\n                                                                          undefined,\n                                                                          undefined}},\n                                                                        infinity]}}}}]},\n                                                                 [{janitor_agent,\n                                                                   bulk_set_vbucket_state,\n                                                                   4,\n                                                                   [{file,\n                                                                     "src/janitor_agent.erl"},\n                                                                    {line,\n                                                                     403}]},\n                                                                  {ns_single_vbucket_mover,\n                                                                   \'-cleanup_old_streams/4-fun-1-\',\n                                                                   4,\n                                                                   [{file,\n                                                                     "src/ns_single_vbucket_mover.erl"},\n                                                                    {line,\n                                                                     353}]},\n                                                                  {proc_lib,\n                                                                   init_p,3,\n                                                                   [{file,\n          
                                                           "proc_lib.erl"},\n                                                                    {line,\n                                                                     232}]}]}}}'}
      

      {u'code': 0, u'module': u'ns_log', u'type': u'info', u'node': u'ns_1@172.23.105.221', u'tstamp': 1575524489472L, u'shortText': u'message', u'serverTime': u'2019-12-04T21:41:29.472Z', u'text': u"Service 'memcached' exited with status 135. Restarting. Messages:\n2019-12-04T21:41:29.446927-08:00 CRITICAL     /opt/couchbase/bin/../lib/ep.so() [0x7f7cf9e94000+0x209e95]\n2019-12-04T21:41:29.446935-08:00 CRITICAL     /opt/couchbase/bin/../lib/ep.so() [0x7f7cf9e94000+0xe39e2]\n2019-12-04T21:41:29.446942-08:00 CRITICAL     /opt/couchbase/bin/../lib/ep.so() [0x7f7cf9e94000+0xe83dd]\n2019-12-04T21:41:29.446950-08:00 CRITICAL     /opt/couchbase/bin/../lib/ep.so() [0x7f7cf9e94000+0x139840]\n2019-12-04T21:41:29.446956-08:00 CRITICAL     /opt/couchbase/bin/../lib/ep.so() [0x7f7cf9e94000+0x13a711]\n2019-12-04T21:41:29.446965-08:00 CRITICAL     /opt/couchbase/bin/../lib/ep.so() [0x7f7cf9e94000+0x134404]\n2019-12-04T21:41:29.446970-08:00 CRITICAL     /opt/couchbase/bin/../lib/libplatform_so.so.0.1.0() [0x7f7cf8063000+0x8d37]\n2019-12-04T21:41:29.447035-08:00 CRITICAL     /lib64/libpthread.so.0() [0x7f7cf5a84000+0x7dd5]\n2019-12-04T21:41:29.447070-08:00 CRITICAL     /lib64/libc.so.6(clone+0x6d) [0x7f7cf56b7000+0xfdead]\n[*** LOG ERROR ***] [2019-12-04 21:41:29] [spdlog_file_logger] async log: thread pool doesn't exist anymore"}
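Two pieces of the memcached message are worth decoding: the exit status and the backtrace frames. The sketch below assumes the status follows the common 128 + signal-number convention and uses Linux x86-64 signal numbers. Both are assumptions about this environment rather than something the report states, and the helper names are hypothetical. The sample frames are copied from the message above.

```python
import re

# Linux x86-64 signal numbers for the signals most often seen in crashes.
LINUX_SIGNALS = {6: "SIGABRT", 7: "SIGBUS", 9: "SIGKILL",
                 11: "SIGSEGV", 15: "SIGTERM"}

FRAME = re.compile(
    r"(?P<module>/\S+?)\((?P<symbol>[^)]*)\)\s+"
    r"\[0x(?P<base>[0-9a-f]+)\+0x(?P<offset>[0-9a-f]+)\]"
)


def signal_from_exit_status(status):
    """Map an exit status above 128 to a signal name (128 + N convention)."""
    if status <= 128:
        return None
    number = status - 128
    return LINUX_SIGNALS.get(number, f"signal {number}")


def parse_frames(messages):
    """Extract (module path, symbol, offset within module) for each frame."""
    return [(m["module"], m["symbol"], int(m["offset"], 16))
            for m in FRAME.finditer(messages)]


sample = (
    "CRITICAL     /opt/couchbase/bin/../lib/ep.so() [0x7f7cf9e94000+0x209e95]\n"
    "CRITICAL     /opt/couchbase/bin/../lib/libplatform_so.so.0.1.0() [0x7f7cf8063000+0x8d37]\n"
    "CRITICAL     /lib64/libc.so.6(clone+0x6d) [0x7f7cf56b7000+0xfdead]\n"
)

frames = parse_frames(sample)
print(signal_from_exit_status(135))
for module, symbol, offset in frames:
    print(module, symbol or "?", hex(offset))
```

Under these assumptions, status 135 corresponds to signal 7 (SIGBUS). The per-module offsets are the values a symbolizer such as addr2line would need, given an ep.so build with matching debug symbols.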
      

      QE Note:

      -p num_items=250000,GROUP=P1;default,magma_storage=True -t rebalance_new.rebalance_in.RebalanceInTests.rebalance_in_with_compaction_and_ops,nodes_init=3,replicas=1,num_items=100000,doc_ops=create:update:delete,GROUP=P1;default,skip_cleanup=True, -p rerun=False,infra_log_level=debug,log_level=debug -m rest'
      

          People

            scott.lashley Scott Lashley
            ritesh.agarwal Ritesh Agarwal