  Couchbase Server / MB-51026

[magma] 'ns_server' exited with status 3


Details

    Description

      Note:
      The test was writing to the /data file system, which was a LUKS-encrypted file system:

      [root@cen-s604 ~]# lsblk
      NAME                  MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
      sda                     8:0    0 931.5G  0 disk
      |-sda1                  8:1    0     1G  0 part  /boot
      `-sda2                  8:2    0 929.5G  0 part
        |-cl_cen--s604-root 253:0    0 898.1G  0 lvm   /
        `-cl_cen--s604-swap 253:1    0  31.4G  0 lvm
      sdb                     8:16   0 447.1G  0 disk
      `-sdb1                  8:17   0 446.8G  0 part
        `-cbefs             253:2    0 446.8G  0 crypt /data
      sr0                    11:0    1  1024M  0 rom
      

      /dev/mapper/cl_cen--s604-root: UUID="cc5ba19a-ad03-44cf-bf70-3fd4e6a30a2a" TYPE="xfs"
      /dev/sda2: UUID="oBZuw1-SZfu-8jqK-QvwG-u5yl-b4Az-SneF3k" TYPE="LVM2_member"
      /dev/sdb1: UUID="b76e560e-fa5c-42c1-a753-691d6dae435d" TYPE="crypto_LUKS"
      /dev/sda1: UUID="663a1a84-29b7-45a2-9d24-243e6c42c711" TYPE="xfs"
      /dev/mapper/cl_cen--s604-swap: UUID="ddf037ec-5be9-4956-8689-78038f841c83" TYPE="swap"
      /dev/mapper/cbefs: UUID="af938561-f8b5-4a4d-8293-98da803575fa" TYPE="xfs"
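
      For context, a minimal sketch of how a LUKS-encrypted XFS mount like the one above is typically prepared. The device /dev/sdb1 and mapper name cbefs are taken from the lsblk output; the commands are illustrative, not taken from the actual test setup:

      # WARNING: luksFormat destroys any existing data on the partition.
      cryptsetup luksFormat /dev/sdb1
      # Open the container under the mapper name seen in lsblk (cbefs).
      cryptsetup open /dev/sdb1 cbefs
      # Create an XFS file system on the decrypted device and mount it at /data.
      mkfs.xfs /dev/mapper/cbefs
      mkdir -p /data
      mount /dev/mapper/cbefs /data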
      

      Steps to repro:

      1. Create a 3-node cluster (172.23.100.161, 172.23.100.162, 172.23.100.163).
      2. Create a magma bucket with replicas = 1.
      3. Load 7M docs.
      4. Start GET operations on all 7M docs using multiple threads (4 threads reading all docs); a rough CLI equivalent of steps 1-4 is sketched after this list.
      5. Observed 'ns_server' exited with status 3 on node 172.23.100.161, and eventually observed the same on the other two nodes as well.
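
      For reference, a rough shell sketch of steps 1-4 using couchbase-cli and cbc-pillowfight. Bucket name, credentials, RAM quotas, and exact flag values are illustrative assumptions; the actual test is driven by the testrunner job shown under QE-TEST below:

      # Initialize the first node (credentials and RAM quota are placeholders).
      couchbase-cli cluster-init -c 172.23.100.161 --cluster-username Administrator \
        --cluster-password password --cluster-ramsize 2048 --services data

      # Add the other two nodes and rebalance.
      couchbase-cli server-add -c 172.23.100.161 -u Administrator -p password \
        --server-add 172.23.100.162,172.23.100.163 \
        --server-add-username Administrator --server-add-password password
      couchbase-cli rebalance -c 172.23.100.161 -u Administrator -p password

      # Create a magma bucket with one replica.
      couchbase-cli bucket-create -c 172.23.100.161 -u Administrator -p password \
        --bucket default --bucket-type couchbase --storage-backend magma \
        --bucket-ramsize 1024 --bucket-replica 1

      # Load ~7M docs, then drive a GET-only workload with 4 client threads.
      cbc-pillowfight -U couchbase://172.23.100.161/default -u Administrator -P password \
        --num-items 7000000 --populate-only
      cbc-pillowfight -U couchbase://172.23.100.161/default -u Administrator -P password \
        --num-items 7000000 --num-threads 4 --set-pct 0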

      ns_server error:

      Service 'ns_server' exited with status 3. Restarting. Messages:
      Crap error:{badmatch,
      {error,
      {{shutdown,
      {failed_to_start_child,dist_manager,
      {{badmatch,{error,enoent}},
      [{ns_server,read_cookie_file,1,
      [{file,"src/ns_server.erl"},{line,246}]},
      {dist_manager,bringup,2,
      [{file,"src/dist_manager.erl"},{line,254}]},
      {dist_manager,init,1,
      [{file,"src/dist_manager.erl"},{line,199}]},
      {proc_lib,init_p_do_apply,3,
      [{file,"proc_lib.erl"},{line,226}]}]}}},
      {ns_server,start,[normal,[]]}}}}
      [{ns_bootstrap,start,0,[{file,"src/ns_bootstrap.erl"},{line,31}]},
      {child_erlang,do_child_start,1,[{file,"src/child_erlang.erl"},{line,105}]},
      {child_erlang,child_start,1,[{file,"src/child_erlang.erl"},{line,83}]},
      {init,start_em,1,[]},
      {init,do_boot,3,[]}]
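
      The {badmatch,{error,enoent}} raised from ns_server:read_cookie_file/1 means dist_manager could not find the node's Erlang cookie file on disk while bringing up distribution. Since the test was writing to the LUKS-encrypted /data file system, a first check is whether that mount and the node's config/data files were still accessible when the crash loop started. A hedged diagnostic sketch; the cookie-file path is an assumption based on a default Linux install under /opt/couchbase and is not confirmed from these logs:

      # Is the encrypted /data mount still present and readable?
      mountpoint /data && ls -ld /data

      # Look for ns_server's cookie file; the exact name/location is an
      # assumption and may differ by version/deployment.
      ls -l /opt/couchbase/var/lib/couchbase/couchbase-server.cookie* 2>/dev/null

      # Any recent XFS/dm-crypt/I/O errors around the time of the crash?
      dmesg | grep -iE 'xfs|dm-crypt|i/o error' | tail -n 20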
      

      Observed this on 172.23.100.163

      [user:info,2022-02-16T22:17:47.734-08:00,ns_1@172.23.100.163:<0.428.0>:ns_log:consume_log:76]Service 'ns_server' exited with status 3. Restarting. Messages:
      Crap error:{badmatch,
              {error,
                {{shutdown,
                  {failed_to_start_child,dist_manager,
                    {{badmatch,{error,enoent}},
                     [{ns_server,read_cookie_file,1,
                       [{file,"src/ns_server.erl"},{line,246}]},
                     {dist_manager,bringup,2,
                       [{file,"src/dist_manager.erl"},{line,254}]},
                     {dist_manager,init,1,
                       [{file,"src/dist_manager.erl"},{line,199}]},
                     {proc_lib,init_p_do_apply,3,
                       [{file,"proc_lib.erl"},{line,226}]}]}}},
                {ns_server,start,[normal,[]]}}}}
      [{ns_bootstrap,start,0,[{file,"src/ns_bootstrap.erl"},{line,31}]},
       {child_erlang,do_child_start,1,[{file,"src/child_erlang.erl"},{line,105}]},
       {child_erlang,child_start,1,[{file,"src/child_erlang.erl"},{line,83}]},
       {init,start_em,1,[]},
       {init,do_boot,3,[]}]
      [user:info,2022-02-16T22:17:47.785-08:00,ns_1@172.23.100.163:<0.428.0>:ns_log:consume_log:76]Service 'ns_server' exited with status 3. Restarting. Messages:
      Crap error:{badmatch,
              {error,
                {{shutdown,
                  {failed_to_start_child,dist_manager,
                    {{badmatch,{error,enoent}},
                     [{ns_server,read_cookie_file,1,
                       [{file,"src/ns_server.erl"},{line,246}]},
                     {dist_manager,bringup,2,
                       [{file,"src/dist_manager.erl"},{line,254}]},
                     {dist_manager,init,1,
                       [{file,"src/dist_manager.erl"},{line,199}]},
                     {proc_lib,init_p_do_apply,3,
                       [{file,"proc_lib.erl"},{line,226}]}]}}},
                {ns_server,start,[normal,[]]}}}}
      [{ns_bootstrap,start,0,[{file,"src/ns_bootstrap.erl"},{line,31}]},
       {child_erlang,do_child_start,1,[{file,"src/child_erlang.erl"},{line,105}]},
       {child_erlang,child_start,1,[{file,"src/child_erlang.erl"},{line,83}]},
       {init,start_em,1,[]},
       {init,do_boot,3,[]}]
      [ns_server:info,2022-02-16T22:17:47.785-08:00,ns_1@172.23.100.163:ns_log<0.421.0>:ns_log:is_duplicate_log:156]suppressing duplicate log ns_log:0([<<"Service 'ns_server' exited with status 3. Restarting. Messages:\nCrap error:{badmatch,\n        {error,\n          {{shutdown,\n            {failed_to_start_child,dist_manager,\n              {{badmatch,{error,enoent}},\n               [{ns_server,read_cookie_file,1,\n                 [{file,\"src/ns_server.erl\"},{line,246}]},\n               {dist_manager,bringup,2,\n                 [{file,\"src/dist_manager.erl\"},{line,254}]},\n               {dist_manager,init,1,\n                 [{file,\"src/dist_manager.erl\"},{line,199}]},\n               {proc_lib,init_p_do_apply,3,\n                 [{file,\"proc_lib.erl\"},{line,226}]}]}}},\n          {ns_server,start,[normal,[]]}}}}\n[{ns_bootstrap,start,0,[{file,\"src/ns_bootstrap.erl\"},{line,31}]},\n {child_erlang,do_child_start,1,[{file,\"src/child_erlang.erl\"},{line,105}]},\n {child_erlang,child_start,1,[{file,\"src/child_erlang.erl\"},{line,83}]},\n {init,start_em,1,[]},\n {init,do_boot,3,[]}]">>]) because it's been seen 1 times in the past 0.050377 secs (last seen 0.050377 secs ago
      

      QE-TEST:

      guides/gradlew --refresh-dependencies testrunner -P jython=/opt/jython/bin/jython -P 'args=-i /tmp/ankush_temp_job.ini -p bucket_storage=magma,bucket_eviction_policy=fullEviction,randomize_value=True,rerun=false,replicas=1,deep_copy=True,fragmentation=50,enable_dp=false,get-cbcollect-info=True,autoCompactionDefined=true,get-cbcollect-info=True,infra_log_level=info,log_level=info,bucket_storage=magma -t storage.magma.magma_get.BasicReadTests.test_read_docs_using_multithreads,num_items=5000000,nodes_init=4,key_size=22,sdk_timeout=60'
      

          People

            Assignee: Ankush Sharma
            Reporter: Ankush Sharma
            Votes: 0
            Watchers: 4


              Gerrit Reviews

                There are no open Gerrit changes
