Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-37273

Server crashing on OEL8

    XMLWordPrintable

Details

    • Bug
    • Status: Closed
    • Test Blocker
    • Resolution: Fixed
    • 6.5.0, 6.0.4
    • 6.5.0, 6.0.4
    • ns_server
    • Enterprise Edition 6.5.0 build 4947

    Description

      Install couchbase sever on OEL8

      Build here -
      http://172.23.120.24/builds/latestbuilds/couchbase-server/mad-hatter/4947/couchbase-server-enterprise-6.5.0-4947-oel8.x86_64.rpm

      OEL8 Box -
      [root@changeme logs]# cat /etc/oracle-release
      Oracle Linux Server release 8.1

      Attachments

        No reviews matched the request. Check your Options in the drop-down menu of this sections header.

        Activity

          [error_logger:error,2019-12-16T00:25:20.166-08:00,ns_1@cb.local:error_logger<0.32.0>:ale_error_logger_handler:do_log:203]** Generic server ns_heart terminating 
          ** Last message in was beat
          ** When Server state == {state,undefined,<0.29983.60>,[],0}
          ** Reason for termination == 
          ** {enotsup,[{misc,raw_read_loop,2,[{file,"src/misc.erl"},{line,934}]},
                       {misc,with_file,3,[{file,"src/misc.erl"},{line,1782}]},
                       {ns_heart,'-grab_procfs_files/0-lc$^0/1-0-',1,
                                 [{file,"src/ns_heart.erl"},{line,509}]},
                       {ns_heart,'-grab_procfs_files/0-lc$^0/1-0-',1,
                                 [{file,"src/ns_heart.erl"},{line,515}]},
                       {ns_heart,current_status_slow_inner,0,
                                 [{file,"src/ns_heart.erl"},{line,347}]},
                       {ns_heart,current_status_slow,1,
                                 [{file,"src/ns_heart.erl"},{line,250}]},
                       {ns_heart,update_current_status,1,
                                 [{file,"src/ns_heart.erl"},{line,187}]},
                       {ns_heart,handle_info,2,[{file,"src/ns_heart.erl"},{line,118}]}]}
           
          [ns_server:debug,2019-12-16T00:25:20.166-08:00,ns_1@cb.local:<0.29983.60>:ns_pubsub:do_subscribe_link_continue:152]Parent process of subscription {buckets_events,<0.29982.60>} exited with reason {enotsup,
                                                                                           [{misc,
                                                                                             raw_read_loop,
                                                                                             2,
                                                                                             [{file,
                                                                                               "src/misc.erl"},
                                                                                              {line,
                                                                                               934}]},
                                                                                            {misc,
                                                                                             with_file,
                                                                                             3,
                                                                                             [{file,
                                                                                               "src/misc.erl"},
                                                                                              {line,
                                                                                               1782}]},
                                                                                            {ns_heart,
                                                                                             '-grab_procfs_files/0-lc$^0/1-0-',
                                                                                             1,
                                                                                             [{file,
                                                                                               "src/ns_heart.erl"},
                                                                                              {line,
                                                                                               509}]},
                                                                                            {ns_heart,
                                                                                             '-grab_procfs_files/0-lc$^0/1-0-',
                                                                                             1,
                                                                                             [{file,
                                                                                               "src/ns_heart.erl"},
                                                                                              {line,
                                                                                               515}]},
                                                                                            {ns_heart,
                                                                                             current_status_slow_inner,
                                                                                             0,
                                                                                             [{file,
                                                                                               "src/ns_heart.erl"},
                                                                                              {line,
                                                                                               347}]},
                                                                                            {ns_heart,
                                                                                             current_status_slow,
                                                                                             1,
                                                                                             [{file,
                                                                                               "src/ns_heart.erl"},
                                                                                              {line,
                                                                                               250}]},
                                                                                            {ns_heart,
                                                                                             update_current_status,
                                                                                             1,
                                                                                             [{file,
                                                                                               "src/ns_heart.erl"},
                                                                                              {line,
                                                                                               187}]},
                                                                                            {ns_heart,
                                                                                             handle_info,
                                                                                             2,
                                                                                             [{file,
                                                                                               "src/ns_heart.erl"},
                                                                                              {line,
                                                                                               118}]}]}
          [error_logger:error,2019-12-16T00:25:20.167-08:00,ns_1@cb.local:error_logger<0.32.0>:ale_error_logger_handler:do_log:203]
          =========================CRASH REPORT=========================
            crasher:
              initial call: ns_heart:init/1
              pid: <0.29982.60>
              registered_name: ns_heart
              exception error: enotsup
                in function  misc:raw_read_loop/2 (src/misc.erl, line 934)
                in call from misc:with_file/3 (src/misc.erl, line 1782)
                in call from ns_heart:'-grab_procfs_files/0-lc$^0/1-0-'/1 (src/ns_heart.erl, line 509)
                in call from ns_heart:'-grab_procfs_files/0-lc$^0/1-0-'/1 (src/ns_heart.erl, line 515)
                in call from ns_heart:current_status_slow_inner/0 (src/ns_heart.erl, line 347)
                in call from ns_heart:current_status_slow/1 (src/ns_heart.erl, line 250)
                in call from ns_heart:update_current_status/1 (src/ns_heart.erl, line 187)
                in call from ns_heart:handle_info/2 (src/ns_heart.erl, line 118)
              ancestors: [ns_heart_sup,ns_server_sup,ns_server_nodes_sup,<0.206.0>,
                            ns_server_cluster_sup,root_sup,<0.118.0>]
              message_queue_len: 0
              messages: []
              links: [<0.29981.60>,<0.29983.60>,<0.293.0>]
              dictionary: []
              trap_exit: true
              status: running
              heap_size: 6772
              stack_size: 27
              reductions: 8403
            neighbours:
          

          ritam.sharma Ritam Sharma added a comment - [error_logger:error,2019-12-16T00:25:20.166-08:00,ns_1@cb.local:error_logger<0.32.0>:ale_error_logger_handler:do_log:203]** Generic server ns_heart terminating ** Last message in was beat ** When Server state == {state,undefined,<0.29983.60>,[],0} ** Reason for termination == ** {enotsup,[{misc,raw_read_loop,2,[{file,"src/misc.erl"},{line,934}]}, {misc,with_file,3,[{file,"src/misc.erl"},{line,1782}]}, {ns_heart,'-grab_procfs_files/0-lc$^0/1-0-',1, [{file,"src/ns_heart.erl"},{line,509}]}, {ns_heart,'-grab_procfs_files/0-lc$^0/1-0-',1, [{file,"src/ns_heart.erl"},{line,515}]}, {ns_heart,current_status_slow_inner,0, [{file,"src/ns_heart.erl"},{line,347}]}, {ns_heart,current_status_slow,1, [{file,"src/ns_heart.erl"},{line,250}]}, {ns_heart,update_current_status,1, [{file,"src/ns_heart.erl"},{line,187}]}, {ns_heart,handle_info,2,[{file,"src/ns_heart.erl"},{line,118}]}]}   [ns_server:debug,2019-12-16T00:25:20.166-08:00,ns_1@cb.local:<0.29983.60>:ns_pubsub:do_subscribe_link_continue:152]Parent process of subscription {buckets_events,<0.29982.60>} exited with reason {enotsup, [{misc, raw_read_loop, 2, [{file, "src/misc.erl"}, {line, 934}]}, {misc, with_file, 3, [{file, "src/misc.erl"}, {line, 1782}]}, {ns_heart, '-grab_procfs_files/0-lc$^0/1-0-', 1, [{file, "src/ns_heart.erl"}, {line, 509}]}, {ns_heart, '-grab_procfs_files/0-lc$^0/1-0-', 1, [{file, "src/ns_heart.erl"}, {line, 515}]}, {ns_heart, current_status_slow_inner, 0, [{file, "src/ns_heart.erl"}, {line, 347}]}, {ns_heart, current_status_slow, 1, [{file, "src/ns_heart.erl"}, {line, 250}]}, {ns_heart, update_current_status, 1, [{file, "src/ns_heart.erl"}, {line, 187}]}, {ns_heart, handle_info, 2, [{file, "src/ns_heart.erl"}, {line, 118}]}]} [error_logger:error,2019-12-16T00:25:20.167-08:00,ns_1@cb.local:error_logger<0.32.0>:ale_error_logger_handler:do_log:203] =========================CRASH REPORT========================= crasher: initial call: ns_heart:init/1 pid: <0.29982.60> registered_name: ns_heart exception error: enotsup in function misc:raw_read_loop/2 (src/misc.erl, line 934) in call from misc:with_file/3 (src/misc.erl, line 1782) in call from ns_heart:'-grab_procfs_files/0-lc$^0/1-0-'/1 (src/ns_heart.erl, line 509) in call from ns_heart:'-grab_procfs_files/0-lc$^0/1-0-'/1 (src/ns_heart.erl, line 515) in call from ns_heart:current_status_slow_inner/0 (src/ns_heart.erl, line 347) in call from ns_heart:current_status_slow/1 (src/ns_heart.erl, line 250) in call from ns_heart:update_current_status/1 (src/ns_heart.erl, line 187) in call from ns_heart:handle_info/2 (src/ns_heart.erl, line 118) ancestors: [ns_heart_sup,ns_server_sup,ns_server_nodes_sup,<0.206.0>, ns_server_cluster_sup,root_sup,<0.118.0>] message_queue_len: 0 messages: [] links: [<0.29981.60>,<0.29983.60>,<0.293.0>] dictionary: [] trap_exit: true status: running heap_size: 6772 stack_size: 27 reductions: 8403 neighbours:
          drigby Dave Rigby added a comment -

          Ritam Sharma What is the last known good build for this test?

          drigby Dave Rigby added a comment - Ritam Sharma What is the last known good build for this test?
          drigby Dave Rigby added a comment - - edited

          By strace'ing the beam.smp process which was running at ~100% CPU I see a number of read syscalls failing with EOPNOTSUPP - as per the ns_heart error seen in the logs:

          $ strace -p 76463 -f -e trace=%desc
          ...
          76469 openat(AT_FDCWD, "/proc/pressure/cpu", O_RDONLY) = 47
          76469 fstat(47, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
          76469 read(47, 0x7f8dea68b6c8, 16384)   = -1 EOPNOTSUPP (Operation not supported)
          76469 close(47) 
          ...
          

          It appears that something is preventing ns_heart from reading from the /proc file /proc/pressure/cpu.

          From https://patchwork.ozlabs.org/patch/1031852/, it appears that those files can be present, but unreadable if CONFIG_PSI_DEFAULT_DISABLED is set:

          Kernel v4.20 added /proc/pressure/

          Unknown macro: {cpu,io,memory}

          , when enabled
          CONFIG_PSI ("Pressure stall information tracking").
          When CONFIG_PSI_DEFAULT_DISABLED is also enabled (and psi=1 kernel
          parameter is not set) these files are created, but not readable.

          Signed-off-by: Petr Vorel <pvorel@suse.cz>

           testcases/kernel/fs/proc/proc01.c | 1 +
           1 file changed, 1 insertion(+)
           
          diff --git a/testcases/kernel/fs/proc/proc01.c b/testcases/kernel/fs/proc/proc01.c
          index c886e7e85..d917591bf 100644
          --- a/testcases/kernel/fs/proc/proc01.c
          +++ b/testcases/kernel/fs/proc/proc01.c
          @@ -111,6 +111,7 @@ static const struct mapping known_issues[] = {
           	{"read", "/proc/sys/vm/nr_hugepages", EOPNOTSUPP},
           	{"read", "/proc/sys/vm/nr_overcommit_hugepages", EOPNOTSUPP},
           	{"read", "/proc/sys/vm/nr_hugepages_mempolicy", EOPNOTSUPP},
          +	{"read", "/proc/pressure/*", EOPNOTSUPP},
           	{"", "", 0}
           };
          

          I believe that EOPNOTSUPP needs to be handled as possible error code in this path.

          drigby Dave Rigby added a comment - - edited By strace 'ing the beam.smp process which was running at ~100% CPU I see a number of read syscalls failing with EOPNOTSUPP - as per the ns_heart error seen in the logs: $ strace -p 76463 -f -e trace=%desc ... 76469 openat(AT_FDCWD, "/proc/pressure/cpu" , O_RDONLY) = 47 76469 fstat(47, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0 76469 read (47, 0x7f8dea68b6c8, 16384) = -1 EOPNOTSUPP (Operation not supported) 76469 close(47) ... It appears that something is preventing ns_heart from reading from the /proc file /proc/pressure/cpu . From https://patchwork.ozlabs.org/patch/1031852/ , it appears that those files can be present, but unreadable if CONFIG_PSI_DEFAULT_DISABLED is set: Kernel v4.20 added /proc/pressure/ Unknown macro: {cpu,io,memory} , when enabled CONFIG_PSI ("Pressure stall information tracking"). When CONFIG_PSI_DEFAULT_DISABLED is also enabled (and psi=1 kernel parameter is not set) these files are created, but not readable. Signed-off-by: Petr Vorel <pvorel@suse.cz> — testcases/kernel/fs/proc/proc01.c | 1 + 1 file changed, 1 insertion(+)   diff --git a/testcases/kernel/fs/proc/proc01.c b/testcases/kernel/fs/proc/proc01.c index c886e7e85..d917591bf 100644 --- a/testcases/kernel/fs/proc/proc01.c +++ b/testcases/kernel/fs/proc/proc01.c @@ -111,6 +111,7 @@ static const struct mapping known_issues[] = { {"read", "/proc/sys/vm/nr_hugepages", EOPNOTSUPP}, {"read", "/proc/sys/vm/nr_overcommit_hugepages", EOPNOTSUPP}, {"read", "/proc/sys/vm/nr_hugepages_mempolicy", EOPNOTSUPP}, + {"read", "/proc/pressure/*", EOPNOTSUPP}, {"", "", 0} }; I believe that EOPNOTSUPP needs to be handled as possible error code in this path.

          Build couchbase-server-6.0.4-3044 contains ns_server commit 4006df2 with commit message:
          MB-37273 Report errors uniformly in misc:raw_read_file().

          build-team Couchbase Build Team added a comment - Build couchbase-server-6.0.4-3044 contains ns_server commit 4006df2 with commit message: MB-37273 Report errors uniformly in misc:raw_read_file().

          Build couchbase-server-7.0.0-1135 contains ns_server commit 4006df2 with commit message:
          MB-37273 Report errors uniformly in misc:raw_read_file().

          build-team Couchbase Build Team added a comment - Build couchbase-server-7.0.0-1135 contains ns_server commit 4006df2 with commit message: MB-37273 Report errors uniformly in misc:raw_read_file().

          Build couchbase-server-6.5.0-4952 contains ns_server commit 4006df2 with commit message:
          MB-37273 Report errors uniformly in misc:raw_read_file().

          build-team Couchbase Build Team added a comment - Build couchbase-server-6.5.0-4952 contains ns_server commit 4006df2 with commit message: MB-37273 Report errors uniformly in misc:raw_read_file().

          Build couchbase-server-6.5.1-6016 contains ns_server commit 4006df2 with commit message:
          MB-37273 Report errors uniformly in misc:raw_read_file().

          build-team Couchbase Build Team added a comment - Build couchbase-server-6.5.1-6016 contains ns_server commit 4006df2 with commit message: MB-37273 Report errors uniformly in misc:raw_read_file().

          Installed CBS 6.5.0-4952 on OEL 8.1 and configured the cluster for one node

          hemant.rajput Hemant Rajput added a comment - Installed CBS 6.5.0-4952 on OEL 8.1 and configured the cluster for one node

          People

            Aliaksey Artamonau Aliaksey Artamonau
            ritam.sharma Ritam Sharma
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Gerrit Reviews

                There are no open Gerrit changes

                PagerDuty