Couchbase Server / MB-60934

[Eventing] Fatal error: libcouchbase experienced an unrecoverable error and terminates the program eventing-consumer to avoid undefined behavior

Details

    • Bug
    • Resolution: Unresolved
    • Critical
    • 7.6.2
    • 7.6.0
    • clients, eventing
    • 7.6.0-2164-enterprise
    • Untriaged
    • Linux x86_64
    • 0
    • No

    Description

      QE Test

      ./testrunner -i /data/workspace/debian-p0-eventing-vset00-00-rebalance_timer_op_curl_6.5_P1/testexec.4779.ini -p get-cbcollect-info=True,GROUP=timer_op_curl_p1,skip_log_scan=True,default_bucket=False,get-cbcollect-info=True,sirius_url=http://172.23.120.103:4000 -t eventing.eventing_rebalance.EventingRebalance.test_erl_crash_on_kv_and_eventing_node_during_eventing_rebalance,doc-per-day=20,dataset=default,nodes_init=5,services_init=kv-kv-eventing-eventing-index:n1ql,groups=simple,reset_services=True,handler_code=timer_op_curl_jenkins,curl=True,GROUP=timer_op_curl_p1,host=http://qa.sc.couchbase.com/

      The test performs the following steps:

      1. Set up and deploy an Eventing function on an existing Eventing node.
      2. Create a new Eventing node.
      3. Add the new Eventing node to the cluster via a rebalance.
      4. Once the rebalance is ~10% complete, kill the eventing consumer and producer processes.
      5. Restart the rebalance.
      6. Expected result: the rebalance completes.
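
      Step 4 above (wait for ~10% progress, then kill the processes) can be sketched as a small polling helper. This is only an illustration, not the testrunner implementation: `kill_at_progress`, `get_progress`, and `kill_processes` are hypothetical names, and the caller is assumed to supply callables that read the rebalance progress as a percentage and kill the eventing consumer and producer processes.

```python
import time


def kill_at_progress(get_progress, kill_processes, threshold=10.0,
                     poll_interval=1.0, timeout=300.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll rebalance progress and trigger the kill once it reaches `threshold`.

    get_progress   -- hypothetical callable returning progress in percent (0-100)
    kill_processes -- hypothetical callable that kills the eventing processes
    Returns the progress value observed when the kill was triggered.
    """
    deadline = clock() + timeout
    while True:
        progress = get_progress()
        if progress >= threshold:
            kill_processes()
            return progress
        if clock() >= deadline:
            raise TimeoutError(
                f"rebalance did not reach {threshold}% within {timeout}s"
            )
        sleep(poll_interval)
```

      Injecting `sleep` and `clock` keeps the helper testable without real delays; in a real run the defaults would be used.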

      libcouchbase (LCB) crash on Eventing node 172.23.109.114:

      2024-02-25T06:52:07.614-08:00 [Info] eventing-consumer [worker_src_bucket/_default/Function_92694541_test_erl_crash_on_kv_and_eventing_node_during_eventing_rebalance_1:/tmp/127.0.0.1:8091_1_3293547145.sock:2831] FATAL ERROR:
      2024-02-25T06:52:07.614-08:00 [Info] eventing-consumer [worker_src_bucket/_default/Function_92694541_test_erl_crash_on_kv_and_eventing_node_during_eventing_rebalance_1:/tmp/127.0.0.1:8091_1_3293547145.sock:2831]     libcouchbase experienced an unrecoverable error and terminates the program
      2024-02-25T06:52:07.614-08:00 [Info] eventing-consumer [worker_src_bucket/_default/Function_92694541_test_erl_crash_on_kv_and_eventing_node_during_eventing_rebalance_1:/tmp/127.0.0.1:8091_1_3293547145.sock:2831]     to avoid undefined behavior.
      2024-02-25T06:52:07.614-08:00 [Info] eventing-consumer [worker_src_bucket/_default/Function_92694541_test_erl_crash_on_kv_and_eventing_node_during_eventing_rebalance_1:/tmp/127.0.0.1:8091_1_3293547145.sock:2831]     The program should have generated a "corefile" which may used
      2024-02-25T06:52:07.614-08:00 [Info] eventing-consumer [worker_src_bucket/_default/Function_92694541_test_erl_crash_on_kv_and_eventing_node_during_eventing_rebalance_1:/tmp/127.0.0.1:8091_1_3293547145.sock:2831]     to gather more information about the problem.
      2024-02-25T06:52:07.614-08:00 [Info] eventing-consumer [worker_src_bucket/_default/Function_92694541_test_erl_crash_on_kv_and_eventing_node_during_eventing_rebalance_1:/tmp/127.0.0.1:8091_1_3293547145.sock:2831]     If your system doesn't create "corefiles" I can tell you that the
      2024-02-25T06:52:07.614-08:00 [Info] eventing-consumer [worker_src_bucket/_default/Function_92694541_test_erl_crash_on_kv_and_eventing_node_during_eventing_rebalance_1:/tmp/127.0.0.1:8091_1_3293547145.sock:2831]     assertion failed in /home/couchbase/jenkins/workspace/couchbase-server-unix/libcouchbase/src/mc/mcreq-flush-inl.h at line 63
      2024-02-25T06:52:07.658-08:00 [Info] Consumer::addToAggChan [worker_src_bucket/_default/Function_92694541_test_erl_crash_on_kv_and_eventing_node_during_eventing_rebalance_0:/tmp/127.0.0.1:8091_0_3293547145.sock:2823] vb: 347 STREAMREQ metadataUpdated not found
      2024-02-25T06:52:07.696-08:00 [Info] eventing-consumer [worker_src_bucket/_default/Function_92694541_test_erl_crash_on_kv_and_eventing_node_during_eventing_rebalance_1:/tmp/127.0.0.1:8091_1_3293547145.sock:2831] 
      2024-02-25T06:52:07.696-08:00 [Info] eventing-consumer [worker_src_bucket/_default/Function_92694541_test_erl_crash_on_kv_and_eventing_node_during_eventing_rebalance_1:/tmp/127.0.0.1:8091_1_3293547145.sock:2831] == Minidump location: /opt/couchbase/var/lib/couchbase/crash/865b6c46-d019-41b1-9eec1da7-409ec52e.dmp Status: 1 ==
      2024-02-25T06:52:07.700-08:00 [Warn] client::Serve [worker_src_bucket/_default/Function_92694541_test_erl_crash_on_kv_and_eventing_node_during_eventing_rebalance_1:/tmp/127.0.0.1:8091_1_3293547145.sock:2831] Failed to read from stdout pipe, err: EOF
      2024-02-25T06:52:07.700-08:00 [Warn] client::Serve [worker_src_bucket/_default/Function_92694541_test_erl_crash_on_kv_and_eventing_node_during_eventing_rebalance_1:/tmp/127.0.0.1:8091_1_3293547145.sock:2831] Failed to read from stderr pipe, err: EOF
      2024-02-25T06:52:07.700-08:00 [Warn] client::Serve [worker_src_bucket/_default/Function_92694541_test_erl_crash_on_kv_and_eventing_node_during_eventing_rebalance_1:/tmp/127.0.0.1:8091_1_3293547145.sock:2831] Exiting c++ worker with error: signal: aborted
      2024-02-25T06:52:07.700-08:00 [Error] Consumer::feedbackReadMessageLoop [worker_src_bucket/_default/Function_92694541_test_erl_crash_on_kv_and_eventing_node_during_eventing_rebalance_1:/tmp/127.0.0.1:8091_1_3293547145.sock:2831] Read from client socket failed, err: EOF
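
      For triage, the key facts in the excerpt above (the worker that crashed, the failed assertion, and the minidump path) can be pulled out with a short script. The sketch below is an assumption-laden illustration: the regular expressions are derived only from the lines shown in this ticket, not from a documented eventing log format, and `summarize_crash` is a hypothetical helper name.

```python
import re

# Patterns inferred from the excerpt in this ticket (assumption, not a spec).
ASSERT_RE = re.compile(r"assertion failed in (?P<file>\S+) at line (?P<line>\d+)")
MINIDUMP_RE = re.compile(r"Minidump location: (?P<path>\S+\.dmp)")
WORKER_RE = re.compile(r"\[(?P<worker>worker_[^\]]*?):/tmp/")


def summarize_crash(log_lines):
    """Return the crashing worker, assertion location, and minidump path, if present."""
    summary = {"worker": None, "assert_file": None,
               "assert_line": None, "minidump": None}
    for line in log_lines:
        match = ASSERT_RE.search(line)
        if match:
            summary["assert_file"] = match.group("file")
            summary["assert_line"] = int(match.group("line"))
            worker = WORKER_RE.search(line)
            if worker:
                summary["worker"] = worker.group("worker")
        match = MINIDUMP_RE.search(line)
        if match:
            summary["minidump"] = match.group("path")
    return summary


# Two lines copied from the excerpt above.
SAMPLE = [
    "2024-02-25T06:52:07.614-08:00 [Info] eventing-consumer [worker_src_bucket/_default/Function_92694541_test_erl_crash_on_kv_and_eventing_node_during_eventing_rebalance_1:/tmp/127.0.0.1:8091_1_3293547145.sock:2831]     assertion failed in /home/couchbase/jenkins/workspace/couchbase-server-unix/libcouchbase/src/mc/mcreq-flush-inl.h at line 63",
    "2024-02-25T06:52:07.696-08:00 [Info] eventing-consumer [worker_src_bucket/_default/Function_92694541_test_erl_crash_on_kv_and_eventing_node_during_eventing_rebalance_1:/tmp/127.0.0.1:8091_1_3293547145.sock:2831] == Minidump location: /opt/couchbase/var/lib/couchbase/crash/865b6c46-d019-41b1-9eec1da7-409ec52e.dmp Status: 1 ==",
]

if __name__ == "__main__":
    for key, value in summarize_crash(SAMPLE).items():
        print(f"{key}: {value}")
```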
      

      Note

      Please find attached logs from all the nodes present in the cluster as well as the dump file.

      Reran the same test 8 times in a loop on build 7.6.0-2164 and hit this issue only once.
      This looks like a one-off, so I am marking it as not a regression.
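
      Because the crash reproduced in only 1 of 8 runs, a handful of clean reruns would not be strong evidence that any future fix works. As a rough back-of-the-envelope sketch, assuming each run fails independently with a fixed probability (an assumption, and 1/8 is itself a very small sample), the number of consecutive clean runs needed for a given confidence can be estimated as follows. `clean_runs_needed` is a hypothetical helper name.

```python
import math


def clean_runs_needed(failure_rate, confidence=0.95):
    """Smallest n such that (1 - failure_rate)**n <= 1 - confidence.

    Assumes independent runs with a constant per-run failure probability.
    """
    if not 0.0 < failure_rate < 1.0:
        raise ValueError("failure_rate must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - failure_rate))


if __name__ == "__main__":
    observed_rate = 1 / 8  # 1 failure in 8 runs, as reported above
    print(clean_runs_needed(observed_rate))
```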

            People

              avsej Sergey Avseyev
              sujay.gad Sujay Gad
