Couchbase Server / MB-31324

Eventing Rebalance fails because of C++ worker crash

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • 6.0.0
    • 6.0.0
    • eventing
    • 6.0.0-1643

    Description

      Scripts to Repro

      ./testrunner -i /tmp/testexec.29004.ini -p get-cbcollect-info=True,GROUP=bucket_op_with_cron_timers -t eventing.eventing_rebalance.EventingRebalance.test_eventing_rebalance_in_when_existing_eventing_node_is_processing_mutations,nodes_init=4,services_init=kv-eventing-index-n1ql,dataset=default,groups=simple,reset_services=True,doc-per-day=20,handler_code=bucket_op_with_cron_timers,GROUP=bucket_op_with_cron_timers
       
      ./testrunner -i /tmp/testexec.29004.ini -p get-cbcollect-info=True,GROUP=bucket_op_with_cron_timers -t eventing.eventing_rebalance.EventingRebalance.test_eventing_rebalance_with_multiple_eventing_nodes,doc-per-day=5,dataset=default,nodes_init=5,services_init=kv-eventing-eventing-eventing-index:n1ql,groups=simple,reset_services=True,handler_code=bucket_op_with_cron_timers,GROUP=bucket_op_with_cron_timers
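
      The services_init values in the two commands can be checked against nodes_init with a small Python sketch. It assumes (the report does not state this) that '-' separates nodes and ':' separates services that share one node. parse_services_init is a hypothetical helper for illustration, not part of testrunner.

```python
def parse_services_init(value):
    """Split a services_init value into one list of services per node.

    Assumption: '-' separates nodes and ':' separates services on one node.
    """
    return [node.split(":") for node in value.split("-")]


if __name__ == "__main__":
    # Values copied from the two repro commands in this report.
    for nodes_init, services_init in [
        (4, "kv-eventing-index-n1ql"),
        (5, "kv-eventing-eventing-eventing-index:n1ql"),
    ]:
        layout = parse_services_init(services_init)
        print(nodes_init, len(layout), layout)
```

      Under that assumption, the first run has one eventing node and the second has three, with index and n1ql sharing the last node.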
      

      Log: http://qa.sc.couchbase.com/job/test_suite_executor/92153/consoleText

      Logs attached.

      Failure stacktrace for 1st failure

      2018-09-16 03:55:58 | INFO | MainProcess | Cluster_Thread | [rest_client.print_UI_logs] Latest logs from UI on 172.23.106.208:
      2018-09-16 03:55:58 | ERROR | MainProcess | Cluster_Thread | [rest_client.print_UI_logs] {u'node': u'ns_1@172.23.120.150', u'code': 0, u'text': u"Service 'eventing' exited with status 137. Restarting. Messages:\n2018-09-16T03:55:49.905-07:00 [Info] Consumer::Stop [worker_Function_797808232_test_eventing_rebalance_in_when_existing_eventing_node_is_processing_mu_1:39278:3975] Gracefully shutting down consumer routine\n2018-09-16T03:55:50.000-07:00 [Info] Producer::KillAndRespawnEventingConsumer [Function_797808232_test_eventing_rebalance_in_when_existing_eventing_node_is_processing_mu:2] ConsumerIndex: 1 respawning the Eventing.Consumer instance\n2018-09-16T03:55:50.368-07:00 [Info] Producer::handleV8Consumer [Function_797808232_test_eventing_rebalance_in_when_existing_eventing_node_is_processing_mu:2] Spawning consumer to listen on socket: 59701 feedback socket: 43881 index: 1 vbs len: 171 dump: [683-853]\n2018-09-16T03:55:50.566-07:00 [Info] util::GetProgress endpointURL: http://172.23.120.150:8096/getRebalanceProgress VbsRemainingToShuffle: 512 VbsOwnedPerPlan: 171\n2018-09-16T03:55:50.907-07:00 [Info] Consumer::Stop [worker_Function_797808232_test_eventing_rebalance_in_when_existing_eventing_node_is_processing_mu_1:39278:3975] Issued close for go-couchbase and gocb handles\n[goport(/opt/couchbase/bin/eventing-producer)] 2018/09/16 03:55:53 child process exited with status 137\n", u'shortText': u'message', u'serverTime': u'2018-09-16T03:55:53.623Z', u'module': u'ns_log', u'tstamp': 1537095353623, u'type': u'info'}
      2018-09-16 03:55:58 | ERROR | MainProcess | Cluster_Thread | [rest_client.print_UI_logs] {u'node': u'ns_1@172.23.106.208', u'code': 0, u'text': u'Rebalance exited with reason {service_rebalance_failed,eventing,\n                                 {lost_connection,shutdown}}', u'shortText': u'message', u'serverTime': u'2018-09-16T03:55:53.541Z', u'module': u'ns_orchestrator', u'tstamp': 1537095353541, u'type': u'critical'}
      

      Failure stacktrace for 2nd failure

      [2018-09-16 07:00:53,697] - [rest_client:3134] INFO - Latest logs from UI on 172.23.106.208:
      [2018-09-16 07:00:53,698] - [rest_client:3135] ERROR - {u'node': u'ns_1@172.23.106.208', u'code': 0, u'text': u'Rebalance exited with reason {service_rebalance_failed,eventing,\n                                 {lost_connection,shutdown}}', u'shortText': u'message', u'serverTime': u'2018-09-16T07:00:52.646Z', u'module': u'ns_orchestrator', u'tstamp': 1537106452646, u'type': u'critical'}
      [2018-09-16 07:00:53,699] - [rest_client:3135] ERROR - {u'node': u'ns_1@172.23.120.164', u'code': 0, u'text': u"Service 'eventing' exited with status 137. Restarting. Messages:\n2018-09-16T07:00:49.449-07:00 [Info] Consumer::Stop [worker_Function_797808232_test_eventing_rebalance_with_multiple_eventing_nodes_2:55644:29186] Requested to stop supervisor for Eventing.Consumer. Exiting Consumer::Stop\n2018-09-16T07:00:49.451-07:00 [Info] Consumer::handleFailoverLog [worker_Function_797808232_test_eventing_rebalance_with_multiple_eventing_nodes_2:55644:29186] Exiting failover log handling routine\n2018-09-16T07:00:49.547-07:00 [Info] Producer::handleV8Consumer [Function_797808232_test_eventing_rebalance_with_multiple_eventing_nodes:2] Consumer: worker_Function_797808232_test_eventing_rebalance_with_multiple_eventing_nodes_2 notifying about cluster state change\n2018-09-16T07:00:49.565-07:00 [Info] Consumer::SetRebalanceStatus [worker_Function_797808232_test_eventing_rebalance_with_multiple_eventing_nodes_2:44789:0] Updated isRebalanceOngoing to true\n2018-09-16T07:00:50.010-07:00 [Info] Consumer::sendUpdateProcessedSeqNo [worker_Function_797808232_test_eventing_rebalance_with_multiple_eventing_nodes_2:55644:29186] vb: 1007 seqNo: 7 sending update seqno data to C++\n[goport(/opt/couchbase/bin/eventing-producer)] 2018/09/16 07:00:52 child process exited with status 137\n", u'shortText': u'message', u'serverTime': u'2018-09-16T07:00:52.632Z', u'module': u'ns_log', u'tstamp': 1537106452632, u'type': u'info'}
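
      Both failures report that the eventing service exited with status 137. The Python sketch below pulls the embedded dict out of a print_UI_logs line and decodes that status. The helper names are hypothetical, and the sample line is abbreviated from the first failure above. The status-to-signal mapping uses the common shell convention (128 + signal number); whether goport derives its status that way is an assumption.

```python
import ast
import re
import signal

# Abbreviated from the first failure log in this report.
SAMPLE = '''2018-09-16 03:55:58 | ERROR | MainProcess | Cluster_Thread | [rest_client.print_UI_logs] {u'node': u'ns_1@172.23.120.150', u'code': 0, u'text': u"Service 'eventing' exited with status 137. Restarting.", u'module': u'ns_log', u'type': u'info'}'''


def parse_ui_log_entry(line):
    """Return the dict embedded in a print_UI_logs line, or None if absent."""
    start = line.find("{u'")
    if start == -1:
        return None
    # The entries are Python literals (u'' strings), so literal_eval is enough.
    return ast.literal_eval(line[start:].strip())


def exit_status(text):
    """Extract the number from 'exited with status N', or None."""
    match = re.search(r"exited with status (\d+)", text)
    return int(match.group(1)) if match else None


def signal_from_status(status):
    """Assumed shell convention: status > 128 means signal (status - 128)."""
    if status is None or status <= 128:
        return None
    try:
        return signal.Signals(status - 128).name
    except ValueError:
        return None


if __name__ == "__main__":
    entry = parse_ui_log_entry(SAMPLE)
    status = exit_status(entry["text"])
    print(entry["node"], status, signal_from_status(status))
```

      Under that convention, 137 would correspond to signal 9 (SIGKILL), which is a hint about how the process ended, not a confirmed root cause.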
      

          People

            satya.nand Satya Nand (Inactive)
            Balakumaran.Gopal Balakumaran Gopal