Couchbase Server / MB-30897

Race in updating metadata to signify if successful STREAMREQ has been issued for vb

Details

    • Bug
    • Resolution: Fixed
    • Major
    • 6.0.0
    • 6.0.0
    • eventing
    • Yes

    Description

      CentOS longevity run on build 6.0.0-1493: the following rebalance exit was observed:

      2018-08-13T15:26:37.106-07:00, ns_orchestrator:0:critical:message(ns_1@172.23.108.103) - Rebalance exited with reason {service_rebalance_failed,eventing,
                                    {rebalance_failed,
                                     {service_error,
                                      <<"eventing rebalance hasn't made progress for past 600 secs">>}}}
      

      Around the same timestamp, the eventing log on the .135 node shows:

      2018-08-13T15:26:37.063-07:00 [Info] util::GetProgress endpointURL: http://172.23.96.168:8096/getRebalanceProgress VbsRemainingToShuffle: 0 VbsOwnedPerPlan: 0
      2018-08-13T15:26:37.075-07:00 [Info] Consumer::RebalanceTaskProgress [worker_bucket_op_complex_function_0:/tmp/127.0.0.1:8091_worker_bucket_op_complex_function_0.sock:9891] vbsRemainingToCloseStream len: 0 dump: [] vbsRemainingToStreamReq len: 0 dump: []
      2018-08-13T15:26:37.077-07:00 [Info] Consumer::RebalanceTaskProgress [worker_bucket_op_complex_function_1:/tmp/127.0.0.1:8091_worker_bucket_op_complex_function_1.sock:9901] vbsRemainingToCloseStream len: 0 dump: [] vbsRemainingToStreamReq len: 0 dump: []
      2018-08-13T15:26:37.078-07:00 [Info] Consumer::RebalanceTaskProgress [worker_bucket_op_complex_function_2:/tmp/127.0.0.1:8091_worker_bucket_op_complex_function_2.sock:9892] vbsRemainingToCloseStream len: 0 dump: [] vbsRemainingToStreamReq len: 1 dump: [872
      2018-08-13T15:26:37.078-07:00 [Info] ServiceMgr::getRebalanceProgress Function: bucket_op_complex_function rebalance progress from node with rest port: 8091 progress: &{0 1 1 170 <nil>}
      2018-08-13T15:26:37.080-07:00 [Info] Consumer::RebalanceTaskProgress [worker_bucket_op_function_0:/tmp/127.0.0.1:8091_worker_bucket_op_function_0.sock:9777] vbsRemainingToCloseStream len: 0 dump: [] vbsRemainingToStreamReq len: 0 dump: []
      2018-08-13T15:26:37.081-07:00 [Info] Consumer::RebalanceTaskProgress [worker_bucket_op_function_1:/tmp/127.0.0.1:8091_worker_bucket_op_function_1.sock:9793] vbsRemainingToCloseStream len: 0 dump: [] vbsRemainingToStreamReq len: 0 dump: []
      2018-08-13T15:26:37.082-07:00 [Info] Consumer::RebalanceTaskProgress [worker_bucket_op_function_2:/tmp/127.0.0.1:8091_worker_bucket_op_function_2.sock:9763] vbsRemainingToCloseStream len: 0 dump: [] vbsRemainingToStreamReq len: 0 dump: []
      2018-08-13T15:26:37.082-07:00 [Info] ServiceMgr::getRebalanceProgress Function: bucket_op_function rebalance progress from node with rest port: 8091 progress: &{0 0 0 0 <nil>}
      2018-08-13T15:26:37.082-07:00 [Info] util::GetProgress endpointURL: http://172.23.98.135:8096/getRebalanceProgress VbsRemainingToShuffle: 1 VbsOwnedPerPlan: 170
      2018-08-13T15:26:37.083-07:00 [Info] util::GetProgress endpointURL: http://127.0.0.1:8096/getAggRebalanceProgress VbsRemainingToShuffle: 1 VbsOwnedPerPlan: 170
      2018-08-13T15:26:37.089-07:00 [Info] rebalancer::gatherProgress total vbs to shuffle: 1866 remaining to shuffle: 1 progress: 99.94640943193997 counter: 200 cmp: true
      2018-08-13T15:26:37.089-07:00 [Error] rebalancer::gatherProgress Failing rebalance as progress hasn't made progress for past 600 secs
      2018-08-13T15:26:37.090-07:00 [Info] rebalancer::stopRebalanceCallback Updating metakv to signify rebalance cancellation
      2018-08-13T15:26:37.091-07:00 [Info] SuperSupervisor::TopologyChangeNotifCallback [2] Path => /eventing/rebalanceToken/5f122797b691fd6eaae9b826422ee53c value => stopRebalance
      2018-08-13T15:26:37.095-07:00 [Info] ServiceMgr::GetCurrentTopology rev: service.Revision{0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x1, 0xb3}
      2018-08-13T15:26:37.095-07:00 [Info] ServiceMgr::GetTaskList rev: service.Revision(nil)
      2018-08-13T15:26:37.095-07:00 [Info] ServiceMgr::GetTaskList rev: service.Revision{0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x1, 0xb3}
      2018-08-13T15:26:37.096-07:00 [Info] ServiceMgr::GetCurrentTopology rev: service.Revision(nil)
      2018-08-13T15:26:37.097-07:00 [Info] ServiceMgr::GetTaskList rev: service.Revision{0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x1, 0xb4}
      2018-08-13T15:26:37.097-07:00 [Info] ServiceMgr::GetCurrentTopology rev: service.Revision{0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x1, 0xb4}
      2018-08-13T15:26:37.106-07:00 [Info] SuperSupervisor::TopologyChangeNotifCallback [2] Apps in primary store: [bucket_op_complex_function bucket_op_function], running apps: map[bucket_op_function:Producer => app: bucket_op_function tcpPort: /tmp/127.0.0.1:8091_worker_bucket_op_function_2.sock bucket_op_complex_function:Producer => app: bucket_op_complex_function tcpPort: /tmp/127.0.0.1:8091_worker_bucket_op_complex_function_2.sock]
      2018-08-13T15:26:37.112-07:00 [Info] SuperSupervisor::TopologyChangeNotifCallback [2] App: bucket_op_complex_function deployment_status: true processing_status: true runningProducer: Producer => app: bucket_op_complex_function tcpPort: /tmp/127.0.0.1:8091_worker_bucket_op_complex_function_2.sock
      2018-08-13T15:26:37.117-07:00 [Info] SuperSupervisor::TopologyChangeNotifCallback [2] App: bucket_op_function deployment_status: true processing_status: true runningProducer: Producer => app: bucket_op_function tcpPort: /tmp/127.0.0.1:8091_worker_bucket_op_function_2.sock
      2018-08-13T15:26:37.117-07:00 [Info] Producer::Serve [bucket_op_complex_function:3] Got topology change msg: &{stop-rebalance} from super_supervisor
      2018-08-13T15:26:37.117-07:00 [Info] Producer::Serve [bucket_op_complex_function:3] Consumer: worker_bucket_op_complex_function_0 sent stop rebalance message from producer
      2018-08-13T15:26:37.117-07:00 [Info] Consumer::NotifyRebalanceStop [worker_bucket_op_complex_function_0:/tmp/127.0.0.1:8091_worker_bucket_op_complex_function_0.sock:9891] Got notification about rebalance stop
      2018-08-13T15:26:37.117-07:00 [Info] Producer::Serve [bucket_op_function:3] Got topology change msg: &{stop-rebalance} from super_supervisor
      2018-08-13T15:26:37.117-07:00 [Info] Producer::Serve [bucket_op_function:3] Consumer: worker_bucket_op_function_0 sent stop rebalance message from producer
      2018-08-13T15:26:37.117-07:00 [Info] Consumer::NotifyRebalanceStop [worker_bucket_op_function_0:/tmp/127.0.0.1:8091_worker_bucket_op_function_0.sock:9777] Got notification about rebalance stop
      2018-08-13T15:26:37.117-07:00 [Info] Consumer::NotifyRebalanceStop [worker_bucket_op_function_0:/tmp/127.0.0.1:8091_worker_bucket_op_function_0.sock:9777] Updated isRebalanceOngoing to false
      2018-08-13T15:26:37.117-07:00 [Info] Producer::Serve [bucket_op_function:3] Consumer: worker_bucket_op_function_1 sent stop rebalance message from producer
      2018-08-13T15:26:37.117-07:00 [Info] Consumer::NotifyRebalanceStop [worker_bucket_op_function_1:/tmp/127.0.0.1:8091_worker_bucket_op_function_1.sock:9793] Got notification about rebalance stop
      2018-08-13T15:26:37.117-07:00 [Info] Consumer::NotifyRebalanceStop [worker_bucket_op_function_1:/tmp/127.0.0.1:8091_worker_bucket_op_function_1.sock:9793] Updated isRebalanceOngoing to false
      2018-08-13T15:26:37.117-07:00 [Info] Producer::Serve [bucket_op_function:3] Consumer: worker_bucket_op_function_2 sent stop rebalance message from producer
      2018-08-13T15:26:37.117-07:00 [Info] Consumer::NotifyRebalanceStop [worker_bucket_op_function_2:/tmp/127.0.0.1:8091_worker_bucket_op_function_2.sock:9763] Got notification about rebalance stop
      2018-08-13T15:26:37.117-07:00 [Info] Consumer::NotifyRebalanceStop [worker_bucket_op_function_2:/tmp/127.0.0.1:8091_worker_bucket_op_function_2.sock:9763] Updated isRebalanceOngoing to false
      2018-08-13T15:26:37.117-07:00 [Info] Consumer::NotifyRebalanceStop [worker_bucket_op_complex_function_0:/tmp/127.0.0.1:8091_worker_bucket_op_complex_function_0.sock:9891] Updated isRebalanceOngoing to false
      2018-08-13T15:26:37.117-07:00 [Info] Producer::Serve [bucket_op_complex_function:3] Consumer: worker_bucket_op_complex_function_1 sent stop rebalance message from producer
      2018-08-13T15:26:37.117-07:00 [Info] Consumer::NotifyRebalanceStop [worker_bucket_op_complex_function_1:/tmp/127.0.0.1:8091_worker_bucket_op_complex_function_1.sock:9901] Got notification about rebalance stop
      2018-08-13T15:26:37.117-07:00 [Info] Consumer::NotifyRebalanceStop [worker_bucket_op_complex_function_1:/tmp/127.0.0.1:8091_worker_bucket_op_complex_function_1.sock:9901] Updated isRebalanceOngoing to false
      2018-08-13T15:26:37.117-07:00 [Info] Producer::Serve [bucket_op_complex_function:3] Consumer: worker_bucket_op_complex_function_2 sent stop rebalance message from producer
      2018-08-13T15:26:37.117-07:00 [Info] Consumer::NotifyRebalanceStop [worker_bucket_op_complex_function_2:/tmp/127.0.0.1:8091_worker_bucket_op_complex_function_2.sock:9892] Got notification about rebalance stop
      2018-08-13T15:26:37.117-07:00 [Info] Consumer::NotifyRebalanceStop [worker_bucket_op_complex_function_2:/tmp/127.0.0.1:8091_worker_bucket_op_complex_function_2.sock:9892] Updated isRebalanceOngoing to false
      2018-08-13T15:27:07.097-07:00 [Info] ServiceMgr::GetTaskList rev: service.Revision(nil)
      2018-08-13T15:27:07.097-07:00 [Info] ServiceMgr::GetCurrentTopology rev: service.Revision(nil)
      2018-08-13T15:27:07.098-07:00 [Info] ServiceMgr::GetTaskList rev: service.Revision{0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x1, 0xb4}
      2018-08-13T15:27:07.099-07:00 [Info] ServiceMgr::GetCurrentTopology rev: service.Revision{0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x1, 0xb4}
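
The `rebalancer::gatherProgress` lines above report 1 of 1866 vBuckets still to shuffle, a progress of about 99.946%, and a counter of 200 when the 600-second limit trips. Those figures are consistent with progress being computed as (total - remaining) / total and with one poll roughly every 3 seconds, although the log does not state the poll interval. The Go sketch below illustrates that kind of stall check. It is not the eventing service's code: `stallDetector`, `progressPercent`, `observe`, and the 3-second interval are all hypothetical names and assumptions made for illustration.

```go
package main

import "fmt"

// stallDetector is a hypothetical helper (not from the eventing code base).
// It counts consecutive polls in which the number of vBuckets remaining
// to shuffle did not change.
type stallDetector struct {
	lastRemaining int // value seen on the previous poll
	counter       int // polls since the last observed change
	maxPolls      int // unchanged polls tolerated before failing
}

// progressPercent returns the share of vBuckets already shuffled,
// assuming progress = (total - remaining) / total * 100.
func progressPercent(total, remaining int) float64 {
	if total == 0 {
		return 100
	}
	return float64(total-remaining) / float64(total) * 100
}

// observe records one poll and reports whether the rebalance
// should be failed because nothing has moved for maxPolls polls.
func (d *stallDetector) observe(remaining int) bool {
	if remaining != d.lastRemaining {
		d.lastRemaining = remaining
		d.counter = 0
		return false
	}
	d.counter++
	return d.counter >= d.maxPolls
}

func main() {
	// Assumption: a 600 s limit polled every 3 s allows 200 unchanged polls.
	d := &stallDetector{lastRemaining: -1, maxPolls: 200}

	stalled := false
	for i := 0; i <= 200 && !stalled; i++ {
		// One vBucket never leaves vbsRemainingToStreamReq.
		stalled = d.observe(1)
	}
	fmt.Printf("progress: %v stalled: %v\n", progressPercent(1866, 1), stalled)
}
```

Under this model, a single vBucket that never leaves `vbsRemainingToStreamReq` is enough to hold progress just below 100% and fail the whole rebalance once the limit is reached, which matches the symptom recorded in this issue.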
      

          People

            vikas.chaudhary Vikas Chaudhary
            arunkumar Arunkumar Senthilnathan (Inactive)