Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-52474

[Upgrade] Timers handler stuck in deploying state after offline upgrade from 6.6.5 to 7.1.1

    XMLWordPrintable

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • Morpheus
    • 7.1.1
    • eventing
    • 6.6.5-10080
      7.1.1-3115

    Description

      STEPS

      1. Cluster consists of 5 nodes (2 kv, 1 eventing, index and query node each) running CB server 6.6.5-10080.
      2. Create the required set of buckets.
      3. Create 2 eventing functions - bucket op and timers.
      4. Deploy timer function.
      5. Perform offline upgrade of the cluster of 7.1.1-3115.

      OBSERVATION
      Timers handler is stuck in deploying state.

      errors in eventing.log post upgrade

      2022-06-07T23:28:07.055-07:00 [Error] Producer::Serve [timers:0] Failed to get meta data handle , err: App: timers is not registered by any of the member pools
      2022-06-07T23:28:56.610-07:00 [Info] {"allocated":65564296,"heap_allocated":65564296,"heap_idle":35356672,"heap_in_use":68321280,"heap_objects":220087,"heap_os_related":1417216,"heap_system":103677952,"last_gc_cycle":"2022-06-07 23:28:01.989445202 -0700 PDT","memory_allocations":790202,"memory_frees":570115,"next_gc_cycle":"95ms","num_gc":16,"stack_cache_in_use":4800,"stack_in_use":1179648,"stack_span_in_use":268600,"stack_system":1179648,"total_allocated":161863920}
      2022-06-07T23:29:56.611-07:00 [Info] {"allocated":72030136,"heap_allocated":72030136,"heap_idle":29523968,"heap_in_use":74219520,"heap_objects":282242,"heap_os_related":1417216,"heap_system":103743488,"last_gc_cycle":"2022-06-07 23:28:01.989445202 -0700 PDT","memory_allocations":855780,"memory_frees":573538,"next_gc_cycle":"95ms","num_gc":16,"stack_cache_in_use":4800,"stack_in_use":1114112,"stack_span_in_use":338232,"stack_system":1114112,"total_allocated":168329760}
      2022-06-07T23:30:56.611-07:00 [Info] {"allocated":6802928,"heap_allocated":6802928,"heap_idle":93437952,"heap_in_use":10371072,"heap_objects":54150,"heap_os_related":92807168,"heap_system":103809024,"last_gc_cycle":"2022-06-07 23:30:02.05224944 -0700 PDT","memory_allocations":871422,"memory_frees":817272,"next_gc_cycle":"11ms","num_gc":17,"stack_cache_in_use":4800,"stack_in_use":1048576,"stack_span_in_use":153680,"stack_system":1048576,"total_allocated":170257480}
      2022-06-07T23:31:14.846-07:00 [Error] Eventing::main metakv observe error for global config, err: unexpected EOF. Retrying...
      2022-06-07T23:31:14.846-07:00 [Error] Eventing::main metakv observe error for event handler code, err: unexpected EOF. Retrying...
      2022-06-07T23:31:14.846-07:00 [Warn] servicesChangeNotifier: Connection terminated for collection manifest notifier instance of http://%40eventing@127.0.0.1:8091, default, bucket: n1ql_op_dst, (unexpected EOF). Retrying...
      2022-06-07T23:31:14.846-07:00 [Error] ServiceMgr::initService metakv observe error for temp store, err: unexpected EOF. Retrying...
      2022-06-07T23:31:14.846-07:00 [Error] ServiceMgr::initService metakv observe error for setting store, err: unexpected EOF. Retrying...
      2022-06-07T23:31:14.846-07:00 [Error] Eventing::main metakv observe error for debugger, err: unexpected EOF. Retrying.
      

      Also following set of log statements indicate internal undeployment was triggered for timers handler due to deletion of metadata collection.
      Can confirm that there was no step involving deletion of metadata collection in the test so not sure how it was triggered.

      2022-06-07T23:28:02.243-07:00 [Info] SuperSupervisor::checkDeletedCid Undeploying timers Reason: metadata collection delete err: <nil>
      2022-06-07T23:28:02.243-07:00 [Info] SuperSupervisor::StopProducer [1] Function: timers stopping running producer instance, DeleteFunction: false, SkipMetadataCleanup: true, UpdateMetakv: true
      2022-06-07T23:28:02.243-07:00 [Info] SuperSupervisor::deleteFromLocallyDeployedApps [1] Function: timers deleting from locally deployed apps map
      2022-06-07T23:28:02.243-07:00 [Info] SuperSupervisor::CleanupProducer [1] Function: timers stopping running instance of Eventing.Producer, DeleteFunction: false, SkipMetadataCleanup: true, UpdateMetakv: true
      2022-06-07T23:28:02.243-07:00 [Info] Producer::StopRunningConsumers [timers:0] Stopping running instances of Eventing.Consumer
      2022-06-07T23:28:02.248-07:00 [Info] SuperSupervisor::deleteFromRunningProducers [0] Function: timers deleted from running functions
      

      NOTE
      Issue is not reproducible consistently, test has ran into it only once.
      PFA logs for eventing node.

      Attachments

        Issue Links

          Activity

            People

              sujay.gad Sujay Gad
              sujay.gad Sujay Gad
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                PagerDuty