Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-40293

Pause hung with timer handler

    XMLWordPrintable

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Duplicate
    • Affects Version/s: 6.6.0
    • Fix Version/s: 6.6.0
    • Component/s: eventing
    • Labels:
      None
    • Triage:
      Untriaged
    • Story Points:
      1
    • Is this a Regression?:
      Unknown

      Description

      Build: 6.6.0-7854

      While testing some scenario , encountered pause hung on vagrant machine

      function OnUpdate(doc, meta) {
          var expiry = new Date();
          expiry.setSeconds(expiry.getSeconds() + 30);
          if (meta.id.includes("_")){
              var res = meta.id.split("_")[1];
              if(parseInt(res)%5==0){
                  var context = {docID : meta.id, random_text : 'e6cZZGHuh0R7Aumoe6cZZGHuh0R7Aumoe6cZZGHuh0R7Aumoe6cZZGHuh0R7Aumoe6cZZGHuh0R7Aumoe6cZZGHuh0R7Aumoe6cZZGHuh0R7Aumoe6cZZGHuh0R7Aumoe6cZZGHuh0R7Aumoe6cZZGHuh0R7Aumoe6cZZGHuh0R7Aumoe6cZZGHuh0R7Aumoe6cZZGHuh0R7Aumoe6cZZGHuh0'};
                  createTimer(timerCallback,  expiry, meta.id, context);
                  }
          }
      }
      function OnDelete(meta) {
          var expiry = new Date();
          expiry.setSeconds(expiry.getSeconds() + 30);
          if (meta.id.includes("_")){
              var res = meta.id.split("_")[1];
              if(parseInt(res)%5==0){
                  var context = {docID : meta.id};
                  var cancelled=cancelTimer(NDtimerCallback,meta.id);
                  if(cancelled){
                  createTimer(NDtimerCallback,  expiry, meta.id, context);
                  }
              }
          }    
      }
      function NDtimerCallback(context) {
          delete dst_bucket[context.docID];
      }
      function timerCallback(context) {
          dst_bucket[context.docID] = 'from timerCallback';
      } 

      API status

      {
          "apps": [
              {
                  "composite_status": "pausing",
                  "name": "timer_op",
                  "num_bootstrapping_nodes": 0,
                  "num_deployed_nodes": 1,
                  "deployment_status": true,
                  "processing_status": false
              }
          ],
          "num_eventing_nodes": 1
      } 

        Attachments

        No reviews matched the request. Check your Options in the drop-down menu of this sections header.

          Activity

          Hide
          jeelan.poola Jeelan Poola added a comment -

          Vikas Chaudhary Request you verify this on latest 6.6.0 builds and confirm if it is fixed. If not, please reopen the ticket. Thank you!

          Show
          jeelan.poola Jeelan Poola added a comment - Vikas Chaudhary Request you verify this on latest 6.6.0 builds and confirm if it is fixed. If not, please reopen the ticket. Thank you!
          Hide
          raju Raju Suravarjjala added a comment -

          Bulk closing invalid, duplicate and won't fix bugs

          Show
          raju Raju Suravarjjala added a comment - Bulk closing invalid, duplicate and won't fix bugs
          Hide
          ankit.prabhu Ankit Prabhu added a comment -

          Resolving this as duplicate as its fix went as part of  https://issues.couchbase.com/browse/MB-40293 

          Show
          ankit.prabhu Ankit Prabhu added a comment - Resolving this as duplicate as its fix went as part of  https://issues.couchbase.com/browse/MB-40293  
          Hide
          ankit.prabhu Ankit Prabhu added a comment -

          worker_timer_op_2 unable to acquire the termination lock and not able to terminate the javascript after script timeout.

          2020-07-06T04:14:26.563-07:00 [Info] eventing-consumer [worker_timer_op_2:/tmp/127.0.0.1:8091_2_3559826642.sock:3504] Task took: 60022423495ns, trying to obtain lock to terminate its execution
          

          Due to this new timer scan message was not able to pop and agg_queue_size remained 1.

          2020-07-06T04:13:32.948-07:00 [Info] Consumer::processStatsEvents [worker_timer_op_2:/tmp/127.0.0.1:8091_2_3559826642.sock:3504] CPP worker stats. Failure stats: {"app_worker_setting_events_lost":0,"bucket_op_exception_count":0,"checkpoint_failure_count":0,"curl_non_200_response":0,"dcp_events_lost":0,"debugger_events_lost":0,"delete_events_lost":0,"mutation_events_lost":0,"n1ql_op_exception_count":0,"timeout_count":0,"timer_callback_missing_counter":0,"timer_context_size_exceeded_counter":0,"timer_events_lost":0,"timestamp":"2020-07-06T11:13:32Z","v8worker_events_lost":0} execution stats: {"agg_queue_memory":0,"agg_queue_size":0,"curl":{"delete":0,"get":0,"head":0,"post":0,"put":0},"dcp_delete_msg_counter":0,"dcp_delete_parse_failure":0,"dcp_mutation_msg_counter":0,"dcp_mutation_parse_failure":0,"enqueued_dcp_delete_msg_counter":0,"enqueued_dcp_mutation_msg_counter":0,"enqueued_timer_msg_counter":0,"feedback_queue_size":0,"filtered_dcp_delete_counter":0,"filtered_dcp_mutation_counter":0,"lcb_retry_failure":0,"messages_parsed":2194,"num_processed_events":0,"on_delete_failure":0,"on_delete_success":0,"on_update_failure":0,"on_update_success":0,"processed_events_size":0,"timer_cancel_counter":0,"timer_create_counter":0,"timer_create_failure":0,"timer_msg_counter":1,"timer_responses_sent":0,"timestamp":"2020-07-06T11:13:32Z","uv_msg_parse_failure":0,"uv_try_write_failure_counter":0}
          

          The termination of javascript execution won't happen if the execution of javascript method control is in c++ which is the case when we call createTimer. Earlier if any retriable error comes from libcouchbase, eventing would retry till max_retry_count or if its 0(default value for max_retry_count) eventing will do infinite retry till its succeed.

          This behaviour is changed in https://github.com/couchbase/eventing/commit/ec28eb3bb6e0fc9d98dfd0acc0e552ddcb8077ee.
          In this we try till execution timeout or max_retry_count and if it didn't succeed it will return error and with this it will release termination lock and next message can be processed.

          Vikas ChaudharyCould you please try with latest build and check whether the issue still exist or not?

          Show
          ankit.prabhu Ankit Prabhu added a comment - worker_timer_op_2 unable to acquire the termination lock and not able to terminate the javascript after script timeout. 2020-07-06T04:14:26.563-07:00 [Info] eventing-consumer [worker_timer_op_2:/tmp/127.0.0.1:8091_2_3559826642.sock:3504] Task took: 60022423495ns, trying to obtain lock to terminate its execution Due to this new timer scan message was not able to pop and agg_queue_size remained 1. 2020-07-06T04:13:32.948-07:00 [Info] Consumer::processStatsEvents [worker_timer_op_2:/tmp/127.0.0.1:8091_2_3559826642.sock:3504] CPP worker stats. Failure stats: {"app_worker_setting_events_lost":0,"bucket_op_exception_count":0,"checkpoint_failure_count":0,"curl_non_200_response":0,"dcp_events_lost":0,"debugger_events_lost":0,"delete_events_lost":0,"mutation_events_lost":0,"n1ql_op_exception_count":0,"timeout_count":0,"timer_callback_missing_counter":0,"timer_context_size_exceeded_counter":0,"timer_events_lost":0,"timestamp":"2020-07-06T11:13:32Z","v8worker_events_lost":0} execution stats: {"agg_queue_memory":0,"agg_queue_size":0,"curl":{"delete":0,"get":0,"head":0,"post":0,"put":0},"dcp_delete_msg_counter":0,"dcp_delete_parse_failure":0,"dcp_mutation_msg_counter":0,"dcp_mutation_parse_failure":0,"enqueued_dcp_delete_msg_counter":0,"enqueued_dcp_mutation_msg_counter":0,"enqueued_timer_msg_counter":0,"feedback_queue_size":0,"filtered_dcp_delete_counter":0,"filtered_dcp_mutation_counter":0,"lcb_retry_failure":0,"messages_parsed":2194,"num_processed_events":0,"on_delete_failure":0,"on_delete_success":0,"on_update_failure":0,"on_update_success":0,"processed_events_size":0,"timer_cancel_counter":0,"timer_create_counter":0,"timer_create_failure":0,"timer_msg_counter":1,"timer_responses_sent":0,"timestamp":"2020-07-06T11:13:32Z","uv_msg_parse_failure":0,"uv_try_write_failure_counter":0} The termination of javascript execution won't happen if the execution of javascript method control is in c++ which is the case when we call createTimer. Earlier if any retriable error comes from libcouchbase, eventing would retry till max_retry_count or if its 0(default value for max_retry_count) eventing will do infinite retry till its succeed. This behaviour is changed in https://github.com/couchbase/eventing/commit/ec28eb3bb6e0fc9d98dfd0acc0e552ddcb8077ee . In this we try till execution timeout or max_retry_count and if it didn't succeed it will return error and with this it will release termination lock and next message can be processed. Vikas Chaudhary Could you please try with latest build and check whether the issue still exist or not?
          Hide
          jeelan.poola Jeelan Poola added a comment -

          Root causing in-progress. Will have a better update in a day or two.

          Show
          jeelan.poola Jeelan Poola added a comment - Root causing in-progress. Will have a better update in a day or two.
          Hide
          jeelan.poola Jeelan Poola added a comment -

          Vikas Chaudhary Could you please try once with Build couchbase-server-6.6.0-7862? It has the fix for volume test issue. Also since it's vagrant, it's possible that we did not receive STREAMENDS from KV. We will go through the logs.

          Show
          jeelan.poola Jeelan Poola added a comment - Vikas Chaudhary Could you please try once with Build couchbase-server-6.6.0-7862? It has the fix for volume test issue. Also since it's vagrant, it's possible that we did not receive STREAMENDS from KV. We will go through the logs.

            People

            Assignee:
            vikas.chaudhary Vikas Chaudhary
            Reporter:
            vikas.chaudhary Vikas Chaudhary
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

              Dates

              Created:
              Updated:
              Resolved:

                Gerrit Reviews

                There are no open Gerrit changes

                  PagerDuty