Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-31691

Metakv potentially returning stale values

    XMLWordPrintable

Details

    • Bug
    • Resolution: Cannot Reproduce
    • Critical
    • 6.5.0
    • 6.0.0
    • ns_server
    • None
    • Untriaged
    • Unknown

    Description

      Eventing depends on metakv to house all function lifecycle related metadata. Metakv apis that Eventing uses:

      • CRUD
      • List* family
      • ObservePath

      Recently, we started facing a issue - sequence of events looked like:

      • Eventing stored some config in metakv and then relevant observe callback got kicked off
      • Within the observe callback, Eventing tried to do lookup for the path for which callback was triggered - but lookup returned stale value. Lookup was done few(30 or so) seconds after callback was triggered.

      ====

      Relevant log messages from Eventing:

      • SettingsChangeCallback was triggered and value passed to it had "deployment_status":true & "processing_status":true

      2018-10-17T03:08:09.696+05:30 [Info] SuperSupervisor::SettingsChangeCallback [0] Path => /eventing/appsettings/bucket_op_on_update value => map[string]interface {}{"breakpad_on":false, "processing_status":true, "sock_batch_size":1, "cpp_worker_thread_count":1, "dcp_stream_boundary":"everything", "deadline_timeout":6,"execute_timer_routine_count":3, "execution_timeout":5, "timer_context_size":1.572864e+07, "curl_timeout":1, "deployment_status":true, "lcb_inst_capacity":5,"log_level":"INFO", "timer_storage_routine_count":3, "using_timer":false, "worker_count":1, "description":"Sample app", "user_prefix":"eventing"}
      2018-10-17T03:08:09.696+05:30 [Info] SuperSupervisor::SettingsChangeCallback [0] Function: bucket_op_on_update current state: 1 requested status for deployment: true processing: true
      

      • 30+s later, Eventing did metakv lookup for path /eventing/appsettings/bucket_op_on_update (all this is happening within the callback) - in this case both "deployment_status" & "processing_status" are set to false.

      2018-10-17T03:08:45.470+05:30 [Info] SuperSupervisor::syncAppStatus [1] Function: bucket_op_on_update skipping add to deployed apps map. dStatus: false pStatus: false err: <nil>
      

      From initial look, one might say that Eventing didn't handle all old callbacks and had some backlog of callbacks to process with old values for that path. But based on log messages, the metakv callback got triggered exactly when test scripts tried to deploy a function - which suggests there might not be any backlog left.

      ====

      Relevant metakv logs:

      [metakv:debug,2018-10-17T03:08:09.692+05:30,n_0@172.31.11.122:<0.694.0>:menelaus_metakv:output_kv:165]Sent /eventing/appsettings/bucket_op_on_update rev: g2wAAAACaAJtAAAAIDY4MmU4YWEyMzI5NjI0NDMxOTBiMDBiNzEzZmRmMDk4aAJhA24FAAPWOtUOaAJtAAAAIDgwMDUxYmYxYjJkNjFhYjBkZTQwNDllYjQxMTIyYjRlaAJhNW4FAEHWOtUOag==
      [metakv:debug,2018-10-17T03:08:09.692+05:30,n_0@172.31.11.122:<0.1660.0>:menelaus_metakv:output_kv:165]Sent /eventing/appsettings/bucket_op_on_update rev: g2wAAAACaAJtAAAAIDY4MmU4YWEyMzI5NjI0NDMxOTBiMDBiNzEzZmRmMDk4aAJhA24FAAPWOtUOaAJtAAAAIDgwMDUxYmYxYjJkNjFhYjBkZTQwNDllYjQxMTIyYjRlaAJhNW4FAEHWOtUOag==
      [metakv:debug,2018-10-17T03:08:09.692+05:30,n_0@172.31.11.122:<0.16756.12>:menelaus_metakv:handle_mutate:89]Updated <<"/eventing/appsettings/bucket_op_on_update">>. Elapsed time:0 ms.
      [metakv:debug,2018-10-17T03:08:27.035+05:30,n_0@172.31.11.122:<0.16756.12>:menelaus_metakv:handle_mutate:89]Updated <<"/eventing/appsettings/bucket_op_on_update">>. Elapsed time:0 ms.
      

      Unfortunately I don't have full cbcollects, instead just have raw server log files from Eventing CI server. Hoping some useful info can be grabbed from it.

      Attachments

        Issue Links

          No reviews matched the request. Check your Options in the drop-down menu of this sections header.

          Activity

            People

              asingh Abhishek Singh (Inactive)
              asingh Abhishek Singh (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Gerrit Reviews

                  There are no open Gerrit changes

                  PagerDuty