Couchbase Server / MB-35188

Clean up settings across multiple stores and cleanup_timers setting

Details

    • Untriaged
    • Unknown

    Description

      I was doing some normal handler deploy/undeploy/editing/deploy cycles (slowly and carefully, not a stress test) and the function got stuck in Undeploying state.

      curl 'http://click:8096/api/v1/status'
      {
       "apps": [
        {
         "composite_status": "undeploying",
         "name": "test",
         "num_bootstrapping_nodes": 0,
         "num_deployed_nodes": 1,
         "deployment_status": false,
         "processing_status": false
        }
       ],
       "num_eventing_nodes": 1
      }
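
The check above can be scripted. What follows is a minimal Python sketch, not part of the original report: it parses a saved copy of the `/api/v1/status` response and lists the functions whose `composite_status` is `"undeploying"`. The helper name `undeploying_apps` and the embedded sample are illustrative assumptions; only the field names come from the output above.

```python
import json

# Sample copied from the status output shown in this report.
SAMPLE_STATUS = """
{
 "apps": [
  {
   "composite_status": "undeploying",
   "name": "test",
   "num_bootstrapping_nodes": 0,
   "num_deployed_nodes": 1,
   "deployment_status": false,
   "processing_status": false
  }
 ],
 "num_eventing_nodes": 1
}
"""


def undeploying_apps(status_text):
    """Return the names of functions reported as 'undeploying'.

    Hypothetical helper; the field names are taken from the response above.
    """
    status = json.loads(status_text)
    return [
        app["name"]
        for app in status.get("apps", [])
        if app.get("composite_status") == "undeploying"
    ]


if __name__ == "__main__":
    print(undeploying_apps(SAMPLE_STATUS))
```

Running the check repeatedly over several minutes would help tell a function that is still transitioning from one that is stuck, since a single response cannot distinguish the two.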
      

      The cluster is quite healthy. All buckets are online and UI works fine. I can even export the handler without trouble from UI after it's stuck in this state. See eventing.log

      Then at 2019-07-19T18:09:31.330 I did "sudo killall eventing-producer eventing-consumer". The processes restarted, but it is still stuck undeploying. See eventing2.log

      Out of desperation, I did "sudo killall -9 eventing-producer eventing-consumer" at 2019-07-19T18:15:58.577 and the processes restarted again (verified by observing that the PIDs of eventing-consumer and eventing-producer changed). It is still undeploying!

      At this point, I'm giving up and attaching [^collectinfo-2019-07-19T125046-ns_1@127.0.0.1.zip]

      Note that I did curl 'http://click:8096/api/v1/functions/test' and it shows:

      {
        "appcode": "function OnUpdate(doc, meta) {\n    var r = dst['idx'];\n    if (!r || !r.n) r = {'n': 0};\n    r.n++;\n    dst['idx'] = r;\n    var r = Math.floor(Math.random() * 6);\n    var m = {\"hello_m\": \"world_m\"};\n    var a = [\"hello_a\", \"world_a\"];\n    var i = (r.n) % 7;\n    log(\"Starting run:\", i);\n    switch (i) {\n        case 0: throw \"foo1\";\n        case 1: no_such_method();\n        case 2: bad_var++;\n        case 3: throw m;\n        case 4: throw a;\n        case 5: throw 123.45;\n        case 6: throw true;\n    }\n    log(\"Finished run:\", i);\n}",
        "depcfg": {
          "buckets": [
            {
              "alias": "dst",
              "bucket_name": "dst",
              "access": "rw"
            }
          ],
          "curl": [],
          "metadata_bucket": "meta",
          "source_bucket": "src"
        },
        "version": "evt-6.5.0-0000-ee",
        "function_id": 3048009367,
        "id": 0,
        "function_instance_id": "E5SjJ3",
        "appname": "test",
        "settings": {
          "dcp_stream_boundary": "everything",
          "deadline_timeout": 62,
          "deployment_status": false,
          "description": "",
          "execution_timeout": 60,
          "log_level": "INFO",
          "processing_status": false,
          "user_prefix": "eventing",
          "using_timer": false,
          "worker_count": 1
        },
        "using_timer": false,
        "src_mutation": false
      }
      

      Note: "deployment_status": false and "processing_status": false.

      But the status REST API (/api/v1/status) still says:

      {
       "apps": [
        {
         "composite_status": "undeploying",
         "name": "test",
         "num_bootstrapping_nodes": 0,
         "num_deployed_nodes": 1,
         "deployment_status": false,
         "processing_status": false
        }
       ],
       "num_eventing_nodes": 1
      }
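
The disagreement between the two responses can be expressed as a small consistency check. This is a hedged Python sketch rather than anything Eventing itself provides: `find_mismatch` is a hypothetical helper, and it assumes (the report does not confirm this) that a function whose settings have both flags set to false should no longer be reported as `"undeploying"` or have deployed nodes once it has settled.

```python
def find_mismatch(function_def, status):
    """Compare a function's stored settings with the live status response.

    Hypothetical check. Assumption: with deployment_status and
    processing_status both false, a settled function should not still be
    'undeploying' and should have zero deployed nodes.
    Returns a description of the mismatch, or None if none is found.
    """
    name = function_def["appname"]
    settings = function_def["settings"]
    wants_undeployed = (
        not settings["deployment_status"] and not settings["processing_status"]
    )
    for app in status.get("apps", []):
        if app["name"] != name:
            continue
        still_active = (
            app["composite_status"] == "undeploying"
            or app["num_deployed_nodes"] > 0
        )
        if wants_undeployed and still_active:
            return (
                f"{name}: settings say undeployed, but status is "
                f"{app['composite_status']!r} with "
                f"{app['num_deployed_nodes']} deployed node(s)"
            )
    return None


# Values reduced from the two responses quoted in this report.
function_def = {
    "appname": "test",
    "settings": {"deployment_status": False, "processing_status": False},
}
status = {
    "apps": [
        {
            "composite_status": "undeploying",
            "name": "test",
            "num_bootstrapping_nodes": 0,
            "num_deployed_nodes": 1,
            "deployment_status": False,
            "processing_status": False,
        }
    ],
    "num_eventing_nodes": 1,
}

if __name__ == "__main__":
    print(find_mismatch(function_def, status))
```

The check only flags a disagreement at one point in time; a brief `"undeploying"` state during a normal undeploy would trigger it too, so it is meaningful only when the result persists.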
      

      Attachments

        1. all-logs.tar.gz
          10.89 MB
        2. collectinfo-2019-07-19T125046-ns_1@127.0.0.1.zip
          18.25 MB
        3. eventing.log
          1.07 MB
        4. eventing2.log
          1.23 MB
        5. eventing3.log
          1.34 MB
        6. eventing-after112314ps1.log
          33.95 MB
        7. Screen Shot 2019-07-19 at 6.07.04 PM.png
          658 kB
        8. test.json
          1 kB

          Activity

            build-team Couchbase Build Team added a comment - Build couchbase-server-6.5.0-3841 contains eventing commit 66e2ccf with commit message: MB-35188: Do not write to metakvappsettings path at the end of deployment

            build-team Couchbase Build Team added a comment - Build couchbase-server-6.5.0-4060 contains eventing commit 230d3a6 with commit message: MB-35188 MB-35462: Add retries around all metakv writes
            lynn.straus Lynn Straus added a comment - Per offline review from Keshav and Jeelan, this is potentially deferrable.

            lisa.krueger Lisa Krueger (Inactive) added a comment - Will this be addressed again?
            jeelan.poola Jeelan Poola added a comment - Lisa Krueger There are no outstanding known issues that block undeploy/deploy/pause/resume as of 6.6.3/7.0.2 in eventing. This ticket is kept open more to track some clean up of internal metadata storage entities that are redundant. I updated description of this ticket to capture this point. We plan to take it up in future as per appropriate priority. Is there anything specific you are looking for?

            People

              srinivasan.raman Srinivasan Raman
              siri Sriram Melkote (Inactive)