Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-47851

Bucket deletion hangs indefinitely if warmup is cancelled before VBucketMap is populated with outstanding requests

    XMLWordPrintable

Details

    • Triaged
    • 1
    • No
    • KV-Engine Sprint 2021 August

    Description

      Summary

      If warmup is cancelled before it gets past the "populateVBucketMap" phase, and there are outstanding client requests which are blocked until populateVBucketMap completes, then KV-Engine never notifies the waiting client connections.

      As such, any subsequent deletion of the affected bucket is blocked forever, as the KV-Engine frontend is waiting in vain for those connections to finish their current operation.

      Details

      This kind of scenario can occur during delta-node recovery; if ns_server doesn't read vBucket states for all warming up buckets within the prepare_delta_recovery_bucket timeout value - default 180s. If that timeout is reached, it cancels the delta recovery and deletes the Buckets on the affected node. If that "DELETE_BUCKET" command is received before ep-engine has finished the populateVBucketMap phase of warmup then the waiting GET_VBSTATE command from ns_server is stuck forever and delete cannot complete.

      This looks like the following in the logs:

      2021-08-01T00:01:44.901492-04:00 INFO 4 Delete bucket [Customers]. Notifying engine
      2021-08-01T00:01:44.901662-04:00 INFO 4 Delete bucket [Customers]. Engine ready for shutdown
      2021-08-01T00:01:44.901669-04:00 INFO 4 Delete bucket [Customers]. Wait for 4 clients to disconnect
      2021-08-01T00:02:14.911226-04:00 INFO 4 Delete bucket [Customers]. Still waiting: 1 clients connected: ...
      2021-08-01T00:02:44.917036-04:00 INFO 4 Delete bucket [Customers]. Still waiting: 1 clients connected (state is unchanged).
      2021-08-01T00:03:14.923323-04:00 INFO 4 Delete bucket [Customers]. Still waiting: 1 clients connected (state is unchanged).
      

      And the waiting connection is typically an ns_server STATS(vbucket} request:

      2021-08-01T00:02:14.911226-04:00 INFO 4 Delete bucket [Customers]. Still waiting: 1 clients connected: {"420":{"agent_name":"regular","bucket_index":1,"clustermap_revno":"unknown","connection":"0x00006fe0ae035580","connection_id":"127.0.0.1:43663","cookies":[{"aiostat":"would block","connection":"[ 127.0.0.1:43663 - 127.0.0.1:11209 (<ud>@ns_server</ud>) ]","engine_storage":"0x0000000000000000","ewouldblock":true,"packet":{"bodylen":7,"cas":0,"datatype":"raw","extlen":0,"key":"<ud>vbucket</ud>","keylen":7,"magic":"ClientRequest","opaque":0,"opcode":"STAT","vbucket":0},"refcount":0}],"datatype":"json","dcp":false,"dcp_no_value":false,"dcp_xattr_aware":false,"features":["tcp nodelay"],"internal":true,"iov":{"size":10,"used":0},"itemlist":{"size":0},"libevent":{"ev_flags":{"decoded":["read","persist"],"raw":"0x0012"},"registered":false,"which":{"decoded":["read"],"raw":"0x0002"}},"max_reqs_per_event":20,"max_sched_time":"31252","min_sched_time":"887","msglist":{"bytes":0,"curr":0,"used":1},"nevents":20,"parent_port":11209,"peername":"127.0.0.1:43663","priority":"Medium","protocol":"memcached","read":{"buffer":123256223688704,"empty":false,"read_head":0,"size":2048,"write_head":31},"refcount":1,"sasl_enabled":true,"socket":420,"sockname":"127.0.0.1:11209","ssl":{"enabled":false},"state":"execute","temp_alloc_list":{"size":0},"thread":47,"total_cpu_time":"3010287","total_recv":262,"total_send":28610,"username":"@ns_server","write_and_go":"new_cmd","yields":0}}
      

      Attachments

        Issue Links

          No reviews matched the request. Check your Options in the drop-down menu of this sections header.

          Activity

            People

              drigby Dave Rigby (Inactive)
              drigby Dave Rigby (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Gerrit Reviews

                  PagerDuty