Couchbase Server / MB-48059

Delete bucket times out waiting for no nodes


Details

    • Untriaged
    • 1
    • Unknown

    Description

      I noticed this during testing where I was creating and deleting buckets in rapid succession. When a bucket is deleted, ns_server waits until the bucket is considered "not active" on the KV nodes in the cluster. In this case, ns_server waited even though there were no nodes to wait on, and the delete bucket request then timed out. Here's the key log trace:

      [ns_server:warn,2021-08-19T10:49:23.656-07:00,n_0@127.0.0.1:<0.1628.0>:ns_orchestrator:idle:640]Nodes [] failed to delete bucket "b_0" within expected time.
      

      Note that the list of nodes that didn't respond in time is empty. The issue appears to be that ns_server enters the wait even when there are no nodes to wait on. This change appears to address the issue and may be a useful starting point: http://review.couchbase.org/c/ns_server/+/159720
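      A minimal sketch of the kind of guard described above. It is written in Python rather than ns_server's Erlang, and it assumes a hypothetical reply queue to which each node posts its name once the bucket is no longer active there. `wait_for_deletion` is an illustrative name, not ns_server's API, and this is not a claim about what the linked change does. The point is the loop condition: testing the pending set before blocking means an empty node list returns at once instead of sleeping until the timeout.

```python
import queue
import time
from typing import Iterable, List


def wait_for_deletion(nodes: Iterable[str], replies: queue.Queue,
                      timeout_s: float = 30.0) -> List[str]:
    """Hypothetical helper: return the nodes that did NOT confirm in time.

    An empty result means every node confirmed (or there was nothing to wait on).
    """
    pending = set(nodes)
    deadline = time.monotonic() + timeout_s
    # Check the exit condition *before* blocking. With no pending nodes the
    # loop body never runs, so we never wait for a reply that cannot arrive.
    while pending:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            node = replies.get(timeout=remaining)
        except queue.Empty:
            break  # timed out waiting for the remaining nodes
        pending.discard(node)  # replies from unexpected nodes are ignored
    return sorted(pending)
```

      A caller would treat an empty list as success and log only a non-empty list as "failed to delete bucket", which avoids a warning like "Nodes [] failed ..." above.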

      For completeness, here's where the bucket was created:

      [ns_server:info,2021-08-19T10:48:53.632-07:00,n_0@127.0.0.1:ns_memcached-b_0<0.4374.0>:ns_memcached:do_ensure_bucket:1303]Created bucket "b_0" with config string "max_size=104857600;dbname=/Users/dfinlay/work8/ns_server/data/n_0/data/b_0;backend=couchdb;couch_bucket=b_0;max_vbuckets=64;alog_path=/Users/dfinlay/work8/ns_server/data/n_0/data/b_0/access.log;data_traffic_enabled=false;max_num_workers=3;uuid=910c2397af610d21ac57e5a6ce842154;conflict_resolution_type=seqno;bucket_type=persistent;durability_min_level=none;pitr_enabled=false;pitr_granularity=600;pitr_max_history_age=86400;magma_fragmentation_percentage=50;item_eviction_policy=value_only;persistent_metadata_purge_age=259200;max_ttl=0;ht_locks=47;compression_mode=passive;failpartialwarmup=false"
      

      The bucket is then deleted immediately. There's no log message for the deletion itself, but it is visible in the following traces:

      [ns_server:debug,2021-08-19T10:48:53.652-07:00,n_0@127.0.0.1:ns_janitor_server<0.1625.0>:ns_janitor_server:handle_call:101]Deleted bucket "b_0" from janitor_requests
      ...
      [ns_server:debug,2021-08-19T10:48:53.653-07:00,n_0@127.0.0.1:ns_bucket_worker<0.617.0>:ns_bucket_worker:stop_one_bucket:108]Stopping child for dead bucket: "b_0"
      ...
      [ns_server:debug,2021-08-19T10:48:53.653-07:00,n_0@127.0.0.1:chronicle_kv_log<0.393.0>:chronicle_kv_log:log:61]update (key: bucket_names, rev: {<<"f9584bdff866f34dc3dcce65b25cdd6a">>,25659})
      ["b_3502","default","travel-sample"]
      

      Then, 30 s later, the request times out, even though there aren't any nodes that need to be waited on.
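      A small check of the 30 s figure, using the timestamps of the "Stopping child for dead bucket" trace and the timeout warning quoted above. The regex assumes the timestamp format seen in these particular ns_server log lines; `log_timestamp` is an illustrative helper, not part of any Couchbase tooling.

```python
import re
from datetime import datetime

# Matches the ISO 8601 timestamp (millisecond precision, UTC offset) seen in
# the ns_server log lines quoted in this report.
TS_RE = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}[+-]\d{2}:\d{2}")


def log_timestamp(line: str) -> datetime:
    """Extract and parse the first timestamp in a log line (hypothetical helper)."""
    match = TS_RE.search(line)
    if match is None:
        raise ValueError(f"no timestamp in: {line!r}")
    return datetime.fromisoformat(match.group(0))


stop_line = ('[ns_server:debug,2021-08-19T10:48:53.653-07:00,n_0@127.0.0.1:'
             'ns_bucket_worker<0.617.0>:ns_bucket_worker:stop_one_bucket:108]'
             'Stopping child for dead bucket: "b_0"')
timeout_line = ('[ns_server:warn,2021-08-19T10:49:23.656-07:00,n_0@127.0.0.1:'
                '<0.1628.0>:ns_orchestrator:idle:640]'
                'Nodes [] failed to delete bucket "b_0" within expected time.')

elapsed = (log_timestamp(timeout_line) - log_timestamp(stop_line)).total_seconds()
print(f"{elapsed:.3f} s")  # 30.003 s
```

      The gap of roughly 30 s between the bucket being stopped and the warning is consistent with the delete request running to its full timeout rather than failing on any particular node.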

      Attachments

        1. n_0.zip
          10.15 MB
        2. n_1.zip
          6.58 MB
        3. n_2.zip
          9.71 MB



            People

              Artem Stemkovski
              Dave Finlay
              Votes: 0
              Watchers: 5

