
MB-38456: [System Test] All buckets got deleted from the cluster


    Details

      Description

      Build: 6.5.1-6271
      Test: -test tests/integration/test_allFeatures_madhatter_durability.yml -scope tests/integration/scope_Xattrs_Madhatter.yml
      Scale: 3
      Day: 2 days

      After bucket compaction completed at 2020-03-26T03:09:51, all buckets in the cluster were deleted, starting at 2020-03-26T03:10:04. The test does not delete the buckets, and the exact reason for the bucket deletion could not be determined.

      The following can be seen in the debug logs for 172.23.108.103:

      [ns_server:debug,2020-03-26T03:10:05.126-07:00,ns_1@172.23.108.103:<0.32096.0>:ns_pubsub:do_subscribe_link_continue:152]Parent process of subscription {ns_stats_event,<0.32095.0>} exited with reason shutdown
      [ns_server:debug,2020-03-26T03:10:05.127-07:00,ns_1@172.23.108.103:<0.32094.0>:ns_pubsub:do_subscribe_link_continue:152]Parent process of subscription {ns_tick_event,<0.32093.0>} exited with reason shutdown
      [ns_server:debug,2020-03-26T03:10:05.260-07:00,ns_1@172.23.108.103:terse_bucket_info_uploader-CUSTOMER<0.32077.0>:terse_bucket_info_uploader:refresh_cluster_config:93]Bucket CUSTOMER is dead
      [ns_server:error,2020-03-26T03:10:05.280-07:00,ns_1@172.23.108.103:<0.12631.595>:dcp_proxy:handle_info:117]Socket #Port<0.634174> was closed. Closing myself. State = {state,
                                                                  #Port<0.634174>,
                                                                  {producer,
                                                                   "replication:ns_1@172.23.104.61->ns_1@172.23.108.103:CUSTOMER",
                                                                   'ns_1@172.23.104.61',
                                                                   "CUSTOMER"},
                                                                  undefined,<<>>,
                                                                  dcp_producer_conn,
                                                                  [],
                                                                  #Port<0.634173>,
                                                                  <0.20598.596>,
                                                                  false}
      [ns_server:debug,2020-03-26T03:10:05.281-07:00,ns_1@172.23.108.103:dcp_traffic_monitor<0.22105.728>:dcp_traffic_monitor:handle_info:82]Deleting Node:'ns_1@172.23.104.61' Bucket:"CUSTOMER" Pid:<0.12631.595>
      [error_logger:error,2020-03-26T03:10:05.281-07:00,ns_1@172.23.108.103:error_logger<0.32.0>:ale_error_logger_handler:do_log:203]** Generic server <0.12631.595> terminating
      ** Last message in was {tcp_closed,#Port<0.634174>}
      ** When Server state == {state,#Port<0.634174>,
                                  {producer,
                                      "replication:ns_1@172.23.104.61->ns_1@172.23.108.103:CUSTOMER",
                                      'ns_1@172.23.104.61',"CUSTOMER"},
                                  undefined,<<>>,dcp_producer_conn,[],
                                  #Port<0.634173>,<0.20598.596>,false}
      ** Reason for termination ==
      ** socket_closed
       
      [error_logger:error,2020-03-26T03:10:05.282-07:00,ns_1@172.23.108.103:error_logger<0.32.0>:ale_error_logger_handler:do_log:203]
      =========================CRASH REPORT=========================
        crasher:
          initial call: dcp_proxy:init/1
          pid: <0.12631.595>
          registered_name: []
          exception exit: socket_closed
            in function  gen_server:handle_common_reply/8 (gen_server.erl, line 726)
          ancestors: ['dcp_replicator-CUSTOMER-ns_1@172.23.104.61',
                        'dcp_sup-CUSTOMER','single_bucket_kv_sup-CUSTOMER',
                        ns_bucket_sup,ns_bucket_worker_sup,ns_server_sup,
                        ns_server_nodes_sup,<0.205.0>,ns_server_cluster_sup,
                        root_sup,<0.117.0>]
          message_queue_len: 0
          messages: []
          links: [<0.19323.596>]
          dictionary: []
          trap_exit: false
          status: running
          heap_size: 1598
          stack_size: 27
          reductions: 36617911
      neighbours:
       
      [ns_server:debug,2020-03-26T03:10:05.282-07:00,ns_1@172.23.108.103:dcp_traffic_monitor<0.22105.728>:dcp_traffic_monitor:handle_info:82]Deleting Node:'ns_1@172.23.108.103' Bucket:"CUSTOMER" Pid:<0.20598.596>
      [error_logger:error,2020-03-26T03:10:05.282-07:00,ns_1@172.23.108.103:error_logger<0.32.0>:ale_error_logger_handler:do_log:203]** Generic server 'dcp_replicator-CUSTOMER-ns_1@172.23.104.61' terminating
      ** Last message in was {'EXIT',<0.12631.595>,socket_closed}
      ** When Server state == {state,[{<0.20598.596>,#Port<0.634173>},
                                      {<0.12631.595>,#Port<0.634174>}],
                                     <0.20598.596>,
                                     "replication:ns_1@172.23.104.61->ns_1@172.23.108.103:CUSTOMER",
                                     'ns_1@172.23.104.61',"CUSTOMER"}
      ** Reason for termination ==
      ** socket_closed
       
      [error_logger:error,2020-03-26T03:10:05.283-07:00,ns_1@172.23.108.103:error_logger<0.32.0>:ale_error_logger_handler:do_log:203]
      =========================CRASH REPORT=========================
        crasher:
          initial call: dcp_replicator:init/1
          pid: <0.19323.596>
          registered_name: 'dcp_replicator-CUSTOMER-ns_1@172.23.104.61'
          exception exit: socket_closed
            in function  gen_server:handle_common_reply/8 (gen_server.erl, line 726)
          ancestors: ['dcp_sup-CUSTOMER','single_bucket_kv_sup-CUSTOMER',
                        ns_bucket_sup,ns_bucket_worker_sup,ns_server_sup,
                        ns_server_nodes_sup,<0.205.0>,ns_server_cluster_sup,
                        root_sup,<0.117.0>]
          message_queue_len: 1
          messages: [{'EXIT',<0.20598.596>,killed}]
          links: [<0.32079.0>]
          dictionary: []
          trap_exit: true
          status: running
          heap_size: 2586
          stack_size: 27
          reductions: 1201008
      
      

      The test and environment have been unchanged since 6.5.0, so this could be a regression.


          Activity

          ritam.sharma Ritam Sharma added a comment -

          Last good build that ran for 5 days: 6.5.1-6196.
          Starting a new run with build 6.5.1-6284.
          Daniel Owen - marking this as a test blocker for now, since system testing is blocked. A new run has been started to rule out any environment issues, while investigation can proceed in parallel.

          owend Daniel Owen added a comment -

          Looking at the master_events.log on 172.23.104.157:

          {"type":"deleteBucket","ts":1585217404.634882,"bucket":"CUSTOMER"}
          {"type":"deleteBucket","ts":1585217408.537056,"bucket":"DISTRICT"}
          {"type":"deleteBucket","ts":1585217424.297954,"bucket":"HISTORY"}
          {"type":"deleteBucket","ts":1585217429.13534,"bucket":"ITEM"}
          {"type":"deleteBucket","ts":1585217433.234421,"bucket":"NEW_ORDER"}
          {"type":"deleteBucket","ts":1585217437.880829,"bucket":"ORDERS"}
          {"type":"deleteBucket","ts":1585217441.330714,"bucket":"ORDER_LINE"}
          {"type":"deleteBucket","ts":1585217446.008308,"bucket":"STOCK"}
          {"type":"deleteBucket","ts":1585217449.084517,"bucket":"WAREHOUSE"}
          {"type":"deleteBucket","ts":1585217451.226919,"bucket":"default"}
          {"type":"deleteBucket","ts":1585263053.31393,"bucket":"test"}
          

          The first epoch timestamp (1585217404.634882) is Thursday, 26 March 2020 10:10:04.634 GMT, which corresponds to 03:10:04 local time (the logs are at GMT-7).
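          That conversion can be sanity-checked with a few lines of Python (a minimal sketch using only the standard library; the sample line is copied from the master_events.log excerpt above):

          import json
          from datetime import datetime, timezone, timedelta

          # Sample entry from master_events.log; "ts" is epoch seconds.
          line = '{"type":"deleteBucket","ts":1585217404.634882,"bucket":"CUSTOMER"}'
          event = json.loads(line)

          # The node logs are written at GMT-7, so convert into that offset.
          local = datetime.fromtimestamp(event["ts"], tz=timezone(timedelta(hours=-7)))
          print(event["bucket"], local.isoformat())
          # -> CUSTOMER 2020-03-26T03:10:04.634882-07:00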

          Looking at ns_server.info.log on 172.23.97.119:

          [user:info,2020-03-26T03:10:05.106-07:00,ns_1@172.23.97.119:ns_memcached-CUSTOMER<0.4669.868>:ns_memcached:do_terminate:750]Shutting down bucket "CUSTOMER" on 'ns_1@172.23.97.119' for deletion
          [user:info,2020-03-26T03:10:08.807-07:00,ns_1@172.23.97.119:ns_memcached-DISTRICT<0.23325.867>:ns_memcached:do_terminate:750]Shutting down bucket "DISTRICT" on 'ns_1@172.23.97.119' for deletion
          [user:info,2020-03-26T03:10:24.603-07:00,ns_1@172.23.97.119:ns_memcached-HISTORY<0.8971.867>:ns_memcached:do_terminate:750]Shutting down bucket "HISTORY" on 'ns_1@172.23.97.119' for deletion
          [user:info,2020-03-26T03:10:29.586-07:00,ns_1@172.23.97.119:ns_memcached-ITEM<0.26790.866>:ns_memcached:do_terminate:750]Shutting down bucket "ITEM" on 'ns_1@172.23.97.119' for deletion
          [user:info,2020-03-26T03:10:33.845-07:00,ns_1@172.23.97.119:ns_memcached-NEW_ORDER<0.12659.866>:ns_memcached:do_terminate:750]Shutting down bucket "NEW_ORDER" on 'ns_1@172.23.97.119' for deletion
          [user:info,2020-03-26T03:10:38.405-07:00,ns_1@172.23.97.119:ns_memcached-ORDERS<0.13.866>:ns_memcached:do_terminate:750]Shutting down bucket "ORDERS" on 'ns_1@172.23.97.119' for deletion
          [user:info,2020-03-26T03:10:42.105-07:00,ns_1@172.23.97.119:ns_memcached-ORDER_LINE<0.18526.865>:ns_memcached:do_terminate:750]Shutting down bucket "ORDER_LINE" on 'ns_1@172.23.97.119' for deletion
          [user:info,2020-03-26T03:10:46.175-07:00,ns_1@172.23.97.119:ns_memcached-STOCK<0.5557.865>:ns_memcached:do_terminate:750]Shutting down bucket "STOCK" on 'ns_1@172.23.97.119' for deletion
          [user:info,2020-03-26T03:10:49.383-07:00,ns_1@172.23.97.119:ns_memcached-WAREHOUSE<0.24365.864>:ns_memcached:do_terminate:750]Shutting down bucket "WAREHOUSE" on 'ns_1@172.23.97.119' for deletion
          [user:info,2020-03-26T03:10:51.460-07:00,ns_1@172.23.97.119:ns_memcached-default<0.21663.868>:ns_memcached:do_terminate:750]Shutting down bucket "default" on 'ns_1@172.23.97.119' for deletion
          [user:info,2020-03-26T15:50:53.428-07:00,ns_1@172.23.97.119:ns_memcached-test<0.22991.1050>:ns_memcached:do_terminate:750]Shutting down bucket "test" on 'ns_1@172.23.97.119' for deletion
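          If useful, those shutdown events can be extracted from the log and compared against the master_events deleteBucket timestamps with a few lines of Python; a rough sketch only (the log path is assumed, and the line format is taken from the excerpts above):

          import re

          # Matches the 'Shutting down bucket ... for deletion' lines shown above,
          # capturing the log timestamp and the bucket name.
          PATTERN = re.compile(
              r'\[user:info,(?P<ts>[^,]+)'
              r'[^\]]*\]Shutting down bucket "(?P<bucket>[^"]+)" .* for deletion'
          )

          with open("ns_server.info.log") as log:  # path assumed
              for entry in log:
                  m = PATTERN.search(entry)
                  if m:
                      print(m.group("ts"), m.group("bucket"))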
          

          Also worth noting: buckets were deleted on that node at ~1:20am as well.

          [user:info,2020-03-26T01:20:59.429-07:00,ns_1@172.23.97.119:ns_memcached-WAREHOUSE<0.32247.0>:ns_memcached:do_terminate:750]Shutting down bucket "WAREHOUSE" on 'ns_1@172.23.97.119' for deletion
          [user:info,2020-03-26T01:21:00.832-07:00,ns_1@172.23.97.119:ns_memcached-STOCK<0.30589.0>:ns_memcached:do_terminate:750]Shutting down bucket "STOCK" on 'ns_1@172.23.97.119' for deletion
          [user:info,2020-03-26T01:21:03.455-07:00,ns_1@172.23.97.119:ns_memcached-ORDER_LINE<0.29059.0>:ns_memcached:do_terminate:750]Shutting down bucket "ORDER_LINE" on 'ns_1@172.23.97.119' for deletion
          [user:info,2020-03-26T01:21:05.162-07:00,ns_1@172.23.97.119:ns_memcached-ORDERS<0.27926.0>:ns_memcached:do_terminate:750]Shutting down bucket "ORDERS" on 'ns_1@172.23.97.119' for deletion
          [user:info,2020-03-26T01:21:07.750-07:00,ns_1@172.23.97.119:ns_memcached-NEW_ORDER<0.26777.0>:ns_memcached:do_terminate:750]Shutting down bucket "NEW_ORDER" on 'ns_1@172.23.97.119' for deletion
          [user:info,2020-03-26T01:21:10.259-07:00,ns_1@172.23.97.119:ns_memcached-ITEM<0.25790.0>:ns_memcached:do_terminate:750]Shutting down bucket "ITEM" on 'ns_1@172.23.97.119' for deletion
          [user:info,2020-03-26T01:21:11.924-07:00,ns_1@172.23.97.119:ns_memcached-HISTORY<0.24630.0>:ns_memcached:do_terminate:750]Shutting down bucket "HISTORY" on 'ns_1@172.23.97.119' for deletion
          [user:info,2020-03-26T01:21:13.865-07:00,ns_1@172.23.97.119:ns_memcached-DISTRICT<0.23580.0>:ns_memcached:do_terminate:750]Shutting down bucket "DISTRICT" on 'ns_1@172.23.97.119' for deletion
          [user:info,2020-03-26T01:21:15.520-07:00,ns_1@172.23.97.119:ns_memcached-CUSTOMER<0.22560.0>:ns_memcached:do_terminate:750]Shutting down bucket "CUSTOMER" on 'ns_1@172.23.97.119' for deletion
          [user:info,2020-03-26T01:21:17.487-07:00,ns_1@172.23.97.119:ns_memcached-default<0.21500.0>:ns_memcached:do_terminate:750]Shutting down bucket "default" on 'ns_1@172.23.97.119' for deletion
          

          So, in conclusion, KV is doing what it is told; hence I am removing couchbase-bucket from the affected components.
          It looks like the test or another user is deleting the buckets.

          mihir.kamdar Mihir Kamdar added a comment -

          We are doing another run to see whether this issue appears again. Will confirm in a couple of days.

          raju Raju Suravarjjala added a comment -

          Thanks, Mihir Kamdar and Ritam Sharma. If possible, please update with the latest results.

          mihir.kamdar Mihir Kamdar added a comment -

          This issue has not been seen in the re-run on the same environment with build 6.5.1-6284 after 2d 15h. We will close this bug once the test completes 3 days without the issue.

          ritam.sharma Ritam Sharma added a comment -

          The re-run has shown no issues after 3 days. Closing the defect as "User Error"; will reopen if the issue is seen again.


            People

            Assignee:
            ritam.sharma Ritam Sharma
            Reporter:
            mihir.kamdar Mihir Kamdar
             Votes:
             0
             Watchers:
             5


                Gerrit Reviews

                There are no open Gerrit changes
