  1. Couchbase Server
  2. MB-58120

Intermittent test failure in BucketMigrationTest.migrate_storage_mode_via_failover_test

Details

    Description

      https://cv.jenkins.couchbase.com/job/ns-server-cluster-tests/3285

      The test:

      1. Creates a couchstore bucket named bucket-2
      2. Starts migration to magma
      3. For each node in the cluster it:
         a. Fails over the node
         b. Recovers it
         c. Tests that the upgrade was successful
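
      The flow of the test can be sketched as follows. This is a minimal, self-contained illustration, not the actual test code: FakeCluster and all of its methods are hypothetical stand-ins, and the sketch assumes that recovering a node is what moves that node's copy of the bucket onto the new storage backend.

```python
from dataclasses import dataclass, field


@dataclass
class FakeCluster:
    """Hypothetical in-memory stand-in for a cluster; not the real test API."""
    nodes: list
    # Storage backend each node currently uses for each bucket.
    node_backend: dict = field(default_factory=dict)
    # Backend each bucket is configured to end up on.
    target_backend: dict = field(default_factory=dict)

    def create_bucket(self, name, backend):
        self.target_backend[name] = backend
        for node in self.nodes:
            self.node_backend[(node, name)] = backend

    def start_migration(self, name, backend):
        # Only the bucket configuration changes here; nodes keep their old
        # backend until they are failed over and recovered (assumption).
        self.target_backend[name] = backend

    def failover(self, node):
        # A real cluster would take the node out of service here.
        pass

    def recover(self, node):
        # Assumption: recovery rebuilds the node's buckets on the target backend.
        for (n, bucket) in list(self.node_backend):
            if n == node:
                self.node_backend[(n, bucket)] = self.target_backend[bucket]


def migrate_via_failover(cluster, bucket="bucket-2"):
    cluster.create_bucket(bucket, "couchstore")   # step 1
    cluster.start_migration(bucket, "magma")      # step 2
    for node in cluster.nodes:                    # step 3
        cluster.failover(node)
        cluster.recover(node)
        assert cluster.node_backend[(node, bucket)] == "magma"
    return cluster
```

      In this sketch the per-node assertion corresponds to the "upgrade was successful" check; the failure reported in this ticket happened one step earlier, during a recovery.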

      One of the recoveries failed, which was caught as a rebalance failure. The rebalance failed while waiting for a seqno in memcached because the socket was closed:

      [ns_server:info,2023-08-02T17:37:47.809Z,n_16@127.0.0.1:janitor_agent-bucket-2<0.3208.0>:janitor_agent:handle_info:793]Rebalancer <34410.3966.0> died with reason {unexpected_exit,
                                                  {'EXIT',<34410.4241.0>,
                                                   {{{{{badmatch,
                                                        [{<34410.4251.0>,
                                                          {done,exit,
                                                           {socket_closed,
                                                            {gen_server,call,
                                                             [<34410.4056.0>,
                                                              {takeover,31},
                                                              infinity]}},
                                                           [{gen_server,call,3,
                                                             [{file,
                                                               "gen_server.erl"},
                                                              {line,385}]},
                                                            {dcp_replicator,
                                                             '-spawn_and_wait/1-fun-0-',
                                                             1,
                                                             [{file,
                                                               "src/dcp_replicator.erl"},
                                                              {line,336}]}]}}]}, 

      The memcached logs on node n_15 show that we hit a GSL precondition failure, which caused memcached to close the connection. The assertion is in the collections code, so it seems unlikely to be related to storage migration.

      2023-08-02T17:37:47.039262+00:00 ERROR 159: Exception occurred during packet execution. Closing connection [ {"ip":"127.0.0.1","port":52354} - {"ip":"127.0.0.1","port":11939} (System, <ud>@ns_server</ud>) ]: GSL: Precondition failure: '!changes.scopesToModify.empty() || changes.changeScopeWithDataLimitExists' at /home/couchbase/jenkins/workspace/ns-server-cluster-tests/kv_engine/engines/ep/src/collections/vbucket_manifest.cc:317. Cookies: [{"aiostat":"success","ewouldblock":false,"packet":{"bodylen":1,"cas":0,"datatype":"raw","extlen":1,"extras":{"state":1},"keylen":0,"magic":"ClientRequest","opaque":23,"opcode":"DCP_SET_VBUCKET_STATE","vbucket":31},"refcount":1,"started":"53104068593626967 (476 us ago)","throttled":false}] 
      2023-08-02T17:37:47.039305+00:00 INFO 159: (No Engine) DCP (Consumer) eq_dcpq:replication:n_16@127.0.0.1->n_15@127.0.0.1:bucket-2 - Removing connection [ {"ip":"127.0.0.1","port":52354} - {"ip":"127.0.0.1","port":11939} (System, <ud>@ns_server</ud>) ]
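
      When triaging lines like the ERROR above, it can help to pull out the failed condition, the source location and the packet that was in flight. The following is a minimal sketch, assuming the line layout seen in this one log entry; parse_gsl_failure is a hypothetical helper, not part of any Couchbase tooling, and SAMPLE is the tail of the ERROR line copied from above.

```python
import json
import re

# Matches the quoted condition, then "at <file>:<line>." as seen in the log line.
GSL_RE = re.compile(
    r"GSL: Precondition failure: '(?P<cond>.*?)' at (?P<file>\S+):(?P<line>\d+)\."
)

SAMPLE = (
    "GSL: Precondition failure: "
    "'!changes.scopesToModify.empty() || changes.changeScopeWithDataLimitExists' "
    "at /home/couchbase/jenkins/workspace/ns-server-cluster-tests/kv_engine/"
    "engines/ep/src/collections/vbucket_manifest.cc:317. Cookies: "
    '[{"aiostat":"success","ewouldblock":false,"packet":{"bodylen":1,"cas":0,'
    '"datatype":"raw","extlen":1,"extras":{"state":1},"keylen":0,'
    '"magic":"ClientRequest","opaque":23,"opcode":"DCP_SET_VBUCKET_STATE",'
    '"vbucket":31},"refcount":1,"started":"53104068593626967 (476 us ago)",'
    '"throttled":false}] '
)


def parse_gsl_failure(log_line):
    """Return the failed condition, source location and in-flight packets, or None."""
    match = GSL_RE.search(log_line)
    if match is None:
        return None
    cookies = []
    marker = "Cookies: "
    idx = log_line.find(marker)
    if idx != -1:
        # raw_decode parses the first JSON value and ignores any trailing text.
        cookies, _ = json.JSONDecoder().raw_decode(log_line[idx + len(marker):])
    return {
        "condition": match.group("cond"),
        "file": match.group("file"),
        "line": int(match.group("line")),
        "packets": [cookie["packet"] for cookie in cookies],
    }


info = parse_gsl_failure(SAMPLE)
print(info["file"], info["line"])
print(info["packets"][0]["opcode"], info["packets"][0]["vbucket"])
```

      For this entry the packet is a DCP_SET_VBUCKET_STATE for vbucket 31, the same vBucket as the {takeover,31} call in the ns_server trace above.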

      CC Abhijeeth Nuthan: this was caught in one of our cluster tests, but it looks like it might be a memcached issue.

            People

              owend Daniel Owen
              ben.huddleston Ben Huddleston