Couchbase Server / MB-34849

Not seeing any error messages in memcached logs after rebalance fails with mover crashed error.


Details

    • Untriaged
    • Yes

    Description

      When the rebalance fails with a mover crash error, we don't see any related error messages in the memcached logs. Also, the memcached log ends abruptly.

      This was found while analysing rebalance failures in Jepsen tests. The test does the following:

      1. Set up a 6-node cluster.
      2. Load 30 documents and keep updating them continuously, with the durability level set to replicate_to_majority.
      3. Remove a node and start a rebalance.
      4. The rebalance fails with a mover crash error.

      The memcached log file ends abruptly, as shown by its last lines:

      2019-07-03T01:10:27.138141-07:00 WARNING 50: (default) DCP (Consumer) eq_dcpq:replication:ns_1@172.23.104.255->ns_1@172.23.105.3:default - (vb:71) Setting stream to dead state, last_seqno is 0, unAckedBytes is 0, status is The stream closed early because the conn was disconnected
      2019-07-03T01:10:27.138147-07:00 WARNING 50: (default) DCP (Cons
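
      As a rough aid for confirming this symptom on other nodes, the check below flags a log whose final line was cut off mid-write. It is a minimal sketch, not part of the test suite: the function name log_tail_status is hypothetical, and it assumes that a cleanly flushed memcached log always ends with a newline, which this report does not confirm.

```python
def log_tail_status(text):
    """Return (is_truncated, last_line) for the full contents of a log file.

    Assumption: a cleanly written log ends with a newline, so a missing
    trailing newline is treated as a sign of an interrupted write.
    """
    if not text:
        return False, ""
    lines = text.splitlines()
    last_line = lines[-1]
    is_truncated = not text.endswith("\n")
    return is_truncated, last_line


if __name__ == "__main__":
    import sys

    # Usage: python check_tail.py /path/to/memcached.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        truncated, last = log_tail_status(fh.read())
    print("truncated:", truncated)
    print("last line:", last)
```

      Run against the excerpt above, the final line ending in "DCP (Cons" with no newline would be reported as truncated.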
      

      Attaching the logs we collected from the tests. CB version: 6.5.0-3644

      Another interesting entry in the memcached log is:

      2019-07-03T01:10:26.610863-07:00 ERROR (default) VBucket::abort (vb:439) failed as HashTable value is not CommittedState::Pending - <ud> SV @0x7f2a39d5f810 ..J ..R.Cp temp:    seq:4 rev:1 cas:1562141406129225728 key:"cid:0x0:jepsen0022, size:b" exp:0 age:2 nru:0 fc:4 vallen:1 val age:2 :"8"</ud>
      2019-07-03T01:10:26.610874-07:00 WARNING 54: (default) DCP (Consumer) eq_dcpq:replication:ns_1@172.23.105.2->ns_1@172.23.105.3:default - PassiveStream::processAbort: vb:439 Got error 'invalid arguments' while trying to process abort
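
      To find similar abort failures across the collected logs, a small filter like the following may help. It is a sketch under stated assumptions: the line layout (timestamp, severity, then message) is inferred only from the two entries above, and the names abort_failures, ENTRY and VB are hypothetical.

```python
import re

# Assumed layout, inferred from the excerpts in this report:
# "<timestamp> <SEVERITY> <rest of message>"
ENTRY = re.compile(r"^(?P<ts>\S+) (?P<level>[A-Z]+) (?P<rest>.*)$")
VB = re.compile(r"vb:(\d+)")


def abort_failures(lines):
    """Yield (timestamp, level, vbucket_id) for lines mentioning a failed abort."""
    for line in lines:
        match = ENTRY.match(line.strip())
        if not match:
            continue  # e.g. a truncated or continuation line
        rest = match.group("rest")
        if "VBucket::abort" not in rest and "processAbort" not in rest:
            continue
        vb = VB.search(rest)
        vbucket_id = int(vb.group(1)) if vb else None
        yield match.group("ts"), match.group("level"), vbucket_id
```

      Passing the lines of a memcached log to abort_failures and grouping the results by vBucket id gives a quick view of which vBuckets hit the abort error before the rebalance failed.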
      

      To run the test again, start a job at http://qa.sc.couchbase.com/job/jepsen-durability-trigger/ with the parameters cb_version=6.5.0 and cb_build=<latest build>. Wait about 10 minutes for the sanity job to finish, then trigger another job from http://qa.sc.couchbase.com/job/jepsen-durability-rebalance-daily with the same parameters.

       


            People

              richard.demellow Richard deMellow
              bharath.gp Bharath G P