  1. Couchbase Server
  2. MB-4030

enable traffic for ready nodes even if not all nodes are up/healthy/ready (aka partial janitor) (was: After two nodes crashed, curr_items remained 0 after warmup for an extended period of time)


Details

    • 02/Sep/2013 - 20/Sep/2013

    Description

      We had two nodes crash at a customer site. A disk space issue may have been involved, but I don't think that was the cause.

      After they crashed, the nodes warmed up relatively quickly, but immediately "discarded" their items. I say that because I could see that they had warmed up ~10m items, yet the current item count (curr_items) on both nodes was 0.

      I tried shutting down the service and had to kill memcached manually (kill -9). Restarting it went through the same process of warming up and then nothing.

      While I was looking around, I let it sit for a little while and, as if by magic, all of the items came back. I seem to recall a previous bug where a node wouldn't be told to become active until all the nodes in the cluster were active, and it got into trouble when not all of the nodes restarted.

      Diags for all nodes will be attached.
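
      To make the request in the title concrete, here is a rough sketch of the difference between the behavior I think I was seeing and the "partial" behavior being asked for. This is illustration only and is not the actual janitor code: `Node`, `activate_all_or_nothing` and `activate_partial` are names I made up, and the assumption that items only show up in curr_items once a node's vBuckets are marked active is my reading of the symptoms, not something confirmed from the diags.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Node:
    """Hypothetical view of a cluster node, for illustration only."""
    name: str
    warmed_up: bool            # True once the node has finished warmup
    vbuckets: Tuple[int, ...]  # vBuckets this node is supposed to serve as active


def activate_all_or_nothing(nodes: List[Node]) -> Dict[str, List[int]]:
    """Suspected current behavior: no node is activated until every node is ready."""
    if all(n.warmed_up for n in nodes):
        return {n.name: list(n.vbuckets) for n in nodes}
    return {n.name: [] for n in nodes}


def activate_partial(nodes: List[Node]) -> Dict[str, List[int]]:
    """Requested behavior: each ready node is activated without waiting for the rest."""
    return {n.name: (list(n.vbuckets) if n.warmed_up else []) for n in nodes}


# Two nodes have warmed up; a third has not come back yet.
demo_nodes = [
    Node("node-a", True, (0, 1)),
    Node("node-b", True, (2, 3)),
    Node("node-c", False, (4, 5)),
]

print("all-or-nothing:", activate_all_or_nothing(demo_nodes))
print("partial:", activate_partial(demo_nodes))
```

      With the all-or-nothing version, the two warmed-up nodes serve nothing while the third is still down, which would look like what I described above. With the partial version they start taking traffic for their own vBuckets right away.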

      Attachments


        Activity

          People

            Abhijeeth.Nuthan Abhijeeth Nuthan
            perry Perry Krug
            Votes: 0
            Watchers: 6

            Dates

              Created:
              Updated:

              Gerrit Reviews

                There are no open Gerrit changes
