Details
- Bug
- Resolution: Cannot Reproduce
- Major
- 5.0.0
- 5.0 Beta
- Untriaged
- Unknown
Description
Bug #2. We have seen this twice during testing. Setup: an Ephemeral bucket with one replica on a three-node cluster. We take down one node, by shutting down the host, to verify that auto-failover works. The node is auto-failed-over. When the node boots back up, it sometimes rejoins the cluster on its own and sometimes has to be told to join the existing cluster (this is not the bug).
When the node joins back into the cluster, the bucket on that node has no documents: because the data is ephemeral, there was nothing on disk to load back in. When the other nodes see it, they rebalance their replicas onto it, producing a node with around 100 documents and 3000 replicas. The setup we are testing uses NRU eviction to push old docs out as new docs come in (not TTL). As new docs are inserted on the node, new replicas are inserted as well, and old docs and old replicas are pushed out by NRU. The active docs never get to use their half of the memory because the replicas are using 90% of it, and eventually the node holds zero docs and 100% replicas.
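The imbalance above can be sketched with a toy model. This is an assumption-laden illustration, not the actual ejection code: `CAPACITY`, `simulate`, and the `replicas_per_active` inflow ratio are all made up for the example, and NRU is approximated as plain oldest-first (FIFO) eviction, which is reasonable here because old items are never touched again. The point it shows is that when eviction is type-blind and replicas arrive faster than active docs, the active share of memory stays pinned at the inflow ratio instead of recovering toward an even split.

```python
from collections import deque

CAPACITY = 3000  # items the node can hold in memory (illustrative figure)

def simulate(initial_active, initial_replica, rounds, replicas_per_active=9):
    """Toy model of type-blind oldest-first eviction on one node.

    All names and rates are assumptions for illustration, not
    measurements from the cluster; the 9:1 default mimics the
    observed ~90% replica share.
    """
    memory = deque()
    memory.extend(["replica"] * initial_replica)
    memory.extend(["active"] * initial_active)
    for _ in range(rounds):
        # one local insert plus a burst of incoming replicas per round
        for kind in ["active"] + ["replica"] * replicas_per_active:
            memory.append(kind)
            if len(memory) > CAPACITY:
                memory.popleft()  # evict the oldest item, ignoring type
    active = sum(1 for k in memory if k == "active")
    return active, len(memory) - active

# Start from the post-rebalance state: 100 actives, ~2900 replicas.
print(simulate(100, 2900, 5000))  # → (300, 2700)
```

With an even 1:1 inflow (`replicas_per_active=1`) the same model settles at a 50/50 split, which is why the bucket fills evenly after a flush puts all three nodes back at zero.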
We can resolve this by flushing the bucket, so all three nodes start from the same empty state and fill evenly.