  1. Couchbase Server
  2. MB-15114

Docker/CoreOS: Rebalance fails with error 'badmatch' on new install. No or very few documents.

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • 4.0.0
    • 3.0.1, 3.0.3
    • cloud_marketplace
    • Security Level: Public
    • couchbase server in Docker on CoreOS under AWS
    • Untriaged
    • Centos 64-bit
    • Unknown

    Description

      On Amazon EC2

      Start up 2 completely fresh couchbase servers from:
      https://github.com/couchbaselabs/couchbase-server-coreos

      Ensure /var/lib/couchbase is mounted on EBS storage, formatted as ext4, and mapped into the Couchbase Docker container at /opt/couchbase/var/lib/couchbase.

      Add both the beer sample and the game sample. Use 100 MB for the memory size of the default bucket. Leave it as couchbase and change nothing else.

      Add a second server inside the same VPC.

      Click rebalance.
      Rebalance fails partway through.

      Rebalance exited with reason {badmatch,{error,{failed_nodes,['ns_1@172.31.44.247']}}}
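When triaging this kind of failure across several log excerpts, it can help to pull the node names out of the reason term automatically. The following is only a minimal sketch, assuming the message contains a `{failed_nodes,[...]}` tuple with single-quoted node names as shown above; the function name `failed_nodes` is hypothetical and not part of any Couchbase tooling.

```python
import re


def failed_nodes(message):
    """Return the node names listed in a {failed_nodes,[...]} tuple.

    Assumption: node names are single-quoted Erlang atoms, as in the
    rebalance error quoted in this report. Returns [] if no tuple is found.
    """
    # DOTALL lets the match span line breaks if the term was wrapped.
    match = re.search(r"\{failed_nodes,\s*\[(.*?)\]\}", message, re.DOTALL)
    if match is None:
        return []
    return re.findall(r"'([^']+)'", match.group(1))


reason = (
    "Rebalance exited with reason {badmatch,"
    "{error,{failed_nodes,['ns_1@172.31.44.247']}}}"
)
nodes = failed_nodes(reason)
print(nodes)
```

This is a plain regular-expression match, not a real Erlang term parser, so it would need adjusting for unquoted atoms or nested lists.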

      The problem also occurs when there is only a single bucket containing no documents. When publishing a view, the other server becomes unavailable and rebalance fails with the same error.

      This happens in both 3.0.1 community edition and 3.0.3 enterprise edition.

      Immediately after this occurs, CPU load on the down node is very high.
      The culprit is beam.smp.

      Attaching strace to it shows that it keeps trying, over and over, to connect to memcached. It appears that each connection is established and then cut short. I end up with thousands of connections (literally 6000+) like this:

      tcp 0 0 localhost:36527 localhost:11209 TIME_WAIT
      tcp 0 0 localhost:55519 localhost:11209 TIME_WAIT
      tcp 0 0 localhost:54337 localhost:11209 TIME_WAIT
      tcp 0 0 localhost:32772 localhost:11209 TIME_WAIT
      tcp 0 0 localhost:45226 localhost:11209 TIME_WAIT
      tcp 0 0 localhost:55206 localhost:11209 TIME_WAIT
      tcp 0 0 localhost:33358 localhost:11209 TIME_WAIT
      tcp 0 0 localhost:55473 localhost:11209 TIME_WAIT
      tcp 0 0 localhost:56703 localhost:11209 TIME_WAIT
      tcp 0 0 localhost:38388 localhost:11209 TIME_WAIT
      tcp 0 0 localhost:40668 localhost:11209 TIME_WAIT
      tcp 0 0 localhost:54342 localhost:11209 TIME_WAIT
      tcp 0 0 localhost:58936 localhost:11209 TIME_WAIT
      tcp 0 0 localhost:45226 localhost:11209 TIME_WAIT
      tcp 0 0 localhost:55206 localhost:11209 TIME_WAIT
      tcp 0 0 localhost:33358 localhost:11209 TIME_WAIT
      tcp 0 0 localhost:55473 localhost:11209 TIME_WAIT
      tcp 0 0 localhost:56703 localhost:11209 TIME_WAIT
      tcp 0 0 localhost:38388 localhost:11209 TIME_WAIT
      tcp 0 0 localhost:40668 localhost:11209 TIME_WAIT
      tcp 0 0 localhost:54342 localhost:11209 TIME_WAIT
      tcp 0 0 localhost:58936 localhost:11209 TIME_WAIT
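To put a number on a listing like the one above, the lines can be tallied by connection state. The sketch below assumes netstat-style rows with six whitespace-separated columns (protocol, receive queue, send queue, local address, remote address, state), matching the excerpt in this report; the function name `count_states_for_port` and the `sample` text are illustrative only.

```python
from collections import Counter


def count_states_for_port(netstat_text, port=11209):
    """Count TCP connection states for sockets whose remote port is `port`.

    Assumption: each row looks like
    'tcp 0 0 localhost:36527 localhost:11209 TIME_WAIT'.
    Rows that do not fit that layout are skipped.
    """
    states = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        remote = fields[4]
        # The port is whatever follows the last colon of the remote address.
        if remote.rsplit(":", 1)[-1] == str(port):
            states[fields[5]] += 1
    return states


sample = """\
tcp 0 0 localhost:36527 localhost:11209 TIME_WAIT
tcp 0 0 localhost:55519 localhost:11209 TIME_WAIT
tcp 0 0 localhost:54337 localhost:11209 TIME_WAIT
"""
counts = count_states_for_port(sample)
print(counts["TIME_WAIT"])
```

Running the same tally a few seconds apart would show whether the number of TIME_WAIT sockets toward port 11209 is still growing.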

      I've attached a collectdb from when it occurs (different buckets, but the same issue).

      I'd like to point out that couchbase is running inside docker on CoreOS on AWS.

      I've raised the ulimits inside the container, which now show:

      core file size (blocks, -c) unlimited
      data seg size (kbytes, -d) unlimited
      scheduling priority (-e) 0
      file size (blocks, -f) unlimited
      pending signals (-i) 29972
      max locked memory (kbytes, -l) unlimited
      max memory size (kbytes, -m) unlimited
      open files (-n) 1048576
      pipe size (512 bytes, -p) 8
      POSIX message queues (bytes, -q) 819200
      real-time priority (-r) 0
      stack size (kbytes, -s) 8192
      cpu time (seconds, -t) unlimited
      max user processes (-u) 1048576
      virtual memory (kbytes, -v) unlimited
      file locks (-x) unlimited
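The limits above are what the shell reports. As a cross-check, the limits seen by a process can also be read from inside that process. This is a minimal sketch using Python's standard `resource` module on Linux; it reports the limits of the Python process itself, not of the Couchbase processes, and the `LIMITS` table and `read_limits` function are hypothetical names chosen for illustration.

```python
import resource

# Labels follow the ulimit listing; values are resource-module constant names.
LIMITS = {
    "open files (-n)": "RLIMIT_NOFILE",
    "max user processes (-u)": "RLIMIT_NPROC",
    "core file size (-c)": "RLIMIT_CORE",
    "stack size (-s)": "RLIMIT_STACK",
}


def fmt(value):
    """Render RLIM_INFINITY as 'unlimited', like ulimit does."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def read_limits():
    """Return {label: (soft, hard)} for the limits this platform supports.

    Note: getrlimit reports sizes in bytes, whereas ulimit prints
    core and stack sizes in blocks or kbytes.
    """
    result = {}
    for label, name in LIMITS.items():
        const = getattr(resource, name, None)
        if const is None:  # constant not available on this platform
            continue
        soft, hard = resource.getrlimit(const)
        result[label] = (fmt(soft), fmt(hard))
    return result


limits = read_limits()
for label, (soft, hard) in limits.items():
    print(f"{label}: soft={soft} hard={hard}")
```

Comparing the soft and hard values can reveal a case where the hard limit was raised but the soft limit that actually applies was not.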

          People

            Traun Leyden (Inactive)
            Matthew Hook