Details

Type: Bug
Resolution: Fixed
Priority: Critical
Fix Version/s: 4.0.0
Affects Version/s: 3.0.1, 3.0.3
Component/s: cloud_marketplace
Security Level: Public
Labels:
- container
- ga
Environment:
couchbase server in Docker on CoreOS under AWS

Triage:
Untriaged
Operating System:
Centos 64-bit
Is this a Regression?:
Unknown

Description

On Amazon EC2

Start up 2 completely fresh couchbase servers from:
https://github.com/couchbaselabs/couchbase-server-coreos

Ensure /var/lib/couchbase is mounted on EBS storage and mapped into the Couchbase Docker container at /opt/couchbase/var/lib/couchbase. The volume is formatted as ext4.

Add both the beer sample and the game sample buckets. Use 100 MB for the memory size of the default bucket. Leave its type as Couchbase and change nothing else.

Add a second server inside the same VPC.

Click Rebalance.
The rebalance fails partway through.

Rebalance exited with reason {badmatch,{error,{failed_nodes,['ns_1@172.31.44.247']}}}
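For anyone scripting around this failure, the failed node names can be pulled out of the error text. The following is only a minimal sketch: the helper name `failed_nodes` is hypothetical, and it assumes the reason string has the same `{failed_nodes,[...]}` shape as the message above.

```python
import re
from typing import List


def failed_nodes(reason: str) -> List[str]:
    """Return the node names listed in a {failed_nodes,[...]} term.

    Hypothetical helper: assumes the reason text looks like the
    rebalance error quoted in this report.
    """
    match = re.search(r"\{failed_nodes,\s*\[(.*?)\]\}", reason, re.DOTALL)
    if match is None:
        return []
    # Node names are single-quoted Erlang atoms, e.g. 'ns_1@172.31.44.247'.
    return re.findall(r"'([^']+)'", match.group(1))


reason = """{badmatch,
{error,
{failed_nodes,['ns_1@172.31.44.247']}
}}"""
print(failed_nodes(reason))
```

The regular expression tolerates the line breaks that the UI inserts into the term, so it works on the message as copied from the rebalance log.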

The problem also occurs when there is only a single bucket containing no documents. When publishing a view, the other server becomes unavailable and rebalance fails with the same error.

This happens in both 3.0.1 community edition and 3.0.3 enterprise edition.

Immediately after this occurs, CPU load on the down node is very high.
The culprit is beam.smp.

Attaching strace to it shows that it is repeatedly trying to connect to memcached. It appears to connect and then the connection gets cut short. I end up with thousands of connections (literally 6000+) like this:

tcp 0 0 localhost:36527 localhost:11209 TIME_WAIT
tcp 0 0 localhost:55519 localhost:11209 TIME_WAIT
tcp 0 0 localhost:54337 localhost:11209 TIME_WAIT
tcp 0 0 localhost:32772 localhost:11209 TIME_WAIT
tcp 0 0 localhost:45226 localhost:11209 TIME_WAIT
tcp 0 0 localhost:55206 localhost:11209 TIME_WAIT
tcp 0 0 localhost:33358 localhost:11209 TIME_WAIT
tcp 0 0 localhost:55473 localhost:11209 TIME_WAIT
tcp 0 0 localhost:56703 localhost:11209 TIME_WAIT
tcp 0 0 localhost:38388 localhost:11209 TIME_WAIT
tcp 0 0 localhost:40668 localhost:11209 TIME_WAIT
tcp 0 0 localhost:54342 localhost:11209 TIME_WAIT
tcp 0 0 localhost:58936 localhost:11209 TIME_WAIT
tcp 0 0 localhost:45226 localhost:11209 TIME_WAIT
tcp 0 0 localhost:55206 localhost:11209 TIME_WAIT
tcp 0 0 localhost:33358 localhost:11209 TIME_WAIT
tcp 0 0 localhost:55473 localhost:11209 TIME_WAIT
tcp 0 0 localhost:56703 localhost:11209 TIME_WAIT
tcp 0 0 localhost:38388 localhost:11209 TIME_WAIT
tcp 0 0 localhost:40668 localhost:11209 TIME_WAIT
tcp 0 0 localhost:54342 localhost:11209 TIME_WAIT
tcp 0 0 localhost:58936 localhost:11209 TIME_WAIT
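To put a number on the connection churn, the listing above can be tallied by TCP state. This is a rough sketch rather than a Couchbase tool: the function name `count_states` is hypothetical, and it assumes the usual netstat column order (protocol, Recv-Q, Send-Q, local address, remote address, state) with port 11209 as the remote end.

```python
from collections import Counter


def count_states(netstat_output: str, remote_port: int = 11209) -> Counter:
    """Count TCP states for connections whose remote end is remote_port.

    Assumed columns: proto recv-q send-q local remote state.
    """
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers, blank lines, and anything that is not a TCP row.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        remote, state = fields[4], fields[5]
        if remote.rsplit(":", 1)[-1] == str(remote_port):
            states[state] += 1
    return states


sample = """\
tcp 0 0 localhost:36527 localhost:11209 TIME_WAIT
tcp 0 0 localhost:55519 localhost:11209 TIME_WAIT
tcp 0 0 localhost:54337 localhost:11209 TIME_WAIT
"""
print(count_states(sample)["TIME_WAIT"])
```

Running it against the saved netstat output at intervals would show whether the TIME_WAIT count keeps growing while beam.smp is busy.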

I've attached a collectinfo from when it occurs (different buckets, but the same issue).

I'd like to point out that couchbase is running inside docker on CoreOS on AWS.

I've raised the ulimits inside the container, which now show:

core file size (blocks, -c) unlimited
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 29972
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 1048576
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 1048576
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
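The open-files limit can also be checked programmatically from inside the container. The sketch below uses Python's standard `resource` module; the function name `open_file_limits` is hypothetical, and it reports the limits of the Python process that runs it, which is only assumed to match what beam.smp and memcached inherit.

```python
import resource


def open_file_limits():
    """Return the (soft, hard) open-files limit of the current process.

    Note: this reflects the calling process, not beam.smp itself.
    """
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def describe(limit: int) -> str:
    """Render a limit the way ulimit does, using 'unlimited' for infinity."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)


soft, hard = open_file_limits()
print("open files: soft=%s hard=%s" % (describe(soft), describe(hard)))
```

If the soft value here is far below the 1048576 shown above, the raised ulimit is probably not reaching processes started in that context.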

Attachments

collectinfo-2015-05-22T001820-ns_1@172.31.44.247.zip
16.07 MB
21/May/15 8:46 PM

People

Assignee: Traun Leyden (Inactive)

Reporter: Matthew Hook

Votes: 0

Watchers: 10

Dates

Created: 21/May/15 8:46 PM

Updated: 15/Apr/16 11:40 AM

Resolved: 15/Apr/16 11:39 AM

Docker/CoreOS: Rebalance fails with error 'badmatch' on new install. No or very few documents.
