- 8 CentOS 6.2 64-bit servers, each with a 4-core CPU
- Each server has 32 GB RAM and a 400 GB SSD.
- 24.8 GB RAM allocated to Couchbase Server on each node
- SSD formatted as ext4 and mounted on /data
- Each server has its own SSD drive; no disk is shared with other servers.
- Create a 6-node cluster running Couchbase Server 2.0.0-1908
- Link to manifest file http://builds.hq.northscale.net/latestbuilds/couchbase-server-enterprise_x86_64_2.0.0-1908-rel.rpm.manifest.xml
- Cluster has 2 buckets: default (12 GB with 2 replicas) and saslbucket (12 GB with 1 replica).
- Each bucket has one design doc with 2 views (d1 for default, d11 for saslbucket)
- Load 16 million items into the default bucket and 20 million items into saslbucket. Each item is 512 to 1024 bytes in size.
- After loading completes, wait for initial indexing to finish. Disable view compaction.
- After initial indexing is done, mutate all items to sizes from 1024 to 1512 bytes.
- Query all 4 views from the 2 design docs.
- Do a swap rebalance: remove nodes 39 and 40, add nodes 44 and 45.
- At the end of rebalancing saslbucket, the rebalance exited with a timeout on node 43.
- Many connection resets to mccouch followed. Updated bug
- Kill all loads pointing to this cluster. Node 43 did not return to a stable state.
- beam.smp is running but node 43 is still down.
- Kill beam.smp with SIGUSR1 to create an Erlang core dump
Link to collect info from all nodes: https://s3.amazonaws.com/packages.couchbase/collect_info/orange/2_0_0/201210/orange-ci-1908-node43-down-erl-hang-20121030.tgz
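
For anyone trying to reproduce the load and mutate steps, here is a minimal sketch of how items in the stated size ranges could be generated. It is not the load generator used in this test. The key prefix, the seed, and the function names are hypothetical, and writing the items to the cluster is left out on purpose.

```python
import random
import string

# Size ranges in bytes, taken from the load and mutate steps in this report.
LOAD_SIZE_RANGE = (512, 1024)
MUTATE_SIZE_RANGE = (1024, 1512)

ALPHABET = string.ascii_letters + string.digits


def make_value(size_range, rng):
    """Return a random ASCII string whose length is within size_range (inclusive)."""
    low, high = size_range
    size = rng.randint(low, high)
    return "".join(rng.choices(ALPHABET, k=size))


def make_items(count, size_range, seed=0, prefix="key"):
    """Yield (key, value) pairs. The prefix and seed are assumptions for illustration."""
    rng = random.Random(seed)
    for i in range(count):
        yield f"{prefix}-{i}", make_value(size_range, rng)
```

Calling `make_items(n, LOAD_SIZE_RANGE)` and later `make_items(n, MUTATE_SIZE_RANGE, seed=1)` yields the same keys with larger values, which mirrors the mutate-all-items step on a small scale.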