Details
-
Bug
-
Resolution: Fixed
-
Blocker
-
2.0
-
Security Level: Public
-
None
-
Ubuntu 12.04 LTS ec2 xlarge instances (15GB Memory)
http://builds.hq.northscale.net/latestbuilds/couchbase-server-community_x86_64_2.0.0-1944-rel.deb.manifest.xml
Live clusters:
C1: http://ec2-177-71-167-196.sa-east-1.compute.amazonaws.com:8091/
C2: http://ec2-122-248-217-156.ap-southeast-1.compute.amazonaws.com:8091/
biXDCR_bucket: C1 <--> C2
uniXDCR_src: C1 --> C2
Description
- Front-end loads were running on biXDCR_bucket on C1 and C2 and on uniXDCR_src on C1, with replication in progress.
- On C2:
- 3 nodes down, with erl_crash.dump files generated (will be attached).
- 2 nodes with Erlang possibly hung, and in pending state. (In top, beam.smp keeps appearing and disappearing, using up 1.0G of resident memory, but no cores and no erl_crash.dump files were generated; memcached seems to be still running.)
- Unable to grab diags off any of these nodes.
- Result: all items in biXDCR_bucket on C2 lost.
- Half the items in uniXDCR_dest on C2 lost.
Noticed many crash reports like the following on one of the "Pending" nodes on C2:
-
- Reason for termination ==
- {noproc,
{gen_server,call,
[remote_clusters_info,
{get_remote_bucket,
[{hostname, "ec2-177-71-147-19.sa-east-1.compute.amazonaws.com:8091"},
{uuid,<<"0b3a63d5d8805e0c6670c619cc346299">>},
{name,"SANPAULO (C2)"},
{username,"Administrator"},
{password,"password"}],
"biXDCR_bucket",false,30000},
infinity]}}
[error_logger:error,2012-11-07T5:57:56.025,ns_1@ec2-54-251-5-97.ap-southeast-1.compute.amazonaws.com:error_logger<0.5.0>:ale_error_logger_handler:log_report:72]
=========================CRASH REPORT=========================
crasher:
initial call: xdc_vbucket_rep:init/1
pid: <0.28161.8>
registered_name: []
exception exit: {noproc,
{gen_server,call,
[remote_clusters_info,
{get_remote_bucket,
[{hostname, "ec2-177-71-147-19.sa-east-1.compute.amazonaws.com:8091"},
{uuid, <<"0b3a63d5d8805e0c6670c619cc346299">>},
{name,"SANPAULO (C2)"},
{username,"Administrator"},
{password,"password"}],
"biXDCR_bucket",false,30000},
infinity]}}
in function gen_server:terminate/6
ancestors: [<0.3608.5>,<0.3603.5>,xdc_replication_sup,ns_server_sup,
ns_server_cluster_sup,<0.64.0>]
messages: []
links: [<0.3608.5>]
dictionary: []
trap_exit: true
status: running
heap_size: 514229
stack_size: 24
reductions: 35035
neighbours:
-
- Reason for termination ==
- killed
[error_logger:error,2012-11-07T5:58:41.704,ns_1@ec2-54-251-5-97.ap-southeast-1.compute.amazonaws.com:error_logger<0.5.0>:ale_error_logger_handler:log_report:72]
=========================CRASH REPORT=========================
crasher:
initial call: couch_db:init/1
pid: <0.19405.4>
registered_name: []
exception exit: killed
in function gen_server:terminate/6
ancestors: [couch_server,couch_primary_services,couch_server_sup,
cb_couch_sup,ns_server_cluster_sup,<0.64.0>]
messages: []
links: []
dictionary: []
trap_exit: true
status: running
heap_size: 1597
stack_size: 24
reductions: 11968
neighbours:
Attached are the grabbed diags from one of the non-down nodes on C2.
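Since the logs contain many of these reports, a quick tally by initial call and exit reason may help with triage. The following is a minimal sketch, assuming the log text follows the CRASH REPORT layout shown in the excerpts above; the function name `tally_crash_reports` is hypothetical and is not part of any Couchbase tooling.

```python
import re
from collections import Counter

# Delimiter line that opens each report in the excerpts above.
REPORT_HEADER = re.compile(r"^=+CRASH REPORT=+[ \t]*$", re.MULTILINE)
# The "initial call:" field, e.g. xdc_vbucket_rep:init/1.
INITIAL_CALL = re.compile(r"initial call:\s*(\S+)")
# Only the first word of the exit reason, e.g. "noproc" or "killed".
EXIT_REASON = re.compile(r"exception exit:\s*\{?\s*(\w+)")


def tally_crash_reports(log_text):
    """Count crash reports grouped by (initial call, first word of exit reason)."""
    counts = Counter()
    # Text before the first header is not a crash report, so it is skipped.
    for chunk in REPORT_HEADER.split(log_text)[1:]:
        call = INITIAL_CALL.search(chunk)
        reason = EXIT_REASON.search(chunk)
        key = (
            call.group(1) if call else "unknown",
            reason.group(1) if reason else "unknown",
        )
        counts[key] += 1
    return counts
```

Calling `tally_crash_reports(open(path).read())` on a saved log file (the path is whatever the diags were extracted to) returns a `Counter` whose `most_common()` lists the most frequent crashers first.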