Description
Splitting this off as a separate issue from https://issues.couchbase.com/browse/MB-18934
Using a cluster_run dev environment on OS X with 3 nodes, I ran Aruna's script:
./testrunner -i INI_FILE.ini get-cbcollect-info=True,get-logs=False,stop-on-failure=False,index_retry=10,GROUP=P0 -t fts.moving_topology_fts.MovingTopFTS.rebalance_out_during_index_building,items=30000,cluster=D,F,F,index_replicas=1,standard_buckets=2,sasl_buckets=2,GROUP=P0
Eventually, I did see the FTS rebalance get into a seemingly "stuck" state, where it could no longer connect to memcached:
2016-03-29T14:51:21.659-07:00 [INFO] feed_dcp: OnError, name: sasl_bucket_1_index_1_3d56f8785bd9ef60: bucketName: sasl_bucket_1, bucketUUID: , err: worker connect, server: 10.17.6.158:12000, err: dial tcp 10.17.6.158:12000: getsockopt: operation timed out
2016-03-29T14:51:21.659-07:00 [INFO] main: meh.OnFeedError, srcType: couchbase, err: worker connect, server: 10.17.6.158:12000, err: dial tcp 10.17.6.158:12000: getsockopt: operation timed out
2016-03-29T14:51:23.739-07:00 [INFO] feed_dcp: OnError, name: sasl_bucket_1_index_1_3d56f8785bd9ef60: bucketName: sasl_bucket_1, bucketUUID: , err: worker connect, server: 10.17.6.158:12000, err: dial tcp 10.17.6.158:12000: getsockopt: operation timed out
2016-03-29T14:51:23.739-07:00 [INFO] main: meh.OnFeedError, srcType: couchbase, err: worker connect, server: 10.17.6.158:12000, err: dial tcp 10.17.6.158:12000: getsockopt: operation timed out
Related: once the rebalance is in this "stuck" state, a manual telnet to port 12000 also fails.
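The manual telnet check can also be scripted, which makes it easier to repeat while the rebalance is stuck. The following is a minimal sketch, not part of testrunner or cbft: the function name can_connect and the 5-second default timeout are assumptions chosen for illustration. It only attempts a TCP connect, which is the same step that fails in the "dial tcp ... operation timed out" log lines.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration; it performs only the TCP
    handshake and sends no memcached or DCP traffic.
    """
    try:
        # create_connection resolves the address and connects, raising
        # OSError (including timeouts and refusals) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, can_connect("10.17.6.158", 12000) probes the memcached address from the log lines above. A False result says only that the connect failed or timed out, not why.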
Issue Links
- relates to MB-18934 [FTS] MCP: Rebalance fails due to cbft getting killed by OOM killer (Closed)