Details

Type: Bug
Resolution: Fixed
Priority: Critical
Fix Version/s: 6.5.0
Affects Version/s: 6.5.0
Component/s: couchbase-bucket
Labels:
- releasenote
Environment:
6.5.0-3883-enterprise

Triage:
Triaged
Operating System:
Centos 64-bit
Is this a Regression?:
Yes
Sprint:
KV-Engine MH Beta part 2

Description

Script to Repro

./testrunner -i /tmp/testexec.3494.ini -p get-cbcollect-info=True,flusher_batch_split_trigger=10 -t rebalance.rebalance_high_ops_pillowfight.RebalanceHighOpsWithPillowFight.test_graceful_failover_addback,node_out=3,replicas=2,nodes_init=4,items=2000000,batch_size=1000,rate_limit=100000,recovery_type=delta,instances=2,threads=5,loader=high_ops,flusher_batch_split_trigger=1
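Note that the command sets flusher_batch_split_trigger twice: once in the -p argument and once in the -t test spec. The sketch below only parses the two argument strings copied from the command and reports keys that carry different values. The helper name parse_params is hypothetical, and this is not how testrunner itself parses its arguments; which value takes precedence is not stated in this ticket.

```python
def parse_params(arg):
    """Split a comma-separated key=value string into (key, value) pairs."""
    pairs = []
    for item in arg.split(","):
        if "=" in item:  # skip the bare test path, which has no '='
            key, value = item.split("=", 1)
            pairs.append((key, value))
    return pairs


# Argument strings copied verbatim from the repro command.
GLOBAL_ARG = "get-cbcollect-info=True,flusher_batch_split_trigger=10"
TEST_ARG = (
    "rebalance.rebalance_high_ops_pillowfight.RebalanceHighOpsWithPillowFight."
    "test_graceful_failover_addback,node_out=3,replicas=2,nodes_init=4,"
    "items=2000000,batch_size=1000,rate_limit=100000,recovery_type=delta,"
    "instances=2,threads=5,loader=high_ops,flusher_batch_split_trigger=1"
)

global_params = dict(parse_params(GLOBAL_ARG))
test_params = dict(parse_params(TEST_ARG))

# Keys present in both argument strings with different values.
conflicts = {
    key: (global_params[key], test_params[key])
    for key in global_params
    if key in test_params and global_params[key] != test_params[key]
}
print(conflicts)
```

The steps below refer to the value 1, which matches the test spec rather than the -p argument.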

Steps

Create a 4-node cluster with 2 replicas and set flusher_batch_split_trigger=1.
Load data with the high ops data loader.
Gracefully fail over a node.
Start the high ops data loader again.
Perform a delta recovery.
Start a rebalance again.

Rebalance fails as shown below.

{u'node': u'ns_1@172.23.105.105', u'code': 0, u'text': u'Rebalance exited with reason {{badmatch,\n                                  {error,\n                                      {failed_nodes,[\'ns_1@172.23.105.47\']}}},\n                              [{ns_janitor,cleanup_apply_config_body,4,\n                                   [{file,"src/ns_janitor.erl"},{line,286}]},\n                               {ns_janitor,\'-cleanup_apply_config/4-fun-0-\',\n                                   4,\n                                   [{file,"src/ns_janitor.erl"},{line,209}]},\n                               {async,\'-async_init/4-fun-2-\',3,\n                                   [{file,"src/async.erl"},{line,211}]}]}.\nRebalance Operation Id = 28ffeff813a1d2e394ea0f10d72cbccf', u'shortText': u'message', u'serverTime': u'2019-07-27T23:42:38.878Z', u'module': u'ns_orchestrator', u'tstamp': 1564296158878, u'type': u'critical'}
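The log entries in this report are Python dict literals with escaped newlines, which makes the Erlang terms hard to read. A minimal sketch for unpacking one, assuming the entry is a valid Python literal: RAW is a shortened excerpt of the entry above (the stack frames and several fields are omitted for brevity), and entries carrying a logger prefix such as "[...] ERROR - " would need that prefix stripped first.

```python
import ast

# Shortened excerpt of the rebalance failure entry; not the full log line.
RAW = r"""{u'node': u'ns_1@172.23.105.105', u'code': 0, u'text': u'Rebalance exited with reason {{badmatch,\n {error,\n {failed_nodes,[\'ns_1@172.23.105.47\']}}}', u'module': u'ns_orchestrator', u'type': u'critical'}"""

# literal_eval safely evaluates the dict literal; the u'' prefix is accepted
# by Python 3 and the escaped \n sequences become real newlines.
entry = ast.literal_eval(RAW)

print(entry["module"], entry["type"])
print(entry["text"])
```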

[2019-07-27 23:42:48,906] - [rest_client:3250] ERROR - {u'node': u'ns_1@172.23.105.47', u'code': 0, u'text': u'Control connection to memcached on \'ns_1@172.23.105.47\' disconnected: {lost_connection,\n                                                                       [{ns_memcached,\n                                                                         worker_loop,\n                                                                         3,\n                                                                         [{file,\n                                                                           "src/ns_memcached.erl"},\n                                                                          {line,\n                                                                           231}]},\n                                                                        {proc_lib,\n                                                                         init_p_do_apply,\n                                                                         3,\n                                                                         [{file,\n                                                                           "proc_lib.erl"},\n                                                                          {line,\n                                                                           247}]}]}', u'shortText': u'message', u'serverTime': u'2019-07-27T23:42:38.844Z', u'module': u'ns_memcached', u'tstamp': 1564296158844, u'type': u'info'}

I also see a memcached crash on 172.23.105.47.

 {u'node': u'ns_1@172.23.105.47', u'code': 0, u'text': u"Service 'memcached' exited with status 134. Restarting. Messages:\n2019-07-27T23:42:38.784342-07:00 CRITICAL     /opt/couchbase/bin/../lib/libstdc++.so.6() [0x7f4c6e1e2000+0x8f213]\n2019-07-27T23:42:38.784356-07:00 CRITICAL     /opt/couchbase/bin/../lib/../lib/ep.so() [0x7f4c68f6c000+0x70842]\n2019-07-27T23:42:38.784366-07:00 CRITICAL     /opt/couchbase/bin/../lib/../lib/ep.so() [0x7f4c68f6c000+0xee6eb]\n2019-07-27T23:42:38.784378-07:00 CRITICAL     /opt/couchbase/bin/../lib/../lib/ep.so() [0x7f4c68f6c000+0x13ca45]\n2019-07-27T23:42:38.784392-07:00 CRITICAL     /opt/couchbase/bin/../lib/../lib/ep.so() [0x7f4c68f6c000+0x13cf0d]\n2019-07-27T23:42:38.784399-07:00 CRITICAL     /opt/couchbase/bin/../lib/../lib/ep.so() [0x7f4c68f6c000+0x1362ef]\n2019-07-27T23:42:38.784404-07:00 CRITICAL     /opt/couchbase/bin/../lib/libplatform_so.so.0.1.0() [0x7f4c7007d000+0x8f27]\n2019-07-27T23:42:38.784410-07:00 CRITICAL     /lib64/libpthread.so.0() [0x7f4c6daad000+0x7dd5]\n2019-07-27T23:42:38.784443-07:00 CRITICAL     /lib64/libc.so.6(clone+0x6d) [0x7f4c6d6e0000+0xfdead]\n[*** LOG ERROR ***] [2019-07-27 23:42:38] [spdlog_file_logger] async log: thread pool doesn't exist anymore", u'shortText': u'message', u'serverTime': u'2019-07-27T23:42:38.838Z', u'module': u'ns_log', u'tstamp': 1564296158838, u'type': u'info'}
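Two details in this entry can be decoded mechanically. Assuming the exit status follows the common 128 + signal-number convention, status 134 corresponds to SIGABRT. The backtrace frames have the form module() [base+offset], so the offset can be extracted for use with a symbolizer. The sketch below is illustrative only: the regular expression is an assumption based on the frames shown here, not a documented log format, and the signal numbering assumes Linux.

```python
import re
import signal

# Assumption: status = 128 + signal number (common shell convention).
status = 134
sig = signal.Signals(status - 128)
print(sig.name)

# Two frames copied from the crash message above.
LINES = [
    "2019-07-27T23:42:38.784356-07:00 CRITICAL     "
    "/opt/couchbase/bin/../lib/../lib/ep.so() [0x7f4c68f6c000+0x70842]",
    "2019-07-27T23:42:38.784443-07:00 CRITICAL     "
    "/lib64/libc.so.6(clone+0x6d) [0x7f4c6d6e0000+0xfdead]",
]

# Hypothetical pattern: module path, optional symbol, load base, offset.
FRAME = re.compile(
    r"CRITICAL\s+(?P<module>\S+?)\((?P<symbol>[^)]*)\)\s+"
    r"\[0x(?P<base>[0-9a-f]+)\+0x(?P<offset>[0-9a-f]+)\]"
)

frames = []
for line in LINES:
    match = FRAME.search(line)
    if match:
        frames.append((match["module"], match["symbol"], "0x" + match["offset"]))

for module, symbol, offset in frames:
    print(module, symbol or "<no symbol>", offset)
```

The offsets are relative to each module's load address, so resolving them to function names requires the matching binaries and debug symbols for build 6.5.0-3883.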

cbcollect_info output from all nodes in the cluster is attached.

Attachments

test_9.zip
74.54 MB
29/Jul/19 11:49 PM

Issue Links

is cloned by

MB-35408 Re-implement fix for MB-35326, cleaner KVStore API changes

Closed

is duplicated by

MB-35334 Snapshot range invariant (snapshot_range_t(a,b) requires start <= end) broken after crash/restart

Closed
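The invariant named in MB-35334 and in this issue's title, snapshot_range_t(a,b) requires start <= end, can be illustrated with a small model. This is a sketch in Python, not the KV-Engine source: the class name SnapshotRange, its fields, and the exception type are hypothetical, and only the wording of the message is taken from the ticket.

```python
class SnapshotRange:
    """Toy model of a snapshot range that rejects start > end."""

    def __init__(self, start, end):
        if start > end:
            # Message wording modeled on the ticket title.
            raise ValueError(
                "snapshot_range_t({},{}) requires start <= end".format(start, end)
            )
        self.start = start
        self.end = end


# A well-formed range is accepted.
ok = SnapshotRange(5, 10)

# A range whose start exceeds its end violates the invariant.
try:
    SnapshotRange(10, 5)
    error = None
except ValueError as exc:
    error = str(exc)

print(error)
```

An unhandled failure of a check like this would terminate the process, which would be consistent with the memcached abort reported above; the ticket itself does not show the throwing code.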

People

Assignee: Balakumaran Gopal

Reporter: Balakumaran Gopal

Votes: 0

Watchers: 4

Dates

Created: 29/Jul/19 11:51 PM

Updated: 08/Aug/19 11:45 PM

Resolved: 02/Aug/19 9:48 AM

Gerrit Reviews

There are no open Gerrit changes.

There is 1 closed Gerrit change:

MB-35326: Reset cached vbucket_state on VBucket creation

Rebalance of delta recovered node fails: snapshot_range_t(a,b) requires start <= end
