Details

Type: Bug
Resolution: Duplicate
Priority: Critical
Fix Version/s: 7.0.0
Affects Version/s: Cheshire-Cat
Component/s: ns_server
Labels:
Environment:
7.0.0-4454-enterprise

Triage:
Untriaged
Operating System:
Centos 64-bit
Link to Log File, atop/blg, CBCollectInfo, Core dump:

Final nodes in cluster logs snapshot:
https://supportal.couchbase.com/snapshot/0dbc436bd5361a9a468e5f954fa962a8::0

Rebalanced_out nodes' logs:
https://supportal.couchbase.com/snapshot/4810a59bfc230bf7e9be20029969bfbe::0
https://supportal.couchbase.com/snapshot/7b3908d6234eec0c778cea5599d82227::0
https://supportal.couchbase.com/snapshot/6725412e94068a676af22e8cd53c2124::0
https://supportal.couchbase.com/snapshot/c8e11601dadf936fd9010adb0659efdc::0

Story Points:
1
Is this a Regression?:
Unknown

Description

Build: 7.0.0-4454

Scenario:

Initialize cluster with two nodes (kv, kv+index+n1ql)
Create couchbase bucket with replica=1
Rebalance_in 2 nodes into the cluster with doc cruds in parallel (Success)
Rebalance_in 2 more nodes with doc_cruds (Success)

Final Cluster stats:

+----------------+-----------------+------+------------+------------+----------------------+------------------+
| Node           | Services        | CPU  | Mem_total  | Mem_free   | Swap_mem_used        | Active / Replica |
+----------------+-----------------+------+------------+------------+----------------------+------------------+
| 172.23.105.126 | kv              | 6.43 | 4201627648 | 3425701888 | 1048576 / 3758092288 | 4989 / 5108      |
| 172.23.105.128 | kv              | 6.16 | 4201627648 | 3424706560 | 0 / 3758092288       | 5127 / 4934      |
| 172.23.104.172 | index, kv, n1ql | 11.7 | 3947372544 | 3012943872 | 221184 / 3758092288  | 4982 / 5103      |
| 172.23.105.127 | kv              | 4.88 | 4201627648 | 3397816320 | 0 / 3758092288       | 5075 / 5043      |
| 172.23.105.158 | kv              | 5.79 | 4201631744 | 3393880064 | 0 / 3758092288       | 4936 / 4884      |
| 172.23.104.158 | kv              | 14.8 | 4201676800 | 3443019776 | 1310720 / 3758092288 | 4891 / 4928      |
+----------------+-----------------+------+------------+------------+----------------------+------------------+
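
A quick sanity check on the stats table, assuming the "Active / Replica" column holds per-node item counts (the report does not say so explicitly). With replica=1, the replica total would be expected to match the active total. The values are copied from the table; the variable names are illustrative only.

```python
# (active, replica) pairs copied from the "Final Cluster stats" table.
# Assumption: these are item counts per node, not vBucket counts.
stats = {
    "172.23.105.126": (4989, 5108),
    "172.23.105.128": (5127, 4934),
    "172.23.104.172": (4982, 5103),
    "172.23.105.127": (5075, 5043),
    "172.23.105.158": (4936, 4884),
    "172.23.104.158": (4891, 4928),
}

active_total = sum(active for active, _ in stats.values())
replica_total = sum(replica for _, replica in stats.values())

print(active_total, replica_total)  # 30000 30000
```

Both totals come to 30000, so under that assumption the data looked consistent before the final rebalance out.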

Rebalance out all nodes
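
For reference, a rough sketch of how the form body for a rebalance-out request could be built for the Couchbase REST endpoint `POST /controller/rebalance`, which takes `knownNodes` and `ejectedNodes` as comma-separated otpNode names. The test framework's actual calls are not shown in this report, and the choice of `ns_1@172.23.104.172` as the retained node is an assumption made only for illustration. The helper name `build_rebalance_form` is hypothetical, and nothing is sent over the network here.

```python
from urllib.parse import urlencode

def build_rebalance_form(known_nodes, ejected_nodes):
    """Build the form fields for POST /controller/rebalance (hypothetical helper)."""
    unknown = set(ejected_nodes) - set(known_nodes)
    if unknown:
        # Every ejected node must also be listed among the known nodes.
        raise ValueError(f"ejected nodes not in cluster: {sorted(unknown)}")
    return {
        "knownNodes": ",".join(known_nodes),
        "ejectedNodes": ",".join(ejected_nodes),
    }

# Node addresses taken from the stats table.
known = [f"ns_1@{ip}" for ip in (
    "172.23.105.126", "172.23.105.128", "172.23.104.172",
    "172.23.105.127", "172.23.105.158", "172.23.104.158",
)]
keep = "ns_1@172.23.104.172"  # assumption: which node stays is not stated in the report
ejected = [node for node in known if node != keep]

form = build_rebalance_form(known, ejected)
body = urlencode(form)  # application/x-www-form-urlencoded request body
print(body)
```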

Observation:

During the final rebalance out of all nodes, the rebalance fails because memcached is killed with exit code 137, with the following logs:

Service 'memcached' exited with status 137. Restarting. Messages:
WARNING: Logging before InitGoogleLogging() is written to STDERR
W0216 00:59:13.516377 22657 HazptrDomain.h:671] Using the default inline executor for asynchronous reclamation may be susceptible to deadlock if the current thread happens to hold a resource needed by the deleter of a reclaimable object
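
Note on the exit status: by shell convention, a status above 128 means the process was terminated by signal (status - 128), so 137 corresponds to signal 9 (SIGKILL). The report does not identify what sent the signal. A minimal sketch of the decoding, assuming a Linux host (the function name is illustrative):

```python
import signal

def describe_exit_status(status: int) -> str:
    """Interpret a shell-style exit status (values above 128 mean 'killed by signal')."""
    if status > 128:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not defined on this platform.
            name = f"signal {signum}"
        return f"terminated by {name}"
    return f"exited normally with code {status}"

print(describe_exit_status(137))  # terminated by SIGKILL
```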

Rebalance failure UI logs:

Node 'ns_1@172.23.104.172' saw that node 'ns_1@172.23.105.158' went down. Details: [{nodedown_reason, connection_closed}]

Node 'ns_1@172.23.104.158' saw that node 'ns_1@172.23.105.158' went down. Details: [{nodedown_reason, connection_closed}]

Rebalance exited with reason shun_failed.

Rebalance Operation Id = 9ebe83ad7194372a38613770f88d57a1

Node 'ns_1@172.23.105.158' is leaving cluster.

Node 'ns_1@172.23.104.172' saw that node 'ns_1@172.23.105.127' went down. Details: [{nodedown_reason, connection_closed}]

Node 'ns_1@172.23.104.158' saw that node 'ns_1@172.23.105.127' went down. Details: [{nodedown_reason, connection_closed}]

Node 'ns_1@172.23.105.158' saw that node 'ns_1@172.23.105.127' went down. Details: [{nodedown_reason, connection_closed}]

Node 'ns_1@172.23.105.127' is leaving cluster.

Node 'ns_1@172.23.104.172' saw that node 'ns_1@172.23.105.128' went down. Details: [{nodedown_reason, connection_closed}]

Node 'ns_1@172.23.104.158' saw that node 'ns_1@172.23.105.128' went down. Details: [{nodedown_reason, connection_closed}]
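
To make the UI log lines easier to read, the "saw that node ... went down" messages can be grouped by the node that went down. This is a small sketch written against the message format quoted here; the regular expression and function name are illustrative and are not part of ns_server.

```python
import re
from collections import defaultdict

# Matches the UI log lines quoted in this ticket.
NODEDOWN = re.compile(
    r"Node '(?P<observer>[^']+)' saw that node '(?P<down>[^']+)' went down\. "
    r"Details: \[\{nodedown_reason,\s*(?P<reason>\w+)\}\]"
)

def summarize_nodedown(lines):
    """Map each node that went down to the (observer, reason) pairs that reported it."""
    seen = defaultdict(list)
    for line in lines:
        match = NODEDOWN.search(line)
        if match:
            seen[match["down"]].append((match["observer"], match["reason"]))
    return dict(seen)

sample = [
    "Node 'ns_1@172.23.104.172' saw that node 'ns_1@172.23.105.158' went down. "
    "Details: [{nodedown_reason, connection_closed}]",
    "Rebalance exited with reason shun_failed.",
    "Node 'ns_1@172.23.104.158' saw that node 'ns_1@172.23.105.158' went down. "
    "Details: [{nodedown_reason, connection_closed}]",
]

for down, reports in summarize_nodedown(sample).items():
    print(down, reports)
```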

Attachments

node_status_after_shun_failed.png (368 kB, 18/Feb/21 5:38 AM)
test.log (243 kB, 16/Feb/21 1:39 AM)

Issue Links

duplicates

MB-44272 Rebalance exited with reason shun_failed

Closed

Activity

People

Assignee: Artem Stemkovski

Reporter: Ashwin Govindarajulu

Votes: 0

Watchers: 15

Dates

Created: 16/Feb/21 1:39 AM

Updated: 17/Jun/21 2:49 PM

Resolved: 22/Feb/21 5:51 PM

Gerrit Reviews

There are no open Gerrit changes

Rebalance out fails with reason "shun_failed"
