Details

Type: Bug
Resolution: Fixed
Priority: Critical
Fix Version/s: 7.0.2, 7.1.0
Affects Version/s: Cheshire-Cat
Component/s: secondary-index
Labels:
- approved-for-7.0.1
- system-test

Triage:
Untriaged
Link to Log File, atop/blg, CBCollectInfo, Core dump:

url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1621156109/collectinfo-2021-05-16T090831-ns_1%40172.23.104.16.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1621156109/collectinfo-2021-05-16T090831-ns_1%40172.23.104.17.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1621156109/collectinfo-2021-05-16T090831-ns_1%40172.23.104.18.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1621156109/collectinfo-2021-05-16T090831-ns_1%40172.23.104.19.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1621156109/collectinfo-2021-05-16T090831-ns_1%40172.23.104.21.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1621156109/collectinfo-2021-05-16T090831-ns_1%40172.23.104.23.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1621156109/collectinfo-2021-05-16T090831-ns_1%40172.23.121.165.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1621156109/collectinfo-2021-05-16T090831-ns_1%40172.23.96.30.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1621156109/collectinfo-2021-05-16T090831-ns_1%40172.23.96.31.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1621156109/collectinfo-2021-05-16T090831-ns_1%40172.23.97.77.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1621156109/collectinfo-2021-05-16T090831-ns_1%40172.23.97.82.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1621156109/collectinfo-2021-05-16T090831-ns_1%40172.23.97.83.zip

Story Points:
1
Is this a Regression?:
Unknown

Description

Build : 7.0.0-5169
Test : -test tests/2i/cheshirecat/test_idx_clusterops_cheshire_cat_recovery.yml -scope tests/2i/cheshirecat/scope_idx_cheshire_cat_dgm.yml
Scale : 2
Iteration : 1st

This is the new GSI component test with more recovery steps. After the steady-state phase, a rebalance is started to add a new indexer node, 172.23.96.31, to the cluster. A few minutes into this rebalance, the indexer process on 172.23.97.77 is killed, and the rebalance fails as expected. The rebalance is automatically retried a couple of minutes later. The retried rebalance has now been hung for about 22 hours because one index is stuck in the Moving state.

Details of the index stuck in the Moving state:

         "bucket" : "bucket2",
         "collection" : "coll_9",
         "completion" : 100,
         "definition" : "CREATE INDEX `idx1_YXvO` ON `bucket2`.`scope_1`.`coll_9`(`country`,(distinct (array ((`r`.`ratings`).`Check in / front desk`) for `r` in `reviews` end)),array_count(`public_likes`),array_count(`reviews`) DESC,`type`,`phone`,`price`,`email`,`address`,`name`,`url`) WITH {  \"defer_build\":true, \"nodes\":[ \"172.23.96.30:8091\",\"172.23.97.77:8091\",\"172.23.97.82:8091\",\"172.23.97.83:8091\" ], \"num_replica\":2 }",
         "defnId" : 11843842764277554498,
         "hosts" : [
            "172.23.96.30:8091",
            "172.23.97.82:8091"
         ],
         "indexName" : "idx1_YXvO",
         "indexType" : "plasma",
         "instId" : 12561991181710981895,
         "lastScanTime" : "Sun May 16 13:06:50 PDT 2021",
         "name" : "idx1_YXvO",
         "numPartition" : 2,
         "numReplica" : 2,
         "partitionMap" : {
            "172.23.96.30:8091" : [
            ],
            "172.23.97.82:8091" : [
            ]
         },
         "partitioned" : false,
         "progress" : 100,
         "replicaId" : 0,
         "scheduled" : false,
         "scope" : "scope_1",
         "secExprs" : [
            "`country`",
            "(distinct (array ((`r`.`ratings`).`Check in / front desk`) for `r` in `reviews` end))",
            "array_count(`public_likes`)",
            "array_count(`reviews`)",
            "`type`",
            "`phone`",
            "`price`",
            "`email`",
            "`address`",
            "`name`",
            "`url`"
         ],
         "stale" : false,
         "status" : "Moving"
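
To spot instances like this one in a larger getIndexStatus dump, a small filter over the per-index entries is usually enough. The sketch below is illustrative only. It assumes the output has been saved as JSON and that the per-index entries (objects with the fields shown above) sit in a list under a top-level `status` key; that layout is an assumption, and `find_moving_indexes` is a hypothetical helper name. The first sample entry reuses values from this report, while the second is made up for contrast.

```python
import json


def find_moving_indexes(payload, state="Moving"):
    """Return a summary of every index entry whose "status" equals `state`."""
    # Assumption: entries live under a top-level "status" list.
    # A bare list of entries is accepted as well.
    entries = payload.get("status", []) if isinstance(payload, dict) else payload
    stuck = []
    for entry in entries:
        if entry.get("status") != state:
            continue
        keyspace = ".".join(
            entry.get(part, "?") for part in ("bucket", "scope", "collection")
        )
        stuck.append({
            "keyspace": keyspace,
            "indexName": entry.get("indexName"),
            "instId": entry.get("instId"),
            "replicaId": entry.get("replicaId"),
            "hosts": entry.get("hosts", []),
        })
    return stuck


# First entry: values taken from the report. Second entry: hypothetical healthy index.
sample = json.loads("""
{"status": [
  {"bucket": "bucket2", "scope": "scope_1", "collection": "coll_9",
   "indexName": "idx1_YXvO", "instId": 12561991181710981895, "replicaId": 0,
   "hosts": ["172.23.96.30:8091", "172.23.97.82:8091"], "status": "Moving"},
  {"bucket": "bucket2", "scope": "scope_1", "collection": "coll_9",
   "indexName": "hypothetical_idx", "instId": 1, "replicaId": 0,
   "hosts": ["172.23.97.83:8091"], "status": "Ready"}
]}
""")

for hit in find_moving_indexes(sample):
    print(hit["keyspace"], hit["indexName"], hit["hosts"])
```

Python integers are arbitrary precision, so the 64-bit `defnId` and `instId` values survive the JSON round trip unchanged.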

The rebalance was initiated at 2021-05-15T17:36:26. The following is from the test console:

[2021-05-15T17:36:26-07:00, sequoiatools/couchbase-cli:7.0:68fafa] server-add -c 172.23.104.16:8091 --server-add https://172.23.96.31 -u Administrator -p password --server-add-username Administrator --server-add-password password --services index

[2021-05-15T17:36:36-07:00, sequoiatools/couchbase-cli:7.0:6951cf] rebalance -c 172.23.104.16:8091 -u Administrator -p password

[2021-05-15T17:36:41-07:00, sequoiatools/cmd:e19b37] 60

[2021-05-15T17:37:47-07:00, sequoiatools/cmd:622ca9] 300

[pull] vijayviji/sshpass

[2021-05-15T17:43:21-07:00, vijayviji/sshpass:fbd7e7] sshpass -p couchbase ssh -o StrictHostKeyChecking=no root@172.23.97.77 kill -SIGKILL $(pgrep indexer)

→

Error occurred on container - sequoiatools/couchbase-cli:7.0:[rebalance -c 172.23.104.16:8091 -u Administrator -p password]

docker logs 6951cf

docker start 6951cf

Unable to display progress bar on this os

ERROR: Rebalance failed. See logs for detailed reason. You can try again.

[2021-05-15T17:43:26-07:00, sequoiatools/cmd:cb2101] 420

[2021-05-15T17:50:32-07:00, appropriate/curl:e82955] -s -u Administrator:password 172.23.104.16:8091/pools/default/rebalanceProgress
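
The last command polls `/pools/default/rebalanceProgress`. A hang like the one described here could be flagged by comparing successive polls: if the rebalance is still reported as running and no per-node progress value has changed for longer than some threshold, it is probably stuck. The sketch below works on already-fetched samples, so it needs no cluster. The response shape (a `status` field plus per-node objects carrying a `progress` value), the helper name `stalled_for`, and the sample values are assumptions for illustration, not captured output.

```python
def stalled_for(samples):
    """Seconds for which progress has been unchanged while still 'running'.

    samples: list of (unix_seconds, response_dict) tuples, oldest first.
    Returns 0.0 if there are no samples or the latest one is not running.
    """
    if not samples or samples[-1][1].get("status") != "running":
        return 0.0

    def progress(resp):
        # Assumption: every dict-valued field is a per-node progress object.
        return {k: v.get("progress") for k, v in resp.items() if isinstance(v, dict)}

    last_ts, last_resp = samples[-1]
    latest = progress(last_resp)
    since = last_ts
    # Walk backwards until the progress snapshot differs from the latest one.
    for ts, resp in reversed(samples[:-1]):
        if resp.get("status") != "running" or progress(resp) != latest:
            break
        since = ts
    return float(last_ts - since)


# Hypothetical polls, one tuple per curl call (timestamps in seconds).
polls = [
    (0,    {"status": "running", "ns_1@172.23.96.31": {"progress": 0.50}}),
    (60,   {"status": "running", "ns_1@172.23.96.31": {"progress": 0.90}}),
    (120,  {"status": "running", "ns_1@172.23.96.31": {"progress": 0.90}}),
    (3660, {"status": "running", "ns_1@172.23.96.31": {"progress": 0.90}}),
]

if stalled_for(polls) >= 3600:
    print("rebalance progress unchanged for at least an hour")
```

Taking samples as input rather than fetching inside the function keeps the check easy to test and lets the same logic run over progress values scraped from logs after the fact.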

This issue could be similar to ~~MB-46319~~, but the builds are different, and so are the tests.

Indexer nodes in the cluster : 172.23.121.165, 172.23.96.30, 172.23.96.31, 172.23.97.77, 172.23.97.82, 172.23.97.83

The latest getIndexStatus output is attached. Also, the logs are from ~2 AM on 5/16. Let me know if you need logs from before or after this time.

Attachments

- cpu.svg (144 kB, 27/May/21 9:21 AM)
- profile001.svg (134 kB, 17/May/21 1:41 PM)

Issue Links

relates to

MB-48189 Indexer upgrade to golang 1.16.5

Closed

MB-48190 Set environment variable GODEBUG=madvdontneed=1 for indexer

Closed

Gerrit Reviews

For Gerrit Dashboard: MB-46323
#	Subject	Branch	Project	Status	Code-Review	Verified
154410,2	MB-46323: use jemalloc and golang memorys stats for process memory approximation	unstable	plasma	Status: NEW	+1	0
154582,2	MB-46323: Fix a minor issue in reporting IO usage in aggregated stats	unstable	plasma	Status: MERGED	+2	+1
154751,4	MB-46323 Log MSpanInuse, MSpanSys, StackInuse with memstats	unstable	indexing	Status: MERGED	+2	+1
155679,3	MB-46323 Move indexer and projector go-runtime to version 1.16.5	unstable	indexing	Status: MERGED	+2	+1
157332,1	MB-46323 [BP to 7.0.1] Move indexer and projector go-runtime to version 1.16.5	cheshire-cat	indexing	Status: MERGED	+2	+1
159348,2	MB-46323 reduce channel size for stream reader	unstable	indexing	Status: MERGED	+2	+1
160073,1	Revert "MB-46323 [BP to 7.0.1] Move indexer and projector go-runtime to version 1.16.5"	cheshire-cat	indexing	Status: ABANDONED	0	0
160074,1	Revert "MB-44731 Move indexer and projector go-runtime to version 1.16.5"	unstable	indexing	Status: ABANDONED	+2	+1
160075,2	Revert "MB-44731 [BP to 7.0.1] Move indexer and projector go-runtime to version 1.16.5"	cheshire-cat	indexing	Status: MERGED	+2	+1
160082,2	Revert "MB-46323 reduce channel size for stream reader"	unstable	indexing	Status: MERGED	+2	+1
160331,4	Revert "Revert "MB-46323 reduce channel size for stream reader""	unstable	indexing	Status: MERGED	+2	+1