Details
-
Bug
-
Resolution: Cannot Reproduce
-
Critical
-
7.2.0
-
7.2.0-5285
-
Untriaged
-
Centos 64-bit
-
-
0
-
Unknown
Description
Note:
While debugging MB-56318, Donald Haggart noticed that the system was hung because of
goroutine 920054 [select, 137 minutes]:
net/http.(*persistConn).roundTrip(0xc021666240, 0xc014b47f00)
    /home/couchbase/.cbdepscache/exploded/x86_64/go-1.19.7/go/src/net/http/transport.go:2620 +0x974
net/http.(*Transport).roundTrip(0x4046ac0, 0xc006ce5e00)
    /home/couchbase/.cbdepscache/exploded/x86_64/go-1.19.7/go/src/net/http/transport.go:595 +0x7ba
net/http.(*Transport).RoundTrip(0xeeafdf?, 0x0?)
    /home/couchbase/.cbdepscache/exploded/x86_64/go-1.19.7/go/src/net/http/roundtrip.go:17 +0x19
github.com/couchbase/cbauth.(*cbauthRoundTripper).RoundTrip(0xc00039c1c0, 0xc006ce5d00)
    /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/cbauth/convenience.go:76 +0x43f
net/http.send(0xc006ce5d00, {0x2d31b60, 0xc00039c1c0}, {0x26415c0?, 0x1?, 0x0?})
    /home/couchbase/.cbdepscache/exploded/x86_64/go-1.19.7/go/src/net/http/client.go:251 +0x5f7
net/http.(*Client).send(0xc000190d50, 0xc006ce5d00, {0x0?, 0xeeafdf?, 0x0?})
    /home/couchbase/.cbdepscache/exploded/x86_64/go-1.19.7/go/src/net/http/client.go:175 +0x9b
net/http.(*Client).do(0xc000190d50, 0xc006ce5d00)
    /home/couchbase/.cbdepscache/exploded/x86_64/go-1.19.7/go/src/net/http/client.go:715 +0x8fc
net/http.(*Client).Do(...)
    /home/couchbase/.cbdepscache/exploded/x86_64/go-1.19.7/go/src/net/http/client.go:581
github.com/couchbase/cbauth/metakv.doCallInner(0xc0003948f0, {0x2706758, 0x3}, {0x272d225?, 0x40?}, 0x0)
    /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/cbauth/metakv/metakv.go:102 +0x30e
github.com/couchbase/cbauth/metakv.doCall(0xc00291b360?, {0x2706758?, 0xeeb327?}, {0x272d225?, 0x2553ee0?}, 0x261c201?)
    /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/cbauth/metakv/metakv.go:119 +0x65
github.com/couchbase/cbauth/metakv.doJSONCall(0xc02c9a7470?, {0x2706758?, 0xc004a42a60?}, {0x272d225?, 0xffffffffffffffff?}, 0x7b?, {0x22f94e0, 0xc003a7caf0})
    /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/cbauth/metakv/metakv.go:130 +0x2d
github.com/couchbase/cbauth/metakv.(*store).get(0xc00291b4a0?, {0x272d225, 0x19})
    /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/cbauth/metakv/metakv.go:171 +0xa9
github.com/couchbase/cbauth/metakv.Get(...)
    /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/cbauth/metakv/metakv.go:333
github.com/couchbase/indexing/secondary/common.GetSettingsConfig.func1(0xf33469?, {0xc00291b540?, 0xfd098a?})
    /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/indexing/secondary/common/settings.go:37 +0x19e
github.com/couchbase/indexing/secondary/common.(*RetryHelper).Run(0xc00291b570)
    /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/indexing/secondary/common/retry_helper.go:36 +0x83
github.com/couchbase/indexing/secondary/common.GetSettingsConfig(0x2724553?)
    /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/indexing/secondary/common/settings.go:49 +0x90
github.com/couchbase/indexing/secondary/queryport/n1ql.NewGSIIndexer2({0xc0150840f0, 0x15}, {0x270c095, 0x7}, {0xc02333b728, 0xf}, {0xc01a379068, 0x8}, {0xc015084078, 0x11}, ...)
    /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/indexing/secondary/queryport/n1ql/secondary_index.go:170 +0x3e9
github.com/couchbase/query/datastore/couchbase.(*collection).loadIndexes(0xc031853e00)
    /home/couchbase/jenkins/workspace/couchbase-server-
which has been holding the collection's lock for 137 minutes. (Mutex is locked at the start of loadIndexes().)
Based on Donald Haggart's suggestion, logging a separate bug.
Steps:
- Create a 2 KV and 1 index/query node cluster.
- Create a magma bucket (replicas=1) and collections (total collection count, including default collections, is 51)
- Create 500000000 items sequentially(After creation of few thousands of documents update
- Update the 500000000 items created in the step above
- Create 500000000 items sequentially
- Update the 500000000 items created in the step above
- Create five indexes. Wait for index building to complete.
- Rebalance in KV while loading docs. (Rebalance completed successfully)
- Rebalance out KV while loading docs. (Rebalance completed successfully)
- Rebalance in/out KV while loading docs.
- Pause the rebalance and enable CDC (bucket_history_retention_seconds=259200, bucket_history_retention_bytes=10000000000000)
- Trigger the in/out KV rebalance again while loading docs. (Rebalance completed successfully)
- Gracefully fail over a node, add a node, and trigger a rebalance (a swap rebalance)
- Rebalance exited with reason {service_rebalance_failed,index,
{agent_died,<34340.5783.0>,
QE-TEST:
guides/gradlew --refresh-dependencies testrunner -P jython=/opt/jython/bin/jython -P 'args=-i /tmp/ankush_temp_job3.ini -p bucket_storage=magma,bucket_eviction_policy=fullEviction,rerun=False -t aGoodDoctor.Hospital.Murphy.ClusterOpsVolume,nodes_init=2,graceful=True,skip_cleanup=True,num_items=50000000,num_buckets=1,bucket_names=GleamBook,doc_size=1024,bucket_type=membase,eviction_policy=fullEviction,iterations=5,batch_size=1000,sdk_timeout=60,log_level=info,infra_log_level=error,rerun=False,skip_cleanup=True,key_size=18,randomize_doc_size=False,randomize_value=True,assert_crashes_on_load=True,num_collections=50,maxttl=10,num_indexes=5,pc=10,index_nodes=1,cbas_nodes=0,fts_nodes=0,ops_rate=200000,ramQuota=102400,doc_ops=create:update:delete:read,mutation_perc=100,rebl_ops_rate=30000,key_type=RandomKey -m rest'