Details
-
Bug
-
Status: Closed
-
Critical
-
Resolution: Fixed
-
Cheshire-Cat
-
Untriaged
-
Centos 64-bit
-
-
1
-
Yes
Description
System Test:
Eventing handlers deployment:
[2021-03-17T16:06:53-07:00, sequoiatools/eventing:7.0:a87aaf] eventing_helper.py -i 172.23.105.183 -u Administrator -p password -s default.event_0.coll0 -m ITEM.event_0.coll0 -d dst_bucket.NEW_ORDER.event_0.coll0.rw -t timers -o create --name timers
[2021-03-17T16:07:01-07:00, sequoiatools/eventing:7.0:dfee66] eventing_helper.py -i 172.23.105.183 -u Administrator -p password -s default.event_0.coll0 -m ITEM.event_0.coll1 -d dst_bucket.NEW_ORDER.event_0.coll1.rw -t n1ql -o create --name n1ql
[2021-03-17T16:07:08-07:00, sequoiatools/eventing:7.0:dd9a54] eventing_helper.py -i 172.23.105.183 -u Administrator -p password -s WAREHOUSE.event_0.coll0 -m ITEM.event_0.coll2 -d dst_bucket.WAREHOUSE.event_0.coll0.rw -t sbm -o create --name sbm
[2021-03-17T16:07:17-07:00, sequoiatools/eventing:7.0:ef5686] eventing_helper.py -i 172.23.105.183 -u Administrator -p password -s WAREHOUSE.event_0.coll0 -m ITEM.event_0.coll3 -d dst_bucket.NEW_ORDER.event_0.coll2.rw -t curl -o create --name curl
[2021-03-17T16:07:25-07:00, sequoiatools/eventing:7.0:b3959c] eventing_helper.py -i 172.23.105.183 -u Administrator -p password -o deploy
[2021-03-17T16:07:30-07:00, sequoiatools/eventing:7.0:d2d34d] eventing_helper.py -i 172.23.105.183 -u Administrator -p password -o wait_for_state --state deployed
At this point there should be no data in the collections.
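The `wait_for_state --state deployed` step above implies some form of polling. The ticket does not show how `eventing_helper.py` implements it, so the following is only a minimal sketch of a poll with a deadline. The function name, its parameters, and the injected `fetch_states` callable (assumed to return a mapping of handler name to state) are all hypothetical.

```python
import time


def wait_for_state(fetch_states, wanted, timeout_s=300.0, interval_s=5.0,
                   clock=time.monotonic, sleep=time.sleep):
    """Poll until every handler reports `wanted` or the deadline passes.

    `fetch_states` is a hypothetical callable returning {handler_name: state}.
    Returns the sorted list of handlers that never reached `wanted`
    (an empty list means success).
    """
    deadline = clock() + timeout_s
    while True:
        states = fetch_states()
        pending = sorted(name for name, state in states.items() if state != wanted)
        if not pending:
            return []
        if clock() >= deadline:
            # Report the stragglers instead of silently continuing the test.
            return pending
        sleep(interval_s)
```

Returning the handlers that are still pending, rather than a bare boolean, would make it visible in the test log that a handler such as `timers` never left the deploying state.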
Current step:
[2021-03-17T21:13:59-07:00, sequoiatools/couchbase-cli:7.0:e63846] server-add -c 172.23.104.232:8091 --server-add https://172.23.104.244 -u Administrator -p password --server-add-username Administrator --server-add-password password --services data
[2021-03-17T21:14:11-07:00, sequoiatools/couchbase-cli:7.0:d5525c] rebalance -c 172.23.104.232:8091 --server-remove 172.23.105.25 -u Administrator -p password
→ Error occurred on container - sequoiatools/couchbase-cli:7.0:[rebalance -c 172.23.104.232:8091 --server-remove 172.23.105.25 -u Administrator -p password]
docker logs d5525c
docker start d5525c
*Unable to display progress bar on this os
JERROR: Rebalance failed. See logs for detailed reason. You can try again.
Rebalance Failed -
Rebalance exited with reason {service_rebalance_failed,eventing,
{worker_died,
{'EXIT',<0.1247.496>,
{{badmatch,
{error,
{bad_nodes,eventing,prepare_rebalance,
[{'ns_1@172.23.104.214',
{error,
{unknown_error,
<<"Some apps are deploying or resuming on nodeId: d0c98164b79fbb3e57b4808d7c71ef3b Apps: map[timers_0:2021-03-17 16:07:27.988668711 -0700 PDT m=+2440.940195591]">>}}}]}}},
[{service_rebalancer,rebalance_worker,1,
[{file,"src/service_rebalancer.erl"},
{line,164}]},
{proc_lib,init_p,3,
[{file,"proc_lib.erl"},{line,234}]}]}}}}.
Rebalance Operation Id = 6c309e9434208e5789108f241ad43a4b
All 4 handlers above are undeployed.
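The binary string inside the rebalance failure names the eventing node and the apps that were still deploying or resuming. For triage it can help to pull those out mechanically. The sketch below assumes the message keeps the shape seen in this ticket (a Go-formatted map of app name to start time). `parse_stuck_apps` is an illustrative helper, not part of any Couchbase tool.

```python
import re

# Shape assumed from the single message quoted in this ticket.
NODE_RE = re.compile(r"nodeId: (?P<node>[0-9a-f]+) Apps: map\[(?P<apps>.*)\]")
APP_RE = re.compile(
    r"(?P<name>[\w-]+):"
    r"(?P<started>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}(?:\.\d+)? [+-]\d{4} \w+)"
)


def parse_stuck_apps(message):
    """Return (node_id, {app_name: start_timestamp}) or None if nothing matches."""
    m = NODE_RE.search(message)
    if m is None:
        return None
    apps = {a.group("name"): a.group("started")
            for a in APP_RE.finditer(m.group("apps"))}
    return m.group("node"), apps
```

Applied to the message above, this yields the node ID `d0c98164b79fbb3e57b4808d7c71ef3b` and a single app, `timers_0`, whose deployment started at 16:07:27, roughly five hours before the rebalance attempt.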
Attachments
Issue Links
- relates to: MB-45722 Remove the 'replace' directives for gocb/gocbcore in go.mod (Closed)
Activity
Field | Original Value | New Value |
---|---|---|
Assignee | Jeelan Poola [ jeelan.poola ] | Ankit Prabhu [ ankit.prabhu ] |
Attachment | eventing_pprof.log [ 131500 ] | |
Attachment | goroutine5.out [ 131501 ] |
Assignee | Ankit Prabhu [ ankit.prabhu ] | Brett Lawson [ brett19 ] |
Component/s | clients [ 10042 ] | |
Component/s | eventing [ 14026 ] |
Assignee | Brett Lawson [ brett19 ] | Vikas Chaudhary [ vikas.chaudhary ] |
Assignee | Vikas Chaudhary [ vikas.chaudhary ] | Charles Dixon [ charles.dixon ] |
Assignee | Charles Dixon [ charles.dixon ] | Ankit Prabhu [ ankit.prabhu ] |
Priority | Critical [ 2 ] | Blocker [ 1 ] |
Priority | Blocker [ 1 ] | Test Blocker [ 6 ] |
Labels | system-test | affects-cc-testing system-test |
Assignee | Ankit Prabhu [ ankit.prabhu ] | Vikas Chaudhary [ vikas.chaudhary ] |
Assignee | Vikas Chaudhary [ vikas.chaudhary ] | Ankit Prabhu [ ankit.prabhu ] |
Assignee | Ankit Prabhu [ ankit.prabhu ] | Charles Dixon [ charles.dixon ] |
Labels | affects-cc-testing system-test | affects-cc-testing functional-test system-test |
Labels | affects-cc-testing functional-test system-test | affects-cc-testing functional-test performance system-test |
Assignee | Charles Dixon [ charles.dixon ] | Ankit Prabhu [ ankit.prabhu ] |
Priority | Test Blocker [ 6 ] | Critical [ 2 ] |
Assignee | Ankit Prabhu [ ankit.prabhu ] | Charles Dixon [ charles.dixon ] |
Assignee | Charles Dixon [ charles.dixon ] | Jeelan Poola [ jeelan.poola ] |
Assignee | Jeelan Poola [ jeelan.poola ] | Brett Lawson [ brett19 ] |
Assignee | Brett Lawson [ brett19 ] | Ankit Prabhu [ ankit.prabhu ] |
Assignee | Ankit Prabhu [ ankit.prabhu ] | Vikas Chaudhary [ vikas.chaudhary ] |
Assignee | Vikas Chaudhary [ vikas.chaudhary ] | Ankit Prabhu [ ankit.prabhu ] |
Assignee | Ankit Prabhu [ ankit.prabhu ] | Pablo Silberkasten [ JIRAUSER25235 ] |
Assignee | Pablo Silberkasten [ JIRAUSER25235 ] | Arunkumar Senthilnathan [ arunkumar ] |
Resolution | Fixed [ 1 ] | |
Status | Open [ 1 ] | Closed [ 6 ] |
Fix Version/s | 7.0.0 [ 17233 ] |
Fix Version/s | Cheshire-Cat [ 15915 ] |
The timers_0 function is stuck in bootstrapping.
According to the eventing logs, it tries to open a connection to the metadata bucket using gocb, but the WaitUntilReady operation times out:
1793:2021-03-17T16:08:38.945-07:00 [Error] Consumer::gocbConnectMetaBucketCallback [worker_timers_0_0:1] Failed to connect to metadata bucket ITEM (bucket got deleted?) , err: unambiguous timeout | {"InnerError":{"InnerError":{"InnerError":{},"Message":"unambiguous timeout"}},"OperationID":"WaitUntilReady","Opaque":"","TimeObserved":5000155810,"RetryReasons":["NOT_READY"],"RetryAttempts":10,"LastDispatchedTo":"","LastDispatchedFrom":"","LastConnectionID":""}
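The JSON appended to that log line carries the details of the timeout. A small sketch for decoding it follows. It assumes the detail is always separated from the message by `" | "` and that `TimeObserved` is a Go duration serialized in nanoseconds. `parse_gocb_timeout` is an illustrative name, not a gocb or eventing API.

```python
import json


def parse_gocb_timeout(log_line):
    """Decode the JSON error detail that follows the last ' | ' in a log line."""
    _, sep, payload = log_line.rpartition(" | ")
    if not sep:
        return None
    detail = json.loads(payload)
    return {
        "operation": detail.get("OperationID"),
        # Assumption: TimeObserved is nanoseconds (Go time.Duration).
        "seconds_observed": detail.get("TimeObserved", 0) / 1e9,
        "retry_reasons": detail.get("RetryReasons", []),
        "retry_attempts": detail.get("RetryAttempts", 0),
        "last_dispatched_to": detail.get("LastDispatchedTo", ""),
    }
```

Under those assumptions, the line above decodes to a `WaitUntilReady` call that gave up after about 5 seconds and 10 retries, all for the reason `NOT_READY`, with an empty `LastDispatchedTo`.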
The pprof output suggests it is stuck closing the connection, which blocks deployment of the function:
1 @ 0x93b320 0x90fb48 0x90fb1e 0x90f80b 0xd5ad86 0xd60d4b 0xdd5f42 0xdd75f4 0x1142c06 0xe412ba 0x11312f6 0xe656e3 0x969351
# 0xd5ad85 github.com/couchbase/gocbcore/v9.(*Agent).Close+0xc5 /tmp/workspace/toy-unix-simple/godeps/src/github.com/couchbase/gocbcore/v9/agent.go:495
# 0xd60d4a github.com/couchbase/gocbcore/v9.(*AgentGroup).Close+0x11a /tmp/workspace/toy-unix-simple/godeps/src/github.com/couchbase/gocbcore/v9/agentgroup.go:115
# 0xdd5f41 github.com/couchbase/gocb/v2.(*stdConnectionMgr).close+0x141 /tmp/workspace/toy-unix-simple/godeps/src/github.com/couchbase/gocb/v2/client.go:262
# 0xdd75f3 github.com/couchbase/gocb/v2.(*Cluster).Close+0xf3 /tmp/workspace/toy-unix-simple/godeps/src/github.com/couchbase/gocb/v2/cluster.go:379
# 0x1142c05 github.com/couchbase/eventing/consumer.glob..func2+0x4d5 /tmp/workspace/toy-unix-simple/goproj/src/github.com/couchbase/eventing/consumer/bucket_ops.go:89
# 0xe412b9 github.com/couchbase/eventing/util.Retry+0x129 /tmp/workspace/toy-unix-simple/goproj/src/github.com/couchbase/eventing/util/retry.go:65
# 0x11312f5 github.com/couchbase/eventing/consumer.(*Consumer).Serve+0x5d5 /tmp/workspace/toy-unix-simple/goproj/src/github.com/couchbase/eventing/consumer/v8_consumer.go:196
# 0xe656e2 github.com/couchbase/eventing/suptree.(*Supervisor).runService.func1+0x72 /tmp/workspace/toy-unix-simple/goproj/src/github.com/couchbase/eventing/suptree/supervisor.go:413
There are no more log messages after 16:08:38, so it appears to have been stuck since then.
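When comparing many goroutine dumps, it can be convenient to reduce each stack to its call chain. The sketch below parses the `# 0xADDR func+0xOFF file:line` frame lines in the form shown above. The regular expression is inferred from this one dump only, and `parse_frames` is an illustrative helper.

```python
import re

# Frame layout inferred from the dump quoted in this ticket.
FRAME_RE = re.compile(
    r"^#\s+0x[0-9a-f]+\s+(?P<func>\S+)\+0x[0-9a-f]+\s+(?P<file>\S+):(?P<line>\d+)$"
)


def parse_frames(text):
    """Return [(function, file, line), ...] in the order listed (innermost first)."""
    frames = []
    for raw in text.splitlines():
        m = FRAME_RE.match(raw.strip())
        if m:
            frames.append((m.group("func"), m.group("file"), int(m.group("line"))))
    return frames
```

For the stack above, the first frame returned is `github.com/couchbase/gocbcore/v9.(*Agent).Close` at `agent.go:495`, which is where the goroutine is blocked, and the last is the supervisor's `runService.func1`.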
goroutine5.out
https://github.com/couchbase/gocbcore/blob/e48d03a40861100a01753e1277952abaa0bce343/agent.go#L495
Could someone from the gocb team take a look and check why it is stuck closing the connection, and why the timeout occurs?
goroutine dump: eventing_pprof.log