Details
- Bug
- Resolution: Fixed
- Critical
- 7.2.5, 7.6.1
- 1.0.0-2203
- Untriaged
- 0
- Unknown
- Analytics Sprint 46, Analytics Sprint 47
Description
The cluster had run through one full cycle of the system test. It is a 4-node cluster (4 vCPUs + 64 GB) and it ingested about 10 billion items.
Workload:

| Type | Number of collections | Items per collection (millions) | Total items (millions) |
|---|---|---|---|
| Remote | 80 | 75 | 6000 |
| Standalone | 50 | 8 | 4000* |
| Kafka | 5 | 10 | 50 |

*Some standalone collections have 8 million items and others have multiples of 8 million; the total standalone document count is 4000 million (4 billion) items. Across all three types this adds up to roughly 10,050 million items, which matches the ~10 billion total above.
Number of links = 6 (2 remote + 2 external + 2 Kafka). One remote link and one Kafka link are active.
The cluster went through scaling operations: from 4 to 8 to 16 to 32 nodes, then back down to 8 and finally 4 nodes.
The second cycle repeats the same workload, but remote ingestion is very slow: it has been almost 18 hours and ingestion is still not complete. In comparison, remote ingestion during the first cycle completed in around 6 to 8 hours.
There are still a number of datasets where ingestion is not complete. Some examples (these counts can be rechecked with the sketch below):

Database0cFsFELXI.scope0NPwGeHgC.remotedatasetCuxGntPc = 52928724
Database0cFsFELXI.scope0NPwGeHgC.remotedatasetSdKQaBRi = 52934946
Database0cFsFELXI.scope0NPwGeHgC.remotedatasetStiHRVEF = 52941446
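For anyone re-checking these numbers, a minimal count-query sketch, assuming the Analytics query endpoint on port 8095 and basic auth; the host, user, and password below are placeholders, not the actual test values:

```python
import requests

# Placeholder endpoint and credentials -- not the actual test cluster values.
SERVICE = "http://svc-da-node-006:8095/analytics/service"
AUTH = ("Administrator", "password")

datasets = [
    "Database0cFsFELXI.scope0NPwGeHgC.remotedatasetCuxGntPc",
    "Database0cFsFELXI.scope0NPwGeHgC.remotedatasetSdKQaBRi",
    "Database0cFsFELXI.scope0NPwGeHgC.remotedatasetStiHRVEF",
]

# Run SELECT COUNT(*) against each remote dataset and print the count ingested so far.
for ds in datasets:
    r = requests.post(
        SERVICE,
        data={"statement": f"SELECT COUNT(*) AS cnt FROM {ds};"},
        auth=AUTH,
        timeout=300,
    )
    r.raise_for_status()
    print(ds, r.json()["results"][0]["cnt"])
```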
Before creating the second batch of collections, the link was disconnected, then all the remote datasets were created, and then the link was reconnected.
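For context, the ordering was roughly the following. This is only an illustrative sketch of the sequence, not the test harness's actual DDL: the endpoint and credentials are placeholders, the link name is taken from the log entityId below and may need fuller qualification, and the CREATE statements for the second batch are omitted.

```python
import requests

# Placeholder endpoint and credentials -- not the actual test cluster values.
SERVICE = "http://svc-da-node-006:8095/analytics/service"
AUTH = ("Administrator", "password")

def run(statement: str) -> dict:
    # Submit one SQL++ statement to the Analytics service and fail loudly on HTTP errors.
    r = requests.post(SERVICE, data={"statement": statement}, auth=AUTH, timeout=300)
    r.raise_for_status()
    return r.json()

# 1. Disconnect the remote link before running DDL for the second batch.
run("DISCONNECT LINK linkIUWEhdXs;")

# 2. Create all remote datasets for the second batch here
#    (exact CREATE statements omitted; they mirror the first cycle's DDL).

# 3. Reconnect the link so ingestion for the new datasets starts.
run("CONNECT LINK linkIUWEhdXs;")
```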
I see messages like these on node 006:

"entityId":"linkIUWEhdXs/default1", "state":"STARTING", "prev state":"STOPPED", "suspended":false})
2024-07-10T18:01:12.523+00:00

"entityId":"linkIUWEhdXs/default1", "state":"STARTING", "prev state":"STOPPED", "suspended":false})
2024-07-10T17:52:55.357+00:00 INFO CBAS.adapter.CouchbaseConnector [cbas:linkIUWEhdXs:default1:f8bbb0059527fb8c59160733f2baae59:0 idle connection watchdog] will notify CC on idle streams after 120 seconds
Unsure if this indicates any problems. Also, the /analytics/status/ingestion API would intermittently return responses like this:
{
  "links": [
    {
      "name": "linkIUWEhdXs",
      "status": "stopped",
      "state": []
    }
  ]
}
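A minimal sketch of how that intermittent response can be captured, assuming the endpoint is served on port 8095 with basic auth; the host, credentials, and polling interval below are placeholders:

```python
import time
import requests

# Placeholder endpoint and credentials -- not the actual test cluster values.
BASE = "http://svc-da-node-006:8095"
AUTH = ("Administrator", "password")

# Poll /analytics/status/ingestion and record every time a link reports "stopped",
# to see how often the link state flips during the slow ingestion window.
for _ in range(60):
    r = requests.get(f"{BASE}/analytics/status/ingestion", auth=AUTH, timeout=30)
    r.raise_for_status()
    for link in r.json().get("links", []):
        if link.get("status") == "stopped":
            print(time.strftime("%Y-%m-%dT%H:%M:%S"), link.get("name"), link.get("status"))
    time.sleep(60)
```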
cbcollect ->
https://cb-engineering.s3.amazonaws.com/SysTestColumnarJul9/collectinfo-2024-07-11T090009-ns_1%40svc-da-node-006.adewm3olqtoa4vfw.sandbox.nonprod-project-avengers.com.zip
https://cb-engineering.s3.amazonaws.com/SysTestColumnarJul9/collectinfo-2024-07-11T090009-ns_1%40svc-da-node-008.adewm3olqtoa4vfw.sandbox.nonprod-project-avengers.com.zip
https://cb-engineering.s3.amazonaws.com/SysTestColumnarJul9/collectinfo-2024-07-11T090009-ns_1%40svc-da-node-016.adewm3olqtoa4vfw.sandbox.nonprod-project-avengers.com.zip
https://cb-engineering.s3.amazonaws.com/SysTestColumnarJul9/collectinfo-2024-07-11T090009-ns_1%40svc-da-node-022.adewm3olqtoa4vfw.sandbox.nonprod-project-avengers.com.zip
Remote cluster logs ->
https://cb-engineering.s3.amazonaws.com/RemoteClusterMB62863/collectinfo-2024-07-11T102339-ns_1%40svc-d-node-001.cbzexddeqouqo8iv.sandbox.nonprod-project-avengers.com.zip
https://cb-engineering.s3.amazonaws.com/RemoteClusterMB62863/collectinfo-2024-07-11T102339-ns_1%40svc-d-node-002.cbzexddeqouqo8iv.sandbox.nonprod-project-avengers.com.zip
https://cb-engineering.s3.amazonaws.com/RemoteClusterMB62863/collectinfo-2024-07-11T102339-ns_1%40svc-d-node-003.cbzexddeqouqo8iv.sandbox.nonprod-project-avengers.com.zip
Attachments
Issue Links
- Clones: MB-62683 [System Test] Ingestion is slow - link state seems to change to "stopped" over and over (Closed)