Details

Type: Bug
Resolution: Fixed
Priority: Critical
Fix Version/s: 7.2.6, 7.6.4
Affects Version/s: 7.2.5, 7.6.1
Component/s: analytics
Labels:
Environment:
1.0.0-2203

Triage:
Untriaged
Story Points:
0
Is this a Regression?:
Unknown
Sprint:
Analytics Sprint 46, Analytics Sprint 47

Description

The cluster had completed one full cycle of the system test. It is a 4-node cluster (4 vCPUs + 64 GB) that ingested about 10 billion items.

Workload -

Type	Number of collections	Items per collection (millions)	Total items (millions)
Remote	80	75	6000
Standalone	50	8	4000*
Kafka	5	10	50

*Some standalone collections have 8 million items and others have multiples of 8 million. The total document count is 4000 million (4 billion) items.
Number of links = 6 (2 remote + 2 external + 2 Kafka). One remote link and one Kafka link are active.
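
As a quick cross-check of the "about 10 billion items" figure, the workload table can be summed as in the sketch below. The dictionary and function names are illustrative only; the standalone total is taken as stated in the footnote rather than derived, because the per-collection sizes vary.

```python
# Workload figures from the table above (all counts in millions of items).
# Names here are illustrative, not part of any Couchbase tooling.
WORKLOAD_TOTALS_MILLIONS = {
    "Remote": 80 * 75,   # 80 collections x 75 million items each
    "Standalone": 4000,  # stated total; collection sizes vary in multiples of 8 million
    "Kafka": 5 * 10,     # 5 collections x 10 million items each
}


def total_items_millions(totals):
    """Sum the per-type totals (in millions of items)."""
    return sum(totals.values())


total = total_items_millions(WORKLOAD_TOTALS_MILLIONS)
print(f"{total} million items (~{total / 1000:.2f} billion)")
```

The sum is 10050 million, consistent with the roughly 10 billion items quoted in the description.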

The cluster went through scaling operations: from 4 to 8 to 16 to 32 nodes, then back down to 8 and 4 nodes.

The second cycle was to repeat the same workload.

However, remote ingestion in the second cycle is very slow. After almost 18 hours, ingestion is still not complete; during the first cycle, remote ingestion completed in around 6 to 8 hours.
Several datasets are still not fully ingested. Some examples:

Database0cFsFELXI.scope0NPwGeHgC.remotedatasetCuxGntPc = 52928724

Database0cFsFELXI.scope0NPwGeHgC.remotedatasetSdKQaBRi = 52934946

Database0cFsFELXI.scope0NPwGeHgC.remotedatasetStiHRVEF = 52941446
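
To put these counts in context, the sketch below estimates how far along each dataset is. It assumes each remote dataset is expected to reach 75 million items, which is the per-collection figure from the workload table; the ticket does not state the expected count for these specific datasets, so treat the target as an assumption.

```python
# Assumption: each remote dataset should reach the 75 million items listed
# per remote collection in the workload table.
EXPECTED_ITEMS = 75_000_000

# Counts copied from the examples above (dataset name suffix only).
observed_counts = {
    "remotedatasetCuxGntPc": 52_928_724,
    "remotedatasetSdKQaBRi": 52_934_946,
    "remotedatasetStiHRVEF": 52_941_446,
}


def completion_percent(count, expected=EXPECTED_ITEMS):
    """Return the ingested share of the expected item count, in percent."""
    return 100.0 * count / expected


for name, count in observed_counts.items():
    print(f"{name}: {completion_percent(count):.1f}% of expected items")
```

Under that assumption, all three datasets sit at roughly 70.6% of the target after almost 18 hours.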

Before the second batch of collections was created, the link was disconnected; all the remote datasets were then created, and the link was reconnected.

I see messages like these on node 006:

"entityId":"linkIUWEhdXs/default1", "state":"STARTING", "prev state":"STOPPED", "suspended":false})

2024-07-10T18:01:12.523+00:00

"entityId":"linkIUWEhdXs/default1", "state":"STARTING", "prev state":"STOPPED", "suspended":false})

2024-07-10T17:52:55.357+00:00 INFO CBAS.adapter.CouchbaseConnector [cbas:linkIUWEhdXs:default1:f8bbb0059527fb8c59160733f2baae59:0 idle connection watchdog] will notify CC on idle streams after 120 seconds
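
One way to see how often the link cycles through these states is to count the transitions in the log. The sketch below is a hypothetical helper, not part of the product: the regular expression is written against the fragment format shown above, and the real log lines may carry extra fields before `entityId`.

```python
import re
from collections import Counter

# Matches the fragment format quoted above; this pattern is an assumption
# based on those excerpts, not a documented log schema.
TRANSITION_RE = re.compile(
    r'"entityId":"(?P<entity>[^"]+)",\s*'
    r'"state":"(?P<state>[^"]+)",\s*'
    r'"prev state":"(?P<prev>[^"]+)"'
)


def count_transitions(lines):
    """Count (entity, previous state, new state) transitions in log lines."""
    counts = Counter()
    for line in lines:
        match = TRANSITION_RE.search(line)
        if match:
            counts[(match["entity"], match["prev"], match["state"])] += 1
    return counts


sample_lines = [
    '"entityId":"linkIUWEhdXs/default1", "state":"STARTING", "prev state":"STOPPED", "suspended":false})',
    '"entityId":"linkIUWEhdXs/default1", "state":"STARTING", "prev state":"STOPPED", "suspended":false})',
    "2024-07-10T17:52:55.357+00:00 INFO CBAS.adapter.CouchbaseConnector "
    "[cbas:linkIUWEhdXs:default1:f8bbb0059527fb8c59160733f2baae59:0 idle connection watchdog] "
    "will notify CC on idle streams after 120 seconds",
]

for (entity, prev, state), n in count_transitions(sample_lines).items():
    print(f"{entity}: {prev} -> {state} x{n}")
```

A high count of STOPPED to STARTING transitions for the same entity over a short window would support the suspicion that the link is being restarted repeatedly.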

I am unsure whether this indicates a problem. Also, the /analytics/status/ingestion API intermittently returns responses like this:

    "links": [
        {
            "name": "linkIUWEhdXs",
            "status": "stopped",
            "state": []
        }
    ]
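
A monitoring script could flag this condition from an already-fetched response body. The sketch below assumes the response is a JSON object with a top-level `links` array whose entries carry `name`, `status`, and `state`, as in the excerpt; the function name and the sample payload are illustrative, and the real response likely contains more fields.

```python
import json


def stopped_links(payload):
    """Return names of links reported as 'stopped' with an empty state list.

    `payload` is the parsed JSON body; its shape is assumed from the
    excerpt above (a top-level "links" array of objects).
    """
    return [
        link.get("name")
        for link in payload.get("links", [])
        if link.get("status") == "stopped" and not link.get("state")
    ]


# Sample body reconstructed from the excerpt; the enclosing object is an assumption.
sample = json.loads(
    '{"links": [{"name": "linkIUWEhdXs", "status": "stopped", "state": []}]}'
)
print(stopped_links(sample))
```

Polling this check at a fixed interval and recording timestamps would show whether the "stopped" reports line up with the STOPPED to STARTING transitions in the logs.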

cbcollect ->

https://cb-engineering.s3.amazonaws.com/SysTestColumnarJul9/collectinfo-2024-07-11T090009-ns_1%40svc-da-node-006.adewm3olqtoa4vfw.sandbox.nonprod-project-avengers.com.zip
https://cb-engineering.s3.amazonaws.com/SysTestColumnarJul9/collectinfo-2024-07-11T090009-ns_1%40svc-da-node-008.adewm3olqtoa4vfw.sandbox.nonprod-project-avengers.com.zip
https://cb-engineering.s3.amazonaws.com/SysTestColumnarJul9/collectinfo-2024-07-11T090009-ns_1%40svc-da-node-016.adewm3olqtoa4vfw.sandbox.nonprod-project-avengers.com.zip
https://cb-engineering.s3.amazonaws.com/SysTestColumnarJul9/collectinfo-2024-07-11T090009-ns_1%40svc-da-node-022.adewm3olqtoa4vfw.sandbox.nonprod-project-avengers.com.zip

Remote cluster logs ->

https://cb-engineering.s3.amazonaws.com/RemoteClusterMB62863/collectinfo-2024-07-11T102339-ns_1%40svc-d-node-001.cbzexddeqouqo8iv.sandbox.nonprod-project-avengers.com.zip
https://cb-engineering.s3.amazonaws.com/RemoteClusterMB62863/collectinfo-2024-07-11T102339-ns_1%40svc-d-node-002.cbzexddeqouqo8iv.sandbox.nonprod-project-avengers.com.zip
https://cb-engineering.s3.amazonaws.com/RemoteClusterMB62863/collectinfo-2024-07-11T102339-ns_1%40svc-d-node-003.cbzexddeqouqo8iv.sandbox.nonprod-project-avengers.com.zip

Attachments

Issue Links

Clones

MB-62683 [System Test] Ingestion is slow - link state seems to change to "stopped" over and over

Closed

Gerrit Reviews

For Gerrit Dashboard: MB-62744
#	Subject	Branch	Project	Status	CR	V
212614,2	MB-62744: disable remote auth refresh	neo	cbas-core	MERGED	+2	+1
213335,1	MB-62845,MB-62744: merge branch 'neo' into 'trinity'	trinity	cbas	MERGED	+2	+1
213336,1	MB-62845,MB-62744: merge branch 'neo' into 'trinity'	trinity	cbas-core	MERGED	+2	+1

Activity

People

Assignee: Umang

Reporter: Michael Blow

Votes: 0

Watchers: 3

Dates

Created: 16/Jul/24 5:51 AM

Updated: 29/Aug/24 4:05 PM

Resolved: 25/Jul/24 7:39 PM
