gocbcore v10.2.3+ cannot perform CCCP polling

Description

Sync Gateway recently upgraded from github.com/couchbase/gocbcore/v10 v10.1.6 to github.com/couchbase/gocbcore/v10 v10.2.3-0.20230404070112-cab6da1895ae to pick up the fix for https://couchbasecloud.atlassian.net/browse/GOCBC-1401

In a basic case in our test harness, we are no longer able to make a CCCP connection.

Our test case is:

  1. start up CBS in Docker

  2. run go test in Sync Gateway

  3. go test creates a bucket, then fails with CCCP polling

  4. if successful, go test runs a test (in this case, a simple DCP test)

 

The interesting logs are in verbose_int.out.raw

Here's a failing example from enterprise-7.0.5: https://jenkins.sgwdev.com/job/SyncGateway-Integration/1681/artifact/verbose_int.out.raw/view/

Here's a passing example:
https://jenkins.sgwdev.com/job/SyncGateway-Integration/1683/

The difference between these two builds is https://github.com/couchbase/sync_gateway/commit/b4dab6117732ba793bb83b9eb1406b7e18e990b1. I've also since changed the Sync Gateway go.mod to use gocb v2.6.2, which we probably should have done originally, but I get the same failure: https://jenkins.sgwdev.com/job/SyncGateway-Integration/1684/

The automation code I use for starting CBS is integration-test/start_server.sh in https://github.com/couchbase/sync_gateway/pull/6176/files. This code will probably only work on Linux right now, which is where Jenkins is running, but I expect to modify it to work on macOS soon.

Environment

None

Gerrit Reviews

None

Release Notes Description

None

Activity

Charles Dixon April 12, 2023 at 8:20 AM
Edited

The issue here is that once a bucket is opened, the server responds with an invalid bucket config. The config is missing the vbucket map:

2023-04-11T02:32:16.031Z [TRC] gocb+: Routing data is not valid, skipping update: 
Revision ID: 96
Revision Epoch: 1
Bucket: sg_int_0_1681180335489927701
Capi Eps:
  TLS:
  - http://172.18.0.2:8092 seed: true
  - http://172.18.0.3:8092 seed: false
  - http://172.18.0.4:8092 seed: false
...
VBMap:
&{entries:[] numReplicas:1}
KetamaMap: not-used

This is happening against both versions of gocbcore, and it leads the SDK to not apply the config. I guess this could be a server setup timing thing? Maybe someone from KV could comment on why that might be happening?
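As a rough illustration of why the config in the log above is skipped, here is a minimal sketch of a validity check that rejects a config whose vbucket map has no entries. The types, field names, and the endpoint address are hypothetical stand-ins invented for this sketch; they are not gocbcore's internal types or its actual check.

```go
package main

import "fmt"

// vbucketMap and routeConfig are hypothetical stand-ins for this sketch;
// they are NOT gocbcore's real types.
type vbucketMap struct {
	entries     [][]int // entries[vb] lists server indexes: active first, then replicas
	numReplicas int
}

type routeConfig struct {
	revID int64
	kvEps []string
	vbMap *vbucketMap
}

// isValid reports whether the config could be used for key/value routing:
// it needs at least one endpoint and a vbucket map with at least one entry.
func (c *routeConfig) isValid() bool {
	if len(c.kvEps) == 0 {
		return false
	}
	return c.vbMap != nil && len(c.vbMap.entries) > 0
}

func main() {
	// Same shape as the rejected config in the log: endpoints are present,
	// but the vbucket map is empty (entries:[] numReplicas:1).
	// The endpoint address is an assumption made for illustration.
	early := &routeConfig{
		revID: 96,
		kvEps: []string{"node-a:11210"},
		vbMap: &vbucketMap{numReplicas: 1},
	}

	// A later config in which the vbucket map has been populated.
	later := &routeConfig{
		revID: 97,
		kvEps: []string{"node-a:11210"},
		vbMap: &vbucketMap{entries: [][]int{{0, 1}, {1, 0}}, numReplicas: 1},
	}

	fmt.Println(early.isValid(), later.isValid()) // prints "false true"
}
```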

Regardless of that, the reason the older version of gocbcore worked and the newer one doesn't is a change in when the poller is started. In the newer version we have pipelined a config fetch into connection bootstrap, and the CCCP poller waits until a connection has fetched a config as part of bootstrap and the SDK has applied that config. Here the config is rejected, but the connection is already established and bootstrapped. This means that the SDK only knows about the single endpoint and isn't retrying the config fetch, because a) the connection to that endpoint is already bootstrapped and b) CCCP is still waiting for a connection to fetch a config. In the older version of gocbcore the CCCP poller would just fetch another config after the polling interval (2.5 seconds by default), at which point the returned config seems to be OK (which is why I think this is probably a server/bucket setup timing thing).
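The difference between the two behaviours can be sketched with a small deterministic simulation. This is a toy model, not gocbcore's code: `fakeServer`, `simulate`, and the `gated` flag are names invented here, timer ticks are replaced by loop iterations, and both modes are simplified to make one initial fetch at bootstrap.

```go
package main

import "fmt"

// fakeServer is a hypothetical stand-in for a node whose first
// invalidFetches responses carry a config without a vbucket map.
type fakeServer struct {
	fetches        int
	invalidFetches int
}

// fetchConfig returns true when the returned config would pass validation.
func (s *fakeServer) fetchConfig() bool {
	s.fetches++
	return s.fetches > s.invalidFetches
}

// simulate performs one bootstrap fetch, then up to maxTicks poll ticks.
// With gated=true the poller does not start until a config has been applied,
// so a rejected bootstrap config is never retried. With gated=false the
// poller fetches on every tick regardless. It returns whether a config was
// ever applied and how many fetches were made.
func simulate(invalidFetches int, gated bool, maxTicks int) (bool, int) {
	srv := &fakeServer{invalidFetches: invalidFetches}

	// Bootstrap: the config fetch is part of connection setup.
	applied := srv.fetchConfig()

	for tick := 0; tick < maxTicks && !applied; tick++ {
		if gated {
			// The poller is still waiting for an applied config and the
			// connection is already bootstrapped, so nothing fetches again.
			break
		}
		applied = srv.fetchConfig()
	}
	return applied, srv.fetches
}

func main() {
	applied, fetches := simulate(1, true, 5)
	fmt.Printf("gated:   applied=%v fetches=%d\n", applied, fetches)

	applied, fetches = simulate(1, false, 5)
	fmt.Printf("ungated: applied=%v fetches=%d\n", applied, fetches)
}
```

With a single invalid response, the gated run stops after one fetch without applying a config, while the ungated run applies the config on its second fetch.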

To fix this we probably just need to stop CCCP from waiting until a config has been applied (I think it has already been confirmed that this fixes the issue). There is a reason I added that logic, though, so I need to investigate it, which may lead to more in-depth changes. For informational purposes, this change was introduced in v10.2.0: https://github.com/couchbase/gocbcore/commit/7a53c9ff53da680dba5ca1cf954dfc23b8942e6a

Unresolved
Created April 11, 2023 at 2:41 AM
Updated April 24, 2023 at 3:34 PM