Details
- Bug
- Resolution: Fixed
- Blocker
- 7.0.0
- Untriaged
- 1
- Yes
Description
Summary
FTS incorrectly uses a node's alternate address when opening DCP streams.
In the worst (and common) case, the alternate address is not reachable from within the cluster, for example when the cluster runs in Kubernetes and the alternate address is a load balancer for an individual pod, which is a fairly common deployment scenario. FTS is then completely broken: indexes can be created, but they are never built and cannot be searched.
In the best case, where the cluster is not deployed in Kubernetes but uses alternate addresses for things like XDCR, DCP traffic is incorrectly routed over the public internet rather than the private network.
This is a regression: FTS works fine in this environment on 6.6.x.
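To make the expected behaviour concrete, here is a minimal sketch of how a client might choose between a node's internal and alternate address. It is not the cbft or gocbcore implementation; the `Node` type and `pick_address` helper are hypothetical, and the network names `default` and `external` are assumed to mirror the network-type values used by Couchbase SDKs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    hostname: str                    # internal address, reachable inside the cluster
    alternate: Optional[str] = None  # alternate (external) address, if one is set


def pick_address(node: Node, network: str) -> str:
    """Return the address a client on the given network should dial."""
    if network == "external" and node.alternate is not None:
        return node.alternate
    # In-cluster consumers, such as a service opening DCP streams,
    # are expected to end up here and use the internal address.
    return node.hostname
```

Under this sketch, an in-cluster service would call `pick_address(node, "default")` and never see the alternate address, which is the behaviour this report expects from FTS.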
Steps to Reproduce
- Spin up the latest RC (I used Docker here):
docker run -d --name cc-rc -p 8091-8097:8091-8097 registry.gitlab.com/cb-vanilla/server:7.0.0-5302
- Set up the cluster with FTS enabled. At this step, make sure you set the hostname of the node to its IP and NOT 127.0.0.1. I also enabled node-to-node encryption; I am not sure whether this is required for reproduction.
- Set an alternate address for your node. This can be any hostname that resolves but is not reachable; I just used a Couchbase Cloud public hostname:
/opt/couchbase/bin/couchbase-cli setting-alternate-address --cluster localhost --username Administrator --password password --set --hostname cb-0001.76aad0f6-de8a-46d8-9794-47df1b10f91f.dataplane.nonprod-project-avengers.com --node 172.17.0.2
- Restart the cbft process
- Create an Index in the UI
- Try to search the Index
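After the alternate-address step, it can help to confirm that the address was actually recorded before restarting cbft. The following is a minimal sketch, assuming the JSON returned by the cluster's `/pools/default/nodeServices` REST endpoint lists each node under `nodesExt` with an optional `alternateAddresses.external.hostname` entry. The `external_hostnames` helper and the sample payload are illustrative, not captured output.

```python
from typing import Dict


def external_hostnames(node_services: dict) -> Dict[str, str]:
    """Map each node's internal hostname to its external hostname, if set."""
    result = {}
    for node in node_services.get("nodesExt", []):
        external = node.get("alternateAddresses", {}).get("external", {})
        if "hostname" in external:
            # The internal hostname may be omitted for the node that served
            # the request, so fall back to a placeholder key.
            result[node.get("hostname", "<this node>")] = external["hostname"]
    return result


# Hand-written payload in the assumed shape (not real cluster output).
sample = {
    "nodesExt": [
        {
            "hostname": "172.17.0.2",
            "services": {"mgmt": 8091, "kv": 11210},
            "alternateAddresses": {
                "external": {"hostname": "cb-0001.example.com"}
            },
        }
    ]
}
```

Calling `external_hostnames(sample)` returns a one-entry mapping from `172.17.0.2` to the external hostname; an empty result would suggest the alternate address was not applied.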
Expected Result
The index search completes successfully
Actual Result
Investigation
Logs show that FTS is trying to use the external address:
2021-07-16T11:10:57.457+00:00 [INFO] (GOCBCORE) Creating new agent: &{MemdAddrs:[] HTTPAddrs:[127.0.0.1:8091] BucketName:test UserAgent:matt_7adf0964fce22708_4c1c5584 UseTLS:false NetworkType: Auth:0x1d368d0 TLSRootCAProvider:<nil> UseMutationTokens:false UseCompression:false UseDurations:false DisableDecompression:false UseOutOfOrderResponses:false DisableXErrors:false DisableJSONHello:false DisableSyncReplicationHello:false UseCollections:true CompressionMinSize:0 CompressionMinRatio:0 HTTPRedialPeriod:0s HTTPRetryDelay:0s HTTPMaxWait:0s CccpMaxWait:0s CccpPollPeriod:0s ConnectTimeout:1m0s KVConnectTimeout:7s KvPoolSize:0 MaxQueueSize:0 HTTPMaxIdleConns:0 HTTPMaxIdleConnsPerHost:0 HTTPIdleConnectionTimeout:0s Tracer:<nil> NoRootTraceSpans:false DefaultRetryStrategy:<nil> CircuitBreakerConfig:{Enabled:false VolumeThreshold:0 ErrorThresholdPercentage:0 SleepWindow:0s RollingWindow:0s CompletionCallback:<nil> CanaryTimeout:0s} UseZombieLogger:false ZombieLoggerInterval:0s ZombieLoggerSampleSize:0 AuthMechanisms:[]}
2021-07-16T11:11:02.517+00:00 [WARN] (GOCBCORE) Pipeline Client 0xc00069ee40 failed to bootstrap: dial tcp 18.210.140.11:11210: i/o timeout -- cbgt.GocbcoreLogger.Log() at gocbcore_utils.go:617
Note that 18.210.140.11 is what cb-0001.76aad0f6-de8a-46d8-9794-47df1b10f91f.dataplane.nonprod-project-avengers.com resolves to:
nslookup cb-0001.76aad0f6-de8a-46d8-9794-47df1b10f91f.dataplane.nonprod-project-avengers.com
Server: 8.8.8.8
Address: 8.8.8.8#53

Non-authoritative answer:
Name: cb-0001.76aad0f6-de8a-46d8-9794-47df1b10f91f.dataplane.nonprod-project-avengers.com
Address: 18.210.140.11
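The comparison between the failed dial target and the resolved alternate address can also be scripted when scanning a larger log. This is a minimal sketch; the `dial_target` helper is hypothetical and assumes the `dial tcp <ip>:<port>` wording seen in the GOCBCORE warning above.

```python
import re
from typing import Optional, Tuple

# Matches the IPv4 address and port in messages like "dial tcp 1.2.3.4:11210".
_DIAL_RE = re.compile(r"dial tcp (\d{1,3}(?:\.\d{1,3}){3}):(\d+)")


def dial_target(log_line: str) -> Optional[Tuple[str, int]]:
    """Extract (ip, port) from a failed-dial log line, or None if absent."""
    match = _DIAL_RE.search(log_line)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


line = (
    "2021-07-16T11:11:02.517+00:00 [WARN] (GOCBCORE) Pipeline Client "
    "0xc00069ee40 failed to bootstrap: dial tcp 18.210.140.11:11210: i/o timeout"
)
alternate_ip = "18.210.140.11"  # taken from the nslookup answer

target = dial_target(line)
uses_alternate = target is not None and target[0] == alternate_ip
```

Here `uses_alternate` is true, matching the manual observation that the bootstrap attempt goes to the alternate address rather than the node's internal IP.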
Also, strangely, it is trying to use port 11210 even though node-to-node encryption is enabled; I would have expected it to use the encrypted ports.
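A quick way to flag this in logs is to check the port of each dialled address. The sketch below assumes the default Couchbase data-service ports (11210 for plain connections, 11207 for TLS); a deployment with remapped ports would need different constants. The `uses_tls_kv_port` helper is hypothetical.

```python
# Default Couchbase data-service (KV) ports; assumed not to be remapped.
KV_PLAIN_PORT = 11210
KV_TLS_PORT = 11207


def uses_tls_kv_port(address: str) -> bool:
    """Return True if a "host:port" address targets the default TLS KV port."""
    _host, _sep, port = address.rpartition(":")
    return int(port) == KV_TLS_PORT
```

For the address in the warning above, `uses_tls_kv_port("18.210.140.11:11210")` is false, which is the unexpected part of this observation.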
Logs
https://cb-engineering.s3.amazonaws.com/MB-47457/collectinfo-2021-07-19T133501-ns_1%40172.17.0.2.zip