MB-39687

[BP 6.6] - Alternate IP Based XDCR Appears Broken

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 6.5.1
    • Fix Version/s: 6.6.0
    • Component/s: XDCR
    • Labels:
      None
    • Environment:
      Kubernetes (any), CAO 2.0.0
    • Triage:
      Untriaged
    • Operating System:
      Ubuntu 64-bit
    • Story Points:
      1
    • Is this a Regression?:
      Unknown

      Description

      What

      While testing Operator 2.0.1, I noticed that XDCR in our sanity suite (i.e. tests that should never fail) was throwing an error.  I had only changed security settings, which have nothing to do with XDCR, so the failure was immediately suspect.

      Here's Operator 2.0.0 running against 6.5.0:

       

      $ tco -t TestXdcrCreateCluster --server-image couchbase/server:6.5.0 -c gke_couchbase-engineering_us-east1_spjmurray -c gke_couchbase-engineering_us-west1_spjmurray -i couchbase/operator:2.0.0 -I couchbase/admission-controller:2.0.0
      === RUN   TestOperator
      === RUN   TestOperator/TestXdcrCreateCluster
      PASS
      --- PASS: TestOperator (312.74s)
          --- PASS: TestOperator/TestXdcrCreateCluster (195.50s)
              crd_util.go:26: creating couchbase cluster: test-couchbase-czzbs
              crd_util.go:26: creating couchbase cluster: test-couchbase-724rs
          test_util.go:35: Suite Test Results: 
          test_util.go:64: 1: TestXdcrCreateCluster...PASS
          test_util.go:106: 
               Pass: 1.000000 
               Fail: 0.000000 
               Pass Rate: 100.000000

      And here is Operator 2.0.0 running against 6.5.1:

      tco -t TestXdcrCreateCluster --server-image couchbase/server:6.5.1 -c gke_couchbase-engineering_us-east1_spjmurray -c gke_couchbase-engineering_us-west1_spjmurray -i couchbase/operator:2.0.0 -I couchbase/admission-controller:2.0.0
      === RUN   TestOperator
      === RUN   TestOperator/TestXdcrCreateCluster
      FAIL
      --- FAIL: TestOperator (891.61s)
          --- FAIL: TestOperator/TestXdcrCreateCluster (781.25s)
              crd_util.go:26: creating couchbase cluster: test-couchbase-f2sv5
              crd_util.go:26: creating couchbase cluster: test-couchbase-vdkwf
              util.go:1304: context deadline exceeded: document count 0, expected 10
              util.go:1305: goroutine 531 [running]:
                  runtime/debug.Stack(0xc000580300, 0xc000b7dc50, 0x1)
                  	/usr/local/go/src/runtime/debug/stack.go:24 +0xab
                  github.com/couchbase/couchbase-operator/test/e2e/e2eutil.Die(0xc000580300, 0x23481e0, 0xc000434a60)
                  	/home/simon/go/src/github.com/couchbase/couchbase-operator/test/e2e/e2eutil/util.go:1305 +0x85
                  github.com/couchbase/couchbase-operator/test/e2e/e2eutil.MustVerifyDocCountInBucket(0xc000580300, 0xc000331680, 0xc000334d80, 0x20392c7, 0x7, 0xa, 0x8bb2c97000)
                  	/home/simon/go/src/github.com/couchbase/couchbase-operator/test/e2e/e2eutil/xdcr_util.go:120 +0xb5
                  github.com/couchbase/couchbase-operator/test/e2e.TestXdcrCreateCluster(0xc000580300)
                  	/home/simon/go/src/github.com/couchbase/couchbase-operator/test/e2e/xdcr_test.go:336 +0x784
                  github.com/couchbase/couchbase-operator/test/e2e/framework.RecoverDecorator.func1(0xc000580300)
                  	/home/simon/go/src/github.com/couchbase/couchbase-operator/test/e2e/framework/test_util.go:347 +0x85
                  testing.tRunner(0xc000580300, 0xc00084e5b0)
                  	/usr/local/go/src/testing/testing.go:909 +0x19a
                  created by testing.(*T).Run
                  	/usr/local/go/src/testing/testing.go:960 +0x652
                  
          test_util.go:35: Suite Test Results: 
          test_util.go:67: 1: TestXdcrCreateCluster...FAIL
          test_util.go:93: Failures: 
          test_util.go:95: 1: TestXdcrCreateCluster
          test_util.go:106: 
               Pass: 0.000000 
               Fail: 1.000000 
               Pass Rate: 0.000000
          test_util.go:117: suite contains failures

      The remote end is using IP based alternate addresses:

      kubectl --context gke_couchbase-engineering_us-west1_spjmurray -n remote exec -ti test-couchbase-vdkwf-0000 -- curl http://localhost:8091/pools/default/nodeServices -u Administrator:password | python3 -m json.tool
      {
          "rev": 39,
          "nodesExt": [
              {
                  "services": {
                      "mgmt": 8091,
                      "mgmtSSL": 18091,
                      "indexAdmin": 9100,
                      "indexScan": 9101,
                      "indexHttp": 9102,
                      "indexStreamInit": 9103,
                      "indexStreamCatchup": 9104,
                      "indexStreamMaint": 9105,
                      "indexHttps": 19102,
                      "kv": 11210,
                      "kvSSL": 11207,
                      "capi": 8092,
                      "capiSSL": 18092,
                      "projector": 9999,
                      "n1ql": 8093,
                      "n1qlSSL": 18093
                  },
                  "thisNode": true,
                  "hostname": "test-couchbase-vdkwf-0000.test-couchbase-vdkwf.remote.svc",
                  "alternateAddresses": {
                      "external": {
                          "hostname": "10.16.0.30",
                          "ports": {
                              "mgmt": 31671,
                              "mgmtSSL": 32548,
                              "kv": 31796,
                              "kvSSL": 31968,
                              "capi": 31383,
                              "capiSSL": 31979
                          }
                      }
                  }
              },
              {
                  "services": {
                      "mgmt": 8091,
                      "mgmtSSL": 18091,
                      "indexAdmin": 9100,
                      "indexScan": 9101,
                      "indexHttp": 9102,
                      "indexStreamInit": 9103,
                      "indexStreamCatchup": 9104,
                      "indexStreamMaint": 9105,
                      "indexHttps": 19102,
                      "kv": 11210,
                      "kvSSL": 11207,
                      "capi": 8092,
                      "capiSSL": 18092,
                      "projector": 9999,
                      "n1ql": 8093,
                      "n1qlSSL": 18093
                  },
                  "hostname": "test-couchbase-vdkwf-0001.test-couchbase-vdkwf.remote.svc",
                  "alternateAddresses": {
                      "external": {
                          "hostname": "10.16.0.34",
                          "ports": {
                              "mgmt": 31615,
                              "mgmtSSL": 31177,
                              "kv": 32342,
                              "kvSSL": 31076,
                              "capi": 32325,
                              "capiSSL": 32739
                          }
                      }
                  }
              },
              {
                  "services": {
                      "mgmt": 8091,
                      "mgmtSSL": 18091,
                      "indexAdmin": 9100,
                      "indexScan": 9101,
                      "indexHttp": 9102,
                      "indexStreamInit": 9103,
                      "indexStreamCatchup": 9104,
                      "indexStreamMaint": 9105,
                      "indexHttps": 19102,
                      "kv": 11210,
                      "kvSSL": 11207,
                      "capi": 8092,
                      "capiSSL": 18092,
                      "projector": 9999,
                      "n1ql": 8093,
                      "n1qlSSL": 18093
                  },
                  "hostname": "test-couchbase-vdkwf-0002.test-couchbase-vdkwf.remote.svc",
                  "alternateAddresses": {
                      "external": {
                          "hostname": "10.16.0.36",
                          "ports": {
                              "mgmt": 31020,
                              "mgmtSSL": 31086,
                              "kv": 31648,
                              "kvSSL": 31784,
                              "capi": 32130,
                              "capiSSL": 31562
                          }
                      }
                  }
              }
          ],
          "clusterCapabilitiesVer": [
              1,
              0
          ],
          "clusterCapabilities": {
              "n1ql": [
                  "enhancedPreparedStatements"
              ]
          }
      }
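To make the alternate address layout easier to check at a glance, the output above can be reduced to one line per node. The following is a minimal sketch, assuming the payload has the shape shown above; the function name `external_endpoints` and the trimmed `SAMPLE` document are illustrative and not part of any Couchbase tooling.

```python
import json

# Trimmed copy of the nodeServices payload shown above (one node, two ports).
SAMPLE = """
{
  "rev": 39,
  "nodesExt": [
    {
      "services": {"mgmt": 8091, "kv": 11210},
      "thisNode": true,
      "hostname": "test-couchbase-vdkwf-0000.test-couchbase-vdkwf.remote.svc",
      "alternateAddresses": {
        "external": {
          "hostname": "10.16.0.30",
          "ports": {"mgmt": 31671, "kv": 31796}
        }
      }
    }
  ]
}
"""


def external_endpoints(payload):
    """Return one dict per node that advertises an external alternate address."""
    endpoints = []
    for node in payload.get("nodesExt", []):
        external = node.get("alternateAddresses", {}).get("external")
        if external is None:
            continue  # this node has no external alternate address
        endpoints.append({
            "internal": node.get("hostname"),
            "external": external["hostname"],
            # Report the external ports exactly as advertised.
            "ports": external.get("ports", {}),
        })
    return endpoints


endpoints = external_endpoints(json.loads(SAMPLE))
for ep in endpoints:
    print(ep["internal"], "->", ep["external"], ep["ports"])
```

Run against the full output, this would list all three nodes with their 10.16.0.x addresses and node ports, which is what an XDCR source connecting from outside the cluster is expected to use.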
      

      However, the UI reports that XDCR is attempting to use DNS based addresses.

      Why is this a Problem?

      We strongly discourage the use of IP based alternate addressing, because DNS based addressing is far superior and, thankfully, still works.  In reality, however, the vast majority of our customers use Red Hat OpenShift, which uses OVS as its networking layer (i.e. an overlay with DNAT), and that forces the use of IP based alternate addressing.

      The big risk here is that anyone performing an upgrade will find their XDCR connections stop working and be unable to roll back.

      Setup

      • X.Y.default.svc are the XDCR source
        • Establishes XDCR using an IP based "node port" URL
      • X.Y.remote.svc are the XDCR target
        • Has IP based alternate addresses exposed
      • Logs coming in a follow up as I don't trust the Mrs' internet connection...
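For reference, an IP based "node port" URL is simply a Kubernetes node IP paired with the node port mapped to the management service. The sketch below builds the form body for a remote cluster reference, assuming the reference is created through the `POST /pools/default/remoteClusters` REST endpoint. The reference name `remote` and the helper `remote_cluster_form` are hypothetical; the address and credentials are the ones that appear elsewhere in this report. Nothing is sent over the network.

```python
from urllib.parse import urlencode


def remote_cluster_form(name, node_ip, mgmt_node_port, username, password):
    """Build the form body for POST /pools/default/remoteClusters."""
    return urlencode({
        "name": name,
        # Node IP plus the node port that maps to the management service.
        "hostname": f"{node_ip}:{mgmt_node_port}",
        "username": username,
        "password": password,
    })


body = remote_cluster_form("remote", "10.16.0.30", 31671, "Administrator", "password")
print(body)
```

The important detail is that the hostname in the reference is the external address, not one of the `*.remote.svc` names, so the source has to work out that it should keep using external addresses for every other node as well.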

       

            Activity

            build-team Couchbase Build Team added a comment -

            Build couchbase-server-6.6.0-7742 contains goxdcr commit adbc210 with commit message:
            MB-39687 - Added a network_mode flag similar to gocb to ensure users can skip heuristics to use external or default
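The commit message refers to heuristics for choosing between external and default addressing. The sketch below illustrates the general idea as it is commonly described for SDK clients: compare the address the connection was bootstrapped with against the internal and external hostnames of each node. It is not goxdcr's implementation. The function name `choose_network`, the parameter name `network_mode` and the value `"auto"` are assumptions made for illustration.

```python
def choose_network(bootstrap_host, nodes_ext, network_mode="auto"):
    """Decide whether to use "default" (internal) or "external" addresses.

    network_mode is a hypothetical parameter modelled on the commit message:
    "external" or "default" skip the heuristic, "auto" applies it.
    """
    if network_mode in ("external", "default"):
        return network_mode  # explicit override, no guessing
    if network_mode != "auto":
        raise ValueError(f"unknown network_mode: {network_mode!r}")

    # Heuristic: a bootstrap host that matches an internal hostname wins.
    for node in nodes_ext:
        if node.get("hostname") == bootstrap_host:
            return "default"
    # Otherwise look for a matching external alternate address.
    for node in nodes_ext:
        external = node.get("alternateAddresses", {}).get("external", {})
        if external.get("hostname") == bootstrap_host:
            return "external"
    return "default"


# One node taken from the nodeServices output in the description.
NODES = [{
    "hostname": "test-couchbase-vdkwf-0000.test-couchbase-vdkwf.remote.svc",
    "alternateAddresses": {"external": {"hostname": "10.16.0.30"}},
}]

print(choose_network("10.16.0.30", NODES))
print(choose_network("test-couchbase-vdkwf-0000.test-couchbase-vdkwf.remote.svc", NODES))
print(choose_network("10.16.0.99", NODES, network_mode="external"))
```

The last call shows why an explicit flag is useful: when the bootstrap address matches nothing the cluster advertises, the heuristic falls back to internal addresses, whereas the override keeps the connection on the external ones.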

            arunkumar Arunkumar Senthilnathan added a comment -

            Verified in 6.6.0-7846:

            14:55:57 --- PASS: TestOperator/TestXdcrCreateCluster (231.37s)
            14:55:57 crd_util.go:26: creating couchbase cluster: test-couchbase-v2rw2
            14:55:57 crd_util.go:26: creating couchbase cluster: test-couchbase-vr48t

            http://qa.sc.couchbase.com/job/k8s-cbop-eks-sanity-2.0.x/118/console

            15:47:00 --- PASS: TestOperator/TestXdcrCreateCluster (242.53s)
            15:47:00 crd_util.go:26: creating couchbase cluster: test-couchbase-mnggn
            15:47:00 crd_util.go:26: creating couchbase cluster: test-couchbase-9lqtq

            http://qa.sc.couchbase.com/job/k8s-cbop-gke-sanity-2.0.x/121/console

            16:06:14 --- PASS: TestOperator/TestXdcrCreateCluster (346.93s)
            16:06:14 crd_util.go:26: creating couchbase cluster: test-couchbase-h792r
            16:06:14 crd_util.go:26: creating couchbase cluster: test-couchbase-8b4qn

            http://qa.sc.couchbase.com/job/k8s-cbop-aks-sanity-2.0.x/119/console


              People

              Assignee:
              arunkumar Arunkumar Senthilnathan
              Reporter:
              neil.huang Neil Huang