Couchbase Server / MB-39687

[BP 6.6] - Alternate IP Based XDCR Appears Broken

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • 6.6.0
    • 6.5.1
    • XDCR
    • None
    • Kubernetes (any), CAO 2.0.0
    • Untriaged
    • Ubuntu 64-bit
    • 1
    • Unknown

    Description

      What

      While testing Operator 2.0.1, I noticed that the XDCR test in our sanity suite (i.e. one that should never fail) was reporting an error.  I had only changed security settings, which have nothing to do with XDCR, so this was immediately suspect.

      Here's Operator 2.0.0 running against 6.5.0:

       

      $ tco -t TestXdcrCreateCluster --server-image couchbase/server:6.5.0 -c gke_couchbase-engineering_us-east1_spjmurray -c gke_couchbase-engineering_us-west1_spjmurray -i couchbase/operator:2.0.0 -I couchbase/admission-controller:2.0.0
      === RUN   TestOperator
      === RUN   TestOperator/TestXdcrCreateCluster
      PASS
      --- PASS: TestOperator (312.74s)
          --- PASS: TestOperator/TestXdcrCreateCluster (195.50s)
              crd_util.go:26: creating couchbase cluster: test-couchbase-czzbs
              crd_util.go:26: creating couchbase cluster: test-couchbase-724rs
          test_util.go:35: Suite Test Results: 
          test_util.go:64: 1: TestXdcrCreateCluster...PASS
          test_util.go:106: 
               Pass: 1.000000 
               Fail: 0.000000 
               Pass Rate: 100.000000

      and against 6.5.1:

      tco -t TestXdcrCreateCluster --server-image couchbase/server:6.5.1 -c gke_couchbase-engineering_us-east1_spjmurray -c gke_couchbase-engineering_us-west1_spjmurray -i couchbase/operator:2.0.0 -I couchbase/admission-controller:2.0.0
      === RUN   TestOperator
      === RUN   TestOperator/TestXdcrCreateCluster
      FAIL
      --- FAIL: TestOperator (891.61s)
          --- FAIL: TestOperator/TestXdcrCreateCluster (781.25s)
              crd_util.go:26: creating couchbase cluster: test-couchbase-f2sv5
              crd_util.go:26: creating couchbase cluster: test-couchbase-vdkwf
              util.go:1304: context deadline exceeded: document count 0, expected 10
              util.go:1305: goroutine 531 [running]:
                  runtime/debug.Stack(0xc000580300, 0xc000b7dc50, 0x1)
                  	/usr/local/go/src/runtime/debug/stack.go:24 +0xab
                  github.com/couchbase/couchbase-operator/test/e2e/e2eutil.Die(0xc000580300, 0x23481e0, 0xc000434a60)
                  	/home/simon/go/src/github.com/couchbase/couchbase-operator/test/e2e/e2eutil/util.go:1305 +0x85
                  github.com/couchbase/couchbase-operator/test/e2e/e2eutil.MustVerifyDocCountInBucket(0xc000580300, 0xc000331680, 0xc000334d80, 0x20392c7, 0x7, 0xa, 0x8bb2c97000)
                  	/home/simon/go/src/github.com/couchbase/couchbase-operator/test/e2e/e2eutil/xdcr_util.go:120 +0xb5
                  github.com/couchbase/couchbase-operator/test/e2e.TestXdcrCreateCluster(0xc000580300)
                  	/home/simon/go/src/github.com/couchbase/couchbase-operator/test/e2e/xdcr_test.go:336 +0x784
                  github.com/couchbase/couchbase-operator/test/e2e/framework.RecoverDecorator.func1(0xc000580300)
                  	/home/simon/go/src/github.com/couchbase/couchbase-operator/test/e2e/framework/test_util.go:347 +0x85
                  testing.tRunner(0xc000580300, 0xc00084e5b0)
                  	/usr/local/go/src/testing/testing.go:909 +0x19a
                  created by testing.(*T).Run
                  	/usr/local/go/src/testing/testing.go:960 +0x652
                  
          test_util.go:35: Suite Test Results: 
          test_util.go:67: 1: TestXdcrCreateCluster...FAIL
          test_util.go:93: Failures: 
          test_util.go:95: 1: TestXdcrCreateCluster
          test_util.go:106: 
               Pass: 0.000000 
               Fail: 1.000000 
               Pass Rate: 0.000000
          test_util.go:117: suite contains failures
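
The failure above is a timeout rather than an explicit XDCR error: the test polled the target bucket until its deadline and never saw the 10 expected documents. A minimal sketch of that kind of check is shown below, in Python rather than the suite's Go. `wait_for_doc_count` and `get_count` are hypothetical names for illustration, not the operator test suite's API.

```python
import time


def wait_for_doc_count(get_count, expected, timeout_s, interval_s=1.0):
    """Poll get_count() until it returns `expected` or the deadline passes.

    get_count is a hypothetical callable returning the current document
    count of the target bucket; it is an assumption for this sketch.
    """
    deadline = time.monotonic() + timeout_s
    count = get_count()
    while count != expected:
        if time.monotonic() >= deadline:
            # Report the last observed count, as the test log does.
            raise TimeoutError(
                f"deadline exceeded: document count {count}, expected {expected}"
            )
        time.sleep(interval_s)
        count = get_count()
    return count
```

A count that stays at 0 for the whole window, as in the 6.5.1 run, suggests replication never started rather than that it was merely slow.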

      The remote end is using IP based alternate addresses:

      kubectl --context gke_couchbase-engineering_us-west1_spjmurray -n remote exec -ti test-couchbase-vdkwf-0000 -- curl http://localhost:8091/pools/default/nodeServices -u Administrator:password | python3 -m json.tool
      {
          "rev": 39,
          "nodesExt": [
              {
                  "services": {
                      "mgmt": 8091,
                      "mgmtSSL": 18091,
                      "indexAdmin": 9100,
                      "indexScan": 9101,
                      "indexHttp": 9102,
                      "indexStreamInit": 9103,
                      "indexStreamCatchup": 9104,
                      "indexStreamMaint": 9105,
                      "indexHttps": 19102,
                      "kv": 11210,
                      "kvSSL": 11207,
                      "capi": 8092,
                      "capiSSL": 18092,
                      "projector": 9999,
                      "n1ql": 8093,
                      "n1qlSSL": 18093
                  },
                  "thisNode": true,
                  "hostname": "test-couchbase-vdkwf-0000.test-couchbase-vdkwf.remote.svc",
                  "alternateAddresses": {
                      "external": {
                          "hostname": "10.16.0.30",
                          "ports": {
                              "mgmt": 31671,
                              "mgmtSSL": 32548,
                              "kv": 31796,
                              "kvSSL": 31968,
                              "capi": 31383,
                              "capiSSL": 31979
                          }
                      }
                  }
              },
              {
                  "services": {
                      "mgmt": 8091,
                      "mgmtSSL": 18091,
                      "indexAdmin": 9100,
                      "indexScan": 9101,
                      "indexHttp": 9102,
                      "indexStreamInit": 9103,
                      "indexStreamCatchup": 9104,
                      "indexStreamMaint": 9105,
                      "indexHttps": 19102,
                      "kv": 11210,
                      "kvSSL": 11207,
                      "capi": 8092,
                      "capiSSL": 18092,
                      "projector": 9999,
                      "n1ql": 8093,
                      "n1qlSSL": 18093
                  },
                  "hostname": "test-couchbase-vdkwf-0001.test-couchbase-vdkwf.remote.svc",
                  "alternateAddresses": {
                      "external": {
                          "hostname": "10.16.0.34",
                          "ports": {
                              "mgmt": 31615,
                              "mgmtSSL": 31177,
                              "kv": 32342,
                              "kvSSL": 31076,
                              "capi": 32325,
                              "capiSSL": 32739
                          }
                      }
                  }
              },
              {
                  "services": {
                      "mgmt": 8091,
                      "mgmtSSL": 18091,
                      "indexAdmin": 9100,
                      "indexScan": 9101,
                      "indexHttp": 9102,
                      "indexStreamInit": 9103,
                      "indexStreamCatchup": 9104,
                      "indexStreamMaint": 9105,
                      "indexHttps": 19102,
                      "kv": 11210,
                      "kvSSL": 11207,
                      "capi": 8092,
                      "capiSSL": 18092,
                      "projector": 9999,
                      "n1ql": 8093,
                      "n1qlSSL": 18093
                  },
                  "hostname": "test-couchbase-vdkwf-0002.test-couchbase-vdkwf.remote.svc",
                  "alternateAddresses": {
                      "external": {
                          "hostname": "10.16.0.36",
                          "ports": {
                              "mgmt": 31020,
                              "mgmtSSL": 31086,
                              "kv": 31648,
                              "kvSSL": 31784,
                              "capi": 32130,
                              "capiSSL": 31562
                          }
                      }
                  }
              }
          ],
          "clusterCapabilitiesVer": [
              1,
              0
          ],
          "clusterCapabilities": {
              "n1ql": [
                  "enhancedPreparedStatements"
              ]
          }
      }
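
To make the shape of this response concrete, the following sketch extracts the endpoints a remote client would need from `nodesExt`, for either the internal ("default") or the "external" network. The embedded JSON is a trimmed copy of the first node above; the `endpoints` helper is hypothetical and written only for illustration.

```python
import json

# Trimmed copy of the nodeServices response above (first node only).
NODE_SERVICES = json.loads("""
{
  "nodesExt": [
    {
      "services": {"mgmt": 8091, "kv": 11210, "capi": 8092},
      "thisNode": true,
      "hostname": "test-couchbase-vdkwf-0000.test-couchbase-vdkwf.remote.svc",
      "alternateAddresses": {
        "external": {
          "hostname": "10.16.0.30",
          "ports": {"mgmt": 31671, "kv": 31796, "capi": 31383}
        }
      }
    }
  ]
}
""")


def endpoints(node_services, network="external"):
    """Return one {service: "host:port"} dict per node for the chosen network."""
    result = []
    for node in node_services["nodesExt"]:
        if network == "default":
            # Internal address: the node hostname plus the standard ports.
            host, ports = node["hostname"], node["services"]
        else:
            alt = node.get("alternateAddresses", {}).get(network)
            if alt is None:
                raise ValueError(
                    f"node {node.get('hostname')} has no {network!r} address"
                )
            # Alternate address: the advertised host plus the remapped ports.
            host, ports = alt["hostname"], alt["ports"]
        result.append({svc: f"{host}:{port}" for svc, port in ports.items()})
    return result


print(endpoints(NODE_SERVICES)[0]["mgmt"])  # prints 10.16.0.30:31671
```

With `network="default"` the same call yields the `.svc` DNS name on port 8091, which is only reachable from inside the remote Kubernetes cluster.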
      

      However, the UI indicates that XDCR is attempting to use DNS based addresses.

      Why is this a Problem?

      We strongly discourage IP based alternate addressing, as DNS based addressing is far superior and, thankfully, still works.  In reality, however, the vast majority of our customers use Red Hat OpenShift, which uses OVS as its networking layer (i.e. an overlay with DNAT), forcing the use of IP based alternate addressing.

      The big risk here is that anyone performing an upgrade will find their XDCR connections stop working, with no way to roll back.

      Setup

      • X.Y.default.svc are the XDCR source
        • Establishes XDCR using an IP based "node port" URL
      • X.Y.remote.svc are the XDCR target
        • Has IP based alternate addresses exposed
      • Logs coming in a follow up as I don't trust the Mrs' internet connection...
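
For this setup to work, the source has to recognise that the address it was given is one of the target's external addresses, and then keep using the external hosts and ports for every node. One way a client could make that decision is sketched below. This is an assumed heuristic for illustration only, not a description of XDCR's actual implementation, and `pick_network` is a hypothetical helper. The node data is taken from the nodeServices output above.

```python
# Two of the target nodes from the nodeServices output, reduced to addresses.
NODES = [
    {
        "hostname": "test-couchbase-vdkwf-0000.test-couchbase-vdkwf.remote.svc",
        "alternateAddresses": {"external": {"hostname": "10.16.0.30"}},
    },
    {
        "hostname": "test-couchbase-vdkwf-0001.test-couchbase-vdkwf.remote.svc",
        "alternateAddresses": {"external": {"hostname": "10.16.0.34"}},
    },
]


def pick_network(bootstrap_host, nodes):
    """Guess which network a remote cluster reference should use.

    Assumed heuristic: if the host used to reach the cluster matches a
    node's internal hostname, use "default"; if it matches an external
    alternate hostname, use "external"; otherwise fall back to "default".
    """
    internal = {n["hostname"] for n in nodes}
    external = {
        n["alternateAddresses"]["external"]["hostname"]
        for n in nodes
        if "external" in n.get("alternateAddresses", {})
    }
    if bootstrap_host in internal:
        return "default"
    if bootstrap_host in external:
        return "external"
    return "default"


print(pick_network("10.16.0.30", NODES))  # prints external
```

Under this heuristic, a source bootstrapped with a node port IP would stay on external addresses, whereas the behaviour reported here looks as though the `.svc` DNS names are being used instead.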

       


            People

              arunkumar Arunkumar Senthilnathan (Inactive)
              neil.huang Neil Huang