Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-50028

Alternate Address Break in 7.1

    XMLWordPrintable

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • None
    • Neo
    • ns_server
    • None
    • Triaged
    • 1
    • Yes

    Description

      Now getting:

      unexpected status code: request failed PUT http://test-couchbase-frvgm-0001.test-couchbase-frvgm.test-c8h7m.svc:8091/node/controller/setupAlternateAddresses/external 400 Bad Request: Cannot set services unavailable on the node

      suggesting  that Server now requires us to filter ports based on what services are available on the pod.

      Note, this was tested with build 1772, Roo tested with 1831 and the story is the same

      Attachments

        No reviews matched the request. Check your Options in the drop-down menu of this sections header.

        Activity

          simon.murray Simon Murray added a comment -

          From the cbcollect I see:

          couchbase.log:7.1.0-1885

          The operator logs at the top level show...

          {"level":"debug","ts":1639481277.8856394,"logger":"api","msg":"http","cluster":"test-jmcmh/test-couchbase-lhdtq","method":"PUT","url":"http://test-couchbase-lhdtq-0001.test-couchbase-lhdtq.test-jmcmh.svc:8091/node/controller/setupAlternateAddresses/external","headers":{"Accept-Encoding":["application/json, text/plain, */*"],"Authorization":["Basic QWRtaW5pc3RyYXRvcjpwYXNzd29yZA=="],"Content-Type":["application/x-www-form-urlencoded"],"User-Agent":["couchbase-operator/2.3.0 (commit/632c6762a642344ef63d33b54c32d1a6ed850355; build/999)"]},"request":"capi=32332&capiSSL=31297&hostname=172.18.0.2&kv=32748&kvSSL=31503&mgmt=31009&mgmtSSL=32076","status":"400 Bad Request","response":"Cannot set services unavailable on the node","time_ms":4.564389}{
           
          "level":"debug","ts":1639481279.5386343,"logger":"api","msg":"http","cluster":"test-jmcmh/test-couchbase-lhdtq","method":"PUT","url":"http://test-couchbase-lhdtq-0002.test-couchbase-lhdtq.test-jmcmh.svc:8091/node/controller/setupAlternateAddresses/external","headers":{"Accept-Encoding":["application/json, text/plain, */*"],"Authorization":["Basic QWRtaW5pc3RyYXRvcjpwYXNzd29yZA=="],"Content-Type":["application/x-www-form-urlencoded"],"User-Agent":["couchbase-operator/2.3.0 (commit/632c6762a642344ef63d33b54c32d1a6ed850355; build/999)"]},"request":"capi=30052&capiSSL=32315&hostname=172.18.0.2&kv=31176&kvSSL=32233&mgmt=31000&mgmtSSL=32061","status":"400 Bad Request","response":"Cannot set services unavailable on the node","time_ms":3.745712}
           
          {"level":"debug","ts":1639481292.6682005,"logger":"api","msg":"http","cluster":"test-jmcmh/test-couchbase-lhdtq","method":"PUT","url":"http://test-couchbase-lhdtq-0000.test-couchbase-lhdtq.test-jmcmh.svc:8091/node/controller/setupAlternateAddresses/external","headers":{"Accept-Encoding":["application/json, text/plain, */*"],"Authorization":["Basic QWRtaW5pc3RyYXRvcjpwYXNzd29yZA=="],"Content-Type":["application/x-www-form-urlencoded"],"User-Agent":["couchbase-operator/2.3.0 (commit/632c6762a642344ef63d33b54c32d1a6ed850355; build/999)"]},"request":"capi=31947&capiSSL=30223&hostname=172.18.0.2&kv=30745&kvSSL=31636&mgmt=30836&mgmtSSL=31880","status":"200 OK","time_ms":3.346623}

          so it works on node 0, and fails repeatedly on 1 and 2.

          Node 0 looks fine:

          10.244.0.86 - Administrator [14/Dec/2021:11:28:12 +0000] "PUT /node/controller/setupAlternateAddresses/external HTTP/1.1" 200 0 - "couchbase-operator/2.3.0 (commit/632c6762a642344ef63d33b54c32d1a6ed850355; build/999)" 3
           
          [ns_server:debug,2021-12-14T11:28:12.667Z,ns_1@test-couchbase-lhdtq-0000.test-couchbase-lhdtq.test-jmcmh.svc:ns_config_log<0.265.0>:ns_config_log:log_common:256]config change:
          {local_changes_count,<<"5238fbb651bb1209226faf08360eb87a">>} ->
          [{'_vclock',[{<<"5238fbb651bb1209226faf08360eb87a">>,{26,63806700492}}]}]
          [ns_server:debug,2021-12-14T11:28:12.667Z,ns_1@test-couchbase-lhdtq-0000.test-couchbase-lhdtq.test-jmcmh.svc:ns_config_log<0.265.0>:ns_config_log:log_common:256]config change:
          {node,'ns_1@test-couchbase-lhdtq-0000.test-couchbase-lhdtq.test-jmcmh.svc',
                alternate_addresses} ->
          [{'_vclock',[{<<"5238fbb651bb1209226faf08360eb87a">>,{1,63806700492}}]},
           {external,[{hostname,"172.18.0.2"},
                      {ports,[{ssl_rest_port,31880},
                              {rest_port,30836},
                              {memcached_ssl_port,31636},
                              {memcached_port,30745},
                              {ssl_capi_port,30223},
                              {capi_port,31947}]}]}]

          Node 1 is not fine and I see zero logging to tell us what's going on:

          10.244.0.86 - Administrator [14/Dec/2021:11:27:57 +0000] "PUT /node/controller/setupAlternateAddresses/external HTTP/1.1" 400 43 - "couchbase-operator/2.3.0 (commit/632c6762a642344ef63d33b54c32d1a6ed850355; build/999)" 4
           
          grep -R 11:27:57
          ns_server.http_access.log:10.244.0.86 - Administrator [14/Dec/2021:11:27:57 +0000] "PUT /node/controller/setupAlternateAddresses/external HTTP/1.1" 400 43 - "couchbase-operator/2.3.0 (commit/632c6762a642344ef63d33b54c32d1a6ed850355; build/999)" 4
          ns_server.http_access_internal.log:127.0.0.1 - @goxdcr-cbauth [14/Dec/2021:11:27:57 +0000] "GET /_cbauth/checkPermission?domain=admin&permission=cluster.admin.internal.xdcr%21read&user=%40goxdcr-cbauth HTTP/1.1" 200 0 - "Go-http-client/1.1" 1
          ns_server.http_access_internal.log:10.244.0.88 - @goxdcr-cbauth [14/Dec/2021:11:27:57 +0000] "POST /xdcr/p2pCommunications HTTP/1.1" 200 32 - "couchbase-goxdcr/7.1.0" 6
          ns_server.http_access_internal.log:10.244.0.86 - Administrator [14/Dec/2021:11:27:57 +0000] "PUT /node/controller/setupAlternateAddresses/external HTTP/1.1" 400 43 - "couchbase-operator/2.3.0 (commit/632c6762a642344ef63d33b54c32d1a6ed850355; build/999)" 4
          ns_server.prometheus.log:level=debug ts=2021-12-14T11:27:57.194Z caller=scrape.go:1129 component="scrape manager" scrape_pool=index_high_cardinality target=http://127.0.0.1:9102/_prometheusMetricsHigh msg="Scrape failed" err="Get \"http://127.0.0.1:9102/_prometheusMetricsHigh\": dial tcp 127.0.0.1:9102: connect: connection refused"

          What I will say is node 0 is the "master",  nodes 1 and 2 are having their alternate addresses injected before they are balanaced in, so clients will be able to see the new nodes as v buckets are made live.  Perhaps it's not filtering that's broken, it's something else and the error message is misleading/inaccurate??

          Logs: couchbase-operator-certification-20211214T113406+0000.tar.bz2

          simon.murray Simon Murray added a comment - From the cbcollect I see: couchbase.log:7.1.0-1885 The operator logs at the top level show... {"level":"debug","ts":1639481277.8856394,"logger":"api","msg":"http","cluster":"test-jmcmh/test-couchbase-lhdtq","method":"PUT","url":"http://test-couchbase-lhdtq-0001.test-couchbase-lhdtq.test-jmcmh.svc:8091/node/controller/setupAlternateAddresses/external","headers":{"Accept-Encoding":["application/json, text/plain, */*"],"Authorization":["Basic QWRtaW5pc3RyYXRvcjpwYXNzd29yZA=="],"Content-Type":["application/x-www-form-urlencoded"],"User-Agent":["couchbase-operator/2.3.0 (commit/632c6762a642344ef63d33b54c32d1a6ed850355; build/999)"]},"request":"capi=32332&capiSSL=31297&hostname=172.18.0.2&kv=32748&kvSSL=31503&mgmt=31009&mgmtSSL=32076","status":"400 Bad Request","response":"Cannot set services unavailable on the node","time_ms":4.564389}{   "level":"debug","ts":1639481279.5386343,"logger":"api","msg":"http","cluster":"test-jmcmh/test-couchbase-lhdtq","method":"PUT","url":"http://test-couchbase-lhdtq-0002.test-couchbase-lhdtq.test-jmcmh.svc:8091/node/controller/setupAlternateAddresses/external","headers":{"Accept-Encoding":["application/json, text/plain, */*"],"Authorization":["Basic QWRtaW5pc3RyYXRvcjpwYXNzd29yZA=="],"Content-Type":["application/x-www-form-urlencoded"],"User-Agent":["couchbase-operator/2.3.0 (commit/632c6762a642344ef63d33b54c32d1a6ed850355; build/999)"]},"request":"capi=30052&capiSSL=32315&hostname=172.18.0.2&kv=31176&kvSSL=32233&mgmt=31000&mgmtSSL=32061","status":"400 Bad Request","response":"Cannot set services unavailable on the node","time_ms":3.745712}   {"level":"debug","ts":1639481292.6682005,"logger":"api","msg":"http","cluster":"test-jmcmh/test-couchbase-lhdtq","method":"PUT","url":"http://test-couchbase-lhdtq-0000.test-couchbase-lhdtq.test-jmcmh.svc:8091/node/controller/setupAlternateAddresses/external","headers":{"Accept-Encoding":["application/json, text/plain, */*"],"Authorization":["Basic QWRtaW5pc3RyYXRvcjpwYXNzd29yZA=="],"Content-Type":["application/x-www-form-urlencoded"],"User-Agent":["couchbase-operator/2.3.0 (commit/632c6762a642344ef63d33b54c32d1a6ed850355; build/999)"]},"request":"capi=31947&capiSSL=30223&hostname=172.18.0.2&kv=30745&kvSSL=31636&mgmt=30836&mgmtSSL=31880","status":"200 OK","time_ms":3.346623} so it works on node 0, and fails repeatedly on 1 and 2. Node 0 looks fine: 10.244.0.86 - Administrator [14/Dec/2021:11:28:12 +0000] "PUT /node/controller/setupAlternateAddresses/external HTTP/1.1" 200 0 - "couchbase-operator/2.3.0 (commit/632c6762a642344ef63d33b54c32d1a6ed850355; build/999)" 3   [ns_server:debug,2021-12-14T11:28:12.667Z,ns_1@test-couchbase-lhdtq-0000.test-couchbase-lhdtq.test-jmcmh.svc:ns_config_log<0.265.0>:ns_config_log:log_common:256]config change: {local_changes_count,<<"5238fbb651bb1209226faf08360eb87a">>} -> [{'_vclock',[{<<"5238fbb651bb1209226faf08360eb87a">>,{26,63806700492}}]}] [ns_server:debug,2021-12-14T11:28:12.667Z,ns_1@test-couchbase-lhdtq-0000.test-couchbase-lhdtq.test-jmcmh.svc:ns_config_log<0.265.0>:ns_config_log:log_common:256]config change: {node,'ns_1@test-couchbase-lhdtq-0000.test-couchbase-lhdtq.test-jmcmh.svc',       alternate_addresses} -> [{'_vclock',[{<<"5238fbb651bb1209226faf08360eb87a">>,{1,63806700492}}]},  {external,[{hostname,"172.18.0.2"},             {ports,[{ssl_rest_port,31880},                     {rest_port,30836},                     {memcached_ssl_port,31636},                     {memcached_port,30745},                     {ssl_capi_port,30223},                     {capi_port,31947}]}]}] Node 1 is not fine and I see zero logging to tell us what's going on: 10.244.0.86 - Administrator [14/Dec/2021:11:27:57 +0000] "PUT /node/controller/setupAlternateAddresses/external HTTP/1.1" 400 43 - "couchbase-operator/2.3.0 (commit/632c6762a642344ef63d33b54c32d1a6ed850355; build/999)" 4   grep -R 11:27:57 ns_server.http_access.log:10.244.0.86 - Administrator [14/Dec/2021:11:27:57 +0000] "PUT /node/controller/setupAlternateAddresses/external HTTP/1.1" 400 43 - "couchbase-operator/2.3.0 (commit/632c6762a642344ef63d33b54c32d1a6ed850355; build/999)" 4 ns_server.http_access_internal.log:127.0.0.1 - @goxdcr-cbauth [14/Dec/2021:11:27:57 +0000] "GET /_cbauth/checkPermission?domain=admin&permission=cluster.admin.internal.xdcr%21read&user=%40goxdcr-cbauth HTTP/1.1" 200 0 - "Go-http-client/1.1" 1 ns_server.http_access_internal.log:10.244.0.88 - @goxdcr-cbauth [14/Dec/2021:11:27:57 +0000] "POST /xdcr/p2pCommunications HTTP/1.1" 200 32 - "couchbase-goxdcr/7.1.0" 6 ns_server.http_access_internal.log:10.244.0.86 - Administrator [14/Dec/2021:11:27:57 +0000] "PUT /node/controller/setupAlternateAddresses/external HTTP/1.1" 400 43 - "couchbase-operator/2.3.0 (commit/632c6762a642344ef63d33b54c32d1a6ed850355; build/999)" 4 ns_server.prometheus.log:level=debug ts=2021-12-14T11:27:57.194Z caller=scrape.go:1129 component="scrape manager" scrape_pool=index_high_cardinality target=http://127.0.0.1:9102/_prometheusMetricsHigh msg="Scrape failed" err="Get \"http://127.0.0.1:9102/_prometheusMetricsHigh\": dial tcp 127.0.0.1:9102: connect: connection refused" What I will say is node 0 is the "master",  nodes 1 and 2 are having their alternate addresses injected before they are balanaced in, so clients will be able to see the new nodes as v buckets are made live.  Perhaps it's not filtering that's broken, it's something else and the error message is misleading/inaccurate?? Logs: couchbase-operator-certification-20211214T113406+0000.tar.bz2

          Simon Murray : Thank you for narrowing down the scenario. As you noticed this issue is only when node is added to the cluster but not rebalanced in yet and therefore doesn't have any active services. The fix for MB-49376 only checks for active services, we will fix it.

          Node 9002 has been added to cluster but not rebalanced in.
           
          $ curl -X PUT -u Administrator:asdasd http://localhost:9002//node/controller/setupAlternateAddresses/external -d 'capi=1123&capiSSL=1123&hostname=172.18.0.123&kv=1123&kvSSL=1123&mgmt=1123&mgmtSSL=1123'
          Cannot set services unavailable on the node

          Abhijeeth.Nuthan Abhijeeth Nuthan added a comment - Simon Murray : Thank you for narrowing down the scenario. As you noticed this issue is only when node is added to the cluster but not rebalanced in yet and therefore doesn't have any active services. The fix for MB-49376 only checks for active services, we will fix it. Node 9002 has been added to cluster but not rebalanced in.   $ curl -X PUT -u Administrator:asdasd http: //localhost:9002//node/controller/setupAlternateAddresses/external -d 'capi=1123&capiSSL=1123&hostname=172.18.0.123&kv=1123&kvSSL=1123&mgmt=1123&mgmtSSL=1123' Cannot set services unavailable on the node

          Simon Murray : Could you please verify the integrated fix? Thanks. 

          Abhijeeth.Nuthan Abhijeeth Nuthan added a comment - Simon Murray : Could you please verify the integrated fix? Thanks. 

          Build couchbase-server-7.1.0-1911 contains ns_server commit 898797b with commit message:
          MB-50028: Fix alternateAddresses breakage for inactiveAdded nodes

          build-team Couchbase Build Team added a comment - Build couchbase-server-7.1.0-1911 contains ns_server commit 898797b with commit message: MB-50028 : Fix alternateAddresses breakage for inactiveAdded nodes
          simon.murray Simon Murray added a comment -

          Looks fine to me!

          simon.murray Simon Murray added a comment - Looks fine to me!

          People

            simon.murray Simon Murray
            simon.murray Simon Murray
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Gerrit Reviews

                There are no open Gerrit changes

                PagerDuty