Couchbase Server / MB-31651 Improve experience dealing with port conflicts / MB-39488

[Backport MB-31109 to 6.6.0] Indexing service should fail to start if it cannot listen on all required ports

Details

    Description

      Summary

      KV-Engine can fail to bind to IPv4 port 11210 (meaning clients cannot connect to it) but still be considered successfully running by ns_server if it managed to bind to IPv6 port 11210.

      Details

      When initialising its listening sockets, KV-Engine currently[1] considers an interface successfully set up if it can open a listening socket on at least one address for each requested protocol (IPv4 / IPv6) - see server_socket().

      Only if none of the requested protocols could be opened does initialisation fail and the memcached process fail to start.

      Because ns_server doesn't specify in the memcached.json config which of IPv4 / IPv6 an interface should listen on, memcached defaults to a value of true for both and reports success if either IPv4 or IPv6 binds.

      Most Linux distributions enable the IPv6 stack by default. As a result, if IPv4 port 11210 is not available (for example, if a connection is already established on it), memcached fails to bind IPv4 port 11210 but nevertheless reports overall success as long as it could bind IPv6 port 11210.

      [1]: In all versions prior to 5.5.0, and from 5.5.1 onwards (see also MB-30610).

      Workaround

      If the IPv6 stack is disabled at the OS level, memcached cannot bind any IPv6 ports, and so server_socket() only reports success if the IPv4 port could be bound.

      Note there are two high-level methods to disable IPv6 on Linux:

      1. Disable the entire IPv6 protocol stack - typically by adding a boot-time grub parameter: ipv6.disable=1
      2. Disable assignment of IPv6 addresses to interfaces - either by setting the boot-time grub parameter: ipv6.disable_ipv6=1, or dynamically by setting the appropriate net.ipv6.conf sysctl properties.

      Only method 1 avoids the problem - method 2 stops IPv6 traffic from being routed, but does not prevent IPv6 listening sockets from being created.
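
To tell the two methods apart on a running host, something like the following Python sketch could be used. It assumes the usual Linux locations for these settings (/sys/module/ipv6/parameters/disable for the whole stack, /proc/sys/net/ipv6/conf/all/disable_ipv6 for address assignment on the "all" pseudo-interface); per-interface settings are not inspected, and the function names and return strings are made up for illustration.

```python
from pathlib import Path

# Assumed standard Linux paths; adjust if your kernel exposes them elsewhere.
STACK_DISABLE = Path("/sys/module/ipv6/parameters/disable")
ADDR_DISABLE = Path("/proc/sys/net/ipv6/conf/all/disable_ipv6")


def read_flag(path):
    """Return True/False for a '1'/'0' flag file, or None if it is unreadable."""
    try:
        return Path(path).read_text().strip() == "1"
    except OSError:
        return None


def ipv6_state(stack_path=STACK_DISABLE, addr_path=ADDR_DISABLE):
    """Classify how (if at all) IPv6 appears to be disabled."""
    if read_flag(stack_path):
        return "stack-disabled"      # method 1: avoids the problem
    if read_flag(addr_path):
        return "addresses-disabled"  # method 2: listening sockets still possible
    return "enabled-or-unknown"
```

Only a "stack-disabled" result corresponds to the workaround that is effective here.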

      Proposed Solution

      1. ns_server should explicitly set in the memcached.json config which of IPv4 / IPv6 it should listen on (by adding ipv4 / ipv6 keys to each configured interface).
        • If we only need to support single-stack deployments (IPv4 XOR IPv6) then that change will be sufficient - memcached will only be told to listen for a single protocol, and if that protocol fails then the whole setup will fail and it will terminate with a fatal error.
        • If we need to support dual-stack deployments (IPv4 OR IPv6) then additional changes are needed in the memcached <-> ns_server interface, as memcached needs to be told what it should do in the event of one of the protocols not binding.

      In the dual-stack case I would propose changing the ipv4 / ipv6 fields from booleans to a tri-state enum:

      "ipv4" / "ipv6": "never", "optional", "always" 
      

      KV-Engine would only attempt to bind on the given protocol if optional or always is specified; if it failed it would only be fatal if always had been specified. This moves the policy up to ns_server.
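
A minimal Python sketch of how the proposed tri-state policy could be evaluated is given below. It is an illustration of the proposal, not memcached code: BindPolicy, evaluate and the try_bind callback are hypothetical names, and the case where every protocol is "optional" and none binds is not covered by the proposal (this sketch treats it as non-fatal).

```python
from enum import Enum


class BindPolicy(Enum):
    NEVER = "never"
    OPTIONAL = "optional"
    ALWAYS = "always"


def evaluate(policies, try_bind):
    """Apply per-protocol bind policies.

    policies: mapping such as {"ipv4": BindPolicy.ALWAYS, "ipv6": BindPolicy.OPTIONAL}
    try_bind: callable taking a protocol name and returning True if it bound.
    Returns (ok, bound), where bound records the outcome of each attempt.
    """
    bound = {}
    for protocol, policy in policies.items():
        if policy is BindPolicy.NEVER:
            continue  # do not even attempt this protocol
        bound[protocol] = try_bind(protocol)
        if policy is BindPolicy.ALWAYS and not bound[protocol]:
            return False, bound  # fatal: a required protocol failed to bind
    return True, bound
```

For example, with "ipv4" set to always and "ipv6" set to optional, a failed IPv4 bind is fatal even if IPv6 would have succeeded, which is the outcome the current behaviour cannot express.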


          Activity

            Build couchbase-server-6.6.0-7675 contains indexing commit 576f2f8 with commit message:
            MB-39488: Fail fast if network ports needed for dataport servers are not available

            Comment added by build-team (Couchbase Build Team)

            Build couchbase-server-6.6.0-7746 contains indexing commit 3fb7421 with commit message:
            MB-39488: Make ip:port binding on projector and indexer more lenient

            Comment added by build-team (Couchbase Build Team)

            Verified for 6.6.0-7813

            Steps to verify:

            For cluster in IPv4 Mode

            1. The indexing service is listening on the ports below
            2. # netstat -antlp | grep -E '9101|9102|9999|9100' | grep LISTEN
              tcp 0 0 0.0.0.0:19102 0.0.0.0:* LISTEN 30390/indexer
              tcp 0 0 0.0.0.0:9100 0.0.0.0:* LISTEN 30390/indexer
              tcp 0 0 0.0.0.0:9101 0.0.0.0:* LISTEN 30390/indexer
              tcp 0 0 0.0.0.0:9102 0.0.0.0:* LISTEN 30390/indexer
              tcp 0 0 0.0.0.0:9999 0.0.0.0:* LISTEN 30149/projector
            3. Stop Couchbase Server and start an external TCP listener on each of the above ports (separately)
            4. Start Couchbase Server; indexer.log contains:
                     For 9102,19102,9100,9101 - "listen tcp4 :portnum: bind: address already in use"
                     For 9103,9104,9105 - "Error in listening on network port :portnum" Indexer exited normally.
                     For 9999 - "tcp 10.112.190.101:9999: getsockopt: connection refused. Projector health check needed, indexer can not proceed"
            5. UI - left bottom pop up - "Warning: Cannot communicate with indexer process. Information on indexes may be stale. Will retry."
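
Step 3 (occupying a port with an external listener) could be done with a small Python helper along these lines. This is a sketch, not the tool used during verification; occupy_port and port_is_free are hypothetical names, and the wildcard addresses are an assumption based on the netstat output.

```python
import errno
import socket


def occupy_port(port, ipv6=False, host=None):
    """Bind and listen on `port`; the port stays taken while the socket is open."""
    family = socket.AF_INET6 if ipv6 else socket.AF_INET
    if host is None:
        host = "::" if ipv6 else "0.0.0.0"
    sock = socket.socket(family, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        sock.listen()
    except OSError:
        sock.close()
        raise
    return sock


def port_is_free(port, ipv6=False):
    """Return False if binding fails with 'address already in use'."""
    try:
        occupy_port(port, ipv6).close()
        return True
    except OSError as exc:
        if exc.errno == errno.EADDRINUSE:
            return False
        raise
```

To block a port for the duration of a test, call occupy_port(9100) (or occupy_port(9100, ipv6=True) for an IPv6 cluster) and keep the process running until the indexer has been restarted.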

            For cluster in IPv6 Mode

            1. The indexing service is listening on the ports below
            2. # netstat -antlp | grep -E '9101|9102|9999|9100' | grep LISTEN
              tcp6 0 0 :::9100 :::* LISTEN 4690/indexer
              tcp6 0 0 :::9101 :::* LISTEN 4690/indexer
              tcp6 0 0 :::9102 :::* LISTEN 4690/indexer
              tcp6 0 0 :::9999 :::* LISTEN 4587/projector
              tcp6 0 0 :::19102 :::* LISTEN 4690/indexer
            3. Stop Couchbase Server and start an external TCP listener on each of the above ports (separately)
            4. Start Couchbase Server; indexer.log contains:
                     For 9102,19102,9100,9101 - "listen tcp6 :portnum: bind: address already in use"
                     For 9103,9104,9105 - "Error in listening on network port :portnum" Indexer exited normally.
                     For 9999 - "tcp 172.23.211.58:9999: getsockopt: connection refused Projector health check needed, indexer can not proceed"
            5. UI - left bottom pop up - "Warning: Cannot communicate with indexer process. Information on indexes may be stale. Will retry."
            Comment added by prajwal.kirankumar (Prajwal Kiran Kumar, Inactive)

            People

              mihir.kamdar Mihir Kamdar (Inactive)
              amit.kulkarni Amit Kulkarni