Couchbase Server / MB-41722

[BP MB-41691] - Rebalance causes a very large number of TIME_WAIT connections


    Details

    • Triage:
      Untriaged
    • Story Points:
      1
    • Is this a Regression?:
      No

      Description

      Index rebalance can result in a very large number of TIME_WAIT connections. A bug in the waitForIndexBuild routine prevents it from terminating at the end of a batch; instead, it remains active until the end of the rebalance. Because of this, it can reach a state where it sits in a for loop collecting stats/status until the next batch or the end of the rebalance. Each such collection creates a new http.Client object, which leads to TIME_WAIT connections when the client is freed by the Go garbage collector.

      This can lead to a rebalance failure with the following error:

      dial tcp 127.0.0.1:9102: connect: cannot assign requested address
      


            Activity

            Mihir Kamdar added a comment:

            Wayne Siu, Jeelan Poola: I am fine with including this in 6.6.1. The MB has details for the repro and the fix.

            Couchbase Build Team added a comment:

            Build couchbase-server-6.6.1-9116 contains indexing commit f297ce2 with commit message:
            MB-41722 [BP 6.6.1] fix termination for waitForIndexBuild

            Couchbase Build Team added a comment:

            Build couchbase-server-6.6.0-7921 contains indexing commit 799af2e with commit message:
            MB-41722 [BP 6.6.1] fix termination for waitForIndexBuild

            Girish Benakappa added a comment (edited):
            • Was able to reproduce the issue on 6.6.0-7920 with the steps below; during a swap rebalance, the number of TIME_WAIT connections on one of the nodes reached around 28000.
            • Steps to reproduce: with 5 index nodes, 1M docs, and 20 indexes with 15 partitions each, perform a swap rebalance. Peak number of TIME_WAIT sockets on the incoming node: 28245.
            • Fix verified with 6.6.0-7921. With the same steps, the number of TIME_WAIT connections seen on the incoming node was fewer than 600.
            • Ran a couple of iterations, since the issue does not reproduce every time; the observation was the same.
            • Also tested with 6.6.1-9116 and observed the same behavior. Hence closing this issue.

              People

              Assignee:
              deepkaran.salooja Deepkaran Salooja
              Reporter:
              jeelan.poola Jeelan Poola
              Votes:
              0
              Watchers:
              5

