Couchbase Server / MB-46153

A node crashing during completeJoin goes into an infinite restart loop


Details

    • Untriaged
    • 1
    • Unknown

    Description

      This can easily be reproduced by injecting a crash into perform_actual_join(), like so:

      $ git diff
      diff --git i/src/ns_cluster.erl w/src/ns_cluster.erl
      index a72c95aab..1387d3c3f 100644
      --- i/src/ns_cluster.erl
      +++ w/src/ns_cluster.erl
      @@ -1327,6 +1327,8 @@ perform_actual_join(RemoteNode, NewCookie, ChronicleInfo) ->
               ns_cluster_membership:prepare_to_join(RemoteNode, NewCookie),
               ok = chronicle_local:prepare_join(ChronicleInfo),
       
      +        exit(crash),
      +
               %% reload is needed to reinitialize ns_config's cache after
               %% config cleanup ('erase' causes the problem, but it looks like
               %% it's not worth it to add proper 'erase' support to ns_config)
      

      Also seen in MB-46040: https://cb-jira.s3.us-east-2.amazonaws.com/logs/MB46040/output_file.zip


            People

              Assignee: Aliaksey Artamonau (Inactive)
              Reporter: Aliaksey Artamonau (Inactive)
              Votes: 0
              Watchers: 4

