Couchbase Server / MB-9048

capi_set_view_manager may lose nodeup events leading to failures to replicate design documents

Details

    • Bug
    • Resolution: Fixed
    • Major
    • 3.0
    • 2.0, 2.0.1, 2.1.0, 2.2.0
    • ns_server
    • Security Level: Public
    • None

    Description

      See CBSE-703 for an actual (but rare) occurrence in production.

      As pointed out in CBSE-703, there is a workaround:

      <quote>
      Regarding design documents not being propagated: there appears to be a rare race condition. The workaround for now is to restart the responsible processes with this snippet:

      curl -X POST -u Administrator:<password> http://<host>:8091/diag/eval -d 'rpc:eval_everywhere(erlang, apply, [fun () -> [exit(whereis(list_to_atom("capi_set_view_manager-" ++ B)), kill) || B <- ns_bucket:get_bucket_names(membase)] end, []]).'

      It's enough to run it against one of the nodes.
      </quote>

      We think the issue itself happens because ns_node_disco might combine multiple node {up,down} events into a single "cumulative" ns_node_disco_events event. If the list of nodes before and after is the same, it will not send any event at all.

      That last property, swallowing the aggregated event completely, causes capi_set_view_manager (and its counterpart that replicates replication documents) to lose nodeup events. Those processes monitor remote processes, so they always see the down event. But if down + up aggregates to nothing, they never see the up event.
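
      The suspected race can be illustrated with a small simulation. This is only a sketch of the behaviour described above, not ns_server code: the names coalesce and Watcher are hypothetical, and the model assumes that a burst of node transitions is folded into one node list that is published only if it differs from the previous list.

```python
# Simplified model of the suspected race. All names here are hypothetical;
# the real logic lives in Erlang inside ns_server.

def coalesce(before, transitions):
    """Fold a burst of (node, "up" | "down") transitions into one event.

    Returns the new sorted node list, or None when the node set is
    unchanged, i.e. the aggregated event is swallowed.
    """
    after = set(before)
    for node, kind in transitions:
        if kind == "down":
            after.discard(node)
        else:
            after.add(node)
    if after == set(before):
        return None  # down + up cancels out: no event is published
    return sorted(after)


class Watcher:
    """Stands in for a process such as capi_set_view_manager."""

    def __init__(self, nodes):
        self.connected = set(nodes)

    def on_monitor_down(self, node):
        # A process monitor always fires, so the down side is always seen.
        self.connected.discard(node)

    def on_nodes_event(self, nodes):
        # Reconnection happens only when a node-list event arrives.
        if nodes is not None:
            self.connected = set(nodes)


before = ["n1", "n2"]
burst = [("n2", "down"), ("n2", "up")]  # n2 flaps quickly

watcher = Watcher(before)
watcher.on_monitor_down("n2")      # seen via the remote process monitor
event = coalesce(before, burst)    # aggregated event is swallowed
watcher.on_nodes_event(event)

print(event, sorted(watcher.connected))
```

      In this model the watcher stays disconnected from n2 even though n2 is back up, which matches the symptom of design documents not being replicated until the responsible processes are restarted.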

          People

            andreibaranouski Andrei Baranouski
            alkondratenko Aleksey Kondratenko (Inactive)