Couchbase Server / MB-6216

reboot of source node causes subsequent change vbucket filter operation to fail


Details

    • Bug
    • Resolution: Fixed
    • Major
    • 2.0-beta
    • 2.0-beta
    • ns_server
    • Security Level: Public

    Description

      From MB-6058. I've found that the first rebalance failure happened because, during a vbucket filter change, one of the migrator's requests for the "checkpoint" stat on the source node (.25) had its connection closed without anything being returned.

      This is what happens:

      [error_logger:error] [2012-08-13 11:46:34] [ns_1@10.3.121.26:error_logger:ale_error_logger_handler:log_msg:76] ** Generic server <0.20247.25> terminating
      ** Last message in was {start_vbucket_filter_change, [74,92,93,263,264,298,365,366]}
      ** When Server state == {state,#Port<0.2055414>,#Port<0.2055409>,
                               #Port<0.2055416>,#Port<0.2055410>,<0.20249.25>,
                               <<>>,<<>>,
                               {set,7,16,16,8,80,48,
                                {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
                                {{[263],[298],[365],[],"\\",[],[],[],[],[],[],
                                  [264],[],[],"J","]"}}},
                               74959,false,false,0,
                               {1344,883254,766949},
                               not_started,undefined,
                               <<"replication_ns_1@10.3.121.26">>,
                               <0.20247.25>,
                               {had_backfill,false,undefined,[]}}
      ** Reason for termination ==
      ** {{badmatch,{error,closed}},
          [{mc_binary,quick_stats_recv,3},
           {mc_binary,quick_stats_loop,5},
           {mc_binary,quick_stats,5},
           {ebucketmigrator_srv,handle_call,3},
           {gen_server,handle_msg,5},
           {proc_lib,init_p_do_apply,3}]}

      By analyzing the logs, I see that this call was made on the so-called 'upstream-aux' connection (a non-TAP connection we maintain to the upstream node, in this case .25, in order to request stats), and that it could only have been a "checkpoints" request.
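
      To illustrate the failure mode, here is a minimal Python sketch of a stats request over the memcached binary protocol that reports a closed connection as an explicit error instead of crashing on a failed match. It is not the ns_server code: the names fetch_stats, recv_exact and ConnectionClosed are hypothetical, and the sketch assumes the standard 24-byte binary-protocol header with STAT opcode 0x10 and an empty-key packet terminating the stat stream.

```python
import socket
import struct

# 24-byte memcached binary-protocol header:
# magic, opcode, key length, extras length, data type,
# vbucket id (request) / status (response), total body length, opaque, CAS
HEADER = struct.Struct(">BBHBBHIIQ")
REQ_MAGIC, RES_MAGIC, OP_STAT = 0x80, 0x81, 0x10


class ConnectionClosed(Exception):
    """The peer closed the socket before a full response arrived."""


def recv_exact(sock, n):
    """Read exactly n bytes, raising ConnectionClosed on end of stream."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionClosed(f"got {len(buf)} of {n} bytes")
        buf += chunk
    return buf


def fetch_stats(sock, key):
    """Send a STAT request for `key`; collect pairs until the empty-key terminator."""
    k = key.encode()
    sock.sendall(HEADER.pack(REQ_MAGIC, OP_STAT, len(k), 0, 0, 0, len(k), 0, 0) + k)
    stats = {}
    while True:
        header = recv_exact(sock, HEADER.size)
        magic, opcode, keylen, extlen, _dt, status, bodylen, _opaque, _cas = HEADER.unpack(header)
        body = recv_exact(sock, bodylen)
        if magic != RES_MAGIC or opcode != OP_STAT or status != 0:
            raise ValueError(f"unexpected response: magic={magic:#x} opcode={opcode:#x} status={status}")
        if keylen == 0:
            # An empty key marks the end of the stat stream.
            return stats
        name = body[extlen:extlen + keylen].decode()
        stats[name] = body[extlen + keylen:].decode()
```

      A caller can catch ConnectionClosed and retry or fail the operation cleanly; in the log above, the equivalent condition surfaces as {error,closed} in mc_binary:quick_stats_recv/3 and terminates the migrator process.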

      P.S. Given that the improved logging is now integrated, I guess you can tell us which log level to enable for the memcached log file so that we can understand this properly.

      Full logs can be obtained at: https://s3.amazonaws.com/packages.couchbase/atop-files/2.0.0/atop-10nodes-1573-swap-reb-reboot-failed-20120813.tgz


          People

            andreibaranouski Andrei Baranouski
            alkondratenko Aleksey Kondratenko (Inactive)