  1. Couchbase Server
  2. MB-7246

[windows] Commit failure and retry caused memcached to exit with 255, which in turn caused rebalance failure

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 2.0
    • Fix Version/s: 2.0.1
    • Component/s: couchbase-bucket
    • Security Level: Public
    • Environment:

      Description

      To reproduce: rebalancetests.IncrementalRebalanceOut.test_load,replica=2,do-stop=True

      2012-11-21 06:50:12.180 ns_orchestrator:4:info:message(ns_1@10.1.3.82) - Starting rebalance, KeepNodes = ['ns_1@10.1.3.84','ns_1@10.1.3.85',
      'ns_1@10.1.3.82'], EjectNodes = ['ns_1@10.1.3.87']
      2012-11-21 06:54:27.902 ns_memcached:4:info:message(ns_1@10.1.3.84) - Control connection to memcached on 'ns_1@10.1.3.84' disconnected: {{badmatch,{error,closed}},
      [{mc_client_binary,cmd_binary_vocal_recv,5},
      {mc_client_binary,select_bucket,2},
      {ns_memcached,ensure_bucket,2},
      {ns_memcached,handle_info,2},
      {gen_server,handle_msg,5},
      {proc_lib,init_p_do_apply,3}]}
      2012-11-21 06:54:33.024 ns_orchestrator:2:info:message(ns_1@10.1.3.82) - Rebalance exited with reason {{bulk_set_vbucket_state_failed,
      [{'ns_1@10.1.3.84',
      {'EXIT',
      {{badmatch,{error,closed,
      [{mc_client_binary,cmd_binary_vocal_recv,5},
      {mc_client_binary,select_bucket,2},
      {ns_memcached,ensure_bucket,2},
      {ns_memcached,handle_info,2},
      {gen_server,handle_msg,5},
      {proc_lib,init_p_do_apply,3}]},
      {gen_server,call,
      ['ns_memcached-bucket-0',
      {set_vbucket,484,replica},
      180000]}},
      {gen_server,call,
      [{'janitor_agent-bucket-0','ns_1@10.1.3.84'},
      {if_rebalance,<0.7929.47>,
      {update_vbucket_state,484,replica,
      undefined,'ns_1@10.1.3.85'}},
      infinity]}}}},
      {'ns_1@10.1.3.82',
      {'EXIT',
      {{{{unexpected_reason,
      badmatch,{error,closed,
      [{mc_binary,quick_stats_recv,3},
      {mc_binary,quick_stats_loop,5},
      {mc_binary,quick_stats,5},
      {mc_client_binary,get_zero_open_checkpoint_vbuckets,3},
      {ebucketmigrator_srv,handle_call,3},
      {gen_server,handle_msg,5},
      {proc_lib,init_p_do_apply,3}]}},
      [{misc,executing_on_new_process,1},
      {tap_replication_manager,change_vbucket_filter,4},
      {tap_replication_manager,'-do_set_incoming_replication_map/3-lc$^5/1-5-',2},
      {tap_replication_manager,do_set_incoming_replication_map,3},
      {tap_replication_manager,handle_call,3},
      {gen_server,handle_msg,5},
      {proc_lib,init_p_do_apply,3}]},
      {gen_server,call,
      ['tap_replication_manager-bucket-0',
      {change_vbucket_replication,484,'ns_1@10.1.3.84'},
      infinity]}},
      {gen_server,call,
      [{'janitor_agent-bucket-0','ns_1@10.1.3.82'},
      {if_rebalance,<0.7929.47>,
      {update_vbucket_state,484,replica,
      undefined,'ns_1@10.1.3.84'}},
      infinity]}}}}]},
      [{janitor_agent,bulk_set_vbucket_state,4},
      {ns_vbucket_mover,update_replication_post_move,3},
      {ns_vbucket_mover,handle_info,2},
      {gen_server,handle_msg,5},
      {proc_lib,init_p_do_apply,3}]}
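
      When triaging excerpts like the one above, it can help to pull the timestamp, emitting module, and node out of each entry header. The following is a minimal Python sketch that assumes every entry starts with the header layout seen in these three entries. The function name `parse_header` and the field names `code`, `level`, and `kind` are hypothetical labels for the colon-separated fields, not names taken from ns_server.

      ```python
      import re
      from datetime import datetime

      # Header layout assumed from the entries above:
      #   <date> <time> <module>:<number>:<word>:<word>(<node>) - <text>
      LOG_HEADER = re.compile(
          r"^\s*(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
          r"(?P<module>\w+):(?P<code>\d+):(?P<level>\w+):(?P<kind>\w+)"
          r"\((?P<node>[^)]+)\) - (?P<text>.*)$"
      )


      def parse_header(line):
          """Return the header fields of a log entry, or None for continuation lines."""
          match = LOG_HEADER.match(line)
          if match is None:
              return None
          entry = match.groupdict()
          # %f accepts the three-digit millisecond suffix used in these logs.
          entry["ts"] = datetime.strptime(entry["ts"], "%Y-%m-%d %H:%M:%S.%f")
          return entry
      ```

      Continuation lines of a multi-line Erlang term return None, so entries can be grouped by collecting lines until the next header. Subtracting the `ts` of the "Starting rebalance" entry from that of the "Rebalance exited" entry gives the time until failure.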

      Attaching logs.


        Activity

        FilipeManana Filipe Manana (Inactive) added a comment -

        The issue I pointed out is unrelated to a crash, so that crash seems like a completely separate issue.

        Even with the Erlang patch, the file open retry period is still very useful, as some Windows background services and software such as antivirus might periodically open files without any share flags specified.
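
        To illustrate the idea of a file open retry period, here is a minimal Python sketch. It is not the couchstore or ep-engine implementation. The name `open_with_retry` and the parameters `retry_period` and `interval` are hypothetical, and it assumes a sharing conflict surfaces as `PermissionError`, which is how Python usually reports a Windows sharing violation.

        ```python
        import time


        def open_with_retry(path, mode="rb", retry_period=5.0, interval=0.1,
                            opener=open, sleep=time.sleep, clock=time.monotonic):
            """Open `path`, retrying on PermissionError until `retry_period` seconds pass.

            `opener`, `sleep`, and `clock` are injectable so the loop can be
            exercised without a real locked file.
            """
            deadline = clock() + retry_period
            while True:
                try:
                    return opener(path, mode)
                except PermissionError:
                    # Another process (for example an antivirus scanner) may be
                    # holding the file without share flags; wait and try again.
                    if clock() >= deadline:
                        raise  # retry period exhausted, surface the error
                    sleep(interval)
        ```

        The loop is bounded by a deadline rather than an attempt count, so a short-lived lock is absorbed while a persistent one still surfaces as an error to the caller.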

        iryna iryna added a comment -

        Tested build 1976 on Windows with Service Pack 1 installed. The issue did not reproduce.

        iryna iryna added a comment -

        On Windows, file access conflicts can occur during rebalance, and as a result Couchbase fails to persist mutations or deletions to some vbuckets. After many retries, memcached dies and the rebalance fails.
        This problem does not appear if Windows has Service Pack 1 installed ( http://technet.microsoft.com/en-us/windows/gg635126.aspx )

        FilipeManana Filipe Manana (Inactive) added a comment -

        Would be nice to get an answer to Chiyoung's last question.

        "In your tests, did you see the file access violation error in the logs when you get the above APPCRASH crash?"

        mccouch MC Brown (Inactive) added a comment -

        Added a note to the requirements section of the docs: http://www.couchbase.com/docs/couchbase-manual-2.0/couchbase-getting-started-prepare-platforms.html

          People

          • Assignee: MC Brown (Inactive)
          • Reporter: iryna
          • Votes: 0
          • Watchers: 3
