Couchbase Server / MB-7294

Memcached crashes caused all nodes on a Windows EC2 cluster to remain in "pend" state

    Details

      Description

      Cluster setup: c1:c2::5:5
      biXDCR_bucket: c1 <---> c2
      uniXDCR_src: c1 ---> c2 :uniXDCR_dest
      Front end loads on c1 and c2 for biXDCR_bucket, and on c1 for uniXDCR_src.
      c1: http://ec2-54-234-1-196.compute-1.amazonaws.com:8091/
      c2: http://ec2-50-19-189-64.compute-1.amazonaws.com:8091/

      • All nodes on c1 are in the "pend" state.
      • cbstats says that warm_thread on all these nodes is complete.
      • cbstats states that all vbuckets are dead.
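The "all vbuckets are dead" observation above can be checked mechanically once the per-vbucket stats are saved as text. The following is only a rough sketch: it assumes lines of the form "vb_<id>: <state>", which is an assumption about the cbstats output and not something shown in this report. The helper names tally_vbucket_states and all_dead are hypothetical.

```python
from collections import Counter


def tally_vbucket_states(stats_text):
    """Count vbuckets per state from lines shaped like 'vb_<id>: <state>'.

    The line format is an assumption about cbstats output; it is not
    taken from this bug report.
    """
    counts = Counter()
    for line in stats_text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        # Keep only keys that look like vb_0, vb_1, ...; skip everything else.
        if sep and key.startswith("vb_") and key[3:].isdigit():
            counts[value.strip()] += 1
    return counts


def all_dead(counts):
    """True only when at least one vbucket was seen and every one is 'dead'."""
    return bool(counts) and set(counts) == {"dead"}


if __name__ == "__main__":
    # Hypothetical sample input, not real output from the affected cluster.
    sample = " vb_0: dead\n vb_1: dead\n vb_2: dead\n"
    counts = tally_vbucket_states(sample)
    print(dict(counts), all_dead(counts))
```

Running this against saved output from each node on c1 would give a per-node count to attach to the ticket instead of a manual reading.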

      Multiple crash reports:
      =========================CRASH REPORT=========================
      crasher:
      initial call: compaction_daemon:spawn_vbucket_compactor/2-fun-0/0
      pid: <0.31351.3>
      registered_name: []
      exception exit: {{badmatch,285},
      [{couch_db_updater,copy_compact,3},
      {couch_db_updater,start_copy_compact,1}]}
      in function compaction_daemon:'spawn_vbucket_compactor/2-fun-0'/4
      ancestors: [<0.26240.3>,<0.26238.3>,<0.26237.3>,compaction_daemon,
      <0.465.0>,ns_server_sup,ns_server_cluster_sup,<0.67.0>]
      messages: []
      links: [<0.26240.3>]
      dictionary: []
      trap_exit: true
      status: running
      heap_size: 2584
      stack_size: 24
      reductions: 948
      neighbours:
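For triage, note that in Erlang an exit reason of the form {badmatch, V} means a pattern match failed against the value V; here that value is 285 and the failure is raised from couch_db_updater:copy_compact/3. What 285 represents is not stated in this report. With multiple crash reports in the logs, a small script can pull out the badmatch value and the stack frames from each one. This is a minimal sketch using regular expressions; summarize_crash is a hypothetical helper, and the patterns only cover the simple {Module, Function, Arity} frame shape seen above.

```python
import re

# Matches the mismatched value inside {badmatch, Value}.
BADMATCH_RE = re.compile(r"\{badmatch,\s*([^}]+)\}")
# Matches simple stack frames of the form {module, function, arity}.
FRAME_RE = re.compile(r"\{(\w+),\s*(\w+),\s*(\d+)\}")


def summarize_crash(report_text):
    """Extract the badmatch value and {M, F, A} frames from a crash report."""
    badmatch = BADMATCH_RE.search(report_text)
    frames = [(mod, fun, int(arity))
              for mod, fun, arity in FRAME_RE.findall(report_text)]
    return {
        "badmatch": badmatch.group(1).strip() if badmatch else None,
        "frames": frames,
    }


if __name__ == "__main__":
    # The exception lines from the crash report in this ticket.
    sample = (
        "exception exit: {{badmatch,285},\n"
        "[{couch_db_updater,copy_compact,3},\n"
        "{couch_db_updater,start_copy_compact,1}]}\n"
    )
    print(summarize_crash(sample))
```

Grouping reports by the extracted value and top frame would show whether the multiple crash reports are all the same failure.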

      Will attach cbcollect_info stats in a bit.


        Activity

        abhinav Abhinav Dangeti added a comment -

        cbcollect_info stats from c1's ec2-50-17-142-113.compute-1.amazonaws.com:
        https://s3.amazonaws.com/bugdb/MB-7294/blah.zip

        junyi Junyi Xie (Inactive) added a comment -

        Is it a regression? Did we see this issue in your earlier XDCR 5:5 test on Windows?

        Please upload the core dump of memcached and ask Chiyoung to take a first look. Thanks.

        junyi Junyi Xie (Inactive) added a comment -

        Just talked to Abhinav: we saw this issue only once, and it does not seem to affect regular usability of XDCR on Windows. No core was dumped, so it is hard to triage.

        IMHO, unless we see this issue repeatedly, this bug is not extremely critical and should not impact XDCR usability. Keep it open for now.

        junyi Junyi Xie (Inactive) added a comment -

        Abhinav,

        This bug is pretty old; I am marking it resolved since there is nothing to fix at this time. Is it still valid? Please close it if not. If it is still valid, please upload the memcached crash log and dumps, and assign it to the ep_engine team.

        Thanks.

        junyi Junyi Xie (Inactive) added a comment -

        See my comments


          People

          • Assignee: Abhinav Dangeti (abhinav)
          • Reporter: Abhinav Dangeti (abhinav)