  1. Couchbase Server
  2. MB-7059

[system test] beam.smp is running at node 43 but all other nodes saw this node down

Details

    • Bug
    • Resolution: Cannot Reproduce
    • Major
    • 2.0
    • 2.0
    • ns_server
    • Security Level: Public
    • centos 6.2 64bit build 2.0.0-1908

    Description

      Cluster information:

      • 8 CentOS 6.2 64-bit servers, each with a 4-core CPU
      • Each server has 32 GB RAM and a 400 GB SSD.
      • 24.8 GB RAM allocated to Couchbase Server on each node
      • SSD formatted as ext4 and mounted on /data
      • Each server has its own SSD drive; disks are not shared between servers.
      • Create a cluster of 6 nodes running Couchbase Server 2.0.0-1908
      • Link to manifest file: http://builds.hq.northscale.net/latestbuilds/couchbase-server-enterprise_x86_64_2.0.0-1908-rel.rpm.manifest.xml
      • The cluster has 2 buckets: default (12 GB with 2 replicas) and saslbucket (12 GB with 1 replica).
      • Each bucket has one design doc with 2 views (d1 in default, d11 in saslbucket)

      10.6.2.37
      10.6.2.38
      10.6.2.44
      10.6.2.45
      10.6.2.42
      10.6.2.43

      • Load 16 million items into the default bucket and 20 million items into saslbucket. Each item is between 512 and 1024 bytes.
      • After loading completes, wait for initial indexing to finish. Disable view compaction.
      • After initial indexing is done, mutate all items to sizes between 1024 and 1512 bytes.
      • Query all 4 views from the 2 design docs.
      • Do a swap rebalance: remove nodes 39 and 40, and add nodes 44 and 45.
      • At the end of rebalancing saslbucket, the rebalance exited with a timeout on node 43.
      • After that, saw many connection resets to mccouch. Updated bug MB-7046.
      • Killed all loads pointing at this cluster. Node 43 did not return to a stable state.
      • beam.smp is running, but node 43 is still down.
      • Killed beam.smp with SIGUSR1 to create an Erlang crash dump.
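
      For reference, the last step can be sketched in Python as below. This is only an illustration, not the exact commands used in this test: the helper names `find_pids` and `request_crash_dump` are hypothetical, and it assumes `ps -eo pid,comm` output is available on the node. Sending SIGUSR1 terminates the Erlang VM and makes it write a crash dump, so it should only be run against a node that is already considered lost.

```python
import os
import signal


def find_pids(ps_output, process_name="beam.smp"):
    """Return the PIDs whose command name equals process_name.

    ps_output is expected to be the text printed by `ps -eo pid,comm`
    (one header row, then one "PID COMMAND" row per process).
    """
    pids = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[1].strip() == process_name:
            pids.append(int(parts[0]))
    return pids


def request_crash_dump(pid):
    """Send SIGUSR1 to the given PID (hypothetical helper).

    For an Erlang VM this stops the VM and triggers a crash dump.
    """
    os.kill(pid, signal.SIGUSR1)


# Example usage on the affected node (left commented out on purpose,
# because it terminates every matching beam.smp process):
#
#   import subprocess
#   out = subprocess.run(["ps", "-eo", "pid,comm"],
#                        capture_output=True, text=True, check=True).stdout
#   for pid in find_pids(out):
#       request_crash_dump(pid)
```

      Parsing is kept separate from signalling so that the PID lookup can be checked on saved `ps` output before anything is killed.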

      Link to collect_info from all nodes: https://s3.amazonaws.com/packages.couchbase/collect_info/orange/2_0_0/201210/orange-ci-1908-node43-down-erl-hang-20121030.tgz

          People

            thuan Thuan Nguyen