Couchbase Server / MB-5606

Server errors on a 2-node cluster, followed by an attempt to auto-failover a node; both nodes put in pending state


Details

    • Bug
    • Resolution: Cannot Reproduce
    • Major
    • 1.8.1
    • 1.8.1
    • ns_server
    • Security Level: Public
    • None
    • Environment: 2-node cluster, 2 buckets (bucket1, bucket2), 1024 vBuckets
      Ubuntu, 24 GB RAM per node
      Build 1.8.1-918

    Description

      Setup
      1. Set up a 2-node cluster with 2 buckets, bucket1 and bucket2.
      2. Load 25M items into each bucket. Enable auto-failover with a 30 s timeout.
      3. The cluster is in DGM (disk greater than memory), with a 70% active resident ratio on each bucket.
      4. Continue loading and mutating data to create high fragmentation.

      Error/Output
      1. Server errors ("web request failed") appear in the web logs.
      2. This is followed by an attempt to auto-failover node 95 (non-master), which does not happen because the cluster is too small.
      3. Node 95, previously non-master, becomes the master node.
      4. The memcached control connection on node 94 is lost.
      5. Both nodes are in pending state. Items from node 95 are lost, since it was auto-failed over.
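      The failed auto-failover in item 2 follows from the rule stated in the ns_server message "Could not auto-failover node ('ns_1@10.3.2.95'). Cluster was too small, you need at least 2 other nodes." A minimal sketch of that precondition, assuming only what the message says; the function and constant names are hypothetical and this is not ns_server's implementation:

```python
# Hypothetical sketch of the precondition stated in the log message
# "Cluster was too small, you need at least 2 other nodes".
# Names are illustrative; this is not ns_server's implementation.

MIN_OTHER_NODES = 2  # taken from the log message text


def can_auto_failover(cluster_nodes: list[str], failed_node: str) -> bool:
    """Return True if at least MIN_OTHER_NODES remain besides failed_node."""
    others = [node for node in cluster_nodes if node != failed_node]
    return len(others) >= MIN_OTHER_NODES


if __name__ == "__main__":
    two_nodes = ["ns_1@10.3.2.94", "ns_1@10.3.2.95"]
    # Only one other node remains, so the check fails, matching the
    # "Cluster was too small" message logged for ns_1@10.3.2.95.
    print(can_auto_failover(two_nodes, "ns_1@10.3.2.95"))
```

      Under this rule a two-node cluster, like the one in this report, can never auto-fail over; three nodes is the minimum.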

      Error messages from the web logs
      1. Server error during processing: ["web request failed",
         {path,"/pools/bucket1/saslBucketsStreaming"},
         {type,exit},
         {what,{timeout,{gen_server,call,[ns_config,get]}}},
         {trace,[{diag_handler,diagnosing_timeouts,1},
                 {ns_bucket,json_map_from_config,2},
                 {menelaus_web_buckets,'-handle_sasl_buckets_streaming/2-fun-1-',3},
                 {lists,map,2},
                 {menelaus_web_buckets,'-handle_sasl_buckets_streaming/2-fun-2-',2},
                 {menelaus_web,streaming_inner,3},
                 {menelaus_web,handle_streaming,4},
                 {menelaus_web,loop,3}]}]

      2. Could not auto-failover node ('ns_1@10.3.2.95'). Cluster was too small, you need at least 2 other nodes.

      3. Control connection to memcached on 'ns_1@10.3.2.94' disconnected: {{badmatch,{error,timeout}},
         [{mc_client_binary,cmd_binary_vocal_recv,5},
          {mc_client_binary,select_bucket,2},
          {ns_memcached,ensure_bucket,2},
          {ns_memcached,handle_info,2},
          {gen_server,handle_msg,5},
          {proc_lib,init_p_do_apply,3}]} ns_memcached004 ns_1@10.3.2.94 20:42:43 - Mon Jun 18, 2012

      Attached is a screenshot from the cluster.

      Logs from nodes 94 and 95 are attached (bug2.tar).

      Both nodes can be accessed and pinged. There are no core dumps on either node.

      Live cluster can be found at http://10.3.2.94:8091/index.html#sec=analytics&statsBucket=/pools/default/buckets/bucket1

      [error_logger:error] [2012-06-18 20:38:29] [ns_1@10.3.2.94:error_logger:ale_error_logger_handler:log_report:72]
      =========================CRASH REPORT=========================
      crasher:
      initial call: ns_heart:init/1
      pid: <0.236.0>
      registered_name: ns_heart
      exception exit: {timeout,{gen_server,call,[disksup,get_disk_data,5000]}}
      in function gen_server:terminate/6
      ancestors: [ns_server_sup,ns_server_cluster_sup,<0.60.0>]
      messages: [do_expensive_checks,beat,beat,beat,do_expensive_checks,beat,
      beat,force_beat,force_beat]
      links: [<0.196.0>,<0.237.0>,<0.57.0>]
      dictionary: []
      trap_exit: true
      status: running
      heap_size: 121393
      stack_size: 24
      reductions: 432952171
      neighbours:

      [ns_server:debug] [2012-06-18 20:38:29] [ns_1@10.3.2.94:<0.21840.29>:menelaus_web:handle_streaming:673] Starting streaming for 10.1.3.103:35053 path /pools/bucket2/saslBucketsStreaming
      [ns_server:debug] [2012-06-18 20:38:31] [ns_1@10.3.2.94:<0.267.0>:ns_pubsub:do_subscribe_link:132] Parent process exited with reason shutdown
      [ns_server:debug] [2012-06-18 20:38:31] [ns_1@10.3.2.94:<0.263.0>:ns_pubsub:do_subscribe_link:132] Parent process exited with reason killed
      [ns_doctor:debug] [2012-06-18 20:38:29] [ns_1@10.3.2.94:ns_doctor:ns_doctor:handle_info:93] Current node statuses:
      [{'ns_1@10.3.2.94',
        [{last_heard,{1340,76981,427276}},
         {active_buckets,["bucket2","bucket1"]},
         {ready_buckets,[]},
         {replication,[{"bucket2",0.0},{"bucket1",0.0}]},
         {memory,[{total,161965240},
                  {processes,92352464},
                  {processes_used,92324960},
                  {system,69612776},
                  {atom,1184801},
                  {atom_used,1159075},
                  {binary,4648544},
                  {code,11556873},
                  {ets,41617136}]},
         {system_stats,[{cpu_utilization_rate,0.7246376811594203},
                        {swap_total,6840901632},
                        {swap_used,352305152}]},
         {interesting_stats,[{curr_items,25151068},
                             {curr_items_tot,50300938},
                             {vb_replica_curr_items,25149870}]},
         {cluster_compatibility_version,1},
         {version,[{ale,"8cffe61"},
                   {os_mon,"2.2.6"},
                   {mnesia,"4.4.19"},
                   {inets,"5.6"},
                   {kernel,"2.14.4"},
                   {sasl,"2.1.9.4"},
                   {ns_server,"1.8.1-918-rel-enterprise"},
                   {stdlib,"1.17.4"}]},
         {system_arch,"x86_64-unknown-linux-gnu"},
         {wall_clock,199922},
         {memory_data,{26397753344,26260025344,{<0.958.0>,16006104}}},
         {disk_data,[{"/",19223252,10},
                     {"/dev",12884588,1},
                     {"/dev/shm",12889528,0},
                     {"/var/run",12889528,1},
                     {"/var/lock",12889528,0},
                     {"/lib/init/rw",12889528,0},
                     {"/data",103208224,23}]},
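
      A few fields in the ns_doctor status dump above are easier to read once decoded. A minimal sketch, assuming last_heard is an Erlang now() tuple {MegaSecs, Secs, MicroSecs}, memory_data follows os_mon memsup's {Total, Allocated, Worst} in bytes, and disk_data follows disksup's {Id, KByte, Capacity} with Capacity as percent used. The values are copied from the dump for ns_1@10.3.2.94; the helper names are hypothetical:

```python
from datetime import datetime, timedelta, timezone

# Values copied from the ns_doctor status dump for 'ns_1@10.3.2.94'.
LAST_HEARD = (1340, 76981, 427276)        # Erlang now(): {MegaSecs, Secs, MicroSecs}
MEMORY_DATA = (26397753344, 26260025344)  # memsup: {Total, Allocated, ...} in bytes
INTERESTING = {
    "curr_items": 25151068,
    "curr_items_tot": 50300938,
    "vb_replica_curr_items": 25149870,
}
# disksup: {Id, KByte, Capacity}, where Capacity is the percent of the disk in use
DISK_DATA = [
    ("/", 19223252, 10),
    ("/dev", 12884588, 1),
    ("/dev/shm", 12889528, 0),
    ("/var/run", 12889528, 1),
    ("/var/lock", 12889528, 0),
    ("/lib/init/rw", 12889528, 0),
    ("/data", 103208224, 23),
]


def erlang_now_to_utc(ts: tuple[int, int, int]) -> datetime:
    """Convert an Erlang {MegaSecs, Secs, MicroSecs} tuple to an aware UTC datetime."""
    mega, secs, micro = ts
    # Integer seconds first, then microseconds, to avoid float rounding.
    base = datetime.fromtimestamp(mega * 1_000_000 + secs, tz=timezone.utc)
    return base + timedelta(microseconds=micro)


def percent(part: int, whole: int) -> float:
    return 100.0 * part / whole


if __name__ == "__main__":
    print("last_heard (UTC):", erlang_now_to_utc(LAST_HEARD).isoformat())

    total, allocated = MEMORY_DATA
    print(f"memory allocated: {percent(allocated, total):.1f}% of {total / 2**30:.1f} GiB")

    s = INTERESTING
    # Active items plus replica items should account for every item on the node.
    print("active + replica == total:",
          s["curr_items"] + s["vb_replica_curr_items"] == s["curr_items_tot"])

    for mount, kbytes, used in DISK_DATA:
        print(f"{mount}: {kbytes / 2**20:.1f} GiB, {used}% used")
```

      On these values, last_heard decodes to 2012-06-19 03:36:21.427276 UTC (the bracketed log timestamps are local time with no stated offset), curr_items plus vb_replica_curr_items equals curr_items_tot exactly, and memsup reports about 99.5% of the node's roughly 24.6 GiB as allocated.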


          People

            Assignee: Ketaki Gangal (Inactive)
            Reporter: Ketaki Gangal (Inactive)