Couchbase Server / MB-5828

Rebalance is stuck for 1 hour when memory is almost full, and then fails


Details

    • Bug
    • Resolution: Won't Fix
    • Major
    • 2.0
    • None
    • ns_server
    • Security Level: Public
    • None

    Description

      Build: 1409
      Nodes used: 10.5.2.11, 10.5.2.13, 10.5.2.14, 10.5.2.15

      Long-running test with the following steps (the exact steps are not so important):
      1. Start with one node, 10.5.2.11.
      2. Load data using DocumentGenerator: rebalance.rebalancein.RebalanceInTests.incremental_rebalance_in_with_queries,blob_generator=False,items=10000000
      3. While data is loading, rebalance in 10.5.2.13 and 10.5.2.15 incrementally.
      4. Rebalance out 10.5.2.15 and stop the rebalance at ~40% progress (data is still loading).
      5. Restart the rebalance after 5 minutes.
      6. Rebalance in 2 nodes: 10.5.2.14 and 10.5.2.15.
      7. Stop loading data at 5336207 docs (my host hung).
      8. Continue loading data for keys 6000000-10000000 (9336207 docs in total).
      9. Create 5 views in a ddoc.
      10. Run doc ops for about 2 hours (update, get via test scripts).
      11. Rebalance out 10.5.2.15.

      Result: rebalance is stuck with the following progress:
      {"status":"running",
       "ns_1@10.5.2.11":{"progress":0.578125},
       "ns_1@10.5.2.13":{"progress":0.5490196078431373},
       "ns_1@10.5.2.14":{"progress":0.08949416342412453},
       "ns_1@10.5.2.15":{"progress":0.31640625}}
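For anyone trying to reproduce this, a stuck rebalance can be recognised by comparing successive progress documents like the one above. The following is only a minimal sketch: the helper names `node_progress` and `is_stalled` are hypothetical, and it assumes the progress payload has already been fetched as a JSON string (how it is polled is not shown in this report).

```python
import json

# Progress document as reported in this issue.
SAMPLE = (
    '{"status":"running",'
    '"ns_1@10.5.2.11":{"progress":0.578125},'
    '"ns_1@10.5.2.13":{"progress":0.5490196078431373},'
    '"ns_1@10.5.2.14":{"progress":0.08949416342412453},'
    '"ns_1@10.5.2.15":{"progress":0.31640625}}'
)


def node_progress(payload):
    """Return {node_name: progress} from a rebalance progress JSON string.

    Hypothetical helper: non-node keys such as "status" are skipped.
    """
    doc = json.loads(payload)
    return {
        node: info["progress"]
        for node, info in doc.items()
        if isinstance(info, dict) and "progress" in info
    }


def is_stalled(samples):
    """True if at least two samples were taken and none of them differ.

    Hypothetical helper: `samples` is a list of dicts from node_progress(),
    ordered oldest to newest.
    """
    return len(samples) >= 2 and all(s == samples[0] for s in samples[1:])


progress = node_progress(SAMPLE)
for node, value in sorted(progress.items()):
    print(f"{node}: {value:.1%}")

# Two identical samples taken some time apart would indicate no movement.
print("stalled:", is_stalled([progress, progress]))
```

In practice the samples would be taken several minutes apart; a single unchanged pair is only a hint, not proof of a hang.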

      Memory is almost fully used:

      top - 14:32:04 up 24 days, 5:14, 1 user, load average: 3.38, 3.38, 3.00
      Tasks: 123 total, 1 running, 122 sleeping, 0 stopped, 0 zombie
      Cpu(s): 19.9%us, 0.7%sy, 0.2%ni, 33.6%id, 45.2%wa, 0.0%hi, 0.5%si, 0.0%st
      Mem: 2058744k total, 2047312k used, 11432k free, 4488k buffers
      Swap: 5996536k total, 11260k used, 5985276k free, 425112k cached

      PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
      8033 couchbas 15 0 1078m 902m 3164 S 34.3 44.9 84:31.12 memcached
      7999 couchbas 25 0 1500m 554m 8948 S 7.0 27.6 351:07.27 beam.smp
      7722 root 34 19 282m 25m 8168 D 0.3 1.3 0:01.61 yum-updatesd-he
      7796 jenkins 15 0 12760 1112 828 R 0.3 0.1 0:00.39 top
      1 root 15 0 10368 684 572 S 0.0 0.0 0:00.79 init
      2 root RT -5 0 0 0 S 0.0 0.0 0:02.93 migration/0
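To put the `top` figures above in perspective, the sketch below works out how much of the node's RAM is in use, both as reported and after subtracting buffers and page cache. It is an illustration only: the `fields` helper is hypothetical, and it assumes the older procps layout shown here, where the `cached` value printed on the `Swap:` line refers to the page cache.

```python
import re

# Summary lines copied from the top output in this report.
TOP_MEM = "Mem: 2058744k total, 2047312k used, 11432k free, 4488k buffers"
TOP_SWAP = "Swap: 5996536k total, 11260k used, 5985276k free, 425112k cached"


def fields(line):
    """Parse '<number>k <name>' pairs from a top summary line into a dict (kB)."""
    return {name: int(kb) for kb, name in re.findall(r"(\d+)k (\w+)", line)}


mem = fields(TOP_MEM)
swap = fields(TOP_SWAP)

# Share of RAM that top reports as used (includes buffers and page cache).
raw_pct = 100.0 * mem["used"] / mem["total"]

# Memory used by processes only: subtract buffers and page cache.
app_used_kb = mem["used"] - mem["buffers"] - swap["cached"]
app_pct = 100.0 * app_used_kb / mem["total"]

print(f"used as reported:        {raw_pct:.1f}%")
print(f"used minus buffers/cache: {app_pct:.1f}% ({app_used_kb} kB)")
```

Even after discounting cache, most of the 2 GB is held by processes, which matches the %MEM values for memcached (44.9) and beam.smp (27.6) in the process list.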

      Rebalance hung for almost 1 hour and then failed:

      Server error during processing: ["web request failed",
       {path,"/pools/default"},
       {type,exit},
       {what,
        {timeout,
         {gen_server,call,[ns_cookie_manager,cookie_get]}}},
       {trace,
        [{gen_server,call,2},
         {menelaus_web,build_nodes_info_fun,3},
         {menelaus_web,build_pool_info,4},
         {menelaus_web,handle_pool_info_wait,6},
         {menelaus_web,check_and_handle_pool_info,2},
         {menelaus_web,loop,3},
         {mochiweb_http,headers,5},
         {proc_lib,init_p_do_apply,3}]}]

      Haven't heard from a higher priority node or a master, so I'm taking over.

      See also the attached Web Console log.

      A possible cause is the memory leak mentioned in http://www.couchbase.com/issues/browse/MB-5806

      Attachments

        1. 10.5.2.11-8091-diag.txt.gz
          15.75 MB
        2. 10.5.2.13-8091-diag.txt.gz
          11.10 MB
        3. 10.5.2.14-8091-diag.txt.gz
          5.98 MB
        4. 10.5.2.15-8091-diag.txt.gz
          7.31 MB
        5. Couchbase Console.html
          234 kB
          People

            alkondratenko Aleksey Kondratenko (Inactive)
            andreibaranouski Andrei Baranouski