Couchbase Server / MB-6528

[longevity] View compaction crash leads to node running out of disk space


Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major
    • Component: view-engine
    • Security Level: Public
    • Environment: CentOS 6.2 64-bit

    Description

      Cluster information:

      • 11 CentOS 6.2 64-bit servers, each with a 4-core CPU
      • Each server has 10 GB of RAM and a 150 GB disk.
      • 8 GB of RAM is allocated to Couchbase Server on each node (80% of total system memory).
      • The data and root partitions are both formatted as ext3.
      • Each server has its own drive; no disk is shared between servers.
      • Loaded 9 million items into both buckets and queried them continuously.
      • Initial indexing was in progress, so CPU load was somewhat heavy.
      • The cluster has 2 buckets: default (3 GB) and saslbucket (3 GB).
      • Each bucket has one design document with 2 views (d1 in default, d11 in saslbucket).
      • Added one more design document, d2, with 2 views to the default bucket.
      • Created a 10-node cluster running Couchbase Server 2.0.0-1663:
        10.3.121.13
        10.3.121.14
        10.3.121.15
        10.3.121.16
        10.3.121.17
        10.3.121.20
        10.3.121.22
        10.3.121.24
        10.3.121.25
        10.3.121.23
      • Data path: /data
      • View path: /data
      • In the last run, I did a swap rebalance, removing node 13 and adding node 26.
      • Node 26 then suffered a hardware failure. I failed over node 26 and rebalanced.
      • The rebalance failed at the end of rebalancing saslbucket with known issue MB-6497.
      • I ran the rebalance again. It failed because Couchbase Server on node 22 shut down after running out of disk space.
      • In the diags log from node 22, I see compaction start and then stop right after it starts:

      [couchdb:info,2012-09-05T1:21:36.281,ns_1@10.3.121.22:<0.15943.233>:couch_log:info:39]Set view `saslbucket`, replica group `_design/d11`, compaction starting
      [couchdb:info,2012-09-05T1:21:36.289,ns_1@10.3.121.22:<0.15943.233>:couch_log:info:39]Set view `saslbucket`, replica group `_design/d11`, linked PID <0.30311.403> stopped normally
      [ns_server:info,2012-09-05T1:21:37.541,ns_1@10.3.121.22:ns_port_memcached:ns_port_server:log:169]memcached<0.499.0>: Wed Sep 5 08:21:37.339107 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.121.14 - Suspend for 5.00 secs

      [couchdb:info,2012-09-05T1:21:40.110,ns_1@10.3.121.22:<0.30089.403>:couch_log:info:39]Updater checkpointing set view `default` update for replica group `_design/d1`
      [couchdb:info,2012-09-05T1:21:41.082,ns_1@10.3.121.22:<0.15943.233>:couch_log:info:39]Starting updater for set view `saslbucket`, replica group `_design/d11`
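To put a number on "stop right after it starts", here is a minimal Python sketch that pairs each `compaction starting` line with the next `stopped normally` line for the same set view and group and reports the gap. It assumes the couchdb log header layout is exactly what the quoted lines show, and it treats the "linked PID ... stopped normally" line, as the report does, as the end of that compaction attempt. `LINE_RE` and `compaction_gaps_ms` are hypothetical helpers, not part of Couchbase.

```python
import re
from datetime import datetime, timedelta
from typing import Dict, List

# The two couchdb lines quoted above from node 22's diags.
LOG = """\
[couchdb:info,2012-09-05T1:21:36.281,ns_1@10.3.121.22:<0.15943.233>:couch_log:info:39]Set view `saslbucket`, replica group `_design/d11`, compaction starting
[couchdb:info,2012-09-05T1:21:36.289,ns_1@10.3.121.22:<0.15943.233>:couch_log:info:39]Set view `saslbucket`, replica group `_design/d11`, linked PID <0.30311.403> stopped normally
"""

# Assumed header layout, inferred only from these lines:
# [category:level,TIMESTAMP,node:pid:module:function:line]message
LINE_RE = re.compile(r"^\[[^,]+,(?P<ts>[^,]+),[^\]]*\](?P<msg>.*)$")


def parse_ts(ts: str) -> datetime:
    # The hour is not zero-padded ("T1:21"); strptime's %H accepts one digit.
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%f")


def compaction_gaps_ms(log: str) -> List[float]:
    """Milliseconds from each 'compaction starting' line to the next
    'stopped normally' line for the same set view and group."""
    started: Dict[str, datetime] = {}
    gaps: List[float] = []
    for line in log.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue
        msg = m.group("msg")
        # Key on "Set view `...`, <kind> group `...`" (first two comma-separated parts).
        key = ", ".join(msg.split(", ")[:2])
        if msg.endswith("compaction starting"):
            started[key] = parse_ts(m.group("ts"))
        elif msg.endswith("stopped normally") and key in started:
            delta = parse_ts(m.group("ts")) - started.pop(key)
            gaps.append(delta / timedelta(milliseconds=1))
    return gaps


print(compaction_gaps_ms(LOG))
```

For the two quoted lines the gap is 8 ms (36.289 s minus 36.281 s), so the replica group compaction for `_design/d11` ends almost immediately after it begins.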

      There are also compaction crash errors:

      [error_logger:error,2012-09-05T0:03:07.142,ns_1@10.3.121.22:error_logger:ale_error_logger_handler:log_report:72]
      =========================CRASH REPORT=========================
      crasher:
      initial call: compaction_daemon:spawn_view_index_compactor/6-fun-0/0
      pid: <0.5614.402>
      registered_name: []
      exception exit: {updater_died,noproc}
      in function compaction_daemon:do_spawn_view_index_compactor/5
      in call from compaction_daemon:'spawn_view_index_compactor/6-fun-0'/7
      ancestors: [<0.5612.402>,<0.5136.402>,<0.5134.402>,compaction_daemon,
      <0.536.0>,ns_server_sup,ns_server_cluster_sup,<0.59.0>]
      messages: []
      links: [<0.5612.402>]
      dictionary: []
      trap_exit: true
      status: running
      heap_size: 4181
      stack_size: 24
      reductions: 3925
      neighbours:

      [ns_server:warn,2012-09-05T0:03:07.145,ns_1@10.3.121.22:<0.5136.402>:compaction_daemon:do_chain_compactors:524]Compactor for view `default/_design/d2` (pid [{type,view},
        {name,<<"default/_design/d2">>},
        {important,false},
        {fa,
         {#Fun<compaction_daemon.21.129945092>,
          [<<"default">>,
           <<"_design/d2">>,
           {config,
            {30,18446744073709551616},
            {30,18446744073709551616},
            undefined,false,
            {daemon_config,30,131072}},
           false,
           {[{type,bucket}]}]}}]) terminated unexpectedly (ignoring this): {updater_died, noproc}
      [error_logger:error,2012-09-05T0:03:07.146,ns_1@10.3.121.22:error_logger:ale_error_logger_handler:log_report:72]
      =========================CRASH REPORT=========================
      crasher:
      initial call: compaction_daemon:spawn_view_compactor/5-fun-1/0
      pid: <0.5612.402>
      registered_name: []
      exception exit: {updater_died,noproc}
      in function compaction_daemon:do_chain_compactors/2
      ancestors: [<0.5136.402>,<0.5134.402>,compaction_daemon,<0.536.0>,
      ns_server_sup,ns_server_cluster_sup,<0.59.0>]
      messages: []
      links: [<0.5136.402>]
      dictionary: []
      trap_exit: true
      status: running
      heap_size: 1597
      stack_size: 24
      reductions: 3959
      neighbours:

      =========================CRASH REPORT=========================
      crasher:
      initial call: compaction_daemon:spawn_view_index_compactor/6-fun-0/0
      pid: <0.9218.402>
      registered_name: []
      exception exit: {updater_died,noproc}
      in function compaction_daemon:do_spawn_view_index_compactor/5
      in call from compaction_daemon:'spawn_view_index_compactor/6-fun-0'/7
      ancestors: [<0.8756.402>,<0.6337.402>,<0.6289.402>,compaction_daemon,
      <0.536.0>,ns_server_sup,ns_server_cluster_sup,<0.59.0>]
      messages: []
      links: [<0.8756.402>]
      dictionary: []
      trap_exit: true
      status: running
      heap_size: 233
      stack_size: 24
      reductions: 11959
      neighbours:

      =========================CRASH REPORT=========================
      crasher:
      initial call: compaction_daemon:spawn_view_compactor/5-fun-1/0
      pid: <0.8756.402>
      registered_name: []
      exception exit: {updater_died,noproc}
      in function compaction_daemon:do_chain_compactors/2
      ancestors: [<0.6337.402>,<0.6289.402>,compaction_daemon,<0.536.0>,
      ns_server_sup,ns_server_cluster_sup,<0.59.0>]
      messages: []
      links: [<0.6337.402>]
      dictionary: []
      trap_exit: true
      status: running
      heap_size: 1597
      stack_size: 24
      reductions: 3957
      neighbours:
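All four crash reports carry the same exit reason from two compaction_daemon entry points. A minimal Python sketch for tallying this across a longer diags file, assuming the crash report layout shown here (the `initial call:` and `exception exit:` field names are copied from these reports) and tolerating a reason that wraps onto the next non-blank line, as can happen in raw diags output. `crash_reasons` and `SAMPLE` are illustrative names only, and `SAMPLE` is an abbreviated copy of the reports above.

```python
from collections import Counter
from typing import Optional

# Abbreviated copy of the four CRASH REPORT blocks above (fields trimmed).
SAMPLE = """\
=========================CRASH REPORT=========================
crasher:
initial call: compaction_daemon:spawn_view_index_compactor/6-fun-0/0
pid: <0.5614.402>
exception exit:

{updater_died,noproc}
in function compaction_daemon:do_spawn_view_index_compactor/5
=========================CRASH REPORT=========================
crasher:
initial call: compaction_daemon:spawn_view_compactor/5-fun-1/0
pid: <0.5612.402>
exception exit: {updater_died,noproc}
in function compaction_daemon:do_chain_compactors/2
=========================CRASH REPORT=========================
crasher:
initial call: compaction_daemon:spawn_view_index_compactor/6-fun-0/0
pid: <0.9218.402>
exception exit: {updater_died,noproc}
in function compaction_daemon:do_spawn_view_index_compactor/5
=========================CRASH REPORT=========================
crasher:
initial call: compaction_daemon:spawn_view_compactor/5-fun-1/0
pid: <0.8756.402>
exception exit: {updater_died,noproc}
in function compaction_daemon:do_chain_compactors/2
"""


def crash_reasons(log: str) -> Counter:
    """Count (initial call, exit reason) pairs across CRASH REPORT blocks.

    A reason missing from the 'exception exit:' line is taken from the
    next non-blank line."""
    counts: Counter = Counter()
    initial_call: Optional[str] = None
    awaiting_reason = False
    for raw in log.splitlines():
        line = raw.strip()
        if not line:
            continue
        if "CRASH REPORT" in line:
            # A new report starts: reset per-report state.
            initial_call, awaiting_reason = None, False
        elif line.startswith("initial call:"):
            initial_call = line.split(":", 1)[1].strip()
        elif line.startswith("exception exit:"):
            reason = line.split(":", 1)[1].strip()
            if reason:
                counts[(initial_call, reason)] += 1
            else:
                awaiting_reason = True
        elif awaiting_reason:
            counts[(initial_call, line)] += 1
            awaiting_reason = False
    return counts


for (call, reason), n in sorted(crash_reasons(SAMPLE).items()):
    print(f"{n} x {reason} from {call}")
```

On the abbreviated sample it reports each of the two entry points, `spawn_view_index_compactor` and `spawn_view_compactor`, exiting twice with `{updater_died,noproc}`.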

      Link to the collect_info output for node 22:
      https://s3.amazonaws.com/packages.couchbase/collect_info/2_0_0/201209/info_node22-20120905.zip


      People: Filipe Manana (Inactive), Thuan Nguyen