Couchbase Server / MB-7737

[Doc'd] Swap usage shoots up with view querying or indexing

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.0.1
    • Fix Version/s: 2.1.0
    • Component/s: couchbase-bucket
    • Security Level: Public
    • Labels:
      None
    • Environment:
      Seen on the latest few builds.
      Observed on: 2.0.1-153, Linux, 4 Core 30GB SSD machines.

      Description

      • Set up a 7-node cluster; RAM for each physical machine: ~32GB.
      • Create 2 buckets: default (bucket quota: 12000MB per node) and saslbucket (bucket quota: 7000MB per node).
      • Run a create-only load until the item count hits the target mark.
      • Create 2 views over each of the buckets.
      • Run a load that pushes the system into DGM: ~80% resident ratio for default and ~60% for saslbucket.
      • Run queries continuously against the views as the load continues, e.g.:
        curl -X GET 'http://10.1.3.235:8092/default/_design/d1/_view/v1?startkey=0&stale=ok&limit=1000'
      • Noticed increased swap usage.
      • Start a new mixed load on both buckets (creates, updates, deletes, expires).
      • Set up XDCR to another 5-node cluster.
      • Continue querying; noticed a considerable increase in swap usage.

      This behavior is also seen on a cluster with views alone (no XDCR involved).
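
      As a rough reproduction aid, a minimal sketch of the continuous-query loop with swap sampling; the endpoint, design doc, and view come from the example above, while the one-second interval and the free/awk reporting are assumptions:

        #!/bin/sh
        # Query the view in a loop while sampling swap usage on this node.
        while true; do
            curl -s 'http://10.1.3.235:8092/default/_design/d1/_view/v1?startkey=0&stale=ok&limit=1000' > /dev/null
            # Print swap used, in MB (column 3 of the Swap line from free -m)
            free -m | awk '/^Swap:/ {print "swap used (MB):", $3}'
            sleep 1
        done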

      Attachments:
      1. vperf-lnx.loop_2.0.1-153-rel-enterprise_2.0.1-153-rel-enterprise_VESTA_Feb-11-2013_15-40-09.pdf (2.54 MB, Farshid Ghods)
      2. Screen Shot 2013-02-13 at 4.46.23 PM.png (124 kB)
      3. Screen Shot 2013-02-18 at 2.38.32 PM.png (35 kB)

        Activity

        Jin Lim (Inactive) added a comment -

        Abhinav,
        If available, please provide more information regarding the test environment setup and test scenario (manual steps to reproduce, test suite, etc.).
        Also, can you please confirm whether this is a regression or has existed since 2.0.0? Please assign back to Jin with the requested info. Thanks!

        Farshid Ghods (Inactive) added a comment -

        This is existing behavior from 2.0; this bug is open as a placeholder where the system test team keeps adding more information based on the weekly runs.

        Abhinav,

        Please add the test spec and the hardware this test runs on (things like how many buckets, how much RAM is left for indexing).

        Mike Wiederhold added a comment -

        I also want to note here that memcached is pretty good about using roughly the memory it has been told it can use. If ns_server or couchdb are using too much memory when running view-related tasks and pushing memcached into swap, then this issue should be addressed on the ns_server/couchdb side. On the other hand, if memcached is using more memory than it has been told it can use, then the ep-engine team should be looking at this.
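
        One way to compare memcached's actual usage against its quota on a node (a sketch assuming a default Linux install path and the default bucket; SASL buckets would need -b/-p credentials):

          # Compare mem_used against ep_max_size (the bucket quota) from ep-engine stats
          /opt/couchbase/bin/cbstats localhost:11210 all | grep -E 'mem_used|max_size'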

        Farshid Ghods (Inactive) added a comment -

        This is seen in performance tests as well.

        Test case: vperf-lnx.conf

        View performance test for Linux:
        • 8K ops/sec
        • 80% reads, 20% writes (12% updates/deletes, 8% inserts; see the worked check after this list)
        • 8M dataset (non-DGM)
        • Stop after 6M total queries
        • 3 ddocs with 3 views per ddoc

        Test: performance.iperf.MultiClientTests.test_vperf

        Params:
        • general:
          batch=50
          kind=json
          mem_quota=15000
        • load phase:
          items=8000000
          hot_init_items=1000
        • index phase:
          views=[3, 3, 3]
        • access phase:
          ratio_sets=0.2
          ratio_misses=0.04
          ratio_creates=0.40
          ratio_deletes=0.50
          ratio_hot=0.2
          ratio_hot_gets=0.95
          ratio_hot_sets=0.95
          ratio_expirations=0.03
          bg_max_ops_per_sec=500
          fg_max_ops=6000000
          total_clients=16
        • control (defaults: pytests/performance/perf_defaults.py):
          load_wait_until_drained=1
          loop_wait_until_drained=0
          mcsoda_heartbeat=3
          tear_down=1
          tear_down_proxy=1
          tear_down_bucket=0
          tear_down_cluster=1
          tear_down_on_setup=0
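
        As a worked check of how the headline mix follows from the access-phase parameters (8K ops/sec is the target rate from above; the ratio_deletes interpretation is assumed):

          sets            = ratio_sets × 8000    = 0.2 × 8000  = 1600 ops/sec  (20% writes)
          creates         = ratio_creates × sets = 0.40 × 1600 = 640 ops/sec   (8% inserts)
          updates/deletes = sets - creates       = 960 ops/sec                 (12%),
                            with ratio_deletes=0.50 splitting them between updates and deletes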
        Farshid Ghods (Inactive) added a comment -

        Abhinav,

        Please rephrase:

        • create 2 buckets: default (RAM: 12000MB per node) and saslbucket (RAM: 7000MB per node)
          to
        • create 2 buckets: default (bucket quota: 12000MB per node) and saslbucket (bucket quota: 7000MB per node)

        Also, how much RAM does each physical machine have?

        Abhinav Dangeti added a comment -

        Added in the description.

        Jin Lim (Inactive) added a comment -

        Per bug scrubs:

        • Not a regression from 2.0 (2.0 must have this issue as well).
        • swappiness = 10 and swap space = 5G (see the sketch after this list for checking these values).
        • Will need to start looking at it now.
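
        For reference, a way to check those two values on a node (a sketch; swapon output format varies by distro):

          # Current swappiness (10 in this setup)
          cat /proc/sys/vm/swappiness

          # Configured swap devices and their sizes (5G here)
          swapon -s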
        Abhinav Dangeti added a comment (edited) -

        Cluster with views, with compaction, indexing, and continuous querying: noticed swap usage shoot up with the system in DGM. A rebalance operation caused swap to max out. This high swap usage (about 100% at times) causes rebalance operations to fail. This particular time, during a (failed) swap rebalance, a node went down with a memcached core (10.6.2.68: the node being rebalanced in during the swap rebalance).

        High swap usage is observed on all later 2.0.2 builds and 2.1.0 builds.
        The current crash happened on 2.0.2-820.
        The cluster is still in high swap, with no front-end load, compaction, or indexing.

        Backtrace of the memcached core (core at /data/core.memcached.3925 on 10.6.2.68):

        Thread 11 (Thread 0x7f8df70fc700 (LWP 21383)):
        #0 0x00007f8e002627bb in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
        #1 0x00007f8dfa0ed5c1 in wait (this=0x6d8f860) at src/syncobject.hh:57
        #2 ExecutorThread::run (this=0x6d8f860) at src/scheduler.cc:139
        #3 0x00007f8dfa0ede4d in launch_executor_thread (arg=<value optimized out>) at src/scheduler.cc:34
        #4 0x00007f8e0025e851 in start_thread () from /lib64/libpthread.so.0
        #5 0x00007f8dfffac6dd in clone () from /lib64/libc.so.6

        Thread 10 (Thread 0x7f8df5cfa700 (LWP 21385)):
        #0 0x00007f8e002627bb in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
        #1 0x00007f8dfa0ed5c1 in wait (this=0x6db6d00) at src/syncobject.hh:57
        #2 ExecutorThread::run (this=0x6db6d00) at src/scheduler.cc:139
        #3 0x00007f8dfa0ede4d in launch_executor_thread (arg=<value optimized out>) at src/scheduler.cc:34
        #4 0x00007f8e0025e851 in start_thread () from /lib64/libpthread.so.0
        #5 0x00007f8dfffac6dd in clone () from /lib64/libc.so.6

        Thread 9 (Thread 0x7f8df7afd700 (LWP 21382)):
        #0 0x00007f8e0026243c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
        #1 0x00007f8dfeb95fed in add_log_entry (severity=<value optimized out>, client_cookie=<value optimized out>, fmt=<value optimized out>) at extensions/loggers/file_logger.c:178
        #2 logger_log (severity=<value optimized out>, client_cookie=<value optimized out>, fmt=<value optimized out>) at extensions/loggers/file_logger.c:280
        #3 0x00007f8dfa0c373e in LOG (severity=EXTENSION_LOG_WARNING, fmt=0x7f8dfa14f48c "%s") at src/ep_engine.cc:1197
        #4 0x00007f8dfa0c1e2f in PersistenceCallback::callback(std::pair<int, long>&) () from /opt/couchbase/lib/memcached/ep.so
        #5 0x00007f8dfa1227b4 in CouchKVStore::commitCallback (this=0x6d64300, committedReqs=<value optimized out>, numReqs=39988, errCode=COUCHSTORE_ERROR_WRITE) at src/couch-kvstore/couch-kvstore.cc:1590
        #6 0x00007f8dfa1242c6 in CouchKVStore::commit2couchstore (this=0x6d64300) at src/couch-kvstore/couch-kvstore.cc:1417
        #7 0x00007f8dfa12449a in CouchKVStore::commit (this=0x6d64300) at src/couch-kvstore/couch-kvstore.cc:806
        #8 0x00007f8dfa0b5de6 in EventuallyPersistentStore::flushVBucket (this=0x6d4e000, vbid=361) at src/ep.cc:1920
        #9 0x00007f8dfa0ded52 in doFlush (this=0x2487040, tid=<value optimized out>) at src/flusher.cc:222
        #10 completeFlush (this=0x2487040, tid=<value optimized out>) at src/flusher.cc:197
        #11 Flusher::step (this=0x2487040, tid=<value optimized out>) at src/flusher.cc:170
        #12 0x00007f8dfa0ed731 in ExecutorThread::run (this=0x6d8fa00) at src/scheduler.cc:153
        #13 0x00007f8dfa0ede4d in launch_executor_thread (arg=<value optimized out>) at src/scheduler.cc:34
        #14 0x00007f8e0025e851 in start_thread () from /lib64/libpthread.so.0
        #15 0x00007f8dfffac6dd in clone () from /lib64/libc.so.6

        Thread 8 (Thread 0x7f8df66fb700 (LWP 21384)):
        #0 0x00007f8e002627bb in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
        #1 0x00007f8dfa0ed5c1 in wait (this=0x6db6ea0) at src/syncobject.hh:57
        #2 ExecutorThread::run (this=0x6db6ea0) at src/scheduler.cc:139
        #3 0x00007f8dfa0ede4d in launch_executor_thread (arg=<value optimized out>) at src/scheduler.cc:34
        #4 0x00007f8e0025e851 in start_thread () from /lib64/libpthread.so.0
        #5 0x00007f8dfffac6dd in clone () from /lib64/libc.so.6

        Thread 7 (Thread 0x7f8df52f9700 (LWP 21386)):
        #0 0x00007f8e002627bb in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
        #1 0x00007f8dfa0ed5c1 in wait (this=0x6db6b60) at src/syncobject.hh:57
        #2 ExecutorThread::run (this=0x6db6b60) at src/scheduler.cc:139
        #3 0x00007f8dfa0ede4d in launch_executor_thread (arg=<value optimized out>) at src/scheduler.cc:34
        #4 0x00007f8e0025e851 in start_thread () from /lib64/libpthread.so.0
        #5 0x00007f8dfffac6dd in clone () from /lib64/libc.so.6

        Thread 6 (Thread 0x7f8dfad80700 (LWP 21380)):
        #0 0x00007f8dfff7097d in nanosleep () from /lib64/libc.so.6
        #1 0x00007f8dfffa5b34 in usleep () from /lib64/libc.so.6
        #2 0x00007f8dfa0e8d75 in updateStatsThread (arg=0x247c4c0) at src/memory_tracker.cc:31
        #3 0x00007f8e0025e851 in start_thread () from /lib64/libpthread.so.0
        #4 0x00007f8dfffac6dd in clone () from /lib64/libc.so.6

        Thread 5 (Thread 0x7f8df84fe700 (LWP 21381)):
        #0 0x00007f8e0026243c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
        #1 0x00007f8dfeb95fed in add_log_entry (severity=<value optimized out>, client_cookie=<value optimized out>, fmt=<value optimized out>) at extensions/loggers/file_logger.c:178
        #2 logger_log (severity=<value optimized out>, client_cookie=<value optimized out>, fmt=<value optimized out>) at extensions/loggers/file_logger.c:280
        #3 0x00007f8dfa0c373e in LOG (severity=EXTENSION_LOG_WARNING, fmt=0x7f8dfa14f48c "%s") at src/ep_engine.cc:1197
        #4 0x00007f8dfa0c1e2f in PersistenceCallback::callback(std::pair<int, long>&) () from /opt/couchbase/lib/memcached/ep.so
        #5 0x00007f8dfa1227b4 in CouchKVStore::commitCallback (this=0x6d64c00, committedReqs=<value optimized out>, numReqs=13023, errCode=COUCHSTORE_ERROR_WRITE) at src/couch-kvstore/couch-kvstore.cc:1590
        #6 0x00007f8dfa1242c6 in CouchKVStore::commit2couchstore (this=0x6d64c00) at src/couch-kvstore/couch-kvstore.cc:1417
        #7 0x00007f8dfa12449a in CouchKVStore::commit (this=0x6d64c00) at src/couch-kvstore/couch-kvstore.cc:806
        #8 0x00007f8dfa0b5de6 in EventuallyPersistentStore::flushVBucket (this=0x6d4e000, vbid=570) at src/ep.cc:1920
        #9 0x00007f8dfa0decd9 in doFlush (this=0x2486f00, tid=8) at src/flusher.cc:222
        #10 Flusher::step (this=0x2486f00, tid=8) at src/flusher.cc:152
        #11 0x00007f8dfa0ed731 in ExecutorThread::run (this=0x6d8fba0) at src/scheduler.cc:153
        #12 0x00007f8dfa0ede4d in launch_executor_thread (arg=<value optimized out>) at src/scheduler.cc:34
        #13 0x00007f8e0025e851 in start_thread () from /lib64/libpthread.so.0
        #14 0x00007f8dfffac6dd in clone () from /lib64/libc.so.6

        Thread 4 (Thread 0x7f8e0139b720 (LWP 3925)):
        #0 0x00007f8dfff7097d in nanosleep () from /lib64/libc.so.6
        #1 0x00007f8dfffa5b34 in usleep () from /lib64/libc.so.6
        #2 0x00007f8dfa0ddffa in Flusher::wait (this=0x2487040) at src/flusher.cc:32
        #3 0x00007f8dfa0af5aa in stopFlusher (this=0x6d4e000, __in_chrg=<value optimized out>) at src/ep.cc:400
        #4 EventuallyPersistentStore::~EventuallyPersistentStore (this=0x6d4e000, __in_chrg=<value optimized out>) at src/ep.cc:354
        #5 0x00007f8dfa0daecf in EventuallyPersistentEngine::~EventuallyPersistentEngine() () from /opt/couchbase/lib/memcached/ep.so
        #6 0x00007f8dfa0cbed5 in EvpDestroy (handle=0x6d3c000, force=false) at src/ep_engine.cc:133
        #7 0x00007f8dfdf87bb5 in bucket_shutdown_engine (key=<value optimized out>, nkey=<value optimized out>, val=0x6cea2a0, nval=<value optimized out>, args=<value optimized out>) at bucket_engine.c:1290
        #8 0x00007f8dfdf8e68c in genhash_iter (h=0x6cee000, iterfunc=0x7f8dfdf87b80 <bucket_shutdown_engine>, arg=0x0) at genhash.c:275
        #9 0x00007f8dfdf88f46 in bucket_destroy (handle=0x7f8dfe191660, force=<value optimized out>) at bucket_engine.c:1327
        #10 0x000000000040980b in main (argc=<value optimized out>, argv=<value optimized out>) at daemon/memcached.c:7935

        Thread 3 (Thread 0x7f8dfeb92700 (LWP 3933)):
        #0 0x00007f8dfff9f43d in write () from /lib64/libc.so.6
        #1 0x00007f8dfff36033 in _IO_new_file_write () from /lib64/libc.so.6
        #2 0x00007f8dfff35efa in _IO_new_file_xsputn () from /lib64/libc.so.6
        #3 0x00007f8dfff2c0bd in fwrite () from /lib64/libc.so.6
        #4 0x00007f8dfeb963e7 in flush_pending_io (arg=0x247c040) at extensions/loggers/file_logger.c:319
        #5 logger_thead_main (arg=0x247c040) at extensions/loggers/file_logger.c:357
        #6 0x00007f8e0025e851 in start_thread () from /lib64/libpthread.so.0
        #7 0x00007f8dfffac6dd in clone () from /lib64/libc.so.6

        Thread 2 (Thread 0x7f8df34f6700 (LWP 21389)):
        #0 0x00007f8e002627bb in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
        #1 0x00007f8dfa0a91c8 in wait (this=0x6cd82d0, d=...) at src/syncobject.hh:57
        #2 IdleTask::run (this=0x6cd82d0, d=...) at src/dispatcher.cc:342
        #3 0x00007f8dfa0abd2a in Dispatcher::run (this=0x6d20a80) at src/dispatcher.cc:184
        #4 0x00007f8dfa0ac4ed in launch_dispatcher_thread (arg=<value optimized out>) at src/dispatcher.cc:28
        #5 0x00007f8e0025e851 in start_thread () from /lib64/libpthread.so.0
        #6 0x00007f8dfffac6dd in clone () from /lib64/libc.so.6

        Thread 1 (Thread 0x7f8df3ef7700 (LWP 21388)):
        #0 0x00007f8dffef78a5 in raise () from /lib64/libc.so.6
        #1 0x00007f8dffef9085 in abort () from /lib64/libc.so.6
        #2 0x00007f8dffef0a1e in __assert_fail_base () from /lib64/libc.so.6
        #3 0x00007f8dffef0ae0 in __assert_fail () from /lib64/libc.so.6
        #4 0x00007f8dfa115dd1 in writeFully (this=0x39a82000) at src/mutation_log.cc:46
        #5 MutationLog::writeInitialBlock (this=0x39a82000) at src/mutation_log.cc:163
        #6 0x00007f8dfa117fe4 in MutationLog::open (this=0x39a82000, _readOnly=<value optimized out>) at src/mutation_log.cc:273
        #7 0x00007f8dfa09999b in ItemAccessVisitor::ItemAccessVisitor (this=0x6cd8c60, _store=<value optimized out>, _stats=<value optimized out>, sfin=<value optimized out>) at src/access_scanner.cc:22
        #8 0x00007f8dfa09747a in AccessScanner::callback (this=0x2489ec0, d=..., t=...) at src/access_scanner.cc:108
        #9 0x00007f8dfa0abd2a in Dispatcher::run (this=0x6d20c40) at src/dispatcher.cc:184
        #10 0x00007f8dfa0ac4ed in launch_dispatcher_thread (arg=<value optimized out>) at src/dispatcher.cc:28
        #11 0x00007f8e0025e851 in start_thread () from /lib64/libpthread.so.0
        #12 0x00007f8dfffac6dd in clone () from /lib64/libc.so.6

        The crash is on Thread 1.
        Attachments: 10_6_2_68.zip (cbcollect_info logs from the node)

        Wayne Siu added a comment -

        Assigning to 2.1.0 so that it will be included in the current scrub meeting.

        Maria McDuff (Inactive) added a comment -

        Abhinav and Alk are working together on investigating this issue.

        Abhinav Dangeti added a comment -

        With swappiness set to 0, swap usage was reduced considerably compared to swappiness at 10. Swap usage averaged around 1G per node during rebalance operations and the access phase.

        Maria McDuff (Inactive) added a comment -

        Abhinav,

        Please provide exact steps (to Karen) on how to set swappiness to 0. Does it require a restart, or any other settings? Pass it on to Karen Z for documentation.

        Abhinav Dangeti added a comment -

        Setting swappiness to zero on Linux:

          sudo sysctl vm.swappiness=0

        To see the system's current swappiness, use either:

          sysctl vm.swappiness
          cat /proc/sys/vm/swappiness

        I think it would be wise to note how performance is affected by this setting.
        I'll try and set up a heavy-DGM rebalance run with and without this setting at zero, to note the effect on performance.

        Abhinav Dangeti added a comment -

        No significant performance regression in rebalance times between swappiness at 0 and swappiness at 10 (on build 2.1.0-715):
        swappiness=10 :: http://qa.hq.northscale.net/job/litmuses-graph-loop/246/ (rebalance time: 437s)
        swappiness=0 :: http://qa.hq.northscale.net/job/litmuses-graph-loop/245/ (rebalance time: 461s)

        So the recommendation would be to set swappiness to zero for any views scenario, with or without topology changes.

        Aleksey Kondratenko (Inactive) added a comment -

        Hm. But sysctl or echoing to /proc/sys/vm/swappiness will not hold over a reboot.

        On both RHEL 5 and my Debian box, I'm seeing /etc/sysctl.conf being used at boot time to set all sorts of sysctl variables.

        We should recommend that users edit this file, i.e. just appending:

        vm.swappiness = 0

        (and rebooting) should be enough on all kinds of Ubuntus and RHELs.
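
        Putting the two suggestions in this thread together, a minimal sketch: apply the setting at runtime, verify it, and persist it across reboots (using tee to append is an assumption; any editor works):

          # Apply immediately (takes effect now, lost on reboot)
          sudo sysctl vm.swappiness=0

          # Verify the current value
          cat /proc/sys/vm/swappiness

          # Persist across reboots
          echo 'vm.swappiness = 0' | sudo tee -a /etc/sysctl.conf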

        kzeller added a comment -

        <rnentry>
          <version ver="2.1.0a"/>
          <version ver="2.0.0l"/>
          <version ver="2.0.0k"/>
          <class id="cluster"/>
          <issue type="cb" ref="MB-7737"/>
          <issue type="cb" ref="MB-7774"/>
          <rntext>
            <para>
              By default most Linux systems have swappiness set to 60. This will lead to overuse of disk swap with
              Couchbase Server. Please follow our current recommendations on swappiness and swap space, see
              <xref linkend="couchbase-bestpractice-cloud-swap"/>.
            </para>
          </rntext>
        </rnentry>


          People

          • Assignee: kzeller
          • Reporter: Abhinav Dangeti
          • Votes: 0
          • Watchers: 8

          Gerrit Reviews

          There are no open Gerrit changes