Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-2478

membase performance drops significantly

    XMLWordPrintable

Details

    • Bug
    • Resolution: Duplicate
    • 1.6.0 GA
    • 1.6.0 beta3
    • couchbase-bucket
    • None
    • Operating System: All
      Platform: All

    Description

      Per email threads:

      The following is the top output for memcachetest when ops drops to almost 0:

      top - 00:33:54 up 3 days, 8:42, 2 users, load average: 0.01, 0.19, 0.47
      Tasks: 131 total, 1 running, 130 sleeping, 0 stopped, 0 zombie
      Cpu(s): 0.0%us, 0.0%sy, 0.0%ni, 99.9%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
      Mem: 4009308k total, 1514640k used, 2494668k free, 147244k buffers
      Swap: 6062072k total, 160k used, 6061912k free, 988872k cached

      PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
      11863 root 25 0 875m 108m 680 S 0.7 2.8 2:44.98 memcachetest

      After that, I just restarted memcachetest process and the top output was as follows:

      top - 17:41:46 up 3 days, 7:31, 5 users, load average: 2.76, 2.44, 2.58
      Tasks: 175 total, 2 running, 173 sleeping, 0 stopped, 0 zombie
      Cpu(s): 8.7%us, 6.9%sy, 0.0%ni, 56.7%id, 23.0%wa, 0.3%hi, 4.5%si, 0.0%st
      Mem: 4009308k total, 3608104k used, 401204k free, 78708k buffers
      Swap: 3068372k total, 473788k used, 2594584k free, 1469060k cached

      PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
      21906 membase 15 0 1953m 1.6g 2492 S 62.8 42.9 29:03.86 memcached

      top output on the data server (10.2.1.51):

      top - 17:46:24 up 3 days, 7:35, 5 users, load average: 3.59, 3.03, 2.78
      Tasks: 175 total, 1 running, 174 sleeping, 0 stopped, 0 zombie
      Cpu(s): 8.7%us, 6.8%sy, 0.0%ni, 56.4%id, 23.3%wa, 0.4%hi, 4.4%si, 0.0%st
      Mem: 4009308k total, 3612320k used, 396988k free, 79740k buffers
      Swap: 3068372k total, 473788k used, 2594584k free, 1477096k cached

      PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
      21906 membase 15 0 1953m 1.6g 2492 S 65.5 42.9 31:41.11 memcached

      After some time, the memcachetest CPU and memory usage dropped significantly again that resulted in dropping ops/sec to near 0.

      Chiyoung

      On Oct 7, 2010, at 12:28 AM, Dustin Sallings wrote:

      >
      > On Oct 7, 2010, at 0:16, Chiyoung Seo wrote:
      >
      >> Looks like memcachetest has some issues when it's running for a long time. I was able to reproduce the same problem on 10.2.1.51, and after I restarted memcachetest process, ops per sec returns to the normal range for the two node cluster.
      >
      >
      > Was RSS too large? It was clearly hunting for memory.
      >
      > –
      > Dustin Sallings
      >

      We got to a state after few rebalancing where 10.2.1.51 is processing events very slow (1K qps), using the same client command that use to process 10K qps.

      Restarting moxi didn’t help.
      Dustin, can you loging to 0.2.1.51 and see if you can understand what it is doing?

      UI password is password (http://10.2.1.51:8091)

      Membase pstack:
      [root@localhost membase]# pstack 11910
      Thread 14 (Thread 0x420a9940 (LWP 11916)):
      #0 0x0000003dc30c5f3b in read () from /lib64/libc.so.6
      #1 0x0000003dc306cc07 in _IO_new_file_underflow () from /lib64/libc.so.6
      #2 0x0000003dc306d5ce in _IO_default_uflow_internal () from /lib64/libc.so.6
      #3 0x0000003dc3068e8b in getc () from /lib64/libc.so.6
      #4 0x00002ba3f9f4b898 in check_stdin_thread ()
      #5 0x0000003dc3c064a7 in start_thread () from /lib64/libpthread.so.0
      #6 0x0000003dc30d3c2d in clone () from /lib64/libc.so.6
      Thread 13 (Thread 0x41511940 (LWP 11917)):
      #0 0x0000003dc309a0b1 in nanosleep () from /lib64/libc.so.6
      #1 0x0000003dc3099ed4 in sleep () from /lib64/libc.so.6
      #2 0x000000000041396b in check_isasl_db_thread ()
      #3 0x0000003dc3c064a7 in start_thread () from /lib64/libpthread.so.0
      #4 0x0000003dc30d3c2d in clone () from /lib64/libc.so.6
      Thread 12 (Thread 0x42aaa940 (LWP 11918)):
      #0 0x0000003dc30df9ee in __lll_lock_wait_private () from /lib64/libc.so.6
      #1 0x0000003dc3076d31 in _L_lock_5478 () from /lib64/libc.so.6
      #2 0x0000003dc3071c8d in _int_free () from /lib64/libc.so.6
      #3 0x0000003dc307273b in free () from /lib64/libc.so.6
      #4 0x0000000000412b9d in topkeys_item_get_or_create ()
      #5 0x000000000040f557 in complete_nread_binary ()
      #6 0x000000000041033f in conn_nread ()
      #7 0x0000000000408268 in event_handler ()
      #8 0x000000000041678a in event_base_loop ()
      #9 0x0000000000412324 in worker_libevent ()
      #10 0x0000003dc3c064a7 in start_thread () from /lib64/libpthread.so.0
      #11 0x0000003dc30d3c2d in clone () from /lib64/libc.so.6
      Thread 11 (Thread 0x434ab940 (LWP 11919)):
      #0 0x0000003dc3c0d174 in __lll_lock_wait () from /lib64/libpthread.so.0
      #1 0x0000003dc3c08aca in _L_lock_1034 () from /lib64/libpthread.so.0
      #2 0x0000003dc3c0898c in pthread_mutex_lock () from /lib64/libpthread.so.0
      #3 0x000000000040f539 in complete_nread_binary ()
      #4 0x000000000041033f in conn_nread ()
      #5 0x0000000000408268 in event_handler ()
      #6 0x000000000041678a in event_base_loop ()
      #7 0x0000000000412324 in worker_libevent ()
      #8 0x0000003dc3c064a7 in start_thread () from /lib64/libpthread.so.0
      #9 0x0000003dc30d3c2d in clone () from /lib64/libc.so.6
      Thread 10 (Thread 0x43eac940 (LWP 11920)):
      #0 0x0000003dc307432a in _int_realloc () from /lib64/libc.so.6
      #1 0x0000003dc30751a2 in realloc () from /lib64/libc.so.6
      #2 0x0000000000404680 in grow_dynamic_buffer ()
      #3 0x0000000000404ad4 in append_stats ()
      #4 0x00000000004067c6 in append_stat ()
      #5 0x00000000004070db in server_stats ()
      #6 0x000000000040adad in process_bin_stat ()
      #7 0x000000000040f275 in complete_nread_binary ()
      #8 0x000000000041033f in conn_nread ()
      #9 0x0000000000408268 in event_handler ()
      #10 0x000000000041678a in event_base_loop ()
      #11 0x0000000000412324 in worker_libevent ()
      #12 0x0000003dc3c064a7 in start_thread () from /lib64/libpthread.so.0
      #13 0x0000003dc30d3c2d in clone () from /lib64/libc.so.6
      Thread 9 (Thread 0x448ad940 (LWP 11921)):
      #0 0x0000003dc3c0d174 in __lll_lock_wait () from /lib64/libpthread.so.0
      #1 0x0000003dc3c08aca in _L_lock_1034 () from /lib64/libpthread.so.0
      #2 0x0000003dc3c0898c in pthread_mutex_lock () from /lib64/libpthread.so.0
      #3 0x000000000040f539 in complete_nread_binary ()
      #4 0x000000000041033f in conn_nread ()
      #5 0x0000000000408268 in event_handler ()
      #6 0x000000000041678a in event_base_loop ()
      #7 0x0000000000412324 in worker_libevent ()
      #8 0x0000003dc3c064a7 in start_thread () from /lib64/libpthread.so.0
      #9 0x0000003dc30d3c2d in clone () from /lib64/libc.so.6
      Thread 8 (Thread 0x452ae940 (LWP 11922)):
      #0 0x0000003dc30d4018 in epoll_wait () from /lib64/libc.so.6
      #1 0x0000000000418190 in epoll_dispatch ()
      #2 0x00000000004166c1 in event_base_loop ()
      #3 0x0000000000412324 in worker_libevent ()
      #4 0x0000003dc3c064a7 in start_thread () from /lib64/libpthread.so.0
      #5 0x0000003dc30d3c2d in clone () from /lib64/libc.so.6
      Thread 7 (Thread 0x45caf940 (LWP 11932)):
      #0 0x00002aaaaad0a8a1 in StrategicSqlite3::insert () from bin/ep_engine/ep.so
      #1 0x00002aaaaace039f in EventuallyPersistentStore::flushOneDelOrSet ()
      #2 0x00002aaaaace0ab7 in EventuallyPersistentStore::flushOne ()
      #3 0x00002aaaaace0bed in EventuallyPersistentStore::flushSome ()
      #4 0x00002aaaaad040ed in Flusher::doFlush () from bin/ep_engine/ep.so
      #5 0x00002aaaaad04d4a in Flusher::step () from bin/ep_engine/ep.so
      #6 0x00002aaaaad0550e in FlusherStepper::callback () from bin/ep_engine/ep.so
      #7 0x00002aaaaacd9696 in Dispatcher::run () from bin/ep_engine/ep.so
      #8 0x00002aaaaacda033 in launch_dispatcher_thread () from bin/ep_engine/ep.so
      #9 0x0000003dc3c064a7 in start_thread () from /lib64/libpthread.so.0
      #10 0x0000003dc30d3c2d in clone () from /lib64/libc.so.6
      Thread 6 (Thread 0x466b0940 (LWP 11933)):
      #0 0x0000003dc3c0ab99 in pthread_cond_wait@@GLIBC_2.3.2 ()
      #1 0x00002aaaaacd953f in Dispatcher::run () from bin/ep_engine/ep.so
      #2 0x00002aaaaacda033 in launch_dispatcher_thread () from bin/ep_engine/ep.so
      #3 0x0000003dc3c064a7 in start_thread () from /lib64/libpthread.so.0
      #4 0x0000003dc30d3c2d in clone () from /lib64/libc.so.6
      Thread 5 (Thread 0x470b1940 (LWP 11934)):
      #0 0x0000003dc3c0ae00 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
      #1 0x00002aaaaacf60d1 in EventuallyPersistentEngine::notifyTapIoThread ()
      #2 0x00002aaaaacf6229 in EvpNotifyTapIo () from bin/ep_engine/ep.so
      #3 0x0000003dc3c064a7 in start_thread () from /lib64/libpthread.so.0
      #4 0x0000003dc30d3c2d in clone () from /lib64/libc.so.6
      Thread 4 (Thread 0x506c0940 (LWP 21280)):
      #0 0x0000003dc3c0ae00 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
      #1 0x00002aaaaacd99d4 in Dispatcher::run () from bin/ep_engine/ep.so
      #2 0x00002aaaaacda033 in launch_dispatcher_thread () from bin/ep_engine/ep.so
      #3 0x0000003dc3c064a7 in start_thread () from /lib64/libpthread.so.0
      #4 0x0000003dc30d3c2d in clone () from /lib64/libc.so.6
      Thread 3 (Thread 0x48eb4940 (LWP 21281)):
      #0 0x0000003dc3c0ab99 in pthread_cond_wait@@GLIBC_2.3.2 ()
      #1 0x00002aaaaacd953f in Dispatcher::run () from bin/ep_engine/ep.so
      #2 0x00002aaaaacda033 in launch_dispatcher_thread () from bin/ep_engine/ep.so
      #3 0x0000003dc3c064a7 in start_thread () from /lib64/libpthread.so.0
      #4 0x0000003dc30d3c2d in clone () from /lib64/libc.so.6
      Thread 2 (Thread 0x47ab2940 (LWP 21282)):
      #0 0x0000003dc3c0ae00 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
      #1 0x00002aaaaacf60d1 in EventuallyPersistentEngine::notifyTapIoThread ()
      #2 0x00002aaaaacf6229 in EvpNotifyTapIo () from bin/ep_engine/ep.so
      #3 0x0000003dc3c064a7 in start_thread () from /lib64/libpthread.so.0
      #4 0x0000003dc30d3c2d in clone () from /lib64/libc.so.6
      Thread 1 (Thread 0x2ba3f9f4a620 (LWP 11910)):
      #0 0x0000003dc30d4018 in epoll_wait () from /lib64/libc.so.6
      #1 0x0000000000418190 in epoll_dispatch ()
      #2 0x00000000004166c1 in event_base_loop ()
      #3 0x000000000040952b in main ()
      [root@localhost membase]#

      [root@localhost membase]# /opt/membase/bin/ep_engine/management/stats 10.2.1.51:11210 all
      auth_cmds: 3627
      auth_errors: 0
      bucket_conns: 27
      bytes_read: 26126881147
      bytes_written: 23337809435
      cas_badval: 0
      cas_hits: 0
      cas_misses: 0
      cmd_flush: 0
      cmd_get: 10993637
      cmd_set: 8731858
      conn_yields: 45768
      connection_structures: 42
      curr_connections: 38
      curr_items: 1005000
      curr_items_tot: 1005000
      daemon_connections: 10
      decr_hits: 0
      decr_misses: 0
      delete_hits: 0
      delete_misses: 0
      ep_bg_fetched: 30056
      ep_bg_load_avg: 758
      ep_bg_max_load: 336932
      ep_bg_max_wait: 18774150
      ep_bg_min_load: 37
      ep_bg_min_wait: 8
      ep_bg_num_samples: 30056
      ep_bg_wait_avg: 41927
      ep_commit_num: 453
      ep_commit_time: 7
      ep_commit_time_total: 420
      ep_data_age: 469
      ep_data_age_highwat: 1368
      ep_dbinit: 0
      ep_dbname: /opt/membase/1.6.0beta4/data/ns_1/default
      ep_dbshards: 4
      ep_expired: 0
      ep_flush_duration: 742
      ep_flush_duration_highwat: 742
      ep_flush_duration_total: 1548
      ep_flush_preempts: 434
      ep_flusher_state: running
      ep_flusher_todo: 1616722
      ep_io_num_read: 30056
      ep_io_num_write: 896625
      ep_io_read_bytes: 61698623
      ep_io_write_bytes: 1827411512
      ep_item_commit_failed: 0
      ep_item_flush_expired: 0
      ep_item_flush_failed: 0
      ep_kv_size: 2108822960
      ep_max_data_size: 3145728000
      ep_max_txn_size: 50000
      ep_mem_high_wat: 2359296000
      ep_mem_low_wat: 1887436800
      ep_min_data_age: 0
      ep_num_eject_failures: 33262286
      ep_num_expiry_pager_runs: 1
      ep_num_non_resident: 94603
      ep_num_not_my_vbuckets: 9203
      ep_num_pager_runs: 64
      ep_num_value_ejects: 131430
      ep_oom_errors: 0
      ep_overhead: 82314686
      ep_pending_ops: 0
      ep_pending_ops_max: 2
      ep_pending_ops_max_duration: 215315
      ep_pending_ops_total: 71
      ep_queue_age_cap: 900
      ep_queue_size: 281288
      ep_storage_age: 2360
      ep_storage_age_highwat: 2360
      ep_storage_type: featured
      ep_tap_keepalive: 0
      ep_tmp_oom_errors: 55499
      ep_too_old: 225576
      ep_too_young: 0
      ep_total_cache_size: 2691763888
      ep_total_del_items: 0
      ep_total_enqueued: 2822341
      ep_total_new_items: 529285
      ep_total_persisted: 887042
      ep_vbucket_del: 2415
      ep_vbucket_del_fail: 0
      ep_version: 1.6.0beta4_101_g4ebb2dc
      ep_warmed_up: 0
      ep_warmup: true
      ep_warmup_dups: 0
      ep_warmup_oom: 0
      ep_warmup_thread: complete
      ep_warmup_time: 0
      get_hits: 10993637
      get_misses: 0
      incr_hits: 0
      incr_misses: 0
      libevent: 1.4.13-stable
      limit_maxbytes: 67108864
      mem_used: 2191137646
      pid: 11910
      pointer_size: 64
      rejected_conns: 0
      rusage_system: 588.378552
      rusage_user: 713.620513
      tap_connect_received: 2566
      tap_mutation_received: 3688078
      tap_mutation_sent: 3741111
      tap_vbucket_set_received: 2048
      tap_vbucket_set_sent: 5120
      threads: 4
      time: 1286400422
      total_connections: 3654
      uptime: 3945
      version: 1.4.4_298_g250909b
      [root@localhost membase]#

      Attachments

        Issue Links

          No reviews matched the request. Check your Options in the drop-down menu of this sections header.

          Activity

            People

              dustin@sallings.org Dustin Sallings (Inactive)
              sharon.barr@northscale.com sharon.barr@northscale.com
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Gerrit Reviews

                  There are no open Gerrit changes

                  PagerDuty