Details
-
Bug
-
Resolution: Duplicate
-
1.6.0 beta3
-
None
-
Operating System: All
Platform: All
Description
Per email threads:
The following is the top output for memcachetest when ops drops to almost 0:
top - 00:33:54 up 3 days, 8:42, 2 users, load average: 0.01, 0.19, 0.47
Tasks: 131 total, 1 running, 130 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 0.0%sy, 0.0%ni, 99.9%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 4009308k total, 1514640k used, 2494668k free, 147244k buffers
Swap: 6062072k total, 160k used, 6061912k free, 988872k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
11863 root 25 0 875m 108m 680 S 0.7 2.8 2:44.98 memcachetest
After that, I just restarted memcachetest process and the top output was as follows:
top - 17:41:46 up 3 days, 7:31, 5 users, load average: 2.76, 2.44, 2.58
Tasks: 175 total, 2 running, 173 sleeping, 0 stopped, 0 zombie
Cpu(s): 8.7%us, 6.9%sy, 0.0%ni, 56.7%id, 23.0%wa, 0.3%hi, 4.5%si, 0.0%st
Mem: 4009308k total, 3608104k used, 401204k free, 78708k buffers
Swap: 3068372k total, 473788k used, 2594584k free, 1469060k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
21906 membase 15 0 1953m 1.6g 2492 S 62.8 42.9 29:03.86 memcached
top output on the data server (10.2.1.51):
top - 17:46:24 up 3 days, 7:35, 5 users, load average: 3.59, 3.03, 2.78
Tasks: 175 total, 1 running, 174 sleeping, 0 stopped, 0 zombie
Cpu(s): 8.7%us, 6.8%sy, 0.0%ni, 56.4%id, 23.3%wa, 0.4%hi, 4.4%si, 0.0%st
Mem: 4009308k total, 3612320k used, 396988k free, 79740k buffers
Swap: 3068372k total, 473788k used, 2594584k free, 1477096k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
21906 membase 15 0 1953m 1.6g 2492 S 65.5 42.9 31:41.11 memcached
After some time, the memcachetest CPU and memory usage dropped significantly again that resulted in dropping ops/sec to near 0.
Chiyoung
On Oct 7, 2010, at 12:28 AM, Dustin Sallings wrote:
>
> On Oct 7, 2010, at 0:16, Chiyoung Seo wrote:
>
>> Looks like memcachetest has some issues when it's running for a long time. I was able to reproduce the same problem on 10.2.1.51, and after I restarted memcachetest process, ops per sec returns to the normal range for the two node cluster.
>
>
> Was RSS too large? It was clearly hunting for memory.
>
> –
> Dustin Sallings
>
We got to a state after few rebalancing where 10.2.1.51 is processing events very slow (1K qps), using the same client command that use to process 10K qps.
Restarting moxi didn’t help.
Dustin, can you loging to 0.2.1.51 and see if you can understand what it is doing?
UI password is password (http://10.2.1.51:8091)
Membase pstack:
[root@localhost membase]# pstack 11910
Thread 14 (Thread 0x420a9940 (LWP 11916)):
#0 0x0000003dc30c5f3b in read () from /lib64/libc.so.6
#1 0x0000003dc306cc07 in _IO_new_file_underflow () from /lib64/libc.so.6
#2 0x0000003dc306d5ce in _IO_default_uflow_internal () from /lib64/libc.so.6
#3 0x0000003dc3068e8b in getc () from /lib64/libc.so.6
#4 0x00002ba3f9f4b898 in check_stdin_thread ()
#5 0x0000003dc3c064a7 in start_thread () from /lib64/libpthread.so.0
#6 0x0000003dc30d3c2d in clone () from /lib64/libc.so.6
Thread 13 (Thread 0x41511940 (LWP 11917)):
#0 0x0000003dc309a0b1 in nanosleep () from /lib64/libc.so.6
#1 0x0000003dc3099ed4 in sleep () from /lib64/libc.so.6
#2 0x000000000041396b in check_isasl_db_thread ()
#3 0x0000003dc3c064a7 in start_thread () from /lib64/libpthread.so.0
#4 0x0000003dc30d3c2d in clone () from /lib64/libc.so.6
Thread 12 (Thread 0x42aaa940 (LWP 11918)):
#0 0x0000003dc30df9ee in __lll_lock_wait_private () from /lib64/libc.so.6
#1 0x0000003dc3076d31 in _L_lock_5478 () from /lib64/libc.so.6
#2 0x0000003dc3071c8d in _int_free () from /lib64/libc.so.6
#3 0x0000003dc307273b in free () from /lib64/libc.so.6
#4 0x0000000000412b9d in topkeys_item_get_or_create ()
#5 0x000000000040f557 in complete_nread_binary ()
#6 0x000000000041033f in conn_nread ()
#7 0x0000000000408268 in event_handler ()
#8 0x000000000041678a in event_base_loop ()
#9 0x0000000000412324 in worker_libevent ()
#10 0x0000003dc3c064a7 in start_thread () from /lib64/libpthread.so.0
#11 0x0000003dc30d3c2d in clone () from /lib64/libc.so.6
Thread 11 (Thread 0x434ab940 (LWP 11919)):
#0 0x0000003dc3c0d174 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x0000003dc3c08aca in _L_lock_1034 () from /lib64/libpthread.so.0
#2 0x0000003dc3c0898c in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x000000000040f539 in complete_nread_binary ()
#4 0x000000000041033f in conn_nread ()
#5 0x0000000000408268 in event_handler ()
#6 0x000000000041678a in event_base_loop ()
#7 0x0000000000412324 in worker_libevent ()
#8 0x0000003dc3c064a7 in start_thread () from /lib64/libpthread.so.0
#9 0x0000003dc30d3c2d in clone () from /lib64/libc.so.6
Thread 10 (Thread 0x43eac940 (LWP 11920)):
#0 0x0000003dc307432a in _int_realloc () from /lib64/libc.so.6
#1 0x0000003dc30751a2 in realloc () from /lib64/libc.so.6
#2 0x0000000000404680 in grow_dynamic_buffer ()
#3 0x0000000000404ad4 in append_stats ()
#4 0x00000000004067c6 in append_stat ()
#5 0x00000000004070db in server_stats ()
#6 0x000000000040adad in process_bin_stat ()
#7 0x000000000040f275 in complete_nread_binary ()
#8 0x000000000041033f in conn_nread ()
#9 0x0000000000408268 in event_handler ()
#10 0x000000000041678a in event_base_loop ()
#11 0x0000000000412324 in worker_libevent ()
#12 0x0000003dc3c064a7 in start_thread () from /lib64/libpthread.so.0
#13 0x0000003dc30d3c2d in clone () from /lib64/libc.so.6
Thread 9 (Thread 0x448ad940 (LWP 11921)):
#0 0x0000003dc3c0d174 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x0000003dc3c08aca in _L_lock_1034 () from /lib64/libpthread.so.0
#2 0x0000003dc3c0898c in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x000000000040f539 in complete_nread_binary ()
#4 0x000000000041033f in conn_nread ()
#5 0x0000000000408268 in event_handler ()
#6 0x000000000041678a in event_base_loop ()
#7 0x0000000000412324 in worker_libevent ()
#8 0x0000003dc3c064a7 in start_thread () from /lib64/libpthread.so.0
#9 0x0000003dc30d3c2d in clone () from /lib64/libc.so.6
Thread 8 (Thread 0x452ae940 (LWP 11922)):
#0 0x0000003dc30d4018 in epoll_wait () from /lib64/libc.so.6
#1 0x0000000000418190 in epoll_dispatch ()
#2 0x00000000004166c1 in event_base_loop ()
#3 0x0000000000412324 in worker_libevent ()
#4 0x0000003dc3c064a7 in start_thread () from /lib64/libpthread.so.0
#5 0x0000003dc30d3c2d in clone () from /lib64/libc.so.6
Thread 7 (Thread 0x45caf940 (LWP 11932)):
#0 0x00002aaaaad0a8a1 in StrategicSqlite3::insert () from bin/ep_engine/ep.so
#1 0x00002aaaaace039f in EventuallyPersistentStore::flushOneDelOrSet ()
#2 0x00002aaaaace0ab7 in EventuallyPersistentStore::flushOne ()
#3 0x00002aaaaace0bed in EventuallyPersistentStore::flushSome ()
#4 0x00002aaaaad040ed in Flusher::doFlush () from bin/ep_engine/ep.so
#5 0x00002aaaaad04d4a in Flusher::step () from bin/ep_engine/ep.so
#6 0x00002aaaaad0550e in FlusherStepper::callback () from bin/ep_engine/ep.so
#7 0x00002aaaaacd9696 in Dispatcher::run () from bin/ep_engine/ep.so
#8 0x00002aaaaacda033 in launch_dispatcher_thread () from bin/ep_engine/ep.so
#9 0x0000003dc3c064a7 in start_thread () from /lib64/libpthread.so.0
#10 0x0000003dc30d3c2d in clone () from /lib64/libc.so.6
Thread 6 (Thread 0x466b0940 (LWP 11933)):
#0 0x0000003dc3c0ab99 in pthread_cond_wait@@GLIBC_2.3.2 ()
#1 0x00002aaaaacd953f in Dispatcher::run () from bin/ep_engine/ep.so
#2 0x00002aaaaacda033 in launch_dispatcher_thread () from bin/ep_engine/ep.so
#3 0x0000003dc3c064a7 in start_thread () from /lib64/libpthread.so.0
#4 0x0000003dc30d3c2d in clone () from /lib64/libc.so.6
Thread 5 (Thread 0x470b1940 (LWP 11934)):
#0 0x0000003dc3c0ae00 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
#1 0x00002aaaaacf60d1 in EventuallyPersistentEngine::notifyTapIoThread ()
#2 0x00002aaaaacf6229 in EvpNotifyTapIo () from bin/ep_engine/ep.so
#3 0x0000003dc3c064a7 in start_thread () from /lib64/libpthread.so.0
#4 0x0000003dc30d3c2d in clone () from /lib64/libc.so.6
Thread 4 (Thread 0x506c0940 (LWP 21280)):
#0 0x0000003dc3c0ae00 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
#1 0x00002aaaaacd99d4 in Dispatcher::run () from bin/ep_engine/ep.so
#2 0x00002aaaaacda033 in launch_dispatcher_thread () from bin/ep_engine/ep.so
#3 0x0000003dc3c064a7 in start_thread () from /lib64/libpthread.so.0
#4 0x0000003dc30d3c2d in clone () from /lib64/libc.so.6
Thread 3 (Thread 0x48eb4940 (LWP 21281)):
#0 0x0000003dc3c0ab99 in pthread_cond_wait@@GLIBC_2.3.2 ()
#1 0x00002aaaaacd953f in Dispatcher::run () from bin/ep_engine/ep.so
#2 0x00002aaaaacda033 in launch_dispatcher_thread () from bin/ep_engine/ep.so
#3 0x0000003dc3c064a7 in start_thread () from /lib64/libpthread.so.0
#4 0x0000003dc30d3c2d in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x47ab2940 (LWP 21282)):
#0 0x0000003dc3c0ae00 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
#1 0x00002aaaaacf60d1 in EventuallyPersistentEngine::notifyTapIoThread ()
#2 0x00002aaaaacf6229 in EvpNotifyTapIo () from bin/ep_engine/ep.so
#3 0x0000003dc3c064a7 in start_thread () from /lib64/libpthread.so.0
#4 0x0000003dc30d3c2d in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x2ba3f9f4a620 (LWP 11910)):
#0 0x0000003dc30d4018 in epoll_wait () from /lib64/libc.so.6
#1 0x0000000000418190 in epoll_dispatch ()
#2 0x00000000004166c1 in event_base_loop ()
#3 0x000000000040952b in main ()
[root@localhost membase]#
[root@localhost membase]# /opt/membase/bin/ep_engine/management/stats 10.2.1.51:11210 all
auth_cmds: 3627
auth_errors: 0
bucket_conns: 27
bytes_read: 26126881147
bytes_written: 23337809435
cas_badval: 0
cas_hits: 0
cas_misses: 0
cmd_flush: 0
cmd_get: 10993637
cmd_set: 8731858
conn_yields: 45768
connection_structures: 42
curr_connections: 38
curr_items: 1005000
curr_items_tot: 1005000
daemon_connections: 10
decr_hits: 0
decr_misses: 0
delete_hits: 0
delete_misses: 0
ep_bg_fetched: 30056
ep_bg_load_avg: 758
ep_bg_max_load: 336932
ep_bg_max_wait: 18774150
ep_bg_min_load: 37
ep_bg_min_wait: 8
ep_bg_num_samples: 30056
ep_bg_wait_avg: 41927
ep_commit_num: 453
ep_commit_time: 7
ep_commit_time_total: 420
ep_data_age: 469
ep_data_age_highwat: 1368
ep_dbinit: 0
ep_dbname: /opt/membase/1.6.0beta4/data/ns_1/default
ep_dbshards: 4
ep_expired: 0
ep_flush_duration: 742
ep_flush_duration_highwat: 742
ep_flush_duration_total: 1548
ep_flush_preempts: 434
ep_flusher_state: running
ep_flusher_todo: 1616722
ep_io_num_read: 30056
ep_io_num_write: 896625
ep_io_read_bytes: 61698623
ep_io_write_bytes: 1827411512
ep_item_commit_failed: 0
ep_item_flush_expired: 0
ep_item_flush_failed: 0
ep_kv_size: 2108822960
ep_max_data_size: 3145728000
ep_max_txn_size: 50000
ep_mem_high_wat: 2359296000
ep_mem_low_wat: 1887436800
ep_min_data_age: 0
ep_num_eject_failures: 33262286
ep_num_expiry_pager_runs: 1
ep_num_non_resident: 94603
ep_num_not_my_vbuckets: 9203
ep_num_pager_runs: 64
ep_num_value_ejects: 131430
ep_oom_errors: 0
ep_overhead: 82314686
ep_pending_ops: 0
ep_pending_ops_max: 2
ep_pending_ops_max_duration: 215315
ep_pending_ops_total: 71
ep_queue_age_cap: 900
ep_queue_size: 281288
ep_storage_age: 2360
ep_storage_age_highwat: 2360
ep_storage_type: featured
ep_tap_keepalive: 0
ep_tmp_oom_errors: 55499
ep_too_old: 225576
ep_too_young: 0
ep_total_cache_size: 2691763888
ep_total_del_items: 0
ep_total_enqueued: 2822341
ep_total_new_items: 529285
ep_total_persisted: 887042
ep_vbucket_del: 2415
ep_vbucket_del_fail: 0
ep_version: 1.6.0beta4_101_g4ebb2dc
ep_warmed_up: 0
ep_warmup: true
ep_warmup_dups: 0
ep_warmup_oom: 0
ep_warmup_thread: complete
ep_warmup_time: 0
get_hits: 10993637
get_misses: 0
incr_hits: 0
incr_misses: 0
libevent: 1.4.13-stable
limit_maxbytes: 67108864
mem_used: 2191137646
pid: 11910
pointer_size: 64
rejected_conns: 0
rusage_system: 588.378552
rusage_user: 713.620513
tap_connect_received: 2566
tap_mutation_received: 3688078
tap_mutation_sent: 3741111
tap_vbucket_set_received: 2048
tap_vbucket_set_sent: 5120
threads: 4
time: 1286400422
total_connections: 3654
uptime: 3945
version: 1.4.4_298_g250909b
[root@localhost membase]#
Attachments
Issue Links
- duplicates
-
MB-2417 Persistence events do not occur when any vbuckets are pending deletion
- Resolved