Details
-
Bug
-
Resolution: Duplicate
-
1.6.0 beta4
-
None
-
Operating System: All
Platform: All
Description
on osx, 1.6.0beta4-63, while trying to reproduce another bug...
- While under memcachetest load...
./memcachetest -h 127.0.0.1:12001 -i 100000 -c 50000 -l -t 8
- Join together 6 nodes, and hit Rebalance...
- After rebalance is done, in the web ui, the "Persistence write queue size" graph starts slowly coming down, but gets stuck at ~138K. (still stuck after 20 minutes.) That graph is defined as sum of ep_queue_size + ep_flusher_todo.
- The first node appears to have a ep_queue_size of ~138K, with zero-sized queues on all other nodes...
stevenmb:~ steveyen$ ~/dev/ep-engine/management/stats 127.0.0.1:12000 all | grep todo
ep_flusher_todo: 0
stevenmb:~ steveyen$ ~/dev/ep-engine/management/stats 127.0.0.1:12002 all | grep todo
ep_flusher_todo: 0
stevenmb:~ steveyen$ ~/dev/ep-engine/management/stats 127.0.0.1:12004 all | grep todo
ep_flusher_todo: 0
stevenmb:~ steveyen$ ~/dev/ep-engine/management/stats 127.0.0.1:12008 all | grep todo
ep_flusher_todo: 0
stevenmb:~ steveyen$ ~/dev/ep-engine/management/stats 127.0.0.1:12010 all | grep todo
ep_flusher_todo: 0
stevenmb:~ steveyen$ ~/dev/ep-engine/management/stats 127.0.0.1:12000 all | grep queue_size
ep_queue_size: 138128
stevenmb:~ steveyen$ ~/dev/ep-engine/management/stats 127.0.0.1:12002 all | grep queue_size
ep_queue_size: 0
stevenmb:~ steveyen$ ~/dev/ep-engine/management/stats 127.0.0.1:12004 all | grep queue_size
ep_queue_size: 0
stevenmb:~ steveyen$ ~/dev/ep-engine/management/stats 127.0.0.1:12006 all | grep queue_size
ep_queue_size: 0
stevenmb:~ steveyen$ ~/dev/ep-engine/management/stats 127.0.0.1:12008 all | grep queue_size
ep_queue_size: 0
stevenmb:~ steveyen$ ~/dev/ep-engine/management/stats 127.0.0.1:12010 all | grep queue_size
ep_queue_size: 0
--------------
Also, two stack traces for memcached separated by 20 minutes or so...
(gdb) thread apply all bt
Thread 11 (process 91693):
#0 0x00007fff841de316 in __semwait_signal ()
#1 0x00007fff841e2131 in _pthread_cond_wait ()
#2 0x0000000101021f31 in EventuallyPersistentEngine::notifyTapIoThread (this=0x10080dc00) at syncobject.hh:42
#3 0x0000000101022099 in EvpNotifyTapIo (arg=<value temporarily unavailable, due to optimizations>) at ep_engine.cc:631
#4 0x00007fff841dc8b6 in _pthread_start ()
#5 0x00007fff841dc769 in thread_start ()
Thread 10 (process 91693):
#0 0x00007fff841de316 in __semwait_signal ()
#1 0x00007fff841e2131 in _pthread_cond_wait ()
#2 0x0000000101002293 in SyncObject::wait () at /Users/steveyen/work/northscale/dev/wallace_master/components/tmp/repo/ep-engine/syncobject.hh:57
#3 0x0000000101002293 in Dispatcher::run (this=0x1001137a0) at dispatcher.cc:57
#4 0x0000000101002ce5 in launch_dispatcher_thread (arg=<value temporarily unavailable, due to optimizations>) at dispatcher.cc:26
#5 0x00007fff841dc8b6 in _pthread_start ()
#6 0x00007fff841dc769 in thread_start ()
Thread 9 (process 91693):
#0 0x00007fff841add46 in read ()
#1 0x0000000101050edf in unixRead () at atomicity.h:51
#2 0x0000000101063df6 in readDbPage () at atomicity.h:51
#3 0x00000001010666de in sqlite3PagerAcquire () at atomicity.h:51
#4 0x00000001010712e6 in moveToChild () at atomicity.h:51
#5 0x000000010108b8a5 in sqlite3BtreeNext () at atomicity.h:51
#6 0x00000001010dac07 in sqlite3VdbeExec () at atomicity.h:51
#7 0x00000001010de896 in sqlite3_step () at atomicity.h:51
#8 0x00000001010343c9 in PreparedStatement::execute (this=0x10010d990) at sqlite-pst.cc:47
#9 0x0000000101032a76 in StrategicSqlite3::delVBucket (this=0x100109e20, vbucket=184) at sqlite-kvstore.cc:152
#10 0x000000010100c5b6 in EventuallyPersistentStore::completeVBucketDeletion (this=0x100863000, vbid=184) at ep.cc:457
#11 0x00000001010156d1 in VBucketDeletionCallback::callback (this=<value temporarily unavailable, due to optimizations>, d=<value temporarily unavailable, due to optimizations>, t=<value temporarily unavailable, due to optimizations>) at ep.cc:160
#12 0x0000000101002405 in Dispatcher::run (this=0x1001143c0) at dispatcher.hh:87
#13 0x0000000101002ce5 in launch_dispatcher_thread (arg=<value temporarily unavailable, due to optimizations>) at dispatcher.cc:26
#14 0x00007fff841dc8b6 in _pthread_start ()
#15 0x00007fff841dc769 in thread_start ()
Thread 8 (process 91693):
#0 0x00007fff841bc4ea in kevent ()
#1 0x0000000100020df7 in kq_dispatch (base=0x100102b90, tv=<value temporarily unavailable, due to optimizations>) at kqueue.c:261
#2 0x000000010001b0b0 in event_base_loop (base=0x100102b90, flags=<value temporarily unavailable, due to optimizations>) at event.c:1536
#3 0x0000000100012631 in worker_libevent (arg=0x10002ffc0) at thread.c:252
#4 0x00007fff841dc8b6 in _pthread_start ()
#5 0x00007fff841dc769 in thread_start ()
Thread 7 (process 91693):
#0 0x00007fff841bc4ea in kevent ()
#1 0x0000000100020df7 in kq_dispatch (base=0x100102420, tv=<value temporarily unavailable, due to optimizations>) at kqueue.c:261
#2 0x000000010001b0b0 in event_base_loop (base=0x100102420, flags=<value temporarily unavailable, due to optimizations>) at event.c:1536
#3 0x0000000100012631 in worker_libevent (arg=0x10082d148) at thread.c:252
#4 0x00007fff841dc8b6 in _pthread_start ()
#5 0x00007fff841dc769 in thread_start ()
Thread 6 (process 91693):
#0 0x00007fff841bc4ea in kevent ()
#1 0x0000000100020df7 in kq_dispatch (base=0x100101cb0, tv=<value temporarily unavailable, due to optimizations>) at kqueue.c:261
#2 0x000000010001b0b0 in event_base_loop (base=0x100101cb0, flags=<value temporarily unavailable, due to optimizations>) at event.c:1536
#3 0x0000000100012631 in worker_libevent (arg=0x10082d030) at thread.c:252
#4 0x00007fff841dc8b6 in _pthread_start ()
#5 0x00007fff841dc769 in thread_start ()
Thread 5 (process 91693):
#0 0x00007fff841bc4ea in kevent ()
#1 0x0000000100020df7 in kq_dispatch (base=0x100101540, tv=<value temporarily unavailable, due to optimizations>) at kqueue.c:261
#2 0x000000010001b0b0 in event_base_loop (base=0x100101540, flags=<value temporarily unavailable, due to optimizations>) at event.c:1536
#3 0x0000000100012631 in worker_libevent (arg=0x10082cf18) at thread.c:252
#4 0x00007fff841dc8b6 in _pthread_start ()
#5 0x00007fff841dc769 in thread_start ()
Thread 4 (process 91693):
#0 0x00007fff841bc4ea in kevent ()
#1 0x0000000100020df7 in kq_dispatch (base=0x100100dd0, tv=<value temporarily unavailable, due to optimizations>) at kqueue.c:261
#2 0x000000010001b0b0 in event_base_loop (base=0x100100dd0, flags=<value temporarily unavailable, due to optimizations>) at event.c:1536
#3 0x0000000100012631 in worker_libevent (arg=0x10082ce00) at thread.c:252
#4 0x00007fff841dc8b6 in _pthread_start ()
#5 0x00007fff841dc769 in thread_start ()
Thread 3 (process 91693):
#0 0x00007fff841de316 in __semwait_signal ()
#1 0x00007fff841de1a5 in nanosleep ()
#2 0x00007fff8422b0a4 in sleep ()
#3 0x00000001000140e8 in check_isasl_db_thread (arg=<value temporarily unavailable, due to optimizations>) at isasl.c:215
#4 0x00007fff841dc8b6 in _pthread_start ()
#5 0x00007fff841dc769 in thread_start ()
Thread 2 (process 91693):
#0 0x00007fff841a5362 in read$NOCANCEL ()
#1 0x00007fff841f48be in __sread ()
#2 0x00007fff841ec604 in _sread ()
#3 0x00007fff841ec49a in __srefill ()
#4 0x00007fff841f765a in __srget ()
#5 0x00007fff841f7616 in getc ()
#6 0x0000000100062dd8 in check_stdin_thread (arg=0x100001ad0) at stdin_check.c:18
#7 0x00007fff841dc8b6 in _pthread_start ()
#8 0x00007fff841dc769 in thread_start ()
Thread 1 (process 91693):
#0 0x00007fff841bc4ea in kevent ()
#1 0x0000000100020df7 in kq_dispatch (base=0x100100300, tv=<value temporarily unavailable, due to optimizations>) at kqueue.c:261
#2 0x000000010001b0b0 in event_base_loop (base=0x100100300, flags=<value temporarily unavailable, due to optimizations>) at event.c:1536
#3 0x00000001000057d9 in main (argc=<value temporarily unavailable, due to optimizations>, argv=<value temporarily unavailable, due to optimizations>) at memcached.c:6948
#4 0x0000000100001054 in start ()
-------------------
(gdb) thread apply all bt
Thread 11 (process 91693):
#0 0x00007fff841de316 in __semwait_signal ()
#1 0x00007fff841e2131 in _pthread_cond_wait ()
#2 0x0000000101021f31 in EventuallyPersistentEngine::notifyTapIoThread (this=0x10080dc00) at syncobject.hh:42
#3 0x0000000101022099 in EvpNotifyTapIo (arg=<value temporarily unavailable, due to optimizations>) at ep_engine.cc:631
#4 0x00007fff841dc8b6 in _pthread_start ()
#5 0x00007fff841dc769 in thread_start ()
Thread 10 (process 91693):
#0 0x00007fff841de316 in __semwait_signal ()
#1 0x00007fff841e2131 in _pthread_cond_wait ()
#2 0x0000000101002293 in SyncObject::wait () at /Users/steveyen/work/northscale/dev/wallace_master/components/tmp/repo/ep-engine/syncobject.hh:57
#3 0x0000000101002293 in Dispatcher::run (this=0x1001137a0) at dispatcher.cc:57
#4 0x0000000101002ce5 in launch_dispatcher_thread (arg=<value temporarily unavailable, due to optimizations>) at dispatcher.cc:26
#5 0x00007fff841dc8b6 in _pthread_start ()
#6 0x00007fff841dc769 in thread_start ()
Thread 9 (process 91693):
#0 0x00007fff841add46 in read ()
#1 0x0000000101050edf in unixRead () at atomicity.h:51
#2 0x0000000101063df6 in readDbPage () at atomicity.h:51
#3 0x00000001010666de in sqlite3PagerAcquire () at atomicity.h:51
#4 0x00000001010712e6 in moveToChild () at atomicity.h:51
#5 0x000000010108b8a5 in sqlite3BtreeNext () at atomicity.h:51
#6 0x00000001010dac07 in sqlite3VdbeExec () at atomicity.h:51
#7 0x00000001010de896 in sqlite3_step () at atomicity.h:51
#8 0x00000001010343c9 in PreparedStatement::execute (this=0x10010d990) at sqlite-pst.cc:47
#9 0x0000000101032a76 in StrategicSqlite3::delVBucket (this=0x100109e20, vbucket=724) at sqlite-kvstore.cc:152
#10 0x000000010100c5b6 in EventuallyPersistentStore::completeVBucketDeletion (this=0x100863000, vbid=724) at ep.cc:457
#11 0x00000001010156d1 in VBucketDeletionCallback::callback (this=<value temporarily unavailable, due to optimizations>, d=<value temporarily unavailable, due to optimizations>, t=<value temporarily unavailable, due to optimizations>) at ep.cc:160
#12 0x0000000101002405 in Dispatcher::run (this=0x1001143c0) at dispatcher.hh:87
#13 0x0000000101002ce5 in launch_dispatcher_thread (arg=<value temporarily unavailable, due to optimizations>) at dispatcher.cc:26
#14 0x00007fff841dc8b6 in _pthread_start ()
#15 0x00007fff841dc769 in thread_start ()
Thread 8 (process 91693):
#0 0x00007fff841bc4ea in kevent ()
#1 0x0000000100020df7 in kq_dispatch (base=0x100102b90, tv=<value temporarily unavailable, due to optimizations>) at kqueue.c:261
#2 0x000000010001b0b0 in event_base_loop (base=0x100102b90, flags=<value temporarily unavailable, due to optimizations>) at event.c:1536
#3 0x0000000100012631 in worker_libevent (arg=0x10002ffc0) at thread.c:252
#4 0x00007fff841dc8b6 in _pthread_start ()
#5 0x00007fff841dc769 in thread_start ()
Thread 7 (process 91693):
#0 0x00007fff841bc4ea in kevent ()
#1 0x0000000100020df7 in kq_dispatch (base=0x100102420, tv=<value temporarily unavailable, due to optimizations>) at kqueue.c:261
#2 0x000000010001b0b0 in event_base_loop (base=0x100102420, flags=<value temporarily unavailable, due to optimizations>) at event.c:1536
#3 0x0000000100012631 in worker_libevent (arg=0x10082d148) at thread.c:252
#4 0x00007fff841dc8b6 in _pthread_start ()
#5 0x00007fff841dc769 in thread_start ()
Thread 6 (process 91693):
#0 0x00007fff841bc4ea in kevent ()
#1 0x0000000100020df7 in kq_dispatch (base=0x100101cb0, tv=<value temporarily unavailable, due to optimizations>) at kqueue.c:261
#2 0x000000010001b0b0 in event_base_loop (base=0x100101cb0, flags=<value temporarily unavailable, due to optimizations>) at event.c:1536
#3 0x0000000100012631 in worker_libevent (arg=0x10082d030) at thread.c:252
#4 0x00007fff841dc8b6 in _pthread_start ()
#5 0x00007fff841dc769 in thread_start ()
Thread 5 (process 91693):
#0 0x00007fff841bc4ea in kevent ()
#1 0x0000000100020df7 in kq_dispatch (base=0x100101540, tv=<value temporarily unavailable, due to optimizations>) at kqueue.c:261
#2 0x000000010001b0b0 in event_base_loop (base=0x100101540, flags=<value temporarily unavailable, due to optimizations>) at event.c:1536
#3 0x0000000100012631 in worker_libevent (arg=0x10082cf18) at thread.c:252
#4 0x00007fff841dc8b6 in _pthread_start ()
#5 0x00007fff841dc769 in thread_start ()
Thread 4 (process 91693):
#0 0x00007fff841bc4ea in kevent ()
#1 0x0000000100020df7 in kq_dispatch (base=0x100100dd0, tv=<value temporarily unavailable, due to optimizations>) at kqueue.c:261
#2 0x000000010001b0b0 in event_base_loop (base=0x100100dd0, flags=<value temporarily unavailable, due to optimizations>) at event.c:1536
#3 0x0000000100012631 in worker_libevent (arg=0x10082ce00) at thread.c:252
#4 0x00007fff841dc8b6 in _pthread_start ()
#5 0x00007fff841dc769 in thread_start ()
Thread 3 (process 91693):
#0 0x00007fff841de316 in __semwait_signal ()
#1 0x00007fff841de1a5 in nanosleep ()
#2 0x00007fff8422b0a4 in sleep ()
#3 0x00000001000140e8 in check_isasl_db_thread (arg=<value temporarily unavailable, due to optimizations>) at isasl.c:215
#4 0x00007fff841dc8b6 in _pthread_start ()
#5 0x00007fff841dc769 in thread_start ()
Thread 2 (process 91693):
#0 0x00007fff841a5362 in read$NOCANCEL ()
#1 0x00007fff841f48be in __sread ()
#2 0x00007fff841ec604 in _sread ()
#3 0x00007fff841ec49a in __srefill ()
#4 0x00007fff841f765a in __srget ()
#5 0x00007fff841f7616 in getc ()
#6 0x0000000100062dd8 in check_stdin_thread (arg=0x100001ad0) at stdin_check.c:18
#7 0x00007fff841dc8b6 in _pthread_start ()
#8 0x00007fff841dc769 in thread_start ()
Thread 1 (process 91693):
#0 0x00007fff841bc4ea in kevent ()
#1 0x0000000100020df7 in kq_dispatch (base=0x100100300, tv=<value temporarily unavailable, due to optimizations>) at kqueue.c:261
#2 0x000000010001b0b0 in event_base_loop (base=0x100100300, flags=<value temporarily unavailable, due to optimizations>) at event.c:1536
#3 0x00000001000057d9 in main (argc=<value temporarily unavailable, due to optimizations>, argv=<value temporarily unavailable, due to optimizations>) at memcached.c:6948
#4 0x0000000100001054 in start ()
--------------
and, stats all from the first node...
$ ~/dev/ep-engine/management/stats 127.0.0.1:12000 all auth_cmds: 877
auth_errors: 0
bucket_conns: 21
bytes_read: 1879088598
bytes_written: 1400424716
cas_badval: 0
cas_hits: 0
cas_misses: 0
cmd_flush: 0
cmd_get: 640549
cmd_set: 846433
conn_yields: 563
connection_structures: 57
curr_connections: 57
curr_items: 38371
curr_items_tot: 76499
daemon_connections: 10
decr_hits: 0
decr_misses: 0
delete_hits: 0
delete_misses: 0
ep_bg_fetched: 0
ep_commit_num: 14
ep_commit_time: 6
ep_commit_time_total: 47
ep_data_age: 100
ep_data_age_highwat: 114
ep_dbinit: 0
ep_dbname: /Users/steveyen/work/northscale/dev/ns_server/data/n_0/default
ep_dbshards: 4
ep_expired: 0
ep_flush_duration: 94
ep_flush_duration_highwat: 94
ep_flush_duration_total: 202
ep_flush_preempts: 0
ep_flusher_state: running
ep_flusher_todo: 0
ep_io_num_read: 47070
ep_io_num_write: 360985
ep_io_read_bytes: 96944547
ep_io_write_bytes: 741345244
ep_item_commit_failed: 0
ep_item_flush_expired: 0
ep_item_flush_failed: 0
ep_kv_size: 147564333
ep_max_data_size: 87381333
ep_max_txn_size: 50000
ep_mem_high_wat: 65535999
ep_mem_low_wat: 52428799
ep_min_data_age: 0
ep_num_eject_failures: 275065
ep_num_expiry_pager_runs: 0
ep_num_non_resident: 41497
ep_num_not_my_vbuckets: 6658
ep_num_pager_runs: 4
ep_num_value_ejects: 80495
ep_oom_errors: 0
ep_overhead: 12652399
ep_pending_ops: 0
ep_pending_ops_max: 0
ep_pending_ops_max_duration: 0
ep_pending_ops_total: 0
ep_queue_age_cap: 900
ep_queue_size: 138128
ep_storage_age: 97
ep_storage_age_highwat: 111
ep_storage_type: featured
ep_tap_keepalive: 0
ep_tmp_oom_errors: 100
ep_too_old: 0
ep_too_young: 0
ep_total_cache_size: 312926981
ep_total_del_items: 0
ep_total_enqueued: 498216
ep_total_new_items: 229917
ep_total_persisted: 360088
ep_vbucket_del: 44
ep_vbucket_del_fail: 0
ep_version: 1.6.0beta4_98_gaddeb09
ep_warmed_up: 0
ep_warmup: true
ep_warmup_dups: 0
ep_warmup_oom: 0
ep_warmup_thread: complete
ep_warmup_time: 0
get_hits: 640549
get_misses: 0
incr_hits: 0
incr_misses: 0
libevent: 2.0.7-rc
limit_maxbytes: 67108864
mem_used: 160216732
pid: 91693
pointer_size: 64
rejected_conns: 0
rusage_system: 230.418806
rusage_user: 81.542152
tap_connect_received: 863
tap_mutation_received: 44133
tap_mutation_sent: 236930
tap_vbucket_set_sent: 1711
threads: 4
time: 1286394570
total_connections: 938
uptime: 5217
version: 1.4.4_298_g250909b
Attachments
Issue Links
- duplicates
-
MB-2417 Persistence events do not occur when any vbuckets are pending deletion
- Resolved