Details
-
Bug
-
Status: Resolved
-
Critical
-
Resolution: Fixed
-
2.4.0-beta
-
None
-
None
-
1
-
1,000,000
Description
Scenario:
Simple test that sets the timeout threshold low, then does a bunch of ops in a tight loop. Then does more ops for 20 seconds.
Expected behavior:
ThresholdLoggingTracer would log something
Observed behavior:
Some minorly enhanced timeout logging, but no threshold logging tracer output.
Script and output attached. Invoked as:
$ LCB_LOGLEVEL=2 python timeoutme.py > run.out 2>&1
(note: filing this to Sergey Avseyev since it probably is lcb level, but I wrote the test as python, please move or redirect as appropriate)
Attachments
- timeoutme.py
- 0.9 kB
- timeoutme.php
- 0.7 kB
- run3.txt
- 17 kB
- run2.txt
- 19 kB
- run.out
- 6 kB
Activity
I tried to repro this with PHP, and could not. In particular, the effective connstr came through differently and I had logging output indicating the threshold logging tracer. So, now I wonder if there's a python level problem. Ellis Breen, can you have a look?
script: timeoutme.php
run output: timeoutme_php_output.txt
exec: LCB_LOGLEVEL=2 php timeoutme.php > timeoutme_php_output.txt 2>&1
Matt - the Orphaned responses in the last Python example are clearly visible here:
13860ms [Ia5744edb] {81507/a33} [WARN] (tracer - L:126) Orphan responses observed: {"count":6,"service":"kv","top":[{"last_local_address":"127.0.0.1:54909","last_local_id":"00000000a5744edb/9e327d4953bf9c29","last_remote_address":"localhost:11210","operation_id":"0xc896","operation_name":"get","server_us":0,"total_us":981},{"last_local_address":"127.0.0.1:54909","last_local_id":"00000000a5744edb/9e327d4953bf9c29","last_remote_address":"localhost:11210","operation_id":"0xc895","operation_name":"get","server_us":0,"total_us":1005},{"last_local_address":"127.0.0.1:54909","last_local_id":"00000000a5744edb/9e327d4953bf9c29","last_remote_address":"localhost:11210","operation_id":"0x940d","operation_name":"get","server_us":0,"total_us":1033},{"last_local_address":"127.0.0.1:54909","last_local_id":"00000000a5744edb/9e327d4953bf9c29","last_remote_address":"localhost:11210","operation_id":"0x7dc7","operation_name":"get","server_us":0,"total_us":1087},{"last_local_address":"127.0.0.1:54909","last_local_id":"00000000a5744edb/9e327d4953bf9c29","last_remote_address":"localhost:11210","operation_id":"0x7dc6","operation_name":"get","server_us":0,"total_us":1442},{"last_local_address":"127.0.0.1:54909","last_local_id":"00000000a5744edb/9e327d4953bf9c29","last_remote_address":"localhost:11210","operation_id":"0xa393","operation_name":"get","server_us":0,"total_us":1720}]} |
17305ms [Ia5744edb] {81507/a33} [WARN] (server - L:360) <localhost:11210> (CTX=0x7fef70510260,memcached,SRV=0x7fef7050cda0,IX=0) Server sent us reply for a timed-out command. (OP=0xb5, RC=0x0, SEQ=89334) |
19443ms [Ia5744edb] {81507/a33} [WARN] (tracer - L:126) Orphan responses observed: {"count":6,"service":"kv","top":[{"last_local_address":"127.0.0.1:54909","last_local_id":"00000000a5744edb/9e327d4953bf9c29","last_remote_address":"localhost:11210","operation_id":"0x14ded","operation_name":"get","server_us":0,"total_us":957},{"last_local_address":"127.0.0.1:54909","last_local_id":"00000000a5744edb/9e327d4953bf9c29","last_remote_address":"localhost:11210","operation_id":"0x12932","operation_name":"get","server_us":0,"total_us":974},{"last_local_address":"127.0.0.1:54909","last_local_id":"00000000a5744edb/9e327d4953bf9c29","last_remote_address":"localhost:11210","operation_id":"0x15cf5","operation_name":"get","server_us":0,"total_us":981},{"last_local_address":"127.0.0.1:54909","last_local_id":"00000000a5744edb/9e327d4953bf9c29","last_remote_address":"localhost:11210","operation_id":"0x15cf4","operation_name":"get","server_us":0,"total_us":996},{"last_local_address":"127.0.0.1:54909","last_local_id":"00000000a5744edb/9e327d4953bf9c29","last_remote_address":"localhost:11210","operation_id":"0x1136b","operation_name":"get","server_us":0,"total_us":1042},{"last_local_address":"127.0.0.1:54909","last_local_id":"00000000a5744edb/9e327d4953bf9c29","last_remote_address":"localhost:11210","operation_id":"0x1136c","operation_name":"get","server_us":0,"total_us":1394}]} |
OperationResult<rc=0x0, key=u'key', cas=0x152f8fe62b8b0000, tracing_context=0, is_tracing_stub=False> |
Looks fine to me, and roughly equivalent to the PHP output.
Ellis Breen I think Matt was trying to get Threshold Logging output, not orphaned responses.
Also, what is the current situation regarding flushing of the collected spans on a time interval?
I can't see any 'over-threshold' logs in the PHP output though. I managed to get the threshold warnings in the test I linked above, e.g.
[4141233127] Operations over threshold: {"count":99,"service":"kv","top":[{"last_local_address":"192.168.99.1:49414","last_local_id":"00000000f6d633e7/1816c785a1ee1ffe","last_remote_address":"192.168.99.1:49398","operation_id":"0x1ca4","operation_name":"upsert","server_us":19,"total_us":8898},{"last_local_address":"192.168.99.1:49414","last_local_id":"00000000f6d633e7/1816c785a1ee1ffe","last_remote_address":"192.168.99.1:49398","operation_id":"0x1ca7","operation_name":"upsert","server_us":8,"total_us":10334},{"last_local_address":"192.168.99.1:49414","last_local_id":"00000000f6d633e7/1816c785a1ee1ffe","last_remote_address":"192.168.99.1:49398","operation_id":"0x1caa","operation_name":"upsert","server_us":6,"total_us":11273},{"last_local_address":"192.168.99.1:49414","last_local_id":"00000000f6d633e7/1816c785a1ee1ffe","last_remote_address":"192.168.99.1:49398","operation_id":"0x1cb0","operation_name":"upsert","server_us":6,"total_us":11746},{"last_local_address":"192.168.99.1:49414","last_local_id":"00000000f6d633e7/1816c785a1ee1ffe","last_remote_address":"192.168.99.1:49398","operation_id":"0x1cb4","operation_name":"upsert","server_us":3,"total_us":12232},{"last_local_address":"192.168.99.1:49414","last_local_id":"00000000f6d633e7/1816c785a1ee1ffe","last_remote_address":"192.168.99.1:49398","operation_id":"0x1cb6","operation_name":"upsert","server_us":6,"total_us":12750},{"last_local_address":"192.168.99.1:49414","last_local_id":"00000000f6d633e7/1816c785a1ee1ffe","last_remote_address":"192.168.99.1:49398","operation_id":"0x1cb9","operation_name":"upsert","server_us":3,"total_us":13241},{"last_local_address":"192.168.99.1:49414","last_local_id":"00000000f6d633e7/1816c785a1ee1ffe","last_remote_address":"192.168.99.1:49398","operation_id":"0x1cbd","operation_name":"upsert","server_us":3,"total_us":13687},{"last_local_address":"192.168.99.1:49414","last_local_id":"00000000f6d633e7/1816c785a1ee1ffe","last_remote_address":"192.168.99.1:49398","operation_id":"0x1cbf","operation_name":"upsert","server_us":3,"total_us":14292},{"last_local_address":"192.168.99.1:49414","last_local_id":"00000000f6d633e7/1816c785a1ee1ffe","last_remote_address":"192.168.99.1:49398","operation_id":"0x1cc4","operation_name":"upsert","server_us":3,"total_us":14764},{"last_local_address":"192.168.99.1:49414","last_local_id":"00000000f6d633e7/1816c785a1ee1ffe","last_remote_address":"192.168.99.1:49398","operation_id":"0x1cc8","operation_name":"upsert","server_us":3,"total_us":15346},{"last_local_address":"192.168.99.1:49414","last_local_id":"00000000f6d633e7/1816c785a1ee1ffe","last_remote_address":"192.168.99.1:49398","operation_id":"0x1cc9","operation_name":"upsert","server_us":3,"total_us":16046},{"last_local_address":"192.168.99.1:49414","last_local_id":"00000000f6d633e7/1816c785a1ee1ffe","last_remote_address":"192.168.99.1:49398","operation_id":"0x1ccd","operation_name":"upsert","server_us":3,"total_us":16602},{"last_local_address":"192.168.99.1:49414","last_local_id":"00000000f6d633e7/1816c785a1ee1ffe","last_remote_address":"192.168.99.1:49398","operation_id":"0x1cd7","operation_name":"upsert","server_us":3,"total_us":17054},{"last_local_address":"192.168.99.1:49414","last_local_id":"00000000f6d633e7/1816c785a1ee1ffe","last_remote_address":"192.168.99.1:49398","operation_id":"0x1cdb","operation_name":"upsert","server_us":6,"total_us":17622},{"last_local_address":"192.168.99.1:49414","last_local_id":"00000000f6d633e7/1816c785a1ee1ffe","last_remote_address":"192.168.99.1:49398","operation_id":"0x1cdf","operation_name":"upsert","server_us":3,"total_us":18123},{"last_local_address":"192.168.99.1:49414","last_local_id":"00000000f6d633e7/1816c785a1ee1ffe","last_remote_address":"192.168.99.1:49398","operation_id":"0x1ce8","operation_name":"upsert","server_us":3,"total_us":18531},{"last_local_address":"192.168.99.1:49414","last_local_id":"00000000f6d633e7/1816c785a1ee1ffe","last_remote_address":"192.168.99.1:49398","operation_id":"0x1cef","operation_name":"upsert","server_us":3,"total_us":18987},{"last_local_address":"192.168.99.1:49414","last_local_id":"00000000f6d633e7/1816c785a1ee1ffe","last_remote_address":"192.168.99.1:49398","operation_id":"0x1cf4","operation_name":"upsert","server_us":6,"total_us":19523},{"last_local_address":"192.168.99.1:49414","last_local_id":"00000000f6d633e7/1816c785a1ee1ffe","last_remote_address":"192.168.99.1:49398","operation_id":"0x1cfc","operation_name":"upsert","server_us":3,"total_us":19894},{"last_local_address":"192.168.99.1:49414","last_local_id":"00000000f6d633e7/1816c785a1ee1ffe","last_remote_address":"192.168.99.1:49398","operation_id":"0x1cfe","operation_name":"upsert","server_us":3,"total_us":20497},{"last_local_address":"192.168.99.1:49414","last_local_id":"00000000f6d633e7/1816c785a1ee1ffe","last_remote_address":"192.168.99.1:49398","operation_id":"0x1cff","operation_name":"upsert","server_us":3,"total_us":21137},{"last_local_address":"192.168.99.1:49414","last_local_id":"00000000f6d633e7/1816c785a1ee1ffe","last_remote_address":"192.168.99.1:49398","operation_id":"0x1d01","operation_name":"upsert","server_us":2,"total_us":21838},{"last_local_address":"192.168.99.1:49414", |
|
...
|
|
"total_us":69233},{"last_local_address":"192.168.99.1:49415","last_local_id":"00000000f6d633e7/1d49399a10bf3c19","last_remote_address":"192.168.99.1:49399","operation_id":"0x1cf7","operation_name":"upsert","server_us":6,"total_us":69619},{"last_local_address":"192.168.99.1:49415","last_local_id":"00000000f6d633e7/1d49399a10bf3c19","last_remote_address":"192.168.99.1:49399","operation_id":"0x1cfa","operation_name":"upsert","server_us":3,"total_us":70130},{"last_local_address":"192.168.99.1:49415","last_local_id":"00000000f6d633e7/1d49399a10bf3c19","last_remote_address":"192.168.99.1:49399","operation_id":"0x1cfb","operation_name":"upsert","server_us":3,"total_us":70714},{"last_local_address":"192.168.99.1:49415","last_local_id":"00000000f6d633e7/1d49399a10bf3c19","last_remote_address":"192.168.99.1:49399","operation_id":"0x1d03","operation_name":"upsert","server_us":3,"total_us":71103}]} (L:128) |
You can set the intervals as Matt has done above: https://issues.couchbase.com/browse/PYCBC-485?focusedCommentId=272810&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-272810
The default settings are in libcouchbase.h:
#define LCBTRACE_DEFAULT_ORPHANED_QUEUE_FLUSH_INTERVAL LCB_MS2US(1000)
#define LCBTRACE_DEFAULT_ORPHANED_QUEUE_SIZE 128
#define LCBTRACE_DEFAULT_THRESHOLD_QUEUE_FLUSH_INTERVAL LCB_MS2US(3000)
#define LCBTRACE_DEFAULT_THRESHOLD_QUEUE_SIZE 128
#define LCBTRACE_DEFAULT_THRESHOLD_KV LCB_MS2US(500)
#define LCBTRACE_DEFAULT_THRESHOLD_N1QL LCB_MS2US(1000)
#define LCBTRACE_DEFAULT_THRESHOLD_VIEW LCB_MS2US(1000)
#define LCBTRACE_DEFAULT_THRESHOLD_FTS LCB_MS2US(1000)
#define LCBTRACE_DEFAULT_THRESHOLD_ANALYTICS LCB_MS2US(1000)
So every 1 second for orphaned spans, every 3 seconds for threshold-exceeding spans.
Updated including tunings suggested by Ellis Breen and running with envvar suggested by Sergey Avseyev. Still not working.
script: timeoutme-tuned.py
run output: run3.txt
exec: LCB_OPTIONS="enable_tracing=true" LCB_LOGLEVEL=2 python timeoutme.py > run.out 2>&1