Details
-
Bug
-
Status: Resolved
-
Test Blocker
-
Resolution: Fixed
-
3.0.4
-
None
-
1
-
SDK36: SearchFeat/Txns/Docs, SDK38: Txnβ-ready/Docs, SDK 44: Txns, SDK 3.1, Docs
Description
Running new collections n1ql tests requires we use python sdk3. All n1ql collections test run with 3.0.4. I was seeing some odd behavior after the test had run for about 10 minutes and the client side would freeze up. Using top, I could see that the client processes would continually grow in memory usage. By 10 minutes, the client machine would be out of memory. I ran tracemalloc to see where the memory could be leaking and it seem to be coming from here: https://github.com/couchbase/couchbase-python-client/blob/master/couchbase_core/n1ql.py#L529
This is a sample output of tracemalloc after 100 queries where submitted:
[2020-08-31 13:20:16,966: WARNING/Process-1:2] [ Top 20 differences ] |
[2020-08-31 13:20:16,966: WARNING/Process-1:2] /root/korry/perfrunner/env/lib/python3.5/site-packages/couchbase_core/n1ql.py:529: size=15.3 MiB (+193 KiB), count=153918 (+1900), average=104 B |
[2020-08-31 13:20:16,966: WARNING/Process-1:2] /usr/lib/python3.5/json/decoder.py:355: size=5703 KiB (+70.4 KiB), count=72901 (+900), average=80 B |
[2020-08-31 13:20:16,966: WARNING/Process-1:2] /root/korry/perfrunner/env/lib/python3.5/site-packages/couchbase_core/n1ql.py:417: size=1076 KiB (+13.3 KiB), count=8101 (+100), average=136 B |
[2020-08-31 13:20:16,966: WARNING/Process-1:2] /root/korry/perfrunner/env/lib/python3.5/site-packages/couchbase_core/__init__.py:262: size=1068 KiB (+13.3 KiB), count=16080 (+200), average=68 B |
[2020-08-31 13:20:16,966: WARNING/Process-1:2] /root/korry/perfrunner/env/lib/python3.5/site-packages/couchbase/cluster.py:595: size=506 KiB (+6400 B), count=8100 (+100), average=64 B |
[2020-08-31 13:20:16,967: WARNING/Process-1:2] /root/korry/perfrunner/env/lib/python3.5/site-packages/couchbase_core/__init__.py:292: size=505 KiB (+6400 B), count=8070 (+100), average=64 B |
[2020-08-31 13:20:16,967: WARNING/Process-1:2] /root/korry/perfrunner/env/lib/python3.5/site-packages/couchbase/n1ql.py:140: size=505 KiB (+6400 B), count=8066 (+100), average=64 B |
[2020-08-31 13:20:16,967: WARNING/Process-1:2] /root/korry/perfrunner/env/lib/python3.5/site-packages/couchbase_core/__init__.py:258: size=502 KiB (+6400 B), count=8040 (+100), average=64 B |
[2020-08-31 13:20:16,967: WARNING/Process-1:2] /root/korry/perfrunner/env/lib/python3.5/site-packages/couchbase/cluster.py:178: size=419 KiB (+6208 B), count=6704 (+97), average=64 B |
[2020-08-31 13:20:16,967: WARNING/Process-1:2] /root/korry/perfrunner/spring/reservoir.py:28: size=285 KiB (+3600 B), count=8099 (+100), average=36 B |
[2020-08-31 13:20:16,967: WARNING/Process-1:2] /root/korry/perfrunner/spring/cbgen_helpers.py:92: size=188 KiB (+2376 B), count=8028 (+99), average=24 B |
[2020-08-31 13:20:16,967: WARNING/Process-1:2] /root/korry/perfrunner/env/lib/python3.5/site-packages/couchbase_core/__init__.py:267: size=86.8 KiB (+192 B), count=1381 (+3), average=64 B |
[2020-08-31 13:20:16,967: WARNING/Process-1:2] /root/korry/perfrunner/spring/reservoir.py:31: size=67.5 KiB (+0 B), count=1 (+0), average=67.5 KiB |
[2020-08-31 13:20:16,967: WARNING/Process-1:2] /usr/lib/python3.5/tracemalloc.py:113: size=14.2 KiB (+0 B), count=165 (+0), average=88 B |
[2020-08-31 13:20:16,967: WARNING/Process-1:2] /usr/lib/python3.5/tracemalloc.py:123: size=13.2 KiB (+0 B), count=170 (+0), average=80 B |
[2020-08-31 13:20:16,967: WARNING/Process-1:2] /usr/lib/python3.5/tracemalloc.py:425: size=8408 B (+0 B), count=167 (+0), average=50 B |
[2020-08-31 13:20:16,967: WARNING/Process-1:2] /usr/lib/python3.5/multiprocessing/reduction.py:50: size=3556 B (+0 B), count=52 (+0), average=68 B |
[2020-08-31 13:20:16,968: WARNING/Process-1:2] /root/korry/perfrunner/env/lib/python3.5/site-packages/couchbase/options.py:73: size=3344 B (+0 B), count=46 (+0), average=73 B |
[2020-08-31 13:20:16,968: WARNING/Process-1:2] /root/korry/perfrunner/env/lib/python3.5/site-packages/couchbase/options.py:38: size=2384 B (+0 B), count=31 (+0), average=77 B |
[2020-08-31 13:20:16,968: WARNING/Process-1:2] /usr/lib/python3.5/tracemalloc.py:349: size=1944 B (+0 B), count=26 (+0), average=75 B |
The top one is:
/root/korry/perfrunner/env/lib/python3.5/site-packages/couchbase_core/n1ql.py:529: size=15.3 MiB (+193 KiB), count=153918 (+1900), average=104 B
You can see that it is adding around 193 KiB ever 100 queries, or 1.93 KiB per query. Our test do upwards of 10k queries/sec which would result in 19 MB per second or around 1 GB per minute. This causes our tests to quickly run out of memory on the client machines.
Attachments
Activity
Field | Original Value | New Value |
---|---|---|
Description |
Running new collections n1ql tests requires we use python sdk3. All n1ql collections test run with 3.0.4. I was seeing some odd behavior after the test had run for about 10 minutes and the client side would freeze up. Using top, I could see that the client processes would continually grow in memory usage. By 10 minutes, the client machine would be out of memory. I ran tracemalloc to see where the memory could be leaking and it seem to be coming from here: [https://github.com/couchbase/couchbase-python-client/blob/master/couchbase_core/n1ql.py#L529]
This is a sample output of tracemalloc after 100 queries where submitted: {code:java}[2020-08-31 13:20:16,966: WARNING/Process-1:2] [ Top 20 differences ] [2020-08-31 13:20:16,966: WARNING/Process-1:2] /root/korry/perfrunner/env/lib/python3.5/site-packages/couchbase_core/n1ql.py:529: size=15.3 MiB (+193 KiB), count=153918 (+1900), average=104 B [2020-08-31 13:20:16,966: WARNING/Process-1:2] /usr/lib/python3.5/json/decoder.py:355: size=5703 KiB (+70.4 KiB), count=72901 (+900), average=80 B [2020-08-31 13:20:16,966: WARNING/Process-1:2] /root/korry/perfrunner/env/lib/python3.5/site-packages/couchbase_core/n1ql.py:417: size=1076 KiB (+13.3 KiB), count=8101 (+100), average=136 B [2020-08-31 13:20:16,966: WARNING/Process-1:2] /root/korry/perfrunner/env/lib/python3.5/site-packages/couchbase_core/__init__.py:262: size=1068 KiB (+13.3 KiB), count=16080 (+200), average=68 B [2020-08-31 13:20:16,966: WARNING/Process-1:2] /root/korry/perfrunner/env/lib/python3.5/site-packages/couchbase/cluster.py:595: size=506 KiB (+6400 B), count=8100 (+100), average=64 B [2020-08-31 13:20:16,967: WARNING/Process-1:2] /root/korry/perfrunner/env/lib/python3.5/site-packages/couchbase_core/__init__.py:292: size=505 KiB (+6400 B), count=8070 (+100), average=64 B [2020-08-31 13:20:16,967: WARNING/Process-1:2] /root/korry/perfrunner/env/lib/python3.5/site-packages/couchbase/n1ql.py:140: size=505 KiB (+6400 B), count=8066 (+100), average=64 B [2020-08-31 13:20:16,967: WARNING/Process-1:2] /root/korry/perfrunner/env/lib/python3.5/site-packages/couchbase_core/__init__.py:258: size=502 KiB (+6400 B), count=8040 (+100), average=64 B [2020-08-31 13:20:16,967: WARNING/Process-1:2] /root/korry/perfrunner/env/lib/python3.5/site-packages/couchbase/cluster.py:178: size=419 KiB (+6208 B), count=6704 (+97), average=64 B [2020-08-31 13:20:16,967: WARNING/Process-1:2] /root/korry/perfrunner/spring/reservoir.py:28: size=285 KiB (+3600 B), count=8099 (+100), average=36 B [2020-08-31 13:20:16,967: WARNING/Process-1:2] /root/korry/perfrunner/spring/cbgen_helpers.py:92: size=188 KiB (+2376 B), count=8028 (+99), average=24 B [2020-08-31 13:20:16,967: WARNING/Process-1:2] /root/korry/perfrunner/env/lib/python3.5/site-packages/couchbase_core/__init__.py:267: size=86.8 KiB (+192 B), count=1381 (+3), average=64 B [2020-08-31 13:20:16,967: WARNING/Process-1:2] /root/korry/perfrunner/spring/reservoir.py:31: size=67.5 KiB (+0 B), count=1 (+0), average=67.5 KiB [2020-08-31 13:20:16,967: WARNING/Process-1:2] /usr/lib/python3.5/tracemalloc.py:113: size=14.2 KiB (+0 B), count=165 (+0), average=88 B [2020-08-31 13:20:16,967: WARNING/Process-1:2] /usr/lib/python3.5/tracemalloc.py:123: size=13.2 KiB (+0 B), count=170 (+0), average=80 B [2020-08-31 13:20:16,967: WARNING/Process-1:2] /usr/lib/python3.5/tracemalloc.py:425: size=8408 B (+0 B), count=167 (+0), average=50 B [2020-08-31 13:20:16,967: WARNING/Process-1:2] /usr/lib/python3.5/multiprocessing/reduction.py:50: size=3556 B (+0 B), count=52 (+0), average=68 B [2020-08-31 13:20:16,968: WARNING/Process-1:2] /root/korry/perfrunner/env/lib/python3.5/site-packages/couchbase/options.py:73: size=3344 B (+0 B), count=46 (+0), average=73 B [2020-08-31 13:20:16,968: WARNING/Process-1:2] /root/korry/perfrunner/env/lib/python3.5/site-packages/couchbase/options.py:38: size=2384 B (+0 B), count=31 (+0), average=77 B [2020-08-31 13:20:16,968: WARNING/Process-1:2] /usr/lib/python3.5/tracemalloc.py:349: size=1944 B (+0 B), count=26 (+0), average=75 B {code} The top one is: /root/korry/perfrunner/env/lib/python3.5/site-packages/couchbase_core/n1ql.py:529: size=15.3 MiB (+193 KiB), count=153918 (+1900), average=104 B You can that it is adding around 193 KiB ever 100 queries, or 1.93 KiB per query. Our test do upwards of 10k queries/sec which would result in 19 MB per second or around 1 GB per minute. This causes our tests to quickly run out of memory on the client machines. |
Running new collections n1ql tests requires we use python sdk3. All n1ql collections test run with 3.0.4. I was seeing some odd behavior after the test had run for about 10 minutes and the client side would freeze up. Using top, I could see that the client processes would continually grow in memory usage. By 10 minutes, the client machine would be out of memory. I ran tracemalloc to see where the memory could be leaking and it seem to be coming from here: [https://github.com/couchbase/couchbase-python-client/blob/master/couchbase_core/n1ql.py#L529]
This is a sample output of tracemalloc after 100 queries where submitted: {code:java}[2020-08-31 13:20:16,966: WARNING/Process-1:2] [ Top 20 differences ] [2020-08-31 13:20:16,966: WARNING/Process-1:2] /root/korry/perfrunner/env/lib/python3.5/site-packages/couchbase_core/n1ql.py:529: size=15.3 MiB (+193 KiB), count=153918 (+1900), average=104 B [2020-08-31 13:20:16,966: WARNING/Process-1:2] /usr/lib/python3.5/json/decoder.py:355: size=5703 KiB (+70.4 KiB), count=72901 (+900), average=80 B [2020-08-31 13:20:16,966: WARNING/Process-1:2] /root/korry/perfrunner/env/lib/python3.5/site-packages/couchbase_core/n1ql.py:417: size=1076 KiB (+13.3 KiB), count=8101 (+100), average=136 B [2020-08-31 13:20:16,966: WARNING/Process-1:2] /root/korry/perfrunner/env/lib/python3.5/site-packages/couchbase_core/__init__.py:262: size=1068 KiB (+13.3 KiB), count=16080 (+200), average=68 B [2020-08-31 13:20:16,966: WARNING/Process-1:2] /root/korry/perfrunner/env/lib/python3.5/site-packages/couchbase/cluster.py:595: size=506 KiB (+6400 B), count=8100 (+100), average=64 B [2020-08-31 13:20:16,967: WARNING/Process-1:2] /root/korry/perfrunner/env/lib/python3.5/site-packages/couchbase_core/__init__.py:292: size=505 KiB (+6400 B), count=8070 (+100), average=64 B [2020-08-31 13:20:16,967: WARNING/Process-1:2] /root/korry/perfrunner/env/lib/python3.5/site-packages/couchbase/n1ql.py:140: size=505 KiB (+6400 B), count=8066 (+100), average=64 B [2020-08-31 13:20:16,967: WARNING/Process-1:2] /root/korry/perfrunner/env/lib/python3.5/site-packages/couchbase_core/__init__.py:258: size=502 KiB (+6400 B), count=8040 (+100), average=64 B [2020-08-31 13:20:16,967: WARNING/Process-1:2] /root/korry/perfrunner/env/lib/python3.5/site-packages/couchbase/cluster.py:178: size=419 KiB (+6208 B), count=6704 (+97), average=64 B [2020-08-31 13:20:16,967: WARNING/Process-1:2] /root/korry/perfrunner/spring/reservoir.py:28: size=285 KiB (+3600 B), count=8099 (+100), average=36 B [2020-08-31 13:20:16,967: WARNING/Process-1:2] /root/korry/perfrunner/spring/cbgen_helpers.py:92: size=188 KiB (+2376 B), count=8028 (+99), average=24 B [2020-08-31 13:20:16,967: WARNING/Process-1:2] /root/korry/perfrunner/env/lib/python3.5/site-packages/couchbase_core/__init__.py:267: size=86.8 KiB (+192 B), count=1381 (+3), average=64 B [2020-08-31 13:20:16,967: WARNING/Process-1:2] /root/korry/perfrunner/spring/reservoir.py:31: size=67.5 KiB (+0 B), count=1 (+0), average=67.5 KiB [2020-08-31 13:20:16,967: WARNING/Process-1:2] /usr/lib/python3.5/tracemalloc.py:113: size=14.2 KiB (+0 B), count=165 (+0), average=88 B [2020-08-31 13:20:16,967: WARNING/Process-1:2] /usr/lib/python3.5/tracemalloc.py:123: size=13.2 KiB (+0 B), count=170 (+0), average=80 B [2020-08-31 13:20:16,967: WARNING/Process-1:2] /usr/lib/python3.5/tracemalloc.py:425: size=8408 B (+0 B), count=167 (+0), average=50 B [2020-08-31 13:20:16,967: WARNING/Process-1:2] /usr/lib/python3.5/multiprocessing/reduction.py:50: size=3556 B (+0 B), count=52 (+0), average=68 B [2020-08-31 13:20:16,968: WARNING/Process-1:2] /root/korry/perfrunner/env/lib/python3.5/site-packages/couchbase/options.py:73: size=3344 B (+0 B), count=46 (+0), average=73 B [2020-08-31 13:20:16,968: WARNING/Process-1:2] /root/korry/perfrunner/env/lib/python3.5/site-packages/couchbase/options.py:38: size=2384 B (+0 B), count=31 (+0), average=77 B [2020-08-31 13:20:16,968: WARNING/Process-1:2] /usr/lib/python3.5/tracemalloc.py:349: size=1944 B (+0 B), count=26 (+0), average=75 B {code} The top one is: /root/korry/perfrunner/env/lib/python3.5/site-packages/couchbase_core/n1ql.py:529: size=15.3 MiB (+193 KiB), count=153918 (+1900), average=104 B You can see that it is adding around 193 KiB ever 100 queries, or 1.93 KiB per query. Our test do upwards of 10k queries/sec which would result in 19 MB per second or around 1 GB per minute. This causes our tests to quickly run out of memory on the client machines. |
Status | New [ 10003 ] | Open [ 1 ] |
Attachment | 10krun.txt [ 106178 ] | |
Attachment | 100run.txt [ 106179 ] |
Attachment | repro.py [ 106180 ] |
Labels | performance | affects-cc-testing performance |
Sprint | SDK36: SearchFeat/Txns/Docs [ 1212 ] |
Rank | Ranked higher |
Rank | Ranked higher |
Fix Version/s | 3.0.6 [ 17108 ] |
Status | Open [ 1 ] | In Progress [ 3 ] |
Attachment | screenshot-1.png [ 106984 ] |
Attachment | image-2020-09-09-21-39-02-834.png [ 106990 ] |
Attachment | fixtracemalloc.txt [ 107257 ] |
Attachment | 304tracemalloc.txt [ 107258 ] |
Sprint | SDK36: SearchFeat/Txns/Docs [ 1212 ] | SDK36: SearchFeat/Txns/Docs, SDK38: Txnβ-ready/Docs [ 1212, 1213 ] |
Status | In Progress [ 3 ] | Open [ 1 ] |
Sprint | SDK36: SearchFeat/Txns/Docs, SDK38: Txnβ-ready/Docs [ 1212, 1213 ] | SDK36: SearchFeat/Txns/Docs, SDK38: Txnβ-ready/Docs, SDK40: [ 1212, 1213, 1220 ] |
Sprint | SDK36: SearchFeat/Txns/Docs, SDK38: Txnβ-ready/Docs, SDK40: [ 1212, 1213, 1220 ] | SDK36: SearchFeat/Txns/Docs, SDK38: Txnβ-ready/Docs [ 1212, 1213 ] |
Rank | Ranked higher |
Sprint | SDK36: SearchFeat/Txns/Docs, SDK38: Txnβ-ready/Docs [ 1212, 1213 ] | SDK36: SearchFeat/Txns/Docs, SDK38: Txnβ-ready/Docs, SDK 44: Txns, SDK 3.1 [ 1212, 1213, 1289 ] |
Rank | Ranked lower |
Assignee | Ellis Breen [ ellis.breen ] | Brett Lawson [ brett19 ] |
Resolution | Fixed [ 1 ] | |
Status | Open [ 1 ] | Resolved [ 5 ] |
Workflow | Couchbase SDK Workflow [ 178669 ] | Couchbase SDK Workflow with Review [ 257463 ] |
100run.txt
10krun.txt
I have uploaded the output of valgrind on a small reprop script repro.py
. The two runs are with 100 queries and 10,000 queries. Here is the leak summary for both:
10k
==16406== LEAK SUMMARY:
==16406== definitely lost: 48 bytes in 1 blocks
==16406== indirectly lost: 510 bytes in 12 blocks
==16406== possibly lost: 12,296 bytes in 22 blocks
==16406== still reachable: 21,087,700 bytes in 145,720 blocks
==16406== of which reachable via heuristic:
==16406== stdstring : 409,959 bytes in 9,999 blocks
==16406== suppressed: 0 bytes in 0 blocks
100
==16872== LEAK SUMMARY:
==16872== definitely lost: 48 bytes in 1 blocks
==16872== indirectly lost: 510 bytes in 12 blocks
==16872== possibly lost: 19,712 bytes in 34 blocks
==16872== still reachable: 2,701,820 bytes in 7,910 blocks
==16872== of which reachable via heuristic:
==16872== stdstring : 4,059 bytes in 99 blocks
==16872== suppressed: 0 bytes in 0 blocks
The interesting part is the "still reachable". You can see that the number of blocks still reachable corresponds to the number of queries run. Although these blocks are "still reachable" they are a problem as the memory usage continues to grow with the number of queries. "Still reachable" from what I could tell means that the memory could have been freed but wasnt prior to program exit. So technically it seems like its not a "memory leak" but the memory doesn't get freed until program exit. And, as the memory size grows with the number of queries, this would suggest that the query results are not being discarded properly after the sdk gives the results to my program, not sure thought, just an idea.