Uploaded image for project: 'Couchbase Python Client Library'
  1. Couchbase Python Client Library
  2. PYCBC-1029

Memory leak on get_multi() and other ..._multi functions

    XMLWordPrintable

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 3.0.5
    • 3.2.4
    • library
    • None
    • CentOS 7 x86_64, Python 3.6 (IUS)
    • 1

    Description

      Memory usage quickly builds up on querying many batches of keys, with bucket.get_multi() or by using any other ..._multi() functions, eg. upsert_mult().

      You may test it with the following script:

      from couchbase.bucket import Bucket
      import resource
       
      keys = [line.rstrip() for line in open('1000_keys.txt', 'rt')]
      bucket = Bucket('couchbase://hostname/a_bucket', username=..., password=...)
       
      for i in range(100):
          bucket.get_multi(keys)
          mem_rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
          print('Memory usage: ', mem_rss)

      Libcouchbase 2.10.7 + couchbase 2.5.12 Python package does not have this issue, but 3.0.5 does have.

      Attachments

        1. asan_get_multi_1000_fix.txt
          571 kB
        2. asan_get_multi_1000.txt
          573 kB
        3. cb_leak_get.py
          0.7 kB
        4. cb_leaks_2.py
          1 kB
        5. cb_leaks.py
          0.7 kB
        6. cb-leaks-2.xml.gz
          8 kB
        7. leak.xml.gz
          120 kB
        8. leak.xml-1.gz
          7 kB
        9. leak-2.xml.gz
          9 kB
        10. leak-3.8.xml.gz
          91 kB
        11. leak-get.xml.gz
          8 kB
        12. san-leaks.txt
          1 kB
        13. valgrind_multi_upsert.zip
          1.68 MB
        No reviews matched the request. Check your Options in the drop-down menu of this sections header.

        Activity

          lcosmin Cos Min added a comment -

          It looks like there's a leak even for a simple "get".  Attaching test code. cb_leak_get.py  

          When running the test code and  while true; do ps h -o pid,vsz,rss $TEST_PID; sleep 10; done   in parrallel, a steady memory increase can be observed:

          1778843  73032 64396
          1778843  77128 68492
          1778843  81224 72588
          1778843  85064 76428
          1778843  88904 80268
          1778843  92744 84108
          1778843  96584 87948
          1778843 100680 92044
          1778843 104776 96140
          1778843 108616 99980
          1778843 112456 103820
          1778843 116296 107660
          1778843 120392 111756
          1778843 124232 115596
          1778843 128328 119692
          1778843 132168 123532
          1778843 136264 127628
          ... 

           

          Also, not sure if relevant, but breaking (Ctrl-C) the test code results in an assertion error from src/oputil.c:

          python-couchbase: self->nremaining == 0 at src/oputil.c:207. Abort[1]    1782098 abort      python3 cb_leak_get.py -i 100000 

           

          leak-get.xml.gz

           

          lcosmin Cos Min added a comment - It looks like there's a leak even for a simple "get".  Attaching test code.  cb_leak_get.py    When running the test code and  while true; do ps h -o pid,vsz,rss $TEST_PID; sleep 10; done    in parrallel, a steady memory increase can be observed: 1778843 73032 64396 1778843 77128 68492 1778843 81224 72588 1778843 85064 76428 1778843 88904 80268 1778843 92744 84108 1778843 96584 87948 1778843 100680 92044 1778843 104776 96140 1778843 108616 99980 1778843 112456 103820 1778843 116296 107660 1778843 120392 111756 1778843 124232 115596 1778843 128328 119692 1778843 132168 123532 1778843 136264 127628 ...   Also, not sure if relevant, but breaking (Ctrl-C) the test code results in an assertion error from src/oputil.c : python-couchbase: self->nremaining == 0 at src/oputil.c: 207 . Abort[ 1 ] 1782098 abort python3 cb_leak_get.py -i 100000   leak-get.xml.gz  
          jared.casey Jared Casey added a comment - - edited

          I have attached a couple of files showing output from ASAN leak detection and valgrind output. From the ASAN output, the following block is missing once the fix has been applied. When looking at the valgrind output, the related terms are: "pycbc_set_kv_ull", "pycbc_dict_add_text_kv_strn" and "pycbc_convert_kv_error_context" and those will not be found in the valgrind output when the fix is applied.

          Direct leak of 575424 byte(s) in 999 object(s) allocated from:
              #0 0x10f4047b0 in wrap_malloc+0xa0 (libclang_rt.asan_osx_dynamic.dylib:x86_64+0x447b0)
              #1 0x10f0d77d9 in _PyObject_Malloc+0x39 (python:x86_64+0x1000627d9)
              #2 0x10f0c2b3f in new_keys_object+0x9f (python:x86_64+0x10004db3f)
              #3 0x10f0c3d22 in dictresize+0x42 (python:x86_64+0x10004ed22)
              #4 0x10f0c3989 in insertdict+0x2f9 (python:x86_64+0x10004e989)
              #5 0x1117f3fc0 in pycbc_dict_add_text_kv ext.c:3214
              #6 0x1117c9c64 in pycbc_convert_kv_error_context callbacks.c:210
              #7 0x1117ca07c in get_operation_err_info callbacks.c:276
              #8 0x1117cc1bd in operation_completed_with_err_info callbacks.c:305
              #9 0x1117cddfc in value_callback callbacks.c:926
              #10 0x11190b5a2 in void invoke_callback<lcb_RESPGET_>(mc_packet_st const*, lcb_st*, lcb_RESPGET_*, lcb_CALLBACK_TYPE) handler.cc:359
              #11 0x11190a323 in mcreq_dispatch_response handler.cc:1058
              #12 0x11191fb2c in lcb::Server::try_read(lcbio_CTX*, rdb_IOROPE*) mcserver.cc
              #13 0x11192198d in on_read(lcbio_CTX*, unsigned int) mcserver.cc:628
              #14 0x1118e4dde in E_handler(int, short, void*) ctx.cc:278
              #15 0x1118cf894 in run_loop plugin-select.c:309
              #16 0x11196a7bd in lcb_wait wait.cc:108
              #17 0x1118134e8 in pycbc_common_vars_wait oputil.c:205
              #18 0x1117fc611 in get_common get.c:474
              #19 0x1117fd065 in pycbc_Bucket_get_multi get.c:639
              #20 0x10f09831b in _PyMethodDef_RawFastCallDict+0x20b (python:x86_64+0x10002331b)
              #21 0x10f09e70b in methoddescr_call+0x5b (python:x86_64+0x10002970b)
              #22 0x10f097cc8 in PyObject_Call+0x88 (python:x86_64+0x100022cc8)
              #23 0x10f16559c in _PyEval_EvalFrameDefault+0x6e3c (python:x86_64+0x1000f059c)
              #24 0x10f1694d1 in _PyEval_EvalCodeWithName+0xae1 (python:x86_64+0x1000f44d1)
              #25 0x10f097658 in _PyFunction_FastCallDict+0x1c8 (python:x86_64+0x100022658)
              #26 0x10f098b42 in _PyObject_Call_Prepend+0x82 (python:x86_64+0x100023b42)
              #27 0x10f097cc8 in PyObject_Call+0x88 (python:x86_64+0x100022cc8)
              #28 0x10f16559c in _PyEval_EvalFrameDefault+0x6e3c (python:x86_64+0x1000f059c)
              #29 0x10f1694d1 in _PyEval_EvalCodeWithName+0xae1 (python:x86_64+0x1000f44d1)
          

          When the Python SDK receives a response from libcouchbase it stores error context in a dict object (this is done in the Python C-API). The memory leak was occurring with the _*multi() operations due to replacing the pointer for error context with the most recent libcouchbase response without decrementing the reference count for the previous error context. The fix was to decrement the reference count of the previous error context.

          Attachments:
          asan_get_multi_1000.txt
          asan_get_multi_1000_fix.txt
          valgrind_multi_upsert.zip

          jared.casey Jared Casey added a comment - - edited I have attached a couple of files showing output from ASAN leak detection and valgrind output. From the ASAN output, the following block is missing once the fix has been applied. When looking at the valgrind output, the related terms are: "pycbc_set_kv_ull", "pycbc_dict_add_text_kv_strn" and "pycbc_convert_kv_error_context" and those will not be found in the valgrind output when the fix is applied. Direct leak of 575424 byte(s) in 999 object(s) allocated from: #0 0x10f4047b0 in wrap_malloc+0xa0 (libclang_rt.asan_osx_dynamic.dylib:x86_64+0x447b0) #1 0x10f0d77d9 in _PyObject_Malloc+0x39 (python:x86_64+0x1000627d9) #2 0x10f0c2b3f in new_keys_object+0x9f (python:x86_64+0x10004db3f) #3 0x10f0c3d22 in dictresize+0x42 (python:x86_64+0x10004ed22) #4 0x10f0c3989 in insertdict+0x2f9 (python:x86_64+0x10004e989) #5 0x1117f3fc0 in pycbc_dict_add_text_kv ext.c:3214 #6 0x1117c9c64 in pycbc_convert_kv_error_context callbacks.c:210 #7 0x1117ca07c in get_operation_err_info callbacks.c:276 #8 0x1117cc1bd in operation_completed_with_err_info callbacks.c:305 #9 0x1117cddfc in value_callback callbacks.c:926 #10 0x11190b5a2 in void invoke_callback<lcb_RESPGET_>(mc_packet_st const*, lcb_st*, lcb_RESPGET_*, lcb_CALLBACK_TYPE) handler.cc:359 #11 0x11190a323 in mcreq_dispatch_response handler.cc:1058 #12 0x11191fb2c in lcb::Server::try_read(lcbio_CTX*, rdb_IOROPE*) mcserver.cc #13 0x11192198d in on_read(lcbio_CTX*, unsigned int) mcserver.cc:628 #14 0x1118e4dde in E_handler(int, short, void*) ctx.cc:278 #15 0x1118cf894 in run_loop plugin-select.c:309 #16 0x11196a7bd in lcb_wait wait.cc:108 #17 0x1118134e8 in pycbc_common_vars_wait oputil.c:205 #18 0x1117fc611 in get_common get.c:474 #19 0x1117fd065 in pycbc_Bucket_get_multi get.c:639 #20 0x10f09831b in _PyMethodDef_RawFastCallDict+0x20b (python:x86_64+0x10002331b) #21 0x10f09e70b in methoddescr_call+0x5b (python:x86_64+0x10002970b) #22 0x10f097cc8 in PyObject_Call+0x88 (python:x86_64+0x100022cc8) #23 0x10f16559c in _PyEval_EvalFrameDefault+0x6e3c (python:x86_64+0x1000f059c) #24 0x10f1694d1 in _PyEval_EvalCodeWithName+0xae1 (python:x86_64+0x1000f44d1) #25 0x10f097658 in _PyFunction_FastCallDict+0x1c8 (python:x86_64+0x100022658) #26 0x10f098b42 in _PyObject_Call_Prepend+0x82 (python:x86_64+0x100023b42) #27 0x10f097cc8 in PyObject_Call+0x88 (python:x86_64+0x100022cc8) #28 0x10f16559c in _PyEval_EvalFrameDefault+0x6e3c (python:x86_64+0x1000f059c) #29 0x10f1694d1 in _PyEval_EvalCodeWithName+0xae1 (python:x86_64+0x1000f44d1) When the Python SDK receives a response from libcouchbase it stores error context in a dict object (this is done in the Python C-API). The memory leak was occurring with the _* multi() operations due to replacing the pointer for error context with the most recent libcouchbase response without decrementing the reference count for the previous error context. The fix was to decrement the reference count of the previous error context. Attachments: asan_get_multi_1000.txt asan_get_multi_1000_fix.txt valgrind_multi_upsert.zip
          lcosmin Cos Min added a comment - - edited

          This doesn't seem fixed in 3.2.5 or there's another issue unrelated to this. Testing with the attached cb_leaks.py (reading batches of keys with get_multi()), i can still observe a steady memory increase.  I'll try to produce a new debug build and get some logs with valgrind

          lcosmin Cos Min added a comment - - edited This doesn't seem fixed in 3.2.5 or there's another issue unrelated to this. Testing with the attached cb_leaks.py (reading batches of keys with get_multi()), i can still observe a steady memory increase.  I'll try to produce a new debug build and get some logs with valgrind
          jared.casey Jared Casey added a comment -

          Hi Cos Min - I think we will want another ticket for other potential leaks. I am quite confident the leak directly related to the multi_*() operations has been addressed. Further valgrind logs would be greatly appreciated.

          jared.casey Jared Casey added a comment - Hi Cos Min - I think we will want another ticket for other potential leaks. I am quite confident the leak directly related to the multi_*() operations has been addressed. Further valgrind logs would be greatly appreciated.
          lcosmin Cos Min added a comment - - edited

          Ok, tried to reproduce the problem with a slightly modified test program ( cb_leaks_2.py ) . The code is straightforward: it generates a batch of keys (they don't have to exist) and reads them repeatedly from couchbase, using get_multi.

          The output from running with valgrind:

          % valgrind --leak-check=full --show-leak-kinds=all --track-origins=yes --xml=yes --xml-file=leak.xml venv/bin/python3-dbg cb_leak.py/home/cluta/venv/lib/python3.8/site-packages/couchbase-3.2.5-py3.8-linux-x86_64.egg/couchbase_core/supportability.py:43: UserWarning:
              'CBCollection.get_multi' is a volatile API call that is still in flux and may likely be changed.    It may also be an inherently private API call that may be exposed, but "YMMV" (your mileage may vary) principles apply.
            warnings.warn(message % "'{}'".format(func_name))
          Memory usage:  262108
          [ Top 5 ]
          /home/cluta/venv/lib/python3.8/site-packages/couchbase-3.2.5-py3.8-linux-x86_64.egg/couchbase/result.py:663: size=10.9 KiB, count=120, average=93 B
          /usr/lib/python3.8/linecache.py:137: size=8637 B, count=87, average=99 B
          /home/cluta/venv/lib/python3.8/site-packages/couchbase-3.2.5-py3.8-linux-x86_64.egg/couchbase/result.py:116: size=5200 B, count=100, average=52 B
          cb_leak.py:18: size=3436 B, count=52, average=66 B
          /home/cluta/venv/lib/python3.8/site-packages/couchbase-3.2.5-py3.8-linux-x86_64.egg/couchbase/result.py:486: size=2832 B, count=51, average=56 B
          --------------------
          Memory usage:  295336
          [ Top 5 ]
          /home/cluta/venv/lib/python3.8/site-packages/couchbase-3.2.5-py3.8-linux-x86_64.egg/couchbase/result.py:663: size=2003 KiB, count=40121, average=51 B
          /usr/lib/python3.8/linecache.py:137: size=8637 B, count=87, average=99 B
          /home/cluta/venv/lib/python3.8/site-packages/couchbase-3.2.5-py3.8-linux-x86_64.egg/couchbase/result.py:116: size=5200 B, count=100, average=52 B
          cb_leak.py:18: size=3436 B, count=52, average=66 B
          /home/cluta/venv/lib/python3.8/site-packages/couchbase-3.2.5-py3.8-linux-x86_64.egg/couchbase/result.py:486: size=2832 B, count=51, average=56 B
          --------------------
          Memory usage:  322452
          [ Top 5 ]
          /home/cluta/venv/lib/python3.8/site-packages/couchbase-3.2.5-py3.8-linux-x86_64.egg/couchbase/result.py:663: size=3995 KiB, count=80121, average=51 B
          /usr/lib/python3.8/linecache.py:137: size=8637 B, count=87, average=99 B
          /home/cluta/venv/lib/python3.8/site-packages/couchbase-3.2.5-py3.8-linux-x86_64.egg/couchbase/result.py:116: size=5200 B, count=100, average=52 B
          /usr/lib/python3.8/tracemalloc.py:479: size=3680 B, count=59, average=62 B
          cb_leak.py:18: size=3436 B, count=52, average=66 B
          -------------------- 

           

          It seems that couchbase-3.2.5-py3.8-linux-x86_64.egg/couchbase/result.py:663  keeps growing with each iteration.

           

          Also attached the cb-leaks-2.xml.gz with the valgrind output.

          These are the relevant versions on the test machine (ubuntu 20.04):

          % dpkg -l | grep couchbase
          ii  libcouchbase-dev:amd64             3.2.4-1                            amd64        library for the Couchbase protocol, development files
          ii  libcouchbase3:amd64                3.2.4-1                            amd64        library for the Couchbase protocol, core files
          ii  libcouchbase3-libevent:amd64       3.2.4-1                            amd64        library for the Couchbase protocol (libevent backend)
          ii  python3-couchbase                  3.2.5                              amd64        Python Client for Couchbase 

           

          Please let me know if this is related to the current issue or if a new one needs to be created.

          lcosmin Cos Min added a comment - - edited Ok, tried to reproduce the problem with a slightly modified test program ( cb_leaks_2.py ) . The code is straightforward: it generates a batch of keys (they don't have to exist) and reads them repeatedly from couchbase, using get_multi . The output from running with valgrind: % valgrind --leak-check=full --show-leak-kinds=all --track-origins=yes --xml=yes --xml-file=leak.xml venv/bin/python3-dbg cb_leak.py/home/cluta/venv/lib/python3.8/site-packages/couchbase-3.2.5-py3.8-linux-x86_64.egg/couchbase_core/supportability.py:43: UserWarning:     'CBCollection.get_multi' is a volatile API call that is still in flux and may likely be changed.    It may also be an inherently private API call that may be exposed, but "YMMV" (your mileage may vary) principles apply.   warnings.warn(message % "'{}'".format(func_name)) Memory usage:  262108 [ Top 5 ] /home/cluta/venv/lib/python3.8/site-packages/couchbase-3.2.5-py3.8-linux-x86_64.egg/couchbase/result.py:663: size=10.9 KiB, count=120, average=93 B /usr/lib/python3.8/linecache.py:137: size=8637 B, count=87, average=99 B /home/cluta/venv/lib/python3.8/site-packages/couchbase-3.2.5-py3.8-linux-x86_64.egg/couchbase/result.py:116: size=5200 B, count=100, average=52 B cb_leak.py:18: size=3436 B, count=52, average=66 B /home/cluta/venv/lib/python3.8/site-packages/couchbase-3.2.5-py3.8-linux-x86_64.egg/couchbase/result.py:486: size=2832 B, count=51, average=56 B -------------------- Memory usage:  295336 [ Top 5 ] /home/cluta/venv/lib/python3.8/site-packages/couchbase-3.2.5-py3.8-linux-x86_64.egg/couchbase/result.py:663: size=2003 KiB, count=40121, average=51 B /usr/lib/python3.8/linecache.py:137: size=8637 B, count=87, average=99 B /home/cluta/venv/lib/python3.8/site-packages/couchbase-3.2.5-py3.8-linux-x86_64.egg/couchbase/result.py:116: size=5200 B, count=100, average=52 B cb_leak.py:18: size=3436 B, count=52, average=66 B /home/cluta/venv/lib/python3.8/site-packages/couchbase-3.2.5-py3.8-linux-x86_64.egg/couchbase/result.py:486: size=2832 B, count=51, average=56 B -------------------- Memory usage:  322452 [ Top 5 ] /home/cluta/venv/lib/python3.8/site-packages/couchbase-3.2.5-py3.8-linux-x86_64.egg/couchbase/result.py:663: size=3995 KiB, count=80121, average=51 B /usr/lib/python3.8/linecache.py:137: size=8637 B, count=87, average=99 B /home/cluta/venv/lib/python3.8/site-packages/couchbase-3.2.5-py3.8-linux-x86_64.egg/couchbase/result.py:116: size=5200 B, count=100, average=52 B /usr/lib/python3.8/tracemalloc.py:479: size=3680 B, count=59, average=62 B cb_leak.py:18: size=3436 B, count=52, average=66 B --------------------   It seems that couchbase-3.2.5-py3.8-linux-x86_64.egg/couchbase/result.py:663   keeps growing with each iteration.   Also attached the cb-leaks-2.xml.gz with the valgrind output. These are the relevant versions on the test machine (ubuntu 20.04): % dpkg -l | grep couchbase ii  libcouchbase-dev:amd64             3.2.4-1                            amd64        library for the Couchbase protocol, development files ii  libcouchbase3:amd64                3.2.4-1                            amd64        library for the Couchbase protocol, core files ii  libcouchbase3-libevent:amd64       3.2.4-1                            amd64        library for the Couchbase protocol (libevent backend) ii  python3-couchbase                  3.2.5                              amd64        Python Client for Couchbase   Please let me know if this is related to the current issue or if a new one needs to be created.

          People

            jared.casey Jared Casey
            andor_toth Andor Tóth
            Votes:
            1 Vote for this issue
            Watchers:
            6 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Gerrit Reviews

                There are no open Gerrit changes

                PagerDuty