  1. Couchbase Python Client Library
  2. PYCBC-834

Intermittent segfault in collection.exists


Details

    • Bug
    • Status: Resolved
    • Critical
    • Resolution: Fixed
    • 3.0.0-beta.3
    • 3.0.1
    • None
    • None
    • 1
    • SDK17: FLE/DOC/ServerlessPF

    Description

      I have hit this twice now while looking at another issue. I haven't explored it much yet, but here is the top of the stack trace.

      System Integrity Protection: enabled

      Crashed Thread: 0 Dispatch queue: com.apple.main-thread

      Exception Type: EXC_CRASH (SIGABRT)
      Exception Codes: 0x0000000000000000, 0x0000000000000000
      Exception Note: EXC_CORPSE_NOTIFY

      Application Specific Information:
      abort() called

      Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
      0 libsystem_kernel.dylib 0x00007fff71c3849a __pthread_kill + 10
      1 libsystem_pthread.dylib 0x00007fff71cf56cb pthread_kill + 384
      2 libsystem_c.dylib 0x00007fff71bc0a1c abort + 120
      3 _libcouchbase.cpython-37m-darwin.so 0x0000000106a8a2aa pycbc_handle_assert.cold.1 + 42
      4 _libcouchbase.cpython-37m-darwin.so 0x0000000106a770fd pycbc_handle_assert + 125
      5 _libcouchbase.cpython-37m-darwin.so 0x0000000106a6db1f operation_completed_with_err_info + 191 (callbacks.c:144)
      6 _libcouchbase.cpython-37m-darwin.so 0x0000000106a6dfe7 value_callback + 183 (callbacks.c:794)
      7 libcouchbase.2.dylib 0x0000000106b26a87 rget_callback(mc_pipeline_st*, mc_packet_st*, lcb_STATUS, void const*) + 87 (get.cc:550)
      8 libcouchbase.2.dylib 0x0000000106afe1e8 mcreq_dispatch_response + 11016 (handler.cc:500)
      9 libcouchbase.2.dylib 0x0000000106b37c40 lcb::Server::purge_single(mc_packet_st*, lcb_STATUS) + 2512 (mcserver.cc:774)
      10 libcouchbase.2.dylib 0x0000000106ac9456 mcreq_pipeline_fail + 182 (mcreq.c:916)
      11 libcouchbase.2.dylib 0x0000000106b36168 lcb::Server::socket_failed(lcb_STATUS) + 56 (mcserver.cc:789)
      12 libcouchbase.2.dylib 0x0000000106b3d21b timeout_handler(void*) + 59 (negotiate.cc:107)
      13 libcouchbase.2.dylib 0x0000000106ad0c93 timer_callback + 595 (timer.c:45)
      14 libcouchbase.2.dylib 0x0000000106ab8c7f run_loop + 719 (plugin-select.c:156)
      15 libcouchbase.2.dylib 0x0000000106b4385d lcb_wait + 141 (wait.cc:110)
      16 _libcouchbase.cpython-37m-darwin.so 0x0000000106a84c92 pycbc_common_vars_wait + 194 (oputil.c:739)
      17 _libcouchbase.cpython-37m-darwin.so 0x0000000106a7e5a1 get_common + 1633
      18 _libcouchbase.cpython-37m-darwin.so 0x0000000106a7eb76 pycbc_Bucket_rgetall + 86 (get.c:580)
      19 org.python.python 0x000000010595e451 _PyMethodDef_RawFastCallDict + 516


        Activity

          david.kelly David Kelly added a comment - - edited

           

          * thread #2, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
           frame #0: 0x0000000101e9dba7 libcouchbase.2.dylib`::lcbtrace_span_add_tag_uint64(lcbtrace_SPAN *, const char *, uint64_t) [inlined] sllist_append(list=0x00000001005623b8, item=0x0000000100562fe0) at sllist-inl.h:148:26
           frame #1: 0x0000000101e9db9e libcouchbase.2.dylib`::lcbtrace_span_add_tag_uint64(lcbtrace_SPAN *, const char *, uint64_t) [inlined] lcb::trace::Span::add_tag(this=0x0000000100562370, name="couchbase.context_info", copy=1, value=4300613872) at span.cc:409
           * frame #2: 0x0000000101e9db6e libcouchbase.2.dylib`::lcbtrace_span_add_tag_uint64(span=0x0000000100562370, name="couchbase.context_info", value=4300613872) at span.cc:99
           frame #3: 0x0000000101dd0da4 _libcouchbase.cpython-37m-darwin.so`pycbc_propagate_context_info(span=0x0000000100562840, dest=0x0000000100562370) at ext.c:2581:5 [opt]
           frame #4: 0x0000000101dd10e3 _libcouchbase.cpython-37m-darwin.so`pycbc_span_report(tracer=<unavailable>, span=0x0000000100562840) at ext.c:2761:5 [opt]
           frame #5: 0x0000000101e9d826 libcouchbase.2.dylib`::lcbtrace_span_finish(lcbtrace_SPAN *, uint64_t) [inlined] lcb::trace::Span::finish(this=0x0000000100562840, now=<unavailable>) at span.cc:366:9
           frame #6: 0x0000000101e9d801 libcouchbase.2.dylib`::lcbtrace_span_finish(span=0x0000000100562840, now=<unavailable>) at span.cc:71
           frame #7: 0x0000000101e572be libcouchbase.2.dylib`::mcreq_dispatch_response(mc_PIPELINE *, mc_PACKET *, lcb::MemcachedResponse *, lcb_STATUS) at handler.cc:481:5
           frame #8: 0x0000000101e57118 libcouchbase.2.dylib`::mcreq_dispatch_response(pipeline=0x0000000100437cc0, req=<unavailable>, res=0x00007ffeefbfe130, immerr=LCB_SUCCESS) at handler.cc:1201
           frame #9: 0x0000000101e8eaf9 libcouchbase.2.dylib`lcb::Server::try_read(this=0x0000000100437cc0, ctx=<unavailable>, ior=0x0000000100443a58) at mcserver.cc:0
           frame #10: 0x0000000101e902ae libcouchbase.2.dylib`on_read(ctx=0x0000000100443a10, (null)=<unavailable>) at mcserver.cc:579:26
           frame #11: 0x0000000101e28cf9 libcouchbase.2.dylib`E_handler [inlined] invoke_read_cb(ctx=<unavailable>, nb=47) at ctx.c:255:5
           frame #12: 0x0000000101e28cde libcouchbase.2.dylib`E_handler(sock=<unavailable>, which=2, arg=0x0000000100443a10) at ctx.c:282
           frame #13: 0x0000000101e12122 libcouchbase.2.dylib`run_loop(io=0x0000000101814100, is_tick=0) at plugin-select.c:309:17
           frame #14: 0x0000000101e9d6dd libcouchbase.2.dylib`::lcb_wait(instance=0x000000010181d950, flags=<unavailable>) at wait.cc:109:5
           frame #15: 0x0000000101ddb172 _libcouchbase.cpython-37m-darwin.so`pycbc_common_vars_wait [inlined] pycbc_oputil_wait_common(self=<unavailable>, context=<unavailable>) at oputil.c:702:5 [opt]
           frame #16: 0x0000000101ddb132 _libcouchbase.cpython-37m-darwin.so`pycbc_common_vars_wait(cv=0x00007ffeefbfe6c0, self=0x0000000104d5ad70, context=<unavailable>) at oputil.c:205 [opt]
           frame #17: 0x0000000101dd45cb _libcouchbase.cpython-37m-darwin.so`get_common(self=0x0000000104d5ad70, args=<unavailable>, kwargs=0x0000000104daf320, optype=<unavailable>, argopts=1, context=<unavailable>) at get.c:420:15 [opt]
           frame #18: 0x0000000101dd4926 _libcouchbase.cpython-37m-darwin.so`pycbc_Bucket_exists(self=0x0000000104d5ad70, args=<unavailable>, kwargs=0x0000000104daf320) at get.c:575:1 [opt]
           frame #19:

          What is interesting is that this crash is inside lcb, but we have a hand in the data structures. Looking to see what exactly is wrong, I see:

          frame #0: 0x0000000101e9dba7 libcouchbase.2.dylib`::lcbtrace_span_add_tag_uint64(lcbtrace_SPAN *, const char *, uint64_t) [inlined] sllist_append(list=0x00000001005623b8, item=0x0000000100562fe0) at sllist-inl.h:148:26
             145 	        item->next = NULL;
             146 	    } else {
             147 	        slist_sanity_insert(list, item);
          -> 148 	        list->last->next = item;
             149 	        list->last = item;
             150 	    }
             151 	    item->next = NULL;
          (lldb) frame variable list
          (sllist_root *) list = 0x00000001005623b8
          (lldb) frame variable *list
          (sllist_root) *list = {
            first_prev = {
              next = 0x3930623739643431
            }
            last = 0x386535653563322f
          }
          (lldb) frame variable *list->last
          (sllist_node) *list->last = {
            next = <read memory from 0x386535653563322f failed (0 of 8 bytes read)>
           
          }
          

          So the list is bad. Looking at how it gets there (through the bindings) will determine whether it is maybe LCB...
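
One observation that may help narrow this down: both corrupted pointer values in the `frame variable *list` output above are made up entirely of printable ASCII bytes. The following is a minimal sketch, assuming a little-endian x86-64 process (which the EXC_I386_GPFLT stop reason suggests), that decodes them as text. The helper name `pointer_as_ascii` is hypothetical and is not part of lcb or the bindings.

```python
import struct


def pointer_as_ascii(value: int) -> str:
    """Reinterpret a 64-bit pointer value as the 8 bytes it occupies in memory.

    Assumes little-endian layout, so the least significant byte comes first.
    """
    raw = struct.pack("<Q", value)
    return raw.decode("ascii", errors="replace")


# Values copied from the lldb output for the sllist_root.
first_prev_next = 0x3930623739643431
last = 0x386535653563322F

print(pointer_as_ascii(first_prev_next))  # 14d97b09
print(pointer_as_ascii(last))             # /2c5e5e8
```

Both fields decode to hex-digit text rather than plausible heap addresses. That would be consistent with the list root having been overwritten by string data, for example if the span's memory was freed and then reused, but on its own it does not show whether the bindings or LCB is responsible.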

          david.kelly David Kelly added a comment - - edited

          This seems to happen intermittently on 6.0.3, 6.5, and 6.5-DP. Given how we do things, it seems to be an lcb issue; I am making sure of that, then will ping Sergey.

          david.kelly David Kelly added a comment -

          Created a bug on lcb for this – CCBC-1215

           

          I believe all we can do is disable it in tests (done) and add a release note about this for GA.
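
For reference, disabling a test of this kind can be sketched with the standard `unittest.skip` decorator, as below. This is only an illustration under assumptions: the class name `ExistsTests`, the test name `test_exists`, and the helper `run_suite` are hypothetical, and the actual change made in the client's test suite may look different.

```python
import unittest


class ExistsTests(unittest.TestCase):  # hypothetical test class
    @unittest.skip("PYCBC-834: intermittent segfault, tracked in lcb as CCBC-1215")
    def test_exists(self):
        # The body is never executed while the skip decorator is in place.
        raise AssertionError("should not run while skipped")


def run_suite() -> unittest.TestResult:
    """Run the hypothetical test class and return the collected result."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExistsTests)
    result = unittest.TestResult()
    suite.run(result)
    return result


if __name__ == "__main__":
    outcome = run_suite()
    for test, reason in outcome.skipped:
        print(test.id(), "skipped:", reason)
```

Putting the issue keys in the skip reason keeps them visible in the test report, which makes it easier to find and re-enable the test once the underlying fix lands.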

          david.kelly David Kelly added a comment -

          It looks like the fix for CCBC-1215 was just merged; I will enable the tests again.


          People

            david.kelly David Kelly
            david.kelly David Kelly