Uploaded image for project: 'Couchbase Ruby client library'
  1. Couchbase Ruby client library
  2. RCBC-111

SEGV during set operation when using/running ruby client multi-threaded

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.2.1
    • Fix Version/s: 1.3.0
    • Component/s: library
    • Security Level: Public
    • Labels:
      None
    • Environment:

      Description

      I have a ruby test program that creates a number of threads, and per thread takes a batch of documents, and stores/inserts them into couchbase. Each thread will be executing ruby code much like:

      1. Insert each record (Hash) into Couchbase
        records.each { |record| key = record[:id].to_s cb.set(key, record) if !key.nil? }

      The program starts, and subsequently SEGVs and yields a stack trace, for example:

      [New Thread 0x40fbf940 (LWP 13650)]
      [New Thread 0x41a33940 (LWP 13651)]
      [New Thread 0x40e58940 (LWP 13652)]
      [New Thread 0x400d5940 (LWP 13653)]
      [New Thread 0x40ed9940 (LWP 13654)]
      [New Thread 0x4020c940 (LWP 13655)]
      [New Thread 0x40977940 (LWP 13656)]
      [New Thread 0x40440940 (LWP 13657)]
      [New Thread 0x41b0c940 (LWP 13658)]
      [New Thread 0x40764940 (LWP 13659)]
      [New Thread 0x412f0940 (LWP 13660)]
      [New Thread 0x41422940 (LWP 13661)]
      [New Thread 0x40156940 (LWP 13662)]
      [New Thread 0x41cd4940 (LWP 13663)]
      [New Thread 0x41065940 (LWP 13664)]
      [New Thread 0x4202a940 (LWP 13665)]
      [New Thread 0x40c67940 (LWP 13666)]
      [New Thread 0x41dc2940 (LWP 13667)]
      [New Thread 0x41734940 (LWP 13668)]
      [New Thread 0x407e5940 (LWP 13669)]
      [New Thread 0x41b8d940 (LWP 13670)]
      [New Thread 0x417b5940 (LWP 13671)]
      [New Thread 0x41c0e940 (LWP 13672)]
      [New Thread 0x406b1940 (LWP 13673)]
      [New Thread 0x41e43940 (LWP 13674)]
      [New Thread 0x41f4c940 (LWP 13675)]
      [New Thread 0x40a5d940 (LWP 13676)]
      [New Thread 0x40866940 (LWP 13677)]
      [New Thread 0x40ade940 (LWP 13678)]
      [New Thread 0x40b5f940 (LWP 13679)]
      [New Thread 0x408e7940 (LWP 13680)]
      [New Thread 0x41ec4940 (LWP 13681)]
      [New Thread 0x40622940 (LWP 13682)]
      [New Thread 0x4164d940 (LWP 13683)]
      Fetching 1024 authors starting at offset 0
      [New Thread 0x42a2b940 (LWP 13685)]
      [Thread 0x42a2b940 (LWP 13685) exited]
      Fetching 1024 authors starting at offset 1024
      Fetching 1024 authors starting at offset 2048
      Fetching 1024 authors starting at offset 3072
      Fetching 1024 authors starting at offset 4096

      Program received signal SIGSEGV, Segmentation fault.
      [Switching to Thread 0x4020c940 (LWP 13655)]
      0x00002aaaac509a1d in cb_strip_key_prefix (bucket=0x2aaab4414300, key=46912654151960) at utils.c:471
      471 rb_str_update(key, 0, RSTRING_LEN(bucket->key_prefix_val), cb_vStrEmpty);
      (gdb) where
      #0 0x00002aaaac509a1d in cb_strip_key_prefix (bucket=0x2aaab4414300, key=46912654151960) at utils.c:471
      #1 0x00002aaaac50bd64 in cb_storage_callback (handle=<optimized out>, cookie=0x2aaab4415cf0, operation=LCB_SET, error=LCB_SUCCESS, resp=0x402099b0) at store.c:52
      #2 0x00002aaaac72dfd4 in ?? () from /usr/lib64/libcouchbase.so.2
      #3 0x00002aaaac72bb1a in ?? () from /usr/lib64/libcouchbase.so.2
      #4 0x00002aaaac504c5c in callbacks_run (callbacks=<optimized out>) at multithread_plugin.c:600
      #5 loop_run_select (argp=<optimized out>) at multithread_plugin.c:780
      #6 0x00002b0ebc3e8657 in rb_ensure (b_proc=0x2aaaac504aa0 <loop_run_select>, data1=46912653493520, e_proc=0x2aaaac504f30 <loop_select_cleanup>, data2=46912654152000) at eval.c:743
      #7 0x00002aaaac5039fc in loop_run (loop=<optimized out>) at multithread_plugin.c:1056
      #8 lcb_io_run_event_loop (iops=<optimized out>) at multithread_plugin.c:1177
      #9 0x00002aaaac736aaa in lcb_wait () from /usr/lib64/libcouchbase.so.2
      #10 0x00002aaaac50c275 in cb_bucket_store (self=<optimized out>, argv=<optimized out>, argc=<optimized out>, cmd=<optimized out>) at store.c:162
      #11 cb_bucket_set (argc=<optimized out>, argv=<optimized out>, self=<optimized out>) at store.c:270
      #12 0x00002b0ebc4ed2a8 in vm_call_cfunc (me=<optimized out>, blockptr=<optimized out>, recv=<optimized out>, num=<optimized out>, reg_cfp=<optimized out>, th=<optimized out>)
      at vm_insnhelper.c:402
      #13 vm_call_method (th=0x23a2dc0, cfp=0x2aaaad359a38, num=2, blockptr=0x1, flag=0, id=21448, me=0x20c7ae0, recv=33100200) at vm_insnhelper.c:528
      #14 0x00002b0ebc4f00f3 in vm_exec_core (th=0x23a2dc0, initial=<optimized out>) at insns.def:1006
      #15 0x00002b0ebc4f5889 in vm_exec (th=0x2aaab4414300) at vm.c:1147
      #16 0x00002b0ebc4f9e19 in invoke_block_from_c (th=0x23a2dc0, block=0x329afe0, self=30791040, argc=1, argv=0x0, blockptr=0x0, cref=0x0) at vm.c:558
      #17 0x00002b0ebc4fa2a7 in vm_yield (argv=<optimized out>, argc=<optimized out>, th=<optimized out>) at vm.c:588
      #18 rb_yield_0 (argv=<optimized out>, argc=<optimized out>) at vm_eval.c:740
      #19 rb_yield (val=42406680) at vm_eval.c:750
      #20 0x00002b0ebc3b9c9c in rb_ary_each (ary=42907960) at array.c:1427
      #21 0x00002b0ebc4ed2a8 in vm_call_cfunc (me=<optimized out>, blockptr=<optimized out>, recv=<optimized out>, num=<optimized out>, reg_cfp=<optimized out>, th=<optimized out>)
      at vm_insnhelper.c:402
      #22 vm_call_method (th=0x23a2dc0, cfp=0x2aaaad359b40, num=0, blockptr=0x329afe1, flag=270, id=424, me=0x1ac7760, recv=42907960) at vm_insnhelper.c:528
      #23 0x00002b0ebc4f00f3 in vm_exec_core (th=0x23a2dc0, initial=<optimized out>) at insns.def:1006
      #24 0x00002b0ebc4f5889 in vm_exec (th=0x2aaab4414300) at vm.c:1147
      #25 0x00002b0ebc4f9e19 in invoke_block_from_c (th=0x23a2dc0, block=0x24947d0, self=27728360, argc=1, argv=0x2aaaad25a0d0, blockptr=0x0, cref=0x0) at vm.c:558
      #26 0x00002b0ebc4fa1ef in rb_vm_invoke_proc (th=0x23a2dc0, proc=0x24947d0, self=27728360, argc=1, argv=0x2aaaad25a0d0, blockptr=0x0) at vm.c:604
      #27 0x00002b0ebc3f01bd in proc_call (argc=1, argv=0x2aaaad25a0d0, procval=<optimized out>) at proc.c:556
      #28 0x00002b0ebc4ed2a8 in vm_call_cfunc (me=<optimized out>, blockptr=<optimized out>, recv=<optimized out>, num=<optimized out>, reg_cfp=<optimized out>, th=<optimized out>)
      at vm_insnhelper.c:402
      #29 vm_call_method (th=0x23a2dc0, cfp=0x2aaaad359cf8, num=1, blockptr=0x1, flag=2, id=5912, me=0x1b0a800, recv=33098120) at vm_insnhelper.c:528
      #30 0x00002b0ebc4f00f3 in vm_exec_core (th=0x23a2dc0, initial=<optimized out>) at insns.def:1006
      #31 0x00002b0ebc4f5889 in vm_exec (th=0x2aaab4414300) at vm.c:1147
      #32 0x00002b0ebc4f9e19 in invoke_block_from_c (th=0x23a2dc0, block=0x2aaaad359e28, self=33097800, argc=0, argv=0x0, blockptr=0x0, cref=0x0) at vm.c:558
      #33 0x00002b0ebc4fa3f8 in vm_yield (argv=<optimized out>, argc=<optimized out>, th=<optimized out>) at vm.c:588
      #34 rb_yield_0 (argv=<optimized out>, argc=<optimized out>) at vm_eval.c:740
      #35 loop_i () at vm_eval.c:798
      #36 0x00002b0ebc3e8a4c in rb_rescue2 (b_proc=0x2b0ebc4fa3c0 <loop_i>, data1=0, r_proc=0x0, data2=0) at eval.c:646
      #37 0x00002b0ebc4e8d2e in rb_f_loop (self=33097800) at vm_eval.c:826
      #38 0x00002b0ebc4ed2a8 in vm_call_cfunc (me=<optimized out>, blockptr=<optimized out>, recv=<optimized out>, num=<optimized out>, reg_cfp=<optimized out>, th=<optimized out>)
      at vm_insnhelper.c:402
      #39 vm_call_method (th=0x23a2dc0, cfp=0x2aaaad359e00, num=0, blockptr=0x2aaaad359e29, flag=8, id=2856, me=0x1a9a1a0, recv=33097800) at vm_insnhelper.c:528
      #40 0x00002b0ebc4f00f3 in vm_exec_core (th=0x23a2dc0, initial=<optimized out>) at insns.def:1006
      #41 0x00002b0ebc4f5889 in vm_exec (th=0x2aaab4414300) at vm.c:1147
      #42 0x00002b0ebc4f9e19 in invoke_block_from_c (th=0x23a2dc0, block=0x2aaaad359f30, self=33097800, argc=1, argv=0x0, blockptr=0x0, cref=0x0) at vm.c:558
      #43 0x00002b0ebc4fa341 in rb_yield_0 (argv=<optimized out>, argc=<optimized out>) at vm.c:588
      #44 catch_i (tag=2082830, data=<optimized out>) at vm_eval.c:1459
      #45 0x00002b0ebc4e7943 in rb_catch_obj (tag=2082830, func=0x2b0ebc4fa300 <catch_i>, data=0) at vm_eval.c:1534
      #46 0x00002b0ebc4e8bdd in rb_f_catch (argc=<optimized out>, argv=<optimized out>) at vm_eval.c:1510
      #47 0x00002b0ebc4ed2a8 in vm_call_cfunc (me=<optimized out>, blockptr=<optimized out>, recv=<optimized out>, num=<optimized out>, reg_cfp=<optimized out>, th=<optimized out>)
      at vm_insnhelper.c:402
      #48 vm_call_method (th=0x23a2dc0, cfp=0x2aaaad359f08, num=1, blockptr=0x2aaaad359f31, flag=8, id=2840, me=0x1a99e20, recv=33097800) at vm_insnhelper.c:528
      #49 0x00002b0ebc4f00f3 in vm_exec_core (th=0x23a2dc0, initial=<optimized out>) at insns.def:1006
      #50 0x00002b0ebc4f5889 in vm_exec (th=0x2aaab4414300) at vm.c:1147
      #51 0x00002b0ebc4f9e19 in invoke_block_from_c (th=0x23a2dc0, block=0x23f5230, self=33097800, argc=0, argv=0x0, blockptr=0x0, cref=0x0) at vm.c:558
      #52 0x00002b0ebc4fa1ef in rb_vm_invoke_proc (th=0x23a2dc0, proc=0x23f5230, self=33097800, argc=0, argv=0x224d2d0, blockptr=0x0) at vm.c:604
      #53 0x00002b0ebc501bd2 in thread_start_func_2 (th=0x23a2dc0, stack_start=0x4020c128) at thread.c:450
      #54 0x00002b0ebc501cef in thread_start_func_1 (th_ptr=<optimized out>) at thread_pthread.c:394
      #55 0x0000003cb560677d in start_thread () from /lib64/libpthread.so.0
      #56 0x0000003cb46d3c1d in clone () from /lib64/libc.so.6
      #57 0x0000000000000000 in ?? ()

      Note that this behavior can be recreated consistently.

        Issue Links

        No reviews matched the request. Check your Options in the drop-down menu of this sections header.

          Activity

          charlez Charles Leu created issue -
          Hide
          charlez Charles Leu added a comment -

          Notes and Questions:
          1) Running my test single threaded does not recreate this problem.
          2) Running my test multi threaded yields this problem reliably (100% of the time so far).
          3) To date I've only used/tested Couchbase via the Ruby client library.
          4) How many other developers use the Ruby client library, and of those that use it, are their apps multi-threaded?
          5) I noted that the tests for the Ruby client library didn't appear to have any that are multi-threaded.
          6) Do any of your other client tests using threading, and attempt to perform set operations simultaneously?
          7) Perhaps the 'Component" for this issue be changed to Ruby Client Library; please do so if you believe that is more correct.

          Show
          charlez Charles Leu added a comment - Notes and Questions: 1) Running my test single threaded does not recreate this problem. 2) Running my test multi threaded yields this problem reliably (100% of the time so far). 3) To date I've only used/tested Couchbase via the Ruby client library. 4) How many other developers use the Ruby client library, and of those that use it, are their apps multi-threaded? 5) I noted that the tests for the Ruby client library didn't appear to have any that are multi-threaded. 6) Do any of your other client tests using threading, and attempt to perform set operations simultaneously? 7) Perhaps the 'Component" for this issue be changed to Ruby Client Library; please do so if you believe that is more correct.
          Hide
          ingenthr Matt Ingenthron added a comment -

          Valid questions Charles. Let me pass this to our Ruby client lead and have him look it over. We can move it to the RCBC project if it's in fact different.

          Show
          ingenthr Matt Ingenthron added a comment - Valid questions Charles. Let me pass this to our Ruby client lead and have him look it over. We can move it to the RCBC project if it's in fact different.
          Hide
          ingenthr Matt Ingenthron added a comment -

          Sergey: Can you look at this to see if it should be moved to RCBC?

          Show
          ingenthr Matt Ingenthron added a comment - Sergey: Can you look at this to see if it should be moved to RCBC?
          ingenthr Matt Ingenthron made changes -
          Field Original Value New Value
          Assignee Matt Ingenthron [ ingenthr ] Sergey Avseyev [ avsej ]
          ingenthr Matt Ingenthron made changes -
          Project Couchbase Server [ 10010 ] Couchbase Ruby client library [ 10048 ]
          Key MB-7544 RCBC-111
          Affects Version/s 1.2.1 [ 10450 ]
          Affects Version/s 2.0 [ 10114 ]
          Sprint Status Current Sprint
          Component/s library [ 10130 ]
          Component/s clients [ 10042 ]
          ingenthr Matt Ingenthron made changes -
          Link This issue is duplicated by RCBC-110 [ RCBC-110 ]
          Hide
          charlez Charles Leu added a comment -

          I've learned that this particular issue is/was due to the sharing of the Couchbase connection amongst threads. When the test app uses thread-local storage to house/use a connection per thread, things work without ado.

          Perhaps you want to put logic in place (e.g. a monitor per connection) to ensure connection sharing doesn't lead to a SEGV or other problem (e.g. deadlock). I guess the client could also implement such logic, however using thread-local storage with multiple connections is clearly a better option.

          Note that I'm fine with things as they are. However if the client lib had the aforementioned logic in place, it would have saved us some time (i.e. on both my end and yours).

          Thank you for your consideration and support.

          Show
          charlez Charles Leu added a comment - I've learned that this particular issue is/was due to the sharing of the Couchbase connection amongst threads. When the test app uses thread-local storage to house/use a connection per thread, things work without ado. Perhaps you want to put logic in place (e.g. a monitor per connection) to ensure connection sharing doesn't lead to a SEGV or other problem (e.g. deadlock). I guess the client could also implement such logic, however using thread-local storage with multiple connections is clearly a better option. Note that I'm fine with things as they are. However if the client lib had the aforementioned logic in place, it would have saved us some time (i.e. on both my end and yours). Thank you for your consideration and support.
          Hide
          avsej Sergey Avseyev added a comment -

          Hi, Charles.

          As you pointed out above, the libcouchbase handle itself isn't supposed to be shared between threads, and currently the caller should protect the connection instance either by copying it with Bucket#dup for each thread, which will lead to establishing new connection, or implementing thread-safe wrapper which will make sure that the handle won't be accessed from several threads.

          Show
          avsej Sergey Avseyev added a comment - Hi, Charles. As you pointed out above, the libcouchbase handle itself isn't supposed to be shared between threads, and currently the caller should protect the connection instance either by copying it with Bucket#dup for each thread, which will lead to establishing new connection, or implementing thread-safe wrapper which will make sure that the handle won't be accessed from several threads.
          Hide
          mje113 Mike Evans added a comment -

          Interesting, just ran into this segfault in the Ruby client 1.2.2. I also have a multi-threaded app where each thread needs to do get and set operations against CB. I get that we shouldn't be sharing the handle between threads, and the Thread.current storage in 1.2.2 does fix this for connections initiated from the Couchbase module.

          The packaged CouchbaseStore ActiveSupport::Cache implementation however is not threadsafe in this regard--it creates one connection to the bucket and shares that across any threads, potentially resulting in the segfault.

          The fix is rudimentary, so look for a pull request shortly.

          Show
          mje113 Mike Evans added a comment - Interesting, just ran into this segfault in the Ruby client 1.2.2. I also have a multi-threaded app where each thread needs to do get and set operations against CB. I get that we shouldn't be sharing the handle between threads, and the Thread.current storage in 1.2.2 does fix this for connections initiated from the Couchbase module. The packaged CouchbaseStore ActiveSupport::Cache implementation however is not threadsafe in this regard--it creates one connection to the bucket and shares that across any threads, potentially resulting in the segfault. The fix is rudimentary, so look for a pull request shortly.
          Hide
          avsej Sergey Avseyev added a comment -

          Thanks, will be glad to review

          Show
          avsej Sergey Avseyev added a comment - Thanks, will be glad to review
          Hide
          mje113 Mike Evans added a comment -

          Just pushed:
          http://review.couchbase.org/#/c/25294/

          I'm wondering if it wouldn't be helpful in some circumstances to have the option of an imposed mutex/monitor instead of a thread local variable in the Couchbase module itself. Thinking about my own app-dozens of Couchbase Models, each with their own Bucket connection-then potentially dozens of threads. Seems like that would be spinning up a lot of connections to the server per-process.

          Show
          mje113 Mike Evans added a comment - Just pushed: http://review.couchbase.org/#/c/25294/ I'm wondering if it wouldn't be helpful in some circumstances to have the option of an imposed mutex/monitor instead of a thread local variable in the Couchbase module itself. Thinking about my own app- dozens of Couchbase Models, each with their own Bucket connection -then potentially dozens of threads. Seems like that would be spinning up a lot of connections to the server per-process.
          Hide
          avsej Sergey Avseyev added a comment -

          Yeah, I'm planning to implement pooling there, to allow to re-use and share connections if it is possible, but once you will store docs in different buckets, it will require separate connection per bucket

          Show
          avsej Sergey Avseyev added a comment - Yeah, I'm planning to implement pooling there, to allow to re-use and share connections if it is possible, but once you will store docs in different buckets, it will require separate connection per bucket
          Hide
          mje113 Mike Evans added a comment -

          Pushed a Connection Pool branch for review:

          http://review.couchbase.org/#/c/26028/
          which relies on:
          https://github.com/mperham/connection_pool

          It's mainly targeted at ActiveSupport::Cache::CouchbaseStore, however could be used more generically for any Couchbase bucket connections:

          @pool = Couchbase::ConnectionPool.new # => default connection args and 5 connections in the pool
          @pool.set('foo', 'bar') # => checks our a connection to run set on--will block until a connection is available
          @pool.get('foo') # => 'bar'

          I think I implemented all operations but might have missed some.

          Show
          mje113 Mike Evans added a comment - Pushed a Connection Pool branch for review: http://review.couchbase.org/#/c/26028/ which relies on: https://github.com/mperham/connection_pool It's mainly targeted at ActiveSupport::Cache::CouchbaseStore, however could be used more generically for any Couchbase bucket connections: @pool = Couchbase::ConnectionPool.new # => default connection args and 5 connections in the pool @pool.set('foo', 'bar') # => checks our a connection to run set on--will block until a connection is available @pool.get('foo') # => 'bar' I think I implemented all operations but might have missed some.
          Show
          avsej Sergey Avseyev added a comment - Merged to upstream https://github.com/couchbase/couchbase-ruby-client/commit/5454381ca3aa7ee8a02e41e0d8ff87e8408504a2
          avsej Sergey Avseyev made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Fix Version/s 1.3.0 [ 10375 ]
          Resolution Fixed [ 1 ]
          avsej Sergey Avseyev made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          brett19 Brett Lawson made changes -
          Workflow jira [ 23667 ] Couchbase SDK Workflow [ 44894 ]

            People

            • Assignee:
              avsej Sergey Avseyev
              Reporter:
              charlez Charles Leu
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Gerrit Reviews

                There are no open Gerrit changes