Uploaded image for project: 'Couchbase C client library libcouchbase'
  1. Couchbase C client library libcouchbase
  2. CCBC-158

Unable to store 1M objects in a loop ( Internal error (0x6), Unknown implicit send message op=1 , eventually crash )

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Cannot Reproduce
    • Affects Version/s: 2.0.1
    • Fix Version/s: None
    • Component/s: library
    • Security Level: Public
    • Labels:
      None
    • Environment:
      2 VMs running Couchbase 2.0 enterprise, with 2 buckets of 256MB each. libcouchbase from git ( latest pull 1/1/2013 ). Running own event loop calling event_base_loop in a separate thread

      Description

      When executing 1M lcb_store calls in a loop, storing items like "user0000001 = <16 hex bytes>" in the default bucket, I get a number of error callbacks saying "Internal error (0x6), Unknown implicit send message op=1" ( opcode 0x1 = PROTOCOL_BINARY_CMD_SET ). Finally, after a last "Internal error (0x6), Unknown implicit send message op=46" the process gets killed
      .
      Unfortunately I cannot tell where the crash happens, as running with gdb results in:
      Couchbase event loop starting/build/buildd/gdb-7.4-2012.04/gdb/infrun.c:3189: internal-error: handle_inferior_event: Assertion `inf' failed.
      A problem internal to GDB has been detected,
      further debugging may prove unreliable.

      After a number of attempts in both debug and release mode, I ended up with almost 350K out of 1M items entered. However, I don't manage to run the loop to its completion; something always breaks before it completes

      No reviews matched the request. Check your Options in the drop-down menu of this sections header.

        Activity

        Hide
        mnunberg Mark Nunberg added a comment - - edited

        Looking at that manpage, the locking needs to be primarily focused on the state of the ringbuffers, meaning that two threads cannot be modifying the ringbuffer at any given time.

        Locking simply before scheduling operations is not sufficient because at the same time, the event loop might have woken up and the lcb_t would have started writing/reading data to/from the socket. A model similar to the one I outlined would work well - but that is pretty much the only way to get true async operations from libcouchbase from multiple threads. A different option would be to lock it whenever calling lcb_wait() but this can result in a deadlock if someone tries to use lcb_store from within a callback itself.

        Perhaps some stuff in this thread (i.e. jira thread) should be broken out into its own discussion about MT-safety and libcouchbase - the specific bits about maintaining ringbuffer/command log integrity are not immediately obvious; neither are its solutions

        Show
        mnunberg Mark Nunberg added a comment - - edited Looking at that manpage, the locking needs to be primarily focused on the state of the ringbuffers, meaning that two threads cannot be modifying the ringbuffer at any given time. Locking simply before scheduling operations is not sufficient because at the same time, the event loop might have woken up and the lcb_t would have started writing/reading data to/from the socket. A model similar to the one I outlined would work well - but that is pretty much the only way to get true async operations from libcouchbase from multiple threads. A different option would be to lock it whenever calling lcb_wait() but this can result in a deadlock if someone tries to use lcb_store from within a callback itself. Perhaps some stuff in this thread (i.e. jira thread) should be broken out into its own discussion about MT-safety and libcouchbase - the specific bits about maintaining ringbuffer/command log integrity are not immediately obvious; neither are its solutions
        Hide
        trond Trond Norbye added a comment -

        Earlier today I pushed through http://review.couchbase.org/#/c/23749/ giving you manual pages for "everything"... if you check out the lcb_attributes (looks better when you see it through man: http://review.couchbase.org/#/c/23749/3/man/svr4/man5/lcb_attributes.5 ) it'll describe how the MT thing fits together..

        Show
        trond Trond Norbye added a comment - Earlier today I pushed through http://review.couchbase.org/#/c/23749/ giving you manual pages for "everything"... if you check out the lcb_attributes (looks better when you see it through man: http://review.couchbase.org/#/c/23749/3/man/svr4/man5/lcb_attributes.5 ) it'll describe how the MT thing fits together..
        Hide
        jbemmel Jeroen van Bemmel added a comment -

        That would be possible, sure. For now I've decided to move away from using CouchBase/libcouchbase though, I'm only following up because I started this issue and to provide feedback to you guys.

        If you want to make it easier for people to use libcouchbase, from my perspective you could
        a) Document a bit better / more extensively what the implications are for the application. Basically the information from this thread would be good to have known upfront
        b) Implement something like you describe above

        I believe it could work well, the challenge is that the learning/adoption curve can be steep

        Show
        jbemmel Jeroen van Bemmel added a comment - That would be possible, sure. For now I've decided to move away from using CouchBase/libcouchbase though, I'm only following up because I started this issue and to provide feedback to you guys. If you want to make it easier for people to use libcouchbase, from my perspective you could a) Document a bit better / more extensively what the implications are for the application. Basically the information from this thread would be good to have known upfront b) Implement something like you describe above I believe it could work well, the challenge is that the learning/adoption curve can be steep
        Hide
        mnunberg Mark Nunberg added a comment -

        I would second the proposition of having a 'libcouchbase thread' that handles command and communicates using some kind of queue. The problem you are having seems to be socket buffer synchronization.

        Specifically whenever you schedule an lcb command (i.e. "lcb_store") it writes a packet to our own socket buffer. Considering libcouchbase is asynchronous, it will flush that buffer at some unknown time to the network (thereby modifying the socket buffer).

        Therefore even if you ensure that you lock your handle with things like pthread_mutex_lock and such, you will still encounter issues unless you can also control when the socket buffer is actually flushed (i.e. when 'lcb_wait' is called).

        As such, you'd probably need a libcouchbase "server" thread.

        One possible implementation is having this thread run an event framework (e.g. libev) which will drive libcouchbase. In addition to the thread running libcouchbase asynchronously, it will also accept command proxies over a queue (i.e. proxied requests for "lcb_store", "lcb_get", etc.).

        You may have this thread then schedule (within its own thread context) the command to libcouchbase. The thread will know about when there are commands available to shcedule by determining if there are items in the queue. This may be done in a non-blocking fashion by potentially constructing a 'socketpair', in which a byte of dummy data is written each time something si added to the queue.

        Subsequently, the "libcouchbase thread" will receive callbacks. You can proxy these callbacks out to your own user-defined callbacks.

        While this sounds complicated, I am guessing the whole thing (assuming you actually have a "Queue" implemented [ which being C++, you have several native structures to fit this model ]) shouldn't take up more than 300 lines of code.

        Show
        mnunberg Mark Nunberg added a comment - I would second the proposition of having a 'libcouchbase thread' that handles command and communicates using some kind of queue. The problem you are having seems to be socket buffer synchronization. Specifically whenever you schedule an lcb command (i.e. "lcb_store") it writes a packet to our own socket buffer. Considering libcouchbase is asynchronous, it will flush that buffer at some unknown time to the network (thereby modifying the socket buffer). Therefore even if you ensure that you lock your handle with things like pthread_mutex_lock and such, you will still encounter issues unless you can also control when the socket buffer is actually flushed (i.e. when 'lcb_wait' is called). As such, you'd probably need a libcouchbase "server" thread. One possible implementation is having this thread run an event framework (e.g. libev) which will drive libcouchbase. In addition to the thread running libcouchbase asynchronously, it will also accept command proxies over a queue (i.e. proxied requests for "lcb_store", "lcb_get", etc.). You may have this thread then schedule (within its own thread context) the command to libcouchbase. The thread will know about when there are commands available to shcedule by determining if there are items in the queue. This may be done in a non-blocking fashion by potentially constructing a 'socketpair', in which a byte of dummy data is written each time something si added to the queue. Subsequently, the "libcouchbase thread" will receive callbacks. You can proxy these callbacks out to your own user-defined callbacks. While this sounds complicated, I am guessing the whole thing (assuming you actually have a "Queue" implemented [ which being C++, you have several native structures to fit this model ]) shouldn't take up more than 300 lines of code.
        Hide
        ingenthr Matt Ingenthron added a comment -

        What is your code using? Something like /dev/poll on Solaris or epoll on linux or kqueue on BSD?

        If so, you can probably continue to use that and then configure the event lib to use the same thing underneath.

        Show
        ingenthr Matt Ingenthron added a comment - What is your code using? Something like /dev/poll on Solaris or epoll on linux or kqueue on BSD? If so, you can probably continue to use that and then configure the event lib to use the same thing underneath.

          People

          • Assignee:
            trond Trond Norbye
            Reporter:
            jbemmel Jeroen van Bemmel
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Gerrit Reviews

              There are no open Gerrit changes