Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version: 2.2.0
    • Affects Versions: 2.0.6, 2.1.3
    • Component: library
    • Security Level: Public
    • None

    Description

      When the Couchbase server is stopped while the application is connected and accessing a bucket, the application is terminated by an abort() in libcouchbase.
      This happens with both versions 2.0.6 and 2.1.3 of the client.

      (version 2.0.6)
      (gdb) where
      #0 0x0000003083c328a5 in raise () from /lib64/libc.so.6
      #1 0x0000003083c34085 in abort () from /lib64/libc.so.6
      #2 0x00007ffff7bca48a in lcb_purge_single_server (server=0x7f4a20, error=LCB_NETWORK_ERROR) at src/server.c:322
      #3 0x00007ffff7bca67d in lcb_failout_server (server=0x7f4a20, error=LCB_NETWORK_ERROR) at src/server.c:368
      #4 0x00007ffff7bbd0a2 in do_fill_input_buffer (c=0x7f4a20) at src/event.c:55
      #5 0x00007ffff7bbd9e0 in do_read_data (c=0x7f4a20, allow_read=2) at src/event.c:299
      #6 0x00007ffff7bbdc77 in lcb_server_event_handler (sock=33, which=2, arg=0x7f4a20) at src/event.c:387
      #7 0x00007ffff7de8b44 in event_process_active (base=0x7c72f0, flags=<value optimized out>) at event.c:385
      #8 event_base_loop (base=0x7c72f0, flags=<value optimized out>) at event.c:525
      #9 0x000000000046ed72 in con_thr (arg=0x6f31b0) at src/store_couchbase.c:2519
      #10 0x0000003084407851 in start_thread () from /lib64/libpthread.so.0
      #11 0x0000003083ce890d in clone () from /lib64/libc.so.6
      (gdb) frame 2
      #2 0x00007ffff7bca48a in lcb_purge_single_server (server=0x7f4a20, error=LCB_NETWORK_ERROR) at src/server.c:322
      322 abort();
      (gdb) p req
      $1 = {
        request = { magic = 128 '\200', opcode = 148 '\224', keylen = 2816, extlen = 0 '\000', datatype = 0 '\000', vbucket = 52993, bodylen = 184549376, opaque = 202, cas = 0 },
        bytes = "\200\224\000\v\000\000\001\317\000\000\000\v\312\000\000\000\000\000\000\000\000\000\000"
      }

      Opcode 148 (0x94) corresponds to CMD_GET_LOCKED, which matches one of the operations the application performs.
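
For readers decoding the gdb dump: the memcached binary protocol header is sent in network byte order, so the multi-byte fields gdb prints on a little-endian host (keylen = 2816, vbucket = 52993, bodylen = 184549376) are byte-swapped views of the wire values. The following is a minimal sketch that decodes the dumped bytes by hand. The struct and function names are illustrative assumptions, not libcouchbase API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decoded view of the start of a memcached binary protocol request header.
 * Names are hypothetical and chosen for this example only. */
struct decoded_header {
    uint8_t  magic;
    uint8_t  opcode;
    uint16_t keylen;
    uint8_t  extlen;
    uint8_t  datatype;
    uint16_t vbucket;
    uint32_t bodylen;
};

/* Read big-endian integers byte by byte, independent of host endianness. */
static uint16_t read_be16(const uint8_t *p)
{
    return (uint16_t)(((uint16_t)p[0] << 8) | p[1]);
}

static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Decode the first 12 bytes of a 24-byte request header. */
static struct decoded_header decode_header(const uint8_t *raw)
{
    struct decoded_header h;
    assert(raw != NULL);
    h.magic    = raw[0];
    h.opcode   = raw[1];
    h.keylen   = read_be16(raw + 2);
    h.extlen   = raw[4];
    h.datatype = raw[5];
    h.vbucket  = read_be16(raw + 6);
    h.bodylen  = read_be32(raw + 8);
    return h;
}

/* The bytes from the gdb dump, transcribed from octal escapes to hex. */
static const uint8_t dump_bytes[24] = {
    0x80, 0x94, 0x00, 0x0b, 0x00, 0x00, 0x01, 0xcf,
    0x00, 0x00, 0x00, 0x0b, 0xca, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
};
```

Calling decode_header(dump_bytes) yields opcode 0x94 with a key length of 11, a body length of 11 and vbucket 463, which is consistent with a pending locked get for an 11-byte key.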

      The switch in lcb_purge_single_server() in src/server.c appears to be missing several commands, not only GET_LOCKED, so a network error while any of them is pending falls through to the abort() in the default clause.
      UNLOCK, for example, is also not handled in the switch.

      Adding the missing opcode for a locked get prevents the crash for my application:

      --- src/server.c.orig 2013-09-27 15:26:12.729822014 +0200
      +++ src/server.c 2013-09-27 15:25:50.504008318 +0200
      @@ -174,6 +174,7 @@
               switch (req.request.opcode) {
               case PROTOCOL_BINARY_CMD_NOOP:
                   break;
      +        case CMD_GET_LOCKED:
               case PROTOCOL_BINARY_CMD_GAT:
               case PROTOCOL_BINARY_CMD_GATQ:
               case PROTOCOL_BINARY_CMD_GET:
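
Beyond the one-line patch, the underlying problem is a default clause that aborts on any opcode the switch does not list. Below is a hedged sketch of a more defensive shape, in which unknown opcodes map to a generic failure instead. This is not the actual libcouchbase code or its eventual fix: the enum names and purge_action_for() are hypothetical, and the value 0x95 for the unlock opcode is an assumption (0x94 for the locked get matches the 148 seen in gdb).

```c
#include <stdint.h>

/* Opcode values used by this sketch. GET, NOOP, GAT and GATQ follow the
 * memcached binary protocol; 0x94 matches the opcode seen in the gdb dump;
 * 0x95 for unlock is an assumption. */
enum {
    OP_GET        = 0x00,
    OP_NOOP       = 0x0a,
    OP_GAT        = 0x1d,
    OP_GATQ       = 0x1e,
    OP_GET_LOCKED = 0x94,
    OP_UNLOCK_KEY = 0x95  /* assumed value */
};

/* What to do with a pending request when its server fails (hypothetical). */
enum purge_action {
    PURGE_SKIP,          /* nothing to report to the caller */
    PURGE_FAIL_GET,      /* invoke the get callback with the error */
    PURGE_FAIL_UNLOCK,   /* invoke the unlock callback with the error */
    PURGE_FAIL_GENERIC   /* unknown opcode: report an error, do not abort */
};

static enum purge_action purge_action_for(uint8_t opcode)
{
    switch (opcode) {
    case OP_NOOP:
        return PURGE_SKIP;
    case OP_GET_LOCKED:
    case OP_GAT:
    case OP_GATQ:
    case OP_GET:
        return PURGE_FAIL_GET;
    case OP_UNLOCK_KEY:
        return PURGE_FAIL_UNLOCK;
    default:
        /* A network error should never terminate the host application,
         * so fall back to a generic failure rather than calling abort(). */
        return PURGE_FAIL_GENERIC;
    }
}
```

The design choice here is that an unrecognised opcode during purge is treated as a recoverable condition, so adding a new command without updating the switch degrades error reporting instead of crashing the application.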

          People

            Assignee: avsej Sergey Avseyev
            Reporter: penacho Robert Groenenberg