Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.0.4
    • Component/s: library
    • Security Level: Public
    • Labels:
      None
    • Environment:
      libcouchbase built from git://github.com/couchbase/libcouchbase.git master branch at commit 9cfda9d40a270fc3dd05018eb16a2089c83bf24a

      CentOS 5.6, couchbase server 1.8.0, 2-node cluster

      Description

      When one node is down and dropping packets, a store operation that targets that node times out (expected), but libcouchbase_wait() never returns (the bug).

      [root@localhost cb-crank]# gcc -g -Wall -W -I$I/include -L$I/lib t.c -lcouchbase -o t
      [root@localhost cb-crank]# export LD_LIBRARY_PATH=$I/lib
      [root@localhost cb-crank]# ./t
      Stored: hello (6)
      Got: hellow (7) = world! (7)
      [root@localhost cb-crank]# ./t hellow foo
      Stored: hellow (7)
      Got: hellow (7) = foo (4)
      [root@localhost cb-crank]# ./t xyz
      Got: hellow (7) = foo (4)
      Failed during store 'xyz': Operation timed out

      Setup: Start two nodes of Couchbase Server. In my test, they're on 10.4.2.13 and 10.4.2.14. The client (see attached code) runs on host 10.4.2.14.

      On one node (10.4.2.13), simulate a network problem with iptables:

      iptables -I INPUT 1 -p tcp --sport 1000:60000 -j DROP
      iptables -I INPUT 2 -p tcp --dport 1000:60000 -j DROP

      Run the sample app a few times with different keys until you hit one that needs to access the down node. As seen in the output above, the storage callback is called with a timeout error, but the libcouchbase_wait() call never returns. Attach a debugger to the client and view the thread stacks:

      # gdb -p 21746
        GNU gdb (GDB) CentOS (7.0.1-42.el5.centos)
        Copyright (C) 2009 Free Software Foundation, Inc.
        License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
        This is free software: you are free to change and redistribute it.
        There is NO WARRANTY, to the extent permitted by law. Type "show copying"
        and "show warranty" for details.
        This GDB was configured as "x86_64-redhat-linux-gnu".
        For bug reporting instructions, please see:
        <http://www.gnu.org/software/gdb/bugs/>.
        Attaching to process 21746
        Reading symbols from /root/code/cb-crank/t...done.
        Reading symbols from /root/code/install/lib/libcouchbase.so.1...done.
        Loaded symbols for /root/code/install/lib/libcouchbase.so.1
        Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
        Loaded symbols for /lib64/libc.so.6
        Reading symbols from /root/code/install/lib/libvbucket.so.1...done.
        Loaded symbols for /root/code/install/lib/libvbucket.so.1
        Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done.
        Loaded symbols for /lib64/libdl.so.2
        Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
        Loaded symbols for /lib64/ld-linux-x86-64.so.2
        Reading symbols from /lib64/libm.so.6...(no debugging symbols found)...done.
        Loaded symbols for /lib64/libm.so.6
        Reading symbols from /root/code/install/lib/libcouchbase_libevent.so.1...done.
        Loaded symbols for /root/code/install/lib/libcouchbase_libevent.so.1
        Reading symbols from /opt/couchbase/lib/libevent-2.0.so.5...(no debugging symbols found)...done.
        Loaded symbols for /opt/couchbase/lib/libevent-2.0.so.5
        Reading symbols from /lib64/librt.so.1...(no debugging symbols found)...done.
        Loaded symbols for /lib64/librt.so.1
        Reading symbols from /lib64/libpthread.so.0...(no debugging symbols found)...done.
        [Thread debugging using libthread_db enabled]
        Loaded symbols for /lib64/libpthread.so.0

      warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7fff822c0000
      0x0000003407cd3603 in __epoll_wait_nocancel () from /lib64/libc.so.6
      (gdb) where
      #0 0x0000003407cd3603 in __epoll_wait_nocancel () from /lib64/libc.so.6
      #1 0x00002b0211713c28 in ?? () from /opt/couchbase/lib/libevent-2.0.so.5
      #2 0x00002b0211702a4c in event_base_loop () from /opt/couchbase/lib/libevent-2.0.so.5
      #3 0x0000000000400d60 in main (argc=1, argv=0x7fff822027f0) at t.c:113
      (gdb) thread apply all bt

      Thread 1 (Thread 0x2b02114eeaf0 (LWP 21746)):
      #0 0x0000003407cd3603 in __epoll_wait_nocancel () from /lib64/libc.so.6
      #1 0x00002b0211713c28 in ?? () from /opt/couchbase/lib/libevent-2.0.so.5
      #2 0x00002b0211702a4c in event_base_loop () from /opt/couchbase/lib/libevent-2.0.so.5
      #3 0x0000000000400d60 in main (argc=1, argv=0x7fff822027f0) at t.c:113
      (gdb) quit

      Line t.c:113 is libcouchbase_wait().
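
The symptom above (the callback reports a timeout, yet the wait call keeps spinning in the event loop) can be illustrated with a small stand-alone model. This is only a sketch under a stated assumption: that the hang comes from a timed-out operation being reported to the caller without being retired from the set the wait loop is waiting on. The report does not confirm that cause. All names here (model_op, model_loop, model_expire, model_wait) are hypothetical and are not part of libcouchbase or libevent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a client wait loop; NOT libcouchbase's real internals. */
typedef struct {
    int deadline;    /* tick at which the operation times out */
    bool reported;   /* the user callback has been invoked */
} model_op;

typedef struct {
    int pending;     /* operations the wait loop still counts as outstanding */
    int now;         /* simulated clock, in ticks */
} model_loop;

/* Invoke the timeout "callback" for every expired operation.
 * When retire_on_timeout is false, the callback runs but the operation stays
 * counted as pending. That is the ASSUMED cause of the hang, not a confirmed one. */
static void model_expire(model_loop *loop, model_op *ops, size_t n,
                         bool retire_on_timeout)
{
    for (size_t i = 0; i < n; i++) {
        if (!ops[i].reported && loop->now >= ops[i].deadline) {
            ops[i].reported = true;      /* caller sees "Operation timed out" */
            if (retire_on_timeout) {
                assert(loop->pending > 0);
                loop->pending--;         /* lets the wait loop terminate */
            }
        }
    }
}

/* Spin until nothing is pending. Returns true if the loop ended by itself,
 * false if it was still waiting after max_ticks (the model's stand-in for
 * "never returns"). */
static bool model_wait(model_loop *loop, model_op *ops, size_t n,
                       int max_ticks, bool retire_on_timeout)
{
    while (loop->pending > 0) {
        if (loop->now >= max_ticks) {
            return false;
        }
        loop->now++;
        model_expire(loop, ops, n, retire_on_timeout);
    }
    return true;
}
```

With one pending operation whose deadline is tick 5, calling model_wait with retire_on_timeout set to false reports the timeout but is still waiting when the tick limit is reached, which mirrors the backtrace stuck in event_base_loop. With it set to true, the wait ends at the tick where the timeout fires.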

      Client compiled as:

      gcc -g -Wall -W -I$I/include -L$I/lib t.c -lcouchbase -o t

      Attachments

      1. cbget.c (3 kB, TimSmith)
      2. t.c (8 kB, TimSmith)

          People

          • Assignee:
            avsej Sergey Avseyev
            Reporter:
            TimSmith Tim Smith (Inactive)