  1. Couchbase C client library libcouchbase
  2. CCBC-147

lcb_wait hangs indefinitely after lcb_connect is called against a Couchbase server that is in the Pending state

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.1
    • Fix Version/s: 2.0.4
    • Component/s: library
    • Security Level: Public
    • Labels:
      None
    • Environment:
      Reproduced on Red Hat Linux 5.0 x64 and on Windows XP x32
      Couchbase server version 1.8.1
      libcouchbase version 2.0.1

      Description

      Start the server with one node.
      Create many buckets so that they use all of the allowed memory. The server then has to change its status to Pending (in my case the server stays in Pending permanently and never returns to Up).

      After that, try to connect to one of the buckets by calling lcb_connect.
      Call lcb_wait to wait for the connection to complete.
      As a result, lcb_wait runs indefinitely and the timeout never fires.

      Below is the call stack on Red Hat Linux:
      #0 0x00000034350d4473 in __epoll_wait_nocancel () from /lib64/libc.so.6
      #1 0x00002ad81b1dccc9 in ?? () from ./lib/libevent-2.0.so.5
      #2 0x00002ad81b1c9cdc in event_base_loop () from ./lib/libevent-2.0.so.5
      #3 0x00002ad8174c1dc6 in lcb_wait () from ./lib/libcouchbase.so.2
      #4 0x000000000043c0e1 in Couchbase::connect (this=0x60df508) at couchbase_loader_source/couchbase.cpp:152
      #5 0x000000000043c8b7 in connect_to_bucket (cbase=..., config=..., bucket_name=...) at couchbase_loader_source/couchbase.cpp:478
      #6 0x0000000000440924 in couchbase_loader::writer_thread::writer_thread (this=0x60df480, config=..., bucket_name=..., dbh=...) at couchbase_loader_source/writer_thread.cpp:25
      #7 0x000000000042168c in main (argc=1, argv=0x7fff22162028) at couchbase_loader_source/couchbase_loader.cpp:175

      1. cloader.rar
        1.97 MB
        Haster
      2. NMakefile_new
        5 kB
        Haster


          Activity

          avsej Sergey Avseyev added a comment -

          Fixed invalid memory access in win32 plugin http://review.couchbase.org/24451

          Haster Haster added a comment - - edited

          Added a test program to reproduce the problem.

          Haster Haster added a comment -

          Sergey, I reproduced this issue on the Windows platform.
          It is more stable now, but when I tried to create 9 buckets at the same time by running 9 copies of my program, 4 of them deadlocked in the lcb_wait function.

          Haster Haster added a comment -

          It looks good now. Thanks a lot

          Haster Haster added a comment -

          My code is very large: one application server creates many processes (couchbase_loader), one process per cache, and those processes work with Couchbase.
          They also work with an Oracle database.
          If you want, I can share all of the code, but you will need Oracle to run it. Or I can try to write a simpler test case.

          I will also check the patch tomorrow and share the results.

          avsej Sergey Avseyev added a comment -

          This patch may have fixed the issue: https://github.com/couchbase/libcouchbase/commit/d4948192439e61a8cc23d5e8572e81db1aebef7f

          Could you verify with libcouchbase from master?

          To do so either pull and build the sources:

          git clone git://github.com/couchbase/libcouchbase.git
          cd libcouchbase
          ./config/autorun.sh && ./configure && make && sudo make install

          or install from the snapshot deb/rpm repositories, for example on recent Ubuntu releases:

          sudo wget -O/etc/apt/sources.list.d/couchbase-snapshot.list http://packages.couchbase.com/snapshot/ubuntu/couchbase-ubuntu1110.list
          sudo aptitude update
          sudo aptitude install libcouchbase-dev libcouchbase2-bin

          avsej Sergey Avseyev added a comment -

          Could you post a piece of source code that demonstrates the issue?

          Haster Haster added a comment -

          I've investigated this issue more deeply.
          The problem occurs when the connection timeout happens.

          connect
          lcb_wait
          lcb_io_run_event_loop (for the Windows platform)
          ...
          select

          if (ret == 0) <-- 0 means the timeout happened

          { <-- Here our problems begin }

          After select returns zero, the code tries to reconnect in a callback function (is that a good idea? Maybe it would be better to exit the loop and return an error?), but I think some structures are corrupted after that.
          The event loop then calls select again, but the timeout variable contains a zero value (so it executes indefinitely).

          I think the problem is near here.
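
          The alternative raised above (leave the loop and return an error when select returns zero, rather than reconnecting from inside the callback) can be sketched roughly as follows. This is not libcouchbase code: it is a generic POSIX illustration, and the names read_with_timeout, now_ms and READ_TIMED_OUT are hypothetical. It assumes a single readable descriptor and recomputes the remaining time before every select call, so the timeout value passed to select is never left stale.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical sentinel: the deadline passed before any data arrived. */
#define READ_TIMED_OUT (-2)

/* Monotonic clock in milliseconds, so wall-clock jumps do not matter. */
static long long now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/*
 * Wait until fd is readable or timeout_ms elapses, then read from it.
 * Returns the number of bytes read, -1 on error, or READ_TIMED_OUT.
 */
ssize_t read_with_timeout(int fd, void *buf, size_t len, long timeout_ms)
{
    long long deadline = now_ms() + timeout_ms;

    for (;;) {
        long long remaining = deadline - now_ms();
        if (remaining <= 0) {
            return READ_TIMED_OUT;
        }

        /* select may modify both arguments, so rebuild them each pass. */
        fd_set rfds;
        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);

        struct timeval tv;
        tv.tv_sec = (time_t)(remaining / 1000);
        tv.tv_usec = (suseconds_t)((remaining % 1000) * 1000);

        int ret = select(fd + 1, &rfds, NULL, NULL, &tv);
        if (ret > 0) {
            return read(fd, buf, len);
        }
        if (ret == 0) {
            /* Timeout: report it to the caller instead of looping on. */
            return READ_TIMED_OUT;
        }
        if (errno != EINTR) {
            return -1;
        }
        /* Interrupted by a signal: retry with the remaining time. */
    }
}
```

          With this shape the caller decides whether to reconnect after a timeout, and the loop itself always terminates once the deadline passes.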

          Haster Haster added a comment -

          I’ve investigated this problem a little and have additional information.

          The scenario in which my error happens:

          First, I try to connect to a bucket (Primary, for example) that does not exist:

          lcb_connect
          lcb_wait

          Here I get an error. Then I create the bucket, destroy the instance by calling lcb_destroy, and try to connect to the bucket again:

          func_create_bucket()
          sleep(some_time)
          lcb_destroy()
          lcb_create()
          lcb_connect()
          lcb_wait()

          After that I receive 3 messages from my error_callback function, where errinfo is “Number of vBuckets must be a power of two > 0 and <= 65536”,
          and lcb_wait deadlocks.
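
          For reference, the rule quoted in that error message can be expressed as a small check. This is only a sketch of the stated condition (a power of two, greater than 0 and at most 65536); the function name is hypothetical and this is not the library's own validation code.

```c
#include <stdbool.h>

/*
 * Hypothetical helper mirroring the rule in the error message:
 * the count must be a power of two, greater than 0 and at most 65536.
 */
bool is_valid_vbucket_count(unsigned long n)
{
    /* A power of two has exactly one bit set, so n & (n - 1) is zero. */
    return n > 0 && n <= 65536UL && (n & (n - 1)) == 0;
}
```

          A count of 0 fails this check, so one possible (unconfirmed) explanation for the message is a configuration received while the newly created bucket was not yet fully set up.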


            People

            • Assignee:
              avsej Sergey Avseyev
            • Reporter:
              Haster Haster
            • Votes:
              0
            • Watchers:
              1
