  1. Couchbase C client library libcouchbase
  2. CCBC-612

segfault in libcouchbase.so.2.0.26 the moment rebalance is started with index node present.

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: 2.5.1
    • Affects Version/s: 2.5.0
    • Component/s: library
    • Security Level: Public
    • Labels: None

    Description

      Problem: Spring's benchmark_ops.py segfaults as soon as a rebalance starts on a cluster that has an index node present.

      Steps to reproduce:

      *) Create a cluster with a dedicated index node in it. Mine has four nodes: KV, KV, index, and KV.
      *) Start benchmark_ops.py and check that the bucket shows ops/sec.
      *) Add or remove a KV node.
      *) Click Rebalance.
      *) benchmark_ops.py segfaults.

      Analysis:

      [ gdb ]

      Running the benchmark under GDB shows that the crash happens inside strcmp (libc), called from libcouchbase.so.2:

      #0 0x00007ffff6ee72ba in __strcmp_sse42 () from /lib64/libc.so.6
      #1 0x00007fffe940bfb2 in ?? () from /usr/lib64/libcouchbase.so.2
      #2 0x00007fffe940021f in ?? () from /usr/lib64/libcouchbase.so.2
      #3 0x00007fffe93f1084 in ?? () from /usr/lib64/libcouchbase.so.2
      #4 0x00007fffe9649e43 in event_fire_common (ev=0x7fffe84ef808, args=<value optimized out>)
      at src/iops.c:83
      #5 Event_on_ready (ev=0x7fffe84ef808, args=<value optimized out>) at src/iops.c:106
      #6 0x00007ffff7af38d5 in call_function (f=<value optimized out>, throwflag=<value optimized out>)
      at Python/ceval.c:4033
      #7 PyEval_EvalFrameEx (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:2679
      #8 0x00007ffff7af4bde in PyEval_EvalCodeEx (co=0x7fffe8724d30, globals=<value optimized out>,
      locals=<value optimized out>, args=<value optimized out>, argcount=1, kws=0x7ffff7fad068,
      kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:3265
      #9 0x00007ffff7a728b8 in function_call (func=0x7fffe8729050, arg=0x7fffe8167150, kw=0x7fffe8238398)
      at Objects/funcobject.c:526
      #10 0x00007ffff7a43283 in PyObject_Call (func=0x7fffe8729050, arg=<value optimized out>,
      kw=<value optimized out>) at Objects/abstract.c:2529

      [ var log messages ]

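      /var/log/messages agrees with the gdb backtrace: the fault address is 0 (consistent with a NULL pointer dereference) and the faulting instruction is inside libc: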

      May 21 16:15:27 dkaovm1 kernel: python[29316]: segfault at 0 ip 00007f8a634092ba sp 00007fffbd00f0c8 error 4 in libc-2.12.so[7f8a632e1000+18a000]
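
The kernel line above can be decoded mechanically. The following is a minimal sketch, not part of the original report: the function and field names are illustrative, and the error-code decoding assumes the standard x86 page-fault bits (bit 0 = page present, bit 1 = write, bit 2 = user mode). It computes the offset of the faulting instruction inside libc and checks that its page offset matches gdb frame #0.

```python
import re

# Kernel segfault line copied verbatim from /var/log/messages in this report.
LOG_LINE = ("May 21 16:15:27 dkaovm1 kernel: python[29316]: segfault at 0 "
            "ip 00007f8a634092ba sp 00007fffbd00f0c8 error 4 "
            "in libc-2.12.so[7f8a632e1000+18a000]")

# Address of frame #0 (__strcmp_sse42) from the gdb backtrace in this report.
GDB_FRAME0 = 0x00007ffff6ee72ba

SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
    r"(?: in (?P<obj>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\])?"
)


def parse_segfault(line):
    """Parse one kernel 'segfault at ...' line into a dict (illustrative helper)."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        raise ValueError("not a kernel segfault line")
    err = int(m.group("err"), 16)
    info = {
        "process": m.group("comm"),
        "pid": int(m.group("pid")),
        "fault_addr": int(m.group("addr"), 16),
        "ip": int(m.group("ip"), 16),
        "object": m.group("obj"),
        # Assumed x86 page-fault error-code bits.
        "page_present": bool(err & 1),
        "write": bool(err & 2),
        "user_mode": bool(err & 4),
    }
    if m.group("base"):
        # Offset of the faulting instruction from the object's load address.
        info["offset"] = info["ip"] - int(m.group("base"), 16)
    return info


if __name__ == "__main__":
    info = parse_segfault(LOG_LINE)
    print("fault address:", hex(info["fault_addr"]))
    print("offset into %s: %s" % (info["object"], hex(info["offset"])))
    # Load addresses differ between the two runs, so compare page offsets only.
    print("page offset matches gdb frame #0:",
          (info["ip"] & 0xFFF) == (GDB_FRAME0 & 0xFFF))
```

Under those assumptions, error 4 reads as a user-mode read of an unmapped page, which fits strcmp being handed a NULL pointer.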

      [ 64bit libcouchbase.so locations ]

      [root@dkaovm1 lib64]# ls -la | grep couchbase
      -rwxr-xr-x 1 root root 9264 May 13 11:16 libcouchbase_libevent.so
      lrwxrwxrwx 1 root root 17 May 21 16:00 libcouchbase.so -> libcouchbase.so.2
      lrwxrwxrwx 1 root root 22 May 21 16:00 libcouchbase.so.2 -> libcouchbase.so.2.0.26
      -rwxr-xr-x 1 root root 376016 May 13 11:16 libcouchbase.so.2.0.26

      [ libcouchbase.so.2.0.26 is provided by the 2.5 SDK ]

      libcouchbase2-core-2.5.0-1.el6.x86_64 : Couchbase Client & Protocol Library (core)
      Repo : installed
      Matched from:
      Other : Provides-match: /usr/lib64/libcouchbase.so.2.0.26

      Overall, the segfault is reproducible both manually via benchmark_ops.py and automatically via perfrunner (both use Spring) on two separate clients: the Hermes client, running Ubuntu 13 (later upgraded to Ubuntu 14, with the problem persisting), and my personal VM running CentOS 6.6, which is the one detailed in this bug. However, the segfault did not happen on the Ares client when I tried manually. The comparison follows.

      In an experiment with two clients generating load against the same cluster, the original client with the above problem segfaulted, while the other client (borrowed from the Ares cluster) did not.

      The good libcouchbase on Ares' client, I noticed, has the following traits:

      • It runs Ubuntu 13.04 Raring (which is what the Hermes cluster started out with).
      • The library is /usr/lib/libcouchbase.so.2.0.14.
      • The Python package's couchbase.__version__ is 1.2.1 (compared to 2.0.1 on the client that segfaults), although I don't think the Python library is the problem.
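
To compare a good and a bad client on the traits listed above, the same facts can be collected with a small script. This is a hedged sketch, not something used in the original investigation: `symlink_chain` and `client_traits` are hypothetical helper names, the default library path is the one from the CentOS listing in this report, and the script assumes the Python `couchbase` package may or may not be importable.

```python
import os


def symlink_chain(path):
    """Return [path, target, ...], following symlinks until a non-link is reached."""
    chain = [path]
    seen = {path}
    while os.path.islink(path):
        target = os.readlink(path)
        # Relative targets are resolved against the directory holding the link.
        path = os.path.normpath(os.path.join(os.path.dirname(path), target))
        if path in seen:
            raise OSError("symlink loop at " + path)
        seen.add(path)
        chain.append(path)
    return chain


def client_traits(lib_path="/usr/lib64/libcouchbase.so"):
    """Collect the library symlink chain and the Python SDK version, if any."""
    traits = {"library_chain": symlink_chain(lib_path)}
    try:
        import couchbase  # Python SDK; may be absent on the machine
        traits["python_sdk"] = getattr(couchbase, "__version__", "unknown")
    except ImportError:
        traits["python_sdk"] = None
    return traits


if __name__ == "__main__":
    for key, value in client_traits().items():
        print(key, "=", value)
```

Running it on each client and diffing the output gives the library version and SDK version side by side. On Ubuntu the path would need to be changed to /usr/lib/libcouchbase.so.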

      Lastly, I have also seen the bad clients segfault without an index node, but with an index node present it happens every time.

            People

              mnunberg Mark Nunberg (Inactive)
              dkao David Kao (Inactive)