
    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.8.2
    • Fix Version/s: 2.9.1, 2.10.4, 3.0.0-alpha4
    • Component/s: library
    • Labels:
      None

      Description

      I've encountered an intermittent segfault while running the couchbase-python-client unit tests. Initially I thought it was related to private changes I made to the client code, but it also appears to happen running from the couchbase-python-client master. It's hard to tell if it's a specific piece of functionality that is crashing, but it appears to happen pretty often on the 'bad_host' test. This is the place it segfaults:

      * thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x10)
      frame #0: 0x0000000105867799 libcouchbase.2.dylib`lcb::clconfig::Confmon::do_next_provider(this=0x0000000102500690) at confmon.cc:248
         245 for (ProviderList::const_iterator ii = active_providers.begin();
         246 ii != active_providers.end(); ++ii) {
         247 ConfigInfo *info;
      -> 248 Provider* cached_provider = *ii;
         249 info = cached_provider->get_cached();
         250 if (!info) {
         251 continue;
      Target 0: (python2.7) stopped.
      

      Have only just spotted this, so not sure how easy it is to reproduce outside this test.

        Attachments

        1. backtrace_221017.txt
          7 kB
        2. frame_variable_221017
          0.3 kB
        3. image-2019-05-29-16-54-05-375.png
          224 kB
        4. lcb_282_crash_22102017.txt
          31 kB
        5. lcb_logs_backtrace_variables.txt
          32 kB
        6. port1_mainline_pcaps_and_logs.zip
          9 kB
        7. screenshot-1.png
          224 kB
        8. tests.ini
          0.7 kB

          Activity

          Ellis.Breen Ellis Breen created issue -
          Field Original Value New Value
          avsej Sergey Avseyev made changes -
          Status New [ 10003 ] Open [ 1 ]
          avsej Sergey Avseyev made changes -
          Fix Version/s 2.8.3 [ 14820 ]
          Ellis.Breen Ellis Breen made changes -
          Attachment backtrace_221017.txt [ 45544 ]
          Ellis.Breen Ellis Breen made changes -
          Attachment frame_variable_221017 [ 45545 ]
          Ellis.Breen Ellis Breen made changes -
          Attachment lcb_logs_backtrace_variables.txt [ 45546 ]
          Ellis.Breen Ellis Breen made changes -
          Attachment tests.ini [ 45555 ]
          Ellis.Breen Ellis Breen added a comment -

          Running server 5.0.0 build 2873.

          Ellis.Breen Ellis Breen made changes -
          Attachment port1_mainline_pcaps_and_logs.zip [ 45569 ]
          Ellis.Breen Ellis Breen added a comment -

          port1_mainline_pcaps_and_logs.zip - generated from Gerrit's couchbase-python-client master/HEAD and libcouchbase master/HEAD, using port1 as the admin port on couchbase.tests.cases.admin_t (nosetests run from Python under LLDB with LCB_LOGLEVEL=5). Hope this helps.

          avsej Sergey Avseyev made changes -
          Fix Version/s 2.8.4 [ 14917 ]
          Fix Version/s 2.8.3 [ 14820 ]
          avsej Sergey Avseyev made changes -
          Fix Version/s 2.8.4 [ 14917 ]
          avsej Sergey Avseyev made changes -
          Fix Version/s 2.9.1 [ 15201 ]
          Resolution Fixed [ 1 ]
          Status Open [ 1 ] Resolved [ 5 ]
          build-team Couchbase Build Team added a comment -

          Build libcouchbase-2.8.5-185 contains libcouchbase commit b67e629 with commit message:
          CCBC-866: check cached provider isn't NULL

          build-team Couchbase Build Team added a comment -

          Build couchbase-server-6.0.0-1178 contains libcouchbase commit b67e629 with commit message:
          CCBC-866: check cached provider isn't NULL

          build-team Couchbase Build Team added a comment -

          Build couchbase-server-5.5.0-2802 contains libcouchbase commit b67e629 with commit message:
          CCBC-866: check cached provider isn't NULL

          Ellis.Breen Ellis Breen added a comment -

          This is still occurring, sadly. Not sure whether it arises from memory corruption elsewhere or whether something inside the loop is invalidating the iterator.

          From my Ubuntu16 vagrant machine and LCB 3.0.0-alpha-3:

          260 Provider *cached_provider = *ii;
          (gdb) bt
          #0 lcb::clconfig::Confmon::do_next_provider (this=0x1062740)
           at /tmp/tmp.NnzLRKJkCq/build/temp.linux-x86_64-2.7/libcouchbase_src-prefix/src/libcouchbase_src/src/bucketconfig/confmon.cc:260
          #1 0x00007ffff4cc6615 in timer_callback (sock=<optimized out>, which=<optimized out>, 
           arg=0xff4b90)
           at /tmp/tmp.NnzLRKJkCq/build/temp.linux-x86_64-2.7/libcouchbase_src-prefix/src/libcouchbase_src/src/lcbio/timer.c:43
          #2 0x00007ffff323cec9 in event_base_loop ()
           from /usr/lib/x86_64-linux-gnu/libevent_core-2.0.so.5
          #3 0x00007ffff4d3908c in lcb_wait3 (instance=0xbfe0e0, flags=flags@entry=LCB_WAIT_NOCHECK)
           at /tmp/tmp.NnzLRKJkCq/build/temp.linux-x86_64-2.7/libcouchbase_src-prefix/src/libcouchbase_src/src/wait.cc:140
          #4 0x00007ffff4fc2674 in pycbc_oputil_wait_common (self=self@entry=0x7fff

          Ellis.Breen Ellis Breen made changes -
          Resolution Fixed [ 1 ]
          Status Resolved [ 5 ] Reopened [ 4 ]
          Ellis.Breen Ellis Breen made changes -
          Attachment image-2019-05-29-16-54-05-375.png [ 68404 ]
          Ellis.Breen Ellis Breen made changes -
          Attachment screenshot-1.png [ 68405 ]
          Ellis.Breen Ellis Breen added a comment -

          Looks like do_next_provider ultimately calls lcb::clconfig::Confmon::prepare, which then invalidates the iterator by clearing the vector.
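
          A minimal C++ sketch of this failure mode, not the libcouchbase source: Provider and Monitor are hypothetical stand-ins, and Monitor::prepare() only imitates a call that rebuilds the list while a loop is still walking it. It assumes the list is a std::vector, as described in the comment; std::list::clear() invalidates iterators in the same way. The unsafe loop appears only in a comment, and the executable part walks a snapshot so that the rebuild cannot touch the range being iterated.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for lcb::clconfig::Provider.
struct Provider {
    bool has_cached;
};

// Hypothetical stand-in for Confmon, reduced to the list and the rebuild call.
struct Monitor {
    std::vector<Provider *> active_providers;

    // Imitates Confmon::prepare(): the list is cleared and refilled, which
    // invalidates every iterator that pointed into it.
    void prepare() {
        std::vector<Provider *> rebuilt(active_providers);
        active_providers.clear();
        active_providers.swap(rebuilt);
    }

    // Hazard (shown as a comment only, never executed):
    //   for (it = active_providers.begin(); it != active_providers.end(); ++it)
    //       if (!(*it)->has_cached) prepare();   // 'it' dangles after this
    //
    // Safer shape: iterate over a snapshot, so prepare() may rebuild
    // active_providers without affecting the loop.
    std::size_t count_cached() {
        std::vector<Provider *> snapshot(active_providers);
        std::size_t cached = 0;
        for (std::vector<Provider *>::const_iterator it = snapshot.begin();
             it != snapshot.end(); ++it) {
            assert(*it != NULL);
            if (!(*it)->has_cached) {
                prepare(); // rebuilds the live list, not the snapshot
                continue;
            }
            ++cached;
        }
        return cached;
    }
};
```

          The snapshot costs one copy of the pointer list per pass; it is only one of several possible shapes for a fix.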

          Ellis.Breen Ellis Breen added a comment -

          Suggested fix: http://review.couchbase.org/c/109842/ - testing now

          Ellis.Breen Ellis Breen added a comment -

          I've just run this for many iterations (across numerous PYCBC changesets) and the problem hasn't occurred once. Looks like a successful fix.

          Hoping to have this or a similar fix merged soon to go out with the PYCBC Alpha release.

          avsej Sergey Avseyev made changes -
          Fix Version/s 2.10.4 [ 15917 ]
          build-team Couchbase Build Team added a comment -

          Build couchbase-server-6.5.0-3370 contains libcouchbase commit eacefea with commit message:
          CCBC-866: track invalidated list using unique ID (2.10.3)

          build-team Couchbase Build Team added a comment -

          Build libcouchbase-2.8.5-457 contains libcouchbase commit 6425bbb with commit message:
          CCBC-866: track invalidated active_provider_list using unique ID (3.0.0-alpha3)
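
          One way to read "track invalidated active_provider_list using unique ID" is as a generation counter: bump an ID whenever the list is rebuilt, and have the loop stop as soon as the ID it started with no longer matches. The sketch below illustrates that idea only. It is not taken from commits eacefea or 6425bbb, and Provider, Monitor, list_id_, add(), rebuild() and scan() are all hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a config provider.
struct Provider {
    bool has_cached;
};

class Monitor {
public:
    Monitor() : list_id_(0) {}

    void add(Provider *p) {
        assert(p != NULL);
        active_.push_back(p);
        ++list_id_;
    }

    // Imitates a rebuild of the active list. Each rebuild gets a new ID, so a
    // loop that started earlier can tell that its iterator is stale.
    void rebuild() {
        std::vector<Provider *> fresh(active_);
        active_.swap(fresh); // the old storage is freed along with 'fresh'
        ++list_id_;
    }

    // Counts providers that have a cached config. Returns false if the list
    // was rebuilt mid-scan; the loop then exits without touching the
    // (now invalid) iterator again.
    bool scan(std::size_t &cached) {
        const unsigned long id_at_start = list_id_;
        cached = 0;
        for (std::vector<Provider *>::const_iterator it = active_.begin();
             it != active_.end(); ++it) {
            if ((*it)->has_cached) {
                ++cached;
                continue;
            }
            rebuild(); // stands in for the call chain that reaches prepare()
            if (list_id_ != id_at_start) {
                return false;
            }
        }
        return true;
    }

private:
    std::vector<Provider *> active_;
    unsigned long list_id_;
};
```

          Compared with copying the list before each pass, the ID check avoids the copy but requires every code path that mutates the list to bump the counter.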

          avsej Sergey Avseyev made changes -
          Fix Version/s 3.0.0-alpha4 [ 16167 ]
          Resolution Fixed [ 1 ]
          Status Reopened [ 4 ] Resolved [ 5 ]
          avsej Sergey Avseyev made changes -
          Actual End 2019-06-03 10:19 (issue has been resolved)

            People

            • Assignee:
              avsej Sergey Avseyev
              Reporter:
              Ellis.Breen Ellis Breen