
Details

    • Bug
    • Status: Resolved
    • Critical
    • Resolution: Incomplete
    • None
    • 2.10.6, 2.10.9, 3.0.7
    • None
    • None
    • 1

    Description

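      Possible double-free: the eventing consumer crashed in free() inside lcbvb_destroy while libcouchbase was applying the config attached to an NMV response. Backtrace: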
       

      #0 __GI___libc_free (mem=0x7f046c44ee80) at malloc.c:2941
      #1 0x00007f048c0b1bbb in lcbvb_destroy (conf=0x7f046c052b10)
       at /home/couchbase/jenkins/workspace/couchbase-server-unix/libcouchbase/src/vbucket/vbucket.c:853
      #2 0x00007f048c0e6d64 in decref (this=0x7f046c050ae0)
       at /home/couchbase/jenkins/workspace/couchbase-server-unix/libcouchbase/src/bucketconfig/clconfig.h:546
      #3 update (data=0x7f046c022db0 <Address 0x7f046c022db0 out of bounds>, host=<optimized out>, this=0x7f04840f6e10)
       at /home/couchbase/jenkins/workspace/couchbase-server-unix/libcouchbase/src/bucketconfig/bc_cccp.cc:207
      #4 lcb::clconfig::cccp_update (provider=provider@entry=0x7f04840f6e10, host=<optimized out>, 
       data=0x7f046c022db0 <Address 0x7f046c022db0 out of bounds>)
       at /home/couchbase/jenkins/workspace/couchbase-server-unix/libcouchbase/src/bucketconfig/bc_cccp.cc:175
      #5 0x00007f048c127136 in lcb::Server::handle_nmv (this=this@entry=0x7f0484111550, resinfo=..., oldpkt=oldpkt@entry=0x7f048412cbd0)
       at /home/couchbase/jenkins/workspace/couchbase-server-unix/libcouchbase/src/mcserver/mcserver.cc:151
      #6 0x00007f048c129d49 in try_read (ior=0x7f048411ac28, ctx=0x7f048411abe0, this=0x7f0484111550)
       at /home/couchbase/jenkins/workspace/couchbase-server-unix/libcouchbase/src/mcserver/mcserver.cc:396
      #7 on_read (ctx=0x7f048411abe0) at /home/couchbase/jenkins/workspace/couchbase-server-unix/libcouchbase/src/mcserver/mcserver.cc:460
      #8 0x00007f048c0c3d4c in invoke_read_cb (nb=10969, ctx=0x7f048411abe0)
       at /home/couchbase/jenkins/workspace/couchbase-server-unix/libcouchbase/src/lcbio/ctx.c:278
      #9 E_handler (sock=<optimized out>, which=<optimized out>, arg=0x7f048411abe0)
       at /home/couchbase/jenkins/workspace/couchbase-server-unix/libcouchbase/src/lcbio/ctx.c:307
      #10 0x00007f048c0a9852 in run_loop (io=<optimized out>, is_tick=<optimized out>)
       at /home/couchbase/jenkins/workspace/couchbase-server-unix/libcouchbase/plugins/io/select/plugin-select.c:323
      #11 0x00007f048c13847e in lcb_wait (instance=0x7f04840f64c0) at /home/couchbase/jenkins/workspace/couchbase-server-unix/libcouchbase/src/wait.cc:103
      #12 0x0000000000463596 in RetryWithFixedBackoff<bool (&)(lcb_error_t), lcb_error_t (&)(lcb_st*), lcb_st*&, lcb_error_t, 0> (
       callable=@0x40a460: {lcb_error_t (lcb_st *)} 0x40a460 <lcb_wait@plt>, isRetriable=<optimized out>, initial_delay_milliseconds=200,
       max_retry_count=5)
       at /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/eventing/features/include/retry_util.h:40
      #13 0x0000000000464b01 in timer::TimerStore::GetCounter (this=this@entry=0x7f04840cf240, key=...)
       at /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/eventing-ee/features/src/timer_store.cc:319
      #14 0x00000000004664d0 in timer::TimerStore::SetTimer (this=0x7f04840cf240, timer=...)
       at /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/eventing-ee/features/src/timer_store.cc:47
      #15 0x000000000041f846 in V8Worker::SetTimer (this=this@entry=0x7f0484013700, tinfo=...)
       at /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/eventing/v8_consumer/src/v8worker.cc:1148
      #16 0x000000000043dbdf in Timer::CreateTimerImpl (this=0x7f04840c22b0, args=...)
       at /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/eventing/v8_consumer/src/timer.cc:98
      #17 0x000000000043e286 in CreateTimer (args=...)
       at /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/eventing/v8_consumer/src/timer.cc:142
      #18 0x00007f048d88c239 in v8::internal::FunctionCallbackArguments::Call(v8::internal::CallHandlerInfo*) () from /opt/couchbase/lib/libv8.so
      #19 0x00007f048d88b738 in v8::internal::MaybeHandle<v8::internal::Object> v8::internal::(anonymous namespace)::HandleApiCallHelper<false>(v8::internal::Isolate*, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::FunctionTemplateInfo>, v8::internal::Handle<v8::internal::Object>, v8::internal::BuiltinArguments) () from /opt/couchbase/lib/libv8.so
      #20 0x00007f048d88aec6 in v8::internal::Builtin_Impl_HandleApiCall(v8::internal::BuiltinArguments, v8::internal::Isolate*) ()
       from /opt/couchbase/lib/libv8.so
      #21 0x00007f048e0c48ae in Builtins_CEntry_Return1_DontSaveFPRegs_ArgvOnStack_NoBuiltinExit () from /opt/couchbase/lib/libv8.so
      #22 0x00002618c730816e in ?? ()
      #23 0x0000028282c825a1 in ?? ()
      #24 0x00002617c3a9d139 in ?? ()
      #25 0x0000000900000000 in ?? ()
      #26 0x0000028282c82681 in ?? ()
      #27 0x0000065d78b0b0e9 in ?? ()
      #28 0x0000364842a7fcd9 in ?? ()
      #29 0x0000065d78b0af49 in ?? ()
      #30 0x0000364842a094a9 in ?? ()
      #31 0x00001f7c96f04679 in ?? ()
      #32 0x0000065d78b0b0e9 in ?? ()
      #33 0x0000364842a7fcd9 in ?? ()
      #34 0x0000065d78b0af49 in ?? ()
      #35 0x0000364842a094a9 in ?? ()
      #36 0x00002617c3a9d139 in ?? ()
      #37 0x0000065d78b0b0e9 in ?? ()
      #38 0x0000065d78b0b0c9 in ?? ()
      #39 0x0000065d78b0b099 in ?? ()
      #40 0x0000065d78b0af49 in ?? ()
      #41 0x000000a300000000 in ?? ()
      #42 0x0000364842a09e91 in ?? ()
      #43 0x0000364842a09441 in ?? ()
      #44 0x00002617c3a82ad9 in ?? ()
      #45 0x00007f0481e324c8 in ?? ()
      #46 0x00007f048e034603 in Builtins_JSEntryTrampoline () from /opt/couchbase/lib/libv8.so
      #47 0x0000065d78b0aed9 in ?? ()
      #48 0x0000065d78b0ac81 in ?? ()
      #49 0x00001f7c96f04679 in ?? ()
      #50 0x0000364842a09441 in ?? ()
      #51 0x0000000000000020 in ?? ()
      #52 0x00007f0481e32530 in ?? ()
      #53 0x00002618c73040de in ?? ()
      #54 0x0000000000000000 in ?? ()
      

      This was seen on build 6.5.0-4917 while running the following QE test:

      ./testrunner -i /tmp/testexec.17696.ini -p get-cbcollect-info=True,GROUP=source_bucket_mutation_timers -t eventing.eventing_rebalance.EventingRebalance.test_eventing_rebalance_with_multiple_kv_nodes,doc-per-day=5,dataset=default,nodes_init=5,services_init=kv-kv-kv-eventing-index:n1ql,groups=simple,reset_services=True,handler_code=source_bucket_mutation_with_timers,source_bucket_mutation=True,GROUP=source_bucket_mutation_timers 
      

      The crash is intermittent and not easily reproducible.
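
Frame #12 shows lcb_wait being called through an eventing helper, RetryWithFixedBackoff, with initial_delay_milliseconds=200 and max_retry_count=5, so the crashing config update ran inside one of these retried waits. The C++ sketch below shows what a fixed-backoff retry of that shape could look like. It is an assumption built only from the parameter names in the frame, not the eventing source, and it treats max_retry_count as the number of retries after the first attempt.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical sketch of a fixed-backoff retry helper. Only the parameter names
// (callable, isRetriable, initial_delay_milliseconds, max_retry_count) come from
// frame #12; the body is an assumption, not eventing's retry_util.h.
template <typename IsRetriable, typename Callable, typename... Args>
auto RetryWithFixedBackoff(int max_retry_count, int initial_delay_milliseconds,
                           IsRetriable &&isRetriable, Callable &&callable,
                           Args &&...args) {
  assert(max_retry_count >= 0 && initial_delay_milliseconds >= 0);
  // First attempt, e.g. lcb_wait(instance): this drives the IO loop, which is
  // where the NMV config update in frames #0 to #11 ran.
  auto result = callable(args...);
  for (int retry = 0; retry < max_retry_count && isRetriable(result); ++retry) {
    // Fixed backoff: the same delay before every retry.
    std::this_thread::sleep_for(
        std::chrono::milliseconds(initial_delay_milliseconds));
    result = callable(args...);  // each retry re-enters the IO loop
  }
  return result;
}
```

With a real instance the call would have the shape RetryWithFixedBackoff(5, 200, pred, lcb_wait, instance), where pred stands for whatever predicate decides which lcb_error_t values are retriable. The point for this bug is that every retry runs the event loop again, and therefore gives pending internal work such as an NMV config update another chance to run.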


          Activity

            avsej Sergey Avseyev created issue -
            lynn.straus Lynn Straus made changes -
            Field Original Value New Value
            Link This issue blocks MB-37167 [ MB-37167 ]

            suraj.naik Suraj Naik (Inactive) added a comment -

            Sergey Avseyev

            Thanks for opening this issue on my behalf after going through the stack trace. My bad, I should have done this in the first place. The crash was seen during one of the QE tests for eventing rebalance with multiple KV nodes. The issue is intermittent and not easily reproducible: I ran the same test a few times after it was reported and it passed each time. The only way I can reproduce it is to keep running the test until one of the runs fails.
            suraj.naik Suraj Naik (Inactive) made changes -
            Description: the original value (the same backtrace, ending with "This was seen on build 6.5.0-4917") was replaced by the description shown above, which adds the QE testrunner command and the note on reproducibility.
            jeelan.poola Jeelan Poola made changes -
            Priority Major [ 3 ] Critical [ 2 ]

            ingenthr Matt Ingenthron added a comment -

            Suraj Naik, Jeelan Poola: given that this is happening, do you have a way to build and run with the thread sanitizer? It would likely catch this and other issues earlier and more consistently.

            avsej Sergey Avseyev added a comment -

            Perhaps not the thread sanitizer so much as the address sanitizer, from clang for instance. This is how libcouchbase would be built with it:

                cmake -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DLCB_USE_ASAN=1 . && make -j8

            I've tried to reproduce the issue, but without luck so far.

            The stack trace shows that the crash happens when we apply the config attached to an NMV (not my vbucket) response: destroying the current config crashes inside free().

            How many libcouchbase instances are operating in this test? How many eventing worker threads are there? Is it possible that the IO for a worker thread's libcouchbase instance is being executed by other threads while that worker thread is not doing any work?
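
Frames #0 to #5 (handle_nmv, cccp_update, update, decref, lcbvb_destroy, free) suggest this pattern: when an NMV reply carries a newer config, the provider installs it and drops its reference to the old one, and the old config is freed when its last reference goes away. One possible explanation for a double free on this path is that some config lost one reference too many. The sketch below models that swap with hypothetical types; it is not libcouchbase code. Under AddressSanitizer (for example clang++ -fsanitize=address -g), an extra config_decref on a config that is already destroyed is reported as a heap-use-after-free at the offending call, instead of surfacing later as a crash inside free().

```cpp
#include <cassert>

// Hypothetical, simplified model of a reference-counted cluster config. The
// names below are illustrative and are not the libcouchbase clconfig API.
struct Config {
  int refcount = 1;                  // the creator holds the first reference
  int *destroyed_counter = nullptr;  // lets a caller observe destruction
};

void config_incref(Config *c) { ++c->refcount; }

// Drop one reference and destroy on the last one (the libcouchbase analogue is
// decref() calling lcbvb_destroy(), frames #1 and #2). Calling this again on a
// config that is already gone reads freed memory and may free it a second time.
void config_decref(Config *c) {
  assert(c->refcount > 0);
  if (--c->refcount == 0) {
    if (c->destroyed_counter != nullptr) ++*c->destroyed_counter;
    delete c;
  }
}

// Install the config carried by an NMV reply: take a reference to the new
// config first, publish it, then release the provider's reference to the old one.
void apply_nmv_config(Config *&current, Config *incoming) {
  config_incref(incoming);
  Config *old = current;
  current = incoming;
  config_decref(old);
}
```

Taking the new reference before releasing the old one keeps the provider from ever pointing at a config with no owner, even when the incoming and current configs are the same object.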
            ingenthr Matt Ingenthron made changes -
            Link This issue relates to MB-37197 [ MB-37197 ]

            ingenthr Matt Ingenthron added a comment -

            It appears that MB-37167 and MB-37197 may be related: both involve socket handling, but they fail in different ways. It looks like we cannot use tsan (or cannot easily use it) because other dependencies get in the way of linking.

            There is a patchset available for testing that we think has a good chance of solving MB-37167. It was produced by exercising the same general area of code in a standalone program under the thread sanitizer and valgrind.

            For MB-37197, it is less clear whether the same patchset will address it. We want to see whether it can be observed, and may add telemetry that logs the host/port and address to STDERR, to catch the point at which it becomes invalid. This is not currently in the patchset.

            For both, we've requested an updated run, since the external testing found issues that the patchset definitely addresses.
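
The telemetry described above could be as small as one line per connection event on STDERR, recording the endpoint and the address of the owning object so the last known-good values can be compared against a crash. This is a hypothetical helper: the function names and the line format are assumptions, not part of libcouchbase or eventing.

```cpp
#include <cassert>
#include <cstdint>
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical telemetry helper: format the endpoint a socket is meant to talk
// to, plus the address of the object that owns it.
std::string format_endpoint(const std::string &tag, const std::string &host,
                            std::uint16_t port, const void *owner) {
  std::ostringstream os;
  os << tag << " host=" << host << " port=" << port << " owner=" << owner;
  return os.str();
}

// Emit one such line on STDERR.
void log_endpoint(const std::string &tag, const std::string &host,
                  std::uint16_t port, const void *owner) {
  std::cerr << format_endpoint(tag, host, port, owner) << '\n';
}
```

Formatting and writing are split so the line can be checked without capturing STDERR; a caller would log at connect, reconnect, and close, then compare the last lines before a crash.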
            satya.nand Satya Nand (Inactive) added a comment - - edited

            Sergey Avseyev, we do access the lcb handle from two different threads. All of these accesses are protected by a mutex.

            suraj.naik Suraj Naik (Inactive) added a comment - - edited

            Sergey Avseyev

            I made a toy build with your patch and ran the test continuously on one of our dev clusters until it eventually crashed. The logs are here: http://qa.sc.couchbase.com/job/dev_testbed_blr3/190/artifact/logs/testrunner-19-Dec-10_23-11-31/*zip*/testrunner-19-Dec-10_23-11-31.zip

            The logs for node 172.23.106.73 include a minidump.
            The console logs for the test are here: http://qa.sc.couchbase.com/job/dev_testbed_blr3/190/console

            Reproduction is very inconsistent: the crash occurred once in 10 runs (builds 182 to 191 of http://qa.sc.couchbase.com/job/dev_testbed_blr3/), in build 190.

            Toy build: http://server.jenkins.couchbase.com/view/Toys/job/toy-unix-simple/lastSuccessfulBuild/artifact/couchbase-server-enterprise-6.5.0-10841-centos7.x86_64.rpm
            Debug info build: http://server.jenkins.couchbase.com/view/Toys/job/toy-unix-simple/lastSuccessfulBuild/artifact/couchbase-server-enterprise-debuginfo-6.5.0-10841-centos7.x86_64.rpm

            Info about the test:
            The test starts with one eventing node, one index and query node, and three KV nodes. An eventing handler is then deployed. After the deployment succeeds, two more KV nodes are added and a rebalance is started. This rebalance fails with "Some apps are deploying or resuming on some or all Eventing nodes" because one of the consumers crashes with the above stack trace. Once a consumer crashes, the producer respawns it; during the respawn we set the bootstrap flag, and we fail the rebalance with the above message if a consumer is bootstrapping.
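
The failure sequence above (consumer crash, respawn with the bootstrap flag set, rebalance refused) can be modelled as follows. This is an illustrative sketch: ConsumerState, on_consumer_respawn and rebalance_block_reason are hypothetical names, and only the error message is taken from the comment.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of the producer-side bookkeeping; not the eventing code.
struct ConsumerState {
  int pid;
  bool bootstrapping;
};

// The producer respawns a crashed consumer and marks it as bootstrapping.
void on_consumer_respawn(ConsumerState &consumer, int new_pid) {
  consumer.pid = new_pid;
  consumer.bootstrapping = true;
}

// Returns why rebalance must be refused, or an empty string if it may proceed.
std::string rebalance_block_reason(const std::vector<ConsumerState> &consumers) {
  for (const ConsumerState &c : consumers) {
    if (c.bootstrapping) {
      return "Some apps are deploying or resuming on some or all Eventing nodes";
    }
  }
  return "";
}
```

Seen this way, the rebalance failure is only a symptom: the message reflects that a consumer was respawned, while the root cause is the crash in the backtrace.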

            avsej Sergey Avseyev added a comment -

            Suraj Naik, thank you. Could you paste the stack trace from the minidump here, please?

            Gautham Banasandra, I saw that you did some work to build the consumer standalone. Would it be possible to run it with the thread sanitizer?

            In theory, yes, the mutex should protect libcouchbase from concurrent access, but we don't have the application to verify and analyze this. The library itself has no locks, so corruption is still possible, because our IO layer executes not only user-provided operations but also internally scheduled operations of its own. It would really help if we could run the code under sanitizers or similar tools.
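
The property at stake can be sketched with a hypothetical wrapper. Because the library has no locks and its IO loop also runs operations it scheduled itself, a mutex only helps if every call, including the one that runs the loop (lcb_wait in the backtrace), is made while holding it. UnsafeHandle and SerializedHandle below are illustrative stand-ins, not libcouchbase or eventing types. Building such a harness with -fsanitize=thread would report any access that bypasses the lock.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a handle with no internal locking. wait() models an
// event loop that runs user-scheduled work and library-initiated work alike.
struct UnsafeHandle {
  int pending = 0;
  int completed = 0;
  int internal_ops = 0;
  void schedule() { ++pending; }
  void wait() {
    while (pending > 0) {
      --pending;
      ++completed;
    }
    ++internal_ops;  // library-initiated work (e.g. a config update) also runs here
  }
};

// Every access, including the wait() that drives the IO loop, holds one mutex.
class SerializedHandle {
 public:
  template <typename Fn>
  void with_handle(Fn &&fn) {
    std::lock_guard<std::mutex> lock(mu_);
    fn(handle_);
  }

 private:
  std::mutex mu_;
  UnsafeHandle handle_;
};

// Runs `threads` workers; each schedules one operation and waits for it, `iters` times.
void run_workers(SerializedHandle &h, int threads, int iters) {
  std::vector<std::thread> workers;
  for (int t = 0; t < threads; ++t) {
    workers.emplace_back([&h, iters] {
      for (int i = 0; i < iters; ++i) {
        h.with_handle([](UnsafeHandle &u) {
          u.schedule();
          u.wait();
        });
      }
    });
  }
  for (std::thread &w : workers) w.join();
}
```

If a thread instead scheduled under the lock but called wait() outside it, the internal work would run concurrently with the other thread's calls, which is the kind of interleaving the question about other threads executing a worker's IO is probing.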
            avsej Sergey Avseyev added a comment - - edited

            This is what I've got from machine 172.23.106.73 (UPDATE: this core dump is not related to this ticket):

            /opt/couchbase/bin/minidump-2-core /opt/couchbase/var/lib/couchbase/crash/3585291e-88c1-7d65-7e7eaf45-39303685.dmp > crash.core
            gdb /opt/couchbase/bin/eventing-consumer crash.core
            

            #0  0x00007f7451204207 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:55
            #1  0x00007f74512058f8 in __GI_abort () at abort.c:90
            #2  0x00007f7451246d27 in __libc_message (do_abort=do_abort@entry=2, 
                fmt=fmt@entry=0x7f7451358678 "*** Error in `%s': %s: 0x%s ***\n") at ../sysdeps/unix/sysv/linux/libc_fatal.c:196
            #3  0x00007f745124f489 in malloc_printerr (ar_ptr=0x7f7428000020, ptr=<optimized out>, 
                str=0x7f7451358738 "double free or corruption (fasttop)", action=3) at malloc.c:5004
            #4  _int_free (av=0x7f7428000020, p=<optimized out>, have_lock=0) at malloc.c:3843
            #5  0x00007f74528b7871 in mcreq_wipe_packet (pipeline=pipeline@entry=0x7f742804b3f0, packet=packet@entry=0x7f742804b370)
                at /home/couchbase/jenkins/workspace/toy-unix-simple/libcouchbase/src/mc/mcreq.c:238
            #6  0x00007f74528b91cd in mcreq_packet_done (pipeline=0x7f742804b3f0, pkt=pkt@entry=0x7f742804b370)
                at /home/couchbase/jenkins/workspace/toy-unix-simple/libcouchbase/src/mc/mcreq.c:780
            #7  0x00007f7452923e78 in mcreq__pktflush_callback (p=0x7f742804b370, hint=65, arg=0x7f744cc5ce00)
                at /home/couchbase/jenkins/workspace/toy-unix-simple/libcouchbase/src/mc/mcreq-flush-inl.h:68
            #8  0x00007f74528bcdab in netbuf_end_flush2 (mgr=mgr@entry=0x7f742804b430, nflushed=nflushed@entry=65, 
                callback=callback@entry=0x7f7452923dd0 <mcreq__pktflush_callback(void*, nb_SIZE, void*)>, lloff=lloff@entry=8, 
                arg=arg@entry=0x7f744cc5ce00) at /home/couchbase/jenkins/workspace/toy-unix-simple/libcouchbase/src/netbuf/netbuf.c:684
            #9  0x00007f74529285e3 in mcreq_flush_done_ex (now=<optimized out>, expected=65, nflushed=65, pl=0x7f742804b3f0)
                at /home/couchbase/jenkins/workspace/toy-unix-simple/libcouchbase/src/mc/mcreq-flush-inl.h:97
            #10 on_flush_done (ctx=<optimized out>, expected=65, actual=65)
                at /home/couchbase/jenkins/workspace/toy-unix-simple/libcouchbase/src/mcserver/mcserver.cc:85
            #11 0x00007f74528c244b in E_put_ex (nb=65, niov=<optimized out>, iov=0x7f744cc5ce90, ctx=0x7f74280e59f0)
                at /home/couchbase/jenkins/workspace/toy-unix-simple/libcouchbase/src/lcbio/ctx.c:538
            #12 lcbio_ctx_put_ex (ctx=ctx@entry=0x7f74280e59f0, iov=iov@entry=0x7f744cc5ce90, niov=<optimized out>, nb=65)
                at /home/couchbase/jenkins/workspace/toy-unix-simple/libcouchbase/src/lcbio/ctx.c:616
            #13 0x00007f7452923d79 in on_flush_ready (ctx=0x7f74280e59f0)
                at /home/couchbase/jenkins/workspace/toy-unix-simple/libcouchbase/src/mcserver/mcserver.cc:68
            #14 0x00007f74528c0d32 in E_handler (sock=<optimized out>, which=<optimized out>, arg=0x7f74280e59f0)
                at /home/couchbase/jenkins/workspace/toy-unix-simple/libcouchbase/src/lcbio/ctx.c:322
            #15 0x00007f74528a6872 in run_loop (io=<optimized out>, is_tick=<optimized out>)
                at /home/couchbase/jenkins/workspace/toy-unix-simple/libcouchbase/plugins/io/select/plugin-select.c:323
            #16 0x00007f745293574e in lcb_wait (instance=0x7f74480d0060)
                at /home/couchbase/jenkins/workspace/toy-unix-simple/libcouchbase/src/wait.cc:103
            #17 0x0000000000463686 in RetryWithFixedBackoff<bool (&)(lcb_error_t), lcb_error_t (&)(lcb_st*), lcb_st*&, lcb_error_t, 0> (
                callable=@0x40a460: {lcb_error_t (lcb_st *)} 0x40a460 <lcb_wait@plt>, isRetriable=<optimized out>, 
                initial_delay_milliseconds=200, max_retry_count=5)
                at /home/couchbase/jenkins/workspace/toy-unix-simple/goproj/src/github.com/couchbase/eventing/features/include/retry_util.h:40
            #18 0x0000000000463b4d in timer::TimerStore::Get (this=0x7f74480ef050, key=...)
                at /home/couchbase/jenkins/workspace/toy-unix-simple/goproj/src/github.com/couchbase/eventing-ee/features/src/timer_store.cc:421
            #19 0x0000000000468408 in timer::Iterator::GetNextTimer (this=this@entry=0x7f744cc5db60, tevent=...)
                at /home/couchbase/jenkins/workspace/toy-unix-simple/goproj/src/github.com/couchbase/eventing-ee/features/src/timer_iterator.cc:52
            #20 0x0000000000468bd2 in timer::Iterator::GetNext (this=this@entry=0x7f744cc5db60, tevent=...)
                at /home/couchbase/jenkins/workspace/toy-unix-simple/goproj/src/github.com/couchbase/eventing-ee/features/src/timer_iterator.cc:131
            #21 0x000000000042ad1b in V8Worker::RouteMessage (this=0x7f7448014870)
                at /home/couchbase/jenkins/workspace/toy-unix-simple/goproj/src/github.com/couchbase/eventing/v8_consumer/src/v8worker.cc:488
            #22 0x00007f7451d89dcf in std::execute_native_thread_routine (__p=0x7f74480eb970)
                at /tmp/deploy/gcc-7.3.0/libstdc++-v3/src/c++11/thread.cc:83
            #23 0x00007f74515a2dd5 in start_thread (arg=0x7f744cc61700) at pthread_create.c:307
            #24 0x00007f74512cbead in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
            


            >> I saw that you did some work to build consumer standalone, would it be possible to run it with thread sanitizer?
            Sergey Avseyev, from the stack trace seen above, the libcouchbase SDK is crashing while doing KV ops in timers. The standalone consumer was very minimal: it only ran N1QL queries and doesn't have the ability to create/execute timers.

            Gautham.Banasandra Gautham Banasandra (Inactive) added a comment

            Do those timers also involve passing control of the lcb_t between threads? Or are all KV ops isolated and pinned to a thread?

            avsej Sergey Avseyev added a comment
            jeelan.poola Jeelan Poola added a comment -

            >> Do those timers also involve passing control on the lcb_t between threads?
            No, there is no passing of control involved between threads.

            >> Or for KV ops all of them isolated and pinned to the thread?
            KV ops can happen from 2 different threads using the same lcb_handle, but protected by a mutex. We scanned the code multiple times and did not find a case where they can happen unprotected.

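A minimal sketch of the shared-handle pattern described above, assuming the handle has no internal locking of its own. `FakeHandle`, `GuardedStore` and `run_guarded` are hypothetical names, not eventing or libcouchbase APIs. Under that assumption the lock has to cover both scheduling and the wait (the event loop); otherwise a second thread could enter the loop while the first is mid-operation. Compiling a harness like this with `-fsanitize=thread` (GCC or Clang) is one way to get the ThreadSanitizer run asked about earlier.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a non-thread-safe client handle: it has no
// internal locking, so every call must be serialized by the caller.
struct FakeHandle {
    int scheduled = 0;  // operations queued but not yet waited on
    int completed = 0;  // operations finished by wait()
    void schedule_get() { ++scheduled; }
    void wait() { completed += scheduled; scheduled = 0; }
};

// Hypothetical wrapper: one mutex guards the whole schedule + wait
// sequence, so two threads never touch the handle at the same time.
class GuardedStore {
public:
    void get() {
        std::lock_guard<std::mutex> lock(mu_);
        handle_.schedule_get();
        handle_.wait();  // the "event loop" runs only while the lock is held
    }
    int completed() {
        std::lock_guard<std::mutex> lock(mu_);
        return handle_.completed;
    }

private:
    std::mutex mu_;
    FakeHandle handle_;
};

// Runs `threads` workers that each issue `ops` gets through one shared store.
int run_guarded(int threads, int ops) {
    GuardedStore store;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&store, ops] {
            for (int i = 0; i < ops; ++i) store.get();
        });
    }
    for (auto& w : workers) w.join();
    return store.completed();
}
```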

            Did the scan include the sources of libcouchbase?

            When you call `lcb_wait`, it actively checks for data on all sockets (and opens/closes sockets during rebalance), not only those related to commands scheduled by the current thread. That might trigger callbacks that should not be triggered in the current thread.

            avsej Sergey Avseyev added a comment
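A toy model of the behaviour described above, assuming only what the comment states: waiting on an instance services every pending operation on it, not just the caller's. `FakeLoop` and `callback_runs_on_waiting_thread` are hypothetical and are not how libcouchbase's select plugin is implemented. In the sketch, the operation scheduled by thread A completes on thread B, because B is the thread driving the loop.

```cpp
#include <cassert>
#include <functional>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical single-instance event loop: wait() runs *every* pending
// callback on the instance, regardless of which thread scheduled it.
class FakeLoop {
public:
    void schedule(std::function<void()> callback) {
        pending_.push_back(std::move(callback));
    }
    void wait() {
        std::vector<std::function<void()>> ready;
        ready.swap(pending_);  // take everything that is pending
        for (auto& cb : ready) cb();
    }

private:
    std::vector<std::function<void()>> pending_;
};

// Returns true if the callback scheduled by thread A ran on thread B.
// The threads run one after the other (joined), so there is no data race.
bool callback_runs_on_waiting_thread() {
    FakeLoop loop;
    std::thread::id ran_on;

    // Thread A schedules an operation but never waits for it.
    std::thread a([&] { loop.schedule([&] { ran_on = std::this_thread::get_id(); }); });
    a.join();

    // Thread B schedules its own no-op operation and waits.
    std::thread::id b_id;
    std::thread b([&] {
        b_id = std::this_thread::get_id();
        loop.schedule([] {});
        loop.wait();  // also completes A's operation, on B's stack
    });
    b.join();
    return ran_on == b_id;
}
```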
            jeelan.poola Jeelan Poola added a comment - - edited

            >> did the scan include sources of libcouchbase?

            No, we did not scan libcouchbase code. Only eventing code.

            >> When you call `lcb_wait` it will actively check for data on all sockets (and open/close
            >> sockets during rebalance), not only related to command scheduled by the thread. That
            >> potentially might involve triggering the callbacks that should not be triggered in current
            >> thread.

            We have 2 timer_stores, each with its own lcb_instance, in 2 different threads. It is possible that these 2 timer_stores use their respective lcb_instances at the same time, each doing its own lcb_wait(), during rebalance or at any other time. Do you think this can cause any problem?

            The multi-threaded access with a mutex that I mentioned earlier is at the level of a single timer_store, involving a single lcb_instance.


            When an lcb instance is dedicated to its thread, it is okay, and the instances do not interfere with each other. Each thread runs its own `select()` loop and should not exchange any state.

            avsej Sergey Avseyev added a comment
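For contrast, a sketch of the one-instance-per-thread layout discussed above, where each worker owns its instance outright. `OwnedInstance` and `run_isolated` are hypothetical stand-ins, and the pending counter only plays the role of the instance's sockets and `select()` loop. Because nothing is shared while the threads run, no mutex is needed; each thread writes its result to its own slot.

```cpp
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical per-thread client instance; never shared between threads.
struct OwnedInstance {
    int pending = 0;
    int completed = 0;
    void schedule() { ++pending; }
    void wait() { completed += pending; pending = 0; }
};

// Each worker creates, uses and destroys its own instance. Writing to
// distinct vector elements from different threads is not a data race.
std::vector<int> run_isolated(int threads, int ops) {
    std::vector<int> results(threads, 0);
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&results, t, ops] {
            OwnedInstance inst;  // dedicated to this thread
            for (int i = 0; i < ops; ++i) {
                inst.schedule();
                inst.wait();  // only this thread's loop runs here
            }
            results[t] = inst.completed;
        });
    }
    for (auto& w : workers) w.join();
    return results;
}
```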

            Just had a call with Jeelan Poola and Suraj Naik. The stack trace I posted here earlier is not related to the actual problem. A new run with the patch fails in exactly the same place, which means the problem must be in the libcouchbase logic for handling configuration updates. As action items, I will provide a patch with more introspection into where the config is destroyed and how it is updated. I will also try to create an example program that simulates this timers workload and is suitable for running under sanitizers during rebalance stress.

            avsej Sergey Avseyev added a comment
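One possible shape for the config introspection mentioned above, sketched under assumptions rather than taken from the actual patch. `TracedConfig` and `RefEvent` are hypothetical, loosely modelled on the `decref()` to `lcbvb_destroy()` path in the first stack trace of this ticket. Every reference-count change is logged with a caller-supplied site label, so the log shows which path released the last reference.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One recorded reference-count change (hypothetical trace format).
struct RefEvent {
    std::string site;    // caller-supplied label for where the change happened
    int refcount_after;  // count after the change; 0 means the object was freed
};

// Hypothetical reference-counted config; not the libcouchbase type.
class TracedConfig {
public:
    explicit TracedConfig(std::vector<RefEvent>* trace) : trace_(trace) {}

    void incref(const char* site) {
        ++refcount_;
        trace_->push_back({site, refcount_});
    }

    // Drops one reference and deletes the object when the count reaches
    // zero. The event is recorded before the delete, so the trace names
    // the site that released the last reference.
    static void decref(TracedConfig* cfg, const char* site) {
        int after = --cfg->refcount_;
        cfg->trace_->push_back({site, after});
        if (after == 0) {
            delete cfg;
        }
    }

private:
    int refcount_ = 1;              // the creator owns the first reference
    std::vector<RefEvent>* trace_;  // not owned; must outlive the config
};
```

In a sketch like this, a second `decref` on the same pointer after the count reaches zero is the kind of double free suspected in this ticket: the log would show the last legitimate release, and AddressSanitizer or Valgrind would flag the bad one.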

            A note for posterity, since we're merging two sets of changes to fix this:

            1) In an isolated test under dynamic testing tools, a set of issues with construction/destruction of the sockets was found. These are legitimate issues, although after additional testing they do not appear to be the direct cause of the crash observed above.
            2) Removal of the struct field that was always implicated in the crash. We currently lack a way to run additional stateful memory analysis tools with the tests, and adding instrumentation did not surface any issues in either initialization or destruction of the structure. Still, the corruption is happening somewhere we cannot identify. We removed the field from the structure expecting the failure to move somewhere else (since something was clearly referencing it), but instead the tests pass.

            ingenthr Matt Ingenthron added a comment

            Build couchbase-server-7.0.0-1141 contains libcouchbase commit 5a9160b with commit message:
            CCBC-1130: Fix leaking sockets and timers

            build-team Couchbase Build Team added a comment

            Build couchbase-server-7.0.0-1141 contains libcouchbase commit 417e30a with commit message:
            CCBC-1130: Remove usage of 'buuid' of vbucket config

            build-team Couchbase Build Team added a comment
            avsej Sergey Avseyev made changes -
            Status New [ 10003 ] Open [ 1 ]
            avsej Sergey Avseyev made changes -
            Fix Version/s 2.10.6 [ 16301 ]

            I'm resolving this as incomplete, because we managed to get rid of the crash by removing the code at the site where the crash happens, but we still don't know what triggers the behaviour. Please reopen the issue if more information is gathered (such as libasan/Valgrind reports).

            avsej Sergey Avseyev added a comment
            avsej Sergey Avseyev made changes -
            Resolution Incomplete [ 4 ]
            Status Open [ 1 ] Resolved [ 5 ]

            Build couchbase-server-6.5.1-6019 contains libcouchbase commit 5a9160b with commit message:
            CCBC-1130: Fix leaking sockets and timers

            build-team Couchbase Build Team added a comment

            Build couchbase-server-6.5.1-6019 contains libcouchbase commit 417e30a with commit message:
            CCBC-1130: Remove usage of 'buuid' of vbucket config

            build-team Couchbase Build Team added a comment

            Build couchbase-server-6.5.0-4955 contains libcouchbase commit 5a9160b with commit message:
            CCBC-1130: Fix leaking sockets and timers

            build-team Couchbase Build Team added a comment

            Build couchbase-server-6.5.0-4955 contains libcouchbase commit 417e30a with commit message:
            CCBC-1130: Remove usage of 'buuid' of vbucket config

            build-team Couchbase Build Team added a comment

            Build couchbase-server-1006.5.1-1120 contains libcouchbase commit 5a9160b with commit message:
            CCBC-1130: Fix leaking sockets and timers

            build-team Couchbase Build Team added a comment

            Build couchbase-server-1006.5.1-1120 contains libcouchbase commit 417e30a with commit message:
            CCBC-1130: Remove usage of 'buuid' of vbucket config

            build-team Couchbase Build Team added a comment
            brett19 Brett Lawson made changes -
            Story Points 1
            avsej Sergey Avseyev added a comment - - edited

            A better fix for this issue is here: https://github.com/couchbase/libcouchbase/commit/fa80a2c029854c33216e679127d630f435e0cc0f

            It will be backported to release-2.10.

            avsej Sergey Avseyev made changes -
            Fix Version/s 3.0.7 [ 17120 ]
            avsej Sergey Avseyev made changes -
            Fix Version/s 2.10.9 [ 17116 ]

            libcouchbase 3.0.7 and 2.10.9 will have the new fix.

            avsej Sergey Avseyev added a comment
            ingenthr Matt Ingenthron made changes -
            Workflow Couchbase SDK Workflow [ 156331 ] Couchbase SDK Workflow with Review [ 250217 ]
            ingenthr Matt Ingenthron made changes -
            Workflow Couchbase SDK Workflow with Review [ 250217 ] Couchbase SDK Workflow [ 262169 ]
            ingenthr Matt Ingenthron made changes -
            Workflow Couchbase SDK Workflow [ 262169 ] Couchbase SDK Workflow with Review [ 263899 ]
            ingenthr Matt Ingenthron made changes -
            Workflow Couchbase SDK Workflow with Review [ 263899 ] SDK Workflow with Review NG [ 265419 ]

            People

              avsej Sergey Avseyev
              suraj.naik Suraj Naik (Inactive)
              Votes:
              0
              Watchers:
              9
