Couchbase Server / MB-3592

rebalance fails with bucket_engine error: bucket_engine.c:1876: bucket_engine_release_cookie: Assertion `peh' failed

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 1.7 alpha 1
    • Fix Version/s: 1.7.0
    • Component/s: couchbase-bucket
    • Security Level: Public
    • Labels:
      None

      Description

      Rebalance is failing on the latest v0.0.0. 132 builds with these error messages:

      Port server memcached on node 'ns_1@10.1.5.228' exited with status 134. Restarting. Messages: memcached: bucket_engine.c:1876: bucket_engine_release_cookie: Assertion `peh' failed

      Rebalance exited with reason {badmatch,{error,closed,
      [{mc_client_binary,cmd_binary_vocal_recv,5},
      {mc_client_binary,get_vbucket,2},
      {ns_memcached,handle_call,3},
      {gen_server,handle_msg,5},
      {proc_lib,init_p_do_apply,3}]},
      {gen_server,call,
      [{'ns_memcached-default','ns_1@10.1.5.228'},
      {get_vbucket,512},
      30000]}}
      ns_orchestrator002 ns_1@10.1.5.227 11:23:55 - Tue Apr 12, 2011
      Control connection to memcached on 'ns_1@10.1.5.228' disconnected: {{badmatch,
      {error,
      closed}},
      [{mc_client_binary, cmd_binary_vocal_recv, 5},
      {mc_client_binary, get_vbucket, 2},
      {ns_memcached, handle_call, 3},
      {gen_server, handle_msg, 5},
      {proc_lib,
      init_p_do_apply,

      Steps to reproduce:

      1- install
      2- create bucket 'default'
      3- add node X
      4- rebalance
      5- remove X
      6- rebalance

      rebalance fails in step 4 and step 6
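The reproduction steps can be sketched as an ordered list of REST calls. This is a minimal sketch and not the script used for this report. It assumes the standard cluster-management endpoints (`/pools/default/buckets`, `/controller/addNode`, `/controller/rebalance`), the `ns_1@<host>` node naming seen in the logs above, and hypothetical credentials and RAM quota. The function name `repro_plan` is made up for illustration. It only builds the requests and does not send them, and the exact form fields required may differ by build.

```python
from urllib.parse import urlencode


def repro_plan(master, extra, user="Administrator", password="password"):
    """Return steps 2-6 as (method, path, form-encoded body) tuples.

    master/extra are host addresses; user/password are placeholder values.
    """
    both = f"ns_1@{master},ns_1@{extra}"
    return [
        # step 2: create bucket 'default' (quota value is an assumption)
        ("POST", "/pools/default/buckets",
         urlencode({"name": "default", "ramQuotaMB": 100})),
        # step 3: add node X
        ("POST", "/controller/addNode",
         urlencode({"hostname": extra, "user": user, "password": password})),
        # step 4: rebalance with both nodes kept
        ("POST", "/controller/rebalance",
         urlencode({"knownNodes": both, "ejectedNodes": ""})),
        # steps 5-6: removing X is expressed as ejecting it in the next rebalance
        ("POST", "/controller/rebalance",
         urlencode({"knownNodes": both, "ejectedNodes": f"ns_1@{extra}"})),
    ]


if __name__ == "__main__":
    for method, path, body in repro_plan("10.1.5.227", "10.1.5.228"):
        print(method, path, body)
```

Each tuple would be sent to the master node's admin port with HTTP basic auth. According to this report, the failure shows up during the two rebalance calls.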

      1. core-10.1.5.59-0.log
        15 kB
        Farshid Ghods
      2. ns-diag-20110412112502.txt
        3.44 MB
        Farshid Ghods

        Activity

        farshid ghods farshid ghods created issue -
        sharon Sharon Barr (Inactive) made changes -
        Field Original Value New Value
        Assignee Trond Norbye [ trond ]
        farshid ghods farshid ghods made changes -
        Fix Version/s 1.7 Alpha 2 [ 10180 ]
        Fix Version/s 1.7 alpha 1 [ 10170 ]
        trond Trond Norbye added a comment -

        Fixed by the recent bucket_engine and ep_engine fixes

        trond Trond Norbye made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        farshid Farshid Ghods (Inactive) added a comment -

        this crash happened on the latest changes again

        Thread 1 (Thread 22470):
        #0 0x00007f0347746a75 in raise () from /lib/libc.so.6
        #1 0x00007f034774a5c0 in abort () from /lib/libc.so.6
        #2 0x00007f034773f941 in __assert_fail () from /lib/libc.so.6
        #3 0x00007f0346307bb8 in bucket_engine_release_cookie (cookie=0x3ea0da8)
        at bucket_engine.c:2004
        #4 0x00007f034380a04f in TapConnection::releaseReference (this=0x3f24e20,
        force=198) at tapconnection.cc:35
        #5 0x00007f034381b6d1 in TapConnectionReaperCallback::TapConnectionReaperCallback(EventuallyPersistentEngine&, TapConnection*)

        will post the core logs and diags

        farshid Farshid Ghods (Inactive) made changes -
        Resolution Fixed [ 1 ]
        Status Resolved [ 5 ] Reopened [ 4 ]
        farshid Farshid Ghods (Inactive) added a comment -

        installed version:

        basestar-260-g2330d10

        ubuntu 64-bit

        farshid Farshid Ghods (Inactive) made changes -
        Attachment 10.1.5.59-diag.zip [ 11300 ]
        Attachment 10.1.5.38-diag.zip [ 11301 ]
        Attachment core-10.1.5.59-0.log [ 11302 ]
        trond Trond Norbye added a comment -

        This is a variant of MB-3764

        farshid Farshid Ghods (Inactive) made changes -
        Fix Version/s 1.7 GA [ 10111 ]
        Fix Version/s 1.7 alpha 2 [ 10180 ]
        farshid Farshid Ghods (Inactive) added a comment -

        saw this on last night's run against basestar-262

        I think this crash happens when we delete a huge bucket a few seconds after calling flush on the bucket

        Thread 4 (Thread 9793):
        #0 0x00000037fc60aee9 in pthread_cond_wait@@GLIBC_2.3.2 ()
        from /lib64/libpthread.so.0
        #1 0x00002aaaaacfdd5a in wait (this=0x4b33630) at syncobject.hh:31
        #2 Dispatcher::run (this=0x4b33630) at dispatcher.cc:85
        #3 0x00002aaaaacfe953 in launch_dispatcher_thread (arg=0x4b3367c)
        at dispatcher.cc:28
        #4 0x00000037fc60673d in start_thread () from /lib64/libpthread.so.0
        #5 0x00000037fbad3f6d in clone () from /lib64/libc.so.6

        Thread 3 (Thread 9794):
        #0 0x00000037fbacd587 in fdatasync () from /lib64/libc.so.6
        #1 0x00002aaaaad7aca9 in full_fsync (id=<value optimized out>, flags=2)
        at embedded/sqlite3.c:25510
        #2 unixSync (id=<value optimized out>, flags=2) at embedded/sqlite3.c:25558
        #3 0x00002aaaaadd0777 in vdbeCommit (db=0x60a3ad8, p=<value optimized out>)
        at embedded/sqlite3.c:13413
        #4 0x00002aaaaadd1fbd in sqlite3VdbeHalt (p=0x604bcd8)
        at embedded/sqlite3.c:56514
        #5 0x00002aaaaae21be9 in sqlite3VdbeExec (p=0x604bcd8)
        at embedded/sqlite3.c:62196
        #6 0x00002aaaaae0055a in sqlite3Step (pStmt=0x604bcd8)
        at embedded/sqlite3.c:57947
        #7 sqlite3_step (pStmt=0x604bcd8) at embedded/sqlite3.c:58011
        #8 0x00002aaaaad6f199 in PreparedStatement::execute (this=0x464d8aa0)
        at sqlite-pst.cc:73
        #9 0x00002aaaaad70648 in SqliteStrategy::execute (
        this=<value optimized out>, query=<value optimized out>)
        at sqlite-strategies.cc:151
        #10 0x00002aaaaad72da4 in SqliteStrategy::open (this=0x3926c50)
        at sqlite-strategies.cc:122
        #11 0x00002aaaaad6d23d in open (this=0x3926d30) at sqlite-kvstore.hh:175
        #12 StrategicSqlite3::reset (this=0x3926d30) at sqlite-kvstore.cc:144
        #13 0x00002aaaaad00e6e in EventuallyPersistentStore::flushOneDeleteAll (
        this=<value optimized out>) at ep.cc:1731
        #14 0x00002aaaaad0b13d in EventuallyPersistentStore::flushOne (
        this=0x3bf4800, q=0x3bf4920, rejectQueue=0x67130c0) at ep.cc:1869
        #15 0x00002aaaaad0b32e in EventuallyPersistentStore::flushSome (
        this=0x3bf4800, q=0x3bf4920, rejectQueue=0x67130c0) at ep.cc:1487
        #16 0x00002aaaaad40e1b in Flusher::doFlush (this=0x58db9b0) at flusher.cc:240
        #17 0x00002aaaaad416e5 in Flusher::step (this=0x51, d=...,
        tid=std::tr1::shared_ptr (count 0) 0x464d8f20) at flusher.cc:154
        #18 0x00002aaaaad41eae in FlusherStepper::callback (this=0x4b338c0, d=...,
        t=<value optimized out>) at flusher.cc:23
        #19 0x00002aaaaacff34f in Task::run (this=<value optimized out>, d=...,
        t=<value optimized out>) at dispatcher.hh:139
        #20 0x00002aaaaacfdf2b in Dispatcher::run (this=0x49b5b90)
        at dispatcher.cc:119
        #21 0x00002aaaaacfe953 in launch_dispatcher_thread (arg=0x51)
        at dispatcher.cc:28
        #22 0x00000037fc60673d in start_thread () from /lib64/libpthread.so.0
        #23 0x00000037fbad3f6d in clone () from /lib64/libc.so.6

        Thread 2 (Thread 9795):
        #0 0x00000037fc60b150 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
        from /lib64/libpthread.so.0
        #1 0x00002aaaaacfbc19 in wait (this=0x58db900, d=...) at syncobject.hh:42
        #2 IdleTask::run (this=0x58db900, d=...) at dispatcher.cc:244
        #3 0x00002aaaaacfdf2b in Dispatcher::run (this=0x58da870)
        at dispatcher.cc:119
        #4 0x00002aaaaacfe953 in launch_dispatcher_thread (arg=0x58da8bc)
        at dispatcher.cc:28
        #5 0x00000037fc60673d in start_thread () from /lib64/libpthread.so.0
        #6 0x00000037fbad3f6d in clone () from /lib64/libc.so.6

        Thread 1 (Thread 9806):
        #0 0x00000037fba30265 in raise () from /lib64/libc.so.6
        #1 0x00000037fba31d10 in abort () from /lib64/libc.so.6
        #2 0x00000037fba296e6 in __assert_fail () from /lib64/libc.so.6
        #3 0x00002aaaaaaad968 in bucket_engine_release_cookie (cookie=0x621ebe8)
        at bucket_engine.c:2004
        #4 0x00002aaaaad54a59 in TapConnection::releaseReference (this=0x6025620,
        force=78) at tapconnection.cc:35
        #5 0x00002aaaaad66867 in TapConnectionReaperCallback (this=0x391e440)
        at tapconnmap.cc:21
        #6 TapConnMap::shutdownAllTapConnections (this=0x391e440)
        at tapconnmap.cc:341
        #7 0x00002aaaaad1df93 in EventuallyPersistentEngine::destroy (
        this=0x391dea0, force=false) at ep_engine.cc:1734
        #8 0x00002aaaaad29ef4 in EvpDestroy (handle=0x391dea0, force=true)
        at ep_engine.cc:96
        #9 0x00002aaaaaaae711 in engine_shutdown_thread (arg=0x38e9c60)
        at bucket_engine.c:1099
        #10 0x00000037fc60673d in start_thread () from /lib64/libpthread.so.0
        #11 0x00000037fbad3f6d in clone () from /lib64/libc.so.6
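
The Thread 1 trace above and the shorter trace posted earlier in this issue can be compared mechanically. The sketch below is illustrative and not part of the original report. It pulls function names out of gdb frames, drops the libc frames that any failed assert() produces, and returns the leading frames both crashes share. The helper names (`crash_frames`, `shared_crash_frames`) and the abbreviated trace constants are made up for this example.

```python
import re

# One gdb frame: "#N [0xADDR in] function (". Some frames (e.g. #6 above)
# are printed without an address, so that part is optional.
FRAME = re.compile(r"#\d+\s+(?:0x[0-9a-f]+ in )?([\w:~@.]+)\s*\(")

# libc frames produced by every failed assert(); they carry no signal.
ASSERT_NOISE = {"raise", "abort", "__assert_fail"}

TRACE_A = """\
#0 0x00007f0347746a75 in raise () from /lib/libc.so.6
#1 0x00007f034774a5c0 in abort () from /lib/libc.so.6
#2 0x00007f034773f941 in __assert_fail () from /lib/libc.so.6
#3 0x00007f0346307bb8 in bucket_engine_release_cookie (cookie=0x3ea0da8)
at bucket_engine.c:2004
#4 0x00007f034380a04f in TapConnection::releaseReference (this=0x3f24e20,
force=198) at tapconnection.cc:35
#5 0x00007f034381b6d1 in TapConnectionReaperCallback::TapConnectionReaperCallback(EventuallyPersistentEngine&, TapConnection*)
"""

TRACE_B = """\
#0 0x00000037fba30265 in raise () from /lib64/libc.so.6
#1 0x00000037fba31d10 in abort () from /lib64/libc.so.6
#2 0x00000037fba296e6 in __assert_fail () from /lib64/libc.so.6
#3 0x00002aaaaaaad968 in bucket_engine_release_cookie (cookie=0x621ebe8)
at bucket_engine.c:2004
#4 0x00002aaaaad54a59 in TapConnection::releaseReference (this=0x6025620,
force=78) at tapconnection.cc:35
#5 0x00002aaaaad66867 in TapConnectionReaperCallback (this=0x391e440)
at tapconnmap.cc:21
#6 TapConnMap::shutdownAllTapConnections (this=0x391e440)
at tapconnmap.cc:341
"""


def crash_frames(trace):
    """Function names in frame order, without the assert() plumbing."""
    names = [m.group(1) for m in FRAME.finditer(trace)]
    return [n for n in names if n not in ASSERT_NOISE]


def shared_crash_frames(a, b):
    """Leading frames (innermost first) that two traces have in common."""
    shared = []
    for fa, fb in zip(crash_frames(a), crash_frames(b)):
        if fa != fb:
            break
        shared.append(fa)
    return shared


if __name__ == "__main__":
    print(shared_crash_frames(TRACE_A, TRACE_B))
```

Comparing by function name rather than by address keeps the comparison valid across builds and platforms, where load addresses differ.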

        trond Trond Norbye added a comment -

        MB-3777 It manifests itself in different ways

        – Posted from Bugbox for Android

        trond Trond Norbye made changes -
        Status Reopened [ 4 ] Resolved [ 5 ]
        Resolution Duplicate [ 3 ]
        maria Maria McDuff (Inactive) made changes -
        Component/s couchbase-bucket [ 10173 ]
        Component/s bucket-engine [ 10010 ]
        maria Maria McDuff (Inactive) added a comment -

        closing as dupes.

        maria Maria McDuff (Inactive) made changes -
        Status Resolved [ 5 ] Closed [ 6 ]

          People

          • Assignee:
            trond Trond Norbye
            Reporter:
            farshid ghods farshid ghods
          • Votes:
            0
            Watchers:
            1

