Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-42688

Potential crash during ExpiredItemPager::run() if vBucket state changes during visit

    XMLWordPrintable

    Details

      Description

      Script to Repo:

      ./testrunner -i /tmp/testexec.74991.ini GROUP=durability_majority,rerun=False,skip_log_scan=False,get-cbcollect-info=False,infra_log_level=critical,log_level=error,bucket_storage=couchstore,upgrade_version=7.0.0-3705 -t bucket_collections.collections_rebalance.CollectionsRebalance.test_data_load_collections_with_rebalance_in,nodes_init=3,nodes_in=2,override_spec_params=durability;replicas,durability=MAJORITY,replicas=2,bucket_spec=multi_bucket.buckets_for_rebalance_tests,data_load_stage=during,skip_validations=False,GROUP=durability_majority
      

      2020-11-11 15:12:16,638 | test | ERROR | MainThread | [basetestcase:check_coredump_exist:683] Node 172.23.122.110 - Core dump seen: 1 172.23.122.110 : Stack Trace of first crash: e8f14034-2ab2-4818-7b271cb6-1a4b437d.dmp

      172.23.122.110 : Found message in /opt/couchbase/var/lib/couchbase/logs/memcached.log.000002.txt

      2020-11-11T15:11:32.027733-08:00 CRITICAL Breakpad caught a crash (Couchbase version 7.0.0-3705). Writing crash dump to /opt/couchbase/var/lib/couchbase/crash/e8f14034-2ab2-4818-7b271cb6-1a4b437d.dmp before terminating.
      2020-11-11T15:11:32.027767-08:00 CRITICAL Stack backtrace of crashed thread:
      2020-11-11T15:11:32.029410-08:00 CRITICAL     /opt/couchbase/bin/memcached() [0x400000+0x19b1ed]
      2020-11-11T15:11:32.029437-08:00 CRITICAL     /opt/couchbase/bin/memcached(_ZN15google_breakpad16ExceptionHandler12GenerateDumpEPNS0_12CrashContextE+0x3ea) [0x400000+0x1b094a]
      2020-11-11T15:11:32.029473-08:00 CRITICAL     /opt/couchbase/bin/memcached(_ZN15google_breakpad16ExceptionHandler13SignalHandlerEiP9siginfo_tPv+0xb8) [0x400000+0x1b0c88]
      2020-11-11T15:11:32.029482-08:00 CRITICAL     /lib64/libpthread.so.0() [0x7f1a6b226000+0xf5f0]
      2020-11-11T15:11:32.029495-08:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7f1a6ef47000+0x1c126f]
      2020-11-11T15:11:32.029503-08:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7f1a6ef47000+0x1b2bcc]
      2020-11-11T15:11:32.029510-08:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7f1a6ef47000+0x1aa097]
      2020-11-11T15:11:32.029517-08:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7f1a6ef47000+0x1aa48d]
      2020-11-11T15:11:32.029526-08:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7f1a6ef47000+0x196abd]
      2020-11-11T15:11:32.029533-08:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7f1a6ef47000+0x1860b3]
      2020-11-11T15:11:32.029541-08:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7f1a6ef47000+0x8748f]
      2020-11-11T15:11:32.029548-08:00 CRITICAL     /opt/couchbase/bin/../lib/libplatform_so.so.0.1.0() [0x7f1a6d9de000+0x10917]
      2020-11-11T15:11:32.029554-08:00 CRITICAL     /lib64/libpthread.so.0() [0x7f1a6b226000+0x7e65]
      

        Attachments

          Issue Links

          No reviews matched the request. Check your Options in the drop-down menu of this sections header.

            Activity

            sumedh.basarkod Sumedh Basarkod created issue -
            Hide
            sumedh.basarkod Sumedh Basarkod added a comment -

            Looked last good on weekly build 7.0.0-3653

            Show
            sumedh.basarkod Sumedh Basarkod added a comment - Looked last good on weekly build 7.0.0-3653
            Hide
            sumedh.basarkod Sumedh Basarkod added a comment - - edited

            Looks like it has affected a lot of 7.0_P0 (must pass) rebalance jobs on this weekly build
            http://qa.sc.couchbase.com/job/test_suite_executor-TAF/70382/
            Hence I prefer to mark this must pass.

            Show
            sumedh.basarkod Sumedh Basarkod added a comment - - edited Looks like it has affected a lot of 7.0_P0 (must pass) rebalance jobs on this weekly build http://qa.sc.couchbase.com/job/test_suite_executor-TAF/70382/ Hence I prefer to mark this must pass.
            sumedh.basarkod Sumedh Basarkod made changes -
            Priority Major [ 3 ] Critical [ 2 ]
            sumedh.basarkod Sumedh Basarkod made changes -
            Labels affects-cc-testing collections functional-test 7.0mustpass affects-cc-testing collections functional-test
            Balakumaran.Gopal Balakumaran Gopal made changes -
            Affects Version/s Cheshire-Cat [ 15915 ]
            Hide
            owend Daniel Owen added a comment -

            Many thanks Sumedh Basarkod for the full BT that really helped in quick triage

            Crash is here:

            std::function<bool(const Vbid&, const Vbid&)>
            PagingVisitor::getVBucketComparator() const {
                // Get the pageable mem used of each vb _once_ and cache it.
                // Fetching these values repeatedly in the comparator could cause issues as
                // the values can change _during_ a given sort call.
             
                std::map<Vbid, size_t> pageableMemUsed;
             
                for (const auto& vbid : store.getVBuckets().getBuckets()) {
                    auto vb = store.getVBucket(vbid);
                    pageableMemUsed[vbid] = vb ? vb->getPageableMemUsage() : 0;
                }
             
                return [pageableMemUsed = std::move(pageableMemUsed), &store = store](
                               const Vbid& a, const Vbid& b) mutable {
                    auto vbA = store.getVBucket(a);
                    auto vbB = store.getVBucket(b);     <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< CRASH IS HERE
                    bool aReplica = vbA && vbA->getState() == vbucket_state_replica;
                    bool bReplica = vbB && vbB->getState() == vbucket_state_replica;
                    // sort replicas before all other vbucket states, then sort by
                    // pageableMemUsed
                    return std::make_pair(aReplica, pageableMemUsed[a]) >
                           std::make_pair(bReplica, pageableMemUsed[b]);
                };
            }
            

            Hi James Harrison, Unfortunately looks like might be an issue with the item pager changes. Could you take a look? thanks

            Show
            owend Daniel Owen added a comment - Many thanks Sumedh Basarkod for the full BT that really helped in quick triage Crash is here: std::function<bool(const Vbid&, const Vbid&)> PagingVisitor::getVBucketComparator() const { // Get the pageable mem used of each vb _once_ and cache it. // Fetching these values repeatedly in the comparator could cause issues as // the values can change _during_ a given sort call.   std::map<Vbid, size_t> pageableMemUsed;   for (const auto& vbid : store.getVBuckets().getBuckets()) { auto vb = store.getVBucket(vbid); pageableMemUsed[vbid] = vb ? vb->getPageableMemUsage() : 0; }   return [pageableMemUsed = std::move(pageableMemUsed), &store = store]( const Vbid& a, const Vbid& b) mutable { auto vbA = store.getVBucket(a); auto vbB = store.getVBucket(b); <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< CRASH IS HERE bool aReplica = vbA && vbA->getState() == vbucket_state_replica; bool bReplica = vbB && vbB->getState() == vbucket_state_replica; // sort replicas before all other vbucket states, then sort by // pageableMemUsed return std::make_pair(aReplica, pageableMemUsed[a]) > std::make_pair(bReplica, pageableMemUsed[b]); }; } Hi James Harrison , Unfortunately looks like might be an issue with the item pager changes. Could you take a look? thanks
            owend Daniel Owen made changes -
            Assignee Daniel Owen [ owend ] James Harrison [ james.harrison ]
            owend Daniel Owen made changes -
            Epic Link MB-16181 [ 44869 ]
            owend Daniel Owen made changes -
            Summary [Collections] KVBucket::visitAsync in ExpiredItemPager::run() KVBucket::visitAsync in ExpiredItemPager::run()
            owend Daniel Owen made changes -
            Link This issue relates to MB-40531 [ MB-40531 ]
            james.harrison James Harrison made changes -
            Status Open [ 1 ] In Progress [ 3 ]
            wayne Wayne Siu made changes -
            Link This issue blocks MB-40511 [ MB-40511 ]
            wayne Wayne Siu made changes -
            Link This issue blocks MB-40528 [ MB-40528 ]
            lynn.straus Lynn Straus made changes -
            Due Date 13/Nov/20
            wayne Wayne Siu made changes -
            Labels 7.0mustpass affects-cc-testing collections functional-test 7.0mustpass affects-cc-testing approved-for-6.0.5 approved-for-6.6.1 collections functional-test
            wayne Wayne Siu made changes -
            Fix Version/s 6.6.1 [ 17002 ]
            Fix Version/s 6.0.5 [ 16729 ]
            Hide
            build-team Couchbase Build Team added a comment -

            Build couchbase-server-6.0.4-3107 contains kv_engine commit 3b03448 with commit message:
            MB-42688: Do not check vbucket state on each comparator call

            Show
            build-team Couchbase Build Team added a comment - Build couchbase-server-6.0.4-3107 contains kv_engine commit 3b03448 with commit message: MB-42688 : Do not check vbucket state on each comparator call
            Hide
            build-team Couchbase Build Team added a comment -

            Build couchbase-server-6.0.5-3318 contains kv_engine commit 3b03448 with commit message:
            MB-42688: Do not check vbucket state on each comparator call

            Show
            build-team Couchbase Build Team added a comment - Build couchbase-server-6.0.5-3318 contains kv_engine commit 3b03448 with commit message: MB-42688 : Do not check vbucket state on each comparator call
            Hide
            build-team Couchbase Build Team added a comment -

            Build couchbase-server-7.0.0-3736 contains kv_engine commit 96d613e with commit message:
            MB-42688: Do not check vbucket state on each comparator call

            Show
            build-team Couchbase Build Team added a comment - Build couchbase-server-7.0.0-3736 contains kv_engine commit 96d613e with commit message: MB-42688 : Do not check vbucket state on each comparator call
            Hide
            mihir.kamdar Mihir Kamdar added a comment -

            Daniel Owen should this issue be in Resolved state now, or are there any further changes pending ?

            Show
            mihir.kamdar Mihir Kamdar added a comment - Daniel Owen should this issue be in Resolved state now, or are there any further changes pending ?
            Hide
            drigby Dave Rigby added a comment -

            Code is all in 7.0, but it’s still awaiting merging to 6.6 hence still open.

            Show
            drigby Dave Rigby added a comment - Code is all in 7.0, but it’s still awaiting merging to 6.6 hence still open.
            james.harrison James Harrison made changes -
            Status In Progress [ 3 ] Open [ 1 ]
            james.harrison James Harrison made changes -
            Resolution Fixed [ 1 ]
            Status Open [ 1 ] Resolved [ 5 ]
            Hide
            build-team Couchbase Build Team added a comment -

            Build couchbase-server-6.6.1-9191 contains kv_engine commit 41426c6 with commit message:
            MB-42688: Merge branch 'couchbase/alice' into mad-hatter

            Show
            build-team Couchbase Build Team added a comment - Build couchbase-server-6.6.1-9191 contains kv_engine commit 41426c6 with commit message: MB-42688 : Merge branch 'couchbase/alice' into mad-hatter
            Hide
            build-team Couchbase Build Team added a comment -

            Build couchbase-server-6.6.1-9191 contains kv_engine commit 3b03448 with commit message:
            MB-42688: Do not check vbucket state on each comparator call

            Show
            build-team Couchbase Build Team added a comment - Build couchbase-server-6.6.1-9191 contains kv_engine commit 3b03448 with commit message: MB-42688 : Do not check vbucket state on each comparator call
            Hide
            sumedh.basarkod Sumedh Basarkod added a comment -

            Verified from CC weekly build's (7.0.0-3739) results.

            Show
            sumedh.basarkod Sumedh Basarkod added a comment - Verified from CC weekly build's (7.0.0-3739) results.
            sumedh.basarkod Sumedh Basarkod made changes -
            Assignee James Harrison [ james.harrison ] Sumedh Basarkod [ sumedh.basarkod ]
            Status Resolved [ 5 ] Closed [ 6 ]
            Hide
            build-team Couchbase Build Team added a comment -

            Build couchbase-server-7.0.0-3991 contains kv_engine commit 41426c6 with commit message:
            MB-42688: Merge branch 'couchbase/alice' into mad-hatter

            Show
            build-team Couchbase Build Team added a comment - Build couchbase-server-7.0.0-3991 contains kv_engine commit 41426c6 with commit message: MB-42688 : Merge branch 'couchbase/alice' into mad-hatter
            Hide
            build-team Couchbase Build Team added a comment -

            Build couchbase-server-7.0.0-3991 contains kv_engine commit 3b03448 with commit message:
            MB-42688: Do not check vbucket state on each comparator call

            Show
            build-team Couchbase Build Team added a comment - Build couchbase-server-7.0.0-3991 contains kv_engine commit 3b03448 with commit message: MB-42688 : Do not check vbucket state on each comparator call
            drigby Dave Rigby made changes -
            Summary KVBucket::visitAsync in ExpiredItemPager::run() Potential crash during ExpiredItemPager::run() if vBucket state changes during visit
            Hide
            ashwin.govindarajulu Ashwin Govindarajulu added a comment -

            As discussed with Dave Rigby , marking this as "request-dev-verify" for 6.0.5 release

            Show
            ashwin.govindarajulu Ashwin Govindarajulu added a comment - As discussed with Dave Rigby , marking this as "request-dev-verify" for 6.0.5 release
            ashwin.govindarajulu Ashwin Govindarajulu made changes -
            Assignee Sumedh Basarkod [ sumedh.basarkod ] Dave Rigby [ drigby ]
            ashwin.govindarajulu Ashwin Govindarajulu made changes -
            Labels 7.0mustpass affects-cc-testing approved-for-6.0.5 approved-for-6.6.1 collections functional-test 7.0mustpass affects-cc-testing approved-for-6.0.5 approved-for-6.6.1 collections functional-test request-dev-verify
            Hide
            ashwin.govindarajulu Ashwin Govindarajulu added a comment -

            Reopening for validating it against 6.0.5 build.

            Show
            ashwin.govindarajulu Ashwin Govindarajulu added a comment - Reopening for validating it against 6.0.5 build.
            ashwin.govindarajulu Ashwin Govindarajulu made changes -
            Resolution Fixed [ 1 ]
            Status Closed [ 6 ] Reopened [ 4 ]
            drigby Dave Rigby made changes -
            Triage Untriaged [ 10351 ] Triaged [ 10350 ]
            VERIFICATION STEPS Verified by unit tests which exhaustively verify all possible state changes are correctly handled.
            Resolution Fixed [ 1 ]
            Status Reopened [ 4 ] Resolved [ 5 ]
            Hide
            ashwin.govindarajulu Ashwin Govindarajulu added a comment -

            Closing the ticket since its validated against unit-test on 6.0.5 branch.

            Show
            ashwin.govindarajulu Ashwin Govindarajulu added a comment - Closing the ticket since its validated against unit-test on 6.0.5 branch.
            ashwin.govindarajulu Ashwin Govindarajulu made changes -
            Status Resolved [ 5 ] Closed [ 6 ]
            lynn.straus Lynn Straus made changes -
            Fix Version/s 7.0.0 [ 17233 ]
            lynn.straus Lynn Straus made changes -
            Fix Version/s Cheshire-Cat [ 15915 ]

              People

              Assignee:
              drigby Dave Rigby
              Reporter:
              sumedh.basarkod Sumedh Basarkod
              Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

                Dates

                Due:
                Created:
                Updated:
                Resolved:

                  Gerrit Reviews

                  There are no open Gerrit changes

                    PagerDuty