Couchbase Server / MB-44079

Ephemeral out-of-order purging can cause prepares to be recommitted and DurabilityMonitor monotonicity exceptions to be thrown


Details

    Description

      Script to Repro

      guides/gradlew --refresh-dependencies testrunner -P jython=/opt/jython/bin/jython -P 'args=-i /tmp/testexec.116702.ini GROUP=durability_majority,rerun=False,skip_log_scan=False,get-cbcollect-info=False,infra_log_level=critical,log_level=error,bucket_storage=couchstore,upgrade_version=7.0.0-4362 -t bucket_collections.collections_rebalance.CollectionsRebalance.test_data_load_collections_with_rebalance_in,nodes_init=3,nodes_in=2,override_spec_params=durability;replicas,durability=MAJORITY,replicas=2,bucket_spec=multi_bucket.buckets_for_rebalance_tests,data_load_stage=during,skip_validations=False,GROUP=durability_majority'
      

      The test rebalances in 2 nodes to a cluster initialised with 3 nodes (nodes_init=3), with a durability=MAJORITY data load running in parallel.
      The rebalance fails (ST attached).

      One minidump is seen on the .57 node:

      2021-02-02 18:24:39,107 | test  | CRITICAL | MainThread | [basetestcase:check_coredump_exist:728] 172.23.123.57: 1 core dump seen
      

      Checking memcached.log on .57

      2021-02-02T18:24:20.031995-08:00 CRITICAL *** Fatal error encountered during exception handling ***
      2021-02-02T18:24:20.032058-08:00 CRITICAL Caught unhandled std::exception-derived exception. what(): std::exception
      2021-02-02T18:24:20.055523-08:00 INFO 1464: (bucket2) DCP (Consumer) eq_dcpq:replication:ns_1@172.23.123.68->ns_1@172.23.123.57:bucket2 - (vb:544) Attempting to add stream: opaque_:123, start_seqno_:67, end_seqno_:18446744073709551615, vb_uuid:85106092359276, snap_start_seqno_:67, snap_end_seqno_:67, last_seqno:67, stream_req_value:{"uid":"7"}

      2021-02-02T18:24:20.272001-08:00 CRITICAL Breakpad caught a crash (Couchbase version 7.0.0-4362). Writing crash dump to /opt/couchbase/var/lib/couchbase/crash/c9271b0b-61a6-463f-1314aea1-609387e4.dmp before terminating.
      2021-02-02T18:24:20.272032-08:00 CRITICAL Stack backtrace of crashed thread:
      2021-02-02T18:24:20.272286-08:00 CRITICAL     /opt/couchbase/bin/memcached() [0x400000+0x145bbd]
      2021-02-02T18:24:20.272299-08:00 CRITICAL     /opt/couchbase/bin/memcached(_ZN15google_breakpad16ExceptionHandler12GenerateDumpEPNS0_12CrashContextE+0x3ea) [0x400000+0x15b3fa]
      2021-02-02T18:24:20.272309-08:00 CRITICAL     /opt/couchbase/bin/memcached(_ZN15google_breakpad16ExceptionHandler13SignalHandlerEiP9siginfo_tPv+0xb8) [0x400000+0x15b738]
      2021-02-02T18:24:20.272393-08:00 CRITICAL     /lib64/libpthread.so.0() [0x7f4b54fff000+0xf5f0]
      2021-02-02T18:24:20.272417-08:00 CRITICAL     /lib64/libc.so.6(gsignal+0x37) [0x7f4b54c31000+0x36337]
      2021-02-02T18:24:20.272436-08:00 CRITICAL     /lib64/libc.so.6(abort+0x148) [0x7f4b54c31000+0x37a28]
      2021-02-02T18:24:20.272473-08:00 CRITICAL     /opt/couchbase/bin/../lib/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x125) [0x7f4b55734000+0x91195]
      2021-02-02T18:24:20.272489-08:00 CRITICAL     /opt/couchbase/bin/memcached() [0x400000+0x155632]
      2021-02-02T18:24:20.272504-08:00 CRITICAL     /opt/couchbase/bin/../lib/libstdc++.so.6() [0x7f4b55734000+0x8ef86]
      2021-02-02T18:24:20.272519-08:00 CRITICAL     /opt/couchbase/bin/../lib/libstdc++.so.6() [0x7f4b55734000+0x8efd1]
      2021-02-02T18:24:20.272534-08:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7f4b59061000+0x16eb23]
      2021-02-02T18:24:20.272543-08:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7f4b59061000+0x168b82]
      2021-02-02T18:24:20.272553-08:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7f4b59061000+0x2e7be6]
      2021-02-02T18:24:20.272562-08:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7f4b59061000+0x2d00da]
      2021-02-02T18:24:20.272571-08:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7f4b59061000+0x2ead09]
      2021-02-02T18:24:20.272580-08:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7f4b59061000+0x166fc3]
      2021-02-02T18:24:20.272606-08:00 CRITICAL     /opt/couchbase/bin/../lib/libstdc++.so.6() [0x7f4b55734000+0xb9dcf]
      2021-02-02T18:24:20.272611-08:00 CRITICAL     /lib64/libpthread.so.0() [0x7f4b54fff000+0x7e65]
      2021-02-02T18:24:20.272640-08:00 CRITICAL     /lib64/libc.so.6(clone+0x6d) [0x7f4b54c31000+0xfe88d]
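
The libep.so frames in the backtrace above are unsymbolized, so the module-relative offsets are what an offline symbolizer would need. The following is a minimal Python sketch for extracting them, assuming the `module(symbol) [base+offset]` frame layout seen in this log. `Frame` and `parse_frame` are hypothetical helper names for illustration, not part of any Couchbase tooling.

```python
import re
from typing import NamedTuple, Optional

# Matches e.g. "/lib64/libc.so.6(abort+0x148) [0x7f4b54c31000+0x37a28]".
# The layout is inferred from the log lines in this report (an assumption).
FRAME_RE = re.compile(
    r"(?P<module>/\S+?)\((?P<symbol>[^)]*)\)\s+"
    r"\[(?P<base>0x[0-9a-f]+)\+(?P<offset>0x[0-9a-f]+)\]"
)


class Frame(NamedTuple):
    module: str   # path of the binary or shared library
    symbol: str   # "name+0xNN" if the log resolved one, else ""
    offset: int   # offset of the frame relative to the module base


def parse_frame(line: str) -> Optional[Frame]:
    """Return the frame described by a backtrace log line, or None."""
    m = FRAME_RE.search(line)
    if m is None:
        return None
    return Frame(m["module"], m["symbol"], int(m["offset"], 16))


LIBEP_LINE = (
    "2021-02-02T18:24:20.272534-08:00 CRITICAL     "
    "/opt/couchbase/bin/../lib/libep.so() [0x7f4b59061000+0x16eb23]"
)
LIBC_LINE = (
    "2021-02-02T18:24:20.272436-08:00 CRITICAL     "
    "/lib64/libc.so.6(abort+0x148) [0x7f4b54c31000+0x37a28]"
)
HEADER_LINE = "2021-02-02T18:24:20.272032-08:00 CRITICAL Stack backtrace of crashed thread:"

libep_frame = parse_frame(LIBEP_LINE)
libc_frame = parse_frame(LIBC_LINE)
header_frame = parse_frame(HEADER_LINE)
```

The extracted offsets only become function names when resolved against the exact 7.0.0-4362 binaries with matching debug symbols; the sketch does not attempt that step.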
      

      An 'exception occurred in runloop' error is also logged on the .68 node:

      2021-02-02 18:24:37,249 | test  | CRITICAL | MainThread | [basetestcase:check_coredump_exist:801] 172.23.123.68: Found 'exception occurred in runloop' logs - ['2021-02-02T18:24:20.259974-08:00 WARNING 1598: exception occurred in runloop during packet execution. Closing connection: PassiveDurabilityMonitor::completeSyncWrite vb:141 No tracked, but received commit for key <ud>cid:0x8:test_collections-359</ud>. Cookies: [{"aiostat":"success","connection":"[ {\\"ip\\":\\"127.0.0.1\\",\\"port\\":60264} - {\\"ip\\":\\"127.0.0.1\\",\\"port\\":11209} (<ud>@ns_server</ud>) ]","engine_storage":"0x0000000000000000","ewouldblock":false,"packet":{"bodylen":37,"cas":0,"datatype":"raw","extlen":16,"key":"<ud>.test_collections-359</ud>","keylen":21,"magic":"ClientRequest","opaque":47,"opcode":"DCP_COMMIT","vbucket":141},"refcount":1}]\n']
      


          Activity

            build-team Couchbase Build Team added a comment - Build couchbase-server-7.0.0-4603 contains kv_engine commit 40b0f7c with commit message: MB-44079 : Add HCS to seqlist

            build-team Couchbase Build Team added a comment - Build couchbase-server-7.0.0-4603 contains kv_engine commit 1d9ba6f with commit message: MB-44079 : Refactor test functions for reuse

            sumedh.basarkod Sumedh Basarkod (Inactive) added a comment - Verified by running the test of MB-44255 on 7.0.0-4603 (that MB is a dup of this MB). Closing this. 

            drigby Dave Rigby added a comment - Ben Huddleston Please could you add a description for the release notes here?

            ben.huddleston Ben Huddleston added a comment - Description for release notes:

            Summary: Known Issue. Ephemeral item purging may not be done in seqno order because the purger iterates HashTable buckets rather than the Ephemeral sequence list. As a result, the commit of a durable write can be purged before the corresponding prepare. If a replica vBucket receives a prepare without the corresponding commit, it attempts to recommit the prepare when the vBucket is promoted to active. This causes monotonicity exceptions to be thrown on the new active vBucket and on any replica vBucket that did receive the corresponding commit.

            Workaround: Avoid using durable writes with Ephemeral buckets.
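
The ordering problem in the release note can be illustrated with a toy model. This is a sketch under stated assumptions, not the kv_engine implementation: `Item`, `purge`, `dangling_prepares`, and the visit orders are all hypothetical, every item is treated as purgeable, and `steps` models a reader (for example, a replica being streamed to) observing the data partway through a purge pass.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Item:
    seqno: int
    key: str
    kind: str  # "prepare" or "commit"


def purge(items: List[Item], visit_order: List[int], steps: int) -> List[Item]:
    """Remove the first `steps` seqnos of `visit_order`; return what remains."""
    removed = set(visit_order[:steps])
    return [it for it in items if it.seqno not in removed]


def dangling_prepares(items: List[Item]) -> List[int]:
    """Seqnos of prepares with no later commit for the same key in `items`."""
    dangling = []
    for it in items:
        if it.kind != "prepare":
            continue
        has_commit = any(
            other.kind == "commit"
            and other.key == it.key
            and other.seqno > it.seqno
            for other in items
        )
        if not has_commit:
            dangling.append(it.seqno)
    return dangling


# Two completed durable writes, in seqno order.
ITEMS = [
    Item(1, "k1", "prepare"),
    Item(2, "k1", "commit"),
    Item(3, "k2", "prepare"),
    Item(4, "k2", "commit"),
]

SEQNO_ORDER = [1, 2, 3, 4]  # visiting the sequence list
HASH_ORDER = [2, 1, 4, 3]   # hypothetical hash-bucket visit order

after_seqno = purge(ITEMS, SEQNO_ORDER, steps=1)
after_hash = purge(ITEMS, HASH_ORDER, steps=1)

dangling_seqno = dangling_prepares(after_seqno)
dangling_hash = dangling_prepares(after_hash)
```

In seqno order the prepare always goes before its commit, so no observer sees a prepare without a commit. In the hash-bucket order above, the commit at seqno 2 is removed first, leaving the prepare at seqno 1 looking unresolved, which is the state that would lead a promoted replica to recommit it.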

            People

              sumedh.basarkod Sumedh Basarkod (Inactive)