  1. Couchbase Server
  2. MB-45347

[System Test] memcached crash observed in longevity - Collections::VB::Manifest::verifyFlatbuffersData: getCreateScopeEventData data invalid

Details

    • Untriaged
    • 1
    • Unknown
    • KV-Engine 2021-March

    Description

      7.0.0-4809

      Test:
      -test tests/integration/cheshirecat/test_cheshirecat_kv_gsi_coll_xdcr_backup_sgw_fts_itemct_txns_eventing_cbas.yml -scope tests/integration/cheshirecat/scope_cheshirecat_with_backup.yml
      Scale 3
      Iteration 1

      .108.103:

      Service 'memcached' exited with status 134. Restarting. Messages:
      2021-03-30T13:31:05.829353-07:00 CRITICAL #4 /lib64/libc.so.6(gsignal+0x37) [0x7fa8554b5000+0x351d7]
      2021-03-30T13:31:05.829370-07:00 CRITICAL #5 /lib64/libc.so.6(abort+0x148) [0x7fa8554b5000+0x368c8]
      2021-03-30T13:31:05.829397-07:00 CRITICAL #6 /opt/couchbase/bin/../lib/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x125) [0x7fa855fab000+0x91195]
      2021-03-30T13:31:05.829405-07:00 CRITICAL #7 /opt/couchbase/bin/memcached() [0x400000+0x159a62]
      2021-03-30T13:31:05.829419-07:00 CRITICAL #8 /opt/couchbase/bin/../lib/libstdc++.so.6() [0x7fa855fab000+0x8ef86]
      2021-03-30T13:31:05.829433-07:00 CRITICAL #9 /opt/couchbase/bin/../lib/libstdc++.so.6() [0x7fa855fab000+0x8efd1]
      2021-03-30T13:31:05.829455-07:00 CRITICAL #10 /opt/couchbase/bin/../lib/libstdc++.so.6() [0x7fa855fab000+0xb9dfe]
      2021-03-30T13:31:05.829460-07:00 CRITICAL #11 /lib64/libpthread.so.0() [0x7fa855876000+0x7dc5]
      2021-03-30T13:31:05.829487-07:00 CRITICAL #12 /lib64/libc.so.6(clone+0x6d) [0x7fa8554b5000+0xf776d]
      

      .97.119:

      2021-03-30T13:31:05.552830-07:00 INFO 1416: (No Engine) DCP (Producer) eq_dcpq:eventing:jilnBadK-38671:getfailoverlog-ITEM - Removing connection [ {"ip":"172.23.98.135","port":52451} - {"ip":"172.23.97.119","port":11210} (System, <ud>@eventing</ud>) ]
      2021-03-30T13:31:05.556051-07:00 CRITICAL *** Fatal error encountered during exception handling ***
      2021-03-30T13:31:05.556101-07:00 CRITICAL Caught unhandled std::exception-derived exception. what(): Collections::VB::Manifest::verifyFlatbuffersData: getCreateScopeEventData data invalid, ptr:0x7fa7adaa1749, size:0
      2021-03-30T13:31:05.644409-07:00 INFO 1431: (No Engine) DCP (Producer) eq_dcpq:eventing:jilnBadK-38693:getfailoverlog-ITEM - Removing connection [ {"ip":"172.23.98.135","port":52473} - {"ip":"172.23.97.119","port":11210} (System, <ud>@eventing</ud>) ]
      2021-03-30T13:31:05.653962-07:00 INFO 1419: Using SSL cipher:TLS_AES_256_GCM_SHA384
      2021-03-30T13:31:05.654202-07:00 INFO 1419: HELO [GoMemcached] XATTR, JSON, Collections [ {"ip":"172.23.120.245","port":41940} - {"ip":"172.23.97.119","port":11207} (not authenticated) ]
      2021-03-30T13:31:05.764907-07:00 INFO 1419: Client {"ip":"172.23.120.245","port":41940} authenticated as <ud>@cbq-engine</ud>
      2021-03-30T13:31:05.829086-07:00 CRITICAL Breakpad caught a crash (Couchbase version 7.0.0-4809). Writing crash dump to /opt/couchbase/var/lib/couchbase/crash/ee5bc93b-0b30-4451-59a81ab4-b597ecd8.dmp before terminating.
      2021-03-30T13:31:05.829119-07:00 CRITICAL Stack backtrace of crashed thread:
      2021-03-30T13:31:05.829311-07:00 CRITICAL     #0  /opt/couchbase/bin/memcached() [0x400000+0x14a96d]
      2021-03-30T13:31:05.829321-07:00 CRITICAL     #1  /opt/couchbase/bin/../lib/libdefault_engine.so(_ZN15google_breakpad16ExceptionHandler12GenerateDumpEPNS0_12CrashContextE+0x3ea) [0x7fa859dc3000+0x308ca]
      2021-03-30T13:31:05.829326-07:00 CRITICAL     #2  /opt/couchbase/bin/../lib/libdefault_engine.so(_ZN15google_breakpad16ExceptionHandler13SignalHandlerEiP9siginfo_tPv+0xb8) [0x7fa859dc3000+0x30c08]
      2021-03-30T13:31:05.829332-07:00 CRITICAL     #3  /lib64/libpthread.so.0() [0x7fa855876000+0xf370]
      2021-03-30T13:31:05.829353-07:00 CRITICAL     #4  /lib64/libc.so.6(gsignal+0x37) [0x7fa8554b5000+0x351d7]
      2021-03-30T13:31:05.829370-07:00 CRITICAL     #5  /lib64/libc.so.6(abort+0x148) [0x7fa8554b5000+0x368c8]
      2021-03-30T13:31:05.829397-07:00 CRITICAL     #6  /opt/couchbase/bin/../lib/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x125) [0x7fa855fab000+0x91195]
      2021-03-30T13:31:05.829405-07:00 CRITICAL     #7  /opt/couchbase/bin/memcached() [0x400000+0x159a62]
      2021-03-30T13:31:05.829419-07:00 CRITICAL     #8  /opt/couchbase/bin/../lib/libstdc++.so.6() [0x7fa855fab000+0x8ef86]
      2021-03-30T13:31:05.829433-07:00 CRITICAL     #9  /opt/couchbase/bin/../lib/libstdc++.so.6() [0x7fa855fab000+0x8efd1]
      2021-03-30T13:31:05.829455-07:00 CRITICAL     #10 /opt/couchbase/bin/../lib/libstdc++.so.6() [0x7fa855fab000+0xb9dfe]
      2021-03-30T13:31:05.829460-07:00 CRITICAL     #11 /lib64/libpthread.so.0() [0x7fa855876000+0x7dc5]
      2021-03-30T13:31:05.829487-07:00 CRITICAL     #12 /lib64/libc.so.6(clone+0x6d) [0x7fa8554b5000+0xf776d]
      

      Logs:
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1617137474/collectinfo-2021-03-30T205116-ns_1%40172.23.104.155.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1617137474/collectinfo-2021-03-30T205116-ns_1%40172.23.104.5.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1617137474/collectinfo-2021-03-30T205116-ns_1%40172.23.106.100.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1617137474/collectinfo-2021-03-30T205116-ns_1%40172.23.106.188.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1617137474/collectinfo-2021-03-30T205116-ns_1%40172.23.108.103.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1617137474/collectinfo-2021-03-30T205116-ns_1%40172.23.120.245.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1617137474/collectinfo-2021-03-30T205116-ns_1%40172.23.121.117.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1617137474/collectinfo-2021-03-30T205116-ns_1%40172.23.121.3.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1617137474/collectinfo-2021-03-30T205116-ns_1%40172.23.123.27.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1617137474/collectinfo-2021-03-30T205116-ns_1%40172.23.123.28.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1617137474/collectinfo-2021-03-30T205116-ns_1%40172.23.96.148.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1617137474/collectinfo-2021-03-30T205116-ns_1%40172.23.96.251.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1617137474/collectinfo-2021-03-30T205116-ns_1%40172.23.96.252.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1617137474/collectinfo-2021-03-30T205116-ns_1%40172.23.96.253.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1617137474/collectinfo-2021-03-30T205116-ns_1%40172.23.97.119.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1617137474/collectinfo-2021-03-30T205116-ns_1%40172.23.97.121.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1617137474/collectinfo-2021-03-30T205116-ns_1%40172.23.97.122.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1617137474/collectinfo-2021-03-30T205116-ns_1%40172.23.97.239.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1617137474/collectinfo-2021-03-30T205116-ns_1%40172.23.97.242.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1617137474/collectinfo-2021-03-30T205116-ns_1%40172.23.98.135.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1617137474/collectinfo-2021-03-30T205116-ns_1%40172.23.99.11.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1617137474/collectinfo-2021-03-30T205116-ns_1%40172.23.99.20.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1617137474/collectinfo-2021-03-30T205116-ns_1%40172.23.99.21.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1617137474/collectinfo-2021-03-30T205116-ns_1%40172.23.99.25.zip

        Activity

          jwalker Jim Walker added a comment - - edited

          After a quick look at this (and at where we use the failing function), I wonder whether this is a DCP "no_value" stream.

          For example, just before the crash the log shows:

          2021-03-30T13:31:05.284037-07:00 INFO 1426: DCP connection opened successfully. PRODUCER, NO_VALUE, DELETE_TIMES [ {"ip":"172.23.123.27","port":56350} - {"ip":"172.23.97.119","port":11210} (System, <ud>@eventing</ud>) ]
          

          If the no_value setting was in effect, we might have called into here with an Item that has no value (I haven't checked whether that is possible, but I suspect it is):

          http://src.couchbase.org/source/xref/trunk/kv_engine/engines/ep/src/systemevent_factory.cc#114

          If the DCP stream is no_value, we probably can't send system events (or at least we have to special-case them). We need to figure out whether not sending system events is a problem: if a system event has no value, you would only know that something has happened to a scope/collection, but not what (create/drop).
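The suspected failure mode can be modelled in a few lines. This is a hedged Python sketch, not kv_engine code (which is C++): `Item`, `strip_value` and `verify_event_data` are hypothetical names invented for illustration, the key and payload bytes are made up, and the real check verifies a FlatBuffers buffer rather than just its length. It only shows why a system event whose value has been dropped cannot pass a payload check.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Item:
    """Hypothetical stand-in for a stored item; not the kv_engine Item class."""
    key: str
    value: bytes
    is_system_event: bool = False


def strip_value(item: Item) -> Item:
    """Model a NO_VALUE stream: keep the key and flags, drop the value."""
    return replace(item, value=b"")


def verify_event_data(item: Item) -> bytes:
    """Stand-in for the payload check: an empty buffer can never be valid."""
    if len(item.value) == 0:
        raise ValueError(f"event data invalid, size:{len(item.value)}")
    return item.value


# Illustrative system event with a made-up payload.
create_scope = Item("_scope:example", b"\x01scope-payload", is_system_event=True)

# With the value intact, the check passes.
verify_event_data(create_scope)

# After NO_VALUE stripping, the same check raises.
try:
    verify_event_data(strip_value(create_scope))
    failed = False
except ValueError:
    failed = True
```

In this model the stripped event fails with a size of 0, which matches the `size:0` reported in the exception message above. Whether the real code path reaches the check this way is the open question in this comment.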
          drigby Dave Rigby added a comment -

          I suspect we need to send system events regardless of whether NO_VALUE is set, i.e. NO_VALUE is only applicable to "normal" documents.

          paolo.cocchi Paolo Cocchi added a comment - - edited

          Jim Walker: I'm still checking all this out, but if that's the case then I think we should simply not apply that setting to SystemEvents. I think NoValue is just an optimization for DCP clients that don't need the full mutation payload.


          arunkumar Arunkumar Senthilnathan (Inactive) added a comment -

          Observed in build sanity as well
          paolo.cocchi Paolo Cocchi added a comment - - edited

          DCP clients that open DCP Producers with NO_VALUE may hit this issue at backfill.
          Eventing has recently started using the NO_VALUE flag at DcpOpen, which is probably the reason why we see this issue now.
          Fix in progress.


          build-team Couchbase Build Team added a comment -

          Build couchbase-server-7.0.0-4873 contains kv_engine commit 11990ca with commit message:
          MB-45347: Backfill always reads the full payload for System Events
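The policy named in that commit message can be sketched as follows. This is an illustrative Python model only, not the content of commit 11990ca, which this ticket does not show: `ValueFilter` and `backfill_value_filter` are hypothetical names, and the assumption is simply that a NO_VALUE stream normally reads keys only at backfill while system events are exempt from that optimization.

```python
from enum import Enum


class ValueFilter(Enum):
    """Hypothetical choice of what a backfill reads for each item."""
    KEYS_ONLY = "keys_only"
    VALUES = "values"


def backfill_value_filter(stream_no_value: bool, is_system_event: bool) -> ValueFilter:
    """Pick what to read from disk for one item during backfill.

    Assumed policy: system events always get their full payload;
    NO_VALUE only trims ordinary documents.
    """
    if is_system_event:
        return ValueFilter.VALUES
    return ValueFilter.KEYS_ONLY if stream_no_value else ValueFilter.VALUES
```

Under this sketch a NO_VALUE client such as Eventing still receives key-only mutations for normal documents, while create/drop scope and collection events keep the payload that the consumer side needs to interpret them.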

          mihir.kamdar Mihir Kamdar (Inactive) added a comment -

          Issue not seen in the run with 7.0.0-4910

          People

            arunkumar Arunkumar Senthilnathan (Inactive)
            arunkumar Arunkumar Senthilnathan (Inactive)