Couchbase Server / MB-58046

Unable to debug programs using OpenSSL (e.g. KV-Engine unit tests) after OpenSSL 3.1.1 upgrade

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • 7.6.0
    • 7.6.0
    • build
    • M1 Pro (arm64), macOS 12.6.5
    • Untriaged
    • 0
    • Yes

    Description

      After updating tlm deps recently and picking up the new version of OpenSSL via MB-57839, I am no longer able to run the debugger on any programs linking to libcrypto.3.dylib, such as KV-Engine unit tests (ep-engine_ep_unit_tests).

      When I try, the debugger stops with an EXC_BAD_INSTRUCTION exception:

      $ lldb -- ./ep-engine_ep_unit_tests --gtest_filter=*stream_request_uid* -v -v
      (lldb) target create "./ep-engine_ep_unit_tests"
      Current executable set to '/Users/dave/repos/couchbase/server/source/build-debug-arm64/kv_engine/ep-engine_ep_unit_tests' (arm64).
      (lldb) settings set -- target.run-args  "--gtest_filter=*stream_request_uid*" "-v" "-v"
      (lldb) b DcpConsumer::handleNoop
      Breakpoint 1: where = ep-engine_ep_unit_tests`DcpConsumer::handleNoop(DcpMessageProducersIface&) + 32 at consumer.cc:1522:9, address = 0x00000001000a4db4
      (lldb) r
      Process 41446 launched: '/Users/dave/repos/couchbase/server/source/build-debug-arm64/kv_engine/ep-engine_ep_unit_tests' (arm64)
      Process 41446 stopped
      * thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_INSTRUCTION (code=1, subcode=0x4a03000)
          frame #0: 0x000000010d23d008 libcrypto.3.dylib` _armv8_sve_probe 
      libcrypto.3.dylib`:
      ->  0x10d23d008 <+0>: eor    z0.d, z0.d, z0.d
          0x10d23d00c <+4>: ret    
       
      libcrypto.3.dylib`:
          0x10d23d010 <+0>: xar    z0.d, z0.d, z0.d, #0x20
          0x10d23d014 <+4>: ret    
      Target 0: (ep-engine_ep_unit_tests) stopped.
      (lldb) bt
      * thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_INSTRUCTION (code=1, subcode=0x4a03000)
        * frame #0: 0x000000010d23d008 libcrypto.3.dylib` _armv8_sve_probe 
          frame #1: 0x000000010d23d7a4 libcrypto.3.dylib` OPENSSL_cpuid_setup  + 924
          frame #2: 0x000000010bf9df4c dyld` invocation function for block in dyld4::Loader::findAndRunAllInitializers(dyld4::RuntimeState&) const  + 164
          frame #3: 0x000000010bfc7784 dyld` invocation function for block in dyld3::MachOAnalyzer::forEachInitializer(Diagnostics&, dyld3::MachOAnalyzer::VMAddrConverter const&, void (unsigned int) block_pointer, void const*) const  + 340
          frame #4: 0x000000010bfbded8 dyld` invocation function for block in dyld3::MachOFile::forEachSection(void (dyld3::MachOFile::SectionInfo const&, bool, bool&) block_pointer) const  + 528
          frame #5: 0x000000010bf89f98 dyld` dyld3::MachOFile::forEachLoadCommand(Diagnostics&, void (load_command const*, bool&) block_pointer) const  + 168
          frame #6: 0x000000010bfbdc80 dyld` dyld3::MachOFile::forEachSection(void (dyld3::MachOFile::SectionInfo const&, bool, bool&) block_pointer) const  + 192
          frame #7: 0x000000010bfc71d4 dyld` dyld3::MachOAnalyzer::forEachInitializer(Diagnostics&, dyld3::MachOAnalyzer::VMAddrConverter const&, void (unsigned int) block_pointer, void const*) const  + 516
          frame #8: 0x000000010bf9de8c dyld` dyld4::Loader::findAndRunAllInitializers(dyld4::RuntimeState&) const  + 172
          frame #9: 0x000000010bf9e038 dyld` dyld4::Loader::runInitializersBottomUp(dyld4::RuntimeState&, dyld3::Array<dyld4::Loader const*>&) const  + 216
          frame #10: 0x000000010bf9e014 dyld` dyld4::Loader::runInitializersBottomUp(dyld4::RuntimeState&, dyld3::Array<dyld4::Loader const*>&) const  + 180
          frame #11: 0x000000010bf9e014 dyld` dyld4::Loader::runInitializersBottomUp(dyld4::RuntimeState&, dyld3::Array<dyld4::Loader const*>&) const  + 180
          frame #12: 0x000000010bf9e014 dyld` dyld4::Loader::runInitializersBottomUp(dyld4::RuntimeState&, dyld3::Array<dyld4::Loader const*>&) const  + 180
          frame #13: 0x000000010bf9e104 dyld` dyld4::Loader::runInitializersBottomUpPlusUpwardLinks(dyld4::RuntimeState&) const  + 124
          frame #14: 0x000000010bfb33ac dyld` dyld4::APIs::runAllInitializersForMain()  + 312
          frame #15: 0x000000010bf8ddbc dyld` dyld4::prepare(dyld4::APIs&, dyld3::MachOAnalyzer const*)  + 3136
          frame #16: 0x000000010bf8d06c dyld` start  + 488
      

      i.e. it appears that the function _armv8_sve_probe in libcrypto.3.dylib executes an instruction (the SVE eor shown in the disassembly above) which is illegal on this CPU, and the debugger stops on the resulting exception.

      If I roll back to the previous version of OpenSSL (git revert ac4049c) and rebuild, the problem goes away.

      There's some discussion of this on SO. In summary, recent versions of OpenSSL detect ARMv8 extensions (e.g. SVE2) by attempting to execute instructions from them, and these instructions are not supported on Apple M1. OpenSSL installs a signal handler to catch the resulting illegal-instruction signal, notes that the extension is not supported, and continues.

      That links to GitHub issue 20753: EXC_BAD_INSTRUCTION in lib crypto.3.dylib when v3.1.0 is run under the debugger. v3.0.8 does not generate problem lib. (Apple M1/M2).

      Following the path through the repo, OpenSSL have disabled this sigill-style feature detection for Apple Silicon on the openssl-3.1 branch as of Jun 25 - https://github.com/openssl/openssl/commit/50af7294e514a2aba19c5248a4ed612ba3ba4c1b

      However, that fix has not yet been included in a release: OpenSSL 3.1.1 (the latest release) came out on 30th May, and OpenSSL 3.1.2 is scheduled for 1st August (https://mta.openssl.org/pipermail/openssl-announce/2023-July/000266.html).

      Workaround

      Roll back to the previous version of OpenSSL (3.0.7) - from the top-level of a checkout:

      cd tlm
      git revert ac4049c
      <rebuild as normal>
      

      Potential Workaround (or not...)
      According to the above SO post, if we configure lldb to ignore this exception type then we can still debug programs using OpenSSL 3.1:

      settings set platform.plugin.darwin.ignored-exceptions EXC_BAD_INSTRUCTION
      process handle SIGILL -n false -p true -s false
      

      However, that doesn't work in my environment (macOS 12.6.5, lldb-1400.0.38.17); I get an error when setting ...ignored-exceptions:

      error: invalid value path 'platform.plugin.darwin.ignored-exceptions'
      

          People

            owend Daniel Owen
            drigby Dave Rigby (Inactive)