Couchbase Server / MB-48865

CB Server Docker image (x86) fails to run on Docker-for-Mac on Apple Silicon

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Won't Fix
    • 6.6.0
    • Neo.next
    • build
    • Untriaged
    • 1
    • No
    • Build Team 2021 Sprint 21, Build Team 2022 Sprint 2

    Description

      The Golang binaries in ns_server (gozip, vbmap, goport, godu, minify, gosecrets) are currently built with Go 1.8.5, which hasn't been a supported version for nearly four years. In particular, this is causing an odd situation reported by a user (https://github.com/couchbase/docker/issues/165) where our official Docker images won't run in Docker on Mac M1, because the binaries don't meet some Mac requirement even for running in emulation.

      We should upgrade to something much newer, ideally Go 1.15 or so.

          Activity

            meni.hillel Meni Hillel (Inactive) added a comment -

            Chris Hillery There is no compelling reason to upgrade. In fact, the indexing team has seen issues with newer Golang upgrades, which had to be reverted, so picking a good Golang version is not trivial. An upgrade may cause regressions, which can also be quite subtle. I am willing to try a toy build and run it through rigorous functional and performance tests, but we don't have the time or resources to dedicate to this right now. If you have the cycles and want to give it a try, we will provide as much support as we can.

            ceej Chris Hillery added a comment -

            Meni Hillel True enough; we've definitely had numerous issues doing Golang upgrades (mostly performance-related, which I wouldn't think would affect these tools, but also functionality and even compiling). That's why the build system even allows for multiple Golang versions in the same Server build. However, all of these tools have already been built with Golang 1.13.7 on macOS for years, so it seems reasonable that at least upgrading to that version would be relatively safe. I will run a few tests.

            ceej Chris Hillery added a comment -

            This has been reported by a second user: https://github.com/couchbase/docker/issues/167

            Meni Hillel Please assign this to an appropriate person for prioritization and scheduling. This will only become more relevant as M1 Macs become the norm. It's already the case that our IT in India can no longer get non-M1 Macs from Apple, so even our own developers will be impacted.
            ceej Chris Hillery added a comment - - edited

            Ian McCloy "The emulation" refers to the Linux VM that Docker Desktop on MacOS uses. Binaries compiled on Linux aarch64 using Go 1.8 will not run on Docker Desktop on an M1 Mac. I know it's weird, but it's true - at the very least, that is the information I have today.

            ceej Chris Hillery added a comment -

            Ming Ho has been attempting Docker-on-M1 experiments. He first hit an issue with Erlang JIT which we are addressing; I'll ask him to verify one way or the other whether my current information about the Go 1.8 binaries is true.

            meni.hillel Meni Hillel (Inactive) added a comment -

            Ian McCloy I disagree with this being critical. We did not commit to supporting M1 in the Neo timeframe. We are already looking into it, but I can say quite confidently that it will not be part of Neo.
            ianmccloy Ian McCloy added a comment - - edited

            So an ARM (aarch64) Amazon Linux VM/container running on an ARM v8 Mac M1 doesn't work because of the Golang version in Couchbase Server, but this same container works fine on ARM v8 AWS. That doesn't make sense to me.

            ingenthr Matt Ingenthron added a comment -

            Adjusting to Neo per the discussion in the 20211216 meeting.
            ming.ho Ming Ho added a comment -

            JIT is not an issue with an arm64 container, since Erlang disables JIT by default on arm64/aarch64. It is an issue if one tries to start an x86_64 container on M1 in Rosetta mode, which results in "Segmentation fault".

            I am able to start Couchbase Server and import a sample bucket (beer-sample). I haven't tried anything else, given the potential Golang issue described here.

            meni.hillel Meni Hillel (Inactive) added a comment -

            Timofey Barmin I guess we don't need to use M1 for ARM. As the notes on the ticket suggest, we can use AWS instances.
            drigby Dave Rigby added a comment - - edited

            I think there's a bit of confusion here (understandable, given the complex state of the Apple Silicon / Docker ecosystem). Let me try to clarify / confirm a few things to hopefully aid us in making forward progress.

            1. Rosetta2 only supports macOS x86-64 userspace applications. As such, as soon as one is talking about x86 Linux Docker images then Rosetta2 is out of the picture - it has no role / ability to translate Linux applications.
            2. Docker-for-Mac for Apple Silicon has a somewhat complex software stack: it creates a hidden aarch64/linux Virtual Machine (using the built-in macOS HyperKit framework), and then:
              1. For aarch64/linux Docker images, it creates and runs a Docker container inside the VM.
              2. For x86-64/linux Docker images, it uses the same aarch64/linux VM, but then runs the x86-64/linux userspace applications using QEMU - an open-source, multi-architecture emulator - see ref 1 - docker-for-mac/multi-arch.md.
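
The two cases in the list above can be sketched as a small decision function. This is only an illustration of the description in this comment, not Docker's actual implementation: `normalizeArch` and `executionMode` are hypothetical helper names, and the alias table covers only the architecture spellings used in this ticket.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeArch maps the architecture spellings used in this ticket onto
// GOARCH-style names. Hypothetical helper, for illustration only.
func normalizeArch(arch string) string {
	switch strings.ToLower(arch) {
	case "aarch64", "arm64":
		return "arm64"
	case "x86_64", "x86-64", "amd64":
		return "amd64"
	default:
		return strings.ToLower(arch)
	}
}

// executionMode reports how a Linux image would run inside a Linux VM of the
// given architecture, following the two cases described in the comment above.
func executionMode(vmArch, imageArch string) string {
	if normalizeArch(vmArch) == normalizeArch(imageArch) {
		return "native"
	}
	return "emulated (QEMU)"
}

func main() {
	// Per the comment above, the hidden VM on Apple Silicon is aarch64/linux.
	const vmArch = "aarch64"
	for _, image := range []string{"aarch64", "x86-64"} {
		fmt.Printf("%s/linux image: %s\n", image, executionMode(vmArch, image))
	}
}
```

Under these assumptions, only the x86-64 image takes the QEMU path, which is the configuration reported in couchbase/docker issue #165.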

            Given the above, what I believe is happening with couchbase/docker issue #165 is that the user is trying to run our (x86-64) CB Server docker image under QEMU via Docker. It is QEMU which appears to have issues with some of the instructions emitted by go-1.8.5; but that is ultimately a QEMU/Docker-for-mac bug; nothing to do with CB Server per se.

            Note that running x86 docker images on Apple Silicon via Docker-for-Mac is only supported on a "best-effort" basis - quoting from the Docker-for-Mac Known Issues:

            Not all images are available for ARM64 architecture. You can add --platform linux/amd64 to run an Intel image under emulation. In particular, the mysql image is not available for ARM64. You can work around this issue by using a mariadb image.

            However, attempts to run Intel-based containers on Apple silicon machines under emulation can crash as qemu sometimes fails to run the container. In addition, filesystem change notification APIs (inotify) do not work under qemu emulation. Even when the containers do run correctly under emulation, they will be slower and use more memory than the native equivalent.

            In summary, running Intel-based containers on Arm-based machines should be regarded as “best effort” only. We recommend running arm64 containers on Apple silicon machines whenever possible, and encouraging container authors to produce arm64, or multi-arch, versions of their containers. We expect this issue to become less common over time, as more and more images are rebuilt supporting multiple architectures.

            To show that this is a practical limitation and not just Docker covering themselves, see these other bug reports about the embedded QEMU failing to run various other (x86) Docker images in this deployment: [2] (postgres), [3] (neo4j), [4] (Jupyter).

            A choice quote to back this up from the Docker devs:

            Running containers under emulation is documented as "best effort" only. We know that qemu sometimes crashes, but we have no control over that. Even when it works, it is likely to be low performance. The only long-term solution is to use multi-arch images, or images targetted at your native architecture.

            Conclusion

            Ok, so where does this leave us with respect to this bug? Well, as per the above quote there's probably little we can do if our x86-64 Linux CB Server images don't work correctly on Docker-for-Mac-for-Apple-Silicon - it's an issue / limitation with QEMU's emulation. As for the larger question of how Apple Silicon users should run CB Server Docker images, I believe the answer is a native-architecture (i.e. aarch64) image, which does not require QEMU (or Rosetta2) and can simply run natively on Apple Silicon machines.

            As such, I propose this issue is resolved as "Not a Bug / Known Error" or similar; referring back to the Docker documentation about this not being a fully supported configuration.

            References:
            [1]: https://github.com/docker/docker.github.io/blob/2d8b420d3c49712ec4a7bcec1464278fa4c41936/docker-for-mac/multi-arch.md
            [2]: https://github.com/docker/for-mac/issues/6016 - Intermittent failures with certain amd64 images when using > 1 CPU (Apple M1)
            [3]: https://github.com/docker/for-mac/issues/6060 - Running amd64 neo4j container gets randomly stuck on M1
            [4]: https://github.com/docker/for-mac/issues/6097 - M1 Mac amd64 Jupyter kernel errors in Docker for Mac >= 4.1.1

            ceej Chris Hillery added a comment -

            Dave Rigby Thank you for the comprehensive explanation. As for your conclusion:

            I propose this issue is resolved as "Not a Bug / Known Error" or similar; referring back to the Docker documentation about this not being a fully supported configuration.

            I agree, provided that the Linux aarch64 ns_server binaries (godu, etc.) can run in a Docker image on an M1 Mac. Again, my understanding was that this is NOT true; however, I'm having trouble finding the source of that information, so it's quite possible I crossed some wires as well. Based on Ming's last comment, it sounds like this may work after all.

            I'm assigning this to myself to get to a final answer.

            meni.hillel Meni Hillel (Inactive) added a comment -

            Removing NS_SERVER, as it is not clear to me whether we need to do something on our end. Please re-add it and clarify the ask if needed.

            ceej Chris Hillery added a comment -

            As confirmed by Ian McCloy and others, the newly-created Linux aarch64 Docker image does work OK on M1 Macs. So my initial information here was incorrect, and there is no need to fix anything.
            drigby Dave Rigby added a comment -

            The current bug title doesn’t match the rationale for closing - the x86_64 Linux images still have problems. Either this needs the title changing to refer to aarch64/Linux Docker images, or it needs re-opening.

            ceej Chris Hillery added a comment -

            Dave Rigby You're not wrong - I'll re-close it as "Will Not Fix". The initial bug was legit, but the solution is "use the aarch64 image (once it's available)".

            People

              ceej Chris Hillery
              ceej Chris Hillery