  1. Couchbase Server
  2. MB-20519

memcached in the Watson build crashes constantly after Windows 10 is updated to the Anniversary Update (1607)

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 4.5.0
    • Fix Version/s: 4.6.0
    • Component/s: couchbase-bucket
    • Labels:
    • Environment:
      windows 10 64-bit
    • Triage:
      Untriaged
    • Operating System:
      Windows 64-bit
    • Is this a Regression?:
      Unknown

      Description

      Windows 10 64-bit, Windows build 151x.
      Install Couchbase Server 4.5.0 GA. Couchbase Server works as expected.
      I am able to create the default bucket and the travel-sample bucket.
      Both buckets are up and work as expected.
      Uninstall Couchbase Server.
      Update Windows 10 to the Anniversary Update build (1607).
      Install Couchbase Server 4.5.0 GA again.
      Create the travel-sample bucket. Bucket creation fails.
      Check Windows services: memcached is crashing constantly.


            Activity

            drigby Dave Rigby added a comment -

            Need to decide how to resolve this issue. The bug is in upstream TCMalloc, but it's in some pretty hairy code which runtime-patches the Windows CRT. It appears some change in Windows 10 Anniversary update has triggered this.

            • (Note that the runtime patching is essentially an unsupported API as far as Microsoft is concerned, so there's no guarantee of stability across different releases.)
            • Note (2) that we no longer use TCMalloc on the other supported platforms (Linux and OS X); instead we use je_malloc. The only reason we still use TCMalloc on Windows is that it can automatically replace malloc/free, which je_malloc cannot do.

            There are essentially two possible ways to resolve this issue:

            1. Fix TCMalloc, either by getting it fixed upstream or by fixing it ourselves. Given that it is using unsupported, unstable & undocumented APIs, this is potentially quite difficult.
            2. Update our code to explicitly call into our own malloc hooks (e.g. cb_malloc). This is conceptually straightforward - we need to audit all our code in memcached and replace any C-style memory allocation calls (malloc / realloc / free ...) with either C++ calls (which can be hooked on all platforms) or calls to our own functions. The main challenge here is just going through and updating all of our code - or at least all the code whose memory usage we want to track.

            I'm somewhat hesitant to rely on (1) - this is pretty hairy code, and even if we do get a fix from upstream, there's no guarantee that some subsequent release of Windows won't break things again. While (2) does require fixes on our part, once done it means we are "masters of our own destiny" and are using supported APIs. Additionally, it would allow us to restore parity between Windows and Linux - we would be able to enable the Defragmenter on Windows.
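For illustration only, option (2) might look roughly like the sketch below: explicit wrapper functions that account for memory usage and then forward to the C allocator, so no runtime patching of the CRT is needed. Only the name cb_malloc comes from the comment above. Its real signature is not shown in this ticket, and cb_free, cb_allocated_bytes and the size header are hypothetical choices made for this example.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical sketch: these are NOT the actual memcached/platform functions.
namespace {
// Total bytes currently handed out through cb_malloc.
std::atomic<std::size_t> g_allocated{0};

// Each block is prefixed with a header holding its requested size, so that
// cb_free can subtract the right amount. The header is as large as
// max_align_t so the pointer returned to the caller keeps malloc's alignment.
constexpr std::size_t kHeader = sizeof(std::max_align_t);
static_assert(kHeader >= sizeof(std::size_t), "header must hold a size_t");
}  // namespace

// Allocate `size` bytes and record them in the usage counter.
void* cb_malloc(std::size_t size) {
    if (size > SIZE_MAX - kHeader) {
        return nullptr;  // size + header would overflow
    }
    void* raw = std::malloc(size + kHeader);
    if (raw == nullptr) {
        return nullptr;
    }
    *static_cast<std::size_t*>(raw) = size;
    g_allocated.fetch_add(size, std::memory_order_relaxed);
    return static_cast<char*>(raw) + kHeader;
}

// Release a block obtained from cb_malloc and update the usage counter.
void cb_free(void* ptr) {
    if (ptr == nullptr) {
        return;  // same contract as free(nullptr)
    }
    void* raw = static_cast<char*>(ptr) - kHeader;
    const std::size_t size = *static_cast<std::size_t*>(raw);
    assert(g_allocated.load(std::memory_order_relaxed) >= size);
    g_allocated.fetch_sub(size, std::memory_order_relaxed);
    std::free(raw);
}

// Bytes currently allocated through the wrappers.
std::size_t cb_allocated_bytes() {
    return g_allocated.load(std::memory_order_relaxed);
}
```

With wrappers of this kind, the audit described in (2) amounts to replacing each direct malloc / free call with the tracked equivalent. A pointer obtained from cb_malloc must only be released with cb_free, because of the size header placed in front of the block.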

            drigby Dave Rigby added a comment -

            BTW, it's highly likely that MB-19615 (memcached crashing on Windows Server 2016 Tech Preview 5) is a duplicate of this.

            ritam.sharma Ritam Sharma added a comment -

            Thuan Nguyen - What are the next steps on this ticket? There are a few failures on the job.

            thuan Thuan Nguyen added a comment -

            Tested on build 4.5.1-2845. I could not reproduce this issue.

            ceej Chris Hillery added a comment -

            Note for anyone interested: 4.5.1 CE is being released, and we've decided to release 4.5.1-2845 on Windows as "4.5.1 CE" since presumably Windows 10 Anniversary and Windows Server 2016 are much more prevalent now.


              People

              • Assignee:
                thuan Thuan Nguyen
                Reporter:
                thuan Thuan Nguyen
              • Votes:
                1
                Watchers:
                17
