We need to decide how to resolve this issue. The bug is in upstream TCMalloc, but it's in some pretty hairy code which runtime-patches the Windows CRT. It appears that some change in the Windows 10 Anniversary Update has triggered this.
- Note that the runtime patching is essentially an unsupported API as far as Microsoft is concerned, so there's no guarantee of stability across different Windows releases.
- Note also that we no longer use TCMalloc on the other supported platforms (Linux and OS X); instead we use jemalloc. The only reason we still use TCMalloc on Windows is that it can automatically replace malloc/free, which jemalloc cannot do.
There are essentially two possible ways to resolve this issue:
1. Fix TCMalloc, either by getting a fix from upstream or by fixing it ourselves. Given that it uses unsupported, unstable and undocumented APIs, this is potentially quite difficult.
2. Update our code to explicitly call into our own malloc hooks (e.g. cb_malloc). This is conceptually straightforward - we need to audit all our code in memcached and replace any C-style memory allocation calls (malloc / realloc / free ...) with either C++ calls (which can be hooked on all platforms) or calls to our own functions. The main challenge here is just going through and updating all of our code - or at least all the code whose memory usage we want to track.
I'm somewhat hesitant to rely on (1) - this is pretty hairy code, and even if we do get a fix from upstream there's no guarantee that some subsequent release of Windows won't break things again. While (2) does require fixes on our part, once done it means we are "masters of our own destiny" and are using only supported APIs. Additionally, it would allow us to restore parity between Windows and Linux - we would be able to enable the Defragmenter on Windows.
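
To make the second option concrete, here is a minimal sketch of what explicit hooks could look like. Only the name cb_malloc comes from the discussion above; cb_free, the signatures, and the live_allocations counter are assumptions made for illustration, and the real hooks would report to our memory tracking rather than to a simple counter. The replacement of the global operator new / operator delete is standard C++, which is why C++ allocations can be hooked on every platform without patching the CRT.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical stand-in for real memory tracking: counts live allocations.
static std::atomic<std::size_t> live_allocations{0};

// Assumed signature, mirroring malloc(). Explicit C-style entry point.
void* cb_malloc(std::size_t size) {
    void* ptr = std::malloc(size);
    if (ptr != nullptr) {
        live_allocations.fetch_add(1, std::memory_order_relaxed);
    }
    return ptr;
}

// Assumed signature, mirroring free(). Hypothetical counterpart to cb_malloc.
void cb_free(void* ptr) {
    if (ptr != nullptr) {
        live_allocations.fetch_sub(1, std::memory_order_relaxed);
        std::free(ptr);
    }
}

// Replace the global allocation functions so that every new / delete
// (including the array forms, which forward here by default) is tracked.
void* operator new(std::size_t size) {
    if (size == 0) {
        size = 1;  // operator new must return a unique non-null pointer
    }
    if (void* ptr = cb_malloc(size)) {
        return ptr;
    }
    throw std::bad_alloc();
}

void operator delete(void* ptr) noexcept {
    cb_free(ptr);
}

void operator delete(void* ptr, std::size_t) noexcept {
    cb_free(ptr);
}
```

With something like this in place, the audit consists of changing remaining C call sites from malloc(n) / free(p) to cb_malloc(n) / cb_free(p), plus equivalents for realloc and calloc, while C++ code using new / delete is picked up automatically.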