Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-50325

Windows performance runs failed build 7.1.0-2021 when configuring RAM quota

    XMLWordPrintable

Details

    • Triaged
    • 1
    • Yes
    • Build Team 2022 Sprint 1

    Description

      All windows performance runs failed on 7.1.0-2021. The runs were trying to configure RAM quota, and they failed after 20 attempts. 

      2022-01-10T10:41:26 [INFO] Configuring data RAM quota: 20480 MB

      2022-01-10T10:41:31 [WARNING] ["Unexpected server error, request logged."]

      2022-01-10T10:41:31 [WARNING] Retrying http://zeus-srv-01.perf.couchbase.com:8091/pools/default

      ...

      2022-01-10T10:46:16 [WARNING] Retrying http://zeus-srv-01.perf.couchbase.com:8091/pools/default

      2022-01-10T10:46:26 [ERROR] Request http://zeus-srv-01.perf.couchbase.com:8091/pools/default failed after 20 attempts

      Build step 'Execute shell' marked build as failure

      Attachments

        Issue Links

          No reviews matched the request. Check your Options in the drop-down menu of this sections header.

          Activity

            dfinlay Dave Finlay added a comment -

            Steve - would you mind looking into this one?

            dfinlay Dave Finlay added a comment - Steve - would you mind looking into this one?

            From the ns_server.http_access.log we see the requests that are getting the server error. They are likely timeouts as they're all ~5000 msecs in processing time.

            172.23.100.212 - - [10/Jan/2022:10:41:18 -0800] "POST /pools/default HTTP/1.1" 500 44 - "python-requests/2.24.0" 5000
            172.23.100.212 - - [10/Jan/2022:10:41:33 -0800] "POST /pools/default HTTP/1.1" 500 44 - "python-requests/2.24.0" 5000
            172.23.100.212 - - [10/Jan/2022:10:41:48 -0800] "POST /pools/default HTTP/1.1" 500 44 - "python-requests/2.24.0" 5000
            172.23.100.212 - - [10/Jan/2022:10:42:03 -0800] "POST /pools/default HTTP/1.1" 500 44 - "python-requests/2.24.0" 5000
            172.23.100.212 - - [10/Jan/2022:10:42:18 -0800] "POST /pools/default HTTP/1.1" 500 44 - "python-requests/2.24.0" 5000
            172.23.100.212 - - [10/Jan/2022:10:42:33 -0800] "POST /pools/default HTTP/1.1" 500 44 - "python-requests/2.24.0" 5000
            172.23.100.212 - - [10/Jan/2022:10:42:48 -0800] "POST /pools/default HTTP/1.1" 500 44 - "python-requests/2.24.0" 5000
            172.23.100.212 - - [10/Jan/2022:10:43:03 -0800] "POST /pools/default HTTP/1.1" 500 44 - "python-requests/2.24.0" 5000
            172.23.100.212 - - [10/Jan/2022:10:43:18 -0800] "POST /pools/default HTTP/1.1" 500 44 - "python-requests/2.24.0" 5015
            172.23.100.212 - - [10/Jan/2022:10:43:33 -0800] "POST /pools/default HTTP/1.1" 500 44 - "python-requests/2.24.0" 5000
            172.23.100.212 - - [10/Jan/2022:10:43:48 -0800] "POST /pools/default HTTP/1.1" 500 44 - "python-requests/2.24.0" 5016
            172.23.100.212 - - [10/Jan/2022:10:44:03 -0800] "POST /pools/default HTTP/1.1" 500 44 - "python-requests/2.24.0" 5015
            172.23.100.212 - - [10/Jan/2022:10:44:19 -0800] "POST /pools/default HTTP/1.1" 500 44 - "python-requests/2.24.0" 5000
            172.23.100.212 - - [10/Jan/2022:10:44:34 -0800] "POST /pools/default HTTP/1.1" 500 44 - "python-requests/2.24.0" 5000
            172.23.100.212 - - [10/Jan/2022:10:44:49 -0800] "POST /pools/default HTTP/1.1" 500 44 - "python-requests/2.24.0" 5000
            172.23.100.212 - - [10/Jan/2022:10:45:04 -0800] "POST /pools/default HTTP/1.1" 500 44 - "python-requests/2.24.0" 5000
            172.23.100.212 - - [10/Jan/2022:10:45:19 -0800] "POST /pools/default HTTP/1.1" 500 44 - "python-requests/2.24.0" 5000
            172.23.100.212 - - [10/Jan/2022:10:45:34 -0800] "POST /pools/default HTTP/1.1" 500 44 - "python-requests/2.24.0" 5000
            172.23.100.212 - - [10/Jan/2022:10:45:49 -0800] "POST /pools/default HTTP/1.1" 500 44 - "python-requests/2.24.0" 5000
            172.23.100.212 - - [10/Jan/2022:10:46:04 -0800] "POST /pools/default HTTP/1.1" 500 44 - "python-requests/2.24.0" 5000
            

            During this time the ns_server.debug.log shows we're trying to connect to memcached.

            [ns_server:warn,2022-01-10T10:41:16.040-08:00,ns_1@cb.local:memcached_refresh<0.4561.0>:ns_memcached:connect:1221]Unable to connect: {error,{badmatch,[{inet,{error,econnrefused}}]}}.
            [ns_server:debug,2022-01-10T10:41:16.040-08:00,ns_1@cb.local:memcached_refresh<0.4561.0>:memcached_refresh:handle_info:93]Refresh of [isasl,rbac] failed. Retry in 1000 ms.
            [ns_server:warn,2022-01-10T10:41:16.197-08:00,ns_1@cb.local:<0.8304.0>:ns_memcached:connect:1224]Unable to connect: {error,{badmatch,[{inet,{error,econnrefused}}]}}, retrying.
            [ns_server:warn,2022-01-10T10:41:16.535-08:00,ns_1@cb.local:<0.8313.0>:ns_memcached:connect:1224]Unable to connect: {error,{badmatch,[{inet,{error,econnrefused}}]}}, retrying.
            [user:info,2022-01-10T10:41:16.893-08:00,ns_1@cb.local:<0.4680.0>:ns_log:consume_log:76]Service 'memcached' exited with status -1073741515. Restarting. Messages:
             
            [ns_server:info,2022-01-10T10:41:16.893-08:00,ns_1@cb.local:ns_log<0.4673.0>:ns_log:is_duplicate_log:156]suppressing duplicate log ns_log:0([<<"Service 'memcached' exited with status -1073741515. Restarting. Messages:\n">>]) because it's been seen 4 times in the past 36.118 secs (last seen 9.046 secs ago
            [ns_server:debug,2022-01-10T10:41:16.894-08:00,ns_1@cb.local:<0.8310.0>:remote_monitors:monitor_loop:122]Monitored remote process <15888.406.0> went down with: {abnormal,-1073741515}
            [ns_server:warn,2022-01-10T10:41:17.044-08:00,ns_1@cb.local:memcached_refresh<0.4561.0>:ns_memcached:connect:1221]Unable to connect: {error,{badmatch,[{inet,{error,econnrefused}}]}}.
            [ns_server:debug,2022-01-10T10:41:17.044-08:00,ns_1@cb.local:memcached_refresh<0.4561.0>:memcached_refresh:handle_info:93]Refresh of [isasl,rbac] failed. Retry in 1000 ms.
            [ns_server:warn,2022-01-10T10:41:17.990-08:00,ns_1@cb.local:<0.8300.0>:ns_memcached:connect:1224]Unable to connect: {error,{badmatch,[{inet,{error,econnrefused}}]}}, retrying.
            [ns_server:warn,2022-01-10T10:41:17.991-08:00,ns_1@cb.local:<0.8296.0>:ns_memcached:connect:1224]Unable to connect: {error,{badmatch,[{inet,{error,econnrefused}}]}}, retrying.
            [ns_server:warn,2022-01-10T10:41:18.046-08:00,ns_1@cb.local:memcached_refresh<0.4561.0>:ns_memcached:connect:1221]Unable to connect: {error,{badmatch,[{inet,{error,econnrefused}}]}}.
            [ns_server:debug,2022-01-10T10:41:18.046-08:00,ns_1@cb.local:memcached_refresh<0.4561.0>:memcached_refresh:handle_info:93]Refresh of [isasl,rbac] failed. Retry in 1000 ms.
            [ns_server:warn,2022-01-10T10:41:18.194-08:00,ns_1@cb.local:<0.8304.0>:ns_memcached:connect:1224]Unable to connect: {error,{badmatch,[{inet,{error,econnrefused}}]}}, retrying.
            [ns_server:warn,2022-01-10T10:41:18.540-08:00,ns_1@cb.local:<0.8313.0>:ns_memcached:connect:1224]Unable to connect: {error,{badmatch,[{inet,{error,econnrefused}}]}}, retrying.
            

            There's nothing in the memcached.log

            $ cat memcached.log
            ==============================================================================
            Memcached logs
            cd "C:\Program Files\Couchbase\Server\var\lib\couchbase\logs" && for /f %a IN ('dir memcached.log.* /od /tw /b') do type %a
            ==============================================================================
            File Not Found
            

            Looking at the couchbase.log there's a large number (8602) of faults like this

            EventType=1^M^M
            Logfile=Application^M^M
            Message=Faulting application name: memcached.exe, version: 0.0.0.0, time stamp: 0x61d87cd9^M^M
            Faulting module name: VCRUNTIME140D.dll, version: 6.3.9600.18696, time stamp: 0x59153753^M^M
            Exception code: 0xc0000135^M^M
            Fault offset: 0x00000000000ece60^M^M
            Faulting process id: 0x9c4^M^M
            Faulting application start time: 0x01d806549a97ab42^M^M
            Faulting application path: c:\Program Files\Couchbase\Server\bin\memcached.exe^M^M
            Faulting module path: VCRUNTIME140D.dll^M^M
            Report Id: d84839ce-7247-11ec-80d1-001e678989dc^M^M
            Faulting package full name: ^M^M
            Faulting package-relative application ID: ^M^M
            SourceName=Application Error^M^M
            TimeGenerated=20220110190224.000000-000^M^M
            

            So it appears memcached is faulting each time it starts up. Assigning the ticket to them for triage.

            steve.watanabe Steve Watanabe added a comment - From the ns_server.http_access.log we see the requests that are getting the server error. They are likely timeouts as they're all ~5000 msecs in processing time. 172.23.100.212 - - [10/Jan/2022:10:41:18 -0800] "POST /pools/default HTTP/1.1" 500 44 - "python-requests/2.24.0" 5000 172.23.100.212 - - [10/Jan/2022:10:41:33 -0800] "POST /pools/default HTTP/1.1" 500 44 - "python-requests/2.24.0" 5000 172.23.100.212 - - [10/Jan/2022:10:41:48 -0800] "POST /pools/default HTTP/1.1" 500 44 - "python-requests/2.24.0" 5000 172.23.100.212 - - [10/Jan/2022:10:42:03 -0800] "POST /pools/default HTTP/1.1" 500 44 - "python-requests/2.24.0" 5000 172.23.100.212 - - [10/Jan/2022:10:42:18 -0800] "POST /pools/default HTTP/1.1" 500 44 - "python-requests/2.24.0" 5000 172.23.100.212 - - [10/Jan/2022:10:42:33 -0800] "POST /pools/default HTTP/1.1" 500 44 - "python-requests/2.24.0" 5000 172.23.100.212 - - [10/Jan/2022:10:42:48 -0800] "POST /pools/default HTTP/1.1" 500 44 - "python-requests/2.24.0" 5000 172.23.100.212 - - [10/Jan/2022:10:43:03 -0800] "POST /pools/default HTTP/1.1" 500 44 - "python-requests/2.24.0" 5000 172.23.100.212 - - [10/Jan/2022:10:43:18 -0800] "POST /pools/default HTTP/1.1" 500 44 - "python-requests/2.24.0" 5015 172.23.100.212 - - [10/Jan/2022:10:43:33 -0800] "POST /pools/default HTTP/1.1" 500 44 - "python-requests/2.24.0" 5000 172.23.100.212 - - [10/Jan/2022:10:43:48 -0800] "POST /pools/default HTTP/1.1" 500 44 - "python-requests/2.24.0" 5016 172.23.100.212 - - [10/Jan/2022:10:44:03 -0800] "POST /pools/default HTTP/1.1" 500 44 - "python-requests/2.24.0" 5015 172.23.100.212 - - [10/Jan/2022:10:44:19 -0800] "POST /pools/default HTTP/1.1" 500 44 - "python-requests/2.24.0" 5000 172.23.100.212 - - [10/Jan/2022:10:44:34 -0800] "POST /pools/default HTTP/1.1" 500 44 - "python-requests/2.24.0" 5000 172.23.100.212 - - [10/Jan/2022:10:44:49 -0800] "POST /pools/default HTTP/1.1" 500 44 - "python-requests/2.24.0" 5000 172.23.100.212 - - [10/Jan/2022:10:45:04 -0800] "POST /pools/default HTTP/1.1" 500 44 - "python-requests/2.24.0" 5000 172.23.100.212 - - [10/Jan/2022:10:45:19 -0800] "POST /pools/default HTTP/1.1" 500 44 - "python-requests/2.24.0" 5000 172.23.100.212 - - [10/Jan/2022:10:45:34 -0800] "POST /pools/default HTTP/1.1" 500 44 - "python-requests/2.24.0" 5000 172.23.100.212 - - [10/Jan/2022:10:45:49 -0800] "POST /pools/default HTTP/1.1" 500 44 - "python-requests/2.24.0" 5000 172.23.100.212 - - [10/Jan/2022:10:46:04 -0800] "POST /pools/default HTTP/1.1" 500 44 - "python-requests/2.24.0" 5000 During this time the ns_server.debug.log shows we're trying to connect to memcached. [ns_server:warn,2022-01-10T10:41:16.040-08:00,ns_1@cb.local:memcached_refresh<0.4561.0>:ns_memcached:connect:1221]Unable to connect: {error,{badmatch,[{inet,{error,econnrefused}}]}}. [ns_server:debug,2022-01-10T10:41:16.040-08:00,ns_1@cb.local:memcached_refresh<0.4561.0>:memcached_refresh:handle_info:93]Refresh of [isasl,rbac] failed. Retry in 1000 ms. [ns_server:warn,2022-01-10T10:41:16.197-08:00,ns_1@cb.local:<0.8304.0>:ns_memcached:connect:1224]Unable to connect: {error,{badmatch,[{inet,{error,econnrefused}}]}}, retrying. [ns_server:warn,2022-01-10T10:41:16.535-08:00,ns_1@cb.local:<0.8313.0>:ns_memcached:connect:1224]Unable to connect: {error,{badmatch,[{inet,{error,econnrefused}}]}}, retrying. [user:info,2022-01-10T10:41:16.893-08:00,ns_1@cb.local:<0.4680.0>:ns_log:consume_log:76]Service 'memcached' exited with status -1073741515. Restarting. Messages:   [ns_server:info,2022-01-10T10:41:16.893-08:00,ns_1@cb.local:ns_log<0.4673.0>:ns_log:is_duplicate_log:156]suppressing duplicate log ns_log:0([<<"Service 'memcached' exited with status -1073741515. Restarting. Messages:\n">>]) because it's been seen 4 times in the past 36.118 secs (last seen 9.046 secs ago [ns_server:debug,2022-01-10T10:41:16.894-08:00,ns_1@cb.local:<0.8310.0>:remote_monitors:monitor_loop:122]Monitored remote process <15888.406.0> went down with: {abnormal,-1073741515} [ns_server:warn,2022-01-10T10:41:17.044-08:00,ns_1@cb.local:memcached_refresh<0.4561.0>:ns_memcached:connect:1221]Unable to connect: {error,{badmatch,[{inet,{error,econnrefused}}]}}. [ns_server:debug,2022-01-10T10:41:17.044-08:00,ns_1@cb.local:memcached_refresh<0.4561.0>:memcached_refresh:handle_info:93]Refresh of [isasl,rbac] failed. Retry in 1000 ms. [ns_server:warn,2022-01-10T10:41:17.990-08:00,ns_1@cb.local:<0.8300.0>:ns_memcached:connect:1224]Unable to connect: {error,{badmatch,[{inet,{error,econnrefused}}]}}, retrying. [ns_server:warn,2022-01-10T10:41:17.991-08:00,ns_1@cb.local:<0.8296.0>:ns_memcached:connect:1224]Unable to connect: {error,{badmatch,[{inet,{error,econnrefused}}]}}, retrying. [ns_server:warn,2022-01-10T10:41:18.046-08:00,ns_1@cb.local:memcached_refresh<0.4561.0>:ns_memcached:connect:1221]Unable to connect: {error,{badmatch,[{inet,{error,econnrefused}}]}}. [ns_server:debug,2022-01-10T10:41:18.046-08:00,ns_1@cb.local:memcached_refresh<0.4561.0>:memcached_refresh:handle_info:93]Refresh of [isasl,rbac] failed. Retry in 1000 ms. [ns_server:warn,2022-01-10T10:41:18.194-08:00,ns_1@cb.local:<0.8304.0>:ns_memcached:connect:1224]Unable to connect: {error,{badmatch,[{inet,{error,econnrefused}}]}}, retrying. [ns_server:warn,2022-01-10T10:41:18.540-08:00,ns_1@cb.local:<0.8313.0>:ns_memcached:connect:1224]Unable to connect: {error,{badmatch,[{inet,{error,econnrefused}}]}}, retrying. There's nothing in the memcached.log $ cat memcached.log ============================================================================== Memcached logs cd "C:\Program Files\Couchbase\Server\var\lib\couchbase\logs" && for /f %a IN ('dir memcached.log.* /od /tw /b') do type %a ============================================================================== File Not Found Looking at the couchbase.log there's a large number (8602) of faults like this EventType=1^M^M Logfile=Application^M^M Message=Faulting application name: memcached.exe, version: 0.0.0.0, time stamp: 0x61d87cd9^M^M Faulting module name: VCRUNTIME140D.dll, version: 6.3.9600.18696, time stamp: 0x59153753^M^M Exception code: 0xc0000135^M^M Fault offset: 0x00000000000ece60^M^M Faulting process id: 0x9c4^M^M Faulting application start time: 0x01d806549a97ab42^M^M Faulting application path: c:\Program Files\Couchbase\Server\bin\memcached.exe^M^M Faulting module path: VCRUNTIME140D.dll^M^M Report Id: d84839ce-7247-11ec-80d1-001e678989dc^M^M Faulting package full name: ^M^M Faulting package-relative application ID: ^M^M SourceName=Application Error^M^M TimeGenerated=20220110190224.000000-000^M^M So it appears memcached is faulting each time it starts up. Assigning the ticket to them for triage.
            owend Daniel Owen added a comment -

            Looking at the Exception code - 0xc0000135 - Its means "The application failed to initialize properly"

            The faulting module is VCRUNTIME140D.dll - https://ugetfix.com/ask/how-to-fix-vcruntime140_1-dll-missing-or-not-found-error-in-windows/

            VCRUNTIME140_1.dll is a Dynamic Link Library[1] file that is used by several applications to launch specific commands or run a particular code. The file is not a part of the Windows operating system but is generally needed by third-party applications to run smoothly.

            Found this - https://stackoverflow.com/questions/39904402/c-windows-service-and-msvcp140-dll

            0xc0000135 is a simple "DLL not found" error. Since it is MSVCP140.dll that keeled over, very high odds that the machine is missing the Universal CRT. It is deployed by Windows Update.

            Looks like solution is

            install Visual C++ redistributable libraries. - https://docs.microsoft.com/en-us/cpp/windows/latest-supported-vc-redist?view=msvc-170

            Hi Bo-Chun Wang, Could you check you have VCRUNTIME140D.dll installed? (and if so what version you have)

            owend Daniel Owen added a comment - Looking at the Exception code - 0xc0000135 - Its means "The application failed to initialize properly" The faulting module is VCRUNTIME140D.dll - https://ugetfix.com/ask/how-to-fix-vcruntime140_1-dll-missing-or-not-found-error-in-windows/ VCRUNTIME140_1.dll is a Dynamic Link Library [1] file that is used by several applications to launch specific commands or run a particular code. The file is not a part of the Windows operating system but is generally needed by third-party applications to run smoothly. Found this - https://stackoverflow.com/questions/39904402/c-windows-service-and-msvcp140-dll 0xc0000135 is a simple "DLL not found" error. Since it is MSVCP140.dll that keeled over, very high odds that the machine is missing the Universal CRT. It is deployed by Windows Update. Looks like solution is install Visual C++ redistributable libraries. - https://docs.microsoft.com/en-us/cpp/windows/latest-supported-vc-redist?view=msvc-170 Hi Bo-Chun Wang , Could you check you have VCRUNTIME140D.dll installed? (and if so what version you have)
            owend Daniel Owen added a comment -

            cc. build team:- Chris Hillery Blair Watt

            owend Daniel Owen added a comment - cc. build team:- Chris Hillery Blair Watt
            owend Daniel Owen added a comment -

            The machine is running

            Microsoft Windows Server 2012 R2 Standard
            OS Version:                6.3.9600 N/A Build 9600
            

            On speaking to Chris Hillery

            That hasn't been supported for Server since... 5.x?
            Wayne mentioned that the perf clusters (possibly specifically the Windows boxes) were updated / recreated recently due to IT initiatives to apply OS updates.
            If they were re-created fresh, then they won't have the Microsoft Universal C Runtime installed since that wasn't included in Windows Server 2012. VCRUNTIME140D.dll almost certainly depends on the UCRT being installed. (FYI the error message in the bug implies that VCRUNTIME140D.dll itself isn't missing, but it's failing to load, probably because one of its dependencies is missing.)

            Reassigning to the issue to performance.

            owend Daniel Owen added a comment - The machine is running Microsoft Windows Server 2012 R2 Standard OS Version: 6.3.9600 N/A Build 9600 On speaking to Chris Hillery That hasn't been supported for Server since... 5.x? Wayne mentioned that the perf clusters (possibly specifically the Windows boxes) were updated / recreated recently due to IT initiatives to apply OS updates. If they were re-created fresh, then they won't have the Microsoft Universal C Runtime installed since that wasn't included in Windows Server 2012. VCRUNTIME140D.dll almost certainly depends on the UCRT being installed. (FYI the error message in the bug implies that VCRUNTIME140D.dll itself isn't missing, but it's failing to load, probably because one of its dependencies is missing.) Reassigning to the issue to performance.
            bo-chun.wang Bo-Chun Wang added a comment - - edited

            I am able to do a run with an old build (7.1.0-1885). Therefore, it's not clear to me why we hit the issue on newer builds. I assume these runs are using the same VCRUNTIME140D.dll. 

            http://perf.jenkins.couchbase.com/job/zeus/8167/ 

             

            I checked the history and found the issue started between 7.1.0-1885 and 7.1.0-1960. Therefore, I think it may not be related to the cluster.

             

            Daniel Owen Chris Hillery 

            Do you have any ideas? I am going to do bisection to find the root cause.

            bo-chun.wang Bo-Chun Wang added a comment - - edited I am able to do a run with an old build (7.1.0-1885). Therefore, it's not clear to me why we hit the issue on newer builds. I assume these runs are using the same VCRUNTIME140D.dll.  http://perf.jenkins.couchbase.com/job/zeus/8167/     I checked the history and found the issue started between 7.1.0-1885 and 7.1.0-1960. Therefore, I think it may not be related to the cluster. Build Job Success 6.6.4-9961 http://perf.jenkins.couchbase.com/job/zeus/8039/ Y 6.6.5-10068 http://perf.jenkins.couchbase.com/job/zeus/8055/ Y 7.0.3-7031 http://perf.jenkins.couchbase.com/job/zeus/8030/ Y 7.1.0-1885 http://perf.jenkins.couchbase.com/job/zeus/7902/ Y 7.1.0-1920 http://perf.jenkins.couchbase.com/job/zeus/8168/ Y 7.1.0-1925 http://perf.jenkins.couchbase.com/job/zeus/8170/ Y 7.1.0-1926 http://perf.jenkins.couchbase.com/job/zeus/8171/ Y 7.1.0-1931 http://perf.jenkins.couchbase.com/job/zeus/8172/ Y 7.1.0-1932 http://perf.jenkins.couchbase.com/job/zeus/8174/ Y 7.1.0-1934 http://perf.jenkins.couchbase.com/job/zeus/8175/ Y 7.1.0-1935 http://perf.jenkins.couchbase.com/job/zeus/8176/ N 7.1.0-1936 http://perf.jenkins.couchbase.com/job/zeus/8173/ N 7.1.0-1940 http://perf.jenkins.couchbase.com/job/zeus/8169/ N 7.1.0-1960 http://perf.jenkins.couchbase.com/job/zeus/8052/ N 7.1.0-1985 http://perf.jenkins.couchbase.com/job/zeus/8109/ N 7.1.0-2021 http://perf.jenkins.couchbase.com/job/zeus/8157/ N   Daniel Owen Chris Hillery   Do you have any ideas? I am going to do bisection to find the root cause.
            owend Daniel Owen added a comment -

            Thanks for performing the bisection Bo-Chun Wang - I don't know why it would suddenly appear in a build (perhaps we are making use of new API call...)
            Will add couchbase-bucket component back-in until we determine the cause of the issue.

            owend Daniel Owen added a comment - Thanks for performing the bisection Bo-Chun Wang - I don't know why it would suddenly appear in a build (perhaps we are making use of new API call...) Will add couchbase-bucket component back-in until we determine the cause of the issue.
            owend Daniel Owen added a comment - I'm wondering if it is a platform change - http://changelog.build.couchbase.com/?product=couchbase-server&fromVersion=7.1.0&fromBuild=1920&toVersion=7.1.0&toBuild=1960&f_asterixdb=off&f_backup=off&f_build=off&f_cbas-core=off&f_cbas-ui=off&f_cbauth=off&f_cbbs=off&f_cbft=off&f_chronicle=off&f_eventing=off&f_eventing-ee=off&f_goxdcr=off&f_indexing=off&f_kv_engine=off&f_magma=off&f_ns_server=off&f_plasma=off&f_platform=on&f_product-metadata=off&f_query=off&f_query-ee=off&f_query-ui=off&f_sigar=off&f_testrunner=off&f_tlm=off&f_cbftx=off&f_cbgt=off&f_couchbase-cli=off&f_couchstore=off&f_gometa=off&f_n1fty=off Therefore might be good to try 1925 and 1926

            Bo-Chun Wang  Did you actually re-run all those performance runs today, or are you just looking at the historical run data?

            ceej Chris Hillery added a comment - Bo-Chun Wang   Did you actually re-run all those performance runs today, or are you just looking at the historical run data?

            I re-ran some of them today, including 7.1.0-1885, 7.1.0-1920, and 7.1.0-1940. Based on the results, the issue starts between 1920 and 1940. I am going to do run with 1925 and 1926 as Daniel suggested.

            bo-chun.wang Bo-Chun Wang added a comment - I re-ran some of them today, including 7.1.0-1885, 7.1.0-1920, and 7.1.0-1940. Based on the results, the issue starts between 1920 and 1940. I am going to do run with 1925 and 1926 as Daniel suggested.
            bo-chun.wang Bo-Chun Wang added a comment -

            I finished the bisection. The root cause is build 7.1.0-1935. There are 4 eventing changes and 1 magma change in 1935. I can't figure out why the changes can cause the issue.

            http://changelog.build.couchbase.com/?product=couchbase-server&fromVersion=7.1.0&fromBuild=1934&toVersion=7.1.0&toBuild=1935&f_eventing=on&f_magma=on&f_product-metadata=on

            Daniel Owen Could you please take a look? 

            bo-chun.wang Bo-Chun Wang added a comment - I finished the bisection. The root cause is build 7.1.0-1935. There are 4 eventing changes and 1 magma change in 1935. I can't figure out why the changes can cause the issue. http://changelog.build.couchbase.com/?product=couchbase-server&fromVersion=7.1.0&fromBuild=1934&toVersion=7.1.0&toBuild=1935&f_eventing=on&f_magma=on&f_product-metadata=on Daniel Owen Could you please take a look? 
            dfinlay Dave Finlay added a comment - - edited

            +Sarath Lakshman who made the magma change. Since the problem is with the KV executable and looks to be a problem associated with a missing DLL and the magma change looks to be adding a new dependency on zstandard into the KV binary, this seems to be almost certainly where the problem is.

            dfinlay Dave Finlay added a comment - - edited + Sarath Lakshman who made the magma change. Since the problem is with the KV executable and looks to be a problem associated with a missing DLL and the magma change looks to be adding a new dependency on zstandard into the KV binary, this seems to be almost certainly where the problem is.

            Argh. Ok, I did think this was a problem last night, but Googling didn't bear me out. But now I've managed to find configuration: vcruntime140d.dll is the debug library. So it seems like we built the zstd-cpp Windows package in debug mode, and hence have a dependency on the debug version of the library, which isn't part of the MSVC redistributable package. We'll need to rebuild it. Cc Blair Watt 

            Bo-Chun Wang Daniel did ask if you could confirm that vcruntime140d.dll was on the VM in question - did you do so? I'm guessing that it's NOT actually there.

            --------

            Regardless of all of that, you really do need to start running the perf tests on a supported platform. For Neo that's Windows Server 2019.

            ceej Chris Hillery added a comment - Argh. Ok, I did think this was a problem last night, but Googling didn't bear me out. But now I've managed to find configuration: vcruntime140 d .dll is the debug library. So it seems like we built the zstd-cpp Windows package in debug mode, and hence have a dependency on the debug version of the library, which isn't part of the MSVC redistributable package. We'll need to rebuild it. Cc Blair Watt   Bo-Chun Wang Daniel did ask if you could confirm that vcruntime140d.dll was on the VM in question - did you do so? I'm guessing that it's NOT actually there. -------- Regardless of all of that, you really do need to start running the perf tests on a supported platform. For Neo that's Windows Server 2019.
            bo-chun.wang Bo-Chun Wang added a comment -

            There is no vcruntime140d.dll on the VM. 

            Administrator@zeus01-s526 /cygdrive/c/Windows/System32

            $ ls vcruntime140

            vcruntime140.dll    vcruntime140_1.dll  

            We use the same cluster for 6.x and 7.x builds. Does Windows Server 2019 support both? The cluster is running Windows Server 2012.

            OS Name:                   Microsoft Windows Server 2012 R2 Standard

            OS Version:                6.3.9600 N/A Build 9600

            bo-chun.wang Bo-Chun Wang added a comment - There is no vcruntime140d.dll on the VM.  Administrator@zeus01-s526 /cygdrive/c/Windows/System32 $ ls vcruntime140 vcruntime140.dll    vcruntime140_1.dll   We use the same cluster for 6.x and 7.x builds. Does Windows Server 2019 support both? The cluster is running Windows Server 2012. OS Name:                   Microsoft Windows Server 2012 R2 Standard OS Version:                6.3.9600 N/A Build 9600

            Ok, I think that confirms that we need to rebuild zstd-cpp, so I'm moving this over to Build.

            As for 6.x/7.x platform support, 7.1.0 is only officially supported on Windows Server 2019. 6.6.0 and later are also supported Windows Server 2019. I'm not sure about 6.5.0 - the Platform Support Matrix says Windows Server 2019 is "New - To Be Supported" for 6.5.0, which to me doesn't parse cleanly either way. If you still need to run perf on 6.0.x, though, Windows Server 2019 isn't supported, so ideally at least that should be a different cluster.

            ceej Chris Hillery added a comment - Ok, I think that confirms that we need to rebuild zstd-cpp, so I'm moving this over to Build. As for 6.x/7.x platform support, 7.1.0 is only officially supported on Windows Server 2019. 6.6.0 and later are also supported Windows Server 2019. I'm not sure about 6.5.0 - the Platform Support Matrix says Windows Server 2019 is "New - To Be Supported" for 6.5.0, which to me doesn't parse cleanly either way. If you still need to run perf on 6.0.x, though, Windows Server 2019 isn't supported, so ideally at least that should be a different cluster.

            Build cbdeps::zstd-cpp-1.5.0-2 contains build-tools commit c2df20b with commit message:
            MB-50325: Produce release builds of zstd-cpp

            build-team Couchbase Build Team added a comment - Build cbdeps::zstd-cpp-1.5.0-2 contains build-tools commit c2df20b with commit message: MB-50325 : Produce release builds of zstd-cpp

            Build couchbase-server-7.1.0-2055 contains tlm commit 8ec6b8a with commit message:
            MB-50325: Zstd-cpp build 2 (non-debug/release build)

            build-team Couchbase Build Team added a comment - Build couchbase-server-7.1.0-2055 contains tlm commit 8ec6b8a with commit message: MB-50325 : Zstd-cpp build 2 (non-debug/release build)
            blair.watt Blair Watt added a comment - - edited

            Thanks for the heads-up on this, build 2055 contains the updated package which imports the correct vcruntime dll:

             

            C:\>"c:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\bin\dumpbin.exe" /imports c:\temp\zstd-check\1.5.0-1\bin\zstd.dll | findstr /C:VCRUNTIME
            VCRUNTIME140D.dll
            C:\>"c:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\bin\dumpbin.exe" /imports c:\temp\zstd-check\1.5.0-2\bin\zstd.dll | findstr /C:VCRUNTIME
            VCRUNTIME140.dll
            

             

            blair.watt Blair Watt added a comment - - edited Thanks for the heads-up on this, build 2055 contains the updated package which imports the correct vcruntime dll:   C:\>"c:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\bin\dumpbin.exe" /imports c:\temp\zstd-check\1.5.0-1\bin\zstd.dll | findstr /C:VCRUNTIME VCRUNTIME140D.dll C:\>"c:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\bin\dumpbin.exe" /imports c:\temp\zstd-check\1.5.0-2\bin\zstd.dll | findstr /C:VCRUNTIME VCRUNTIME140.dll  

            Build cbdeps::erlang-neo-5 contains build-tools commit c2df20b with commit message:
            MB-50325: Produce release builds of zstd-cpp

            build-team Couchbase Build Team added a comment - Build cbdeps::erlang-neo-5 contains build-tools commit c2df20b with commit message: MB-50325 : Produce release builds of zstd-cpp
            bo-chun.wang Bo-Chun Wang added a comment -

            I have a good run with build 7.1.0-2055.

            http://perf.jenkins.couchbase.com/job/zeus/8177/ 

            bo-chun.wang Bo-Chun Wang added a comment - I have a good run with build 7.1.0-2055. http://perf.jenkins.couchbase.com/job/zeus/8177/  
            blair.watt Blair Watt added a comment -

            Excellent, thank you for confirming. 

            blair.watt Blair Watt added a comment - Excellent, thank you for confirming. 

            Build cbdeps::curl-7.78.0-6 contains build-tools commit c2df20b with commit message:
            MB-50325: Produce release builds of zstd-cpp

            build-team Couchbase Build Team added a comment - Build cbdeps::curl-7.78.0-6 contains build-tools commit c2df20b with commit message: MB-50325 : Produce release builds of zstd-cpp

            Build cbdeps::prometheus-2.23-4 contains build-tools commit c2df20b with commit message:
            MB-50325: Produce release builds of zstd-cpp

            build-team Couchbase Build Team added a comment - Build cbdeps::prometheus-2.23-4 contains build-tools commit c2df20b with commit message: MB-50325 : Produce release builds of zstd-cpp

            Build couchbase-lite-core-3.1.0-205 contains tlm commit 8ec6b8a with commit message:
            MB-50325: Zstd-cpp build 2 (non-debug/release build)

            build-team Couchbase Build Team added a comment - Build couchbase-lite-core-3.1.0-205 contains tlm commit 8ec6b8a with commit message: MB-50325 : Zstd-cpp build 2 (non-debug/release build)

            People

              bo-chun.wang Bo-Chun Wang
              bo-chun.wang Bo-Chun Wang
              Votes:
              0 Vote for this issue
              Watchers:
              8 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0h
                  0h
                  Logged:
                  Time Spent - 3h
                  3h

                  Gerrit Reviews

                    There are no open Gerrit changes

                    PagerDuty