Couchbase Server / MB-49539

[Windows] To investigate the crash in MB-49471


Details

    • 1
    • Yes
    • CX Sprint 270

    Description

      In MB-49471, we see a crash when a /node/controller/reloadCertificate request is made.
      It looks like this in error.log:

      [ns_server:error,2021-11-10T04:12:57.419-08:00,ns_1@172.23.136.106:<0.17653.0>:menelaus_util:reply_server_error:210]Server error during processing: ["web request failed",
                                       {path,"/node/controller/reloadCertificate"},
                                       {method,'POST'},
                                       {type,exit},
                                       {what,
                                        {{{badmatch,{error,eacces}},
                                          [{ns_ssl_services_setup,
                                            save_node_certs_phase2,0,
                                            [{file,"src/ns_ssl_services_setup.erl"},
                                             {line,740}]},
                                           {ns_ssl_services_setup,save_node_certs,

      ns_server appears to crash with "eacces" when certificates are written to the "certs" folder in the config directory. We need to investigate the cause; otherwise, uploading X.509 certificates on Windows in 7.1 will be blocked.


          Activity

            Copying Timofey Barmin's comment from MB-49471 for context here:
            "It crashes when it tries to save the file chain.pem in the config/certs dir (it first saves the file to chain.pem.tmp and then renames it to chain.pem).
            The error 'eacces' means "Missing read or write permissions for the parent directories of Source or Destination. On some platforms, this error is given if either Source or Destination is open."

            That's a bit odd, because right before writing the cert we write another file in the same dir, and that works fine. The only explanation I can think of is that the destination file (chain.pem) is open at the moment we try to rename chain.pem.tmp to chain.pem."

            Comment by Sumedh Basarkod (Inactive).
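The save sequence described in the quoted comment is the usual write-to-temp-then-rename pattern. Below is a minimal Python sketch of that pattern, for illustration only: ns_server itself is Erlang (see the .erl frames in the log), and the helper names are hypothetical. It shows where a failure would surface: writing the .tmp file succeeds, and only the final rename over an existing chain.pem can fail.

```python
import os
import tempfile


def save_file_atomically(path: str, data: bytes) -> None:
    """Write data to `path` via a sibling `<path>.tmp` file and a rename.

    Hypothetical helper mirroring the chain.pem.tmp -> chain.pem sequence;
    not ns_server's actual code.
    """
    tmp_path = path + ".tmp"
    # Step 1: write the new content to the temporary file and flush it to disk.
    with open(tmp_path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())
    # Step 2: move the temporary file over the destination. os.replace
    # overwrites an existing file; on Windows it typically fails with
    # PermissionError while another handle to the destination is open,
    # the analogue of the {error,eacces} seen in error.log.
    os.replace(tmp_path, path)


def demo_save_twice() -> bytes:
    """Save chain.pem twice in a scratch directory and return the final bytes."""
    with tempfile.TemporaryDirectory() as d:
        chain = os.path.join(d, "chain.pem")
        save_file_atomically(chain, b"old chain\n")
        save_file_atomically(chain, b"new chain\n")
        with open(chain, "rb") as f:
            return f.read()
```

The second call exercises the case from the issue: the destination already exists and must be overwritten by the rename.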

            I managed to reproduce the issue manually as well (without any code) on the latest build (7.1.0-1707), and it appears to happen only when the cbas service is running on the node.

            The simplest way to reproduce it is:
            Steps
            1. Create a 2-node cluster on Windows:
            172.23.136.113 = kv
            172.23.136.115 = cbas
            2. Create and upload a root certificate, and have the root certificate sign an intermediate certificate, which in turn signs the nodes' certificates.
            3. Upload .113's certificate: works fine.
            4. Upload .115's certificate: fails because of the crash.

            (Note that if the .115 node runs any other service(s) in place of cbas, the upload works fine; I tried it with other services and it worked. It seems to fail only when cbas is running on the node.)

            The crash can be seen in error.log on .115:

            [ns_server:error,2021-11-14T07:57:25.870-08:00,ns_1@172.23.136.115:<0.30049.0>:menelaus_util:reply_server_error:210]Server error during processing: ["web request failed",
                                             {path,"/node/controller/reloadCertificate"},
                                             {method,'POST'},
                                             {type,exit},
                                             {what,
                                              {{{badmatch,{error,eacces}},
                                                [{ns_ssl_services_setup,
                                                  save_node_certs_phase2,0,
                                                  [{file,"src/ns_ssl_services_setup.erl"},
                                                   {line,742}]},
                                                 {ns_ssl_services_setup,save_node_certs,
                                                  6,
                                                  [{file,"src/ns_ssl_services_setup.erl"},
                                                   {line,733}]},
                                                 {ns_ssl_services_setup,handle_call,3,
                                                  [{file,"src/ns_ssl_services_setup.erl"},
                                                   {line,481}]},
                                                 {gen_server,try_handle_call,4,
                                                  [{file,"gen_server.erl"},{line,721}]},
                                                 {gen_server,handle_msg,6,
                                                  [{file,"gen_server.erl"},{line,750}]},
                                                 {proc_lib,init_p_do_apply,3,
                                                  [{file,"proc_lib.erl"},{line,226}]}]},
                                               {gen_server,call,
                                                [ns_ssl_services_setup,
                                                 {set_node_certificate_chain,

            The state of the certs folder after the crash:

            Administrator@WIN-1T98IIFH727 /cygdrive/c/Program Files/Couchbase/Server/var/lib/couchbase/config/certs
            $ ls
            ca.pem  certs.info  certs.tmp  chain.pem  pkey.pem

            Logs:
            (I had to restart the server on .115 in order to collect the logs, as the node wasn't responding.)
            http://supportal.couchbase.com/snapshot/554f4421707ba0fff7276e911a2025e5::0
            s3://cb-customers-secure/mb-49539/2021-11-14/collectinfo-2021-11-14t160027-ns_1@172.23.136.113.zip
            s3://cb-customers-secure/mb-49539/2021-11-14/collectinfo-2021-11-14t160027-ns_1@172.23.136.115.zip

            Timofey Barmin, I think it would be best if the Analytics team took a look at it.

            Comment by Sumedh Basarkod (Inactive) (edited).
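Steps 3 and 4 of the reproduction end in a POST to /node/controller/reloadCertificate on the target node, which is the request that fails in the .115 log. As a hedged sketch, the helper below only builds that request with Python's standard library so it can be inspected without a live cluster. The port 8091 (the default Couchbase administration port), HTTP Basic authentication, the empty request body, and the credentials in the usage example are assumptions; how the node's new certificate chain and key are staged beforehand is not covered here.

```python
import base64
import urllib.request


def build_reload_certificate_request(
    host: str, user: str, password: str, port: int = 8091
) -> urllib.request.Request:
    """Build (but do not send) a POST to /node/controller/reloadCertificate.

    Assumptions: default admin port 8091, HTTP Basic auth, empty body.
    """
    url = f"http://{host}:{port}/node/controller/reloadCertificate"
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    req = urllib.request.Request(url, data=b"", method="POST")
    req.add_header("Authorization", f"Basic {token}")
    return req


if __name__ == "__main__":
    # Hypothetical credentials; substitute the cluster's admin user.
    req = build_reload_certificate_request("172.23.136.115", "Administrator", "password")
    print(req.get_method(), req.full_url)
    # Sending it would be: urllib.request.urlopen(req)
```

Keeping construction separate from sending makes it easy to fire the same request at .113 and .115 and compare the responses.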

            My best guess is that cbas opens the chain file and doesn't close it, so when ns_server tries to replace the file with another chain file we get the eacces error.
            It would be interesting to hear the cbas team's opinion.

            Comment by Timofey Barmin.
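The open-handle hypothesis can be checked in isolation. The Python sketch below (hypothetical names, not cbas code) keeps a handle to the destination file open, as a long-lived reader that never closes its certificate file would, and then tries to replace that file. On Windows the replace is expected to fail with PermissionError, Python's counterpart of eacces; on Linux and macOS it succeeds because a POSIX rename is not blocked by open handles, which fits the issue being reported on Windows.

```python
import os
import tempfile


def replace_while_destination_open() -> bool:
    """Return True if os.replace succeeds while the destination is held open.

    Simulates a reader (hypothetical stand-in for the service that loaded
    chain.pem) that opened the file and never closed it.
    """
    with tempfile.TemporaryDirectory() as d:
        chain = os.path.join(d, "chain.pem")
        tmp = chain + ".tmp"
        with open(chain, "wb") as f:
            f.write(b"old chain\n")
        with open(tmp, "wb") as f:
            f.write(b"new chain\n")

        leaked = open(chain, "rb")  # the handle the reader "forgot" to close
        try:
            os.replace(tmp, chain)
            return True
        except PermissionError:
            # Expected on Windows: the open handle blocks the rename.
            return False
        finally:
            leaked.close()  # release it so the scratch directory can be removed
```

Running this on both a Windows and a Linux host should show the platform difference directly.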

            Build couchbase-server-7.1.0-1719 contains cbas commit c4a6c9e with commit message:
            MB-49539: ensure certificate & trust files are closed

            Comment by Couchbase Build Team.

            Build couchbase-server-7.1.0-1719 contains cbas-core commit 84d2fe9 with commit message:
            MB-49539: ensure certificate & trust files are closed

            Comment by Couchbase Build Team.

            Ensured certificate & CA files are closed as soon as processed.

            Comment by Michael Blow.
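The fix closes the certificate and CA files as soon as they have been read. A minimal Python sketch of that idea follows; the helper names and the file names chain.pem and ca.pem are illustrative assumptions, and the actual cbas change (commits c4a6c9e and 84d2fe9) is not reproduced here. Because each `with` block releases its handle before the function returns, a later .tmp-then-rename of either file is no longer blocked.

```python
import os
import tempfile


def load_cert_and_ca(cert_path: str, ca_path: str) -> tuple[bytes, bytes]:
    """Read the certificate chain and CA files, closing each immediately.

    Hypothetical helper illustrating "close as soon as processed".
    """
    with open(cert_path, "rb") as f:
        cert = f.read()
    with open(ca_path, "rb") as f:
        ca = f.read()
    # Both handles are closed here, so nothing pins the files on disk.
    return cert, ca


def demo_rotate_after_load() -> bytes:
    """Load the files, then rotate chain.pem with a .tmp write and a rename."""
    with tempfile.TemporaryDirectory() as d:
        chain = os.path.join(d, "chain.pem")
        ca_file = os.path.join(d, "ca.pem")
        for path, data in ((chain, b"old chain\n"), (ca_file, b"ca\n")):
            with open(path, "wb") as f:
                f.write(data)
        load_cert_and_ca(chain, ca_file)
        with open(chain + ".tmp", "wb") as f:
            f.write(b"new chain\n")
        os.replace(chain + ".tmp", chain)  # not blocked: no open handles remain
        with open(chain, "rb") as f:
            return f.read()
```

Compared with the leaked-handle case, the only change is the scope of the file handle, which is why the fix could be small.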

            Verified on 7.1.0 build 1720. Closing.

            Comment by Sumedh Basarkod (Inactive).

            This issue prevents the changing of certificates.

            Comment by Till Westmann.

            People

              sumedh.basarkod Sumedh Basarkod (Inactive)
              sumedh.basarkod Sumedh Basarkod (Inactive)
              Votes: 0
              Watchers: 6
