Couchbase Server / MB-45199

Node init fails on debian9 and ubuntu18

Details

    • Untriaged
    • Ubuntu 64-bit
    • 1
    • Unknown
    • Build Team 2021 Sprint 6, Build Team 2021 Sprint 7

    Description

      Node init fails on some debian9 and ubuntu18 VMs with this error:

      root@deb91-qe:/tmp# /opt/couchbase/bin/couchbase-cli node-init -c localhost -u Administrator -p password;
      /opt/couchbase/bin/couchbase-cli: 4: cd: can't cd to /opt/couchbase/bin/../lib/python
      Installing Python 3 - one moment...
      sh: 0: Can't open /opt/couchbase/bin/../lib/python/cbpy-installer.sh
      /opt/couchbase/bin/couchbase-cli: 25: exec: /root/Library/Python/couchbase-py/7.0.0-py4/bin/python3: not found
      root@deb91-qe:/tmp# /opt/couchbase/bin/couchbase-cli node-init -c localhost -u Administrator -p password;
      ERROR: Unable to connect to host at http://localhost:8091: HTTPConnectionPool(host='localhost', port=8091): Max retries exceeded with url: /nodeInit (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fa5621cd828>: Failed to establish a new connection: [Errno 111] Connection refused'
      

      These are the install steps:

      dpkg --purge couchbase-server; kill -9 `ps -ef |egrep couchbase|cut -f3 -d' '`; 
      rm /var/lib/dpkg/info/couchbase-server.*; 
      rm -rf /opt/couchbase/;
      apt-get update;
      cd /tmp;
      rm -rf *upgrade-from;
      dpkg -i couchbase-server-enterprise_7.0.0-4756-debian9_amd64.deb
      sed -i 's/export PATH/export PATH\nexport CBFT_ENV_OPTIONS=bleveMaxResultWindow=10000000/' /opt/couchbase/bin/couchbase-server; grep bleveMaxResultWindow=10000000 /opt/couchbase/bin/couchbase-server > /dev/null && echo 1 || echo 0;
      /opt/couchbase/bin/couchbase-cli node-init -c localhost -u Administrator -p password;
      

      debian9 VM: 172.23.96.143
      ubuntu16 VM: 172.23.120.233

      Attachments

        1. 172.23.96.187.zip
          97.67 MB
        2. 172.23.96.187-1.zip
          79.15 MB
        3. 172.23.96.192.zip
          55.06 MB
        4. chronicle.zip
          1 kB
        5. chronicle1.zip
          2 kB
        6. chronicle-1.zip
          10 kB
        7. test.log
          8 kB
        8. test-1.log
          10 kB

          Activity

            Abhijeeth.Nuthan Abhijeeth Nuthan added a comment - - edited

             It's also not 100% clear to me that this is what is causing the crash here

            Neither am I. Per the logs, the problems arise after the uninstall. I just needed to know what had changed, for informational purposes.

            As requested earlier, we need a proper timeline of what happens to the config directories during this test, set against the server logs, to understand what is happening and where the problem lies.

            dfinlay Dave Finlay added a comment -

            Thanks Pavithra. Well, that's a complex script. My guess is that it contains a bug and we'll likely need to figure out where the bug is. I have to say that I don't know where the bug is but I see some stuff that's suspicious.

            First some background. The command that Pavithra runs to uninstall on Debian is:

            dpkg --purge couchbase-server
            

            For us this command (in addition to other things) runs the prerm, which removes the chronicle log files. So far so good - so I took a look at the latest logs for .187.

            First thing is the testlog: it shows the following:

            2021-03-24 11:25:22,807 - root - INFO - Done with uninstall on 172.23.96.187.
            

            I.e. the uninstall is supposed to be done by 11:25:22.807. However, in couchbase.log I see this:

            $ fgrep debsave couchbase.log
            407793 -rw-r--r-- 1 couchbase couchbase  2621 Mar 24 11:26 local.ini.debsave
            791278 -rw-r----- 1 couchbase couchbase     8 Mar 24 11:26 ip.debsave
            921416 -rw-r----- 1 couchbase couchbase 39087 Mar 24 11:26 config.dat.debsave
            921417 -rw-r--r-- 1 couchbase couchbase     0 Mar 24 11:26 dist_cfg.debsave
            

            These "*.debsave" files are created by the same prerm script that removes the chronicle logs:

            cp @@PREFIX@@/var/lib/@@PRODUCT_BASE@@/config/config.dat @@PREFIX@@/var/lib/@@PRODUCT_BASE@@/config/config.dat.debsave || true
            cp @@PREFIX@@/var/lib/@@PRODUCT_BASE@@/ip @@PREFIX@@/var/lib/@@PRODUCT_BASE@@/ip.debsave > /dev/null 2>&1 || true
            cp @@PREFIX@@/var/lib/@@PRODUCT_BASE@@/ip_start @@PREFIX@@/var/lib/@@PRODUCT_BASE@@/ip_start.debsave > /dev/null 2>&1 || true
            cp @@PREFIX@@/etc/couchdb/local.ini @@PREFIX@@/etc/couchdb/local.ini.debsave || true
            cp @@PREFIX@@/var/lib/@@PRODUCT_BASE@@/config/dist_cfg @@PREFIX@@/var/lib/@@PRODUCT_BASE@@/config/dist_cfg.debsave || true
            

            You can see the time stamp on the *.debsave files is 11:26, which is after the uninstall is supposed to be finished.
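The timestamp comparison above can be sketched in a few lines of Python. This is only an illustration of the reasoning: the two timestamp strings are copied from the test log and the `ls` listing, the year for the `ls` entry is assumed to match the log, and `created_after` is a hypothetical helper name.

```python
from datetime import datetime

# Time at which the test log claims "Done with uninstall".
uninstall_done = datetime.strptime("2021-03-24 11:25:22,807",
                                   "%Y-%m-%d %H:%M:%S,%f")

# `ls -l` shows only minute resolution and no year, so the year is
# an assumption taken from the test log. Month names are parsed with
# the default (English) locale.
debsave_mtime = datetime.strptime("2021 Mar 24 11:26", "%Y %b %d %H:%M")


def created_after(reported_done, file_mtime):
    """Return True if the file was written after the reported finish time."""
    return file_mtime > reported_done


print(created_after(uninstall_done, debsave_mtime))
```

Even at minute resolution the conclusion holds, because any time within 11:26 is later than 11:25:22.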

            Secondly, I see that in uninstall there's also this command:

            rm -rf " + DEFAULT_INSTALL_DIR["LINUX_DISTROS"]
            

            Elsewhere we see that:

            DEFAULT_INSTALL_DIR = {"LINUX_DISTROS": "/opt/couchbase",
                                   "MACOS_VERSIONS": "/Applications/Couchbase\ Server.app",
                                   "WINDOWS_SERVER": "/cygdrive/c/Program\ Files/Couchbase/Server"}
            

            It would seem to be the case that /opt/couchbase should be completely wiped after the uninstall, yet we see the *.debsave files still present.

            /opt/couchbase/var/lib/couchbase/config:
            total 164
            921412 drwxr-xr-x 3 couchbase couchbase  4096 Mar 24 12:39 .
            791274 drwxr-xr-x 8 couchbase couchbase  4096 Mar 24 12:39 ..
            921432 -rw-rw---- 1 couchbase couchbase   322 Mar 24 11:26 audit.json
            921103 drwxrwx--- 4 couchbase couchbase  4096 Mar 24 11:27 chronicle
            921136 -rw-rw---- 1 couchbase couchbase 39096 Mar 24 12:39 config.dat
            921416 -rw-r----- 1 couchbase couchbase 39087 Mar 24 11:26 config.dat.debsave
            

            The *.debsave files don't get created on install.
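A check like the following could confirm whether any "*.debsave" files are left under the install directory after an uninstall. It is a minimal sketch, not part of the test script; `find_debsave` is a hypothetical helper, and `/opt/couchbase` is the Linux install directory named above.

```python
from pathlib import Path


def find_debsave(install_dir):
    """Return the sorted paths of all *.debsave files under install_dir.

    An empty list means the directory is absent or holds no such files.
    """
    root = Path(install_dir)
    if not root.exists():
        return []
    return sorted(str(p) for p in root.rglob("*.debsave"))


if __name__ == "__main__":
    leftovers = find_debsave("/opt/couchbase")
    if leftovers:
        print("Leftover files after uninstall:")
        for path in leftovers:
            print("  " + path)
    else:
        print("No *.debsave files found")
```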

            In summary I don't understand what's going on fully, but it looks like there's a bug in the script. In particular it looks like it's supposed to do a package purge and rm -rf /opt/couchbase. The package removal is happening (the *.debsave files get created) but the removal of /opt/couchbase looks like it's not happening. It also seems to be the case that the script believes the uninstall finishes before it does.

            If my observations are true, it could well explain the issue. The uninstall is thought to finish early so we begin with the install. The install runs and starts up the server but the uninstaller removes the chronicle files after they've been created. Config.dat is unaffected because it's copied and not removed.

            It also explains why others haven't been able to repro your problems. I'm not sure where I'd start, but I think you'll need to dig in on the script.
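If the hypothesis above were right (the install starting before the uninstall has really finished), a guard along these lines would rule it out. This is a sketch under that assumption only: `run_and_wait` and `wait_until_gone` are hypothetical helpers, not functions from the test script.

```python
import subprocess
import time
from pathlib import Path


def run_and_wait(cmd):
    """Run cmd (a list of arguments), block until it exits, return its exit code."""
    return subprocess.run(cmd).returncode


def wait_until_gone(path, timeout=60.0, interval=0.5):
    """Poll until path no longer exists; return False if timeout expires first."""
    deadline = time.monotonic() + timeout
    while Path(path).exists():
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
    return True


# Intended use (not executed here):
#   run_and_wait(["dpkg", "--purge", "couchbase-server"])
#   run_and_wait(["rm", "-rf", "/opt/couchbase"])
#   if not wait_until_gone("/opt/couchbase"):
#       raise RuntimeError("uninstall did not complete; not starting install")
```

Only once `wait_until_gone` returns True would the install step be started.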


            jake.rawsthorne#1 Jake Rawsthorne [X] (Inactive) added a comment -

            In test-1.log you can see that the output of the install command is treated as an error, so the install command is retried. This results in an upgrade (to the same version), which I presume is why the .debsave files that are created during uninstall are seen. This happens after the /opt/couchbase removal, which is why the install fails.
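The failure mode described in this comment, a successful install being retried because its output was read as an error, can be avoided by deciding success from the exit code alone. The sketch below illustrates that idea; it is not the actual testrunner change, and `run_install` is a hypothetical name.

```python
import subprocess


def run_install(cmd, max_attempts=2):
    """Run cmd, retrying only when the exit code is non-zero.

    Output on stdout or stderr never triggers a retry by itself.
    Returns (attempts_used, last CompletedProcess).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            break
    return attempt, result
```

With this approach a command that prints warnings to stderr but exits with status 0 runs exactly once, so no second `dpkg -i` is issued against an already installed package.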
            dfinlay Dave Finlay added a comment -

            Nicely diagnosed, Jake!

            arunkumar Arunkumar Senthilnathan (Inactive) added a comment -

            With http://review.couchbase.org/c/testrunner/+/150469, this is resolved - thanks Jake Rawsthorne for the fix!

            People

              jake.rawsthorne#1 Jake Rawsthorne [X] (Inactive)
              pavithra.mahamani Pavithra Mahamani (Inactive)