Details
-
Bug
-
Status: Closed
-
Blocker
-
Resolution: Fixed
-
7.0.0, 7.0.1
-
Ubuntu 18.04
-
Untriaged
-
Ubuntu 64-bit
-
1
-
Yes
-
Build Team 2021 Sprint 20
Description
Problem
When an offline upgrade is performed on a Debian or Ubuntu package install of 7.0.0 or 7.0.1, the chronicle directory is erased because of a bug in the installer. This regression was introduced by the fix for MB-44229. The issue was originally reported by a forum user (https://forums.couchbase.com/t/couchbase-upgrade-issues-from-7-0-0-to-7-0-1/31867/2).
Steps to reproduce
1. Create a 2 node Ubuntu 18.04 cluster on 7.0.0 or 7.0.1
2. Create the travel-sample bucket
3. Offline upgrade the cluster to a later version, e.g. 7.0.0 to 7.0.1 or 7.0.1 to 7.0.2 (steps can be found here: https://docs.couchbase.com/server/current/install/upgrade-cluster-offline.html)
What happens:
Upgrade is marked as successful in the CLI but the admin console does not come up
What is expected to happen:
Upgrade goes through fine and we are able to access the admin console. All the data remains intact.
Logs:
This issue was originally reproduced on an AWS instance. The logs from that instance are attached to the ticket.
Appendix
QE has validated that this issue is not observed on the other supported platforms (CentOS 7/8, RHEL 7/8, SUSE 12/15, OEL 7/8, Amazon Linux 2 and Windows). In QE testing it is only observed with deb installs (Debian 10/11, Ubuntu 18/20).
Attachments
- node_down_upgrade.zip (54.26 MB)
Issue Links
Activity
I am gonna need significantly more information, I think. It sounds like you did not reproduce this issue, correct? We need to at least collect all the logs to diagnose this, especially if we can't reproduce it.
I did reproduce it and provided a live cluster to debug this issue. The down node (ec2-52-38-173-48.us-west-2.compute.amazonaws.com) is the one where the upgrade to 7.0.1 succeeded on the command line but the UI failed to start:
root@ip-172-31-31-134:/home/ubuntu# apt install ./couchbase-server-enterprise_7.0.1-ubuntu18.04_amd64.deb
Reading package lists... Done
Building dependency tree
Reading state information... Done
Note, selecting 'couchbase-server' instead of './couchbase-server-enterprise_7.0.1-ubuntu18.04_amd64.deb'
The following packages will be upgraded:
couchbase-server
1 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
Need to get 0 B/435 MB of archives.
After this operation, 12.7 MB disk space will be freed.
Get:1 /home/ubuntu/couchbase-server-enterprise_7.0.1-ubuntu18.04_amd64.deb couchbase-server amd64 7.0.1-6102-1 [435 MB]
(Reading database ... 61709 files and directories currently installed.)
Preparing to unpack .../couchbase-server-enterprise_7.0.1-ubuntu18.04_amd64.deb ...
Warning: Transparent hugepages looks to be active and should not be.
Please look at https://docs.couchbase.com/server/7.0/install/thp-disable.html as for how to PERMANENTLY alter this setting.
Warning: Swappiness is not set to 0.
Please look at https://docs.couchbase.com/server/7.0/install/install-swap-space.html as for how to PERMANENTLY alter this setting.
Minimum RAM required : 4 GB
System RAM configured : 7.28 GB
Minimum number of processors required : 4 cores
Number of processors on the system : 4 cores
Upgrading previous couchbase ... (7.0.0-5302-1)
Saving previous couchbase config.dat ...
Cleaning symlinks ...
Unpacking couchbase-server (7.0.1-6102-1) over (7.0.0-5302-1) ...
Setting up couchbase-server (7.0.1-6102-1) ...
Upgrading couchbase-server ...
/opt/couchbase/bin/cbupgrade -c /opt/couchbase/var/lib/couchbase/config -a yes
Automatic mode: running without interactive questions or confirmations.
Analysing...
Previous config.dat file is /opt/couchbase/var/lib/couchbase/config/config.dat.debsave
Database dir: /opt/couchbase/var/lib/couchbase/data
Buckets to upgrade: [no buckets found]
Checking disk space available for buckets in directory:
/opt/couchbase/var/lib/couchbase/data
Free disk bucket space wanted: 0.0
Free disk bucket space available: 37674733568
Free disk space factor: 2.0
Ok.
Analysis complete.
No data directories found for namespace upgrade
Copying /opt/couchbase/var/lib/couchbase/config/config.dat.debsave
cp /opt/couchbase/var/lib/couchbase/config/config.dat.debsave /opt/couchbase/var/lib/couchbase/config/config.dat
Copying /opt/couchbase/var/lib/couchbase/ip_start.debsave
cp /opt/couchbase/var/lib/couchbase/ip_start.debsave /opt/couchbase/var/lib/couchbase/ip
Ensuring bucket data directories.
Ensuring dbdir and indexdir owner/group: /opt/couchbase/var/lib/couchbase/data
chown -R couchbase:couchbase /opt/couchbase/var/lib/couchbase/data
Ensuring dbdir and indexdir owner/group: /opt/couchbase/var/lib/couchbase/data
chown -R couchbase:couchbase /opt/couchbase/var/lib/couchbase/data
Ensuring dbdir and indexdir owner/group: /opt/couchbase/var/lib/couchbase/data
chown -R couchbase:couchbase /opt/couchbase/var/lib/couchbase/data
Done.
You have successfully installed Couchbase Server.
Please browse to http://ip-172-31-31-134:8091/ to configure your server.
Refer to https://docs.couchbase.com for additional resources.
Please note that you have to update your firewall configuration to
allow external connections to a number of network ports for full
operation. Refer to the documentation for the current list:
https://docs.couchbase.com/server/7.0/install/install-ports.html
By using this software you agree to the End User License Agreement.
See /opt/couchbase/LICENSE.txt.
root@ip-172-31-31-134:/home/ubuntu
"Tested in centos 7.7 in AWS. After upgrade to 7.0.1, UI on both nodes are up as expected." – this is very unclear and the box you pointed at was up and running. Where do you mention "ec2-52-38-173-48.us-west-2.compute.amazonaws.com"?
The Ubuntu 18.04 cluster has 2 nodes: one is ec2-52-38-173-48.us-west-2.compute.amazonaws.com and the other is ec2-54-189-144-232.us-west-2.compute.amazonaws.com
If you look at the cluster in the UI, you will see that node ec2-52-38-173-48.us-west-2.compute.amazonaws.com shows Couchbase Server version "Enterprise Edition 7.0.0 build 5302", but on the Ubuntu 18.04 server itself it shows 7.0.1-6102, because this node was upgraded successfully using the offline upgrade:
root@ip-172-31-31-134:/home/ubuntu# more /opt/couchbase/VERSION.txt
7.0.1-6102
root@ip-172-31-31-134:/home/ubuntu#
I posted the credentials to log in to these Ubuntu 18.04 servers earlier in this ticket.
I tested on CentOS 7 servers to see whether this issue only happens on Ubuntu or also hits other operating systems. It turns out that an offline upgrade on CentOS 7 does not hit this issue.
This might possibly be caused by the upgrade process, but I'll also need more info to know. The available logs here are impenetrable to me. If someone can narrow it down to what exactly is failing (probably a certain child process that ns_server is trying to bring up?), maybe we can figure out what's wrong, and from there figure out how it happened.
Yeah, I can help you a bit there, but only a bit.
I'm not exactly sure why this happens, but it seems to be the current issue. Aliaksey Artamonau, this not_found is from chronicle, and I'm assuming it's referring to some missing key "X". Not 100% sure; still looking at it.
Looks like chronicle files got deleted somewhere in between the upgrade from 7.0 to 7.0.1:
[ns_server:debug,2021-10-06T22:41:28.833Z,ns_1@ec2-52-38-173-48.us-west-2.compute.amazonaws.com:chronicle_local<0.213.0>:chronicle_local:init:57]Chronicle state is: not_provisioned
[ns_server:debug,2021-10-06T22:41:28.833Z,ns_1@ec2-52-38-173-48.us-west-2.compute.amazonaws.com:chronicle_local<0.213.0>:chronicle_local:provision:134]Provision chronicle on this node
At the same time ns_config was left intact. So the compat version in ns_config indicates that we should use chronicle, but the latter has got no data in it.
Does not appear to be an ns_server issue. Possible explanation: upgrade scripts did not preserve /opt/couchbase/var/lib/couchbase/config/chronicle.
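The inconsistent state described above (ns_config left intact, chronicle data gone) can be checked for on disk before the server is restarted. The following is only a sketch: `chronicle_missing` is a hypothetical helper name, not a Couchbase tool, and it assumes the node was already running 7.0.x, so that a chronicle directory is expected next to config.dat. It has to run before Couchbase Server starts again, because the server recreates the directory on startup.

```shell
# Sketch only: chronicle_missing is a hypothetical helper, not part of Couchbase.
# Returns 0 when config.dat exists but the chronicle directory next to it is
# absent or empty, i.e. ns_config survived but chronicle did not.
chronicle_missing() {
    # Default path is the config directory shown in the installer output.
    local config_dir="${1:-/opt/couchbase/var/lib/couchbase/config}"

    # No config.dat at all: nothing to compare against (e.g. a fresh node).
    [ -f "$config_dir/config.dat" ] || return 1

    # Chronicle directory missing entirely.
    [ -d "$config_dir/chronicle" ] || return 0

    # Chronicle directory present but empty.
    [ -z "$(ls -A "$config_dir/chronicle" 2>/dev/null)" ] && return 0

    return 1
}
```

For example, `chronicle_missing && echo "chronicle data is gone"` could be run right after an upgrade performed with INSTALL_DONT_START_SERVER=1.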
Direct offline upgrade from 6.5.1 to 7.0.1 on a one-node cluster on Ubuntu 18.04: the upgrade works as expected. Will repeat with a 2-node cluster.
Aliaksey Artamonau Both nodes in that cluster (the one that is still 7.0.0 and the one that got upgraded to 7.0.1) appear to have an almost identical set of files in /opt/couchbase/var/lib/couchbase/config/chronicle:
root@ip-172-31-31-134:/opt/couchbase/var/lib/couchbase/config/chronicle# ls -lR
.:
total 16
drwxrwx--- 2 couchbase couchbase 4096 Oct 7 19:19 logs
drwxrwx--- 4 couchbase couchbase 4096 Oct 6 22:41 rsms
drwxrwx--- 3 couchbase couchbase 4096 Oct 7 22:42 snapshots
-rw-rw---- 1 couchbase couchbase 2 Oct 6 22:41 version
./logs:
total 4828
-rw-rw---- 1 couchbase couchbase 1048697 Oct 7 03:50 0.log
-rw-rw---- 1 couchbase couchbase 1049492 Oct 7 08:58 1.log
-rw-rw---- 1 couchbase couchbase 1049432 Oct 7 14:08 2.log
-rw-rw---- 1 couchbase couchbase 1049410 Oct 7 19:19 3.log
-rw-rw---- 1 couchbase couchbase 709957 Oct 7 22:51 4.log
./rsms:
total 8
drwxrwx--- 2 couchbase couchbase 4096 Oct 7 22:51 chronicle_config_rsm
drwxrwx--- 2 couchbase couchbase 4096 Oct 7 22:51 kv
./rsms/chronicle_config_rsm:
total 4
-rw-rw---- 1 couchbase couchbase 6 Oct 7 22:51 incarnation
./rsms/kv:
total 4
-rw-rw---- 1 couchbase couchbase 6 Oct 7 22:51 incarnation
./snapshots:
total 4
drwxrwx--- 2 couchbase couchbase 4096 Oct 7 22:42 14700
./snapshots/14700:
total 8
-rw-rw---- 1 couchbase couchbase 277 Oct 7 22:42 chronicle_config_rsm.snapshot
-rw-rw---- 1 couchbase couchbase 82 Oct 7 22:42 kv.snapshot
The above is from the 7.0.1 node. The 7.0.0 node only has one file in ./logs, 0.log, and it's relatively small (10k). On the other hand, ./snapshots/xxx/kv.snapshot is 1695 bytes on the 7.0.0 node.
Can you tell me if any of those numbers are surprising, or if it looks like any other files that should be there are missing?
Direct offline upgrade from 6.5.1 to 7.0.1 on a 2-node cluster on Ubuntu 18.04: the upgrade works as expected. The UI on both nodes is up after the upgrade.
I could reproduce this issue when upgrading from 7.0.1 to 7.0.2 on Ubuntu 18.04. After the offline upgrade, Couchbase Server on the upgraded node could not start.
http://ec2-52-13-125-200.us-west-2.compute.amazonaws.com:8091/ui/index.html#/servers/list?commonBucket=travel-sample&scenarioZoom=minute&scenario=r62lynvxu&openedServers=ec2-54-186-191-72.us-west-2.compute.amazonaws.com:8091&openedServers=ec2-52-13-125-200.us-west-2.compute.amazonaws.com:8091
Can you tell me if any of those numbers are surprising, or if it looks like any other files that should be there are missing?
Once the server is started, all the missing files will get recreated. So no, none of these look "broken". But I would guess the one with only a single log file (0.log) is where those files were previously deleted.
That's surprising, since the one with the single small log file is the one that had NOT been upgraded.
I've been able to reproduce this on Ubuntu 18.04 with the most trivial upgrade scenario:
- Install 7.0.0 GA on a single node and create travel-sample.
- Install 7.0.1 GA (doesn't seem to matter whether you stop Server before doing the upgrade or not).
When doing the upgrade, I see the following output:
Unpacking couchbase-server (7.0.1-6102-1) over (7.0.0-5302-1) ...
Setting up couchbase-server (7.0.1-6102-1) ...
Upgrading couchbase-server ...
/opt/couchbase/bin/cbupgrade -c /opt/couchbase/var/lib/couchbase/config -a yes
Automatic mode: running without interactive questions or confirmations.
Analysing...
Previous config.dat file is /opt/couchbase/var/lib/couchbase/config/config.dat.debsave
Database dir: /opt/couchbase/var/lib/couchbase/data
Buckets to upgrade: [no buckets found]
Checking disk space available for buckets in directory:
/opt/couchbase/var/lib/couchbase/data
Free disk bucket space wanted: 0.0
Free disk bucket space available: 150687678464
Free disk space factor: 2.0
Ok.
Analysis complete.
No data directories found for namespace upgrade
Copying /opt/couchbase/var/lib/couchbase/config/config.dat.debsave
cp /opt/couchbase/var/lib/couchbase/config/config.dat.debsave /opt/couchbase/var/lib/couchbase/config/config.dat
Copying /opt/couchbase/var/lib/couchbase/ip.debsave
cp /opt/couchbase/var/lib/couchbase/ip.debsave /opt/couchbase/var/lib/couchbase/ip
Ensuring bucket data directories.
Ensuring dbdir and indexdir owner/group: /opt/couchbase/var/lib/couchbase/data
chown -R couchbase:couchbase /opt/couchbase/var/lib/couchbase/data
Ensuring dbdir and indexdir owner/group: /opt/couchbase/var/lib/couchbase/data
chown -R couchbase:couchbase /opt/couchbase/var/lib/couchbase/data
Ensuring dbdir and indexdir owner/group: /opt/couchbase/var/lib/couchbase/data
chown -R couchbase:couchbase /opt/couchbase/var/lib/couchbase/data
Done.
You have successfully installed Couchbase Server.
Please browse to http://74757b46d378:8091/ to configure your server.
Several things in there feel odd / suspicious:
- Why does it not find any buckets to upgrade?
- The manipulations of config.dat.debsave feel likely to be behind the problem.
My gut is saying that cbupgrade is being invoked at the "wrong time" during the upgrade process. I know that the order in which the install/upgrade/remove steps are performed is different between rpm-based and deb-based systems, so perhaps a process that is correct for CentOS 7 is not correct for Ubuntu. I'm digging a little further, but I'll probably need to work with Patrick Varley on this.
Ugh. This is happening because of the fix for MB-44229, which was to explicitly blow away the chronicle log directory on a full uninstall. Apparently the way I implemented this for Debian packaging is wrong. Seeing what I can do now.
I've proposed a fix which should be good to go into 7.0.2. Unfortunately, due to the nature of the problem, this will only fix upgrading from 7.0.2. The bug is in 7.0.0 and 7.0.1, meaning that anyone attempting to upgrade from 7.0.0 or 7.0.1 on a Debian-based system will hit this failure. The bug happens too early in the upgrade sequence for me to implement a fix for upgrading to 7.0.2 from 7.0.0 or 7.0.1; by the time anything in the 7.0.2 installer is invoked, the damage is already done.
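To illustrate why the fix has to live in the package being replaced: on Debian-based systems, dpkg invokes the old package's maintainer scripts with `upgrade` as the first argument during an upgrade, and with `remove` or `purge` on an actual uninstall. A cleanup step that ignores this argument also runs on upgrades. The following is only a sketch of the kind of guard that the commit message "Only purge chronicle files on final removal" suggests; it is not the actual voltron code, and `should_purge_chronicle` is a hypothetical name.

```shell
# Sketch only, not the real packaging code. should_purge_chronicle is a
# hypothetical helper that inspects the action dpkg passes as $1 to a
# maintainer script.
should_purge_chronicle() {
    case "$1" in
        # Final removal of the package: safe to delete chronicle files.
        remove|purge) return 0 ;;
        # upgrade, failed-upgrade, abort-*, etc.: keep chronicle data.
        *) return 1 ;;
    esac
}

# Intended use inside a maintainer script (left commented out on purpose):
#   if should_purge_chronicle "$1"; then
#       rm -rf /opt/couchbase/var/lib/couchbase/config/chronicle
#   fi
```

Because the scripts shipped with 7.0.0 and 7.0.1 are the ones that run when those versions are replaced, a guard like this in a newer package cannot protect upgrades from them.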
The workaround is to make a copy of /opt/couchbase/var/lib/couchbase/config/chronicle after stopping Server, upgrade Server with INSTALL_DONT_START_SERVER=1, and then restore the chronicle directory before starting Server. The official "Upgrade an Offline Cluster" documentation (https://docs.couchbase.com/server/current/install/upgrade-cluster-offline.html) already recommends creating a backup of /opt/couchbase/var/lib/couchbase/config before starting the upgrade; if the user does so, they can just copy the chronicle directory back. In detail:
# Stop Couchbase Server
systemctl stop couchbase-server.service
# Back up all config (probably better not to use /tmp)
cp -a /opt/couchbase/var/lib/couchbase/config /tmp
# Upgrade but don't restart Server
INSTALL_DONT_START_SERVER=1 apt install -y ./couchbase-server-enterprise_7.0.1-ubuntu18.04_amd64.deb
# Restore the chronicle directory
cp -a /tmp/config/chronicle /opt/couchbase/var/lib/couchbase/config
# Start Couchbase Server
systemctl start couchbase-server.service
Again, this should only be done when a user is upgrading from 7.0.0 or 7.0.1 to any later version.
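Whether a node needs the workaround can be decided from the installed version string, which on these nodes is stored in /opt/couchbase/VERSION.txt in the form 7.0.1-6102. A minimal sketch, assuming that format; `needs_chronicle_workaround` is a hypothetical helper name:

```shell
# Sketch only: needs_chronicle_workaround is a hypothetical helper.
# Takes a version string such as "7.0.1-6102" and returns 0 if the
# installed version is one of the two affected releases.
needs_chronicle_workaround() {
    # ${1%%-*} strips the build suffix: "7.0.1-6102" -> "7.0.1".
    case "${1%%-*}" in
        7.0.0|7.0.1) return 0 ;;
        *) return 1 ;;
    esac
}

# Intended use on a deb-based node (left commented out on purpose):
#   if needs_chronicle_workaround "$(cat /opt/couchbase/VERSION.txt)"; then
#       echo "Back up the chronicle directory before upgrading"
#   fi
```

The check only covers the version; it does not detect whether the node is a deb install, which is the other precondition for hitting this bug.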
I have tested the above scenario upgrading from 7.0.0 to 7.0.1, and things worked fine.
Fix is in, although again it'll only fix upgrading from 7.0.2 to a later version. The above workaround should make it into the 7.0.2 upgrade release notes; I'm not exactly sure how to achieve that.
Build couchbase-server-7.0.2-6702 contains voltron commit 2532c15 with commit message:
MB-48783: Only purge chronicle files on final removal
Build couchbase-server-7.0.2-6703 contains product-metadata commit 7bdb1be with commit message:
MB-48783: Remove 7.0.0/7.0.1 from apt repos
Build couchbase-server-7.1.0-1457 contains product-metadata commit 7bdb1be with commit message:
MB-48783: Remove 7.0.0/7.0.1 from apt repos
Verified that offline upgrade on Ubuntu 20.04 from 7.0.2-6703 to 7.1.0-1468 succeeds.
Verified that the offline upgrade workaround from 7.0.1 to 7.0.2 works on Ubuntu 20.04. The upgraded node is up after copying the chronicle directory back and starting Couchbase Server:
cp -a /opt/couchbase/var/lib/couchbase/config /tmp
INSTALL_DONT_START_SERVER=1 apt install -y ./couchbase-server-enterprise_7.0.2-6703*
cp -a /tmp/config/chronicle /opt/couchbase/var/lib/couchbase/config
systemctl start couchbase-server.service
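The backup and restore steps used in the verification above can be wrapped in two small helpers so that a missing backup is caught before the server is started. This is a sketch under stated assumptions: `backup_config` and `restore_chronicle` are hypothetical names, and the backup location is left to the caller, given the earlier note that /tmp is probably not the best choice.

```shell
# Sketch only: both helpers are hypothetical, not Couchbase tooling.

# backup_config CONFIG_DIR BACKUP_PARENT
# Copies the whole config directory (attributes preserved) under BACKUP_PARENT.
backup_config() {
    mkdir -p "$2" && cp -a "$1" "$2/"
}

# restore_chronicle BACKUP_CONFIG_DIR CONFIG_DIR
# Copies the saved chronicle directory back into the live config directory.
# Fails without touching anything if the backup has no chronicle directory.
restore_chronicle() {
    if [ ! -d "$1/chronicle" ]; then
        echo "no chronicle backup found in $1" >&2
        return 1
    fi
    cp -a "$1/chronicle" "$2/"
}
```

With a caller-chosen backup directory such as /root/cb-backup (an assumed path), the sequence would be `backup_config /opt/couchbase/var/lib/couchbase/config /root/cb-backup` before the upgrade and `restore_chronicle /root/cb-backup/config /opt/couchbase/var/lib/couchbase/config` after it, with the server stopped in between.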
Tony Hillman please see the note from Dave on adding this issue to the release note for 7.0.0 and 7.0.1. Can you create tickets for these doc updates and link to this ticket?
Verified offline upgrade from 7.0.2-6703 to 7.1.0-1468 on
ubuntu18 / 20,
debian9 / 10
centos 7 / rhel8
suse12 / 15
oel7 / 8
amzn2
windows 2016
All passed
Build cbdeps::erlang-neo-2 contains build-tools commit 9dd764a with commit message:
MB-48783: Don't release 7.0.0/7.0.1 on apt
Build couchbase-server-7.1.0-1637 contains voltron commit 2532c15 with commit message:
MB-48783: Only purge chronicle files on final removal
Build cbdeps::grpc-1.31.1-2 contains build-tools commit 9dd764a with commit message:
MB-48783: Don't release 7.0.0/7.0.1 on apt
Build cbdeps::curl-7.78.0-5 contains build-tools commit 9dd764a with commit message:
MB-48783: Don't release 7.0.0/7.0.1 on apt
User hit this upgrade issue https://forums.couchbase.com/t/couchbase-upgrade-issues-from-7-0-0-to-7-0-1/31867/2