Details
Bug
Resolution: Duplicate
Test Blocker
Cheshire-Cat
Untriaged
1
Yes
KV-Engine 2021-Feb
Description
Build : 7.0.0-4547
Test : -test tests/integration/cheshirecat/test_cheshirecat_kv_gsi_coll_xdcr_backup_sgw_fts_itemct_txns_eventing_cbas.yml -scope tests/integration/cheshirecat/scope_cheshirecat_with_backup.yml
Scale :
Iteration : 1st
In the system test, the disk on one KV node (172.23.120.77) fills up, after which the test does not proceed as expected.
On 172.23.120.77, df shows 100% disk usage on /data (100G), but du reports only 48 GB in use.
[root@localhost bin]# df -kh
Filesystem Size Used Avail Use% Mounted on
devtmpfs 12G 0 12G 0% /dev
tmpfs 12G 0 12G 0% /dev/shm
tmpfs 12G 803M 11G 7% /run
tmpfs 12G 0 12G 0% /sys/fs/cgroup
/dev/mapper/centos-root 31G 8.5G 23G 28% /
/dev/xvdb1 100G 100G 20K 100% /data
/dev/xvda1 497M 284M 214M 58% /boot
tmpfs 2.4G 0 2.4G 0% /run/user/0
[root@localhost data]# pwd
/data
[root@localhost data]# du -sh *
0 archive
48G couchbase
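The df-vs-du gap above can be measured mechanically. A minimal sketch (the unaccounted_kb helper is hypothetical, not an existing tool; on this node it would report roughly 52 GB, i.e. 100G per df minus 48G per du):

```shell
#!/bin/sh
# Hypothetical helper: report space (in KB) that df counts as used on the
# filesystem containing $1 but that du cannot find under $1. A large gap
# usually points at deleted-but-still-open files.
unaccounted_kb() {
    # -P forces one-line-per-filesystem output so NR==2 is always the data row
    df_used=$(df -Pk "$1" | awk 'NR==2 {print $3}')
    # -x keeps du on one filesystem so nested mounts don't skew the number
    du_used=$(du -skx "$1" 2>/dev/null | awk '{print $1}')
    echo $(( df_used - du_used ))
}

# On the node: unaccounted_kb /data
```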
From lsof, it is seen that memcached is holding open 9822 deleted files, which would explain the space that df sees but du cannot find.
[root@localhost data]# /usr/sbin/lsof | grep deleted | grep memcached | wc -l
9822
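Those deleted-but-open files can also be sized, to check whether they account for the missing space. A sketch (the SIZE/OFF field is column 7 in default lsof output, but the field position can vary by lsof version, so verify on your system; the demo lines are fabricated lsof-style output):

```shell
#!/bin/sh
# Sum the SIZE/OFF column (bytes) of deleted files still held by a process.
sum_deleted_bytes() {
    grep deleted | grep "$1" | awk '{ total += $7 } END { print total+0 }'
}

# On the node: /usr/sbin/lsof | sum_deleted_bytes memcached

# Self-contained demo with two fabricated lsof-style lines:
printf '%s\n' \
  'memcached 1234 couchbase 45u REG 202,17 1048576 99 /data/x.couch.1 (deleted)' \
  'memcached 1234 couchbase 46u REG 202,17 2097152 100 /data/y.couch.2 (deleted)' \
  | sum_deleted_bytes memcached
# prints 3145728
```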
The disk_almost_full alarm began firing at 2021-03-01T03:34:16 and then repeatedly set and cleared.
[root@localhost logs]# zgrep -i "disk_almost_full" babysitter.log
[ns_server:info,2021-03-01T03:34:16.379-08:00,babysitter_of_ns_1@cb.local:<0.121.0>:ns_port_server:log:224]ns_server<0.121.0>: 2021-03-01 03:34:16.165561 std_info #{label=>{error_logger,info_report},report=>[{alarm_handler,{set,{{disk_almost_full,"/data"},[]}}}]}
[ns_server:info,2021-03-01T03:36:16.376-08:00,babysitter_of_ns_1@cb.local:<0.121.0>:ns_port_server:log:224]ns_server<0.121.0>: 2021-03-01 03:36:16.175421 std_info #{label=>{error_logger,info_report},report=>[{alarm_handler,{clear,{disk_almost_full,"/data"}}}]}
[ns_server:info,2021-03-01T03:43:16.416-08:00,babysitter_of_ns_1@cb.local:<0.121.0>:ns_port_server:log:224]ns_server<0.121.0>: 2021-03-01 03:43:16.216190 std_info #{label=>{error_logger,info_report},report=>[{alarm_handler,{set,{{disk_almost_full,"/data"},[]}}}]}
[ns_server:info,2021-03-01T04:02:16.559-08:00,babysitter_of_ns_1@cb.local:<0.121.0>:ns_port_server:log:224]ns_server<0.121.0>: 2021-03-01 04:02:16.349853 std_info #{label=>{error_logger,info_report},report=>[{alarm_handler,{clear,{disk_almost_full,"/data"}}}]}
[ns_server:info,2021-03-01T04:04:16.578-08:00,babysitter_of_ns_1@cb.local:<0.121.0>:ns_port_server:log:224]ns_server<0.121.0>: 2021-03-01 04:04:16.376625 std_info #{label=>{error_logger,info_report},report=>[{alarm_handler,{set,{{disk_almost_full,"/data"},[]}}}]}
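The set/clear churn in those log lines can be condensed into a timeline. A sketch written against the exact ns_server log format quoted in this ticket (adjust the sed pattern if your log lines differ):

```shell
#!/bin/sh
# Print "set <timestamp>" / "clear <timestamp>" for each disk_almost_full
# alarm transition read on stdin.
alarm_timeline() {
    grep disk_almost_full \
      | sed -En 's/^\[ns_server:info,([^,]*),.*[{](set|clear),.*/\2 \1/p'
}

# On the node: zgrep -i disk_almost_full babysitter.log | alarm_timeline

# Demo with the first log line quoted above:
printf '%s\n' '[ns_server:info,2021-03-01T03:34:16.379-08:00,babysitter_of_ns_1@cb.local:<0.121.0>:ns_port_server:log:224]ns_server<0.121.0>: 2021-03-01 03:34:16.165561 std_info #{label=>{error_logger,info_report},report=>[{alarm_handler,{set,{{disk_almost_full,"/data"},[]}}}]}' | alarm_timeline
# prints: set 2021-03-01T03:34:16.379-08:00
```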
We will continue to investigate, but this looks like MB-41924. On another cluster running 7.0.0-4554, we hit the same issue after roughly the same test duration, whereas the run on 7.0.0-4539 did not show it, so this could be a regression.