Details
-
Bug
-
Resolution: Fixed
-
Major
-
7.1.0
-
7.1.0-2478
-
Untriaged
-
Centos 64-bit
-
1
-
Unknown
-
KV March-22, KV May 22
Description
STEPS TO RECREATE:
DISK FULL TEST
- Create a 4-node cluster.
- Create 5 million items (doc size = 2048) with replicas = 1.
- Fill the entire disk ("fallocate -l <space left on disk> <file_name>", e.g. "fallocate -l 84716M /data/full_disk_84716MB_1647101247.94").
- Once the disk is full, start doc ops (create docs) and continue until ep_data_write_failed > 0 (verified using cbstats).
- Kill memcached on all nodes (kill -9 $(pgrep memcached)). The SIGKILL on each node was issued three seconds apart.
- Observed "2022-03-12T09:08:26.230072-08:00 CRITICAL (default) WarmupBackfillTask::run(): caught exception while running backfill - aborting warmup: WarmupVbucketVisitor::visit(): vb:107 shardId:3 failed to create BySeqnoScanContext, for backfill task:'Warmup - loading KV Pairs shard 3'"
(Observed on node 172.23.122.247)
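The disk-fill step above can be sketched as a small helper that builds the fallocate command line from the free space on the volume. This is only an illustration, not the code the test uses: the function name build_fill_command is hypothetical, and the file-name pattern is inferred from the single example given in the steps.

```python
import shutil
import time


def build_fill_command(free_bytes, directory="/data", timestamp=None):
    """Return the fallocate argv that would consume `free_bytes` in `directory`.

    Hypothetical helper: the naming scheme mirrors the example file
    /data/full_disk_84716MB_1647101247.94 from the report.
    """
    if timestamp is None:
        timestamp = time.time()
    # fallocate treats the "M" suffix as MiB (1024 * 1024 bytes).
    free_mb = free_bytes // (1024 * 1024)
    file_name = f"{directory}/full_disk_{free_mb}MB_{timestamp:.2f}"
    return ["fallocate", "-l", f"{free_mb}M", file_name]


# Example: inspect the command for the current directory's volume
# without running it.
free_now = shutil.disk_usage(".").free
print(" ".join(build_fill_command(free_now)))
```

Returning the argument list instead of executing it keeps the sketch safe to run; passing the list to subprocess.run would perform the actual fill.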
QE-TEST:
guides/gradlew --refresh-dependencies testrunner -P jython=/opt/jython/bin/jython -P 'args=-i /tmp/testexec.33408.ini bucket_storage=magma,rerun=false,bucket_eviction_policy=fullEviction,randomize_value=True,enable_dp=false,GROUP=P0,get-cbcollect-info=True,upgrade_version=7.1.0-1671 -t storage.magma.magma_disk_full.MagmaDiskFull.test_crash_recovery_disk_full,nodes_init=4,num_items=5000000,doc_size=2048,sdk_timeout=60,replicas=1,GROUP=P0'
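To make the test parameters in the command above easier to read, the comma-separated spec passed to -t can be split into the test path and a key/value map. This is a minimal sketch under assumptions: parse_test_spec is a hypothetical helper, values are kept as strings, and a repeated key keeps its last value, which is not necessarily how testrunner itself resolves duplicates.

```python
def parse_test_spec(spec):
    """Split 'test.path,key=value,...' into (test_path, params).

    Hypothetical helper for reading the command line; it does not
    reproduce testrunner's own parsing rules.
    """
    test_path, *pairs = spec.split(",")
    params = {}
    for pair in pairs:
        key, _, value = pair.partition("=")
        params[key] = value  # a repeated key keeps its last value
    return test_path, params


spec = (
    "storage.magma.magma_disk_full.MagmaDiskFull.test_crash_recovery_disk_full,"
    "nodes_init=4,num_items=5000000,doc_size=2048,sdk_timeout=60,replicas=1,GROUP=P0"
)
test_path, params = parse_test_spec(spec)
print(test_path)
print(params)
```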
Note:
- After the above failure, teardown frees the disk space by removing the file created to fill the disk in step 3 above (using rm -rf /data/full_disk_*). Even after the space is freed, all nodes stay in the amber state in the UI.
- This issue is not easily reproducible. I ran this test many times on the same build but hit the issue only once.
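The teardown cleanup mentioned in the note can be sketched in Python as follows. It is an assumed equivalent of the rm -rf /data/full_disk_* command, not the test's actual teardown code; remove_fill_files is a hypothetical name, and unlike rm -rf it only deletes regular files.

```python
import glob
import os


def remove_fill_files(directory="/data", prefix="full_disk_"):
    """Delete filler files matching `prefix`* and return the removed paths."""
    removed = []
    for path in sorted(glob.glob(os.path.join(directory, prefix + "*"))):
        if os.path.isfile(path):  # skip directories, unlike rm -rf
            os.remove(path)
            removed.append(path)
    return removed
```

Returning the removed paths lets a caller log what was freed before checking whether the nodes recover.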
Cluster details: http://172.23.122.245:8091/ui/index.html#/buckets?commonBucket=default&scenarioZoom=minute&scenario=oombr8sk5