Details

Type: Bug
Resolution: Fixed
Priority: Major
Fix Version/s: 7.1.1
Affects Version/s: 7.1.0
Component/s: couchbase-bucket
Labels:
Environment: 7.1.0-2478
Triage: Untriaged
Operating System: CentOS 64-bit
Link to Log File, atop/blg, CBCollectInfo, Core dump: https://cb-engineering.s3.amazonaws.com/warmup_issue.tar.gz
Story Points: 1
Is this a Regression?: Unknown
Sprint: KV March-22, KV May 22

Description

STEPS TO RECREATE:
DISK FULL TEST

1. Create a 4-node cluster.
2. Create 5 million items (doc size = 2048) with replicas = 1.
3. Fill the entire disk with "fallocate -l <space left on disk> <file_name>", e.g. "fallocate -l 84716M /data/full_disk_84716MB_1647101247.94".
4. After the disk is full, start doc ops (create docs) until ep_data_write_failed > 0 (verified using cbstats).
5. Kill memcached on all nodes (kill -9 $(pgrep memcached)). The time difference between the SIGKILLs on each node was three seconds.
6. Observed on node 172.23.122.247: "2022-03-12T09:08:26.230072-08:00 CRITICAL (default) WarmupBackfillTask::run(): caught exception while running backfill - aborting warmup: WarmupVbucketVisitor::visit(): vb:107 shardId:3 failed to create BySeqnoScanContext, for backfill task:'Warmup - loading KV Pairs shard 3'"
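The disk-fill and write-failure check in the steps above can be sketched roughly as follows. This is only an illustration, not the code the QE test uses: the helper names `build_fallocate_command` and `parse_stat` are hypothetical, the file name pattern is inferred from the example path in the steps, and the "key: value" layout of the cbstats output is an assumption.

```python
import time


def build_fallocate_command(free_bytes, data_dir="/data", now=None):
    """Return the fallocate argv that reserves all remaining space, in MiB.

    The file name pattern (full_disk_<N>MB_<timestamp>) mirrors the example
    in the ticket; it is an assumption about how the test names the file.
    """
    free_mb = free_bytes // (1024 * 1024)  # fallocate's "M" suffix means MiB
    stamp = time.time() if now is None else now
    filler = f"{data_dir}/full_disk_{free_mb}MB_{stamp}"
    return ["fallocate", "-l", f"{free_mb}M", filler]


def parse_stat(cbstats_output, name="ep_data_write_failed"):
    """Return the integer value of `name`, or None if the stat is absent.

    Assumes one 'key: value' pair per line of captured cbstats output.
    """
    for line in cbstats_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == name:
            return int(value.strip())
    return None
```

The `free_bytes` argument could come from `shutil.disk_usage(data_dir).free`, and the returned list can be passed to `subprocess.run`. Doc ops would then continue until `parse_stat(...)` returns a value greater than 0.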

QE-TEST:

guides/gradlew --refresh-dependencies testrunner -P jython=/opt/jython/bin/jython -P 'args=-i /tmp/testexec.33408.ini bucket_storage=magma,rerun=false,bucket_eviction_policy=fullEviction,randomize_value=True,enable_dp=false,GROUP=P0,get-cbcollect-info=True,upgrade_version=7.1.0-1671 -t storage.magma.magma_disk_full.MagmaDiskFull.test_crash_recovery_disk_full,nodes_init=4,num_items=5000000,doc_size=2048,sdk_timeout=60,replicas=1,GROUP=P0'

Note:

After the above failure, teardown clears the disk space by removing the file created to fill up the disk in step 3 above (using rm -rf /data/full_disk_*). But even after the disk space is freed, all nodes stay in the amber state in the UI.
This issue is not easily reproducible. I ran this test many times on the same build but hit the issue only once.
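Because the failure is intermittent, repeated runs can be checked for the abort message automatically. The following is a minimal sketch, assuming the memcached log lines have the same layout as the message quoted in this ticket; the function name `find_warmup_aborts` and the regular expression are illustrative and not part of any Couchbase tool.

```python
import re

# Pattern built from the single log line quoted in this ticket; other builds
# may word the message differently.
ABORT_RE = re.compile(
    r"^(?P<ts>\S+) CRITICAL \((?P<bucket>[^)]+)\) WarmupBackfillTask::run\(\): "
    r".*?vb:(?P<vb>\d+) shardId:(?P<shard>\d+) "
    r"failed to create BySeqnoScanContext"
)


def find_warmup_aborts(lines):
    """Return one dict per log line that reports an aborted warmup backfill."""
    hits = []
    for line in lines:
        match = ABORT_RE.search(line)
        if match:
            hits.append({
                "timestamp": match["ts"],
                "bucket": match["bucket"],
                "vb": int(match["vb"]),
                "shard": int(match["shard"]),
            })
    return hits
```

Passing an open log file (for example the memcached log from a cbcollect archive) to `find_warmup_aborts` yields the affected vBucket and shard for each abort, which makes it easier to tell whether a rerun hit the same condition.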

Cluster details: http://172.23.122.245:8091/ui/index.html#/buckets?commonBucket=default&scenarioZoom=minute&scenario=oombr8sk5

People

Assignee: Ankush Sharma

Reporter: Ankush Sharma

Votes: 0

Watchers: 6

Dates

Created: 12/Mar/22 5:55 PM

Updated: 21/Jun/22 7:57 AM

Resolved: 14/Jun/22 12:05 AM

Warmup aborted when disk is full and vbucket_state local doc does not exist instead of loading data
