Make the defragmenter more reactive to sudden changes in fragmentation

Description

In the auto_pid mode, we use a PID controller to calculate the sleep time for the DefragmenterTask. 

The sleep time can vary between defragmenter_auto_min_sleep (default: 0.6s) and defragmenter_auto_max_sleep (default: 10s). We calculate the output of the PID controller and subtract that from the max sleep time.

The input to the PID controller is the current scoredFragmentation. The PID controller internally uses an "error value", which is the difference between the input and the set point (the target scoredFragmentation, default 7%). The error is essentially how far away we are from the target fragmentation of 7%.

The PID controller has 3 terms, of which we use 2 (the third, the derivative term, is set to 0):

  • The proportional term, which is the error multiplied by a constant factor P

  • The integral term (an accumulation of all errors, multiplied by a constant factor I)

The PID controller output is recalculated every defragmenter_auto_pid_dt (default: 30 seconds). 
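The calculation described above can be sketched as follows. This is a minimal illustration rather than the kv_engine implementation: the class and function names are hypothetical, fragmentation is expressed as a fraction (0.07 for 7%), the integral is assumed to accumulate error multiplied by dt, and the result is assumed to be clamped to the min/max sleep range.

```python
from dataclasses import dataclass


@dataclass
class PIController:
    """Hypothetical PI controller; the derivative term is left out (treated as 0)."""

    set_point: float        # target scored fragmentation, e.g. 0.07 for 7%
    p: float                # proportional factor P
    i: float                # integral factor I
    dt: float               # seconds between recalculations
    integral: float = 0.0   # running accumulation of error * dt (assumption)

    def step(self, current: float) -> float:
        """Recalculate the controller output for the current fragmentation."""
        error = current - self.set_point
        self.integral += error * self.dt
        return self.p * error + self.i * self.integral


def sleep_time(output: float, min_sleep: float = 0.6, max_sleep: float = 10.0) -> float:
    """Subtract the controller output from the max sleep, clamped to [min, max]."""
    return min(max_sleep, max(min_sleep, max_sleep - output))


# One recalculation at 30% fragmentation with P = 0.3 and the integral disabled.
controller = PIController(set_point=0.07, p=0.3, i=0.0, dt=30.0)
print(sleep_time(controller.step(0.30)))
```

With the integral factor set to 0 the sleep time barely moves away from the 10s maximum, which shows how little the proportional term contributes on its own at P = 0.3.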

P is currently set to 0.3. For 30% scored fragmentation, we'd get a proportional term of at most 0.3 x 0.3 = 0.09s of sleep reduction (slightly less once the 7% set point is subtracted from the input). So even at 30% fragmentation, the DefragmenterTask would still sleep for about 9.91s between runs, and any further adjustment to the sleep time would come from the integral term (accumulated over time).
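To make the arithmetic concrete, the sketch below computes the sleep time produced by the proportional term alone for a few values of P. The function name is hypothetical, and the larger P values (3.0 and 30.0) are purely illustrative; they are not taken from the fix. Passing set_point=0.0 reproduces the 9.91s figure quoted above, while set_point=0.07 applies the 7% target.

```python
def proportional_sleep(fragmentation, p, set_point=0.0,
                       min_sleep=0.6, max_sleep=10.0):
    """Sleep time when only the proportional term contributes (hypothetical helper)."""
    error = fragmentation - set_point
    return min(max_sleep, max(min_sleep, max_sleep - p * error))


# 30% scored fragmentation, expressed as a fraction.
for p in (0.3, 3.0, 30.0):  # 3.0 and 30.0 are illustrative assumptions
    ignoring_target = proportional_sleep(0.30, p)
    with_target = proportional_sleep(0.30, p, set_point=0.07)
    print(f"P={p:>4}: {ignoring_target:.3f}s (no set point), "
          f"{with_target:.3f}s (7% set point)")
```

Only when P is one or two orders of magnitude larger does the proportional term move the sleep time noticeably away from the 10s maximum.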

This might not be the ideal behaviour in cases of bulk expiration/deletion workloads where we get a sudden increase in memory fragmentation and we might want a more immediate, proportional, response.
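A small simulation can illustrate the difference in responsiveness. It is a sketch under stated assumptions: the integral factor (0.01), the larger proportional factor (30.0), and the fragmentation trace are all made up for illustration, the integral is assumed to accumulate error multiplied by dt, and fragmentation is expressed as a fraction.

```python
def simulate(p, i, fragmentation, set_point=0.07, dt=30.0,
             min_sleep=0.6, max_sleep=10.0):
    """Return the sleep time after each recalculation for a fragmentation trace."""
    integral = 0.0
    sleeps = []
    for current in fragmentation:
        error = current - set_point
        integral += error * dt
        output = p * error + i * integral
        sleeps.append(min(max_sleep, max(min_sleep, max_sleep - output)))
    return sleeps


# Fragmentation sits at the 7% target, then jumps to 30% and stays there.
trace = [0.07] * 2 + [0.30] * 4
small_p = simulate(p=0.3, i=0.01, fragmentation=trace)
large_p = simulate(p=30.0, i=0.01, fragmentation=trace)

for step, (a, b) in enumerate(zip(small_p, large_p)):
    print(f"t={step * 30:>3}s  P=0.3: {a:5.2f}s  P=30.0: {b:5.2f}s")
```

With the small proportional factor the sleep time drifts down only as the integral accumulates, whereas the larger factor shortens the sleep at the first recalculation after the jump.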

 

Issue

Workloads involving bulk data ingestion, or many documents whose Time-To-Live (TTL) expires at the same time, caused a sudden increase in memory fragmentation.

Resolution

The defragmenter now runs more frequently to better cope with sudden increases in fragmentation.

Environment

None

Release Notes Description

None

Activity

CB robot August 21, 2023 at 3:21 PM

Build couchbase-server-8.0.0-1382 contains kv_engine commit 0cb279e with commit message:
: Merge branch 'couchbase/7.1.4' into couchbase/neo

CB robot August 21, 2023 at 11:42 AM

Build couchbase-server-7.6.0-1388 contains kv_engine commit 0cb279e with commit message:
: Merge branch 'couchbase/7.1.4' into couchbase/neo

CB robot August 17, 2023 at 3:52 PM

Build couchbase-server-8.0.0-1378 contains kv_engine commit 4b21143 with commit message:
: [BP] Set the auto_pid proportional factor to a higher value

CB robot August 17, 2023 at 3:52 PM

Build couchbase-server-8.0.0-1378 contains kv_engine commit 63d28da with commit message:
: [BP] Reduce the defragmentation age thresholds to 1

CB robot August 17, 2023 at 3:52 PM

Build couchbase-server-8.0.0-1378 contains kv_engine commit c28adca with commit message:
: [BP] Expose the defragmenter task sleep time as a stat

Fixed

Created March 13, 2023 at 3:05 PM
Updated September 19, 2023 at 11:19 AM
Resolved April 14, 2023 at 2:42 PM