  1. Couchbase Server
  2. MB-18453

Task scheduling is not fair!

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • 4.1.2, 4.5.1
    • 3.1.3, 3.1.4, 4.0.0, 4.0.1, 4.1.0, 4.1.1, 4.1.2, 4.5.0, 5.0.0
    • couchbase-bucket
    • Security Level: Public
    • Untriaged
    • No

    Description

      When a high-priority task is busy, lower-priority tasks never get a chance to run. Some have been observed waiting for many hours.

      The example seen in the field was the following scenario:

      A new node entering the cluster was hit by 10 concurrent DCP streams, which resulted in 10 DCP consumer Processor tasks all competing for the 3 NONIO threads.

      A related issue (MB-18452) also meant that the Processor tasks ran for long periods without yielding. However, the logs show that the tasks do eventually yield, so the long wait times are not caused by tasks effectively looping forever. This is visible in the runtimes histogram, which has many hits, showing that there were opportunities for other waiting NONIO tasks to run...

      However, during the 40 minutes of uptime, ns_server requested checkpoint stats a number of times. Each request adds a task to the NONIO queue, and during the observed period none of these checkpoint tasks was ever scheduled.

      One set of log files shows evidence that a checkpoint task actually waited 10 hours to run.

      Clearly this should not happen. It looks like the scheduler considered only task priority, not wait time, when scheduling during this heavy workload. Processor has priority 0; all other NONIO tasks have lower priority (higher numbers).
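
The starvation described above can be reproduced with a small model. This is only an illustrative sketch, not the ep-engine scheduler: the function name, the task names, and the priority value 1 for the checkpoint stats task are assumptions (the report only says that other NONIO tasks have higher numbers than 0). It models 3 workers picking strictly by priority while 10 Processor tasks keep yielding and re-queueing.

```python
import heapq
import itertools


def simulate_strict_priority(ticks, workers=3, processors=10):
    """Return the tick at which the low-priority task first runs, or None.

    Hypothetical model: every tick, each worker takes the best-priority
    ready task; a Processor that yields still has work and is re-queued.
    """
    seq = itertools.count()  # tie-breaker keeps equal priorities in FIFO order
    ready = []
    for i in range(processors):
        heapq.heappush(ready, (0, next(seq), f"processor-{i}"))
    # Assumed priority 1: lower priority than Processor (higher number).
    heapq.heappush(ready, (1, next(seq), "checkpoint-stats"))

    for tick in range(ticks):
        # Pick one task per worker, considering priority only.
        count = min(workers, len(ready))
        running = [heapq.heappop(ready) for _ in range(count)]
        for priority, _, name in running:
            if name == "checkpoint-stats":
                return tick
            # The Processor yields but is immediately ready to run again.
            heapq.heappush(ready, (priority, next(seq), name))
    return None
```

In this model the Processor tasks yield on every tick, yet because there are always more ready priority-0 tasks than workers, the checkpoint task is never picked no matter how many ticks are simulated. With fewer Processor tasks than workers it runs at once.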

      There is code that appears intended to also schedule the longest-waiting task, but the observed behaviour is that it did not trigger.
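
One way to combine priority with wait time is to give every Nth pick to the longest waiter. The sketch below is a hypothetical illustration of that idea, not the existing ep-engine code or the fix that was applied: `FairReadyQueue`, `waiter_interval`, the task names, and the priority value 1 for the checkpoint task are all assumptions.

```python
import itertools
from dataclasses import dataclass


@dataclass(eq=False)
class Task:
    name: str
    priority: int          # lower number = higher priority, as in the report
    enqueued_at: int = 0   # logical time at which the task became ready


class FairReadyQueue:
    """Hypothetical ready queue: every Nth pick goes to the longest waiter."""

    def __init__(self, waiter_interval=4):
        self.waiter_interval = waiter_interval
        self._ready = []
        self._clock = itertools.count()
        self._picks = 0

    def push(self, task):
        task.enqueued_at = next(self._clock)
        self._ready.append(task)

    def pop(self):
        self._picks += 1
        if self._picks % self.waiter_interval == 0:
            # Fairness pick: ignore priority, take the oldest ready task.
            task = min(self._ready, key=lambda t: t.enqueued_at)
        else:
            # Normal pick: best priority first, oldest first within a priority.
            task = min(self._ready, key=lambda t: (t.priority, t.enqueued_at))
        self._ready.remove(task)
        return task


def picks_until_checkpoint_runs(max_picks=1000):
    """Count picks until the low-priority task runs under constant load."""
    queue = FairReadyQueue(waiter_interval=4)
    for i in range(10):
        queue.push(Task(f"processor-{i}", priority=0))
    queue.push(Task("checkpoint-stats", priority=1))  # assumed priority value
    for pick in range(1, max_picks + 1):
        task = queue.pop()
        if task.name == "checkpoint-stats":
            return pick
        queue.push(task)  # the Processor yields and is ready again at once
    return None
```

Because a re-queued Processor receives a fresh `enqueued_at`, the waiting checkpoint task eventually becomes the oldest entry and is taken by a fairness pick, so its wait is bounded even while the Processor tasks never go idle. The linear `min` scan keeps the sketch short; a real scheduler would more likely keep two ordered structures.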

            People

              jwalker Jim Walker
              jwalker Jim Walker
              Votes: 0
              Watchers: 11
