The Linux Scheduler: a Decade of Wasted Cores


The main novelty is that Linux's CFS (Completely Fair Scheduler) implements a weighted fair queuing algorithm: it time-slices the CPU by keeping runnable threads in a red-black tree ordered by virtual runtime (vruntime), replacing the earlier O(1) scheduler; for CPU-bound workloads with little I/O it can outperform the O(1) design. Each CPU's run queue cfs_rq maintains a min_vruntime field that records the minimum vruntime of all processes in that run queue. The initial vruntime of a new process is set relative to the min_vruntime of the run queue it joins, keeping it within a bounded gap of the existing processes.
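The following is a minimal sketch, in C, of how this accounting might look. It is not the kernel's actual code: the struct layouts, the NICE_0_LOAD constant, and the helper names are simplified assumptions that only mirror the idea.

```c
#include <stdint.h>

#define NICE_0_LOAD 1024u            /* load weight of a nice-0 task */

struct sched_entity {
    uint64_t vruntime;               /* weighted runtime, in nanoseconds */
    uint32_t weight;                 /* load weight derived from nice value */
};

struct cfs_rq {
    uint64_t min_vruntime;           /* monotonic floor for this run queue */
};

/* Charge delta_exec ns of real runtime: the lower a task's weight,
 * the faster its vruntime grows, so the tree picks it less often. */
static void update_vruntime(struct sched_entity *se, uint64_t delta_exec)
{
    se->vruntime += delta_exec * NICE_0_LOAD / se->weight;
}

/* Place a newly created task near min_vruntime so it starts within a
 * bounded gap of the tasks already queued. */
static void place_new_entity(struct cfs_rq *rq, struct sched_entity *se)
{
    if (se->vruntime < rq->min_vruntime)
        se->vruntime = rq->min_vruntime;
}
```

The key property: dividing by the task's weight makes a low-priority task's vruntime grow faster, so the red-black tree, which is ordered by vruntime, naturally deprioritizes it.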

Pros

A sleeping process is compensated via vruntime when it wakes, so a newly woken process is likely to grab the CPU. This is the intent of the CFS algorithm: to guarantee the responsiveness of interactive processes, which sleep frequently while waiting for user input.

Imagine each interactive operation, such as hitting a key or moving the mouse. To the system this is a new task arriving -> its runtime is 0 -> its vruntime is set at (or just below) the queue's minimum -> it is inserted as the leftmost node of the scheduler's red-black tree -> the leftmost node is cached via a special pointer, so picking it is cheap.
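Below is a rough sketch of the wakeup placement this relies on, modeled on the kernel's place_entity logic. The constant and struct fields are illustrative assumptions; real kernels vary in the exact bonus they grant.

```c
/* Sketch (simplified, not kernel code) of sleeper compensation: a
 * waking task is placed slightly *before* min_vruntime, so it lands
 * at or near the leftmost node of the tree and is picked promptly. */
#include <stdint.h>

#define SCHED_LATENCY_NS 6000000ull   /* assumed ~6 ms target latency */

struct sched_entity { uint64_t vruntime; };
struct cfs_rq      { uint64_t min_vruntime; };

static void place_on_wakeup(struct cfs_rq *rq, struct sched_entity *se)
{
    /* grant the sleeper a bonus of half the scheduling latency ... */
    uint64_t target = rq->min_vruntime - SCHED_LATENCY_NS / 2;

    /* ... but cap the credit: a task that slept only briefly keeps
     * its own (larger) vruntime instead of jumping the whole queue */
    if (se->vruntime < target)
        se->vruntime = target;
}
```

Because the woken task's vruntime ends up at or below everyone else's, it sorts to the cached leftmost position and runs quickly, which is exactly the interactivity guarantee described above.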

Cons

  1. Priority failure. Mapping nice values to absolute time slices gives inconsistent behavior. For example, process A with a nice of 0 is allocated a 100ms slice and process B with a nice of +20 a 5ms slice. Two processes at nice +20 each get 5ms, splitting the CPU 50/50 while context-switching every 5ms; two processes at nice 0 also split the CPU 50/50 but switch only every 100ms. The same fairness outcome thus comes with wildly different switching behavior, which is inconsistent with what the time-slice allocation seems to promise.
  2. Relative nice value. Two processes at nice 0 and nice 1 are allocated 100ms and 95ms respectively, an almost even split; but two processes at nice 18 and nice 19 are allocated 10ms and 5ms, a 2:1 split with far more frequent context switches. The effect of a one-unit nice difference thus depends on the absolute starting value (see the sketch after this list).
  3. The absolute time slice must be a multiple of the timer tick. A process switch is triggered by a timer interrupt when the slice expires, so slices cannot be made finer than the tick period. (CFS sidesteps this by accounting runtime in nanoseconds, decoupled from the timer tick.)
  4. Priority adjustment on wakeup. Raising the priority of a newly woken process so it runs sooner improves interactivity, but such boosting can be exploited to obtain an unfair share of the CPU.
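Here is a toy C program contrasting the two models behind cons 1 and 2. The CFS weights come from the kernel's sched_prio_to_weight table; the "old model" slice values are the ones quoted above, and everything else is illustrative.

```c
#include <stdio.h>

/* old-model absolute slices, as quoted in the list above */
static int old_slice_ms(int nice)
{
    switch (nice) {
    case 0:  return 100;
    case 1:  return 95;
    case 18: return 10;
    case 19: return 5;
    default: return -1;   /* other levels not needed for this demo */
    }
}

/* CFS load weights for the same nice levels (sched_prio_to_weight) */
static int cfs_weight(int nice)
{
    switch (nice) {
    case 0:  return 1024;
    case 1:  return 820;
    case 18: return 18;
    case 19: return 15;
    default: return -1;
    }
}

/* print each process's CPU share under both models */
static void compare(int a, int b)
{
    double sa = old_slice_ms(a), sb = old_slice_ms(b);
    double wa = cfs_weight(a),   wb = cfs_weight(b);

    printf("nice %2d vs %2d: old %4.1f%%/%4.1f%%  cfs %4.1f%%/%4.1f%%\n",
           a, b, 100 * sa / (sa + sb), 100 * sb / (sa + sb),
                 100 * wa / (wa + wb), 100 * wb / (wa + wb));
}

int main(void)
{
    compare(0, 1);    /* old ~51/49, CFS ~56/44 */
    compare(18, 19);  /* old ~67/33, CFS ~55/45: the ratio stays stable */
    return 0;
}
```

Under CFS a one-step nice difference always yields roughly the same ~55/45 split, because shares are proportional to relative weights rather than to absolute slice lengths.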

These bugs cause no crashes or out-of-memory conditions, and tools such as htop, sar, or perf cannot detect the missing short-term idle periods, which makes the issues reported in this research hard to spot. The authors therefore built two tools. The first, a "sanity checker," verifies that no core sits idle while runnable threads wait in another core's run queue; it tolerates this condition briefly but raises an alert if it persists (a toy sketch of this check appears at the end of this section). The second is a visualizer that displays scheduling activity over time, so the number of run queues, their total load, and the cores considered during periodic load balancing and thread wakeups can all be profiled and plotted.

Scheduling, in the sense of allocating CPU time among threads, was long considered a solved problem. The paper shows this is untrue: to accommodate the complexity of modern hardware, a straightforward scheduling principle grew into a highly intricate, bug-prone implementation. The authors found cases where waiting threads are not scheduled onto idle cores, breaking the fundamental work-conserving invariant: runnable threads may sit in run queues for seconds while cores idle, causing significant drops in application performance. Because of their nature, these bugs are difficult to find with standard tools. The paper repairs the flaws, identifies their root causes, and provides tools that make finding and correcting such bugs much simpler.

As for the future: the Linux kernel is general-purpose code, so it is hard for the community to focus on multi-core issues alone; what the community cares about most is maintainability, not performance. If a new kernel feature runs fine on an i386 machine with 128MB of memory, that is considered good enough. As long as fewer than 80% of users hit a new problem, the community hardly cares, and for the same reason it keeps introducing bugs, which one can only sigh about. My view is that the community is simply a community of programmers who take the code as the criterion; it pays little attention to architecture and new features, which are left to vendors. The same goes for TCP, which I have always criticized: people focus on the TCP implementation code itself, which makes it ever more complex and ever more fragile. You might call that evolution, but is there no chance to go back to the drawing board before all hell breaks loose? It has not evolved to the point where it must keep evolving, has it? If someone stood outside the process with the power to mandate change, that garbage called TCP would probably have been gone long ago.
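To make the sanity checker's invariant concrete, here is a toy user-space model of the check. All names, the sampling representation, and the tolerance value are assumptions; the authors' real tool inspects live kernel scheduler state.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NCORES       4
#define TOLERANCE_MS 100   /* brief violations are tolerated (assumed) */

struct core_sample {
    int      nr_running;     /* threads in this core's run queue */
    uint64_t idle_since_ms;  /* 0 if busy, else when idling began */
};

/* Alert if some core has stayed idle beyond the tolerance while
 * another core's queue holds waiting threads: a violation of the
 * work-conserving invariant described above. */
static void sanity_check(const struct core_sample cores[], uint64_t now_ms)
{
    bool threads_waiting = false;
    for (int i = 0; i < NCORES; i++)
        if (cores[i].nr_running > 1)   /* more than the running thread */
            threads_waiting = true;
    if (!threads_waiting)
        return;

    for (int i = 0; i < NCORES; i++)
        if (cores[i].idle_since_ms &&
            now_ms - cores[i].idle_since_ms > TOLERANCE_MS)
            printf("ALERT: core %d idle for %lu ms while threads wait\n",
                   i, (unsigned long)(now_ms - cores[i].idle_since_ms));
}

int main(void)
{
    struct core_sample cores[NCORES] = {
        { .nr_running = 3, .idle_since_ms = 0   },  /* overloaded */
        { .nr_running = 0, .idle_since_ms = 500 },  /* stuck idle */
        { .nr_running = 1, .idle_since_ms = 0   },
        { .nr_running = 1, .idle_since_ms = 0   },
    };
    sanity_check(cores, 1000);  /* prints an alert for core 1 */
    return 0;
}
```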

Reference

  1. https://www.usenix.org/system/files/conference/atc18/atc18-bouron.pdf