Beancounters

Present-day resource management in Linux is certainly adequate for a conventional operating system, but it falls a bit short when the goal is to give contained groups of processes their own limited sets of resources. Memory is the most notable of these resources, so an important task is to keep track of the number of pages each process is allowed to use.

For this reason, OpenVZ has implemented an addition to the kernel called beancounters (BC). These kernel-level objects keep track of the resources available and can notify the kernel of user space processes that are overstepping their boundaries.

Since containers are no more than groups of processes, actual allocation of memory and other resources requires no large changes. However, upon creation of a new container, an "init" process is made for it. Readers familiar with Linux-based systems will recognize this process as the proverbial mother of all processes, and in containers it is no different. Upon creation, the OpenVZ software assigns this process to a beancounter object. As a result, each of its child processes is bound to the same beancounter. From that point on, the BC controls the maximum amount of resources available to that container.

 

Think of the BC as containing a table. The columns hold the amount of each resource currently held, the maximum held during the session, and a barrier value at which the BC warns the kernel to stop allocating more of that resource. Two further columns track an absolute limit for moments of burst usage and the number of allocations that failed because too few resources were still available. The rows, in turn, contain the different types of resources the BC can keep track of, including but not limited to:

  • user memory
  • network buffers
  • number of tasks
  • number of files
  • number of sockets
  • number of file locks
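
To make the table idea concrete, here is a minimal sketch of a single row in Python. It is only a model of the description above, not OpenVZ's kernel code: the class name ResourceRow, the charge and uncharge methods and the burst flag are hypothetical, and the rule that ordinary requests stop at the barrier while burst requests may go up to the limit is a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass
class ResourceRow:
    """One row of the beancounter table for a single resource type."""
    barrier: int        # soft threshold: ordinary requests stop here
    limit: int          # absolute ceiling, reachable only during bursts
    held: int = 0       # amount currently in use
    maxheld: int = 0    # highest value of `held` seen so far
    failcnt: int = 0    # number of refused requests

    def charge(self, amount: int, burst: bool = False) -> bool:
        """Try to account `amount` more units; return False if refused."""
        # Assumption: `burst` marks a request allowed to pass the barrier.
        ceiling = self.limit if burst else self.barrier
        if self.held + amount > ceiling:
            self.failcnt += 1
            return False
        self.held += amount
        self.maxheld = max(self.maxheld, self.held)
        return True

    def uncharge(self, amount: int) -> None:
        """Release previously charged units."""
        self.held = max(0, self.held - amount)


# Illustrative numbers only: a row for open files with barrier 8, limit 10.
numfile = ResourceRow(barrier=8, limit=10)
results = [
    numfile.charge(4),              # fits under the barrier
    numfile.charge(4),              # exactly reaches the barrier
    numfile.charge(1),              # would pass the barrier: refused
    numfile.charge(1, burst=True),  # allowed, still under the limit
]
print(results, numfile.held, numfile.maxheld, numfile.failcnt)
```

Every refused request increments failcnt, which is why that column is a quick way to spot a container that keeps running into its boundaries.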

We can see the current status of the beancounters for each existing container by printing the contents of /proc/user_beancounters.
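
As a rough illustration, the Python sketch below parses text in the layout this file is commonly shown in: a version line, a header row, and then one row per resource with the columns held, maxheld, barrier, limit and failcnt, where the container ID appears only on the first row of each block. The layout is an assumption here, the function name parse_beancounters is hypothetical, and the sample values are invented for the example rather than taken from a real system.

```python
def parse_beancounters(text: str) -> dict[str, dict[str, dict[str, int]]]:
    """Parse text in the assumed /proc/user_beancounters layout.

    Returns {container_id: {resource: {column: value}}}.
    """
    columns = ("held", "maxheld", "barrier", "limit", "failcnt")
    containers = {}
    current = None
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, the "Version:" line and the column header.
        if not fields or fields[0] == "Version:" or fields[0] == "uid":
            continue
        # A first field ending in ":" opens a new container block.
        if fields[0].endswith(":"):
            current = containers.setdefault(fields[0].rstrip(":"), {})
            fields = fields[1:]
        if current is None or len(fields) != len(columns) + 1:
            continue  # not a resource row this sketch understands
        name, *values = fields
        current[name] = dict(zip(columns, map(int, values)))
    return containers


# Invented sample data in the assumed layout.
SAMPLE = """\
Version: 2.5
       uid  resource           held    maxheld    barrier      limit    failcnt
      101:  kmemsize         803866    1246758    2457600    2621440          0
            numproc              12         20         64         64          3
"""

table = parse_beancounters(SAMPLE)
for resource, row in table["101"].items():
    print(resource, row["held"], row["failcnt"])
```

On a real OpenVZ host the same function could be fed the contents of the file itself, which typically requires root privileges to read.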

We'll try to cover the actual accounting of some types of resources as briefly and clearly as possible.

Memory accounting

Accounting of memory is divided into several parts and discussing all of them would fill up several pages. The following paragraphs dip into the logic behind the system a bit more deeply, and discuss some of the challenges faced in allocating memory to several containers while showing examples of how beancounters can be put to use in a container-based environment.

Virtual Memory Area lengths

Processes can request extra memory pages without necessarily putting them to use immediately, or even using them at all. Instead of receiving the full amount in physical RAM, a process is given that amount in "virtual" pages, so that only the pages actually used are loaded into RAM, while "empty" pages remain free for other processes to use.

This way, each process works with the impression that the full amount of virtual pages is always available, and even though it isn't currently using the free pages, it can do so in the future if the need arises. The issue here is that the total amount of virtual memory for all processes is usually much larger than what is available in the RAM.

When many processes actually start using most of their virtual pages, the RAM might not suffice, and the system will need to swap some data out to the next available storage space to make room: the hard disk. As the hard disk is a much slower medium, this is a situation to avoid, especially when many users need to make use of the same system. One user could technically put all the others out of business by starting up some processes that are able to fill up the entirety of the physical RAM.


In the above picture, we can see pages in the virtual memory that exist physically in either the RAM or the HD. The total amount of Virtual Memory available is the sum of what is available in the RAM along with a certain amount assigned to it on the HD. The beancounter keeps track of the total amount of virtual pages allocated, so it can anticipate troublesome situations and deny requests for more virtual pages if needed.
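
The difference between pages promised and pages actually used can be sketched in a few lines of Python. This is a toy model under stated assumptions, not OpenVZ's implementation: the class VirtualPageCounter, its methods and the single barrier_pages threshold are all hypothetical names chosen for the example.

```python
class VirtualPageCounter:
    """Tracks virtual pages promised to one container (hypothetical model)."""

    def __init__(self, barrier_pages: int) -> None:
        self.barrier_pages = barrier_pages
        self.allocated = 0   # virtual pages promised so far
        self.resident = 0    # pages actually touched and physically backed
        self.failcnt = 0     # refused requests

    def request(self, pages: int) -> bool:
        """A process asks for more virtual pages; deny if past the barrier."""
        if self.allocated + pages > self.barrier_pages:
            self.failcnt += 1
            return False
        self.allocated += pages
        return True

    def touch(self, pages: int) -> None:
        """The process starts using pages it was already promised."""
        self.resident = min(self.allocated, self.resident + pages)


ct = VirtualPageCounter(barrier_pages=1000)
granted = [ct.request(600), ct.request(300), ct.request(200)]
ct.touch(50)
print(granted, ct.allocated, ct.resident, ct.failcnt)
```

The third request is refused at allocation time, long before the container has touched more than a handful of pages, which is the point of counting virtual pages rather than waiting for physical memory to run out.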

Resident Set Size

RSS is a term used to denote only the pages a process actually uses that exist in the physical RAM. The problem here is that, even though a page is the smallest possible unit for memory allocation, a page can sometimes be mapped by several processes (e.g. when the same file is used). A single page in use on two different beancounters would then count as two pages in total, making the count inaccurate. To avoid this, RSS is calculated with a system of fractional parts, as follows:

  1. BC1 encounters a used page, and counts it as a whole = 1
  2. BC2 encounters the same page, and both beancounters count half a page = 1/2, 1/2
  3. BC3 encounters the same page, and one of the present beancounters splits its half with the newcomer = 1/2, 1/4, 1/4
  4. When BC4 arrives, the other half is split, and so on. = 1/4, 1/4, 1/4, 1/4

In this picture from Pavel Emelyanov's explanation of beancounters, we can see the division of pages explained. This is an efficient way of keeping the RSS count accurate, as the arrival of a new beancounter only triggers a change in one of the others, as opposed to the full group.
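
The four steps above can be sketched in Python with exact fractions. The function name add_sharer is hypothetical, and so is the rule for choosing which beancounter gives up half of its share: the sketch picks the largest share and, among equal shares, the most recent arrival, purely so that the numbers match the steps listed above.

```python
from fractions import Fraction


def add_sharer(shares: dict[str, Fraction], newcomer: str) -> None:
    """Give `newcomer` part of one page by halving a single existing share."""
    if not shares:
        shares[newcomer] = Fraction(1)  # first user is charged the whole page
        return
    largest = max(shares.values())
    # Assumption: among equal largest shares, split the most recent arrival.
    donor = [bc for bc, part in shares.items() if part == largest][-1]
    shares[donor] = largest / 2
    shares[newcomer] = largest / 2


page = {}
history = []
for bc in ("BC1", "BC2", "BC3", "BC4"):
    add_sharer(page, bc)
    history.append(list(page.values()))

for step in history:
    print(", ".join(str(part) for part in step))
```

Each arrival rewrites exactly one existing entry, and the shares always add up to one page, so summing the shares per beancounter never counts a shared page more than once.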
