references:
- http://duartes.org/gustavo/blog/post/page-cache-the-affair-between-memory-and-files/
- https://en.wikipedia.org/wiki/Page_cache
Page Cache, the Affair Between Memory and Files
Two serious problems must be solved by the OS when it comes to files:
- The first one is the mind-blowing slowness of hard drives, and disk seeks in particular, relative to memory.
- The second is the need to load file contents into physical memory once and share them among programs.
Happily, both problems can be dealt with in one shot: the page cache, where the kernel stores page-sized chunks of files. People are sometimes surprised by this, but all regular file I/O happens through the page cache.
Sadly, in a regular file read the kernel must copy the contents of the page cache into a user buffer, which not only takes CPU time and hurts the CPU caches, but also wastes physical memory with duplicate data. Memory-mapped files are the way out of this madness.
Disk is five orders of magnitude slower than RAM, hence a page cache hit is a huge win. Due to the page cache architecture, when a program calls write() the bytes are simply copied to the page cache and the page is marked dirty. Disk I/O normally does not happen immediately, so your program doesn't block waiting for the disk. On the downside, if the computer crashes your writes will never make it, hence critical files like database transaction logs must be fsync()ed (though one must still worry about drive controller caches, oy!).
reference: https://www.thomas-krenn.com/en/wiki/Linux_Page_Cache_Basics
Have a try
dd if=/dev/zero of=testfile.txt bs=1M count=10
cat /proc/meminfo | grep Dirty
# if the VM loses power now, the data in testfile.txt may be lost: the dirty pages were never written back
sync
cat /proc/meminfo | grep Dirty
Pages in the page cache that are modified after being brought in are called dirty pages. Since non-dirty pages in the page cache have identical copies in secondary storage (e.g. a hard disk drive or solid-state drive), discarding and reusing their space is much quicker than paging out application memory, and is often preferred over flushing the dirty pages into secondary storage and reusing their space.
Executable binaries, such as applications and libraries, are also typically accessed through the page cache and mapped into individual process address spaces using virtual memory (this is done through the mmap system call on Unix-like operating systems). This not only means that the binary files are shared between separate processes, but also that unused parts of binaries will eventually be flushed out of main memory, conserving memory.