SSD and HDD design differences
We described two approaches to building file systems for Hard Disk Drives (HDDs): clustering and logging. How should we design a file system for a Solid State Drive (SSD)? Your answer should identify two hardware properties of SSDs that contrast with those of HDDs and describe how these properties inform your design decisions.
Early UNIX file systems, culminating in FFS, interpreted the file abstraction through inodes and directory entries (dirents). Those designs are aware of the physical properties of HDDs: which addressing patterns are fast, and which block size best balances swapping physical memory onto disk. A file system of this kind is not designed for SSDs. An SSD stores data on NAND flash (SLC/MLC/QLC cells), and its on-chip controller runs a flash translation layer (FTL) that handles LBA-to-page map translation, drives the flash channels, and performs write-behind, allocation, and wear leveling (the latter considered harmful in [1]). Today this firmware layer is typically designed like a log-structured file system, because out-of-place logging recovers quickly from physical media failures at low latency cost.
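The core of that firmware design can be sketched in a few lines. Below is a toy model of a log-structured FTL: overwrites go out-of-place to the next free page and only update the LBA-to-physical map, leaving the old page marked invalid until a block erase reclaims it. All names here (`SimpleFTL`, `l2p`, etc.) are illustrative, not taken from any real controller firmware.

```python
class SimpleFTL:
    """Toy log-structured flash translation layer: out-of-place writes
    plus an LBA -> physical-page map. Overwrites never touch the old
    page in place; they append and remap, which is why recovery is
    cheap (the map can be rebuilt by scanning the log)."""

    def __init__(self, num_pages):
        self.flash = [None] * num_pages   # physical NAND pages
        self.l2p = {}                     # logical-to-physical map
        self.invalid = set()              # stale pages awaiting GC/erase
        self.next_free = 0                # append point of the flash log

    def write(self, lba, data):
        if lba in self.l2p:               # overwrite: old page goes stale
            self.invalid.add(self.l2p[lba])
        ppn = self.next_free              # always write to a fresh page
        self.next_free += 1
        self.flash[ppn] = data
        self.l2p[lba] = ppn               # remap the logical address

    def read(self, lba):
        return self.flash[self.l2p[lba]]
```

For example, writing LBA 7 twice consumes two physical pages and leaves one invalid; a real FTL would later erase whole blocks of such invalid pages, and wear leveling would bias `next_free` toward less-worn blocks.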
EXT4 keeps evolving with new media and smarter designs. Remzi Arpaci-Dusseau's group at the University of Wisconsin still publishes papers on EXT4's core design, for example leveraging device-mapper deduplication [2] and examining how the data path can become the critical bottleneck [3].
Starting from PMFS, the community has been working out how an FS should be designed when persistent memory is attached as memory, or more recently as CXL.mem:
1. As the hardware gets faster, the software must be low latency as well, e.g. tracers, hypervisors, and multi-tenancy isolation management.
2. The ordering and durability of a single cacheline, a single page, or a single pointer should be dealt with at the firmware level, resolved by locks, lock-free hardware mechanisms, or other smart API support.
3. Workload-aware optimization, by providing performance hints and an advice interface like madvise() in Linux. The storage layout of a serialized ML model is completely different from a plain memory dump (no pointers or hierarchical logic).
4. Optimize consistency using a combination of atomic in-place updates, logging at cacheline granularity (fine-grained journaling), and copy-on-write (CoW).
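Point 4 can be illustrated with a small simulation of fine-grained redo logging at cacheline (64-byte) granularity. The ordering is the whole point: persist the log record first, then update in place, then truncate the record; a crash between the last two steps is repaired by replay. The class and names below are an illustrative sketch, not any real PM library's API; on real hardware the commented steps would be `clwb` + `sfence` pairs.

```python
CACHELINE = 64  # one redo record covers at most one cacheline

class RedoLogStore:
    """Sketch of fine-grained journaling for persistent memory:
    log-then-update ordering makes each small update crash-consistent."""

    def __init__(self, size):
        self.data = bytearray(size)   # "persistent" home locations
        self.log = []                 # persisted redo records (offset, bytes)

    def update(self, offset, new_bytes):
        assert len(new_bytes) <= CACHELINE
        # Step 1: persist the redo record BEFORE touching home data.
        # (on real PM hardware: write record, clwb it, then sfence)
        self.log.append((offset, bytes(new_bytes)))
        # Step 2: in-place update; a crash here is repaired by replay.
        self.data[offset:offset + len(new_bytes)] = new_bytes
        # Step 3: the update is durable, so the record can be truncated.
        self.log.pop()

    def recover(self):
        # Replay records whose in-place update may not have completed.
        for offset, payload in self.log:
            self.data[offset:offset + len(payload)] = payload
        self.log.clear()
```

CoW would instead write the new cacheline to a fresh location and atomically swing a pointer to it; the log-based variant shown here trades that extra indirection for a small amount of write amplification in the log.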
Replace POSIX?
In our discussion of single-level stores, we identified a few problems with using this design for all interactions with persistent storage. In this question, describe one potential issue that arises when using a single-level store for persistent storage. How does using a POSIX/file system interface simplify/eliminate the issue that you described?
The VFS is a long-discussed source of overhead inside the kernel: it adds latency through scheduler interference and loses bandwidth to the block layer and other kernel mechanisms occupying the queues. But there is currently no proposal to replace it, since the kernel must rely on the block layer's pread/pwrite path and BDI-based writeback (see [4]).
POSIX semantics are also too strong for a distributed FS. If visibility could instead be handled on the fly in a CXL cache layer, with a bounded delay before written data becomes readable, a weaker interface would suffice.
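The contrast between the two interfaces is easy to show concretely. With POSIX-style calls the application names its durability point explicitly (`fsync`); with a memory-mapped, single-level-store style of access, stores reach the device at some kernel-chosen writeback time unless the application asks with `msync` — which is exactly the ambiguity the explicit interface eliminates. A minimal sketch using only the Python standard library, assuming a POSIX system:

```python
import mmap
import os
import tempfile

fd, path = tempfile.mkstemp()
try:
    # POSIX/file interface: explicit I/O, explicit durability point.
    os.pwrite(fd, b"committed", 0)
    os.fsync(fd)                  # data is durable no later than here

    # Single-level-store style: map the file and store through memory.
    os.ftruncate(fd, mmap.PAGESIZE)
    view = mmap.mmap(fd, mmap.PAGESIZE)
    view[0:9] = b"in-memory"      # hits the device at some later,
    view.flush()                  # kernel-chosen time unless msync'd
    view.close()

    result = os.pread(fd, 9, 0)   # both paths went through the same file
finally:
    os.close(fd)
    os.unlink(path)
```

The `fsync` line is the simplification: a crash after it cannot lose the write, whereas in the mapped case the application must reason about which dirty lines or pages have been written back, and in what order.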
Related directions: software-defined SmartSSDs and CXL-attached SSDs.
References
- [1] https://dl.acm.org/doi/pdf/10.1145/3538643.3539750
- [2] https://elinux.org/images/0/02/Filesystem_Considerations_for_Embedded_Devices.pdf
- [3] https://github.com/Harry-Chen/HERMES
- [4] https://dl.acm.org/doi/10.1145/3538643.3539751