In prior work, we described the “internal fragmentation” problem of collection files and secondary index files in MongoDB. That form of data fragmentation can be solved by boundary-based stream mapping. However, for a complex data model such as Linkbench, boundary-based stream mapping is inadequate because of a new type of data fragmentation, which we call cross-region fragmentation.
To illustrate cross-region fragmentation, we implemented boundary-based stream mapping in MongoDB and ran it with Linkbench. Figure 1 shows the write patterns of collection files and secondary index files, captured using blktrace. In Figure 1 (b), the bottom regions of all collection files are mapped to stream 1 (red). However, inside this stream, writes to files with different write frequencies overlap, as shown in Figure 1 (d) (three different colors). We observed the same phenomenon in the top regions.
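The effect can be sketched in a few lines of Python. This is only an illustrative model, not the MongoDB implementation: the stream IDs other than stream 1 for the bottom regions of collection files, the file names, the boundary offsets, and the convention that the bottom region is the part of a file below its boundary offset are all assumptions made for the example. The sketch shows that a mapping keyed only on file type and region sends the bottom-region writes of every collection file to the same stream, whatever the write frequency of each file.

```python
from dataclasses import dataclass

# Hypothetical stream IDs. Only "bottom regions of collection files -> stream 1"
# comes from the text; the other three assignments are assumed for illustration.
STREAMS = {
    ("collection", "bottom"): 1,
    ("collection", "top"): 2,
    ("index", "bottom"): 3,
    ("index", "top"): 4,
}


@dataclass(frozen=True)
class FileInfo:
    name: str      # hypothetical file name
    kind: str      # "collection" or "index"
    boundary: int  # assumed offset separating the bottom and top regions


def map_stream(f: FileInfo, offset: int) -> int:
    """Boundary-based mapping: the stream depends only on file kind and region."""
    region = "bottom" if offset < f.boundary else "top"
    return STREAMS[(f.kind, region)]


def files_per_stream(writes):
    """Group file names by the stream that their writes were assigned to."""
    result = {}
    for f, offset in writes:
        result.setdefault(map_stream(f, offset), set()).add(f.name)
    return result


# Three hypothetical collection files that are written at different rates.
hot = FileInfo("hot_collection", "collection", boundary=1000)
warm = FileInfo("warm_collection", "collection", boundary=1000)
cold = FileInfo("cold_collection", "collection", boundary=1000)

# Bottom-region writes: many to the hot file, a few to the warm file, one to the cold file.
writes = [(hot, 10)] * 8 + [(warm, 20)] * 3 + [(cold, 30)]

# All three files end up sharing stream 1 despite their different write frequencies.
shared = files_per_stream(writes)
```

Here `shared[1]` contains all three file names, which corresponds to the overlap visible inside stream 1 in Figure 1 (d): the mapping has no input that distinguishes files by how often they are written.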