Sample report 2:

DIP: The part that jumps out at me immediately is the emphasis on ease of implementation. This is a very important factor from Intel's perspective: they've had decades of cache designs, and throwing away those designs, and the expertise built around them, would be very costly at an R&D level. However, I found the claims of ease of implementation a little misleading. LIP and BIP certainly seem easy to implement on top of an existing cache, but DIP opens a whole new can of worms that doesn't sound easy to resolve. While the performance improvements the paper boasts certainly seem worth the R&D cost, the claim of ease of implementation seems hard to justify without ignoring the LRU use cases entirely (i.e., going full LIP/BIP).

DIP seems like an obvious choice here, as it appears borrowed from tournament branch-prediction schemes (a rough sketch of the set-dueling mechanism appears later in this report). However, the problem does have a fundamental difference: unlike a cache, the act of testing a branch prediction doesn't change the underlying state of the branch predictor. All the predictors in a tournament approximate the same data (a stream of taken/not-taken outcomes) at different levels, whereas the competing caches can end up in highly divergent, almost completely different states (the only thing staying the same is the input stream). This can lead to a potential problem I didn't see addressed in the paper: warming up the cache when switching policies. Because the follower sets' cache lines are shared between both policies, when the policy changes the new policy will initially exhibit poor performance because of the cache lines "inherited" from the previous policy. In the worst case, this can lead to jumping back and forth between the two policies.

Finally, it's a bit confusing that the paper never addresses inserting anywhere other than the two ends of the recency stack. I would expect there to be three critical places where cache lines can be inserted: MRU, middle, and LRU. This was addressed in the RRIP paper, but it's a bit puzzling that inserting into the middle was never considered here.

RRIP: The RRIP paper gives a nice abstraction of the purpose of cache eviction/insertion policies. Rather than describe schemes in terms of LRU (as the DIP paper does), the RRIP paper defines the RRIP chain, which is what LRU and other policies are attempting to approximate (a sketch of the chain's mechanics also appears below). LRU's fatal flaw seems to be that it puts all of its bets on temporal locality. Ideally, we would have perfect statistical information on which cache lines are the most likely to be re-referenced, based on the entire history of hits and misses.

Again, the findings of the paper seem to draw heavily on mechanisms used in branch prediction. It makes sense to favor a cache line that is referenced consistently but not very often over a cache line that has only been referenced a couple of times recently. With so many parallels to branch prediction, I'm wondering if we can go further in that direction: RRIP could implement a two-level system that uses the global history of memory accesses (the past X accesses, and whether each hit or missed) to make the set dueling more accurate. For example: "this memory access pattern previously got an X% hit rate with prediction Y, so maybe we should try prediction Z now." This could make predictions/insertion policies match their appropriate memory patterns more easily.
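To make the set-dueling discussion concrete, here is a minimal sketch of DIP-style dynamic insertion in Python. It's my own illustration, not the paper's exact design: the set count, associativity, leader-set layout, PSEL width, and the 1/32 BIP throttle are all assumed values, and a static leader assignment stands in for the paper's set-sampling scheme.

    import random

    EPSILON = 1 / 32        # assumed BIP throttle: probability of an MRU insertion
    PSEL_MAX = 1023         # assumed 10-bit saturating policy-selector counter

    class CacheSet:
        """One set of the cache; self.lines is ordered MRU (front) to LRU (back)."""
        def __init__(self, ways):
            self.ways, self.lines = ways, []

        def access(self, tag, mru_insert):
            if tag in self.lines:                  # hit: promote to MRU
                self.lines.remove(tag)
                self.lines.insert(0, tag)
                return True
            if len(self.lines) == self.ways:       # miss: victim is always the LRU line
                self.lines.pop()
            if mru_insert:
                self.lines.insert(0, tag)          # traditional LRU insertion
            else:
                self.lines.append(tag)             # LIP-style insertion at LRU
            return False

    class DIPCache:
        """Set dueling: two groups of leader sets pull a shared PSEL counter,
        and the follower sets obey whichever policy PSEL currently favors."""
        def __init__(self, num_sets=64, ways=8, leaders=8):
            self.sets = [CacheSet(ways) for _ in range(num_sets)]
            self.psel = PSEL_MAX // 2
            self.lru_leaders = set(range(leaders))
            self.bip_leaders = set(range(leaders, 2 * leaders))

        def access(self, addr):
            idx = addr % len(self.sets)
            tag = addr // len(self.sets)
            if idx in self.lru_leaders:
                hit = self.sets[idx].access(tag, mru_insert=True)
                if not hit:                        # LRU leader missed: nudge toward BIP
                    self.psel = min(PSEL_MAX, self.psel + 1)
            elif idx in self.bip_leaders:
                hit = self.sets[idx].access(tag, random.random() < EPSILON)
                if not hit:                        # BIP leader missed: nudge toward LRU
                    self.psel = max(0, self.psel - 1)
            else:                                  # follower set: obey the PSEL winner
                use_bip = self.psel > PSEL_MAX // 2
                mru = (not use_bip) or random.random() < EPSILON
                hit = self.sets[idx].access(tag, mru_insert=mru)
            return hit

Note that when PSEL crosses its midpoint, the follower sets keep whatever lines the losing policy left behind; that inherited state is exactly the warm-up problem I described above.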
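For comparison, here is a matching sketch of the RRIP chain itself, following the hit-promotion variant of SRRIP described in the paper: each line carries an M-bit re-reference prediction value (RRPV), insertions land at "long" (2^M - 2), hits promote to 0, and eviction picks a line predicted "distant" (2^M - 1). The dictionary bookkeeping is just for readability; hardware would keep the RRPVs alongside the tags.

    class SRRIPSet:
        """One set under SRRIP with hit promotion. RRPV 0 = re-reference
        predicted soon; RRPV 2^M - 1 = distant, the far end of the chain."""
        def __init__(self, ways, m=2):
            self.ways = ways
            self.distant = (1 << m) - 1        # eviction threshold (2^M - 1)
            self.rrpv = {}                     # tag -> current RRPV

        def access(self, tag):
            if tag in self.rrpv:
                self.rrpv[tag] = 0             # hit promotion: predict near reuse
                return True
            while len(self.rrpv) >= self.ways:
                # Evict any line already predicted distant; if none exists,
                # age every line one step along the chain and retry.
                victim = next((t for t, v in self.rrpv.items()
                               if v == self.distant), None)
                if victim is not None:
                    del self.rrpv[victim]
                else:
                    for t in self.rrpv:
                        self.rrpv[t] += 1
            self.rrpv[tag] = self.distant - 1  # insert as "long" (RRPV 2^M - 2)
            return False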
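The two-level idea I floated above is pure speculation on my part, but a sketch may make it clearer. Everything here is invented for illustration (the history width, the table of per-pattern counters, the update rule); the point is simply to replace DIP's single PSEL counter with one counter per recent hit/miss pattern, the way a two-level branch predictor keys its pattern history table off a global history register.

    PSEL_MAX = 1023     # same assumed selector width as in the DIP sketch above
    HIST_BITS = 8       # hypothetical: remember the last 8 hit/miss outcomes

    class HistoryDuelingSelector:
        """Speculative: one PSEL counter per recent hit/miss pattern, indexed
        by a global history register, like a two-level branch predictor."""
        def __init__(self):
            self.ghr = 0
            self.psel = [PSEL_MAX // 2] * (1 << HIST_BITS)

        def policy(self):
            # Each distinct recent access pattern can learn its own winner.
            return "BIP" if self.psel[self.ghr] > PSEL_MAX // 2 else "LRU"

        def update(self, leader_policy, hit):
            if not hit:
                if leader_policy == "LRU":     # LRU leader missed under this pattern
                    self.psel[self.ghr] = min(PSEL_MAX, self.psel[self.ghr] + 1)
                else:                          # BIP leader missed under this pattern
                    self.psel[self.ghr] = max(0, self.psel[self.ghr] - 1)
            # Shift this access's outcome into the global history (1 = miss).
            self.ghr = ((self.ghr << 1) | (0 if hit else 1)) & ((1 << HIST_BITS) - 1)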
Finally, despite the RRIP paper making many references to DIP/BIP, it doesn't seem to explain why BRRIP differs from BIP so heavily. BRRIP always inserts fairly close to the end of the RRIP chain (even its infrequent "long" insertions only reach an RRPV of 2^M - 2), while BIP's infrequent insertions go all the way to MRU (an RRPV of 0 in RRIP terms).
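A short sketch makes the asymmetry explicit (the epsilon value and RRPV width are assumed, and only the insertion rules are shown):

    import random

    EPSILON = 1 / 32    # assumed bimodal throttle for both policies
    M = 2               # assumed RRPV width: chain runs 0 .. 2^M - 1

    def bip_insert_position(ways):
        """BIP: almost always the LRU position; rarely jumps all the way to MRU."""
        return 0 if random.random() < EPSILON else ways - 1   # 0 = MRU

    def brrip_insert_rrpv():
        """BRRIP: almost always distant (2^M - 1); rarely "long" (2^M - 2)."""
        distant = (1 << M) - 1
        return distant - 1 if random.random() < EPSILON else distant

Even in its rare promoted case, BRRIP stops one step short of the distant end of the chain, while BIP jumps the full length of the stack to MRU; that difference is never really justified.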