Sample report 1: Well, I finally properly know what RRIP (Re-Reference Interval Prediction) is, as well as its variants and its inspiration from the original LIP/BIP/DIP replacement policies. I may as well give a short summary, followed by the more important questions and insights.

Summary of DIP: The first paper, which describes the LIP, BIP, and DIP replacement policies, provides a few useful and broadly applicable ideas:

LRU Insertion Policy - Inserting new lines into the LRU position instead of the MRU position preserves the existing working set in the cache, which is useful when the working set is larger than the cache and would otherwise cause thrashing. However, LIP does not respond to working-set changes effectively, which is bad for working sets that DO fit in the cache.

Bimodal Insertion Policy - By slightly complicating LIP and occasionally (e.g., randomly, with some low probability) inserting into the MRU position instead, you allow the cache contents to track slow changes in the working set. This keeps LIP's thrash resistance while adapting to change, but it still imposes unreasonable overhead on working sets that fit comfortably in the cache or that change rapidly.

Dynamic Insertion Policy - An insertion policy that simply chooses between plain LRU and BIP, based on which one performs better. The paper also provides a nice way of determining which policy appears to be performing better, via...

Set Dueling - As opposed to the much more heavyweight method of keeping entirely separate "sampling caches" or auxiliary tag directories on which you run both insertion policies, you simply designate some sets of the cache as always-LRU, some as always-BIP, and have the remaining sets follow whichever insertion policy appears to be working better. This saves significantly on hardware overhead and is relatively robust.
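To make the difference between these insertion policies concrete, here's a minimal single-set simulation (my own sketch, not from the paper; the `CacheSet` name is mine, and the 1/32 bimodal probability is just an illustrative low value):

```python
import random

class CacheSet:
    """One set of an associative cache; index 0 = LRU end, -1 = MRU end."""
    def __init__(self, ways, policy="lru", bip_epsilon=1/32):
        self.ways = ways
        self.lines = []          # ordered from LRU to MRU
        self.policy = policy     # "lru", "lip", or "bip"
        self.eps = bip_epsilon   # BIP: probability of an MRU insert

    def access(self, tag):
        if tag in self.lines:            # hit: promote to MRU (all policies)
            self.lines.remove(tag)
            self.lines.append(tag)
            return True
        if len(self.lines) == self.ways: # miss in a full set: evict the LRU line
            self.lines.pop(0)
        if self.policy == "lru" or (self.policy == "bip" and random.random() < self.eps):
            self.lines.append(tag)       # MRU insert (classic LRU; rare BIP case)
        else:
            self.lines.insert(0, tag)    # LRU insert (LIP; common BIP case)
        return False
```

With a cyclic working set one line larger than the set (e.g. five tags cycling through a 4-way set), LRU misses on every access while LIP settles into hitting on most of the set each pass — the thrash resistance described above.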
Summary of DRRIP: This is an extension of the policies in the paper above, though the theory is formulated with a little more nuance via Re-Reference Interval Prediction. Instead of statically predicting that every line's re-reference interval is "near-immediate" or "distant", RRIP attaches a small counter to each line reflecting its predicted re-reference interval. There are a few proposed ways of doing the actual prediction:

SRRIP - In Static Re-Reference Interval Prediction, we always assume that a cache hit implies a "near-immediate" re-reference interval and that a new line has a "long" one; we always evict a line with the greatest predicted re-reference interval, updating the predictions by "aging" all lines when no line currently carries that maximal prediction at eviction time. This preserves frequently used working sets and evicts infrequently used lines, much like LRU, but its extra level of prediction gives it scan resistance, preventing thrashing during brief bursts of distant-re-reference memory accesses. It is still susceptible to thrashing, much as LRU is, if the working set is moderately larger than the cache.

BRRIP - An adaptation of BIP, Bimodal Re-Reference Interval Prediction assigns most new lines the "distant" prediction and a few the "long" prediction; by the same logic as BIP, this preserves some of the working set (in particular, the lines randomly given the "long" prediction).

DRRIP - A "dynamic" policy that chooses between SRRIP (which is scan-resistant) and BRRIP (which is thrash-resistant), using set dueling as described above.

Main Questions & Insights: The most important insights these papers provide, other than the algorithms themselves of course, are (in my mind): Phrasing cache policies as re-reference interval prediction problems.
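To make that framing concrete, here is a sketch of the SRRIP mechanics summarized above, using 2-bit re-reference prediction values (RRPVs); the class name and specific constants are my own illustration:

```python
RRPV_BITS = 2
RRPV_MAX = (1 << RRPV_BITS) - 1   # 3 = "distant" re-reference prediction

class SRRIPSet:
    """One cache set under SRRIP with a 2-bit RRPV counter per line."""
    def __init__(self, ways):
        self.ways = ways
        self.lines = {}                   # tag -> RRPV

    def access(self, tag):
        if tag in self.lines:
            self.lines[tag] = 0           # hit: predict near-immediate re-reference
            return True
        if len(self.lines) == self.ways:
            # Evict a line predicted "distant" (RRPV == max); if none exists,
            # age every line by incrementing RRPVs until one qualifies.
            while RRPV_MAX not in self.lines.values():
                for t in self.lines:
                    self.lines[t] += 1
            victim = next(t for t, v in self.lines.items() if v == RRPV_MAX)
            del self.lines[victim]
        self.lines[tag] = RRPV_MAX - 1    # insert with "long" prediction (RRPV 2)
        return False
```

The scan resistance falls out directly: one-shot scan lines enter at RRPV 2 and reach "distant" (RRPV 3) before the hot lines sitting at RRPV 0 do, so a brief scan evicts only its own lines, whereas plain LRU would evict the hot lines.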
The framing isn't new, of course, but by applying the logic behind Belady's algorithm, we get a simple recipe for improving our cache policies: just improve the re-reference interval predictions.

Set Dueling is a pretty clever way of repurposing a few existing cache sets for dynamic benchmarking of competing replacement policies, which lets dynamic algorithms such as DIP and DRRIP react to changes in the working set, and so perform better overall, without much extra hardware.

Probabilistic methods such as sampling can be powerful tools, as the Bimodal Insertion Policy shows.

A lot of heuristic assumptions show up in these papers: for example, the choice for LIP to insert specifically into the LRU position (why not the position directly before it?), the choice of prediction in SRRIP (why "long" for new entries?), or the two-way choice between "distant" and "long" in BRRIP. Creating algorithms that can adapt to their inputs without such restrictive assumptions would be a lovely thing, though also a very tough one, and perhaps something machine learning can help with.

It appears that many of these solutions, the "dynamic" ones in particular, are really encoding simple forms of phase prediction, in this case for memory access patterns. I'm curious how a genuinely powerful phase predictor could affect caching policies, and what advantages or disadvantages it might offer over existing methods like sampling or set dueling.

As an aside, I'm curious what replacement policies modern processors actually use. LRU/NRU, still? Are there practical problems with the proposed solutions that prevent uptake? Furthermore, with the advent of the L3 cache, how do the dynamics between cache levels change? The L2 cache is no longer the LLC (last-level cache), so should it now be treated differently from the new last-level cache?
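Returning to the set-dueling point above: the selection mechanism itself can be sketched as a single saturating counter (the PSEL mechanism from the DIP paper; the bit width, leader-set assignment, and names here are my own illustration):

```python
class SetDuelingSelector:
    """Chooses between two policies A and B using a saturating
    policy-selection counter (PSEL) updated on leader-set misses."""
    def __init__(self, psel_bits=10, leaders_per_policy=32):
        self.psel_max = (1 << psel_bits) - 1
        self.mid = (self.psel_max + 1) // 2
        self.psel = self.mid                      # start undecided
        # Dedicate a few sets to each policy; real designs typically pick
        # leader sets by hashing the set index rather than a fixed range.
        self.leaders_a = set(range(leaders_per_policy))
        self.leaders_b = set(range(leaders_per_policy, 2 * leaders_per_policy))

    def on_miss(self, set_index):
        # A miss in a leader set counts against that leader's policy.
        if set_index in self.leaders_a:
            self.psel = min(self.psel + 1, self.psel_max)
        elif set_index in self.leaders_b:
            self.psel = max(self.psel - 1, 0)

    def policy_for(self, set_index):
        if set_index in self.leaders_a:
            return "A"                # leader sets always run their own policy
        if set_index in self.leaders_b:
            return "B"
        # Follower sets adopt whichever policy has accumulated fewer misses.
        return "A" if self.psel < self.mid else "B"
```

The appeal is the cost: one ~10-bit counter plus the designation of a few leader sets, versus duplicating entire tag arrays to benchmark each candidate policy.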