A fun final project I recently worked on for our Computer Architecture class here at Penn was a simulation of cache banking. Banking is a scheme to allow "multi-porting" of cache (or registers, or memory, etc.). True multi-porting, when a value can be read more than once in a given cycle, has great costs in terms of price, area, and latency. It requires additional transistors per data cell and doubles the read and data lines.
Banking is a much cheaper scheme wherein the cache is divided into multiple static segments, which are interleaved by memory words. Simultaneous memory accesses to distinct static segments, called banks, can proceed simultaneously. However, if there is a structural hazard and multiple operations try to access the same bank at the same time, only one of them may proceed and the other has to be canceled and restarted. This problem is called a bank conflict. Generally, banked memories will have faster access latency than multiported memory, but there is the chance of a bank conflict. Even worse, certain access patterns, such as strided array access by a multiple of the number of banks, can lead to degenerate bank conflicts.
Our group explored methods of bank conflict avoidance. In this scheme, at issue, we can predict that instructions will incur bank conflicts and delay them. This avoids a restart penalty and allows useful instructions to be issued in that slot. We used a number of predictors based on statistical properties of instructions and experience with way predictors and store-set predictors
One difficulty of prediction is that the bank a memory operation will need to access is not determined until address computation. Our first practical predictor got around this by storing a table that predicts which bank a given PC will access. If a prospective memory operation is predicted to access a bank that another issued operation is predicted to access, we delay accessing the prospective instruction. This simple strategy was already fairly effective.
Our best approach however was based on store-set technology. We simulated an idealistic and a realistic predictor. The idealistic predictor tracked the full set of all previous operations (by PC) with which each operation had conflicted. This can be thought of as building edges between PCs in a graph of conflicts. The idealistic predictor maintains sets of conflicting nodes via a compact union-find data structure. These can be thought of as connected components in the graph of conflicts. The realistic predictor actually gives better performance than the idealistic predictor, since it recognizes sets of conflicts. A good predictor for a 32kB-64kB L1 cache takes around 4kB.
You can view the report below for more details and experimental data.
Report on Bank Conflict Prediction
No comments:
Post a Comment