Tuesday, February 22, 2011

GPU Programming Assignment 2

I'm the teaching assistant for CIS 565: GPU Programming and Architecture taught by Patrick Cozzi this semester , and I'm handling the homework assignments this semester. CIS 565 covers both real-time graphics and GPU computing using CUDA. For the real-time graphics portion, we switched languages from Cg to GLSL this year, so I took the time to rewrite all the sample code using GLSL and proper OpenGL 3.3.

The first programming assignment introduced students to basic shader programming. We had 3 different components, showing the students how to create a basic vertex shader, a basic fragment shader, and how to perform image processing in a shader.

The second assignment had some more interesting problems. The first component had the students create a globe with many layered textured effects. This showed how artists can use textures to drive attributes behind basic color, and how layered effects can create complex visuals. The second component was a screen-space ambient occlusion algorithm. The main goals for this problem was to introduce the student to deferred lighting and rendering, and show them the effects and performance of different sampling schemes. Sampling is an important topic and applies to students in many different areas, such as graphics, vision, signal processing, and media encoding/decoding.

The homeworks can be viewed here: CIS 565 Homework

Wednesday, February 2, 2011


We recently competed at the PennApps Mobile development competition, with the same team as the last competition this fall. We didn't place in the competition, but I was really happy with the quality of our final product.

The theme of our competition was 'serendipity'. Our application was a generalization of Snapple Facts for arbitrary products. Each product you scan will give you a random fact from a category specific to that product.

Weuse Barcode Scanner to recognize barcodes and an online UPC database to look up the corresponding products. We built our fact database by scraping the Wikipedia Did-You-Know archives, and also display corresponding images scraped from Google search results. My girlfriend Lauren built the whole comic-book-themed front end and set up the image animation.

Overall, it's a fun and cute little application that makes use of some neat technologies. Check out my demo video and a download link below.

FactSnap! Market Page

Bank Conflict Prediction

A fun final project I recently worked on for our Computer Architecture class here at Penn was a simulation of cache banking. Banking is a scheme to allow "multi-porting" of cache (or registers, or memory, etc.). True multi-porting, when a value can be read more than once in a given cycle, has great costs in terms of price, area, and latency. It requires additional transistors per data cell and doubles the read and data lines.

Banking is a much cheaper scheme wherein the cache is divided into multiple static segments, which are interleaved by memory words. Simultaneous memory accesses to distinct static segments, called banks, can proceed simultaneously. However, if there is a structural hazard and multiple operations try to access the same bank at the same time, only one of them may proceed and the other has to be canceled and restarted. This problem is called a bank conflict. Generally, banked memories will have faster access latency than multiported memory, but there is the chance of a bank conflict. Even worse, certain access patterns, such as strided array access by a multiple of the number of banks, can lead to degenerate bank conflicts.

Our group explored methods of bank conflict avoidance. In this scheme, at issue, we can predict that instructions will incur bank conflicts and delay them. This avoids a restart penalty and allows useful instructions to be issued in that slot. We used a number of predictors based on statistical properties of instructions and experience with way predictors and store-set predictors

One difficulty of prediction is that the bank a memory operation will need to access is not determined until address computation. Our first practical predictor got around this by storing a table that predicts which bank a given PC will access. If a prospective memory operation is predicted to access a bank that another issued operation is predicted to access, we delay accessing the prospective instruction. This simple strategy was already fairly effective.

Our best approach however was based on store-set technology. We simulated an idealistic and a realistic predictor. The idealistic predictor tracked the full set of all previous operations (by PC) with which each operation had conflicted. This can be thought of as building edges between PCs in a graph of conflicts. The idealistic predictor maintains sets of conflicting nodes via a compact union-find data structure. These can be thought of as connected components in the graph of conflicts. The realistic predictor actually gives better performance than the idealistic predictor, since it recognizes sets of conflicts. A good predictor for a 32kB-64kB L1 cache takes around 4kB.

You can view the report below for more details and experimental data.

Report on Bank Conflict Prediction