Tuesday, May 10, 2011

Google I/O

On the train back to Philadelphia from the Google I/O Extended
conference in NYC. Props to Google NYC for hosting us for the streaming session and showing us around.

The Google NYC offices in the Port Authority building in Chelsea are
neat. I'm a sucker for the converted-warehouse look, so it appealed to
me straight away. There was still a lot of hustle and bustle as they put
the space together, remodeled, and expanded into new parts of the
building. One architectural feature I found funny was that their main
common spaces/auditoriums/rec rooms were located in the loading docks
for their (still working) truck elevators.

There was a big focus on Android graphics and HTML5 capabilities at the
sessions they showed us, which was really encouraging. I'm curious
about the capabilities of their RenderScript. It seems like the
documentation and examples are rather sparse, but it is some kind of
OpenGL wrapper + CUDA approximation that allows for high-performance
rendering or compute (simulation, encoding/decoding) that can run on
either the GPU or CPU as is appropriate.

Rather than Dalvik, they use an LLVM intermediate representation that is
targeted to the CPUs or GPUs available at run-time, which reminds me
of the Apple OpenGL emulation system using LLVM. This, plus the fact
that everyone who can is using LLVM as their back-end, really makes me
think I should pick it up. One nice RenderScript feature is that their
compiler automatically generates Java .class files for you that make
binding from your Android app quite painless.

I'm a little sad that Google chose to rebuild this stack rather than
support OpenCL and build a similar abstraction on top of it; I'd be
curious to hear the rationale. I don't think RenderScript will find
adoption in larger software systems (serious physics engines, math
libraries) without more of a cross-platform nature; it's a pity they
couldn't have leveraged the work that already exists around OpenCL,
which promises a lot of the same benefit of heterogeneous platform
support.

Another good talk I saw was the Android ProTips session. The speaker
covered a lot of the same points on watching battery, connectivity, and
docking state that were covered in CIS542, Embedded Systems. He also
showed some really nice tricks using PassiveLocationProvider to take
advantage of location updates that other apps requested, keeping your
app's location fresh without paying the battery cost yourself.

The sheer number of system events that can be captured and handled
using Intents is impressive, and it's one of my favorite features of
Android as a platform. I thought I was really clever when we used them
to capture document opening for SEASPrint, but apparently that's just
best practice.

Another neat Android library I learned about today in passing is
FileObserver, which is a low-level and powerful utility that I wasn't
really expecting. It's just a wrapper for the Linux inotify API, but it
lets you do neat things like the shader preview tool I talked about
last fall.

It's probably not that great a tool for general IPC, since Android
applications' view of the filesystem is really restricted and there are
already higher-level tools like Intents to pass media and messages
between applications, but I can see it being used in content development
to push asset changes to the phone during development for interactive
preview. No one likes reinstalling an app to see changes.

At the reception after Google I/O, I bumped into one of Google's engineers for
Google Body, which uses a lot of WebGL to render some really high-resolution
datasets. I'd been curious about streaming 3D assets into the browser, so he
was a great person to talk to.

He had some great points about the tradeoffs in streaming, storing, and
decompressing data when your host code runs in JavaScript and your
rendering runs in accelerated OpenGL. This is especially difficult
since in JavaScript you can't access custom native libraries for
compression/decompression or asset preparation. One option is using
browser built-ins like gzip or deflate Content-Encoding to handle
your compression for you.

Test Driven Development: by Example

On the way back home Sunday, I also read most of Test Driven
Development: By Example by Kent Beck, an author I have heard referenced
a lot but never read.

Good quote: "Every programming practice encodes a value system,
explicitly or implicitly"

--Kent Beck

I used a fair bit of TDD on the mail server project, as well as
conformance testing, system testing, stress testing, and a dash of
regression testing. The SMTP protocol logic especially was developed
incrementally and test-first.

The book was alright; the first step-by-step example was
a little wandering but eventually enlightening. I actually liked the
end of the book more, where he gives a large review of test patterns, as
well as design patterns and refactorings as applied to TDD.

One of the sections I enjoyed most was "How do you know if you have good
tests?". For me, one of the greatest benefits of TDD isn't that you
are left with tons of test artifacts that you can use for verification,
refactoring, or maintenance. It's that you can continually,
incrementally, and easily fire up and exercise your code base in little
pieces as you go.

In this section, Kent Beck points out that bad tests can be an
indication of bad design in the production code. If you can't get your
code into a test harness because of external dependencies on filesystems,
servers, or databases, it isn't the fault of unit testing as a
discipline; your production code has too many dependencies. The same
goes if you feel you are instantiating two-thirds of your code base to
run one object. Because unit tests exercise chunks of the codebase
separately, they verify that you really do have a modular design, and
they provide a lot of motivation to extract interfaces and encapsulate
complicated, concrete implementations.

This spate of reading on software engineering has really gotten me
thinking about how to backport these techniques to languages less
focused on encapsulation and dynamic polymorphism (read: C). I might do
a series on modern and modular design techniques in plain-old C. It'd
be stuff most professional developers probably know about, but that
wouldn't show up in K&R.

Seven Languages: Io

I took the train from home in Upstate New York to New York City today to
go to the Google I/O streaming sessions tomorrow, and it's a beautiful
day along the Hudson. I took the opportunity to work through the Io
chapter of Seven Languages.

Io is a prototype language, meaning that to create new objects, one just
clones an existing object, called a prototype, rather than instantiating
a class. I found this to be a clean model that makes reflection very
simple and natural. There is no reason to have, for example, Python
meta-classes, when you don't even have classes! To "inherit"
functionality, you clone your base object, add some functionality
dynamically, and then clone from your new prototype.

Another feature of Io I found distinctive was message passing as the
means of calling methods on objects. I had used messages before, but
only in Objective-C. I don't know if I was just doing it wrong, but
message passing in Objective-C basically amounted to normal function
calls with a weird syntax, fewer compile-time checks, and named
parameters.

Message passing in Io was much more obviously potent. Message passing
in Io is clearly sending a chunk of data to an object, which may (or may
not) result in the execution of a method at some point. Contrast this
with parameter passing in most languages: the caller evaluates the
parameter expressions, pushes the arguments and return address onto the
stack, and jumps into the callee.

This means that if you tried to write a function in C like

return_type if(cond, then_code, else_code)

that evaluated the condition and then ran one of the two branches, you
would have big problems with side effects, because both branches would
be evaluated before the if function was even entered.

In Io, the code in both of those clauses is simply passed in as part of
the message arguments. The callee can check the condition then execute
the side that they wish to. This could be done in C by wrapping that
code up in some kind of data, like a function pointer plus context, but
in Io it's automatic. Similarly, message passing can trivially handle
things like co-routines and asynchronous, deferred, or delayed
invocation, because messages very nicely bundle up the act of calling a
method into data that can be stored and dispatched later.
One annoyance is that there isn't much syntactic support for the
collections classes. They still offer all the expected features, as well
as nice transformation operations like map and select, but using them
requires a little more imperative "add this to a list, then this, then
this" rather than really nice declarative collections processing.

Sunday, May 8, 2011

Seven Languages: Ruby

Since I finished my final projects for the semester, I've gone
on quite a reading binge. One book I started working my way
through is Seven Languages in Seven Weeks. This is a neat
introduction to a slew of languages that focuses on their
quirky, unique aspects.

I think the list is Ruby, Io, Scala, Haskell, Prolog, Erlang, and
Clojure, so there is a good mix of functional, declarative,
object-oriented, and actor-based-concurrency languages, as well as a
good spread from static to dynamic. In fact, there may be a scatter
plot in the making there.

The book starts with Ruby. I believe Ruby is the easiest of the
languages to learn, but I was still interested because I want to pick
up CoffeeScript, which borrows much from it. I was really impressed
with the interactions between blocks, the collections classes, and the
IO system.

Python supports some inkling of this, but I found the Ruby
support much more fundamental to the language. I've always disliked
writing for loops for the purposes of collections processing, and Ruby
really does away with this. Ruby also supports ranges and slices in a
much more unified way than Python.

One thing I just saw a nibble of, but which interested me, was the
language-integrated support for regex matching, a la Perl. While some
of the implicit globals for string processing ($. is the line number of
the last input read?) were a little gross, I thought this was really
nice. I'd be interested to see how elegant a more general context-free
grammar parsing framework could be. I couldn't immediately find a way
to run Ruby one-liners from the command-line, which is a pity because
I've been looking for a reason not to learn awk, and this seemed
promising. (It turns out ruby -e, along with the Perl-style -n and -p
flags, does exactly this.)

I was a little thrown off by the highly dynamic nature of Ruby. Ruby
supports things like open classes and overriding method_missing. These
are really powerful, as Seven Languages mentions, for things like
database adapters and XML frameworks, but making them efficient at
runtime seems daunting. It seems like they would really defeat a lot of
JIT compilers that try to do things like find consistent implicit
class structures. Most code I write has fairly stern performance
requirements, so I'm not sure it's the right language for me.

Next, on to Io!

Saturday, May 7, 2011

Clean Code Review

Today I read most of Clean Code by Bob Martin. The foreword describes the book as being divided into 3 sections: guidelines, case studies, and "code smells" (negative heuristics). It also clearly cautions against reading sections 1 and 3 and skipping 2. Naturally, I went ahead and did just that. If that invalidates any of my review, so be it. I felt it was valuable nonetheless.

One of its main tenets, which I find myself agreeing with, is that functions and classes should be small. And then they should be smaller. While it sounds silly, I think that, applied consistently and combined with great identifiers, it can make code really simple to read.

For example, he recommends against nested scopes inside functions. This means that the bodies of while, if, and for statements should consist only of a single line, which is either a statement or a function call. Great names for these functions leave little surprise about what they do, and the contents are basically what you would expect.

This interplays nicely with another principle he espouses: only stepping down one level of abstraction per function. Each function can deal with a single responsibility by stepping down one level of abstraction and performing a few simple, non-nested steps. Any more complication, or deeper stepping, requires extracting another function.

One benefit of all this function extracting, which I hadn't thought of before, is that it is fruitful ground for extracting classes. Suppose you have a longish function that performs a sequence of steps on some local data. You extract those steps out into methods and make the local data instance variables. The original function now calls the extracted methods.

However, you now have a little wad of instance data which is accessed only by a few methods. This is perfect ground for extracting a new class and moving those methods to it. I've used this a few times in my own code without really thinking about it, and been really pleased with the results.

Overall it's a really nice book, and a very fast read (especially if you skip the case studies). While the author is clearly a heavy Java developer, I still felt like I got good insight out of it, and I am definitely not a big Java guy.

Monday, May 2, 2011

The Passionate Programmer

Recently I've been reading The Passionate Programmer by Chad Fowler. It's been a fast and entertaining read so far, and borrows a lot from Peopleware. It has a really interesting emphasis on personal and career development, rather than how to complete a project or task successfully.

One point he drives home is to really focus on your business value as a programmer. This means really being aware of how much you cost and how much value you produce. As a technical professional who loves what they do, it is really easy to become preoccupied with technical details and problems. However, this doesn't necessarily contribute to the bottom line or to the customer's/manager's perception of that contribution.

Along with this is a call against technical elitism and specialization. "I'm only an X and Y expert and it's not my job to handle Z" doesn't help you do your job, which is to solve those problems if they need to be solved. I feel like this extends not only to different technical areas but also to non-technical areas like responding to customers or understanding your business's finances.

Saturday, April 30, 2011

Joy (and Lament) of Computer Graphics

Some of my vanilla Computer Science friends took classes in computer graphics this semester. Their reaction to the experience was enjoyable:

"I could see what I was doing on the screen!"

"People who didn't know what I was doing were impressed!"

"I wrote it and then it was there!"

I think this is one of the real joys of doing computer graphics: the deep satisfaction of seeing something (almost) tangible as a result of your efforts, and the instant appeal of great results.

I think this is closely tied to some of its greatest challenges as a branch of computer science: the difficulty of building automated and repeatable tests for graphics code, and of verifying that large amounts of numerical data are correct. It's very hard to assert specific facts about a graphics system in test code. For example, suppose you want to assert that some particular aliasing artifact doesn't appear. How would you do that without human eyeballs? Image differences? Machine learning/computer vision (recognize the bad patterns)? Spectral analysis (make sure some frequencies don't show up)?

Similarly, it is much harder to verify the results of a large numerical computation than of a small logical one. Not only are the answers "fuzzier", there are many, many more of them. If you are lucky (as with matrix arithmetic) you can assert some large property about the numbers as a whole (maybe the matrix has a certain rank). If you are also lucky, maybe you can assert some point-wise fact (they are all positive). For anything in between, it's very difficult. Often, as anyone who has spent a lot of time in MATLAB knows, the best answer is plotting the data. This helps with the here-and-now, but doesn't solve the automated and persistent problem.

I'm not sure what a good solution to this is, if there is one.

Monday, April 25, 2011

Nested Tabs

So I've been spending a lot of time doing command-line development on my Ubuntu laptop recently, and I have many different ways to let different tasks share my limited screen space.

Recently, however, I find I've been using them in a pretty consistent way that helps keep all my work straight. Each level of indirection multiplexes screen space across a specific direction. I thought I'd share my scheme here:

Workspaces (change with Ctrl-Alt-(LEFT|RIGHT))
----Doc/Browser Programs (change with Alt-Tab)
--------Web Browser Tabs (change with Ctrl-Tab)
--------PDF Viewers (untabbed)
--------File Browsers
----Gnome Terminal (Generally only 1)
--------Terminal Tabs (use for different login sessions)
------------GNU Screen Windows (use for different working dirs)
----------------Vim Tabs (use for conceptually separate tasks in same dir)
--------------------Vim Split-Windows (use for one task that covers multiple files/different parts of the same file)

The only level of nesting where I don't do much multiplexing is running multiple applications on the workspace I use for my Gnome Terminal; I'm not sure if there is room for improvement there.

GTD + Todo.txt

So, I recently read Getting Things Done, and while I'm not sure all of its points apply to me as a programmer*, I really liked the idea of capturing "open loops". You need to find all those little niggling things that you should be doing and capture them in a place you know to look, so you can start thinking about what you should be doing, and just do it.

This led me to look for a solid personal organization system. I wanted something that would be syncable anywhere I use the internet, fast and efficient to edit, and preferably compatible with common formats in case I need to switch to a different program in the future.

A few of the alternatives I considered were a plain TODO text file and Vim versions of Emacs Org-mode. After a little searching, however, I found Todo.txt, and I'm really happy with it. Todo.txt uses a plain text file as its central database, so it's about as compatible as you could hope for. It also presents a filesystem-like command interface that allows efficient adding, listing, and completing of items, and it will be instantly familiar to any Linux user. With that interface, there is almost no friction to creating a new item, which is important for this system to work for me.

Using Dropbox or normal line-based version control, Todo.txt syncs easily across multiple machines. They also provide a paid Android app that uses Dropbox, easily worth the $1.99 for the convenience. All in all, I think this is a really great and elegant solution.

Saturday, April 23, 2011

Fun With Hadoop

Since I finally wrapped up the mail server, I've moved on to my next final project, our Wifi Heatmap. Part of the server side component of this application is a compilation step that clusters samples of network strengths into estimates of access point location, and then tiles those access points for efficient display.

To make this a little more exciting, I decided to use the Apache Hadoop framework to perform this computation as two Map-Reduce programs. The first Map pass is trivial: it outputs the sample points keyed by the BSSID of the access point. The first Reduce pass is more interesting: it is here that we cluster each set of samples with the same BSSID. In the second Map pass, we accomplish the bulk of the tiling by hashing each access point to its tile ID. In the second Reduce pass, we concatenate all access points with the same tile ID into a downloadable tile that we can then serve to our map viewer on demand.

Tuesday, April 5, 2011

SMTP Server and Unit Tests

Recently, for Software Systems, I had a group project to build an SMTP and POP3 server and client. Milestone 1 was a simple single-machine implementation, Milestone 2 will be a clustered implementation. I was primarily responsible for the SMTP server.

More so than most problems I have to solve, this was a really good chance to practice doing software development the "right" way. We had a clear protocol spec, separable parsing/formatting and protocol logic, and a need for configurability in the backing store.

One goal was that almost any line in the server should be runnable under unit tests. We accomplished this by programming almost entirely to simple interfaces, which let us make dumb mock implementations for testing. This was especially valuable in testing the protocol logic, which with less care would have been coupled to almost the entire rest of the system. Using a streamPair construct to replace sockets and a Session class to control a single user's connected session, we were even able to test full client-server interactions under unit tests with the overhead of only a single thread creation.

As we move on to the clustered implementation, we'll encounter more testing problems, since our tests will need to ensure correctness and consistency in a distributed system. We'll be moving to an automated test system that runs long-running multi-client, multi-server tests on every check-in, as opposed to our current system, which runs unit tests on every build.

Tuesday, February 22, 2011

GPU Programming Assignment 2

I'm the teaching assistant for CIS 565: GPU Programming and Architecture, taught by Patrick Cozzi, and I'm handling the homework assignments this semester. CIS 565 covers both real-time graphics and GPU computing using CUDA. For the real-time graphics portion, we switched languages from Cg to GLSL this year, so I took the time to rewrite all the sample code using GLSL and proper OpenGL 3.3.

The first programming assignment introduced students to basic shader programming. We had 3 different components, showing the students how to create a basic vertex shader, a basic fragment shader, and how to perform image processing in a shader.

The second assignment had some more interesting problems. The first component had the students create a globe with many layered texture effects. This showed how artists can use textures to drive attributes beyond basic color, and how layered effects can create complex visuals. The second component was a screen-space ambient occlusion algorithm. The main goals for this problem were to introduce the students to deferred lighting and rendering, and to show them the effects and performance of different sampling schemes. Sampling is an important topic that applies to many different areas, such as graphics, vision, signal processing, and media encoding/decoding.

The homeworks can be viewed here: CIS 565 Homework

Wednesday, February 2, 2011


FactSnap!

We recently competed at the PennApps mobile development competition, with the same team as at the competition last fall. We didn't place in the competition, but I was really happy with the quality of our final product.

The theme of the competition was 'serendipity'. Our application was a generalization of Snapple Facts to arbitrary products: each product you scan gives you a random fact from a category specific to that product.

We used Barcode Scanner to recognize barcodes and an online UPC database to look up the corresponding products. We built our fact database by scraping the Wikipedia Did-You-Know archives, and we also display corresponding images scraped from Google search results. My girlfriend Lauren built the whole comic-book-themed front end and set up the image animation.

Overall, it's a fun and cute little application that makes use of some neat technologies. Check out my demo video and a download link below.

FactSnap! Market Page

Bank Conflict Prediction

A fun final project I recently worked on for our Computer Architecture class here at Penn was a simulation of cache banking. Banking is a scheme to allow "multi-porting" of a cache (or registers, or memory, etc.). True multi-porting, where a value can be accessed more than once in a given cycle, has great costs in terms of price, area, and latency: it requires additional transistors per data cell and duplicates the read and data lines.

Banking is a much cheaper scheme wherein the cache is divided into multiple static segments, called banks, which are interleaved by memory word. Simultaneous memory accesses to distinct banks can proceed in parallel. However, if there is a structural hazard, where multiple operations try to access the same bank at the same time, only one of them may proceed and the others have to be canceled and restarted. This problem is called a bank conflict. Generally, banked memories have faster access latency than multi-ported memories, but there is the chance of a bank conflict. Even worse, certain access patterns, such as strided array access with a stride that is a multiple of the number of banks, can lead to degenerate bank conflicts.

Our group explored methods of bank conflict avoidance. In this scheme, at issue time, we predict which instructions will incur bank conflicts and delay them. This avoids a restart penalty and allows useful instructions to be issued in that slot. We used a number of predictors based on statistical properties of instructions and on experience with way predictors and store-set predictors.

One difficulty of prediction is that the bank a memory operation will access is not determined until address computation. Our first practical predictor got around this by storing a table that predicts which bank a given PC will access. If a prospective memory operation is predicted to access a bank that another issued operation is predicted to access, we delay issuing the prospective instruction. This simple strategy was already fairly effective.

Our best approach, however, was based on store-set technology. We simulated an idealistic and a realistic predictor. The idealistic predictor tracked the full set of all previous operations (by PC) with which each operation had conflicted; this can be thought of as building edges between PCs in a graph of conflicts. It maintains the sets of conflicting nodes via a compact union-find data structure, which can be thought of as the connected components of that graph. The realistic predictor actually gives better performance than the idealistic predictor, since it recognizes sets of conflicts. A good predictor for a 32kB-64kB L1 cache takes around 4kB of state.

You can view the report below for more details and experimental data.

Report on Bank Conflict Prediction