Saturday, April 30, 2011

Joy (and Lament) of Computer Graphics

Some of my vanilla Computer Science friends took classes in computer graphics this semester. Their reaction to the experience was enjoyable:

"I could see what I was doing on the screen!"

"People who didn't know what I was doing were impressed!"

"I wrote it and then it was there!"

I think this is one of the real joys of doing computer graphics: the deep satisfaction of seeing something (almost) tangible come out of your efforts, and the instant appeal of great results.

I think this is closely tied to some of the greatest challenges in this corner of computer science: the difficulty of setting up automated, repeatable tests for graphics code, and of verifying that large amounts of numerical data are correct. It's very hard to assert specific facts about a graphics system in test code. For example, suppose you want to assert that some particular aliasing artifact doesn't appear. How would you do that without human eyeballs? Image differences? Machine learning/computer vision (recognize the bad patterns)? Spectral analysis (make sure certain frequencies don't show up)?
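
Of those options, the image-difference one is probably the least exotic to automate: render once, eyeball the result, check it in as a "golden" image, and fail the build whenever a new render drifts too far from it. A minimal sketch of that idea (the file names, per-channel tolerance, and allowed drift fraction are all made up for illustration):

    import java.awt.image.BufferedImage;
    import java.io.File;
    import javax.imageio.ImageIO;

    // Golden-image regression test: compare a fresh render against a committed reference.
    public class GoldenImageTest {
        public static void main(String[] args) throws Exception {
            BufferedImage golden = ImageIO.read(new File("golden.png"));
            BufferedImage actual = ImageIO.read(new File("render.png"));

            if (golden.getWidth() != actual.getWidth() || golden.getHeight() != actual.getHeight()) {
                throw new AssertionError("image dimensions changed");
            }

            long badPixels = 0;
            for (int y = 0; y < golden.getHeight(); y++) {
                for (int x = 0; x < golden.getWidth(); x++) {
                    if (channelDistance(golden.getRGB(x, y), actual.getRGB(x, y)) > 8) {
                        badPixels++;
                    }
                }
            }
            // Allow a small fraction of pixels to differ (anti-aliasing, driver variation).
            double badFraction = badPixels / (double) (golden.getWidth() * golden.getHeight());
            if (badFraction > 0.001) {
                throw new AssertionError("render drifted from golden image: " + badFraction);
            }
        }

        // Largest per-channel (R, G, B) difference between two packed ARGB pixels.
        static int channelDistance(int a, int b) {
            int max = 0;
            for (int shift = 0; shift <= 16; shift += 8) {
                max = Math.max(max, Math.abs(((a >> shift) & 0xFF) - ((b >> shift) & 0xFF)));
            }
            return max;
        }
    }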

Similarly, it is much harder to verify the results of a large numerical computation than of a small logical one. Not only are the answers "fuzzier", there are many, many more of them. If you are lucky (as with matrix arithmetic), you can assert some large property about the numbers as a whole (maybe the matrix has a certain rank). If you are lucky in a different way, you can assert some point-wise fact (they are all positive). For anything in between, it's very difficult. Often, as anyone who has spent a lot of time in Matlab knows, the best answer is to plot the data. That helps in the here-and-now, but doesn't solve the automated and persistent problem.
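
When those lucky cases do apply, they are at least cheap to automate. Here is a toy sketch of both kinds of assertion; checking rank would need a linear-algebra library, so the aggregate property here is the simpler "each row sums to one", and the data and tolerance are made up:

    public class MatrixChecks {
        static final double EPS = 1e-9;

        public static void main(String[] args) {
            double[][] m = {
                {0.2, 0.5, 0.3},
                {0.1, 0.1, 0.8},
                {0.6, 0.2, 0.2},
            };

            for (double[] row : m) {
                double sum = 0;
                for (double x : row) {
                    // point-wise check: catches NaNs, infinities, and sign errors
                    if (!(x > 0.0) || Double.isInfinite(x)) {
                        throw new AssertionError("non-positive or non-finite entry: " + x);
                    }
                    sum += x;
                }
                // aggregate check: a property of the numbers as a whole
                if (Math.abs(sum - 1.0) > EPS) {
                    throw new AssertionError("row does not sum to 1: " + sum);
                }
            }
            System.out.println("all checks passed");
        }
    }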

I'm not sure what a good solution to this is, if there is one.

Monday, April 25, 2011

Nested Tabs

So I've been spending a lot of time doing command-line development on my Ubuntu laptop recently, and I have many different ways to let different tasks share my limited screen space.

Recently, however, I've found I'm using them in a pretty consistent way that helps keep all my work straight. Each level of indirection multiplexes screen space along a specific dimension. I thought I'd share my scheme here:

Workspaces (change with Ctrl-Alt-(LEFT|RIGHT))
----Doc/Browser Programs (change with Alt-Tab)
--------Web Browser Tabs (change with Ctrl-Tab)
--------PDF Viewers (untabbed)
--------File Browsers
----Gnome Terminal (Generally only 1)
--------Terminal Tabs (use for different login sessions)
------------GNU Screen Windows (use for different working dirs)
----------------Vim Tabs (use for conceptually separate tasks in same dir)
--------------------Vim Split-Windows (use for one task that covers multiple files/different parts of the same file)

The only level of nesting where I don't do much multiplexing is running multiple applications on the workspace I use for my Gnome Terminal; I'm not sure if there is room for improvement there.

GTD + Todo.txt

So, I recently read Getting Things Done, and while I'm not sure all of its points apply to me as a programmer*, I really liked the idea of capturing "open loops". You need to find all those little niggling things you should be doing and capture them in a place you know to look, so you can start thinking about what you should be doing, and just do it.

This led me to look for a solid personal organization system. I wanted something that would be syncable anywhere I use the internet, fast and efficient to edit, and preferably compatible with common formats in case I need to upgrade to a different program in the future.

A few of the alternatives I considered were a plain TODO text file and Vim ports of Emacs Org-mode. After a little searching, however, I found Todo.txt, and I'm really happy with it. Todo.txt uses a plain text file as its central database, so it's about as compatible as you could hope for. It also presents a filesystem-like interface that allows efficient adding, listing, and completing of items, and that will be instantly familiar to any Linux user. With that interface there is almost no friction to creating a new item, which is important for this system to work for me.
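
Todo.txt itself is implemented as a shell script over that text file, so the following is emphatically not how the real tool works; it's just a toy Java sketch, with made-up file contents, to show how little machinery the one-task-per-line format (with completed items prefixed by "x ") actually needs:

    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;
    import java.util.List;

    // Toy re-creation of the add / list / complete operations over a todo.txt-style file.
    public class TinyTodo {
        static final Path TODO = Paths.get("todo.txt");  // illustrative location

        static void add(String task) throws IOException {
            Files.write(TODO, (task + System.lineSeparator()).getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        }

        static void list() throws IOException {
            List<String> tasks = Files.readAllLines(TODO, StandardCharsets.UTF_8);
            for (int i = 0; i < tasks.size(); i++) {
                System.out.println((i + 1) + " " + tasks.get(i));
            }
        }

        static void complete(int lineNumber) throws IOException {
            List<String> tasks = Files.readAllLines(TODO, StandardCharsets.UTF_8);
            tasks.set(lineNumber - 1, "x " + tasks.get(lineNumber - 1));
            Files.write(TODO, tasks, StandardCharsets.UTF_8);
        }

        public static void main(String[] args) throws IOException {
            add("(A) Finish wifi heatmap tiling pass +heatmap");
            add("Write up SMTP clustering test plan +smtp");
            list();
            complete(2);
        }
    }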

Using Dropbox or ordinary line-based version control, Todo.txt syncs easily across multiple machines. The developers also provide a paid Android app that uses Dropbox, and it is easily worth the $1.99 for the convenience. All in all, I think this is a really great and elegant solution.

Saturday, April 23, 2011

Fun With Hadoop

Since I finally wrapped up the mail server, I've moved on to my next final project, our Wifi Heatmap. Part of the server-side component of this application is a compilation step that clusters samples of network strength into estimates of access-point locations, and then tiles those access points for efficient display.

To make this a little more exciting, I decided to use the Apache Hadoop framework to perform this computation as two Map-Reduce programs. The first Map pass is trivial; it outputs the sample points keyed by the BSSID of the access point. The first Reduce pass is more interesting: it is here that we cluster each set of samples sharing a BSSID. In the second Map pass, we accomplish the bulk of the tiling by hashing each access point to its tile ID. In the second Reduce pass, we concatenate all access points with the same tile ID into a downloadable tile that we can then serve to our map viewer on demand.
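
As a rough illustration of the first of those two programs, here is what the pass might look like with Hadoop's Java API. The input layout (CSV lines of BSSID, latitude, longitude, signal strength) and the "clustering" (a single strength-weighted centroid per BSSID) are stand-ins for our actual formats and algorithm:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    // Pass 1, Map: re-key each sample line by the BSSID it was measured from.
    // Assumed input line format: "bssid,lat,lon,strength" (illustrative only).
    class SampleByBssidMapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            String[] f = line.toString().split(",");
            context.write(new Text(f[0]), new Text(f[1] + "," + f[2] + "," + f[3]));
        }
    }

    // Pass 1, Reduce: collapse all samples for one BSSID into a location estimate.
    // A strength-weighted centroid stands in for the real clustering step.
    class LocateAccessPointReducer extends Reducer<Text, Text, Text, Text> {
        @Override
        protected void reduce(Text bssid, Iterable<Text> samples, Context context)
                throws IOException, InterruptedException {
            double latSum = 0, lonSum = 0, weightSum = 0;
            for (Text sample : samples) {
                String[] f = sample.toString().split(",");
                double weight = Double.parseDouble(f[2]);  // strength, assumed already a positive weight
                latSum += Double.parseDouble(f[0]) * weight;
                lonSum += Double.parseDouble(f[1]) * weight;
                weightSum += weight;
            }
            context.write(bssid, new Text(latSum / weightSum + "," + lonSum / weightSum));
        }
    }

The second program follows the same shape, with the Map step emitting (tile ID, access point) pairs and the Reduce step concatenating each group into a tile.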

Tuesday, April 5, 2011

SMTP Server and Unit Tests

Recently, for Software Systems, I had a group project to build an SMTP and POP3 server and client. Milestone 1 was a simple single-machine implementation; Milestone 2 will be a clustered implementation. I was primarily responsible for the SMTP server.

More so than most problems I get to solve, this was a really good chance to practice doing software development the "right" way. We had a clear protocol spec, separable parsing/formatting and protocol logic, and a need for configurability in the backing store.

One goal was that almost every line in the server should be runnable under unit tests. We accomplished this by programming almost entirely to simple interfaces, which let us write dumb mock implementations for testing. This was especially valuable in testing the protocol logic, which with less care would have been coupled to almost the entire rest of the system. Using a streamPair construct to replace sockets and a Session class to control a single user's connected session, we were even able to test full client-server interactions under unit tests with the overhead of only a single thread creation.
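
None of the class names below come from our actual project; this is just a generic sketch of the pattern: piped streams standing in for a socket, a throwaway stand-in for the Session, and a test that drives the whole conversation from the main thread while the "server" runs on the one extra thread:

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.InputStreamReader;
    import java.io.OutputStream;
    import java.io.PipedInputStream;
    import java.io.PipedOutputStream;
    import java.io.PrintWriter;

    // One direction of an in-memory "socket": bytes written to `out` can be read from `in`.
    class StreamPair {
        final PipedOutputStream out = new PipedOutputStream();
        final PipedInputStream in;
        StreamPair() throws IOException { in = new PipedInputStream(out); }
    }

    // Stand-in for a real SMTP Session: greets, answers 250 to everything, 221 to QUIT.
    class ToySession implements Runnable {
        private final BufferedReader in;
        private final PrintWriter out;
        ToySession(InputStream in, OutputStream out) {
            this.in = new BufferedReader(new InputStreamReader(in));
            this.out = new PrintWriter(out, true);
        }
        public void run() {
            try {
                out.println("220 localhost toy server ready");
                String line;
                while ((line = in.readLine()) != null) {
                    if (line.equalsIgnoreCase("QUIT")) { out.println("221 bye"); return; }
                    out.println("250 OK");
                }
            } catch (IOException e) {
                // a dropped pipe surfaces as missing replies in the test
            }
        }
    }

    // Drives a full "client/server" exchange with one extra thread and no sockets.
    public class SessionPipeTest {
        public static void main(String[] args) throws Exception {
            StreamPair clientToServer = new StreamPair();
            StreamPair serverToClient = new StreamPair();

            Thread server = new Thread(new ToySession(clientToServer.in, serverToClient.out));
            server.start();

            PrintWriter client = new PrintWriter(clientToServer.out, true);
            BufferedReader replies = new BufferedReader(new InputStreamReader(serverToClient.in));

            expectPrefix(replies.readLine(), "220");
            client.println("HELO example.com");
            expectPrefix(replies.readLine(), "250");
            client.println("QUIT");
            expectPrefix(replies.readLine(), "221");
            server.join();
            System.out.println("conversation completed as expected");
        }

        static void expectPrefix(String reply, String code) {
            if (reply == null || !reply.startsWith(code)) {
                throw new AssertionError("expected " + code + " but got: " + reply);
            }
        }
    }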

Now that we're moving on to the clustered implementation, we'll encounter harder testing problems: our tests will need to ensure correctness and consistency in a distributed system. We'll be moving to an automated test system that runs long-running multi-client, multi-server tests on every check-in, as opposed to our current setup, which runs unit tests on every build.