Thursday, May 22, 2014

I worry about the dangers I am inviting them into.

I'm a Computer Science professor. My year has a rhythm. The academic calendar is a metronome, tapping out the weeks of fall and spring semesters.

Week eight is why I love my job. Spring break bliss. One week of autonomous calm. I choose to do whatever I need to do to feel caught up, calm, and successful. I must muscle through week seven to earn week eight. The semester is almost halfway over, so keeping time with the syllabus despite snow days and daycare inservice and impetigo and a traveling husband has become toxic. All through the week, a line from Bill Murray's Bob Wiley gets echoed by 20-year-old voices, "Gimme, gimme, gimme, I need, I need, I need!"

Every week seven, a lone student wants to meet with me on Friday afternoon. This is the Friday before Spring Break, when the parking lot is sparsely littered with cars because most of their owners are already away on vacation in Colorado and California. This student has bad news that cannot wait. Maybe, "I hate linear algebra" or "my ex-boyfriend is stalking me and I don't feel safe."

Last semester, "I don't want to major in computer science anymore."

Why not?

- I don't want to program all the time.
- I don't have any side coding projects like everybody else does.
- I have to work really hard just to get it to compile.
- I don't want to code all the time for a job.

Instead:
- I like to talk to people and get things done.
- I like to turn ideas into reality.
- I like to hang out with my friends.

A blow. Though, I have heard this speech many times before, often from students who are not doing well. Sometimes from women who have engineer moms and dads.

But this day, the speech has a new sting to it. I'm talking to an Asian guy who has been doing just fine in his classes. He looks like he belongs. He codes like it, too.

My field is doing something wrong.

Most folk know there is something wrong with the ways by which computing culture accepts and retains members to the field of computer science. I see it every day in my classes. I gave an exam today. 36 students. They quietly flipped the pages, stressing over 5-point problems, and I tallied up the diversity of the room. Three women. Two Hispanics. A guy from Kuwait. And 32 white guys.

In frustration, I want to yell out, "Where is everybody else!? Why don't other people feel like they would belong here?"

I know why.

No. That's too bold. I feel like I know parts of the why.

A feeling of belongingness can be very hard to find in this field. Even for the white guys, the dominant demographic in the US, a feeling of belonging is not guaranteed. Six years ago I did a survey of computer science majors at the University of Illinois [1]. In it, I asked, "Are you a typical computer scientist?" One guy -- almost archetypal in his CS-itude with his white skin and shaggy hair and love of comic books and functional programming languages and his quiet-but-kind math-major girlfriend -- said, "No. I cook."

Computer science is so narrow to this guy that cooking food makes him an oddball.

Perhaps I understand how he feels. At least, I know why I don't feel like I belong here, after twenty years being in it. Despite my Sparc and MIPS and Intel and Linux and Apache and C and VxWorks and Assembly and JavaScript, I still cannot tolerate the nerd-machismo, the disgusting one-upmanship of REAL programming languages or REAL operating systems. After twenty years, it still stings when folk assume I am the HR lady or the secretary or the wife. Or when a previously-trusted colleague says, "I think you are just having a female issue."

Most of the time, I don't really want to be in this field either. But I put aside my own hesitance and follow through on what feels like a moral obligation to expose the youth to my field. I participate in summer camps to teach young girls how to write video games. I choreograph sorting algorithms and dance them out for parents and students during visit weekends. I host campus visits for middle school girls so that they can see the inside of a research lab. I create comfy microcosms for them. I let them play, then offer, "Isn't this neat? Don't you want this, too?"

I feel conflicted about this work. My field is not always neat and comfy and playful. It has its difficulties. I worry about the dangers that I am inviting them into. I worry that behind my work of recruiting is a naive and silly hope, "If I tell 10,000, then 10 will come."

And when that handful arrives to the perimeters of my field, who will do the other half of the job? Who will do the work of nurturing, of retention? It won't likely be me, their first professor or their first boss. Someone else must do the work of reinforcing to them again and again that, "Yes, you belong here." 

Such work is not just women's work for the sake of women. This work is everyone's work.

I desperately want my field to become diverse -- a place where lots of different kinds of people are tolerating each other in healthy engaging ways. To achieve this requires the work of recruitment, middle school girls and summer camps. But it also requires an improved look at retention. All members of my field must be educated. To use inclusive language. To use blind resumes. To think about users who are deaf or gamers who might not care for the metal thong. To put aside strong opinions about race or gender or Haskell or Ruby. To leave one's baggage in the car and be a professional in the field.

My field.

[1] T. L. Crenshaw, E. W. Chambers, H. Metcalf, U. Thakkar. A case study of retention practices at the University of Illinois at Urbana-Champaign. In Technical Symposium on Computer Science Education (SIGCSE 2008). March 2008.

Saturday, February 1, 2014

I wore tinsel in my hair all day on Friday.

It was because Kim threw tinsel at me during the faculty meeting.

Which was, because, at the faculty meeting, it was announced that I got tenure.

I wore tinsel in my hair all day on Friday, so that, when people saw me, they would ask, "What's with the tinsel?"  And I would reply, "Oh that's tenure tinsel."

I taught my class at 12:30 pm, and my students asked me about the tinsel, and I said, "Six years ago, this university hired me.  And when they hired me, they said, 'In six years, we are going to make a decision.  Either we'll hire you forever, or we'll fire you.'  And today they told me the answer.  I got tenure."

And everybody clapped and it was very nice.

But it was bittersweet, for I missed all the people who should have been in that room to hear the news.

  • My husband.
  • My parents.
  • My brother.  When we were little, we'd set up all our dolls and stuffed animals into a little school room and I would teach them arithmetic.
  • The students who had to suffer all the bad lectures I gave until I got good at this job.  Anastasia Borok and that awful 1-credit C++ course.  Steven Beyer and Emily McKaig and Carolyn Farris and Chris Harvey who, among others, had to muscle through my explanations and re-explanations of pointers in C.
  • Kitty.
  • The people who suffered through graduate school with me.
  • My undergraduate advisor, Dr. Sarwar.


Really, they should have been able to hear the news first.  It's because of all of them that I got tenure today.  But I know that they will clap for me a little when they finally do hear that I wore tinsel in my hair on Friday.

Tuesday, August 6, 2013

A long post about Git

This is a long post about Git, an open source software configuration management system.  It allows multiple developers to manage the changes to a source base.  With a tool like Git, multiple developers can make changes to code, merge their changes with the contributions from other developers, and control which versions of the code are for development, experimenting, testing, and production.

Learn more about Git at their 'About' page.

The remainder of this post introduces new Git users to creating and managing their own local and remote repositories.

Part 1.  The obligatory part about creating a git repository on a local machine and committing local changes to it.

Let us suppose that I have a small bit of code I have been working on in a directory called robots.  The directory has four files in it:

robots $ ls
README.txt main.c makefile robots.out

I am ready to create my first git repository for this code:

robots $ git init
Initialized empty Git repository in /Users/crenshaw/robots/.git/

Now I have a local git repository for my code, but it is not tracking any files yet.

robots $ git status
# On branch master
#
# Initial commit
#
# Untracked files:
#   (use "git add <file>..." to include in what will be committed)
#
# README.txt
# main.c
# makefile
# robots.out
nothing added to commit but untracked files present (use "git add" to track)

I need to tell my git repository which files I would like to track.  In general, I will never want to keep track of object files or executable files, so I'm going to configure git to always ignore files ending in .o and .out.  To do so, I create a .gitignore file in the robots directory.  This file contains a set of glob patterns (shell-style wildcards, not regular expressions) that name the files that, in general, I never want git to track:

*.o
*.out

Now when I use the command git status to look at the status of my repository, it does not report that robots.out is an untracked file.  Git is ignoring the file.

robots $ git status
# On branch master
#
# Initial commit
#
# Untracked files:
#   (use "git add <file>..." to include in what will be committed)
#
# .gitignore
# README.txt
# main.c
# makefile
nothing added to commit but untracked files present (use "git add" to track)
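As a rough illustration of how those two patterns select files, here is a small Python sketch.  It uses the standard library's fnmatch module, which only approximates Git's ignore rules (Git's real matching also handles directories, negation, and more).  The function name is_ignored and the extra file robot.o are made up for this example.

```python
from fnmatch import fnmatch

# The two patterns from my .gitignore file.
IGNORE_PATTERNS = ["*.o", "*.out"]


def is_ignored(filename, patterns=IGNORE_PATTERNS):
    """Return True if filename matches any of the glob patterns."""
    return any(fnmatch(filename, pattern) for pattern in patterns)


# robot.o is a hypothetical object file, added to exercise the first pattern.
files = ["README.txt", "main.c", "makefile", "robots.out", "robot.o"]
not_ignored = [name for name in files if not is_ignored(name)]
print(not_ignored)
```

Only README.txt, main.c, and makefile survive the filter, which agrees with the untracked files that git status lists (apart from .gitignore itself).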

I want to track all these files.  So I use the git add command to start tracking the files.

robots $ git add .
robots $ git status
# On branch master
#
# Initial commit
#
# Changes to be committed:
#   (use "git rm --cached <file>..." to unstage)
#
# new file:   .gitignore
# new file:   README.txt
# new file:   main.c
# new file:   makefile
#

They are now being tracked, but they haven't been committed.  To commit a file means to take a snapshot of that file and save that snapshot in the repository.  Before git will commit a file, the file must be staged.  The git add command I just ran both started tracking these files and staged them, so they are ready to commit.  I use git commit to commit the files.  The -a flag also stages any already-tracked files that have changed; it is not strictly needed for this first commit, but it is a handy habit.  I use the -m flag to write a useful message about the files being committed.

robots $ git commit -a -m "The initial commit of the robots example source code for Software Engineering."
[master (root-commit) 75bca81] The initial commit of the robots example source code for Software Engineering.
 4 files changed, 32 insertions(+)
 create mode 100644 .gitignore
 create mode 100644 README.txt
 create mode 100644 main.c
 create mode 100644 makefile

Now git reports that everything is up to date.

robots $ git status
# On branch master
nothing to commit, working directory clean

I'm going to make a change to a file.  After I do so, I'm ready to commit my change.

robots $ git status
# On branch master
# Changes not staged for commit:
#   (use "git add <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#
# modified:   main.c
#
no changes added to commit (use "git add" and/or "git commit -a")

One nice thing about Git is that it often gives you helpful suggestions on commands that you might want to use.  See the note about using git commit -a.  That's what I want to do.  Again, note my useful commit message.  

robots $ git commit -a -m "Added additional text to the output of main.c.  Updated the comments in the header to reflect this change.  Program now states 'I really love robots.'"
[master 7e6aa31] Added additional text to the output of main.c.  Updated the comments in the header to reflect this change.  Program now states 'I really love robots.'
 1 file changed, 3 insertions(+), 3 deletions(-)

Part 2. In which I need to fix my commit comments because, in a hot-headed moment of frustration, I wrote an f-bomb in the commit message and that's not particularly useful.

Yeah, I have a temper.  In trying to get something quickly committed just before lunch, I was angry and made the following commit:

robots $ git commit -a -m "JUST F***ING WORK."
[master ca122b3] JUST F***ING WORK.
 5 files changed, 59 insertions(+), 5 deletions(-)
 create mode 100644 robot.c
 create mode 100644 robot.h

Nope.  Not useful.    

The git commit command has an --amend option that allows you to change the commit message of the most recent commit.

robots $ git commit --amend

This pulls up an editor that allows me to alter the message.  Unless it has been configured otherwise, this editor is usually vi, so hit the letter i on the keyboard to go into insert mode.  When you are done editing, hit Esc, then type :wq and hit Enter.

Why was I mad?  I always forget that when adding new files to the repository, I must use git add . to indicate to git that I want to track the new files.  Notice the two new files in my most recent commit, robot.c and robot.h.  I had to add them using git add ., which both starts tracking them and stages them; the -a flag in git commit -a -m <message> only picks up changes to files that are already tracked.


Parts 1 and 2 make it seem like git is easy to use.  And it can be, if I am working on code all by myself on a single machine.  Git gets trickier to use when I open up this repository by maintaining a remote repository and inviting multiple developers to add features to my code. 

*

Part 3. In which I create a new remote repository and copy my local repository to it so that the world may download my awesome open source.

I open a browser and point it at 


I click on the text near the little green plus at the bottom left of the page to “Create a new project.”   I fill out the form, making sure to choose Git as the version control system.  I choose GNU GPL v2 as my license, because that is the one license I have read the whole way through that makes sense to me.

After I fill out the form, I click on the button “Create project”.

Now I am at the newly-created repository’s home page.  I click on the Administer tab and select "Sharing" so that I can add my friend B. as a Project committer.  I click on the button “Save changes.”  My browser goes to the main page of my new empty remote repository.  The URL of this page is the location of my remote repository.


I want to copy my local repository to my remote one.  This makes my work public; since it's hosted on Google Code, anyone can download it but only people I specify under "Sharing" may alter the code.

The act of copying my local repository to a remote one is called pushing.

To set up push functionality for my repository, I first configure my push settings on my local machine:

robots $ git config --global push.default simple

This means that when I invoke the command git push, Git will only copy my current branch to my remote repository.  There are more details on this configuration in the Git docs.

Now I want to tell my local repository the location of the remote repository I want to use whenever I invoke git push:

robots $ git remote add origin https://code.google.com/p/i-love-robots

I can tell that this command worked by looking in the file ~/robots/.git/config.  It now contains:

[remote "origin"]
        url = https://code.google.com/p/i-love-robots
        fetch = +refs/heads/*:refs/remotes/origin/*


Finally, I am going to push my code to the remote repository.  This is the first time I am pushing to the remote, so I need to configure git so that it considers the master branch on this remote repository the "upstream branch" of my local master.

$ git push --set-upstream origin master
Counting objects: 16, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (14/14), done.
Writing objects: 100% (16/16), 2.33 KiB, done.
Total 16 (delta 4), reused 0 (delta 0)
remote: Scanning pack: 100% (16/16), done.
remote: Storing objects: 100% (16/16), done.
remote: Processing commits: 100% (3/3), done.
To https://code.google.com/p/i-love-robots
 * [new branch]      master -> master
Branch master set up to track remote branch master from origin.

I go to my Google Code site for my project and I can see my code.  So can my parents.  Or future employers.


Part 4.  I make some changes locally, and when I try to push, I get ! [rejected].

I add one more feature to my program and commit it locally.  Note my useful commit message.

robots $ git commit -a -m "Added command line parameters to the software.  The usage is now './robots.out <number>' where number is the number of times the program prints the word 'really'."
[master 0680720] Added command line parameters to the software.  The usage is now './robots.out <number>' where number is the number of times the program prints the word 'really'.
 3 files changed, 34 insertions(+), 7 deletions(-)

I use the git status command to assess the state of my local repository versus my remote one.

robots $ git status
# On branch master
# Your branch is ahead of 'origin/master' by 1 commit.
#   (use "git push" to publish your local commits)
#
nothing to commit, working directory clean

Again, Git is trying to help me out.  It reports that I am "ahead" of the remote repository, and that I can use git push to copy my local changes to the remote repository.  So I do it; this time I can just simply use the git push command.  

$ git push
To https://code.google.com/p/i-love-robots
 ! [rejected]        master -> master (fetch first)
error: failed to push some refs to 'https://code.google.com/p/i-love-robots'
hint: Updates were rejected because the remote contains work that you do
hint: not have locally. This is usually caused by another repository pushing
hint: to the same ref. You may want to first merge the remote changes (e.g.,
hint: 'git pull') before pushing again.
hint: See the 'Note about fast-forwards' in 'git push --help' for details.

That did not work.  It looks like my friend B. already committed some changes to the remote repository.  Before I can push my changes, I must synchronize with the remote repository; I must get my friend's changes.  I do this using git pull.  If my friend B. and I altered different parts of the source, git will automatically merge all of our changes.  Then I can commit the merge locally and push my latest commits to the remote.

Unfortunately, it looks like B. and I changed the same part of the code, and we have a merge conflict:

robots $ git pull
remote: Counting objects: 3, done.
remote: Finding sources: 100% (3/3), done.
remote: Total 3 (delta 0)
Unpacking objects: 100% (3/3), done.
From https://code.google.com/p/i-love-robots
   f62e31a..2487a6a  master     -> origin/master
Auto-merging main.c
CONFLICT (content): Merge conflict in main.c
Automatic merge failed; fix conflicts and then commit the result.

I have to fix the merge conflict.  The git status command tells me where the conflicts are:

robots $ git status
# On branch master
# Your branch and 'origin/master' have diverged,
# and have 1 and 1 different commit each, respectively.
#   (use "git pull" to merge the remote branch into yours)
#
# You have unmerged paths.
#   (fix conflicts and run "git commit")
#
# Unmerged paths:
#   (use "git add <file>..." to mark resolution)
#
# both modified:      main.c
#
no changes added to commit (use "git add" and/or "git commit -a")

The conflict is in main.c.  When I open the file, the conflicts are marked.  The code between the line <<<<<<< HEAD and the line ======= is the code that is in my local repository (HEAD is my latest local commit).  The code between the line ======= and the line >>>>>>> 2487a6ae036cea369a8dbeade78df8d1014825db is the code that B. pushed to the remote repository.  It looks like B. and I both added a function comment header to main(), but our function headers are different.  It also looks like I added command line arguments to main(), but my friend B. did not.

/** 
 * main.c 
 * 
 * The main entrypoint of the "I really love robots" program. The 
 * program prints the phrase "I really love robots!" to the screen. 
 * 
 * @author Tanya L. Crenshaw 
 * @since August 2013 
 * 
 */
#include "robot.h"
<<<<<<< HEAD
=======
>>>>>>> 2487a6ae036cea369a8dbeade78df8d1014825db
/**
 * main() 
 * 
 * The main entrypoint of the program. 
<<<<<<< HEAD
 * @param command line arguments.
 *
 * @returns nothing.
 */
int main(int argc, const char * argv[])
=======
* @param none
*
* @returns nothing. 
*/
int main(void)
>>>>>>> 2487a6ae036cea369a8dbeade78df8d1014825db

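To make the marker layout concrete, here is a minimal Python sketch that splits a conflicted file into the two competing versions.  It assumes the plain two-way marker style shown in this post, and the function name split_conflicts is my own invention.  It is a reading aid, not a merge tool.

```python
def split_conflicts(text):
    """Return a list of (ours, theirs) pairs, one per conflict region.

    'ours' is the text between '<<<<<<<' and '=======' (my local HEAD);
    'theirs' is the text between '=======' and '>>>>>>>' (the pulled commit).
    """
    conflicts = []
    ours, theirs = [], []
    state = None  # None outside a conflict, else "ours" or "theirs"
    for line in text.splitlines():
        if line.startswith("<<<<<<<"):
            state, ours, theirs = "ours", [], []
        elif line.startswith("=======") and state == "ours":
            state = "theirs"
        elif line.startswith(">>>>>>>") and state == "theirs":
            conflicts.append(("\n".join(ours), "\n".join(theirs)))
            state = None
        elif state == "ours":
            ours.append(line)
        elif state == "theirs":
            theirs.append(line)
    return conflicts


# A trimmed-down version of the conflict in main.c.
sample = """\
#include "robot.h"
<<<<<<< HEAD
int main(int argc, const char * argv[])
=======
int main(void)
>>>>>>> 2487a6ae036cea369a8dbeade78df8d1014825db
"""

for mine, pulled in split_conflicts(sample):
    print("mine (HEAD):  ", mine)
    print("B.'s (pulled):", pulled)
```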

I get in touch with B. to ask about the recent changes to master.  After we chat and agree on the direction of the code, I alter main.c so that it reflects a coherent set of changes made by both B. and myself, and I remove all the marker lines: <<<<<<< HEAD, =======, and >>>>>>> 2487a6ae036cea369a8dbeade78df8d1014825db.  I confirm that my code still works with this new main.c, and I commit my changes.

robots $ git commit -a -m "Merged my recent changes to the command line arguments with the master branch at the remote."
[master 6336ae6] Merged my recent changes to the command line arguments with the master branch at the remote.

And I push.

robots $ git push
Counting objects: 12, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (6/6), done.
Writing objects: 100% (6/6), 1.21 KiB, done.
Total 6 (delta 3), reused 0 (delta 0)
remote: Scanning pack: 100% (6/6), done.
remote: Storing objects: 100% (6/6), done.
remote: Processing commits: 100% (2/2), done.
To https://code.google.com/p/i-love-robots
   2487a6a..6336ae6  master -> master

Yippee.  And now my repository looks like this.  Notice the bit that looks like I-405?  The little D-shape?  That was me making my changes while my friend B. pushed changes to the remote repository before I did.
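The D-shape can be sketched as a tiny commit graph.  The short hashes below come from the command output earlier in this part; the parent links are my reading of that output, and the helper names is_ancestor and can_fast_forward are invented for this sketch.  The idea it illustrates is that a plain push is accepted only when the remote's latest commit is an ancestor of mine.

```python
# Each commit maps to its list of parents (older history is left out).
PARENTS = {
    "f62e31a": [],                      # the last commit B. and I shared
    "0680720": ["f62e31a"],             # my command line parameters commit
    "2487a6a": ["f62e31a"],             # B.'s commit, pushed before mine
    "6336ae6": ["0680720", "2487a6a"],  # my merge commit closes the D
}


def is_ancestor(ancestor, commit, parents=PARENTS):
    """Return True if 'ancestor' is reachable from 'commit' via parent links."""
    stack, seen = [commit], set()
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return False


def can_fast_forward(remote_tip, local_tip):
    """The remote can simply move forward if its tip is in my history."""
    return is_ancestor(remote_tip, local_tip)


print(can_fast_forward("2487a6a", "0680720"))  # False: the rejected push
print(can_fast_forward("2487a6a", "6336ae6"))  # True: the push after merging
```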


Part 5.  I want to develop a new feature, but I don't want to disrupt the working code while I am developing it.

When a git repository is created, it is created by default with a single branch called master.   This can be confirmed by using the git branch command on a new repository.  This command lists all the branches in the repository and stars the one that is currently checked out.

robots $ git branch
* master

With any source control repository, it is desirable that the code in the master branch is a stable, working version.  Conventionally, incremental changes for new, almost-working features should not be committed to the master branch.

For my robots program, I am planning a change.  Instead of just printing out "I really love robots", I want the program to draw a little bit of ASCII art.  I am going to make a new branch, develop my feature on that branch, and then merge my branch with the master branch.  

This all might seem like a lot of work just to add some ASCII art, but this is typical industrial practice. When multiple developers are working on many different features, it is typical for each developer to have her own branch or maybe even one branch for each feature she is developing.

First, I create a new branch, called ascii.

robots $ git branch ascii
robots $ git branch
  ascii
* master

To work from my new branch, I do the following:

robots $ git checkout ascii
Switched to branch 'ascii'

Now, any changes I commit will happen on my ascii branch and cannot be seen from the master branch.  I can use the git commit commands just as I did before; they will all happen on ascii.
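Here is a toy Python model of that behavior.  It is only a sketch: real Git stores a branch as a pointer to a single commit rather than as a list, and the class TinyRepo and its methods are invented for this illustration.

```python
class TinyRepo:
    """A toy model in which each branch name maps to its own commit history."""

    def __init__(self):
        self.branches = {"master": []}  # branch name -> list of commit messages
        self.current = "master"

    def branch(self, name):
        # A new branch starts with the same history as the current branch.
        self.branches[name] = list(self.branches[self.current])

    def checkout(self, name):
        self.current = name

    def commit(self, message):
        # Only the checked-out branch gains the new commit.
        self.branches[self.current].append(message)


repo = TinyRepo()
repo.commit("Merged my recent changes with the remote.")
repo.branch("ascii")
repo.checkout("ascii")
repo.commit("Checkpoint for robPrintAscii().")

print(repo.branches["master"])
print(repo.branches["ascii"])
```

After these steps, master still holds one commit while ascii holds two, which is the isolation I am after.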

Part 6.  I made a new local branch and I want to push it to the remote.

I do some work on my local ascii branch and I commit it locally:

robots $ git commit -a -m "Checkpoint for work to add robPrintAscii() function to robot.c."
[ascii e56cb4f] Checkpoint for work to add robPrintAscii() function to robot.c.
 2 files changed, 41 insertions(+)

Because I have a healthy level of paranoia, and because I'm worried my computer will transmute into a kitten, I want to push my changes to the remote repository.  That way, I have a backup.

$ git push
fatal: The current branch ascii has no upstream branch.
To push the current branch and set the remote as upstream, use

    git push --set-upstream origin ascii

Oh.  Right.  The remote repository doesn't have an ascii branch.  That's something I made locally.  I need to create the branch on the remote and tell my local branch to track it.  It's very similar to the first time I pushed to the remote repository; I need to indicate that the ascii branch on the remote labeled origin in my configuration is the upstream branch for my local ascii.  So, the first time I push a new branch, I need to do this (-u is the short form of --set-upstream):

robots $ git push -u origin ascii
Counting objects: 7, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (4/4), done.
Writing objects: 100% (4/4), 1.02 KiB, done.
Total 4 (delta 2), reused 0 (delta 0)
remote: Scanning pack: 100% (4/4), done.        
remote: Storing objects: 100% (4/4), done.        
remote: Processing commits: 100% (1/1), done.        
To https://code.google.com/p/i-love-robots
 * [new branch]      ascii -> ascii
Branch ascii set up to track remote branch ascii from origin.

Now my changes made on my local branch are also seen on the same branch in the remote repository. 

I make another commit.

robots $ git commit -a -m "Completed ascii art feature.  Program now prints\
 a most excellent ascii robot before printing parameterized info message."
[ascii 18f69d6] Completed ascii art feature.  Program now prints a most exce\
llent ascii robot before printing parameterized info message.
 2 files changed, 53 insertions(+), 78 deletions(-)
 rewrite robot.c (62%)

I make another push.

robots $ git push
Counting objects: 7, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (4/4), done.
Writing objects: 100% (4/4), 483 bytes, done.
Total 4 (delta 3), reused 0 (delta 0)
remote: Scanning pack: 100% (4/4), done.        
remote: Storing objects: 100% (4/4), done.        
remote: Processing commits: 100% (1/1), done.        
To https://code.google.com/p/i-love-robots
   e56cb4f..18f69d6  ascii -> ascii


Part 7.  I am ready to merge all my work on my branch onto the master branch.

I have completed my ascii art feature and I am ready to commit my fully-working feature to the master branch.  First, I confirm that I am on the right branch and that everything on it has been committed:

robots $ git branch
* ascii
  master
robots $ git status
# On branch ascii
# Untracked files:
#   (use "git add <file>..." to include in what will be committed)
#
#       post.txt
nothing added to commit but untracked files present (use "git add" to track)

I see that I am on my ascii branch.  I see that I've committed everything except for the post.txt file that I do not want to track.  Coolio.

I check out the master branch and make sure I have all the changes on the master branch.

robots $ git checkout master
Switched to branch 'master'
robots $ git pull
Already up-to-date.

I merge my local ascii branch with my local master branch:

robots $ git merge ascii
Updating 6336ae6..18f69d6
Fast-forward
 main.c  |  2 ++
 robot.c | 13 +++++++++++++
 robot.h |  1 +
 3 files changed, 16 insertions(+)

Note that merge conflicts could happen here.  But they don't.  Lucky me.

Another call to git status shows I'm not done yet:

robots $ git status
# On branch master
# Your branch is ahead of 'origin/master' by 2 commits.
#   (use "git push" to publish your local commits)

I need to push my merged master.

robots $ git push
Total 0 (delta 0), reused 0 (delta 0)
To https://code.google.com/p/i-love-robots
   6336ae6..18f69d6  master -> master

Done.

Part 9. I started making some changes, but they are all lame and I need to get rid of my changes and revert back to working code.

I started making some changes to the code in my repository, but B. has a great idea for adding a music feature, so I don't want them. I can discard the changes I made to individual files.

robots $ git checkout -- robot.c

Or I can discard all the uncommitted changes I've made to tracked files in the current directory.

robots $ git checkout -- .

Part 10. I want to work with an older version of the code.

Here is what the Google Code repository looks like right now:


Notice the jumble of letters and numbers under "Rev"?  That is the start of the SHA-1 hash of the commit, and it's the value that Git uses to refer to specific commits.  So, I can use the value to check out the commit "7e6aa314cddc" from the history of the master branch.  As usual, Git gives me some guidance:

git checkout 7e6aa314cddc
Note: checking out '7e6aa314cddc'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

  git checkout -b new_branch_name

HEAD is now at 7e6aa31... Added additional text to the output of main.c.  Updated the comments in the header to reflect this change.  Program now states 'I really love robots.'

Just like Git says, I can just look around, or I can create a new branch to retain my commits.  I just look at the files, do a make, run the old program.  And now I want to get out of this detached HEAD state:

$ git checkout master
Previous HEAD position was 7e6aa31... Added additional text to the output of main.c.  Updated the comments in the header to reflect this change.  Program now states 'I really love robots.'
Switched to branch 'master'
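For the curious, here is a small Python sketch of where such hashes come from.  It computes the ID Git assigns to a file's contents (a "blob"); commit IDs are produced the same way, but over the text of a commit object, which I do not reproduce here.  The function name git_blob_id is my own.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Hash file content the way Git does for a blob object."""
    # Git prepends a short header: the object type, the size, and a NUL byte.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


full_id = git_blob_id(b"I really love robots.\n")
print(full_id)       # 40 hexadecimal characters
print(full_id[:7])   # the short form Git prints in its messages
```

The short form works because a few leading characters are usually enough to pick out one object in a small repository.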

* * *

Holy cow.  If you read this whole thing, you deserve a pudding pop.

For even more details on how to use Git, here are some nice resources:

http://git-scm.com/book/en/Git-Tools-Stashing
http://git-scm.com/book/en/Git-Branching-Remote-Branches




Friday, August 10, 2012

Google Fusion Tables and Parsing KML files with Python ElementTree

Alternate title: What I taught myself this week.

With every census, the US Census Bureau releases ZCTA data, or ZIP Code Tabulation Areas.  These are approximations of the boundaries of the US Postal Service zip codes.  The US Census Bureau offers this data as a collection of shapefiles.

Shapefiles are neat, but I am not a GIS user.  Instead, I want to visualize zip codes in Google Maps.  Fortunately, Google Maps supports KML, or Keyhole Markup Language, and there are some open source tools that make it easy to convert between .shp and .kml.  I have a Mac, so I installed GDAL using macports.  It has a nice utility called ogr2ogr to convert between file formats.

$ sudo port install gdal
$ ogr2ogr -f KML tl_2010_41_zcta510.kml tl_2010_41_zcta510.shp

From there, one can just use Google Earth to open the .kml file.



It's also possible to visualize these zip code areas in Google Maps via the JavaScript API.  The API supports KML layers.  To add a KML layer to a map requires a URL; that is, the KML data must be hosted on a public server somewhere.  Hm.

Enter Google Fusion Tables.  From Google Docs, one can create a "Table (beta)".  From there, import the desired .kml file, making sure to mark it as a public table using the "Share" menu.  Get the encrypted ID from under the "About" menu, and voila!  Alright, that wasn't a lot of detail, but there's some nice documentation available on this experimental feature.

Cool, but the document schema used by the US Census Bureau is a bit messy and uninteresting.  So I wrote a little Python script that uses the ElementTree package to omit the parts I didn't like.  This means that when one clicks on part of the KML layer in my Google Map, it just displays my simplified SchemaData, i.e., State and ZipCodeArea:
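A minimal sketch of that kind of filtering is below.  It is not the original script: the field names in KEEP, the sample values, and the function name simplify are assumptions for illustration, and this version only drops unwanted fields without renaming the ones it keeps.  Check the Schema element of the real .kml file for the actual field names.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"
ET.register_namespace("", KML_NS)  # keep the output free of ns0: prefixes

# Assumed field names for the state and zip code area columns.
KEEP = {"STATEFP10", "ZCTA5CE10"}


def simplify(kml_text, keep=KEEP):
    """Drop every SimpleData field whose name is not in 'keep'."""
    root = ET.fromstring(kml_text)
    for schema_data in root.iter("{%s}SchemaData" % KML_NS):
        # Copy the child list first; removing while iterating skips items.
        for field in list(schema_data):
            if field.get("name") not in keep:
                schema_data.remove(field)
    return ET.tostring(root, encoding="unicode")


# A made-up, heavily trimmed placemark standing in for the real file.
sample = """<kml xmlns="http://www.opengis.net/kml/2.2">
<Document><Placemark><ExtendedData>
<SchemaData schemaUrl="#tl_2010_41_zcta510">
<SimpleData name="STATEFP10">41</SimpleData>
<SimpleData name="ZCTA5CE10">97203</SimpleData>
<SimpleData name="ALAND10">123456</SimpleData>
</SchemaData>
</ExtendedData></Placemark></Document></kml>"""

print(simplify(sample))
```

To process a real file, one would read it with ET.parse() and write the tree back out instead of working on a string.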




Tuesday, March 20, 2012

Cool USB drivers



Running an XBOX controller driver in userspace in Linux:

http://packages.debian.org/sid/xboxdrv

Wednesday, February 29, 2012

Joe writes:

I thought this would be interesting in the context of your security class:

http://agtb.wordpress.com/2012/02/17/john-nashs-letter-to-the-nsa/

Sunday, February 26, 2012

Virtual Box + BSD Sockets

I've been trying out Virtual Box as a solution to maintaining a number of computers for the Roomba room as well as for Computer Security.  Today, I tried out writing a socket program for the Virtual Debian appliance hosted on an Apple iMac.  It was far easier than I'd thought.  Here's the key point from the Virtual Box documentation:

"To enable bridged networking, all you need to do is to open the Settings dialog of a virtual machine, go to the "Network" page and select "Bridged network" in the drop down list for the "Attached to" field. Finally, select desired host interface from the list at the bottom of the page, which contains the physical network interfaces of your systems. On a typical MacBook, for example, this will allow you to select between "en1: AirPort" (which is the wireless interface) and "en0: Ethernet", which represents the interface with a network cable."