Uncategorized | 25 Apr 2010 01:28 pm

On your own notebook computer, where your Python installation has a single central site-packages directory, follow the instructions at http://pypi.python.org/pypi/setuptools

One way is to download http://peak.telecommunity.com/dist/ez_setup.py

and run it (i.e., python ez_setup.py)

Then you’ll be able to do

easy_install simplejson


On people.ischool.berkeley.edu:

Add to ~/.cshrc:

setenv PYTHONPATH ~/lib/python2.5/site-packages

Then you can run

source ~/.cshrc
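
To confirm that the PYTHONPATH setting took effect, a small check along these lines may help. It is only a sketch: the function name missing_pythonpath_entries is made up for this illustration, and it simply compares the PYTHONPATH entries against what the interpreter reports in sys.path.

```python
import os
import sys


def missing_pythonpath_entries(environ=None, path=None):
    """Return PYTHONPATH entries that do not appear on the given search path.

    Hypothetical helper for illustration; defaults to the real environment
    and the running interpreter's sys.path.
    """
    environ = os.environ if environ is None else environ
    path = sys.path if path is None else path
    raw = environ.get("PYTHONPATH", "")
    # Split on the platform separator and expand any leading "~".
    entries = [os.path.expanduser(p) for p in raw.split(os.pathsep) if p]
    normalized = set(os.path.normpath(p) for p in path if p)
    return [p for p in entries if os.path.normpath(p) not in normalized]


# An empty list suggests every PYTHONPATH entry is on the search path.
print(missing_pythonpath_entries())
```

Run it with the same interpreter you plan to use (for example ~/bin/python once the virtual Python is set up).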

Set up your virtual Python:

wget http://svn.python.org/projects/sandbox/trunk/setuptools/virtual-python.py
python virtual-python.py

This sets up a custom Python environment that you can run from ~/bin. Next, download ez_setup.py and run it with that interpreter:

wget http://peak.telecommunity.com/dist/ez_setup.py

~/bin/python ez_setup.py --prefix=~

Now you’re ready to install simplejson

~/bin/easy_install --prefix=~ simplejson

Demo of http://people.ischool.berkeley.edu/~rdhyee/cgi-bin/testsimplejson.py


import simplejson as json

# A CGI response needs a blank line between the headers and the body.
print "Content-Type: application/json"
print

m = {'states': {'CA': 'California', 'NY': 'New York', 'MM': None},
     'provinces': {'ON': 'Ontario', 'BC': 'British Columbia'}}
m_json = json.dumps(m)
print m_json
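
As a quick way to see what the script emits, here is a small round-trip sketch using the same dictionary. It assumes nothing beyond dumps and loads; the fallback to the standard-library json module (available from Python 2.6 on, with the same dumps/loads interface) is an addition for machines where simplejson is not installed.

```python
try:
    import simplejson as json  # the package installed above
except ImportError:
    import json  # standard-library fallback with the same dumps/loads calls

m = {
    "states": {"CA": "California", "NY": "New York", "MM": None},
    "provinces": {"ON": "Ontario", "BC": "British Columbia"},
}

# sort_keys makes the output order predictable for inspection.
encoded = json.dumps(m, sort_keys=True)
decoded = json.loads(encoded)

print(encoded)
print(decoded == m)
```

Note that Python's None is written as JSON null and comes back as None when the text is parsed again.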

Optional: to install virtualenv

~/bin/easy_install --prefix=~ virtualenv

Uncategorized | 21 Apr 2010 11:39 am

A reminder of the schedule:

Day 26 Wed 2010-04-28 Project Presentations I
Day 27 Mon 2010-05-03 Project Presentations II
Day 28 Wed 2010-05-05 Open House

Pick which day you want to present on and let me know.  Presentations are up to 15 minutes.

Written reports are due on Wednesday, May 5 in class.



I rewrote most of the code in Ch 10 to run using Python on the I School server.  See


The proxy code is at http://people.ischool.berkeley.edu/~rdhyee/s10/day24/flickrgeo.py; the proxy itself runs at http://people.ischool.berkeley.edu/~rdhyee/cgi-bin/s10/day24/flickrgeo.py
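
The linked script is not reproduced here. Purely as an illustration of the kind of request a server-side proxy for geotagged Flickr photos might assemble, the sketch below builds a flickr.photos.search URL. The function name build_search_url, the choice of parameters, and the placeholder API key are assumptions for this example, not a description of flickrgeo.py.

```python
from urllib.parse import urlencode

FLICKR_REST = "https://api.flickr.com/services/rest/"


def build_search_url(api_key, bbox, per_page=10):
    """Build a flickr.photos.search URL for a bounding box.

    bbox is (min_lon, min_lat, max_lon, max_lat). This is an illustrative
    helper; it makes no network request.
    """
    params = {
        "method": "flickr.photos.search",
        "api_key": api_key,
        "bbox": ",".join(str(v) for v in bbox),
        "extras": "geo",        # ask for latitude/longitude on each photo
        "per_page": per_page,
        "format": "json",
        "nojsoncallback": 1,    # plain JSON rather than a JSONP wrapper
    }
    return FLICKR_REST + "?" + urlencode(params)


# "YOUR_API_KEY" is a placeholder, not a working key.
url = build_search_url("YOUR_API_KEY", (-122.3, 37.8, -122.2, 37.9))
print(url)
```

A proxy would fetch this URL on the server and relay the response to the browser, which keeps the API key out of client-side code and avoids cross-domain restrictions.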

Spring 2010 | 29 Mar 2010 11:04 am

Day 17 notes are available.

Spring 2010 | 25 Mar 2010 08:18 am

See Day 16 notes

Spring 2010 | 15 Mar 2010 11:31 am

Day 15 Notes are available.

Spring 2010 | 10 Mar 2010 12:28 pm

See slides.

Spring 2010 | 08 Mar 2010 12:23 pm

Day 13 notes

Spring 2010 | 03 Mar 2010 08:21 am

Day 12 notes are available.

Spring 2010 | 02 Mar 2010 10:32 am

I published notes for Day 11 yesterday.

Project Proposals and Spring 2010 | 25 Feb 2010 11:33 am

What problem is your project aimed at solving? Alternatively, why would someone want to use what you are making in your project?

The first immediate goal of the American Recovery and Reinvestment Act of 2009 is to “create new jobs as well as save existing ones.”

While Emily’s initial project idea was to compare jobs created and saved against unemployment, initial research showed that the exact number of jobs created per grant or contract is difficult to determine with the regularity and reliability needed for a machine-readable, automated data mashup.  A good example is the problem of bulk grants.  The Regents of the University of California have been earmarked to receive a grant of $716 million, which will save some 38,923.98 jobs, but the distribution of those jobs within California is not clearly delineated.  Furthermore, the job count combines full-time employees, part-time employees, sub-contractors, and vendor hours, again without any breakdown by county.

Because of these problems (the lack of the consistency and specificity needed to achieve the initial project goal), we will focus on a slightly different question within the manageable scope of Alameda County, California:  How has the ARRA affected (or not affected) unemployment in Alameda County?  If the ARRA has been “effective,” we would hope to see the level of unemployment decreasing or remaining steady over time as funds are disbursed.  Of course, it is difficult if not impossible to draw causal conclusions from this comparison, as opposed to examining job-for-job data (and taking into account other indices of the recovering economy), which, as already mentioned, is an intractable problem until more granular data can be reliably extracted.

Is your project doable given the constraints of time, our starting knowledge, etc?

Yes, we think so.  We’ve narrowed the geographic scope from the entire U.S. to Alameda County and will need only two sources of data:

  1. Month-by-month unemployment data in Alameda County from January 2009 to the present
  2. The start date of all ARRA grants/contracts disbursed in Alameda County to date

The visualization portion will be tricky, but we are confident that we have the resourcefulness necessary to complete this part of the project.

What interface are you imagining? Is it a web, desktop, or mobile application? What platform are you running?

We want to build a web visualization which can be viewed with the Firefox web browser.

What data or services are you planning to bring together?

  1. Unemployment data from the Bureau of Labor Statistics (http://www.bls.gov/lau/#tables)
  2. ARRA Project award start dates and amounts from http://www.recovery.ca.gov/HTML/RecoveryImpact/map.shtml?county=Alameda
  3. ClearMaps or Tableau or Google Visualization API

What’s your plan for getting the data? Often the data you might want might not be easily available.

We will download the latest available unemployment and ARRA files and filter them for Alameda County.
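
The filtering step could look roughly like the sketch below, assuming the downloaded files are CSV with a county column. The column names, the helper name filter_by_county, and the sample rows are all hypothetical; the real BLS and ARRA files will have their own layouts.

```python
import csv
import io


def filter_by_county(rows, county="Alameda", field="county"):
    """Keep only the rows whose county field matches, ignoring case and spaces."""
    target = county.strip().lower()
    return [r for r in rows if (r.get(field) or "").strip().lower() == target]


# Placeholder rows standing in for a downloaded file; not real data.
sample = io.StringIO(
    "county,month,value\n"
    "Alameda,2009-01,placeholder-a\n"
    "Contra Costa,2009-01,placeholder-b\n"
    " alameda ,2009-02,placeholder-c\n"
)

alameda_rows = filter_by_county(csv.DictReader(sample))
for row in alameda_rows:
    print(row["month"], row["value"])
```

Normalizing case and whitespace before comparing guards against small inconsistencies in how the county name was entered.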

Do the APIs you plan to use actually support the functionality that you need in your application?

No, we are not using any APIs, except for maybe the Google Visualization API, but that is not a data source.

What programming language do you plan to use?

Either Python or PHP, depending upon which one is easier to use with the data.

Action Plan:

  1. Download and examine unemployment and ARRA data files.
  2. Filter the files by Alameda County.
  3. Find zip codes for Alameda County and extract relevant ARRA data from master file.
  4. Investigate best mapping/visualization tool, hopefully one which will allow us to make monthly step-wise comparisons in unemployment, new project contract start dates, and the percentage of ARRA money spent out of the total currently slated for Alameda County.
  5. Integrate the data sets with the visualization utility.
  6. Celebrate.
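
For the monthly step-wise comparison in step 4, one possible way to prepare the ARRA side of the data is to group awards by start month and track the cumulative share of the total awarded. This is a sketch under assumptions: the function name and the (start_date, amount) input shape are invented here, the amounts are made-up units, and "share of awards started" is only a stand-in for money actually spent.

```python
from collections import defaultdict


def cumulative_share_by_month(awards):
    """awards: iterable of (start_date as 'YYYY-MM-DD', amount).

    Returns a list of (month, cumulative share of the total amount),
    ordered by month.
    """
    monthly = defaultdict(float)
    for start_date, amount in awards:
        monthly[start_date[:7]] += amount  # 'YYYY-MM' prefix is the month key

    total = sum(monthly.values())
    running = 0.0
    result = []
    for month in sorted(monthly):
        running += monthly[month]
        result.append((month, running / total if total else 0.0))
    return result


# Made-up amounts for illustration only.
awards = [("2009-03-15", 100.0), ("2009-03-20", 100.0), ("2009-05-01", 200.0)]
shares = cumulative_share_by_month(awards)
print(shares)
```

The resulting series can be lined up month by month against the unemployment figures before handing both to the visualization tool.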

Risk Areas/Mitigation Plan:

Right now our biggest risk is scrubbing the ARRA data.  We expect that some awards will have a blank zip code field, and that some zip codes will span more than just Alameda County.

Our best mitigation plan might actually be to expand the scope of the project to include all of California, since the award’s state is a more reliable data point.
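
To get a feel for how large the zip code risk is before changing scope, each award could be sorted into a few buckets. The sketch below assumes two lookup sets that we would have to compile ourselves: zip codes wholly inside the county and zip codes shared with a neighboring county. The function name and the sample values are placeholders, not a verified list.

```python
def classify_award(zip_code, county_zips, shared_zips):
    """Classify an award by its zip code field.

    Returns "missing", "ambiguous", "in_county", or "outside".
    """
    # Use only the five-digit part so ZIP+4 values still match.
    z = (zip_code or "").strip()[:5]
    if not z:
        return "missing"
    if z in shared_zips:
        return "ambiguous"
    if z in county_zips:
        return "in_county"
    return "outside"


# Placeholder sets for illustration; a real list needs an authoritative source.
county_zips = {"94704", "94612"}
shared_zips = {"00000"}

for value in ["94704-1234", "", None, "00000", "10001"]:
    print(repr(value), classify_award(value, county_zips, shared_zips))
```

Counting how many awards land in the "missing" and "ambiguous" buckets would show whether the county-level scope is workable or whether the statewide fallback is needed.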
