Day 15 Notes are available.
Emily and Ayush’s Project Proposal
What problem is your project aimed at solving? Alternatively, why would someone want to use what you are making in your project?
The first immediate goal of the American Recovery and Reinvestment Act of 2009 is to “create new jobs as well as save existing ones.”
While Emily’s initial project idea was to compare jobs created and saved against unemployment, initial research showed that the exact number of jobs created per grant or contract is difficult to determine with the regularity and reliability needed for a machine-readable, automated data mashup. Bulk grants are a good example of the problem. The Regents of the University of California have been earmarked to receive a grant of $716 million, which will save some 38,923.98 jobs. However, the distribution of those jobs within California is not clearly delineated. Furthermore, the job count combines full-time employees, part-time employees, subcontractors, and vendor hours, again with no breakdown by county.
Because of these problems (the lack of consistency and specificity needed to reach the initial goal), we will focus on a slightly different question within the manageable scope of Alameda County, California: How has the ARRA affected (or not affected) unemployment in Alameda County? If the ARRA has been “effective,” we would hope to see the level of unemployment decrease or remain steady over time as funds are disbursed. Of course, it is difficult if not impossible to draw causal conclusions from this comparison. That would require examining job-by-job data (and accounting for other indicators of the recovering economy), which, as mentioned above, remains intractable until more granular data can be reliably extracted.
Is your project doable given the constraints of time, our starting knowledge, etc?
Yes, we think so. We’ve narrowed the geographic scope from the entire U.S. to Alameda County and will only need two sources of data: unemployment figures and ARRA award data.
The visualization portion will be tricky, but I am confident that we have the resourcefulness necessary to complete this part of the project.
What interface are you imagining? Is it a web, desktop, or mobile application? What platform are you running?
We want to build a web visualization that can be viewed in the Firefox web browser.
What data or services are planning to bring together?
What’s your plan for getting the data? Often the data you might want might not be easily available.
We will download the latest available unemployment and ARRA files and filter them for Alameda County.
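Here is a minimal sketch of the filtering step. It assumes the downloaded files are CSV with a county column. `filter_county` and the default column name `county` are hypothetical and should be matched to the real header row. The sample rows are made up.

```python
import csv
import io

def filter_county(csv_text, county_column="county", county_name="Alameda"):
    """Return the rows whose county column names the given county.

    Matching ignores case and an optional trailing " County", so both
    "Alameda" and "Alameda County" are kept. The default column name is a guess.
    """
    wanted = county_name.strip().lower()
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row for row in reader
        if (row.get(county_column) or "").strip().lower().removesuffix(" county") == wanted
    ]

# Made-up rows, for illustration only.
SAMPLE = """county,month,rate
Alameda County,2009-06,11.0
Contra Costa County,2009-06,11.3
alameda,2009-07,11.2
"""
for row in filter_county(SAMPLE):
    print(row["month"], row["rate"])
```

For a real download, the text would come from `open(path, newline="").read()` (or the file object could be passed straight to `csv.DictReader`).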
Do the APIs you plan to use actually support the functionality that you need in your application?
No, we are not using any APIs, except for maybe the Google Visualization API, but that is not a data source.
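If the Google Visualization API is used for the charts, the server side only has to emit data in the JSON object format that `google.visualization.DataTable` accepts: a `cols` list, plus a `rows` list of `c` cells, each with a `v` value. The sketch below uses illustrative column ids and made-up values. On the page, the string would be passed to `new google.visualization.DataTable(...)` and drawn with a chart such as a line chart. Months are kept as strings for simplicity, although DataTable also supports date columns.

```python
import json

def to_datatable_json(points):
    """Convert (month_label, rate) pairs into DataTable-style JSON."""
    table = {
        "cols": [
            {"id": "month", "label": "Month", "type": "string"},
            {"id": "rate", "label": "Unemployment rate (%)", "type": "number"},
        ],
        # One row per month; each cell is an object with a "v" (value) key.
        "rows": [{"c": [{"v": label}, {"v": value}]} for label, value in points],
    }
    return json.dumps(table)

print(to_datatable_json([("2009-06", 11.0), ("2009-07", 11.2)]))  # made-up values
```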
What programming language do you plan to use?
Either Python or PHP, depending upon which one is easier to use with the data.
Action Plan:
Risk Areas/Mitigation Plan:
Right now our biggest risk is scrubbing the ARRA data. We expect that some awards will not list a zip code at all (the field may be blank), and that some zip codes will span more than just Alameda County.
Our best mitigation plan might actually be to expand the scope of the project to include all of California, since the award’s state is a more reliable data point.
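The following sketches the scrubbing step under stated assumptions. The award records are dictionaries with `zip` and `state` fields (hypothetical names). The Alameda County ZIP sets are supplied from elsewhere, for example a ZIP-to-county crosswalk file. Sorting awards into "missing", "ambiguous", and "outside" shows how much data would be lost before deciding whether to fall back to the statewide filter.

```python
from collections import Counter

def classify_award(award, county_zips, shared_zips):
    """Sort one ARRA award by how confidently it can be placed in the county.

    county_zips: ZIP codes lying entirely inside the county.
    shared_zips: ZIP codes that straddle the county line.
    Returns 'missing', 'in_county', 'ambiguous', or 'outside'.
    """
    zip_code = (award.get("zip") or "").strip()[:5]  # keep the 5-digit part of ZIP+4
    if not zip_code:
        return "missing"
    if zip_code in county_zips:
        return "in_county"
    if zip_code in shared_zips:
        return "ambiguous"
    return "outside"

def california_awards(awards):
    """Fallback from the mitigation plan: filter on the (assumed) 'state' field."""
    return [a for a in awards
            if (a.get("state") or "").strip().upper() in {"CA", "CALIFORNIA"}]

# Illustrative records and ZIP sets only; a real run would load the county's
# ZIP list from a ZIP-to-county crosswalk.
awards = [
    {"zip": "94612-1234", "state": "CA"},
    {"zip": "", "state": "CA"},
    {"zip": "10001", "state": "NY"},
]
print(Counter(classify_award(a, {"94612"}, set()) for a in awards))
print(len(california_awards(awards)))
```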

The aim of this project is to make the ARRA data more meaningful to people
who might find the data reported at recovery.gov difficult to make sense
of, or difficult to relate to their own lives.
A news feed that gives users some context for what recovery act recipients are accomplishing locally might help make the data more accessible. However, I am still having trouble figuring out how to build a news feed that provides meaningful, useful context without burying it in noise. Some possible sources are feeds from Google News, Yahoo Search, and Google Blog Search, as well as the NYT APIs. The goal would be to combine news feeds focused on recovery act recipients with the recipient-reported data at recovery.gov, so that a user looking at particular recipients gets links to related stories.
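One way to attach stories to recipients is sketched below. It assumes each source can be fetched as plain RSS 2.0 XML; the real Google, Yahoo, and NYT formats and terms of use would need checking. `stories_for_recipients` and the sample feed are hypothetical. The substring match is deliberately simple, so the noise problem described above would show up here as false matches.

```python
import xml.etree.ElementTree as ET

def stories_for_recipients(recipients, rss_xml):
    """Map each recipient name to (title, link) pairs of RSS items that mention it.

    Matching is a plain case-insensitive substring test on title and description,
    which misses abbreviations and can pick up unrelated stories.
    """
    root = ET.fromstring(rss_xml)
    items = [
        (item.findtext("title", default=""),
         item.findtext("link", default=""),
         item.findtext("description", default=""))
        for item in root.iter("item")
    ]
    result = {}
    for name in recipients:
        needle = name.lower()
        result[name] = [(title, link) for title, link, desc in items
                        if needle in title.lower() or needle in desc.lower()]
    return result

# Made-up feed and recipient names, for illustration only.
SAMPLE_RSS = """<rss version="2.0"><channel>
<item><title>Example Transit District wins recovery act grant</title>
<link>http://example.com/story1</link><description>Bus upgrades.</description></item>
<item><title>Weekend weather</title>
<link>http://example.com/story2</link><description>Sunny.</description></item>
</channel></rss>"""

links = stories_for_recipients(["Example Transit District", "Sample School District"], SAMPLE_RSS)
print(links)
```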
I am imagining a web interface, maybe starting with a searchable grid
layout similar to the “advanced recipient reported data search” at recovery.gov
http://www.recovery.gov/pages/TextViewProjSummary.aspx?data=recipientAwardsList
but with the retrieved data focused on news feed information, together
with related ARRA funding data. If possible, I would also like to create a
searchable map interface.
The ARRA data can be downloaded from recovery.gov, or possibly accessed through the RPI SPARQL project. New York Times data can be accessed through several APIs — the Search and Tags APIs are promising for stories that make the national news, but other sources will be necessary for local coverage outside of the New York metropolitan area.
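On the NYT side, a request is just a URL with query parameters. The sketch below builds one for version 2 of the Article Search API, which documents `q`, `begin_date`, `end_date`, and `api-key`. The search API available when this proposal was written used a different URL, so treat the endpoint as an example. The code does not send the request, and the key is a placeholder.

```python
from urllib.parse import urlencode

# Version 2 Article Search endpoint (differs from the older API of 2009-2010).
ARTICLE_SEARCH = "https://api.nytimes.com/svc/search/v2/articlesearch.json"

def article_search_url(query, api_key, begin_date=None, end_date=None):
    """Build (but do not send) a search request URL; dates are YYYYMMDD strings."""
    params = {"q": query, "api-key": api_key}
    if begin_date:
        params["begin_date"] = begin_date
    if end_date:
        params["end_date"] = end_date
    return ARTICLE_SEARCH + "?" + urlencode(params)

# Stories since the ARRA was signed (17 February 2009); "YOUR_KEY" is a placeholder.
print(article_search_url("recovery act Oakland", "YOUR_KEY", begin_date="20090217"))
```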
I am not sure which languages I will need to use for this project; in
addition to (possibly) Python, I guess PHP?
I am also not sure how doable this project will be given time constraints and
my starting knowledge. I have some coding experience but not in this
context, and very little experience with web architecture or web
interfaces.
Currently I’m exploring different news sources to see what combinations of information can provide useful context for a user looking at recipients in a particular zip code. In addition to the other difficulties already mentioned,
identifying news sources/searches that would be meaningful to users, without requiring a lot of winnowing on the user’s part, feels like the most important barrier right now.
Hi all, I am Julian and this is my project proposal.
What problem is your project aimed at solving? Alternatively, why would someone want to use what you are making in your project?
I am thinking about extracting real-time air quality data from the California Air Resources Board and creating an air quality map overlaid with real-time traffic data on Google Maps. Combined with meteorological data such as wind speed, temperature, and precipitation, this would let one examine the various ways traffic affects air quality.
What interface are you imagining? Is it a web, desktop, or mobile application? What platform are you running? Often a rough sketch of the interface can help clarify a lot of issues
The interface would be Google Maps. Since a lot of computation is involved, I expect the platform to be limited to desktop web browsers rather than mobile devices.
What data or services are planning to bring together? Be specific.
I am planning to bring together the data from California Air Resources Board and the Mobile Millennium Project. The Mobile Millennium project “will design, test and implement a state-of-the-art system to collect traffic data from GPS-equipped mobile phones and estimate traffic conditions in real-time. It is a partnership between government, academia, and industry.” [http://traffic.berkeley.edu/theproject.html]
What’s your plan for getting the data. Often the data you might want might not be easily available.
I am already in contact with the professor in my department who is responsible for the project, and we are sorting out access to the data.
Do the APIs you plan to use actually support the functionality that you need in your application? Show how it does so.
Google Maps can plot the traffic data as color-coded paths that show the level of congestion on each road. Air quality data can also be shown on the map by placing a pin at each sampling station and color-coding the pin.
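The color coding could be a small lookup table. For air quality, the sketch uses the U.S. EPA Air Quality Index categories. This assumes CARB readings are first converted to AQI values; raw concentrations would need their own breakpoints. The congestion thresholds (ratio of current speed to free-flow speed) are made up for illustration. The returned color names could serve as the `strokeColor` of a `google.maps.Polyline` or be used to choose a marker icon.

```python
# U.S. EPA Air Quality Index categories: (upper AQI bound, category, color).
AQI_BANDS = [
    (50, "Good", "green"),
    (100, "Moderate", "yellow"),
    (150, "Unhealthy for Sensitive Groups", "orange"),
    (200, "Unhealthy", "red"),
    (300, "Very Unhealthy", "purple"),
    (500, "Hazardous", "maroon"),
]

def aqi_color(aqi):
    """Pin color for an AQI value; values above 500 are also shown as maroon."""
    for upper, _category, color in AQI_BANDS:
        if aqi <= upper:
            return color
    return "maroon"

def congestion_color(speed, free_flow_speed):
    """Path color from the ratio of current to free-flow speed (illustrative cutoffs)."""
    ratio = speed / free_flow_speed
    if ratio >= 0.75:
        return "green"
    if ratio >= 0.5:
        return "yellow"
    return "red"

print(aqi_color(42), aqi_color(120), congestion_color(20, 65))
```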
What programming language do you plan to use?
JavaScript.
Provide an action plan:
Break down the project into steps. You can end up changing the steps later, but I want to make sure you have a clear conception on what the steps are.
1. Obtain traffic data in a workable format.
2. Analyze and devise a scheme to automate extraction of air quality data.
3. Plot the traffic and air quality data on the same map (trial, using a particular set of data).
4. Plot the traffic and air quality data on the same map, using different sets of data.
5. Create a user interface where date and time could be selected for a specific set of data.
6. Animation of changes in traffic and air quality data over time.
7. Integrate the system for future data extraction.
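Steps 5 and 6 both need the readings grouped by time. This minimal sketch assumes readings arrive as `(timestamp, id, value)` tuples, where `id` is a station or road segment (a hypothetical layout). It buckets them into hourly frames, so step 5 can look up one frame and step 6 can play the frames in order. Hourly resolution is an arbitrary choice.

```python
from collections import defaultdict
from datetime import datetime

def hourly_frames(readings):
    """Group (timestamp, id, value) readings into hourly animation frames.

    Returns a list of (hour, {id: mean value}) sorted by hour.
    """
    buckets = defaultdict(lambda: defaultdict(list))
    for ts, key, value in readings:
        hour = ts.replace(minute=0, second=0, microsecond=0)  # truncate to the hour
        buckets[hour][key].append(value)
    return [
        (hour, {key: sum(vals) / len(vals) for key, vals in by_key.items()})
        for hour, by_key in sorted(buckets.items())
    ]

# Made-up readings for one station.
readings = [
    (datetime(2010, 3, 1, 8, 5), "station-1", 10.0),
    (datetime(2010, 3, 1, 8, 40), "station-1", 20.0),
    (datetime(2010, 3, 1, 9, 10), "station-1", 30.0),
]
for hour, frame in hourly_frames(readings):
    print(hour.isoformat(), frame)
```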
Highlight what you are currently working on.
I am currently discussing with the professor how to get the traffic data into a workable format. The data exists; it is just difficult to get it into a format that Google Maps can use.
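One common map-friendly target is GeoJSON. The Google Maps JavaScript API can load it into its Data layer (`map.data.addGeoJson` or `map.data.loadGeoJson`) and style each feature. The sketch assumes each traffic segment can be exported as an id, a speed, and a list of `(lat, lon)` points. The actual Mobile Millennium format is unknown here, so the field names are hypothetical.

```python
import json

def segments_to_geojson(segments):
    """Turn traffic segments into a GeoJSON FeatureCollection string.

    GeoJSON stores coordinates as [longitude, latitude], so each pair is swapped.
    """
    features = []
    for seg in segments:
        features.append({
            "type": "Feature",
            "geometry": {
                "type": "LineString",
                "coordinates": [[lon, lat] for lat, lon in seg["points"]],
            },
            "properties": {"id": seg["id"], "speed": seg["speed"]},
        })
    return json.dumps({"type": "FeatureCollection", "features": features})

# One made-up segment, for illustration only.
print(segments_to_geojson([
    {"id": "seg-1", "speed": 30, "points": [(37.80, -122.27), (37.81, -122.26)]},
]))
```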
Identify areas of “high risk,” areas that you are uncertain about and/or things that might undermine the entire project. Write about how you are planning to deal with these potential problem areas.
One of the high risk areas is that the traffic data might not even be available in a workable format for me to integrate into the map. In that case, I would either have to look for alternate traffic data or use a different approach in showcasing the air quality data, such as comparing the changes in air quality with the changes in Federal and State air quality standards over the years.
*NOW WORKING ON…*
Writing a JavaScript program to extract the CSV data files from the California Air Resources Board and load them into a database.
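The same extract-and-load step is sketched here in Python, with SQLite as the database, for consistency with the other examples; the logic carries over to JavaScript. The column names `site`, `date`, `hour`, and `value` are assumptions to be matched against the real CARB file headers. The sample rows are made up.

```python
import csv
import io
import sqlite3

def load_readings(conn, csv_text):
    """Load air-quality rows from CSV text into a SQLite 'readings' table.

    Column names are assumptions. Rows with a blank value are skipped, and
    re-importing the same file replaces rows instead of duplicating them.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings ("
        " site TEXT, date TEXT, hour INTEGER, value REAL,"
        " PRIMARY KEY (site, date, hour))"
    )
    rows = [
        (r["site"], r["date"], int(r["hour"]), float(r["value"]))
        for r in csv.DictReader(io.StringIO(csv_text))
        if (r.get("value") or "").strip()
    ]
    conn.executemany("INSERT OR REPLACE INTO readings VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)

SAMPLE = "site,date,hour,value\nA,2010-03-01,8,12.5\nA,2010-03-01,9,\nB,2010-03-01,8,7.0\n"
conn = sqlite3.connect(":memory:")
print(load_readings(conn, SAMPLE))
```

The primary key on `(site, date, hour)` is what makes repeated imports of the same file safe.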