17 Oct 2008 02:06 am

Reminders

We have guest speakers coming in on Wednesday (Kirrily Robert of MetaWeb) and next Monday (our very own Brian Carver).

April 1 at 5pm is the deadline for revised project proposals.

New course schedule

I’ve put up a wiki page on bSpace (called Student Led Tutorials) with a draft schedule for students to do tutorials. I’d like every student to do at least one. Think of an activity that would help your fellow students take away something practical from your session. Don’t be shy about having these sessions help you out with your project too.

Feel free to edit this page to capture the right wording for your topic. Feel free to reschedule. Let’s aim for no more than 2 sessions/day – unless they are short. Please include an estimate of how much time you will need. 

Notes for today’s lecture

Although Kirrily Robert of MetaWeb, the company behind Freebase, will be here on Wednesday, I will give an introductory talk and lead you through a little exercise in getting started with Freebase today.

Python code for verifying Mitch McConnell’s assertion that Republican Senators represent 1/2 the population

"""
scrape the XML on senators to help verify Sen. Mitch McConnell's assertion

I'm assuming only one population value / state right now in Freebase -- a dicey assumption.  I'm also assuming the
populations are correct....
"""

try:
    from xml.etree import ElementTree
except ImportError:
    # older Pythons: fall back to the standalone elementtree package
    from elementtree import ElementTree

import json

import urllib

import httplib2
#client = httplib2.Http(".cache")
client = httplib2.Http()

senate_url = "http://www.senate.gov/general/contact_information/senators_cfm.xml"
response, xml = client.request(senate_url)
doc = ElementTree.fromstring(xml)

# build a dictionary for the state
states = {}

for member in doc.findall('.//member'):
    party = member.find('party').text
    state = member.find('state').text
    # pull corresponding state object out
    state_obj = states.setdefault(state,{})
    state_obj[party] = state_obj.setdefault(party,0) + 1

for (k, v) in states.iteritems():
    print k, v

# let's use freebase to pull out the population of states along with abbreviations.

mql_query = """{"q1":{"as_of_time":"%s","query":[{"/location/administrative_division/iso_3166_2_code":null,"/location/statistical_region/population":[{"number":null}],"name":null,"type":"/location/us_state"}]}}""" %("2009-03-30T16:16:36.546Z")

#mql_url = 'http://api.freebase.com/api/service/mqlread?query={"query":[{"/location/administrative_division/iso_3166_2_code":null,"/location/statistical_region/population":{"number":null},"name":null,"sort":"-/location/statistical_region/population.number","type":"/location/us_state"}]}'
#mql_url = 'http://api.freebase.com/api/service/mqlread?query=%7B%22query%22%3A%5B%7B%22%2Flocation%2Fadministrative_division%2Fiso_3166_2_code%22%3Anull%2C%22%2Flocation%2Fstatistical_region%2Fpopulation%22%3A%7B%22number%22%3Anull%7D%2C%22name%22%3Anull%2C%22sort%22%3A%22-%2Flocation%2Fstatistical_region%2Fpopulation.number%22%2C%22type%22%3A%22%2Flocation%2Fus_state%22%7D%5D%7D'
#mql_url = 'http://is.gd/jmkN'  # not ideal...but helps with httplib2 cache trying to write weird names to filenames

#mql_url = 'http://api.freebase.com/api/service/mqlread?%s' % (urllib.urlencode({'query':mql_query}))
# percent-encode the JSON so the braces and quotes are safe inside a URL
mql_url = 'http://api.freebase.com/api/service/mqlread?queries=%s' % (urllib.quote(mql_query))
print mql_query
print mql_url

#mql_url = 'http://bit.ly/iojHx'
client.follow_redirects = True
response, mql_json = client.request(mql_url)

state_pop = json.loads(mql_json)
print state_pop
state_pop_hash = {}

for state in state_pop['q1']['result']:
    population = state['/location/statistical_region/population'][0]['number']
    abbreviation = state['/location/administrative_division/iso_3166_2_code'].split('-')[1]
    state_pop_hash[abbreviation] = population

#print state_pop_hash

# now we're ready to calculate the proportion of the population for each party
# let's do this in two ways

# 1) if party is represented at all, credit the entire population of the state: M1
# 2) each senator gets 1/2 of population: M2

total_population = 0

M1 = {}
M2 = {}
num_senators = 0

for (state,senators) in states.iteritems():
    total_population += state_pop_hash[state]
    for k in senators.keys():
        num_senators += senators[k]
        M1[k] = M1.get(k,0) + state_pop_hash[state]
        M2[k] = M2.get(k,0) + senators[k]*state_pop_hash[state]/2.0

print "number of current senators: ", num_senators
print "total population: ", total_population

print "if party is represented at all, credit the entire population of the state"
print "totals: ", M1
for (party, value) in M1.iteritems():
    print (party, 100.0*value/total_population)

print "each senator gets 1/2 of population"
print "totals", M2
for (party, value) in M2.iteritems():
    print (party, 100.0*value/total_population)
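
The docstring's caveat about one population value per state can be checked rather than assumed. The following sketch, written for Python 3 rather than the Python 2 of the listing, uses a hand-written response in the shape the script reads (the two states and their numbers are placeholders, not Freebase data) and reports states that carry more than one population value. The function name is hypothetical.

```python
POP_KEY = '/location/statistical_region/population'
ISO_KEY = '/location/administrative_division/iso_3166_2_code'

# Hand-written stand-in for the decoded mqlread response; values are placeholders.
sample_response = {'q1': {'result': [
    {'name': 'State A', ISO_KEY: 'US-AA', POP_KEY: [{'number': 1000}]},
    {'name': 'State B', ISO_KEY: 'US-BB',
     POP_KEY: [{'number': 2000}, {'number': 2100}]},
]}}

def populations_by_abbreviation(response):
    """Return ({abbreviation: first population}, [abbreviations with >1 value])."""
    pops, ambiguous = {}, []
    for state in response['q1']['result']:
        abbreviation = state[ISO_KEY].split('-')[1]
        values = state[POP_KEY]
        pops[abbreviation] = values[0]['number']  # same choice as the script
        if len(values) > 1:
            ambiguous.append(abbreviation)
    return pops, ambiguous

pops, ambiguous = populations_by_abbreviation(sample_response)
print(pops)
print("states with several population values:", ambiguous)
```

If the second list is not empty, taking the first value is an arbitrary choice and the totals should be read with that in mind.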

At 3:20 of http://www.npr.org/templates/story/story.php?storyId=99122148, McConnell said (roughly): “And Republicans feel like they have some opportunities to have some input. It’s not just a matter of pride, Senate Republicans represent 1/2 the American population.”

One should be able to verify these facts quickly. I was hoping for a solution that used only Freebase, but ended up using http://www.senate.gov/general/contact_information/senators_cfm.xml to get the current senators and their party affiliations.
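
As a rough illustration of the senate.gov step, the sketch below runs the same ElementTree counting logic on a tiny inline XML sample. The sample document, its root element name, and its three members are made up for illustration; the element names member, party, and state are the ones the script looks up. It is written for Python 3 rather than the Python 2 of the listing.

```python
from xml.etree import ElementTree

# Hypothetical sample shaped like the senate.gov feed: only the <member>,
# <party>, and <state> elements used by the script are included.
SAMPLE_XML = """
<contact_information>
  <member><party>D</party><state>CA</state></member>
  <member><party>D</party><state>CA</state></member>
  <member><party>R</party><state>KY</state></member>
</contact_information>
"""

def count_parties_by_state(xml_text):
    """Return {state: {party: number_of_senators}} from senator XML."""
    doc = ElementTree.fromstring(xml_text)
    states = {}
    for member in doc.findall('.//member'):
        party = member.find('party').text
        state = member.find('state').text
        by_party = states.setdefault(state, {})
        by_party[party] = by_party.get(party, 0) + 1
    return states

counts = count_parties_by_state(SAMPLE_XML)
print(counts)
```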

Output of app

number of current senators:  99
total population:  299571466
if party is represented at all, credit the entire population of the state
totals:  {'I': 623908, 'R': 151020599, 'D': 226411258, 'ID': 3502309}
('I', 0.2082668313944159)
('R', 50.412210821173467)
('D', 75.578379016912109)
('ID', 1.1691063393868093)
each senator gets 1/2 of population
totals {'I': 311954.0, 'R': 112090403.5, 'D': 182819143.5, 'ID': 1751154.5}
('I', 0.10413341569720795)
('R', 37.416915902130675)
('D', 61.026888154961995)
('ID', 0.58455316969340465)

Conclusion:

McConnell is right under methodology 1, but he neglected to mention that by the same methodology Democrats represent 76% of the population.  By methodology 2, Republican senators represent 37%.  One caution:  are the populations in Freebase correct?
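
To make the two methodologies concrete, here is a small worked sketch on invented numbers (three hypothetical states, not real data), written for Python 3. Under methodology 1 a party is credited with a state's whole population if it holds at least one seat there, so the party shares can add up to more than 100%. Under methodology 2 each senator is credited with half of the state's population.

```python
# Invented seat counts and populations, for illustration only.
seats = {
    'AA': {'R': 2},
    'BB': {'D': 2},
    'CC': {'R': 1, 'D': 1},
}
population = {'AA': 100, 'BB': 300, 'CC': 600}

def to_percent(credited, total):
    """Convert credited population per party into a percentage of the total."""
    return {party: 100.0 * value / total for party, value in credited.items()}

def party_shares(seats, population):
    """Return (m1, m2): percentage of population credited to each party."""
    total = sum(population[state] for state in seats)
    m1, m2 = {}, {}
    for state, by_party in seats.items():
        for party, n in by_party.items():
            # M1: any representation credits the whole state.
            m1[party] = m1.get(party, 0) + population[state]
            # M2: each senator is credited with half the state.
            m2[party] = m2.get(party, 0) + n * population[state] / 2.0
    return to_percent(m1, total), to_percent(m2, total)

m1, m2 = party_shares(seats, population)
print("M1:", m1)  # R 70.0, D 90.0 (sums to more than 100)
print("M2:", m2)  # R 40.0, D 60.0
```

The split state CC is counted in full for both parties under M1, which is why the M1 shares overlap.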

I’d like to come back and create a version that uses only Freebase.

Chemical elements

periodicity behavior of elements — boiling point vs atomic number

example with H
http://www.freebase.com/view/en/hydrogen
just the id
http://www.freebase.com/tools/queryeditor?q={"id":"/en/hydrogen"}&read=1
ask for the name
http://www.freebase.com/tools/queryeditor?q={"id":"/en/hydrogen","name":null}&read=1
look at types
http://www.freebase.com/tools/queryeditor?q={"id":"/en/hydrogen","name":null,"type":[]}&read=1
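
The three query editor links above grow one property at a time. A minimal Python 3 sketch of building the same queries as dictionaries and percent-encoding them follows. The helper name query_editor_url is made up here, and the URL pattern is simply copied from the links above; no request is made.

```python
import json
from urllib.parse import quote

QUERY_EDITOR = "http://www.freebase.com/tools/queryeditor"

def query_editor_url(query):
    """Hypothetical helper: serialize an MQL query and percent-encode it."""
    # Compact separators reproduce the spacing-free JSON used in the links.
    q = json.dumps(query, separators=(",", ":"))
    return "%s?q=%s&read=1" % (QUERY_EDITOR, quote(q, safe=""))

just_id = {"id": "/en/hydrogen"}
# null asks MQL to fill in a single value
with_name = {"id": "/en/hydrogen", "name": None}
# an empty list asks MQL to fill in all values
with_types = {"id": "/en/hydrogen", "name": None, "type": []}

for query in (just_id, with_name, with_types):
    print(json.dumps(query, separators=(",", ":")))
    print(query_editor_url(query))
```

Building the query as a dictionary and serializing it avoids hand-typed quotes, which are easy to mangle when pasted into a blog post.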

To query all elements with and without boiling point data in Freebase:

http://blog.dataunbound.com/2009/02/13/get-chemical-elements-with-and-without-boiling-point-data-from-freebase/

David sent me a good solution using Parallax for plotting elements vs. boiling point:
http://tinyurl.com/64day4

Beware of bad actors ….

On Wednesday…the hard work of reconciliation….

Other refs

* parallax:  http://mqlx.com/~david/parallax/index.html

* interview with Jamie Taylor
