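import numpy as np
import pandas as pd
from pandas import Series, DataFrame
import matplotlib.pyplot as plt
%matplotlib inline

Create a dictionary that maps each candidate to their party.

parties = {
    'Bachmann, Michelle': 'Republican',
    'Romney, Mitt': 'Republican',
    'Obama, Barack': 'Democrat',
    "Roemer, Charles E. 'Buddy' III": 'Reform',
    'Pawlenty, Timothy': 'Republican',
    'Johnson, Gary Earl': 'Libertarian',
    'Paul, Ron': 'Republican',
    'Santorum, Rick': 'Republican',
    'Cain, Herman': 'Republican',
    'Gingrich, Newt': 'Republican',
    'McCotter, Thaddeus G': 'Republican',
    'Huntsman, Jon': 'Republican',
    'Perry, Rick': 'Republican'
}

Read the file.

df = pd.read_csv('../data/usa_election.txt', low_memory=False)
df.head()

Use map() with the dictionary to add a new column, party, holding each candidate's party.

df['party'] = df['cand_nm'].map(parties)
df.head()

Use np.unique() to see which values appear in the party column.

np.unique(df['party'])

array(['Democrat', 'Libertarian', 'Reform', 'Republican'], dtype=object)

Use value_counts() to count how many times each value appears in the party column.

df['party'].value_counts()

Democrat       292400
Republican     237575
Reform           5364
Libertarian       702
Name: party, dtype: int64

Use groupby() to get the total contributions (contb_receipt_amt) received by each party.

df.groupby('party')[['contb_receipt_amt']].sum()

Get the total contributions (contb_receipt_amt) received by each party on each day.

df.groupby(['party', 'contb_receipt_dt'])[['contb_receipt_amt']].sum()

Check the date format and convert it to the pandas datetime type by passing pd.to_datetime to map().

%time df['contb_receipt_dt'] = df['contb_receipt_dt'].map(pd.to_datetime)
df

This step is relatively slow, because map() calls pd.to_datetime once per row; passing the whole column to pd.to_datetime is usually faster.

Sort by date.

df = df.sort_values('contb_receipt_dt')
df.head()

Total contributions received by each party on each day.

df2 = df.groupby(['party', 'contb_receipt_dt'])[['contb_receipt_amt']].sum()
df2.head(10)

Use unstack() to move party from the first index level to the columns. Level 0 here is party, so this is equivalent to unstack('party').

df2 = df2.unstack(0)
df2.head(10)

Using the data above, plot the cumulative contributions for each party. cumsum() computes the running total.

df2 = df2.fillna(0)
df2 = df2.cumsum()
df2.plot()

The contributions to Libertarian and Reform are not on the same scale as those to the other two parties. The axis range is far too large for them, so their curves appear to sit at 0.

To view the data with dates as columns and parties as rows, use stack() to make party the second-level row index, then unstack() the date level. No value may be left as NaN, so fill the gaps with 0.

df2 = df2.stack(1)
df3 = df2.unstack(0)
df3 = df3.fillna(0)
df3.head(20)

Group by candidate name (cand_nm) and contributor occupation (contbr_occupation) and sum the contributions. This shows which occupations each candidate's main supporters come from.

df3 = df.groupby(['cand_nm', 'contbr_occupation'])[['contb_receipt_amt']].sum()
df3.head(20)

See which candidates veterans mainly support, using the occupation DISABLED VETERAN.

df5 = df3.query('contbr_occupation == "DISABLED VETERAN"')
df5

Turn the index into columns.

df5.reset_index()

Among each candidate's donors, find the one who contributed the most, along with the amount. The value in the final query is the maximum returned by the line before it.

df.head()
df4 = df.groupby(['cand_nm', 'contbr_nm'])[['contb_receipt_amt']].sum()
df4['contb_receipt_amt'].max()
df4.query('contb_receipt_amt == 4419671.51')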
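
The last step looks up only the single largest total across all candidates, and it does not include the donor's occupation. One possible way to get the top donor for each candidate, together with their occupation, is to include contbr_occupation in the grouping and then take idxmax() within each candidate. The following is a minimal sketch under stated assumptions: the rows are made-up toy data, not values from the election file, and the names totals, top_idx and top_donors are hypothetical.

```python
import pandas as pd

# Toy data standing in for the election file; every value here is invented.
df = pd.DataFrame({
    "cand_nm": ["Obama, Barack", "Obama, Barack", "Obama, Barack",
                "Romney, Mitt", "Romney, Mitt"],
    "contbr_nm": ["DONOR A", "DONOR A", "DONOR B", "DONOR C", "DONOR D"],
    "contbr_occupation": ["TEACHER", "TEACHER", "LAWYER", "ENGINEER", "RETIRED"],
    "contb_receipt_amt": [100.0, 250.0, 300.0, 500.0, 50.0],
})

# Total contributed by each donor to each candidate.
# Note: groupby drops rows whose keys are NaN by default, so donors with a
# missing occupation would be excluded from this table.
totals = (
    df.groupby(["cand_nm", "contbr_nm", "contbr_occupation"])["contb_receipt_amt"]
      .sum()
      .reset_index()
)

# For each candidate, find the row label of the largest total,
# then pull those rows out of the totals table.
top_idx = totals.groupby("cand_nm")["contb_receipt_amt"].idxmax()
top_donors = totals.loc[top_idx].reset_index(drop=True)

print(top_donors)
```

Because the maximum is computed per group, no amount has to be copied by hand into a query string, and the result stays correct if the underlying data changes.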