Perhaps group by CostCentre first, and then use the Series/DataFrame resample()?
In [72]: centers = {}
In [73]: for center, idx in df.groupby("CostCentre").groups.items():
   ....:     # select this cost centre's rows and index them by date
   ....:     timediff = df.loc[idx].set_index("Date")['TimeDifference']
   ....:     # weekly totals for this cost centre
   ....:     centers[center] = timediff.resample("W").sum()
In [77]: pd.concat(centers, names=['CostCentre'])
Out[77]:
CostCentre  Date
0           2012-09-09         0
            2012-09-16     89522
            2012-09-23         6
            2012-09-30       161
2073        2012-09-09    141208
            2012-09-16    113024
            2012-09-23    169599
            2012-09-30    170780
6078        2012-09-09    171481
            2012-09-16    160871
            2012-09-23    153976
            2012-09-30    122972
When parse_dates is True for the pd.read_* functions, index_col must also be set.
In [28]: df = pd.read_clipboard(sep=' +', parse_dates=True, index_col=0,
....: dayfirst=True)
In [30]: df.head()
Out[30]:
              CostCentre  TimeDifference
DateOccurred
2012-09-03          2073           28138
2012-09-03          6078           34844
2012-09-03          8273           31215
2012-09-03          8367           28160
2012-09-03          8959           32037
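Since read_clipboard() depends on whatever is on the clipboard, here is a minimal reproducible sketch of the same idea using pd.read_csv() on an in-memory string. The sample rows and the day-first date format are assumptions made up for illustration, not the original data.

```python
import io

import pandas as pd

# Hypothetical sample in day-first format (dd/mm/yyyy), whitespace separated.
raw = """DateOccurred CostCentre TimeDifference
03/09/2012 2073 28138
03/09/2012 6078 34844
10/09/2012 2073 31215
"""

# parse_dates=True asks pandas to parse the index, so index_col must name it.
df = pd.read_csv(
    io.StringIO(raw),
    sep=r"\s+",
    parse_dates=True,
    index_col=0,
    dayfirst=True,
)

print(df.index)
```

With the index parsed at load time, the frame can be passed to resample() without a separate set_index() call.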
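Since resample() requires a frame/series with a time-series index (a DatetimeIndex), setting the index at creation time removes the need to set it separately for each group. GroupBy objects also have an apply method, which is essentially syntactic sugar around the "combine" step done above with pd.concat().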
In [37]: x = df.groupby("CostCentre").apply(lambda group:
   ....:         group['TimeDifference'].resample("W").sum())
In [38]: x.head(12)
Out[38]:
CostCentre  DateOccurred
0           2012-09-09           0
            2012-09-16       89522
            2012-09-23           6
            2012-09-30         161
2073        2012-09-09      141208
            2012-09-16      113024
            2012-09-23      169599
            2012-09-30      170780
6078        2012-09-09      171481
            2012-09-16      160871
            2012-09-23      153976
            2012-09-30      122972
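In recent pandas versions the same per-group weekly sum can probably be written without apply, either by chaining resample() onto the GroupBy object or by grouping on a pd.Grouper. The sketch below uses a small made-up frame (the values are hypothetical) with a DatetimeIndex named DateOccurred.

```python
import pandas as pd

# Hypothetical data: a DatetimeIndex plus the two columns from the question.
df = pd.DataFrame(
    {
        "CostCentre": [2073, 6078, 2073, 2073, 6078],
        "TimeDifference": [10, 7, 20, 5, 3],
    },
    index=pd.DatetimeIndex(
        ["2012-09-03", "2012-09-03", "2012-09-04", "2012-09-10", "2012-09-11"],
        name="DateOccurred",
    ),
)

# Option 1: resample each group directly on the GroupBy object.
weekly = df.groupby("CostCentre")["TimeDifference"].resample("W").sum()

# Option 2: group on the cost centre and a weekly time grouper together.
weekly_grouper = df.groupby(["CostCentre", pd.Grouper(freq="W")])[
    "TimeDifference"
].sum()

print(weekly)
```

Both return a Series indexed by (CostCentre, DateOccurred). One difference to be aware of: resample() emits a row for every week in each group's date range, while pd.Grouper only emits weeks that actually contain data.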