Reading data from a CSV file into a time series with pandas

My goal is to read EURUSD data (daily) into a time series object where I can easily slice-and-dice, aggregate, and resample the information based on irregular-ish time frames. This is most likely a simple answer. I'm working out of Python for Data Analysis but can't seem to bridge the gap.

After downloading and unzipping the data, I run the following code:

>>> import pandas as pd
>>> df = pd.read_csv('EURUSD_day.csv', parse_dates={'Timestamp': ['<DATE>', '<TIME>']}, index_col='Timestamp')

So far so good. I now have a nice data frame with Timestamps as the index.

However, the book implies (p. 295) that I should be able to subset the data, as follows, to look at all the data from the year 2001.

>>> df['2001']

But, that doesn't work.

Reading this question and answer tells me that I could import Timestamp:

>>> from pandas.lib import Timestamp
>>> s = df['<CLOSE>']

Which seems to work for a particular day:

>>> s[Timestamp('2001-01-04')]
0.9506999999

Yet the following code yields a single value instead of the range I want, which is all the data from 2001.

>>> s[Timestamp('2001')]
0.8959

I know I am missing something simple, something basic. Can anyone help?

Thank you, Brian

Best answer

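The example on p. 295 is performed on a Series object, which is why indexing with the year works there. With a DataFrame, plain brackets look up a column, so you would want df.ix['2001'] to achieve the same result. (In pandas 1.0 and later, .ix no longer exists; df.loc['2001'] is the equivalent.)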
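As a minimal sketch of the difference, assuming a recent pandas version and using .loc: the frame below is a hypothetical stand-in for the EURUSD data (synthetic daily values, not real quotes), and only the column name '<CLOSE>' and the index name 'Timestamp' are taken from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the EURUSD frame: one synthetic row per day, 2000-2002.
idx = pd.date_range("2000-01-01", "2002-12-31", freq="D", name="Timestamp")
df = pd.DataFrame({"<CLOSE>": np.linspace(0.90, 1.05, len(idx))}, index=idx)

# Partial-string indexing: the string "2001" selects every row in that year.
# .loc does this on both a DataFrame and a Series.
year_df = df.loc["2001"]
year_s = df["<CLOSE>"].loc["2001"]

# A Timestamp is one instant, so Timestamp("2001") means 2001-01-01 00:00:00
# and the lookup returns the single value at that exact index label.
single = df["<CLOSE>"].loc[pd.Timestamp("2001")]

print(year_df.index.min(), year_df.index.max(), len(year_s))
print(pd.Timestamp("2001"), single)
```

This would also explain the single value in the question: s[Timestamp('2001')] is an exact lookup of the first instant of the year, while the string '2001' is treated as a whole-year slice.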
