{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# CHAPTER 11 Time Series(时间序列)\n", "\n", "时间序列指能在任何能在时间上观测到的数据。很多时间序列是有固定频率(fixed frequency)的,意思是数据点会遵照某种规律定期出现,比如每 15 秒,每 5 分钟,或每个月。时间序列也可能是不规律的(irregular),没有一个固定的时间规律。如何参照时间序列数据取决于我们要做什么样的应用,我们可能会遇到下面这些:\n", "\n", "- Timestamps(时间戳),具体的某一个时刻\n", "- Fixed periods(固定的时期),比如 2007 年的一月,或者 2010 年整整一年\n", "- Intervals of time(时间间隔),通常有一个开始和结束的时间戳。Periods(时期)可能被看做是 Intervals(间隔)的一种特殊形式。\n", "- Experiment or elapsed time(实验或经过的时间);每一个时间戳都是看做是一个特定的开始时间(例如,在放入烤箱后,曲奇饼的直径在每一秒的变化程度)\n", "\n", "这一章主要涉及前三个类型。\n", "\n", "> pandas 也支持基于 timedeltas 的 index,本书不会对 timedelta index 做介绍,感兴趣的可以查看 pandas 的文档。\n", "\n", "\n", "# 11.1 Date and Time Data Types and Tools(日期和时间数据类型及其工具)\n", "\n", "python 有标准包用来表示时间和日期数据。datetime, time, calendar,这些模块经常被使用。datetime.datetime 类型,或简单写为 datetime,被广泛使用:\n" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import pandas as pd" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from datetime import datetime" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [], "source": [ "now = datetime.now()" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "datetime.datetime(2017, 12, 1, 12, 12, 0, 375896)" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "now" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "(2017, 12, 1)" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "now.year, now.month, now.day" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "datetime 能保存日期和时间到微妙级别。timedelta 表示两个不同的 datetime 对象之间的时间上的不同:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "datetime.timedelta(926, 56700)" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "delta = datetime(2011, 1, 7) - datetime(2008, 6, 24, 8, 15)\n", "delta" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "926" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "delta.days" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "56700" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "delta.seconds" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "我们可以在一个 datetime 对象上,添加或减少一个或多个 timedelta,这样可以产生新的变化后的对象:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from datetime import timedelta" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": true }, "outputs": [], "source": [ "start = datetime(2011, 1, 7)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "datetime.datetime(2011, 1, 19, 0, 0)" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "start + timedelta(12)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "datetime.datetime(2010, 12, 14, 0, 0)" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "start - 2 * timedelta(12)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "下表汇总了一些 datetime 模块中的数据类型:\n", "\n", "![](http://oydgk2hgw.bkt.clouddn.com/pydata-book/wqo6m.png)\n", "\n", "# 1 Converting Between String and Datetime(字符串与时间的转换)\n", "\n", "我们可以对 datetime 对象,以及 pandas 的 Timestamp 对象进行格式化,这部分之后会介绍,使用 str 或 strftime 方法,传入一个特定的时间格式就能进行转换:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": true }, "outputs": [], "source": [ "stamp = datetime(2011, 1, 3)" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "'2011-01-03 00:00:00'" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "str(stamp)" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "'2011-01-03'" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stamp.strftime('%Y-%m-%d')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "下表是关于日期时间类型的格式:\n", "\n", "![](http://oydgk2hgw.bkt.clouddn.com/pydata-book/r98dw.png)\n", "\n", "![](http://oydgk2hgw.bkt.clouddn.com/pydata-book/bc9e8.png)\n", "\n", "我们可以利用上面的 format codes(格式码;时间日期格式)把字符串转换为日期,这要用到 datetime.strptime:" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": true }, "outputs": [], "source": [ "value = '2011-01-03'" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "datetime.datetime(2011, 1, 3, 0, 0)" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "datetime.strptime(value, '%Y-%m-%d')" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "collapsed": true }, "outputs": [], "source": [ "datestrs = ['7/6/2011', '8/6/2011']" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "[datetime.datetime(2011, 7, 6, 0, 0), datetime.datetime(2011, 8, 6, 0, 0)]" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "[datetime.strptime(x, '%m/%d/%Y') for x in datestrs]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "对于一个一直的时间格式,使用 datetime.strptime 来解析日期是很好的方法。但是,如果每次都要写格式的话很烦人,尤其是对于一些比较常见的格式。在这种情况下,我们可以使用第三方库 dateutil 中的 parser.parse 方法(这个库会在安装 pandas 的时候自动安装):" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from dateutil.parser import parse" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "datetime.datetime(2011, 1, 3, 0, 0)" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "parse('2011-01-03')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "dateutil 能够解析很多常见的时间表示格式:" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "datetime.datetime(1997, 1, 31, 22, 45)" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "parse('Jan 31, 1997 10:45 PM')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "在国际上,日在月之前是很常见的(译者:美国是把月放在日前面的),所以我们可以设置 dayfirst=True 来指明最前面的是否是日:" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "datetime.datetime(2011, 12, 6, 0, 0)" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "parse('6/12/2011', dayfirst=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "pandas 通常可以用于处理由日期组成的数组,不论是否是 DataFrame 中的行索引或列。to_datetime 方法能解析很多不同种类的日期表示。标准的日期格式,比如 ISO 8601,能被快速解析:" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "collapsed": true }, "outputs": [], "source": [ "datestrs = ['2011-07-06 12:00:00', '2011-08-06 00:00:00']" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "DatetimeIndex(['2011-07-06 12:00:00', '2011-08-06 00:00:00'], dtype='datetime64[ns]', freq=None)" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.to_datetime(datestrs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "还能处理一些应该被判断为缺失的值(比如 None, 空字符串之类的):" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "DatetimeIndex(['2011-07-06 12:00:00', '2011-08-06 00:00:00', 'NaT'], dtype='datetime64[ns]', freq=None)" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "idx = pd.to_datetime(datestrs + [None])\n", "idx" ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "NaT" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "idx[2]" ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "array([False, False, True], dtype=bool)" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.isnull(idx)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Nat(Not a Time) 在 pandas 中,用于表示时间戳为空值(null value)。\n", "\n", "> dateutil.parse 是一个很有用但不完美的工具。它可能会把一些字符串识别为日期,例如,'42'就会被解析为 2042 年加上今天的日期。\n", "\n", "datetime 对象还有一些关于地区格式(locale-specific formatting)的选项,用于处理不同国家或不同语言的问题。例如,月份的缩写在德国和法国,与英语是不同的。下表列出一些相关的选项:\n", "\n", "![](http://oydgk2hgw.bkt.clouddn.com/pydata-book/gp2fy.png)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.0 (main, Oct 26 2022, 19:06:18) [Clang 14.0.0 (clang-1400.0.29.202)]" }, "vscode": { "interpreter": { "hash": "5c7b89af1651d0b8571dde13640ecdccf7d5a6204171d6ab33e7c296e100e08a" } } }, "nbformat": 4, "nbformat_minor": 0 }