Pandas Advanced Tutorial: Plotting with plot() Explained

matplotlib is an important and convenient plotting tool in Python, and it lets you analyze data visually. This article explains in detail how matplotlib is used from pandas.

To use matplotlib, import pyplot. The examples below also use NumPy and pandas:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

To generate random data for 365 days starting on January 1, 2020 and plot it:

ts = pd.Series(np.random.randn(365), index=pd.date_range("1/1/2020", periods=365))

ts.plot()

A DataFrame can plot several Series at once:

df3 = pd.DataFrame(np.random.randn(365, 4), index=ts.index, columns=list("ABCD"))

df3 = df3.cumsum()

df3.plot()

You can choose which columns supply the x and y values:

df3 = pd.DataFrame(np.random.randn(365, 2), columns=["B", "C"]).cumsum()

df3["A"] = pd.Series(list(range(len(df3))))

df3.plot(x="A", y="B");

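plot() supports many plot types, including bar, hist, box, density (kde), area, scatter, hexbin and pie. The examples below show each of them in turn.

A bar chart of a single row, using a DataFrame df with four cumulative random-walk columns:

df = pd.DataFrame(np.random.randn(365, 4), index=ts.index, columns=list("ABCD")).cumsum()

df.iloc[5].plot(kind="bar");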

Bar charts of multiple columns:

df2 = pd.DataFrame(np.random.rand(10, 4), columns=["a", "b", "c", "d"])

df2.plot.bar();

Pass stacked=True for a stacked bar chart:

df2.plot.bar(stacked=True);

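barh draws horizontal bar charts:

df2.plot.barh(stacked=True);

plot.hist draws histograms; alpha=0.5 makes the overlapping bars semi-transparent:

df2.plot.hist(alpha=0.5);

plot.box draws box plots:

df.plot.box();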

The colors of the individual box plot elements can be customized:

color = {
    "boxes": "DarkGreen",
    "whiskers": "DarkOrange",
    "medians": "DarkBlue",
    "caps": "Gray",
}

df.plot.box(color=color, sym="r+");  # sym="r+": draw outliers as red plus signs

Pass vert=False to draw the boxes horizontally:

df.plot.box(vert=False);

Besides plot.box, DataFrame.boxplot can also draw box plots:

df = pd.DataFrame(np.random.rand(10, 5))

bp = df.boxplot()

DataFrame.boxplot can group the data with by. Start with a two-column DataFrame:

df = pd.DataFrame(np.random.rand(10, 2), columns=["Col1", "Col2"])

df
       Col1      Col2
0  0.047633  0.150047
1  0.296385  0.212826
2  0.562141  0.136243
3  0.997786  0.224560
4  0.585457  0.178914
5  0.551201  0.867102
6  0.740142  0.003872
7  0.959130  0.581506
8  0.114489  0.534242
9  0.042882  0.314845
 
df.boxplot()

Now add a column X that splits the rows into two groups:

df["X"] = pd.Series(["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"])

df
       Col1      Col2  X
0  0.047633  0.150047  A
1  0.296385  0.212826  A
2  0.562141  0.136243  A
3  0.997786  0.224560  A
4  0.585457  0.178914  A
5  0.551201  0.867102  B
6  0.740142  0.003872  B
7  0.959130  0.581506  B
8  0.114489  0.534242  B
9  0.042882  0.314845  B
 
bp = df.boxplot(by="X")
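Roughly speaking, boxplot(by="X") splits the rows by the values of X and draws one box per group for each numeric column. The numbers behind the boxes can be inspected directly. A minimal sketch on seeded random data of the same shape (the seed is arbitrary); the quartiles use pandas' default linear interpolation, which should match the percentiles matplotlib uses for the box edges:

```python
import numpy as np
import pandas as pd

# Data shaped like the example above: two numeric columns plus a group label X.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((10, 2)), columns=["Col1", "Col2"])
df["X"] = ["A"] * 5 + ["B"] * 5

# One row per (group, quantile): lower quartile, median and upper quartile,
# i.e. the edges and middle line of each box in df.boxplot(by="X").
box_stats = df.groupby("X")[["Col1", "Col2"]].quantile([0.25, 0.5, 0.75])
print(box_stats)
```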

Series.plot.area() and DataFrame.plot.area() draw area charts. By default the areas are stacked:

df = pd.DataFrame(np.random.rand(10, 4), columns=["a", "b", "c", "d"])

df.plot.area();

To draw the areas without stacking, pass stacked=False:

df.plot.area(stacked=False);
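With stacked=True, each area is drawn on top of the ones before it, so the upper edge of each column's area is the running total across the columns. A minimal sketch of that arithmetic on seeded data of the same shape (the seed is arbitrary):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((10, 4)), columns=["a", "b", "c", "d"])

# Upper edge of each stacked area: the upper edge of "c" sits at a + b + c,
# which is a running (cumulative) sum across the columns of each row.
stacked_tops = df.cumsum(axis=1)
print(stacked_tops.head())
```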

DataFrame.plot.scatter() draws scatter plots:

df = pd.DataFrame(np.random.rand(50, 4), columns=["a", "b", "c", "d"])

df.plot.scatter(x="a", y="b");

A third column can be shown as point color by passing it as c:

df.plot.scatter(x="a", y="b", c="c", s=50);

Alternatively, the third column can set the point size through s:

df.plot.scatter(x="a", y="b", s=df["c"] * 200);

DataFrame.plot.hexbin() draws hexagonal bin plots:

df = pd.DataFrame(np.random.randn(1000, 2), columns=["a", "b"])

df["b"] = df["b"] + np.arange(1000)

df.plot.hexbin(x="a", y="b", gridsize=25);

By default, the color of each hexagon represents the number of (x, y) points that fall into it. To color by another column instead, pass it as C and choose how its values are aggregated within each hexagon with reduce_C_function, for example np.mean (the default), np.max, np.sum or np.std:

df = pd.DataFrame(np.random.randn(1000, 2), columns=["a", "b"])

df["b"] = df["b"] + np.arange(1000)

df["z"] = np.random.uniform(0, 3, 1000)

df.plot.hexbin(x="a", y="b", C="z", reduce_C_function=np.max, gridsize=25);
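To see what C and reduce_C_function do, the same aggregation can be imitated with ordinary pandas grouping. This is only a sketch: it uses a coarse rectangular grid from pd.cut instead of hexagons, and the 5 x 5 bin count and the seed are arbitrary choices. Each cell gets one value, either the number of points (the default coloring) or the maximum of z over the points in it:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((1000, 2)), columns=["a", "b"])
df["b"] = df["b"] + np.arange(1000)
df["z"] = rng.uniform(0, 3, 1000)

# Assign every point to a rectangular cell (a stand-in for hexbin's hexagons).
cells = [pd.cut(df["a"], bins=5), pd.cut(df["b"], bins=5)]

# Default hexbin coloring: the number of points in each cell.
counts = df.groupby(cells, observed=True).size()

# C="z", reduce_C_function=np.max: the largest z among the points in each cell.
max_z = df.groupby(cells, observed=True)["z"].max()

print(counts.head())
print(max_z.head())
```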

DataFrame.plot.pie() and Series.plot.pie() draw pie charts:

series = pd.Series(3 * np.random.rand(4), index=["a", "b", "c", "d"], name="series")

series.plot.pie(figsize=(6, 6));

For a DataFrame, subplots=True draws one pie per column:

df = pd.DataFrame(
    3 * np.random.rand(4, 2), index=["a", "b", "c", "d"], columns=["x", "y"]
)

df.plot.pie(subplots=True, figsize=(8, 4));

Labels, colors, percentage format and font size can also be customized:

series.plot.pie(
    labels=["AA", "BB", "CC", "DD"],
    colors=["r", "g", "b", "c"],
    autopct="%.2f",
    fontsize=20,
    figsize=(6, 6),
);

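If the values add up to less than 1, older matplotlib versions draw only part of the circle, a fan shape, instead of a full pie. Recent matplotlib versions rescale the values to fill the whole circle by default:

series = pd.Series([0.1] * 4, index=["a", "b", "c", "d"], name="series2")

series.plot.pie(figsize=(6, 6));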
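The difference comes down to whether the values are normalized before the wedge angles are computed. A minimal sketch of the arithmetic only (no drawing), using the four 0.1 values from the example:

```python
import numpy as np

values = np.array([0.1] * 4)

# Normalized (full pie): each wedge's share is value / total, so the angles add up to 360.
normalized_angles = 360 * values / values.sum()

# Unnormalized (older behaviour for totals below 1): each value is used directly as a
# fraction of the circle, so only part of the circle is covered.
raw_angles = 360 * values

print(normalized_angles.sum(), raw_angles.sum())
```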

The following table shows how each plot type handles NaN values by default:

Plot type         NaN handling
Line              Leave gaps at NaNs
Line (stacked)    Fill 0's
Bar               Fill 0's
Scatter           Drop NaNs
Histogram         Drop NaNs (column-wise)
Box               Drop NaNs (column-wise)
Area              Fill 0's
KDE               Drop NaNs (column-wise)
Hexbin            Drop NaNs
Pie               Fill 0's
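What "fill 0's", "drop NaNs" and "drop NaNs (column-wise)" mean for the data can be shown with ordinary pandas operations. A minimal sketch on a small hypothetical DataFrame; it illustrates the three strategies in the table, not pandas' internal plotting code:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0, 4.0], "b": [np.nan, 2.0, np.nan, 5.0]})

# "Fill 0's" (stacked line, bar, area, pie): a missing value counts as zero.
filled = df.fillna(0)

# "Drop NaNs" (scatter, hexbin): a point is dropped if either coordinate is missing.
dropped_rows = df.dropna()

# "Drop NaNs (column-wise)" (histogram, box, KDE): each column keeps its own non-missing values.
per_column = {col: df[col].dropna() for col in df.columns}

print(filled, dropped_rows, per_column, sep="\n\n")
```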

The scatter_matrix function in pandas.plotting draws a scatter plot matrix; diagonal="kde" puts a density plot of each column on the diagonal:

from pandas.plotting import scatter_matrix

df = pd.DataFrame(np.random.randn(1000, 4), columns=["a", "b", "c", "d"])

scatter_matrix(df, alpha=0.2, figsize=(6, 6), diagonal="kde");

Series.plot.kde() and DataFrame.plot.kde() draw density plots (this requires SciPy):

ser = pd.Series(np.random.randn(1000))

ser.plot.kde();
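pandas delegates the density estimate to SciPy's scipy.stats.gaussian_kde, which by default picks the bandwidth with Scott's rule. To show what the curve represents, here is a minimal NumPy-only sketch of a one-dimensional Gaussian KDE under that assumption; gaussian_kde_1d is a hypothetical helper for illustration, not pandas' own code:

```python
import numpy as np

def gaussian_kde_1d(samples, grid):
    """Gaussian kernel density estimate of `samples`, evaluated at the points in `grid`."""
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    # Scott's rule in one dimension: bandwidth = sample std * n ** (-1/5).
    h = samples.std(ddof=1) * n ** (-1 / 5)
    # Average of one normal density centered on each sample.
    z = (grid[:, None] - samples[None, :]) / h
    return np.exp(-0.5 * z**2).sum(axis=1) / (n * h * np.sqrt(2 * np.pi))

rng = np.random.default_rng(0)
ser_values = rng.standard_normal(1000)
grid = np.linspace(-5, 5, 1001)
density = gaussian_kde_1d(ser_values, grid)
```

Because each kernel integrates to 1, the estimated density also integrates to (approximately) 1 over a grid wide enough to cover the data.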

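Andrews curves plot multivariate data as a set of curves, each created by using one sample's attribute values as the coefficients of a Fourier series. Coloring the curves by class makes clusters visible: the curves of samples in the same class usually lie close together and form larger structures.

from pandas.plotting import andrews_curves

data = pd.read_csv("data/iris.data")  # iris data with a "Name" class column

plt.figure();

andrews_curves(data, "Name");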
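Each curve follows the standard Andrews formula f(t) = x1/√2 + x2·sin(t) + x3·cos(t) + x4·sin(2t) + x5·cos(2t) + …, evaluated for t from −π to π. A minimal NumPy sketch for a single sample; andrews_curve is a hypothetical helper, and the sample values are one iris-like row chosen for illustration:

```python
import numpy as np

def andrews_curve(x, t):
    """f(t) = x1/sqrt(2) + x2 sin(t) + x3 cos(t) + x4 sin(2t) + x5 cos(2t) + ..."""
    x = np.asarray(x, dtype=float)
    t = np.asarray(t, dtype=float)
    result = np.full_like(t, x[0] / np.sqrt(2))
    for i, coef in enumerate(x[1:]):
        k = i // 2 + 1  # harmonic order: 1, 1, 2, 2, 3, 3, ...
        # Even positions use sin, odd positions use cos.
        result += coef * (np.sin(k * t) if i % 2 == 0 else np.cos(k * t))
    return result

t = np.linspace(-np.pi, np.pi, 200)
curve = andrews_curve([5.1, 3.5, 1.4, 0.2], t)
```

Samples with similar attribute values produce similar coefficients and therefore curves that run close together, which is why classes show up as bundles.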

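Parallel coordinates is a technique for plotting multivariate data. It lets you see clusters in the data and estimate other statistics visually. Each vertical line represents one attribute, and each data point is drawn as a series of connected line segments across those vertical lines. Points that tend to cluster appear closer together.

from pandas.plotting import parallel_coordinates

data = pd.read_csv("data/iris.data")

plt.figure();

parallel_coordinates(data, "Name");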
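parallel_coordinates plots the raw values on a shared vertical scale, so a column with a much larger range can flatten the others. A common preprocessing step, sketched here as an optional extra rather than something the function does for you, is to min-max scale each numeric column to [0, 1] first. The small DataFrame and its column names are hypothetical:

```python
import pandas as pd

# Hypothetical data whose numeric columns have very different ranges.
data = pd.DataFrame({
    "length": [5.1, 4.9, 6.3, 5.8],
    "weight": [150.0, 120.0, 310.0, 260.0],
    "Name": ["setosa", "setosa", "virginica", "virginica"],
})

features = data.drop(columns="Name")
# Min-max scaling: (value - column min) / (column max - column min).
scaled = (features - features.min()) / (features.max() - features.min())
scaled["Name"] = data["Name"]

# The scaled frame can then be passed on: parallel_coordinates(scaled, "Name")
print(scaled)
```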

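A lag plot is a scatter plot of a time series against a lagged copy of itself (lag 1 by default). It can be used to check for autocorrelation: random data shows no structure, while a visible pattern suggests the series is not random.

from pandas.plotting import lag_plot

plt.figure();

spacing = np.linspace(-99 * np.pi, 99 * np.pi, num=1000)

data = pd.Series(0.1 * np.random.rand(1000) + 0.9 * np.sin(spacing))

lag_plot(data);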
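Each point in a lag plot pairs a value with the value lag steps later. A minimal sketch that builds those pairs for lag 1 on seeded data of the same form as above (the seed is arbitrary), and checks that their Pearson correlation is what Series.autocorr reports:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
spacing = np.linspace(-99 * np.pi, 99 * np.pi, num=1000)
data = pd.Series(0.1 * rng.random(1000) + 0.9 * np.sin(spacing))

lag = 1
# The points of lag_plot(data, lag=1): (y[t], y[t + lag]) for every t.
x = data.values[:-lag]
y = data.values[lag:]

# A tight linear pattern in these pairs means a large autocorrelation at this lag.
print(np.corrcoef(x, y)[0, 1], data.autocorr(lag=lag))
```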

An autocorrelation plot is often used to check randomness in a time series. It is a two-dimensional plot whose horizontal axis is the lag and whose vertical axis is the autocorrelation coefficient at that lag.

from pandas.plotting import autocorrelation_plot

plt.figure();

spacing = np.linspace(-9 * np.pi, 9 * np.pi, num=1000)

data = pd.Series(0.7 * np.random.rand(1000) + 0.3 * np.sin(spacing))

autocorrelation_plot(data);
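The curve is the sample autocorrelation at each lag, and the horizontal reference lines mark bands around zero that a purely random series should rarely leave. A minimal NumPy sketch of that calculation, assuming the common estimator (lag-h autocovariance divided by the lag-0 autocovariance, both normalized by n) and bands at ±1.96/√n and ±2.576/√n for roughly 95% and 99% confidence; autocorrelation is a hypothetical helper, not a pandas function:

```python
import numpy as np

def autocorrelation(values, h):
    """Sample autocorrelation at lag h, normalized by the lag-0 autocovariance."""
    x = np.asarray(values, dtype=float)
    n = x.size
    mean = x.mean()
    c0 = np.sum((x - mean) ** 2) / n
    return np.sum((x[: n - h] - mean) * (x[h:] - mean)) / n / c0

rng = np.random.default_rng(0)
spacing = np.linspace(-9 * np.pi, 9 * np.pi, num=1000)
data = 0.7 * rng.random(1000) + 0.3 * np.sin(spacing)

# Autocorrelation for lags 1..99 (the plot continues up to the series length).
acf = np.array([autocorrelation(data, h) for h in range(1, 100)])

# Reference bands for a purely random series.
band_95 = 1.959963984540054 / np.sqrt(len(data))
band_99 = 2.5758293035489004 / np.sqrt(len(data))
```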

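A bootstrap plot is used to assess the uncertainty of a statistic, such as the mean, median or midrange, visually. A random subset of a specified size is drawn from the dataset, the statistic is computed for that subset, and this is repeated a specified number of times. The resulting plots and histograms together make up the bootstrap plot.

from pandas.plotting import bootstrap_plot

data = pd.Series(np.random.rand(1000))

bootstrap_plot(data, size=50, samples=500, color="grey");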
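The procedure described above can be sketched with NumPy alone. This is a minimal illustration with the same size=50 and samples=500; it draws the subsets without replacement to match the description of random subsets (a textbook bootstrap resamples with replacement), and the seed is arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.random(1000)

size, samples = 50, 500
# Draw `samples` random subsets of `size` points each.
subsets = np.array([rng.choice(data, size=size, replace=False) for _ in range(samples)])

# One value of each statistic per subset.
means = subsets.mean(axis=1)
medians = np.median(subsets, axis=1)
midranges = (subsets.max(axis=1) + subsets.min(axis=1)) / 2

# The spread of each statistic across the draws reflects its uncertainty.
print(means.std(), medians.std(), midranges.std())
```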

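RadViz is based on a spring tension minimization algorithm. It maps each sample of the dataset to a point inside a unit circle in a two-dimensional space; each feature has an anchor on the circle, and the position of the point is determined by the features attached to it. Imagine placing the sample at the center of the circle: each feature "pulls" it toward that feature's anchor, with a strength given by the sample's normalized value for that feature.

from pandas.plotting import radviz

data = pd.read_csv("data/iris.data")

plt.figure();

radviz(data, "Name");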
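The spring model has a closed-form equilibrium: with feature anchors spaced evenly on the unit circle, the sample settles at the average of the anchor positions weighted by its normalized feature values. A minimal NumPy sketch of that projection for one sample whose features are already scaled to [0, 1]; radviz_point is a hypothetical helper, and radviz itself first rescales each column to [0, 1] using that column's minimum and maximum:

```python
import numpy as np

def radviz_point(row_scaled):
    """Project one sample (features scaled to [0, 1], not all zero) into the unit circle."""
    v = np.asarray(row_scaled, dtype=float)
    m = v.size
    angles = 2 * np.pi * np.arange(m) / m
    # Anchor of feature j on the unit circle.
    anchors = np.column_stack([np.cos(angles), np.sin(angles)])
    # Spring equilibrium: anchors averaged with the feature values as weights.
    return (anchors * v[:, None]).sum(axis=0) / v.sum()

print(radviz_point([1.0, 0.0, 0.0, 0.0]))  # pulled all the way to the first anchor
print(radviz_point([0.5, 0.5, 0.5, 0.5]))  # equal pulls cancel out near the center
```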

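Since version 1.5, matplotlib ships with a number of predefined styles, which can be applied with matplotlib.style.use(my_plot_style).

matplotlib.style.available lists all available styles. The list depends on the matplotlib version; for example, since matplotlib 3.6 the seaborn styles are named "seaborn-v0_8-*":

import matplotlib.pyplot as plt

plt.style.available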
['seaborn-dark',
 'seaborn-darkgrid',
 'seaborn-ticks',
 'fivethirtyeight',
 'seaborn-whitegrid',
 'classic',
 '_classic_test',
 'fast',
 'seaborn-talk',
 'seaborn-dark-palette',
 'seaborn-bright',
 'seaborn-pastel',
 'grayscale',
 'seaborn-notebook',
 'ggplot',
 'seaborn-colorblind',
 'seaborn-muted',
 'seaborn',
 'Solarize_Light2',
 'seaborn-paper',
 'bmh',
 'seaborn-white',
 'dark_background',
 'seaborn-poster',
 'seaborn-deep']
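A style can also be applied temporarily. A minimal sketch using plt.style.context, which restores the previous settings when the with block ends; the choice of "ggplot" with "classic" as a fallback is arbitrary, and the Agg backend is set only so the sketch runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Pick a style that exists in this matplotlib version.
style = "ggplot" if "ggplot" in plt.style.available else "classic"

df = pd.DataFrame(np.random.default_rng(0).standard_normal((100, 3)), columns=list("ABC")).cumsum()

# The style applies only inside the with block;
# plt.style.use(style) would change it for all later plots instead.
with plt.style.context(style):
    ax = df.plot()

plt.close("all")
```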

By default, the plot includes a legend that labels each column; pass legend=False to hide it:

df = pd.DataFrame(np.random.randn(1000, 4), index=pd.date_range("1/1/2000", periods=1000), columns=list("ABCD"))

df = df.cumsum()

df.plot(legend=False);

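By default, pandas uses the index name as the x label and leaves the y label empty. Set custom axis labels with xlabel and ylabel (pandas 1.1 or later):

df.plot();

df.plot(xlabel="new x", ylabel="new y");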

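If the data spans a very wide range, the plot becomes hard to read because the small values are practically invisible. Pass logy=True to put the y axis on a log scale (logx=True and loglog=True do the same for the x axis and for both axes):

ts = pd.Series(np.random.randn(1000), index=pd.date_range("1/1/2000", periods=1000))

ts = np.exp(ts.cumsum())

ts.plot(logy=True);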
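A quick way to see why the log scale helps is to measure how many orders of magnitude the series spans, and to confirm that logy=True really switches the axis to a log scale. A minimal sketch with seeded random data (the seed and the Agg backend are assumptions made for reproducibility):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
ts = pd.Series(rng.standard_normal(1000), index=pd.date_range("1/1/2000", periods=1000))
ts = np.exp(ts.cumsum())

# Span of the data in powers of ten: on a linear axis the small values collapse to zero.
orders_of_magnitude = np.log10(ts.max() / ts.min())

ax = ts.plot(logy=True)
print(orders_of_magnitude, ax.get_yscale())
plt.close("all")
```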

Use secondary_y to plot the given columns against a second y axis on the right:

plt.figure();

ax = df.plot(secondary_y=["A", "B"])

ax.set_ylabel("CD scale");

ax.right_ax.set_ylabel("AB scale");

By default, the legend labels of the columns on the secondary axis get a "(right)" suffix. Set mark_right=False to remove it:

plt.figure();

df.plot(secondary_y=["A", "B"], mark_right=False);

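When the x axis is a time index, pandas adjusts the tick labels automatically. If the labels still do not display well, for example over a long time span, pass x_compat=True to turn this adjustment off and use matplotlib's default date ticks:

plt.figure();

df["A"].plot(x_compat=True);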

To apply the setting to several plots, use pd.plotting.plot_params.use in a with block:

plt.figure();

with pd.plotting.plot_params.use("x_compat", True):
    df["A"].plot(color="r")
    df["B"].plot(color="g")
    df["C"].plot(color="b")

When plotting a DataFrame, subplots=True draws each column in its own subplot:

df.plot(subplots=True, figsize=(6, 6));

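The layout keyword sets the subplot grid as (rows, columns). The grid must have room for every subplot; unused cells are left empty:

df.plot(subplots=True, layout=(2, 3), figsize=(6, 6), sharex=False);

As with NumPy's reshape, one dimension can be given as -1, and pandas infers it from the number of columns to plot. With the four columns A to D and two rows, this yields a 2 x 2 grid:

df.plot(subplots=True, layout=(2, -1), figsize=(6, 6), sharex=False);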
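The -1 rule is simple arithmetic: the missing dimension is the number of subplots divided by the given one, rounded up. A minimal sketch of that rule, including the check that the grid is large enough; infer_layout is a hypothetical helper modelled on the behaviour described above, not a pandas function:

```python
import math

def infer_layout(nplots, layout):
    """Resolve a (rows, cols) layout that may contain a single -1."""
    nrows, ncols = layout
    if nrows == -1 and ncols > 0:
        nrows = math.ceil(nplots / ncols)
    elif ncols == -1 and nrows > 0:
        ncols = math.ceil(nplots / nrows)
    elif nrows <= 0 or ncols <= 0:
        raise ValueError("At least one dimension of layout must be positive")
    if nrows * ncols < nplots:
        raise ValueError(f"Layout {nrows}x{ncols} is too small for {nplots} subplots")
    return nrows, ncols

print(infer_layout(4, (2, -1)))  # four columns, two rows
```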

A more complex example: pass a list of Axes to ax to control where each column is drawn. Here the columns of df go along the main diagonal of a 4 x 4 grid, and those of -df along the anti-diagonal:

fig, axes = plt.subplots(4, 4, figsize=(9, 9))

plt.subplots_adjust(wspace=0.5, hspace=0.5)

target1 = [axes[0][0], axes[1][1], axes[2][2], axes[3][3]]

target2 = [axes[3][0], axes[2][1], axes[1][2], axes[0][3]]

df.plot(subplots=True, ax=target1, legend=False, sharex=False, sharey=False);

(-df).plot(subplots=True, ax=target2, legend=False, sharex=False, sharey=False);

With table=True, the data is also drawn as a table in the figure:

fig, ax = plt.subplots(1, 1, figsize=(7, 6.5))

df = pd.DataFrame(np.random.rand(5, 3), columns=["a", "b", "c"])

ax.xaxis.tick_top()  # Display x-axis ticks on top.

df.plot(table=True, ax=ax)

fig

The pandas.plotting.table function can also place a table, such as summary statistics, on top of the plot itself:

from pandas.plotting import table

fig, ax = plt.subplots(1, 1)

table(ax, np.round(df.describe(), 2), loc="upper right", colWidths=[0.2, 0.2, 0.2]);

df.plot(ax=ax, ylim=(0, 2), legend=None);

When there are many columns, the default line colors can be hard to tell apart. In that case, pass a colormap:

df = pd.DataFrame(np.random.randn(1000, 10), index=ts.index)

df = df.cumsum()

plt.figure();

df.plot(colormap="cubehelix");
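A colormap is a continuous color scale, and pandas picks one color from it per column. A minimal sketch of sampling evenly spaced colors from "cubehelix" yourself (assumed here to be close to what pandas does internally) and of passing a Colormap object instead of its name; matplotlib.colormaps requires matplotlib 3.5 or later, and the seed and Agg backend are arbitrary choices:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

n_columns = 10
cmap = matplotlib.colormaps["cubehelix"]
# Evenly spaced samples from the colormap: one RGBA row per column.
colors = cmap(np.linspace(0, 1, n_columns))

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((1000, n_columns))).cumsum()

# colormap accepts either a name or a Colormap object.
ax = df.plot(colormap=cmap, legend=False)
plt.close("all")
```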

This article is included at http://www.flydean.com/09-python-pandas-plot/



Introduction

Basic plotting

Other plot types

bar

stacked bar

barh

Histograms

box

Area

Scatter

Hexagonal bin

Pie

Handling NaN data in plots

Other plotting tools

Scatter matrix

Density plot

Andrews curves

Parallel coordinates

Lag plot

Autocorrelation plot

Bootstrap plot

RadViz

Plot formatting

Hiding the legend

Setting axis labels

Log scaling

Multiple Y axes

Adjusting tick labels

Subplots

Plotting tables

Using colormaps
