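Data mining: clustering time series with KShape in Python and visualizing the elbow method for choosing the optimal number of clusters k (with code and data)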


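Full article: http://tecdat.cn/?p=27078

A client recently asked us to write a research report on clustering time series with KShape, including some plots and statistical output.

KShape is a clustering method for time series data. The algorithm proceeds as follows.

  1. Use a distance measure based on cross-correlation (shape-based distance: SBD).
  2. Compute the centroid of each time series cluster based on 1. (This is a centroid-based clustering algorithm that preserves the shape of the time series.)
  3. Series are assigned to clusters in the same way as in ordinary k-means, except that the distance measure and the centroid computation use 1 and 2 above.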
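
To make step 1 concrete, here is a minimal NumPy sketch of the shape-based distance. It assumes two equal-length, one-dimensional series; the function name `sbd` is hypothetical, and this is an illustration of the idea rather than the routine the KShape implementation uses internally.

```python
import numpy as np


def sbd(x, y):
    """Shape-based distance: 1 minus the peak normalized cross-correlation."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    if denom == 0.0:
        # Assumption: treat an all-zero series as maximally uninformative.
        return 1.0
    # Cross-correlation at every possible shift of y against x.
    cc = np.correlate(x, y, mode="full")
    return 1.0 - cc.max() / denom


x = [0, 1, 2, 1, 0, 0, 0]
y = [0, 0, 0, 1, 2, 1, 0]   # same bump, shifted by two steps
w = [1, 0, 1, 0, 1, 0, 1]   # a different shape

print(sbd(x, y))  # approximately 0: the shapes match after shifting
print(sbd(x, w))  # clearly larger: the shapes differ
```

Because the maximum is taken over all shifts, two series with the same shape at different positions are treated as close, which is what distinguishes SBD from a pointwise Euclidean distance.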
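
    import glob

    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd
    from tslearn.clustering import KShape
    from tslearn.preprocessing import TimeSeriesScalerMeanVariance


    # The original listing omits the function header. The name
    # time_series_to_array and the target_col parameter (the column that
    # holds the series values) are assumptions.
    def time_series_to_array(dfs, target_col):
        # Read each data frame, convert it to a time series list, and collect them.
        tsdata = [df[target_col].tolist() for df in dfs]

        # Find the maximum length among the time series.
        len_max = max(len(ts) for ts in tsdata)

        # Pad shorter series by repeating their last value so all lengths match.
        for i, ts in enumerate(tsdata):
            len_add = len_max - len(ts)
            tsdata[i] = ts + [ts[-1]] * len_add

        # Convert each series to a column vector of shape (length, 1).
        stack_list = []
        for ts in tsdata:
            data = np.array(ts, dtype=float).reshape(-1, 1)
            stack_list.append(data)

        # Stack into one array of shape (n_series, length, 1).
        transformed_data = np.stack(stack_list, axis=0)
        return transformed_data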
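
The padding and stacking steps are easier to follow on a toy input. The sketch below uses plain lists instead of data frames; `pad_and_stack` is a hypothetical helper that mirrors the logic above and is not part of the original code.

```python
import numpy as np


def pad_and_stack(series_list):
    """Pad each series with its last value to the longest length, then stack."""
    len_max = max(len(ts) for ts in series_list)
    padded = [list(ts) + [ts[-1]] * (len_max - len(ts)) for ts in series_list]
    # Add a trailing axis so the shape is (n_series, length, 1).
    return np.asarray(padded, dtype=float)[:, :, np.newaxis]


toy = [[1, 2, 3], [4, 5], [6]]
arr = pad_and_stack(toy)
print(arr.shape)     # (3, 3, 1)
print(arr[:, :, 0])  # rows: [1, 2, 3], [4, 5, 5], [6, 6, 6]
```

The trailing axis of size 1 marks the series as univariate, which is the three-dimensional layout the scaler and the clustering class are given later.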

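Preparing the dataset

    # List of files (path as given in the source).
    filenames = sorted(glob.glob('mpldat/smeda*.csv'))
    # Load a data frame from each file and store them in a list.
    dfs = []
    for filename in filenames:
        df = pd.read_csv(filename, index_col=None, header=0)
        dfs.append(df)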

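Visualizing the clustering results

    # transformed_data is the (n_series, length, 1) array built from dfs
    # by the conversion function defined earlier.
    seed = 0  # assumed value; the source does not show the seed
    # Cross-correlation requires normalized series.
    # TimeSeriesScalerMeanVariance is the class that standardizes the data.
    stack_data = TimeSeriesScalerMeanVariance(mu=0.0, std=1.0).fit_transform(transformed_data)
    # Instantiate the KShape class.
    ks = KShape(n_clusters=2, n_init=10, verbose=True, random_state=seed)
    y_pred = ks.fit_predict(stack_data)
    # Cluster and visualize (the per-cluster plotting code is omitted in the source).
    plt.tight_layout()
    plt.show()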
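
The normalization step is worth seeing in isolation. The following NumPy sketch shows roughly what scaling each series to mean 0 and standard deviation 1 does; `z_normalize` is a hypothetical stand-in, and its handling of constant series is an assumption rather than a statement about the library's exact behavior.

```python
import numpy as np


def z_normalize(dataset, eps=1e-12):
    """Scale each series in an (n_series, length, dim) array to mean 0, std 1."""
    dataset = np.asarray(dataset, dtype=float)
    mean = dataset.mean(axis=1, keepdims=True)
    std = dataset.std(axis=1, keepdims=True)
    # Assumption: leave constant series at zero instead of dividing by zero.
    std[std < eps] = 1.0
    return (dataset - mean) / std


data = np.array([
    [[1.0], [2.0], [3.0]],
    [[10.0], [10.0], [10.0]],   # constant series
])
z = z_normalize(data)
print(z[0, :, 0])  # zero mean, unit standard deviation
print(z[1, :, 0])  # all zeros
```

After this step, series that differ only in offset or amplitude have the same values, so the clustering compares shape alone.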



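Choosing the number of clusters with the elbow method

  • What is the elbow method...
  • Compute the sum of squared distances from each point to its cluster center; this is the within-cluster sum of squared errors (SSE).
  • Vary the number of clusters, plot the SSE for each value, and take the point where the curve bends like an "elbow" as the best number of clusters.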
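
As a small worked example of the SSE defined above, the sketch below computes it for a toy two-dimensional dataset with one and with two clusters. It uses Euclidean distance for readability, whereas KShape measures distance with SBD; the function name `sse` and the toy points are illustrative assumptions.

```python
import numpy as np


def sse(points, labels, centers):
    """Sum of squared distances from each point to its assigned center."""
    points = np.asarray(points, dtype=float)
    centers = np.asarray(centers, dtype=float)
    labels = np.asarray(labels)
    diffs = points - centers[labels]
    return float(np.sum(diffs ** 2))


points = [[0, 0], [0, 2], [10, 0], [10, 2]]

# One cluster centered on the overall mean.
sse_k1 = sse(points, [0, 0, 0, 0], [[5, 1]])
# Two clusters, one per group of nearby points.
sse_k2 = sse(points, [0, 0, 1, 1], [[0, 1], [10, 1]])

print(sse_k1, sse_k2)  # 104.0 4.0
```

The drop from 104 to 4 when moving from one to two clusters is the kind of sharp bend the elbow plot is meant to reveal; adding further clusters to this data could only reduce the SSE by the remaining 4.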

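    distortions = []
    # Try 1 to 10 clusters.
    for i in range(1, 11):
        ks = KShape(n_clusters=i, n_init=10, verbose=True, random_state=seed)
        # Run the clustering.
        ks.fit(stack_data)
        # After fitting, ks.inertia_ holds the distortion for this k.
        distortions.append(ks.inertia_)

    plt.plot(range(1, 11), distortions, marker='o')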


![Image](https://p3-juejin.byteimg.com/tos-cn-i-k3u1fbpfcp/aabdcae5337f4e92abfe8a84467d808c~tplv-k3u1fbpfcp-zoom-1.image)

![Image](https://p3-juejin.byteimg.com/tos-cn-i-k3u1fbpfcp/15e517427eb74587ae77b5252e7d9212~tplv-k3u1fbpfcp-zoom-1.image)


This article is excerpted from "Python: Clustering Time Series with KShape and Visualizing the Elbow Method for the Optimal Number of Clusters k".
