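Identification of abnormal data in central heating systems based on the isolation forest algorithm
Abstract: To address the heavy workload of identifying abnormal data in central heating systems, this paper proposes using the isolation forest (IF) algorithm to identify abnormal data automatically. Using one heating season of data from a heat exchange station in Tianjin as the sample, it analyzes in detail the physical regularities of the central heating system data and how the IF algorithm's parameter settings affect model performance. Because operational regulation of the central heating system causes some normal data to be misdiagnosed at a high rate, a data set parameter relativization method is proposed; this method reduces the data misdiagnosis rate by 6.7% and the missed diagnosis rate by 44.6%. By comparing model performance under different IF parameter settings, a recommended range of parameter settings for identifying abnormal data in heating systems is given.
Keywords: central heating system; abnormal data; automatic identification; isolation forest algorithm; parameter relativization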
Abnormal data identification in central heating systems based on the isolation forest algorithm
Abstract: Identifying abnormal data in central heating systems by hand involves a heavy workload, so this paper proposes using the isolation forest (IF) algorithm to identify abnormal data automatically. Taking one heating season of data from a heat exchange station in Tianjin as the sample, the paper analyses in detail the physical laws of the central heating system data and the influence of the IF algorithm's parameter settings on model performance. Operational regulation of central heating systems causes a high misdiagnosis rate for some normal data; to address this, a method of data set parameter relativization is proposed. This method reduces the data misdiagnosis rate by 6.7% and the missed diagnosis rate by 44.6%. By comparing model performance under different IF parameter settings, a recommended range of parameter settings for abnormal data identification in heating systems is given.
Keywords: central heating system; abnormal data; automatic identification; isolation forest algorithm; parameter relativization
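
As a rough illustration of the IF algorithm the abstract refers to, the sketch below builds a small isolation forest in plain Python and scores synthetic heating data. It is not the paper's implementation. The two features (supply and return temperature), the synthetic values, and the function names are all hypothetical. The defaults of 100 trees and a subsample size of 256 are the general-purpose values from the original isolation forest paper (Liu et al., 2008), not the range recommended in this paper, which the abstract does not state. The relativization step is not shown, because the abstract does not describe how it is computed.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(rows, depth, max_depth, rng):
    """Split rows on a random feature at a random threshold until isolated or too deep."""
    if depth >= max_depth or len(rows) <= 1:
        return {"size": len(rows)}
    feature = rng.randrange(len(rows[0]))
    lo = min(r[feature] for r in rows)
    hi = max(r[feature] for r in rows)
    if lo == hi:
        # Simplification: stop instead of trying another feature.
        return {"size": len(rows)}
    split = rng.uniform(lo, hi)
    left = [r for r in rows if r[feature] < split]
    right = [r for r in rows if r[feature] >= split]
    return {
        "feature": feature,
        "split": split,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(row, node, depth=0):
    """Depth at which row lands in a leaf, plus a correction for unsplit leaf points."""
    if "feature" not in node:
        return depth + c_factor(node["size"])
    child = node["left"] if row[node["feature"]] < node["split"] else node["right"]
    return path_length(row, child, depth + 1)


def fit_forest(rows, n_trees=100, sample_size=256, seed=0):
    """Build n_trees isolation trees, each on a random subsample (needs >= 2 rows)."""
    rng = random.Random(seed)
    sample_size = min(sample_size, len(rows))
    max_depth = math.ceil(math.log2(sample_size))
    trees = [
        build_tree(rng.sample(rows, sample_size), 0, max_depth, rng)
        for _ in range(n_trees)
    ]
    return trees, sample_size


def anomaly_score(row, trees, sample_size):
    """Score in (0, 1]; shorter average paths give scores closer to 1."""
    mean_path = sum(path_length(row, t) for t in trees) / len(trees)
    return 2.0 ** (-mean_path / c_factor(sample_size))


# Hypothetical data: (supply temperature, return temperature) in degrees Celsius.
data_rng = random.Random(1)
normal_rows = [(data_rng.gauss(60.0, 2.0), data_rng.gauss(45.0, 2.0)) for _ in range(500)]
anomalous_rows = [(120.0, 45.0), (60.0, -10.0)]  # e.g. sensor spike, sensor dropout

trees, psi = fit_forest(normal_rows + anomalous_rows)
normal_scores = [anomaly_score(r, trees, psi) for r in normal_rows]
anomaly_scores = [anomaly_score(r, trees, psi) for r in anomalous_rows]

print("mean score of normal rows:", sum(normal_scores) / len(normal_scores))
print("scores of injected anomalies:", anomaly_scores)
```

Rows whose score approaches 1 are the easiest to isolate and are the natural candidates for flagging, while scores well below 0.5 indicate normal points. Where to place the threshold, and how many trees and samples to use, are exactly the kind of settings the paper reports tuning.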