

Improved K-means Algorithm for Manufacturing Process Anomaly Detection and Recognition
- Journal: Journal of Wuhan University of Technology
- Authors: ZHOU Xiaomin, PENG Wei, SHI Haibo
- Affiliation: Shenyang Institute of Automation, Chinese Academy of Sciences; Graduate School, Chinese Academy of Sciences
- Updated: 2020-11-11
ZHOU Xiaomin(1,2), PENG Wei(1), SHI Haibo(1)
(1. Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110016, China, E-mail: xmzhou@sia.cn; 2. Graduate School, Chinese Academy of Sciences, Beijing 100039, China)

Abstract: Anomaly detection and recognition are of prime importance in process industries. Faults are usually rare, and, therefore, predicting them is difficult. In this paper, a new greedy initialization method for the K-means algorithm is proposed to improve traditional K-means clustering techniques. The new initialization method tries to choose suitable initial points, which are well separated and have the potential to form high-quality clusters. Based on the clustering result, generated by the Improved-K-means algorithm, of historical disqualified-product data from the manufacturing process, a prediction model is constructed to detect and recognize abnormal trends in quality problems. This simple and robust alarm-system architecture for predicting incoming faults indeed realizes the transition of quality problems from diagnosis afterward to prevention beforehand. In the end, the alarm model was applied to the prediction and avoidance of gear-wheel assembly faults at a gear plant.

Key words: data mining; clustering; quality management; anomaly detection and recognition

1 Introduction

For an enterprise, quality is the life of product and service, so implementing the precaution principle is the core and distillation of modern quality management. The current lagging fault-diagnosis method is of little use for real-time quality control of the manufacturing process. When product disqualification is detected, the loss is irretrievable, and this greatly affects the quality and efficiency of production.
So how to recognize early failure symptoms and falling performance trends, and then take corresponding fault-elimination actions beforehand by monitoring the process information and product characteristic information, becomes more and more important. This manner of forecasting manufacturing-process excursions is becoming a key step to avoid losses and build a good product reputation.

The gearbox is an important component of the vehicle drivetrain system, and almost 60% of gearbox faults are caused by gear-wheels, so it is very important to monitor the gear-wheel assembly process and detect abnormal trends in the gearbox assembly line. Based on a mass of gearbox performance testing data from a gear plant, this paper first analyzed the gear-wheel assembly process with clustering technology, then explained the clustering result with the help of domain experts and formed an anomaly-analysis decision table. To find hidden quality problems and to provide abundant decision-making information for enterprise quality control, the status of each work station is monitored with the help of the anomaly forecasting model.

2 An Improved K-means Clustering Algorithm

Data clustering is an important technique for exploratory data analysis. Clustering techniques are used for combining observed objects into clusters which satisfy the criteria that each cluster is homogeneous and different from the other clusters. K-means is a traditional, simple and effective clustering method in common use; however, there are problems with this technique: a) K-means requires the number of clusters to be specified beforehand, but determining the number of clusters is not easy. b) K-means requires one centroid for each cluster, and these centroids should be placed carefully, because different initial centroids lead to different results. c) The deviation of some abnormal data may be very large, and this may affect the estimation of the data distribution, so the algorithm is very sensitive to abnormal data.
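Limitation b) is easy to reproduce: a plain Lloyd's K-means run from two different sets of initial centroids can converge to local optima of very different quality. The following minimal NumPy sketch (an illustration, not the paper's code) shows this on three synthetic blobs.

```python
import numpy as np

def kmeans(X, centroids, iters=100):
    """Plain Lloyd's K-means: assign each point to its nearest centroid,
    then move each centroid to the mean of its assigned points."""
    centroids = centroids.astype(float).copy()
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        new = np.array([X[labels == k].mean(axis=0) if np.any(labels == k)
                        else centroids[k] for k in range(len(centroids))])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids, labels

def inertia(X, centroids, labels):
    """Within-cluster sum of squared distances (lower is better)."""
    return sum(((X[labels == k] - c) ** 2).sum()
               for k, c in enumerate(centroids))

# three tight blobs along the diagonal
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.1, (50, 2)) for c in (0.0, 5.0, 10.0)])

good = np.array([[0., 0.], [5., 5.], [10., 10.]])    # one seed per blob
bad  = np.array([[0., 0.], [0.1, 0.1], [7.5, 7.5]])  # two seeds in one blob
cg, lg = kmeans(X, good)
cb, lb = kmeans(X, bad)
# the bad start merges two blobs and converges to a worse local optimum
print(inertia(X, cg, lg), inertia(X, cb, lb))
```

With the bad start, one centroid ends up between the second and third blobs and the algorithm never recovers, which is exactly the initialization sensitivity the improved algorithm targets.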
To overcome these defects as much as possible, this paper proposes an Improved-K-means clustering algorithm.

2.1 Basic Idea of the Improved Algorithm

We give several definitions before describing the idea of the improved algorithm:

Definition 1 (ε-neighborhood of a point): The ε-neighborhood of a point p, denoted by N(p), is defined by N(p) = {q ∈ D | dis(p, q) ≤ ε}.

Definition 2 (sparse point): If the number of points in the ε-neighborhood of a point is less than the given threshold value MinPts, then this point is called a sparse point; otherwise it is called a non-sparse point.

Definition 3 (clustering merging rule): Let the cluster center of cluster Ci be Oi, the cluster center of cluster Cj be Oj (j ≠ i), and the center of all the points in both Ci and Cj be Ok. Then clusters Ci and Cj satisfy the clustering merging rule if

Σ_{p∈Ci} |p − Oi| + Σ_{q∈Cj} |q − Oj| ≥ λ · Σ_{p∈Ci∪Cj} |p − Ok|,   λ ∈ (0.75, 1.5).

Otherwise, they do not satisfy this merging rule.

Definition 4 (cluster radius): The maximum distance from the points in a cluster to the cluster center.

Definition 5 (subjection degree of a point to a cluster): Suppose the center of cluster Ci is Oi and the cluster radius is R. The subjection degree of a random sample point X to cluster Ci is:

Sub(X, Ci) = exp(−(r − R)² / (2σ²))   (1)

In this formula, r = |X − Oi|. The smaller the σ-value is, the steeper the Gauss function is; in general, the σ-value is between 0 and 0.5 (σ ∈ (0, 0.5)).

Definition 6 (improved maximum-likelihood classification): Let the two biggest subjection degrees of a sample point X be Sub(X, Ca) = a and Sub(X, Cb) = b. X belongs to cluster Ca if a and b satisfy the condition a ≥ (1 + μ)b; otherwise, X belongs to a new cluster. The value of μ is chosen according to the
The value of μ is chosen acording to thestrictness of classificatory judgement , they are in direct ratio( commonly μ∈[0 ,1 ]).( 1 ) Because the initialized cluster centroids are crucial to the clustering result of this algorithm , the resultwill be more reasonable and the convergence rate will be faster on the base of the reasonable initialized clustercentroids. So ,the better choice is to place them as much as possible far away from each other. In order to realizethis , we can adopt the following strategy : choose the non-sparse- point which has the biggest number of points inits e-neighborhood as the first cluster centroid , and remove both this point and its e-neighborhood points fromthe initial dataset. Then take the rest data points as the initial dataset and choose the second cluster centroid inthe same way. Acording to this rule , choose the third cluster centroid , the fourth,and so on.The next step is to take each rest point belonging to a given cluster represented by the nearest cluster cen-troid. When no point is pending , the first step is completed and an early groupage is done. At this point we needto merge the clusters acording to the clustering merging rule as much as possible. Then we should re- calculatethe new centroids as barycenters of the clusters resulting from the previous step. Repeat these steps untill all theclusters don t change any more.( 2 ) For a cluster- unknown sparse sample point , it' s usually difficult to identify whether it belongs to a newcluster or to an already existing cluster. And it is likely to take a cluster-unknown sample point which belongs toa new cluster to an existing cluster if adopted the traditional Maximum-Likelihood-Classification. Researches onHuman-Classify-Mechanism found that if the similarity degree of a need classified sample-point to one class isoverwhelming bigger than other classes , then this sample-point can be partitioned into this class. 
Enlighten bythis , we improved the MLC rule to avoid the improper partition and proposed the concept of Improved-MLCrule.( 3 ) Because the data of some abnormality sparse -points may be very large and this may affect the estima-tion of the data distribution , we can separate these sparse-points at first , then assign the rest data points to theright cluster. After all these steps have been finished , you can process these sparse-points according to the Im-proved-MLC and identify whether it should be assigned to a new cluster or to an already known cluster.2.2 Steps of The Improved AlgorithmIn the following , we present the basic steps of the algorithm based on the previous analysis :Input : input parameters ε , MinPts and the database included n data pointsOutput : K clusters satisfied the Lowest- V ariance Criterion( 1 ) Analyze the E neighborhood of each point in the given dataset , and separate the sparse data points fromthe non-sparse data points.( 2 ) Repeat step(3 )to( 4 )untill we can' t choose any more cluster centroids.( 3 ) Pick up the non-sparse data point which has the biggest number of points in it' s E -neighborhood as thefirst cluster centroid.中国煤化工( 4 ) Remove both this point and the e-neighborhood pointC N M H Gset. 
Then take the restdata points as the initial dataset.( 5 ) Repeat step( 6 )to( 8 ) untill all the clusters don' t change any more.( 6 )Calculate the distance between each rest data point and each cluster center , assigning these rest datapoints to the closest cluster according the minimum distance criterion.一1037-( 7 ) Merge the clusters according to the clustering merging rule as much as possible.( 8 ) Recalculate the new cluster center for each cluster.( 9 ) Assign the sparse data points separated in step( 1 ) according to the Improved-MLC , identify whetherit belongs to a new cluster or to an already exist cluster.For the input parameter K is critical to the clustering number and the rationality of clustering result ,andsetting the parameter K lies on domain expert' s experience greatly , the traditional K-means algorithm is appliedat a discount. Though the Improved-K-means algorithm in this paper imports two parameters ε and MinPts ,they have much less effect on the clustering result. The clustering number self study function can reduce the ex-cessive reliance on input parameter K during clustering analysis.In addition , the irrational initial center which is needed by K- means algorithm will lead the algorithm to lo-cal optimization easily. This improved K- means algorithm can get a better clustering result by a special way toassure the initial points be placed dispersedly enough. Though adding some steps , the loop number will fall downfor the more rational initial center and the time complexity of improved K -means is more or less equal to the tra-ditional one.3 Approach and Strategy AnalysisA great deal of product-performance-parameters variances are usually caused by manufacturing process pa-rameters variances which always changed slowly and gradually. These parameters variances can be divided intomany types such as material differences , the stability of the machine equipment , the variance of the environmentconditions , and the change of the operator. 
In the manufacturing process, the product performance parameters vary within a local scope and obey a certain distribution (a normal distribution) while only casual factors exist. However, when there are systemic factors such as the deterioration of machine equipment, the low skill level of the operator or a low passing rate of the material, the performance parameters may depart from the original distribution although they still stay within the normal scope. Conversely, this aberrance means that the manufacturing process may have some hidden quality trouble.

In order to realize anomaly-trend detection, an integrated system is needed to unite the interrelated techniques organically so as to exert their whole function. The analysis system proposed in this article can be divided into three parts: the data-standardization and attribute-weight-analysis part, the clustering-analysis part, and the anomaly-detection-and-recognition part. The whole analysis-system flow chart is depicted in Fig. 1.

[Fig. 1 Clustering-based anomaly detection and recognition system flow-chart: online data collection and the fault-sample database feed a standardization process, followed by clustering analysis and explanation, an analytical decision table, and finally anomaly detection and alarm generation.]

3.1 Data Standardization Process and Attribute Weight Analysis

After the data-sampling step, choose correlative operation data for the particular problem and preprocess these data, for example: fill up the missing data, smooth the noisy data, and remove the abnormal data. Suppose the collected sampling dataset is X = {x_1, x_2, ..., x_n}, where each sample x_i is characterized by the variable group x_i = {x_i1, x_i2, ..., x_ij, ..., x_im}. In this formula, x_ij represents the jth characteristic parameter of the ith sample.

Because different characteristic parameters have different importance for the problem analysis, it is necessary to establish weights and give a weight value to each parameter. The weighted Euclidean distance from x_i to x_j is:

dis(x_i, x_j) = sqrt( Σ_{k=1}^{m} w_k (x_ik − x_jk)² )
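The standardization step and the weighted Euclidean distance above can be sketched as follows; the function names are illustrative, and in practice the weights w_k would come from the attribute-weight analysis rather than being set by hand.

```python
import numpy as np

def standardize(X):
    """Z-score each characteristic parameter (column) so that parameters
    measured on different scales become comparable."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def weighted_dist(xi, xj, w):
    """Weighted Euclidean distance: sqrt(sum_k w_k * (x_ik - x_jk)^2)."""
    xi, xj, w = map(np.asarray, (xi, xj, w))
    return float(np.sqrt(np.sum(w * (xi - xj) ** 2)))

# with unit weights this reduces to the ordinary Euclidean distance
print(weighted_dist([0.0, 0.0], [3.0, 4.0], [1.0, 1.0]))  # 5.0
```

Raising the weight of one parameter stretches the space along that axis, so differences in the more important characteristic parameters contribute more to the distance used by the clustering step.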