Analysis of Network Traffic with Extreme Value Theory Analysis of Network Traffic with Extreme Value Theory

Analysis of Network Traffic with Extreme Value Theory

  • 期刊名字:天津大学学报
  • 文件大小:778kb
  • 论文作者:舒炎泰,汪广洪,高德云,刘嘉焜,王旭
  • 作者单位:School of Electronic Information Engineering,School of Sciences
  • 更新时间:2020-12-06
  • 下载次数:
论文简介

Transactions of Tianjin UniversityVol.9 No. 2Jun.2003Analysis of Network Traffic with Extreme Value Theory*SHU Yan-tai(舒炎泰) , WANG Guang-hong(汪广洪) , GAO De-yun(高德云)',LIU Jia-kun (刘嘉烷)", WANG Xu(王旭)=(1. School of Eectronic Information Engineering, Tianjin Lniversity ,Tianjin 300072, China ;2. School of Seiences, Tianjin University , Tianjin 300072, China)U49 AAbstract;It is very important 10 analyze network taffie in the network cuntrol and management. ln this paper ,extreme value theory is first introduced and a model with threshold methods is proposed to analyze the character-istics of network trfic. In this model, only some taffc data that is greater than threshold value is considered.Then the proposed model with the trace is simulated by uaing S-Plus sofware. The modeling reults show thenetwork taffc model constructed from the extreme value theory fits well with that of empirical distribution. Fi-nally, the extreme value model with the FARIMA(p,d,q) modeling is compared. The analytical resuls ilus-trate that extreme value theory has a good application foreground in the statistie analysis of network taffie. Inaddition, since only somne trffe data which is greater lhan the threshold is processed , the computation overheadis reduced greatly.Keywords:extreme value theory ; generalized Pareto ditribtion; threshold value; FARIMA(p ,d,q) modelArticle ID: 1006- 4982(2003 )02-0131-05With increasing demand of wide bandwidth service value theory is intrinsically about extrapolation. ln its sim-such as multimedia and video, it is very desirable to re- plest form the problem is given a set of independent datasearch on diferent transport techniques for high-speed net- X; ,X,* ,X。from unknown distibution F to estinate ac-works. Congestion control, bandwidth assignment, network curately the tail of F. The principal issues considered inmanagement and performance evaluation are some exam-this problem are:ples that have been researched widely. As the foundation1) By definition, extremes are rare;of these research fields, it is the most essential to build a2) Estimates are often required beyond Xmx, the lar-model that can accurately characterize real network trafc. gest observed data value;Hence, it is also necessary to understand and analyze sta-3) Standard density estimation techniques fit welltistical properties of the network trffle because it can indi- where the data have greatest density, but can be severelycate the network congestion degree , which in turn can be biased in estimating tail probilities.used for network design and management, and even a-The role of extreme value theory is to develop proce-plied to nelwork control design. For example, when there dures which are scenifcally and stitically rational forare many users sharing network sources, it is possible that estimating the extremal behaviour of random variables ornetwork congestion emerges and user application quality isprocesses.degraded gealy. In the conventional analysis of network 1.1 Brief history to extreme value theorytraffice , trffc data was usually assumed to meet some spec-The foundations of the asymptotic argument whichifed distribution. But in the real world, it is very difcult forms the backbone for the extreme value theory were setto determine the distribution of network trafic. This paper oul by Fisher and Tippett in the 1920's'"', though theyintroduces the extreme valule theory for the analysis of net- were reluctant to propose the models for stistical aplica-work trfe. The featue of this type of analysis is that itis tions. This theory was unified and extended by Gnedenkonol necessary to assune the distributed function of networktralfic, we only need to consider extreme events.Accepted date:2002- 09 23.1 Extreme value theory中国煤化工cundation of China( No.Being unique amongst all areas of statistics, exlremeFHCNM H Gruai Api Manh..Transactions of 7Tianjin University Vol. 9 No.2 2003in the 1940's ”. 'The statistical applications of the proba- rameterξ is referred to as the shape parameler, while μ ,bilistic models for extremes were first studied and formal- and σ >0 are location and scale parameters respectively.ized by Gumbel in the 1950's. Also in he 1950's, Jenkin-A typical application of the model is to fit the distri-son worked on the application of extreme value models to bution lo a series of annual maximum data (i. e.,taking nextreme wind speeds ,and developed a model parameter-to be the number of observations in a year). Estimations ofizationl*l ,which unified a number of previously disparate extrerme value quantiles of the annual maxima are then ob-models, thus adding clarity to the modeling procedure. In tained by invering Fq. (4):the 1970's the classical limit laws were generalized by Pic-x=μ-°={1-[ -ln(1-p)]-6}(5)kands(4) , leading to sulbstantially improve modeling proce-dures, which were developed in the 1980's and 1990's. where G(x,) =1 -p. In extreme value terminology, x, isThroughout the 1980's the extremal behavior of a muchthe returm level associated with the retum period 1/p, andmore general class of processes admiting various types of it is common to extrapolate the relationship to obtain esti-non-stationarity and dependence was investigated.mates of return levels considerably beyond the range of the1.2 Extreme value distribution typesdata to which the model has been ftted.Suppose that x, x,,X。is a sequence of inde- 1.3 Generalized Pareto distributionpendent identically distribuled( ID) random variables withAs a procedure for statistical inference the techniquethe same distribution function. Let M。= max(x,x,,of modeling annual maxima with a GEV distribution isX,),then we can have the fllowing theorem:highly iefficient. For example, data may have been col-Theorem 1 ( Extremal types theorem) If there exist lected daily ,or even hourly ,but only one observation persequences of constants a。>0 and b. such that,as r-→∞year is included in the inference. This weakness has led tothe development of other characterizations which enable ex-°≤x)-→G(x)treme data other than just the maxima value to be incorpo-For some non-degenerale distibution C, then C is of the rated into the analysis. One of such methods to be consi-same type as one of the following distributions:dered was “threshold methods ”whereby all observationsI G,(x) =exp{ -exp( -x)} -∞ 0,a>0ue u, observations grealer than u are called superthresh-jexp!-(-x)"}x<0,a>0old. Pickands proved that superthreshold value Y matchesI G.(x)={ix≥0with generalized Paretn distribution.Conversely, each of these distribution , C, may appearP(Y>u+xIY>u)≈(1+ξ兰) -ve(6)asa limit for the distribution of (M。-b。)/an and, infact, does so when G itself is the distribution of X.Clltively, the three casses of dstribution in Theo- 2 Analysis of network trafficremI are referred to as extreme value distibutions, withtypes I ,II and I widely known as the Gumbel , FrechetVideo taffie is expected to be a significant componenland Weibull types respectively['].of the taffic in integrated service packet networks. There-For satisical purposes it is inconvenient to work with fore, our research work on taffil modeling is based on vid-three distinet classes of limiting distribution, as in Theo-eo trafic from Rose[5]. We choose one typical MPEG-1rem 1, so it is usual to adopl a parameterization which en-trace (a file star2_ . IPB) to analyze. Some simple statisti-compasses all three types. Jenkinson5l derived the gener-cal quantities of the trace are shown in Tab. I. The foldalied extreme value dstribution (GEV), with dsribution line and frequency plots are shown inFig 1 and Fig.2re-function;spectively.Now,we explain these paramelers in the following:()=xp{-[1+(=])“}(4)1) Mean:defined on {x:1 +ξ(x-μ)/σ>0}. Thetype I and typex=⊥&x(7)肛classes of extreme value distribution correspond respee-tively to the cases ξ >0 and ξ <0 in this parameterization,中国煤化Inimum and maximumwhile the type class I arises in the limit asξ→0, The pa- respeYHCNMHG-132-SHU Yan-tai et al :Analysis of Netvork Trufjic with Extrene Vulue Thery⊥E(x;-x)°120 0(11)营10ux1片(x-*)”The Kurtosis of normal distribution is 3. If b. >3,! 600then it characterizes heavy-lail, i. e.,there are many ob-hMservations far from mean value.0 (NDTab.1 Summary statistics for data in file star2. . IPBlin2.720 000 x 1021000 2000 3000 4 oi dIst Qu2. 392 00x10)Tim:/(1/24 *)9. 313 200 x 10PMedian4. 432 00x10)Fig. 1 Foldline plot of network traftic_3nd Qu9. 160 000x10000021. 248 160x10Variance1. 664 803 x10*Standard Deviation1.290 273 x104SE Mean .6. 451 363 x10Skewness2.738 999Kurtosis8. 903 514From Tab.1, Fig.1 and Fig.2, we can arrige thesefollowing points:1) Network taffc is basically distributed around the27264 272 96 272number 9 313.2.Nerworik tatli ; bit2) From Fig. 1, the fluctuation o[ network taffie hasFig.2 Probability plot of network traffcstrong autocorelation. It has somne continuity for high/lowfluctuation.3) 1st Qu and 3rd Qu represent “first quartile" and3) From Fig.2, the distribution shape has right“third quartile" respectively. Quartile is any of the threeavertence.values that divides the items of a frequency distribution in-4) It is high discretionary for extreme value far awayto four clases with each containing one fourth of the total from the distribution center.population.5) In Fig. 1, it has no reguarity when large quantity4) Median: a value in an ordered set of values below of trffce occurred.and above which there is an equal number of values or6) Due to Kurtosis > 3,it shows the character ofwhich is the arithmetic mean of the two middle values if heavy tail clearly.there is no middle number.These phenomena support the applicability of extreme5) Variance: the square of the standard deviation.valuevar=s=_&(x-x)3(8)3 Modeling6) Standard deviation:std=√var=S(9)We build a model with threshold methods and only7) Skewness: lack of symmetry in a frequency distri- consider some trffc data that are greater than thresholdbution.value, because only those packets can cause network con-工文(x,-x)’gestion. Stuart Coles has witen various functions in S-b,=-(10)PLUS to support extreme value modeling 63.1 Threshold value choice[(x,一到“It should be careful to choose a proper threshold val-8) Kurtosis: the peakedness or flatness of the graph ue. It is also the issue of how t0 make this choice ofof a frequency distribution especially with respect to the threshaldmnice, it needs to deter-concentration of values near the mean as compared with the mine中国煤化工r. There is a trade-ofnormal distribution.in thCNMHGaretoolowinecurbi--133-Transactions of Tianjin University Vol.9 Na.2 2003as due to the invalidity of the asymptotie argument, while1-[ -1n(1-1-+-1-1012}1thresholds which are too high have few exceedances and soWe simulate our model using S-Plus software andsampling variability is high. One method of threshold veri- some program of extreme value theory4. Modeling resultsfication is based on the likelihood analysis itself. The mod-are shown in the fllwing Fig. 3. Fig. 3 is probability-el with parameters is valid at all levels above some suitably probability and quantile-quantile plots.high threshold. Thus, if we make the model ft at a rangeof thresholds, then we should observe some stability in theparameter estimates ( relative to their sampling variability )0.8at a threshold where the asymptotics are valid. The thresh-0.6old value should be chosen such that the shape parameterof extreme value distribution is stable in some range, i. e. ,the shape parameter varies lttle when the threshold value0.2changes. Through simulation, we choose the number38 000 as the threshold value.).2.8.03.2 Modeling resultsEmpiricalNow, the problem is how to make inferences about(a) Probability plolhe parameters (μ, o, ξ ). Numerous suggestions havebeen made, generally based on: graphical techniques ;100 000moment-based estimators; likelihood. For many reasonslikelihood-based methods are prelerable. The theory of80 00likelihood estimators is well-understood , and the inferences晋6000are easily modifed to incorporate more complex modelstructures. There is one slight technicality as to whether40 00the usual regularity conditions of maximum likelihood are20 0000applicable here, since the end-point of the GEV distribu-20000 40000 60000 80000 100000 200 000tionμ士σ/ξ is parameter dependent. This isue was stu-died earefully by Smith[7] . He obtained the fllowing re-(b) Quanile plotFig.3 Probability and quantile plots of model comparedsults:with empirical distribution1) Whenξ> -0. 5, maximum likelihood estimatorsare completely regular;A quantile-quantile plot is a graphical data analysis2) When -1<5< -0. s, maximum lkelihood esti- tchnique for comparing the dsribuion of2 data sets. Themalors exist, but non-regular ;lot consists of the fllowing: Vertical axis is estimated3) Whenξ< -1, maximum likelihood estinators do quantiles from data set 1; Horizontal axis is estinatednot exist.quantiles from data set 2. The quantiles of a distributionBased on threshold value methods,the estimated val-are the distribution's percent points (e.g,the 50% pointues (33 045.34, 14 284. 06, -0.070 217 24) of ex- is the median). The advantage of the quantilequaniletreme value distibution's parameters (μ,σ,ξ) are ob- plot is 2-fold; the sample sizes do not need to be identical;tained using maximum likelihood estimators in S-Plus sof- many distributional aspects can be simultaneously testedware,( For example , changes in symmetry/ skewness, etc). TheFrom Eq. (5), we have the extreme value distibution quantile-quanile plot has 2 components: the quantileunderp{Q>x|:points themselves; a 45 degree reference line. The proba-bility plot has the similar meaning to the quantile-quantile(=)=ep{ -1+.070217) 24( 53414 284.06 )plot, so just the quantile is replaced by probability.Frum Fig.3, we can get the following results:Also, trafie value also can be obtained under certain1) In the extreme situation, the quantile of model ba-probabilily p:14 284. 06fectiv中国煤化工x=33 045.34--0.070 21724YHC NM H Gility levelp, at some一134-SHU Yan-ai et al :Analysis of Netwvork Trufie with Extreme Value Thearyrange, it is satisfied with the estimated results of cxtreme used to build a time series model and very easy 10 be ap-value theory. However, with the increase of probability plied to prediction-based network control. Extreme valuelevel , the bias error of extreme value distribution and em- theory modeling method starts from slalistics, so it needspirical distribution increase rapidly. It ilustrates that it has some other works for network traffic prediction. However,a limit scope of application with the extreme value theory. it reduces cormputation overhead with extreme value theoryand it is possible to be applied to real-time network con-4 Comparison with FARIMA(p,d,q) model trol.In the highspeed network, it is very important to 5 Conclusionsanalyze the behavior and characteristics of network traffic.It will be of benefit to network design, bandwidth llcaThis paper analyzed the characteristics of networktion and resource management. In our previous work, we trffic with the extreme value theory and built models forhave studied a method to model and predict various video real network trffc. The analytical results iluslratle thattraffic using FARIMA models.extreme value theory has a good applicability foreground inWe have built an FARIMA(p,d,q) model of the the statistic analysis of network taffe. We simulated theMPEG-1 trace (a file star2_ . IPB). The comparison of the proposed models with the trace using S-Plus software. Themodel with trffice trace is shown in Fig. 4. And we have modeling results are consistent with our analytical results.applied our FARIMA(p,d,q) model to adnmission con-Furthermore, since some traffe dala greater than thetrol 8] and bandwidth allocation schemel9. Our experi- threshold are processed, the conputation overhead is re-ments showed that the FARIMA model is a good trafic duced greatly. It indicales that the extreme value theorymodel and is capable of capturing the properties of video may be applied to real time network control. In our futuretraffie because the FARIMA processes are much more flexi- work, we will use the extreme value theory to analyzeble and capable of simultaneously modeling both the long- queue length in order to improve buffer management mech-range and the short-range dependent behavior of a time se- anisms to alleviate network congestion, such as RED.ries. It also showed that it was possible to simplify the FA-ReferencesRIMA modeling. This method has less number of modelparameters and keeps same order of model on quite a large[1] Fisher R A,Tippeu L H C. Limiting forns of the frequency disrbu-tion of the largest or amalleat member of a sarmple[ A]. Proceedings oftime scale. Consequently, one can reduce the time toCambridge Philosphieal Sociery[ C]. Cambridge University Press,build trffc models and do real-time modeling and predic-1928.24;180--190.tion for the prediction- based admission control algorithm[2] Gnedenka B. Sur la distibution linite du lerme maximum d' une strieand bandwidth allocation scheme.aleatoire[J]. Annals of Malh, 1943. 44: 423- -453.] Jenkinson A F. The frequeney distribution of annual marimum ( or250 00minimum) values of mneteorological elenents[J]. Quarterly Journal ofthe Royal Meteorological Sociey, 1955, 81: 158-171.[4] Pickunds J m. Stistcael ifernce using extreme order staistics云1000J]. Ann Stais, 1975, 3: 19--131.[5] Rose 0. Simple and Eficient Models for Variable Bit Rate MPEGVidco Tffe[ 2小. Technical Report 120, Insitute of Computer Sci-月50000ence, University of Wurbug. July 1995.---. Mods:l[6] Stuart Coles. Exrene value software[Z]. htp://ww. stats. bris. ac.uk/ - masge/ , 2001. .[7] Smith R L. Staistics uf exreme value[ A]. Procedingv of 45th Ser-Tsion of ISI[C]. Amnstendam, 1985. Book IV, Paper 26. 1.Time/s[8] Sbu Yanai, Jin 2higang, Wang Jidong et al. Prediction based admis-Fig.4 Comparison of traffc trace withsion control uing FARIMIA models[A]. Pocedingns of the 2000FARIMA(p ,d,q) modelIEEE Intemnatiornal Conference on Communications[ C]. New Orleans,Louisiana, USA, 2000: 1325--1329.With the comparison of extreme value theory and FA-[9] Jin Zhigang, Shu Yantai, Liu Jiakun et al. Prediction based nelwuckhandwidth contro][ A1. Proceedines of the IEEE Canulian ConferenceRIMA(p,d ,q) model, each of them characters and analy-中国煤化工C]. Hilla, Nova Soia,zes network trafic well. FARIMA (p,d,q) model can beMHCNMHG- 135-

论文截图
版权:如无特殊注明,文章转载自网络,侵权请联系cnmhg168#163.com删除!文件均为网友上传,仅供研究和学习使用,务必24小时内删除。