Traffic Modeling and Probabilistic Process Abstraction

  • Journal: Journal of China University of Mining & Technology (English Edition)
  • File size: 794 KB
  • Author: HU Lian-ming
  • Affiliation: Computer Science Department
  • Updated: 2020-11-10
  • Downloads:
Paper Summary

Dec. 2003    Journal of China University of Mining & Technology    Vol. 13 No. 2

Traffic Modeling and Probabilistic Process Abstraction

HU Lian-ming (胡廉明)
(Computer Science Department, Xi'an Jiaotong University, Xi'an, Shaanxi 710049, China)

Abstract: State-based models provide an attractive and simple approach to performance modeling. Unfortunately, this approach gives rise to two fundamental problems: 1) capturing the input loads to a system efficiently within such presentations; and 2) coping with the explosion in the number of states when a system is compositionally presented. Both problems can be regarded as searching for some optimal representative state model with a minimal cost. In this paper a probabilistic feedback search approach (popularly referred to as a genetic algorithm) is presented for locating good models with low (state) cost.

Key words: traffic; model; performance; automaton; probabilistic
CLC number: TN911    Document code: A    Article ID: 1006-1266(2003)02-0226-05

1 Introduction

The performance estimates given by a model of a system are heavily dependent both on correctly estimating the parameters governing the behaviour of the system's components and on obtaining a suitable model of the loading imposed on the system by external events. The usual method for obtaining models of traffic is to measure statistical properties of its behaviour and then use a suitable probability distribution to provide a source of traffic with the same statistical properties. But many sources of traffic that we currently wish to model cannot be captured by such simple probabilistic approaches, which places considerable limitations on the accuracy of the predictions obtained from analytic or simulation models.

A related problem consists of taking a long sequence of behaviour (e.g., the relative distances of memory read and write requests, or the actions of an animal observed by a camera) and discovering its structure in order to enhance or recognize the system performance. The traffic problem means recognizing the states within a long sequence of actions and then deriving a small model that reproduces the observed behaviour.

In this paper we take an approach to these problems based on the observation of a particular automaton. Such an automaton can rapidly produce a sequence of actions according to its assigned probabilities, and this sequence can then be used as the source behaviour for a new model of much smaller complexity. State-based performance models have a long history in the form of Markov chains[1], and they have recently become the focus of some attention within the concurrency theory community in the form of probabilistic extensions of process algebras and stochastic Petri nets. These systems are often used as an alternative to simulation languages; sometimes they provide more expressive synchronizations and occasionally permit complete formal analysis of the performance of concurrent systems.

In this paper we also formalize our approach and put it to the test through a variety of experiments. Finally, we draw our conclusions.

Received date: 2003-06-22
Biography: HU Lian-ming (1969-), male, from Leshan, Sichuan Province, Master's graduate student, engaged in research on computer networks and database management systems.
2 The Problem

We define a variant of probabilistic automata[2]. It can provide models for probabilistic process algebras or stochastic Petri nets in the same way as Markov chains.

Definition 2.1 A weighted automaton is a tuple M = (Q, A, T) consisting of a set Q of states, an alphabet A and a transition relation T ⊆ Q × (A × [0, ∞)) × Q, written q —a[p]→ q', with a ∈ A and p ∈ [0, ∞). (It can be written q —a→ q' if p = 1.0.) The parameter p denotes the weight, or relative frequency, of the transition.

By dividing the weight of a transition by the sum of the weights leaving its source state, we can interpret it as the probability of the transition. This approach permits us to ignore the constraint that the sum of 'probabilities' must be 1, which is a considerable convenience when we wish to manipulate the automata: we can leave the necessary renormalization to the point at which we evaluate the behaviour of any particular state.

Definition 2.2 Given a weighted automaton M, we use seq_n(M) to denote a sample sequence of length n (i.e. an element of A^n) which is generated by the automaton by following n transitions in accordance with their relative frequencies. Clearly this sequence will be subject to probabilistic variance.

In order to manipulate the output of our automata, we introduce a variable-weighted automaton.

Definition 2.3 Given a set of names P, a parameterized automaton is a tuple M_P = (Q, A, P, T) consisting of a set Q of states, an alphabet A and a transition relation T ⊆ Q × (A × ([0, ∞) ∪ P)) × Q, written q —a[p]→ q'. Given a substitution α : P → [0, ∞), M_P(α) denotes the weighted automaton obtained by substituting the value α(p) for each name instance p in M_P.

A probabilistic feedback search, which is often referred to as a genetic algorithm[3], is used to find an efficient state-based model for generating a given sequence s. We define this algorithmically as follows (a code sketch of this search loop is given at the end of this section):
1) Choose a parameterized automaton M_P and an initial substitution α_0;
2) Repeat the following for a predetermined number of iterations:
   a) generate from α_0 a set of substitutions α_1, …, α_k;
   b) for 0 ≤ i ≤ k compute the distance d_i ← μ(s, seq_n(M_P(α_i))) between s and a sample sequence generated by the automaton M_P(α_i);
   c) take the i ∈ {0, …, k} which minimizes d_i and set α_0 ← α_i;
3) Return M_P(α_0) as our model.

Fig. 1 Two automata

In the above there are several areas of 'black art' that can only be addressed by experimentation, specifically:
1) the initial choice of M_P;
2) the initial substitution α_0;
3) the definition of the distance μ.

The initial substitution can always be chosen according to where the parameters appear within the automaton, and this would seem to be a good unbiased starting point. The distance function can either be dictated by the underlying probabilistic method that we adopt or, more usually, by compute time.

Clearly the choice of alphabet is dictated by the alphabet of the source string that we wish to match, and the number of states of the automaton can be regarded as the price we are willing to pay for the model. The choice of automaton may be influenced by prior knowledge of the structure of the source string or by a wish to minimize the number of parameters in the search.

We use generative automata throughout the following[4]. These automata have the restriction that all transitions out of a given state are labelled by the same action; this restriction still allows any source string to be matched, since removing arcs merely removes paths from a state and the automaton remains generative. It should be noted that this restriction does not necessarily impose a structure on our string. For example, consider the two automata of Fig. 1.
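To make the search procedure above concrete, the following is a minimal sketch, not the author's implementation, of a weighted automaton with seq_n sampling and the probabilistic feedback loop of this section, written in Python. The class and function names, the Gaussian perturbation of substitutions, and the generic distance argument are illustrative assumptions.

```python
import random

class WeightedAutomaton:
    """Weighted automaton: transitions[q] is a list of (action, weight, next_state)."""
    def __init__(self, transitions, start):
        self.transitions = transitions
        self.start = start

    def seq_n(self, n):
        """Sample a sequence of n actions, choosing transitions by relative weight."""
        q, out = self.start, []
        for _ in range(n):
            arcs = self.transitions[q]
            if not arcs:                                 # no outgoing transitions
                break
            total = sum(w for _, w, _ in arcs)           # renormalize only when sampling
            r, acc = random.uniform(0.0, total), 0.0
            for action, w, nxt in arcs:
                acc += w
                if r <= acc:
                    out.append(action)
                    q = nxt
                    break
        return out

def feedback_search(build, alpha0, source, distance, iterations=1000, k=10, n=1000):
    """Probabilistic feedback (genetic-style) search over parameter substitutions.

    build(alpha)  -> WeightedAutomaton obtained by substituting alpha into M_P
    alpha0        -> initial substitution, a dict {name: weight}
    distance(s,t) -> distance mu between the source string and a sample sequence
    """
    alpha = dict(alpha0)
    for _ in range(iterations):
        # generate k perturbed substitutions plus the current one
        candidates = [alpha] + [
            {p: max(0.0, w + random.gauss(0.0, 0.1)) for p, w in alpha.items()}
            for _ in range(k)
        ]
        # keep the substitution whose sample sequence is closest to the source
        scored = [(distance(source, build(a).seq_n(n)), a) for a in candidates]
        _, alpha = min(scored, key=lambda pair: pair[0])
    return build(alpha), alpha
```

Because the current substitution is always kept among the k + 1 candidates, the best distance found never worsens from one iteration to the next in this sketch.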
After the first action, the process C is indistinguishable from the generative process H.

3 Experiments

It is difficult to prove anything other than basic convergence theorems in this setting. Therefore, we had to examine the effectiveness of the approach in an experimental manner. We tried a selection of modeling problems to act as an initial demonstration of effect. For this, we made use of the probabilistic automata modeling software WSCCS[5].

Our experiments aim to demonstrate the following problems:
1) the modeling of 'bursty' traffic, a well-known difficult matching problem;
2) the modeling of probabilistic traffic containing structure, a demanding test of an unstructured approach;
3) the generation of a small abstract model of a complex system when modeled precisely, to demonstrate the potential of the approach for system simplification;
4) an attempt to generate a point probability distribution, which is very demanding of the underlying representation of time distributions in stateful systems;
5) an attempt to capture distributions between upper and lower bounds; such distributions are important for timed systems, and bounds permit us to derive results regarding, e.g., guaranteed response times.

3.1 Experiment 1: 'Bursty' traffic

We took a sample sequence of 100 000 bursty traffic events and attempted to find a model starting from a six-state automaton T_1, …, T_6 with parameterized a- and b-transitions for 1 ≤ i ≤ 6 (the full transition table is not legible in this copy).

The notion of distance used is the sum of the squares of the differences between the frequencies of substrings (of lengths ≤ 4). For example, consider the strings aababbabb and abbabaaba. For substrings of lengths ≤ 2 we compute the distance as in Table 1 (a code sketch of this distance computation is given at the end of this subsection). Notice that this distance can be computed in a time which is linear in the length of the sample and the substring sampling length; this is important for the sample sizes we desire.

Table 1 String distance for substrings of length 2
Substring | String1 | String2 | Difference
[individual entries not legible]
Sum of squared differences: 0.044 6

The search was run 10 times for 10 000 generations of 10 perturbed automata, to select a good approximating automaton. For each run, 30 sample sequences of the resulting automaton were compared against the reference string and the mean distance calculated. Repeated runs give a good indication of how to choose an automaton matching the source.

To give an indication of how closely we could expect a model to get, we took a sample from the generating automaton and measured its distance from the reference signal. This appears as the 'Self' distance in Table 2. The distance from the reference to the starting automaton is given in a like manner in the 'Unbiased' entry. (For ease of reading, the resulting distances are multiplied by 100 000.)

Table 2 The 'Self' distance
Self: 188
Unbiased: 30 528
Eight trials: [entries not legible]

Despite the simplicity of the approach, our best result (run 3) appears to have greater similarity to the signal than the signal does to regenerations of itself. Clearly this demonstrates that, up to this distance measure, we have found a very good approximation.
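The substring-frequency distance used in Experiment 1 can be computed in one pass over each string. The sketch below is an illustrative Python rendering rather than code from the paper; the function name and the default maximum substring length of 4 follow the description in the text.

```python
from collections import Counter

def substring_distance(s, t, max_len=4):
    """Sum of squared differences between relative frequencies of all
    substrings of length 1..max_len in the two strings s and t."""
    def freqs(x, k):
        windows = [tuple(x[i:i + k]) for i in range(len(x) - k + 1)]
        counts = Counter(windows)
        total = len(windows)
        return {w: c / total for w, c in counts.items()}

    d = 0.0
    for k in range(1, max_len + 1):
        f, g = freqs(s, k), freqs(t, k)
        for w in set(f) | set(g):
            d += (f.get(w, 0.0) - g.get(w, 0.0)) ** 2
    return d

# Comparing the two example strings from the text, for substrings of length <= 2:
# substring_distance("aababbabb", "abbabaaba", max_len=2)
```

This function can be passed as the distance argument of the feedback_search sketch above; note that the exact numeric value depends on the normalization convention used for the frequencies, so it need not reproduce the entries of Table 1.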
3.2 Experiment 2: Structured traffic

In this case we used the automaton of Fig. 2 to generate the load pattern that the search was intended to match. The sequences generated from S_1 are of the form [(aa)+(bb)+]+. We used the same starting automaton as in the previous example, as well as the same experimental set-up. In this instance eight experiments were performed, and the distance measurement was computed in the same way as in the previous experiment.

Fig. 2 The automaton

In this instance the traffic was not probabilistic in nature, at least in respect of action pairs, and it places a considerable load on the searching system. Examination of the traces of generated traffic shows the presence of 'impossible' action sequences, but they are very rare. In other words, even when an unstructured starting model is matched against a highly structured system, a simple probabilistic search can find a well-structured model. The addition of truncation techniques could remove the final artifacts from the model.

3.3 Experiment 3: System abstraction

In this experiment we used the alternating bit protocol (ABP) as the source for the traffic to match. Milner[6] presented an implementation of the ABP in CCS and demonstrated that the protocol is correct. That implementation ignored the temporal and probabilistic properties of the system and its components; in our presentation we take such properties into account.

The basic design of the ABP consists of two processes, a sender and a receiver (a simulation sketch of this traffic source is given at the end of this subsection). The sender accepts messages one by one from its environment and forwards these to the receiver tagged with alternate bits (the odd-numbered messages being tagged with a 1 and the even-numbered messages tagged with a 0). After sending a message tagged with b, the sender awaits the return of the tag b from the receiver as an acknowledgement that the message has successfully arrived.
1) It ignores acknowledgements b', which represent acknowledgements of an earlier message.
2) If it gets the acknowledgement b, it accepts the next message for sending from the environment.
3) If it is kept waiting too long, it assumes that the message is lost and re-transmits it.

The receiver Re works in an analogous way. After it receives a message tagged with b, it returns the tag b as an acknowledgement and awaits the next message, tagged with b'.
1) It ignores messages tagged with b, which are re-transmissions of the earlier message.
2) If it gets a new message tagged with b', it returns the tag b' acknowledging its arrival.
3) If it is kept waiting too long, it assumes that the acknowledgement is lost and re-transmits it.

At each tick of some clock, the sender and receiver retry with rt whilst they are waiting for an acknowledgement or for the message bit to change. Messages are lost by the medium with probability err in each transmission. The pattern of traffic used involves two actions, a and b, representing the reception of a new message and the ticking of the clock, respectively. The variable settings used in the experiment, rt = 0.2 and err = 0.1, were chosen within a reasonable range. We used the same starting automaton as in the previous example, as well as the same experimental set-up. Distances given are means from 30 samples as before (Table 4).

Table 4 The results of experiment 3
Trial | Distance
Self | 19
Unbiased | 150 822
Ten trials | [entries not legible]

It is obvious that we can generate models whose time behaviour would be difficult to distinguish from that of the original system. From a modeling perspective, these abstractions could take the place of the ABP in further, more complex analyses requiring the ABP as a component.
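The ABP traffic source described above can be rendered as a small discrete-time simulation. The following Python sketch is an illustrative reconstruction under the stated parameters rt = 0.2 and err = 0.1; the function name, the single-tick medium, and the way the a/b action stream is emitted are assumptions made for illustration, not the WSCCS model used in the paper.

```python
import random

def abp_trace(n_ticks, rt=0.2, err=0.1, seed=None):
    """Discrete-time sketch of an alternating-bit-protocol traffic source.

    Emits 'b' at every clock tick and 'a' whenever the sender accepts a new
    message from the environment.  Messages and acknowledgements are each
    lost with probability err; a waiting sender re-transmits with
    probability rt per tick.
    """
    rng = random.Random(seed)
    trace = []
    send_bit, recv_bit = 0, 0        # sender's current bit / bit the receiver expects
    msg_on_medium = ack_on_medium = False
    waiting = False                  # sender is waiting for an acknowledgement

    for _ in range(n_ticks):
        trace.append('b')                              # the clock ticks

        if ack_on_medium:                              # ack from a previous tick arrives
            ack_on_medium = False
            if recv_bit != send_bit:                   # it acknowledges the current message
                send_bit ^= 1
                waiting = False

        if not waiting:                                # accept and send a new message
            trace.append('a')
            waiting = True
            msg_on_medium = rng.random() >= err
        elif not msg_on_medium and rng.random() < rt:  # timeout: re-transmit
            msg_on_medium = rng.random() >= err

        if msg_on_medium:                              # message arrives at the receiver
            msg_on_medium = False
            if recv_bit == send_bit:                   # a new message, not a re-transmission
                recv_bit ^= 1
            ack_on_medium = rng.random() >= err        # (re-)acknowledge

    return trace
```

Feeding a trace produced this way to the search of Section 2 corresponds to the set-up of this experiment, although the exact probabilistic timing of the original model may differ from this sketch.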
4 Conclusions

We can exploit this special demand to simplify systems; in particular, we can exploit the probabilistic nature of performance problems to simplify the state models. Moreover, we can use (upper and lower) bounds on the system to guarantee the accuracy of the estimates for system performance.

The difference between our approach and the classical methods lies in the simulation and the model set. In our approach the data are drawn, and the parameters are substituted into the distribution. It is important to remember that this approach can include states in the intended model distribution. Clearly such a method implies that the modeller knows the theoretical distribution to which the data belong; if the processes underlying the observed data are unknown, the modeller must derive a distribution from the data. In our approach we start off with a system capable of expressing any theoretical distribution and allow the data to drive the choice of model in this space.

We can search for the number of states used by the model in two ways. Firstly, according to the model drawn from one of the starting forms described in 2.1, instances of known size can be obtained directly; the minimization problem can then be regarded as taking a sequence of models of increasing size and optimizing each. Secondly, we could generate an optimized model of a particular size and check the possibility for minimization. If it is impossible to minimize the model, it is necessary to attempt a match with a larger model.

References
[1] Keilson J. Markov chain models[J]. Applied Mathematical Science, 1989, 28(6): 128-156.
[2] Paz A. Introduction to probabilistic automata[M]. New York: Academic Press, 1981.
[3] Goldberg D. Genetic algorithms[M]. Washington: Addison-Wesley, 1989.
[4] Steffen B. Reactive, generative and stratified models of probabilistic processes[C]. In: Proc LICS. New York, 1990.
[5] Tofts C. Exact, analytic, and locally approximate solutions to discrete event simulation problems[J]. Simulation Practice and Theory, 1998, 6(8): 721-759.
[6] Milner R. Communication and concurrency[M]. New York: Prentice-Hall, 1989.
