Markov Discovers the Theory of Linked Probabilities

Andrey Andreyevich Markov’s formal development of linked probabilities (Markov chains) provided new computational models for a wide variety of random processes.


Summary of Event

Before Pafnuty Lvovich Chebyshev, Andrey Andreyevich Markov, and Andrey Nikolayevich Kolmogorov, a number of individuals, mainly natural scientists applying statistical methods, had offered oblique working definitions of key statistical properties. These definitions, however, frequently did not coincide with one another, and in some cases they were even contradictory. This was also true of basic statistical methods such as the use of histograms in count data analysis. A histogram is constructed by dividing the horizontal axis into segments, or classes, of a certain data value (such as size or temperature) that cover the entire range of data values. Over each segment, a rectangle is constructed whose height is proportional to the number of data points in that class. Such a histogram shows the relative class frequencies, which, in the limit of smaller subdivisions and larger numbers of samples, approach the underlying probability density distribution of the population.
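
A minimal sketch of this construction in Python illustrates the idea; the normally distributed sample source and the choice of twenty classes are illustrative assumptions, not part of the historical account:

    import random

    # Draw samples from an assumed population (here, a normal distribution).
    random.seed(1)
    samples = [random.gauss(0.0, 1.0) for _ in range(10_000)]

    # Divide the horizontal axis into equal-width classes covering the data range.
    num_classes = 20
    low, high = min(samples), max(samples)
    width = (high - low) / num_classes
    counts = [0] * num_classes

    for x in samples:
        k = min(int((x - low) / width), num_classes - 1)  # class index of x
        counts[k] += 1

    # Relative class frequencies; with more samples and narrower classes these
    # approach the underlying probability density (times the class width).
    for k, c in enumerate(counts):
        left = low + k * width
        print(f"[{left:6.2f}, {left + width:6.2f})  frequency = {c / len(samples):.3f}")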

A key example of functional, yet vague, statistical concepts awaiting further development was Jakob Bernoulli’s so-called law of large numbers. Formally stated, Bernoulli’s theorem asserts that the frequencies of occurrence of independent chance events must eventually converge, in the limit of a large number of observations, to the underlying probability law. In other words, the theorem states that the experimental histogram of samples from a statistical population will match the theoretical probability distribution function defining that population ever more closely as the number of selected samples increases without bound.
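
The content of the theorem, though not Bernoulli’s original argument, can be illustrated with a short Python sketch; the event probability of 0.3 and the checkpoints are arbitrary assumptions:

    import random

    random.seed(2)
    p = 0.3                      # assumed underlying probability of the event
    checkpoints = {10, 100, 1_000, 10_000, 100_000}
    successes = 0

    for n in range(1, 100_001):
        if random.random() < p:  # one independent trial
            successes += 1
        if n in checkpoints:
            # the observed frequency settles toward p as n grows
            print(f"n = {n:6d}   frequency = {successes / n:.4f}")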

Even by the late 1880’s, satisfactory proofs under general assumptions had not been found for the law of large numbers, nor had its limits of applicability been well determined. This was particularly true for events whose statistical independence or interrelation was not known or determinable beforehand. Thus, for example, the German statistician Wilhelm Lexis, although arguing that the statistical interpretation of natural events should be modeled conceptually on the results of a notional urn drawing, did not believe that evolutionary or otherwise linked or connected series were susceptible to formulation in terms of quantitative probability theory. As numerous investigators pointed out near the beginning of the twentieth century, in its original (chiefly heuristic) form the law of large numbers said nothing useful about the precise manner, or functional form, in which statistical averages approach their limit. Many statisticians believed the law of large numbers to be true only for statistically independent events or trials. Nevertheless, without a more rigorous law of large numbers, probability theory would lose its common intuitive foundations, and so the issue could not simply be dismissed.

In contrast to the German and French schools of statistics, Russian statisticians (almost exclusively at the University of St. Petersburg) were strongly theoretical in focus, more interested in formal existence and consistency conditions than in applications of statistical laws. It was precisely the members of this group who, from their own perspective, addressed the problems of consistently defining the interpretation and scope of application of the law of large numbers. In 1867, Chebyshev found the first elementary proof of the law of large numbers as well as the “central limit theorem.” As an advanced graduate student of Chebyshev, Markov in 1884 published a shorter and more perspicuous reformulation of Chebyshev’s proofs, which began a series of exchanges and further clarifications of the basic concepts underlying, and the scope of application of, the large number and central limit theorems to linked as well as to totally independent events. From a purely theoretical perspective, Markov approached these questions indirectly by considering whether the law of large numbers applied equally to dependent and independent random variables. Then, using his model of weak (linked, or conditional) dependence to verify the law of large numbers, he examined whether sums of dependent variables satisfy the central limit theorem.

Specifically, in subsequent work Markov focused closely on the theoretical laws governing what is in effect the convergence of empirical histograms to theoretical “probability distribution functions” for a variety of conditions of weak statistical dependence, or sample interlinking, as well as independence. The general classes of weakly linked statistical events that Markov considered originally can be viewed as a generic mathematical model of a random process with only specifically delimited aftereffects or with a very short-term memory. This model describes any physical or other system in which the probability of change or transition from one state to another depends only on the state of the system at the present or immediately preceding time and not on the longer prior history of the process.

A perennial problem by the end of the nineteenth century, related to the issue of independent versus dependent statistical data, was that of estimating the statistical variance of time or space averages of ocean, weather, geological, or other statistically measured data that could not, strictly speaking, be assumed to be totally independent, random, or unconnected, by virtue of their (causal) adjacency. One of many practical examples of events that both common sense and science describe as linked is meteorological observation. In 1852, the Belgian mathematician Adolphe Quetelet proposed what is probably the first probabilistic model to fit weather observations of consecutive rainy (or cold) and dry (or warm) days. To account for the frequent persistence, or temporal linkedness, of rainy (or dry) weather, Quetelet devised a simple mathematical formula that explicitly captures an apparently random element but also exhibits the influence or effect of one or more previous elements on subsequent events. Neither Quetelet nor other largely empirical, science-oriented statisticians of the era pursued further the more general possibilities of accounting for linked events or processes.
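
Quetelet’s own formula is not reproduced here, but the kind of persistence it was meant to capture can be sketched in Python with hypothetical transition probabilities (rain follows rain with probability 0.7, rain follows a dry day with probability 0.2):

    import random

    random.seed(3)
    # Hypothetical persistence probabilities (not Quetelet's own values):
    P_RAIN_AFTER_RAIN = 0.7
    P_RAIN_AFTER_DRY = 0.2

    state = "dry"
    run_lengths = []     # lengths of consecutive rainy spells
    current_run = 0

    for _ in range(100_000):
        p_rain = P_RAIN_AFTER_RAIN if state == "rain" else P_RAIN_AFTER_DRY
        state = "rain" if random.random() < p_rain else "dry"
        if state == "rain":
            current_run += 1
        elif current_run:
            run_lengths.append(current_run)
            current_run = 0

    # Persistence shows up as rainy spells lasting several days on average.
    print("average rainy-spell length:", sum(run_lengths) / len(run_lengths))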

Markov stumbled onto the formal properties, and applications potential, of a general method for describing events with linked probabilities at the beginning of the twentieth century. In his 1906 paper “Extension of the Law of Large Numbers,” Markov proved for the first time that both the number of occurrences of a studied event and the sequence of its associated random variables obey the law of large numbers. Assuming a simple one-link-only (first-order) chaining of events and an unconditional total probability p for the event, Markov first derived the mth-order transition probability, the probability governing a change of binary-determined states across m successive events, to be given by the relation Rm = p + (1 – p)(p1 – p2)^m, where p1 and p2 denote the one-step probabilities of the event conditional on its occurrence or nonoccurrence, respectively, in the immediately preceding trial. This relation can be interpreted as linking the general conditions under which a system of equations has a unique solution with the specific parameters defining the chaining of events. As Markov stated in his 1907 paper “Extension of the Limit Theorems of Probability Theory to a Sum of Variables Connected in a Chain,” these weakly dependent, or linked, sequences are definable as “numbers connected into a chain in such a manner that, when the value of one of them becomes known, subsequent numbers become independent of the preceding ones.”
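
Markov’s relation can be checked numerically for a two-state chain by comparing it with powers of the transition matrix; in this sketch, p1 and p2 are hypothetical one-step probabilities, and p is taken as the corresponding unconditional (stationary) probability of the event:

    import numpy as np

    # Hypothetical one-step probabilities for a two-state (0/1) chain:
    p1 = 0.6   # probability of state 1 given state 1 at the previous step
    p2 = 0.2   # probability of state 1 given state 0 at the previous step
    p = p2 / (1 - p1 + p2)   # unconditional (stationary) probability of state 1

    P = np.array([[1 - p2, p2],
                  [1 - p1, p1]])   # rows: current state 0/1, columns: next state

    for m in range(1, 6):
        lhs = np.linalg.matrix_power(P, m)[1, 1]   # m-step probability 1 -> 1
        rhs = p + (1 - p) * (p1 - p2) ** m         # Markov's closed-form R_m
        print(m, round(lhs, 6), round(rhs, 6))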

Likewise, in publications in 1908 and from 1910 to 1912, Markov considered more complex (higher-order) linkings or chainings of events whose probabilities of occurrence depend on the outcomes of two or more prior trials. In modern probability theory, a sequence is considered to be a Markov chain if the probability law governing its underlying process has connection, link, or memory extending only to a fixed, finite number of immediately prior events. In other words, in the Markov model, given the present, the future is independent of the past. More generally, random processes having such dependence between successive terms as an intrinsic property of the underlying process are defined as Markov processes.
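
One standard way to handle the higher-order chains Markov considered is to recast them as first-order chains whose states are pairs (or longer blocks) of consecutive outcomes; the second-order rule in the following sketch is an illustrative assumption:

    import random

    random.seed(4)
    # Hypothetical second-order rule: probability that the next symbol is 1,
    # given the two most recent symbols (older, newer).
    P_NEXT_IS_1 = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.8}

    state = (0, 0)          # the pair of the two most recent symbols
    counts = {}

    for _ in range(50_000):
        nxt = 1 if random.random() < P_NEXT_IS_1[state] else 0
        state = (state[1], nxt)          # slide the two-symbol window forward
        counts[state] = counts.get(state, 0) + 1

    # Long-run frequencies of the pair states of the equivalent first-order chain.
    total = sum(counts.values())
    for pair, c in sorted(counts.items()):
        print(pair, round(c / total, 3))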

The classical example of a (multiple) Markov chain, originally given by Markov himself as a semiserious test of the efficiency of information transfer, is that of the written (Russian) language in Alexander Pushkin’s novel Evgeny Onegin (1825-1833; Eugene Onegin, 1881). Here, letters of the alphabet are subdivided into type states, denoted by 0 if a vowel and 1 if a consonant, so that a page of written text appears as a sequence of 0’s and 1’s. The vowels and consonants form a first-order Markov chain if, given any string of letters, the probability that the next letter will be a vowel or a consonant (0 or 1) is the same as the probability computed knowing only the last letter of that string.
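
The counting behind such an analysis can be sketched as follows; the English placeholder sentence (rather than Pushkin’s Russian text) and the simple vowel set are assumptions of the sketch:

    # Estimate first-order transition probabilities between vowels (0) and
    # consonants (1) from a sample text, in the spirit of Markov's count.
    text = "onegin is a novel in verse written by alexander pushkin"  # placeholder
    VOWELS = set("aeiou")

    symbols = [0 if ch in VOWELS else 1 for ch in text if ch.isalpha()]

    counts = {(a, b): 0 for a in (0, 1) for b in (0, 1)}
    for prev, nxt in zip(symbols, symbols[1:]):
        counts[(prev, nxt)] += 1

    for a in (0, 1):
        row_total = counts[(a, 0)] + counts[(a, 1)]
        for b in (0, 1):
            label = (f"P({'vowel' if b == 0 else 'consonant'} | "
                     f"{'vowel' if a == 0 else 'consonant'})")
            print(label, round(counts[(a, b)] / row_total, 3))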

Although there is no direct evidence that the St. Petersburg school’s original work to extend and validate probability theorems for the case of dependent variables was motivated by prior publications of Lexis or his contemporaries, between 1907 and 1911 Markov and a colleague repeatedly referred to several examples and applied problems from those publications. In addition, Markov and his colleagues generally indicated several other examples of physical phenomena exhibiting linked probabilistic behavior. These included the theories of molecular random (“Brownian”) motion of Albert Einstein and Paul Langevin, biological theories of the extinction of genetic families, and the so-called random walk problem. (The random walk, paradigmatic for many probability applications, is defined generically as motion along a straight line in which, at each unit of time, a particle moves one unit left or right, or not at all, with probabilities that depend only on its current position.)
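
A minimal simulation of such a walk, with a hypothetical position-dependent rule that nudges the walker back toward the origin, might look as follows:

    import random

    random.seed(5)

    def step_probabilities(x):
        # Hypothetical position-dependent rule: the walker is nudged toward 0.
        if x > 0:
            return 0.5, 0.3, 0.2    # P(left), P(stay), P(right)
        if x < 0:
            return 0.2, 0.3, 0.5
        return 0.35, 0.3, 0.35

    x = 0
    visits = {}
    for _ in range(100_000):
        p_left, p_stay, _ = step_probabilities(x)
        u = random.random()
        if u < p_left:
            x -= 1
        elif u < p_left + p_stay:
            pass                     # stay in place
        else:
            x += 1
        visits[x] = visits.get(x, 0) + 1

    # Under this assumed rule, the walk concentrates near the origin.
    print({k: visits.get(k, 0) for k in range(-5, 6)})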



Significance

Almost since their discovery, Markov chains have proven to be a powerful method for modeling a wide variety of physical, chemical, and biological phenomena involving both time-dependent (transient) and long-run (steady-state) behaviors. As early as 1908, A. K. Erlang carried out studies of the steady-state behavior of commercial telephone exchange traffic (the theory of queues), deriving what is now known as the Kolmogorov equations for a finite Markov process. Similar Markov queue models have been applied widely to stock exchange speculation; to telephone, mail, and road traffic; to the landing of aircraft and ships; to assembly-line component breakdown; to scheduling and checkout for clinics and supermarkets; and to inventory maintenance. In the first three decades of the twentieth century, researchers considered problems of quantitative epidemic spreading and population growth using a second-order Markov model. An important Markov chain application in statistical mechanics, the physics of molecules, is the so-called nearest-neighbor system, used, for example, to model crystal lattices; Markov models likewise appear in the theory of elementary particle collisions and cascades and, in chemistry, in the treatment of chemical kinetics and diffusion-controlled reactions. During the past decades, following the work of Ludwig von Mises, a number of researchers in operations research have combined the methodologies of Markov models and Bayesian decision analysis to facilitate the quantitative solution of complex problems in economic and military equipment procurement and deployment.
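
The long-run, steady-state behavior referred to above can be computed for any finite chain by raising its transition matrix to a high power (or by solving πP = π); the three-state matrix below is an illustrative stand-in for a small queueing system, not Erlang’s actual model:

    import numpy as np

    # Hypothetical three-state transition matrix (rows sum to 1); it stands in
    # for a small queue with states "idle", "busy", "congested".
    P = np.array([[0.6, 0.3, 0.1],
                  [0.2, 0.5, 0.3],
                  [0.1, 0.4, 0.5]])

    # Long-run (steady-state) distribution: every row of P^m approaches it.
    steady = np.linalg.matrix_power(P, 100)[0]
    print("steady-state probabilities:", steady.round(4))

    # Check that it is invariant under one more step: pi @ P == pi.
    print("invariance check:", np.allclose(steady @ P, steady))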

As exhaustively detailed in Albert T. Bharucha-Reid’s advanced monograph, thousands of papers have been published on specialized applications of temporal and spatial Markov series. Theoretically, the mathematical methods initiated by Markov were later extended and formalized by Aleksandr Khinchin, Norbert Wiener, and notably Kolmogorov, establishing probability theory as an identifiable and rigorously founded subdiscipline.



Further Reading

  • Bartos, Otomar J. Simple Models of Group Behavior. New York: Columbia University Press, 1967. Examines several applications of Markov process models to the prediction of the persistence of specific behaviors in small-group settings studied by sociology.
  • Bharucha-Reid, Albert T. Elements of the Theory of Markov Processes and Their Applications. 1960. Reprint. Mineola, N.Y.: Dover, 1997. An encyclopedic presentation of basic and advanced Markov theory and applications to physical, chemical, astronomical, biological, engineering, and other sciences.
  • Damerau, Frederick J. Markov Models and Linguistic Theory. The Hague: Mouton, 1971. An important theoretical study concerning the degrees of approximation to the English language by Markov models of progressively higher order.
  • Gani, Joseph M., ed. The Craft of Probabilistic Modeling. New York: Springer-Verlag, 1986. Includes sufficient theory and applications. Remains within the scope of general undergraduate readers.
  • Gigerenzer, Gerd, et al. The Empire of Chance: How Probability Theory Changed Science and Everyday Life. New York: Cambridge University Press, 1989. Extensive discussions of empirical probability rules and applications in the natural sciences throughout the nineteenth and twentieth centuries.
  • Gray, Robert M., and Lee D. Davisson. Random Processes: A Mathematical Approach for Engineers. Englewood Cliffs, N.J.: Prentice-Hall, 1986. Provides a clear and concise introduction to discrete random processes.


Steinitz Inaugurates Modern Abstract Algebra

Mises Develops the Frequency Theory of Probability

Turing Invents the Universal Turing Machine

Bourbaki Group Publishes Éléments de mathématique