3. The Batting Average
Living Algorithm vs. Probability

Probability Digests the Baseball Hitter's Data Stream

In the preceding article, we showed how the Living Algorithm System is the ideal mathematics to deal with Life's data streams. The Living Algorithm is sensitive to the moment, weighting each data point in the stream according to its relation to the present. Further, the Living Algorithm's predictive cloud describes the trajectory of the moment's recent trends. This up-to-date information about the moment provides estimates about the nature of future moments.

A Data Stream of 'At Bats' generates a Batting Average

To illustrate these concepts, let's explore a concrete example. The baseball player's actions during a game can be characterized by any number of data streams. One of the most basic measures of a player's performance at the plate is the batting average, which is determined by one of these data streams. The rules that generate this data stream are simple: if he gets a hit, he generates a one; if he doesn't, he generates a zero. Several kinds of plate appearances are excluded from the data stream that determines the batting average (e.g. walks, hit-by-pitches, and sacrifices).

The Batting Average: Probability's Descriptor

Probability looks at this flow of information as an ever-growing set and computes an average (the mean), which is appropriately called the batting average. This statistic has played a significant role in the evaluation of the success of any professional hitter. Raw numbers such as hits, home runs and RBIs are also significant measures of success, but the batting average has been a traditional indicator that complements these raw numbers. Baseball players' salaries and fame are based, in part, upon these batting averages.
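
To make Probability's approach concrete, here is a minimal sketch in Python. It treats the stream of 'at bats' as one ever-growing fixed set and computes its mean; the sample stream is invented purely for illustration.

```python
# Probability's view: the data stream of 'at bats' is an ever-growing
# fixed set, and the batting average is simply the mean of that set.
# The sample stream below is invented for illustration.

at_bats = [1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0]  # 1 = hit, 0 = no hit

def batting_average(stream):
    """Mean of the full set: every 'at bat' is weighted equally."""
    return sum(stream) / len(stream)

print(round(batting_average(at_bats), 3))  # 0.417 for this invented stream
```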

Probability's Descriptor, also Predictive

These batting averages, which describe a player's performance, also serve a predictive function. Knowledge of a player's batting average is likely to shape the strategy of opposing coaches and pitchers. Owners and general managers also use the batting average to predict how well the player will do in the following year(s). Bonuses, salaries, and long-term contracts are likewise influenced by a player's batting average.

Batting Average only a Rough Approximation of Future Performance

It is easy to see from this example how descriptive measures of past performance, such as the batting average, are used to predict future performance. It is equally obvious that these predictive descriptors provide only very rough approximations of future performance. Even though a baseball player might have a batting average of .333, this in no way guarantees that he will continue to bat .333 for the rest of his season, contract, or career. This obvious lack of guarantee reminds us that the batting average is only a rough approximation of future performance.

Rough Approximation extremely Meaningful

The rough approximation of the future provided by a batting average is valued by those who have a stake in predicting future events. Large salaries are given because of these rough approximations; huge bets are placed on them; and strategies are formed on these predictive descriptors called batting averages. It is evident that, despite their relative imprecision, these guesstimates are extremely meaningful to the world at large.

Let's see what happens when the Living Algorithm processes this data stream – not as an extended fixed set, but as an ongoing stream.

Living Algorithm digests the Baseball Player's Data Stream of 'at bats'

This analysis might suggest that the Living Algorithm is nothing more than a subset of Probability. In the ensuing discussion, we hope to illustrate that the Living Algorithm is a unique approach to data analysis. Rather than being a subset, the Living Algorithm appears to be a valuable complementary approach. The Living Algorithm digests the exact same data stream – the baseball player's 'at bats'. From this data stream, the Living Algorithm generates a predictive cloud, consisting of the previously mentioned trio of descriptors. This predictive cloud describes the context of each moment in the player's career. Instead of characterizing the entire stream of 'at bats' as an enlarged fixed set, the Living Algorithm characterizes the changing pattern that results from a constant focus on the most recent at bats. While Probability weights each data point (each 'at bat') equally, the Living Algorithm assigns the greatest weight to the most recent 'at bat' (data point), and scales the rest of the 'at bats' in descending order from the present. Accordingly, the Living Algorithm's predictive cloud is context sensitive, adjusting to recent 'at bats' and providing predictive information about the next 'at bat'.
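
The article does not give the Living Algorithm's equations, so the following is a minimal sketch, assuming a decaying-average update in which each new 'at bat' displaces a fraction 1/D of the running measures. The class name, the decay factor D, and the particular update rules are our assumptions for illustration only.

```python
class PredictiveCloud:
    """A hedged sketch of the Living Algorithm's trio of descriptors.

    Each new data point is folded into three running measures: a
    present-weighted average, a range of variation around it, and the
    direction (momentum) of the average's recent change. The
    decaying-average update and the decay factor D are assumptions
    made for illustration, not equations taken from the article.
    """

    def __init__(self, D=10):
        self.D = D            # decay factor: smaller D = sharper focus on the present
        self.average = 0.0    # 1) present-weighted batting average
        self.range_ = 0.0     # 2) present-weighted spread around the average
        self.direction = 0.0  # 3) present-weighted momentum of the average

    def digest(self, x):
        change = (x - self.average) / self.D  # influence of the newest 'at bat'
        self.average += change
        self.range_ += (abs(x - self.average) - self.range_) / self.D
        self.direction += (change - self.direction) / self.D
        return self.average, self.range_, self.direction
```

Under this assumed update, each digestion shrinks the weight of every earlier 'at bat' by a factor of (1 − 1/D), which realizes the descending scale described above.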

Living Algorithm's up-to-date Predictive Cloud indicates batter is 'hot'

Because of this context sensitivity, the Living Algorithm's predictive cloud provides up-to-date, relevant information as to the character of the next 'at bat'. The trio provides information about the hitter's current state of affairs – the position, range of variation, and direction of the momentum of the batter's hitting data stream. This information could lead to the following scenario. The hitter's batting average for the year is .333 (Probability's mean average). Complementing this knowledge are the insights provided by the Living Algorithm – his current weighted average over his most recent 'at bats' is .375, with a range of .20 and a positive direction of .10. The Living Algorithm's predictive cloud indicates that the batter is 'hot' right now. His recent weighted average of .375 exceeds his overall batting average of .333. In addition, he has been very consistent in recent 'at bats', as indicated by the tight range of variation (.20). Furthermore, his recent batting data stream has a positive momentum (+.10). These descriptors of the current state of affairs provide rough approximations of the immediate future. This information can be exceedingly relevant to the opposing pitcher and his coaching staff.
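
Using the hypothetical PredictiveCloud sketch above, the 'hot' diagnosis might look like the following. The streak of recent hits and the decay factor are invented, so the printed numbers will not match the .375/.20/.10 of the scenario.

```python
# Hypothetical usage of the PredictiveCloud sketch. The stream below is
# invented; it ends in a run of hits, so the present-weighted average
# climbs above the season mean. (The running average starts at 0, so
# its earliest values are biased low: a cold-start artifact.)

cloud = PredictiveCloud(D=5)
season = [0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1]
for at_bat in season:
    average, spread, momentum = cloud.digest(at_bat)

season_mean = sum(season) / len(season)  # Probability's batting average
if average > season_mean and momentum > 0:
    print(f"'hot': weighted {average:.3f} vs season {season_mean:.3f}, "
          f"momentum {momentum:+.3f}")
```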

Living Algorithm's up-to-date Predictive Cloud indicates batter is 'cold'

In contrast, a hitter might have the same .333 batting average for the year, but the predictive cloud could indicate that his weighted average is .285, with a range of .80 and a negative direction of .20. This indicates that the batter is currently 'cold'. His weighted average of .285 is less than his overall average of .333. His performance is erratic, as indicated by the large range of variation (.80). Furthermore, his current hitting momentum is negative (–.20). In this case, the Living Algorithm's predictive cloud would provide data that could lead to a very different set of strategies for the opposing pitcher and his coaching staff.

Could Living Algorithm be relevant to living systems?

In both scenarios, the batting average generated by Probability remains the same. Yet, these two different hypothetical moments in the hitter’s data stream suggest that there are two very different patterns at work. The Living Algorithm reveals these diverse patterns by providing unique information about the data stream of a hitter that is both timely and context sensitive. Could it be that what is relevant to the data stream of a baseball player may also be relevant to the data streams of other living systems?

Probability best characterizes Season; the Living Algorithm the Moment

The use of Probability's mean average, the famous batting average, is certainly a better way to characterize the player's entire season than the use of the Living Algorithm. Probability's general averages are adequate, providing a fairly accurate summation of annual talent. This information is essential when determining annual awards (MVP) and the next year's rewards (salaries). However, it is equally certain that the batting average for the entire season does not provide up-to-date information as to the hitter's status at the current time. For those who have a stake in the current game, these general averages merely provide a diluted reflection of the player's present status. In contrast, the Living Algorithm's predictive clouds provide up-to-date information that is extremely relevant to the manager, the pitcher, and even the betting community. On the other hand, this up-to-date information, while relevant to the next game, loses its potency when applied to the entire year.

Beautiful example of Complementary Nature of the 2 Systems

This example from America’s game beautifully illustrates how these two approaches to data analysis complement each other. Probability's measures accurately characterize the fixed data set of the year, while the Living Algorithm's measures accurately characterize each baseball moment by analyzing the dynamic data stream of ‘at bats’.

10-day Average vs. the Predictive Cloud

Statistics augment Coach's invaluable intuitive sense

Parenthetically, we would like to note that statistics are not the only way an effective coach evaluates the momentum of player performance. One of the intangible qualities of a talented coach is the ability to recognize when players are hot and when they are not. A keen intuitive sensitivity regarding this changeable momentum can be a potent skill in an effective manager's toolbox. Beyond intuition, however, what else does the baseball community do to address the notion of performance momentum? Coaches do rely on statistics to help inform their decisions and to provide a reality check for their intuitive insights. Timely information may even have the capacity to stimulate intuitive insights. Let's examine one typical approach that baseball utilizes to characterize recent performance.

The 10-day average – Probability’s attempt to characterize recent events

The baseball community clearly recognizes that a ball player’s seasonal average or career average is a big picture measure of performance. When the baseball community wants to focus on the momentum of a player’s recent plate appearances, it generally takes a snapshot of a recent series of at-bats. The 10-game batting average is typical of this attempt to complement big-picture seasonal statistics with an updated sense of the pattern of recent performances.

Small picture averages, a cave man version of Living Algorithm’s Predictive Cloud

The sports community intuitively understands that there is more at work than big picture averages. Their use of statistics, like the 10-day batting average, is designed to reveal a recent piece of the picture. We applaud their efforts to recognize the importance of the present moment. In fact, there are obviously times when a feel for the present moment is the most important thing to specific members of the baseball community. Those whose business is game day strategy are necessarily concerned with the momentum of recent performances. This contemporary insight is every bit as significant as overall seasonal averages to the baseball community. However, in the analysis that follows, we shall claim that this solution is just a junior version of what the Living Algorithm’s predictive cloud offers – the cave man model, as it were.

10-day average: a limited analysis of the moment

Let’s focus upon the nature of the 10-day average as an object of comparison with the predictive cloud. Essentially, a 10-day average offers a very limited analysis of player performance. First, the 10-day batting average is a single measure of a fixed set, while the Living Algorithm offers a trio of measures. Second, every at bat in a 10-day average is equally weighted, so that what happened 10 days ago is just as important as what happened yesterday. And finally, the 10-day average snapshot requires extensive recordkeeping in contrast to the Living Algorithm.

Living Algorithm’s Predictive Clouds: a trio of measures

Initially, we note that the Living Algorithm's predictive cloud provides a trio of measures, while the 10-day average provides a single measure. The trio consists of: 1) a batting average that reflects recent performance (similar to baseball's 10-day average), 2) an estimate of the probable range of variation around that recent average, and 3) the direction of the momentum of the recent average. This trio of measures provides a 3-dimensional perspective on a player's recent performance, as contrasted with the 1-dimensional perspective of the 10-day average.

Living Algorithm weights recent moments most heavily

Not only does the Living Algorithm provide 3 measures for the price of 1, but its focus is more sharply attuned to the present moment. When we examine the single measure provided by the 10-day average, we find a second significant area of contrast. The Living Algorithm addresses the concept of a 10-day average in a different manner. While the 10-day average weights each at-bat equally over a 10-game stretch, the Living Algorithm assesses each at-bat on a sliding scale that gives the greatest weight to the most recent at-bat. While the traditional 10-day average considers an at-bat from 10 days ago to be as significant as an at-bat from yesterday, the sliding weighted scale considers the most recent at-bats to be a better indication of the pattern that incorporates the present moment. In essence, the Living Algorithm's weighted focus provides a constantly updated sense of the pattern of recent player performance in a way that the more generalized focus of the 10-day average does not.
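
The contrast in weighting can be made explicit. In the sketch below, the 10-day average gives each of the last ten at-bats an equal weight of 1/10, while a decaying average with factor D (our assumed realization of the sliding scale, here with D = 5) implies a geometrically descending scale.

```python
# Equal weights (10-day average) versus a descending scale (decaying
# average with factor D). Unrolling avg += (x - avg) / D shows that an
# at-bat aged k updates carries weight (1/D) * (1 - 1/D)**k.

D = 5
window = 10

uniform = [1 / window] * window             # every at-bat counts the same
descending = [(1 / D) * (1 - 1 / D) ** age  # the newest at-bat counts most
              for age in range(window)]

for age, (u, d) in enumerate(zip(uniform, descending)):
    print(f"{age:2d} at-bats ago: 10-day weight {u:.3f}, sliding weight {d:.3f}")
```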

Predictive Cloud more user-friendly

In addition to providing 3 measures whose weighted focus is more attuned to the present moment, the Living Algorithm also provides a more user-friendly approach. The traditional approach to computing a batting average, regardless of whether it is a seasonal average or a 10-game snapshot, requires a database that includes each individual at-bat. Each new at-bat in the 10-game snapshot requires constantly adjusting the members of the set – adding the most recent at-bat to the database and removing the now outdated at-bat. In contrast, the Living Algorithm integrates the current at-bat into her updated, ongoing weighted averages – the trio of measures that constitute our predictive cloud. Once integrated, the raw data has served its function and has no further meaning or importance. The Living Algorithm's approach to recordkeeping is so simple that it requires no database, and therefore contrasts favorably with the raw-data requirements of the traditional 10-day average.
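
A small sketch makes the recordkeeping contrast visible: the windowed snapshot must store every raw at-bat and evict the oldest one, while the decaying average (again, our assumed realization) keeps only a single running number.

```python
from collections import deque

class WindowedAverage:
    """10-at-bat snapshot: requires a database of the raw at-bats."""
    def __init__(self, window=10):
        self.at_bats = deque(maxlen=window)  # oldest entry falls out automatically
    def digest(self, x):
        self.at_bats.append(x)
        return sum(self.at_bats) / len(self.at_bats)

class DecayingAverage:
    """Sliding-scale average: the raw at-bat is discarded once digested."""
    def __init__(self, D=5):
        self.D, self.average = D, 0.0
    def digest(self, x):
        self.average += (x - self.average) / self.D
        return self.average
```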

10-day average stuck in Probability’s paradigm, when better is available

The 10-day average, while focusing a little more closely upon recent events, is still stuck in Probability's paradigm. This paradigm makes estimates about the future based upon general statements about a fixed set. By viewing the data stream as an extended fixed set, the traditional paradigm ignores important potential information. In contrast, the Living Algorithm is designed to access these storehouses of potential information. In essence, the Living Algorithm digests a stream of data by relating a sequence of numbers to each other in an evolving, dynamic manner.

Summary

The Living Algorithm provides a trio of measures to mine this untapped potential information. This evolving trio consists of the following ongoing descriptors: 1) a batting average, 2) a range of variation, and 3) a description of recent trends (momentum). These evolving measures weight the data stream of at-bats on a sliding scale, according to their proximity to the most recent data point. Further, the Living Algorithm's simple algorithm (procedure) is more user-friendly than the unwieldy 10-day average. Rather than relying on a database that consists of all relevant at-bats, the Living Algorithm requires only the memory of evolving measures that characterize the most recent player performance.

Living Algorithm & Probability: Complementary Systems of Analysis

Predictive Information, although imprecise, filled with relevance

These measures not only describe the characteristics of the entire season or of the dynamic moment, but also provide predictive information. The predictive information contained in either of these measures does not satisfy conventional mathematics' demand for predictive rigor. However, these estimates of future performance are extremely relevant to those making decisions about the next season or about the next game. Probability distributions allow scientists to set the confidence limits that enable them to make precisely defined predictions about fixed sets. These probability distributions, where each known element is weighted equally, are, however, helpless before the dynamic nature of the unknown future. Even though Probability computes the batting average, it can't set the confidence limits for this average at a level that would satisfy traditional standards of predictive rigor. A baseball player's performance is too idiosyncratic to satisfy the population-size requirements of traditional Probability mathematics. This idiosyncrasy suggests that the batting average may very well serve as a representative example of the data streams of living systems.

Living Systems require Focus on the Moment

Instead of Probability's well-defined fixed-set predictions, living systems require rough approximations of the ongoing patterns of data streams. These rough estimates enable a range of interpretation and response to the environment, which presents itself as an inherently changeable data stream. What these descriptive measures lack in predictive rigor, they more than make up for by focusing their predictive relevance on the next moment of performance. Only the Living Algorithm provides the unique trio of predictive descriptors that supports the flexible interpretation and response essential to living systems.

Living Algorithm's Data Stream Mathematics broadens Current Paradigm

Note: the new perspective of data stream mathematics is meant to broaden the current paradigm. The current paradigm, which has evolved over centuries, has become very adept at the mathematics of fixed data sets. Our new perspective does not find fault with the descriptive and predictive power of this traditional approach. We do, however, suggest that the traditional approach has limits to its explanatory and predictive power. These limits lie in its inability to address the immediate and dynamic nature of the biological world. By addressing these limitations, a Mathematics of Data Streams appears to be a necessary complement to the Mathematics of Data Sets. One approach deals with the general and permanent features of fixed populations (fixed sets), while the new approach deals with the immediate features of a dynamic data stream. The unique predictive descriptors of the two approaches (Probability's data sets and the Living Algorithm's data streams) provide a dual perspective that better encompasses the whole of inanimate and animate existence. Similarly, balancing the general/permanent components of fixed-set mathematics with the changeable/immediate components of a mathematics of data streams should provide a more complete and deeper insight into the nature of living systems.

Links

Probability and the Data Stream Mathematics of the Living Algorithm System do not merely represent a difference of degree, but rather a difference in kind. To see why these two forms of data analysis are complementary, check out the next article in the series – Mathematics of the Moment (vs. Probability).

To see why Life is filled with doubt despite this successful interaction, read Is the Living Algorithm just an insignificant subset of Probability? 
