
2.5 Data Stream Characteristics

A. An Exploration of Live & Dead Data

The Precision of Live Data?

A second, more serious question is whether the precision of the Data itself is worth considering. Many scientists believe that Live Data is not worth considering. We will explore this topic briefly now on the ideational level, i.e. introducing concepts only, nothing technical. {This complex issue is tackled more completely in the Notebook, Random Data.}

Live & Dead Data

If it wasn't mentioned before, we will mention now that Random Data is one-dimensional because it has no Duration. It just is. It is sharp, hard, cold and considered Dead Data. The closer Real Data comes to this sharpness, the Deader it becomes. But if Data goes to the other extreme it becomes meaningless. At that point, the fuzziness has taken over and nothing can be discriminated anymore. The continuum runs from Hard to Soft Data. The harder the Data becomes, the Deader it becomes. Scientists love Dead Data. The softer the Data becomes, the more Alive it is, to the extreme at which it carries no information at all. But when Data is totally Dead, the amount of information carried is also narrowed. When Data dies, it becomes one-dimensional, losing its relativity. This is good if you want to hang onto a deterministic universe, bad if you want to describe the uncertainties of Life. However, without its Dead aspect the Data has no meaning; it is all uncertainty and no certainty. We are not talking about Random Numbers. We are talking about Data that is so Alive that it has no relationship to anything. It is a number which is pretending to be Real, but which in fact is Random. A Random number doesn't pretend to be 2D; it knows it is 1D. Numbers that are too Alive are the bane of Scientists. We will use the example of communication, verbal and non-verbal, to illustrate the connections of Live and Dead Data.

An example: Learning to Speak & Communicate

Learning to talk. First, there are meaningless sounds with no relation to anything. This is the equivalent of Live Data. Then the child begins to communicate, through this gibberish. At first communication is basic and uncomplicated. "Wahhh! I'm hungry, tired, diaper rash, teething." Take your pick. Then gradually the child begins to use words. Initially the use is rudimentary and still the communication stays simple. "Nurse! Mine! No!" Then gradually with increasing vocabulary and understanding, the level of communication becomes much more complex. At this point the child mixes his non-verbal communication skills with the words to communicate in an incredibly rich way. As the child progresses to adulthood, the non-verbal communication skills fade away replaced by the narrowly defined words. Because we adults write history and the non-verbal can't be communicated through words, we tend to minimize the amount of complex information communicated in a non-verbal way by young children. We want to kill the words, by giving them a distinct meaning, distinct spelling, and distinct pronunciation. We teach proper diction and intonation. And soon that rich contextual information flow has been standardized so that any adult anywhere can understand what is being said. By the time the words reach adulthood, they have died, each with a distinct definition, standardized by dictionaries. Dictionaries started out descriptive but then became definitive.

Written Words are Dead Data pretending to be Alive: Regeneration

Written words are like Dead Data, communicating something very distinct. This is very important, or you would not be able to read this piece. By defining words and concepts mathematically, I kill words completely so that there is no misunderstanding. For written literature this is very important because there is no context except the word. It is a one-dimensional source, pretending to be multi-dimensional. Can't you hear the pealing of church bells over the sweet-smelling countryside?

The Very Merry Middle of the Very Muddy Puddle

The point we're trying to reach is that Data which is somewhere between Live and Dead carries the richest amount of information. However, this is because of the existence of the non-verbal dimension. "I prefer my words dead, thank you. Without the non-verbal dimension." However, in an interactive situation it is always pleasant to have a strong non-verbal component to keep things interesting. Below is a chart that shows the balance.

Degeneration: Spoken Words Sometimes Pretend to be Dead

The Written Word for the most part followed the Spoken Word. Initially it was only descriptive, i.e., trying to emulate people's speech. But then, as with dictionaries, the written word became definitive. It was then, whenever that was, and whoever cares, that Cultured people everywhere began to chain the speech of everyday people with definitions. Of course everyday people everywhere refused to be chained, and so the non-verbal component of language continues to flourish. There are, however, insidious creatures everywhere trying to weed out the non-verbal aspect of communication, maybe because they have killed their own ability to understand non-verbally. Sounds like another story.

From Another Perspective

The above chart shows a two-dimensional perspective. Below is a more complete three-dimensional perspective of the same phenomenon. In some ways it is three 2D graphs in one 3D graph. When the cube is viewed from straight in front, it is similar to the graph above.

The Loss of Dimensionality

When it is turned towards one diagonal the Non-Verbal Plane disappears as in the Chart below on the left. When it is turned towards the other diagonal the Verbal plane disappears as in the Chart below on the right.

The point we're making is that verbal and non-verbal communication are not on one line or one plane together but instead are on two orthogonal, or perpendicular, planes. Each plane exists in its own right, independent of the other, as is shown in the graphs above. So a degeneration to either all non-verbal communication or all verbal communication is a retreat from a dimension of communication, with a resulting loss of information. Neither verbal nor non-verbal communication tells the whole story. When one loses the ability to communicate non-verbally, one loses a whole dimension of communication, not just a degree. One relinquishes dimensionality.

B. Data Stream Predictability

Data Stream Density of Data Streams and their Averages

One way of distinguishing Data Streams is by their predictability. The chart below exhibits the differences. Live Data Streams and their measures are semi-predictable because their Data Stream Density is between zero and one. Dead Data Streams are totally predictable because their Data Stream Density is one. Random Data Streams are totally unpredictable because their Data Stream Density is zero. Random Average Streams are very predictable because their Data Stream Density approaches one. We will discuss this anomaly in the following paragraphs.

Random vs. Average Random

Our Random Data Stream has a 0.00 Density because the Realm of Probability equals the Range of Possibility. The next number in a Random Data Stream has an equal probability of being any number in a Range of Possibilities. Ironically, although the Density of a Random Data Stream is zero, the Density of an Average Random Number Stream approaches one as N, the number of elements in the Data Stream, becomes larger and larger. While Random numbers can readily bounce all over the Range, Random Averages are very stable. As a matter of fact as N, the number of Data Bytes, increases, the Random Average gets locked closer and closer to the exact middle of the Range.

An example: Random Averages Most Predictable

Let us create a Random Average Data Stream. For our Stream, the Duration is 30 random numbers. Xn is the nth Data Point in the Stream and is the average of the nth group of 30 random numbers. Let N be 30. With high probability the average of these averages will be very close to the exact middle of the Range, as before, but this time the SD will be close to zero. While the Random Data Stream has no area of Improbable Possibility, the Random Average Stream has a great area of Improbable Possibility. Actually, as the Duration and N increase, the area of Improbable Possibility approaches the Range. Hence the Data Stream Density approaches one. {For more technical details, see the Random Averages Notebook.}
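Here is a minimal sketch of this construction in Python. It is not part of the original Study; it simply assumes the random numbers are drawn uniformly from a Range of 0 to 100 and uses the standard deviation as a rough stand-in for how tightly the Stream is pinned down.

```python
import random
import statistics

random.seed(1)  # repeatable illustration

RANGE_LOW, RANGE_HIGH = 0.0, 100.0  # assumed Range of Possibility
DURATION = 30                       # random numbers averaged into each Data Point
N = 30                              # Data Points in the Random Average Stream

# Raw Random Data Stream: each Data Byte is a single random number.
raw_stream = [random.uniform(RANGE_LOW, RANGE_HIGH) for _ in range(DURATION * N)]

# Random Average Data Stream: each Data Point is the mean of 30 raw numbers.
average_stream = [
    statistics.mean(raw_stream[i * DURATION:(i + 1) * DURATION])
    for i in range(N)
]

raw_mean, raw_sd = statistics.mean(raw_stream), statistics.stdev(raw_stream)
avg_mean, avg_sd = statistics.mean(average_stream), statistics.stdev(average_stream)
print(f"raw stream     : mean {raw_mean:5.1f}  SD {raw_sd:5.1f}")
print(f"average stream : mean {avg_mean:5.1f}  SD {avg_sd:5.1f}")
# The raw SD stays near 28.9 (any part of the Range remains probable),
# while the averaged SD shrinks toward 28.9 / sqrt(30), roughly 5.3:
# the Random Average Stream is pinned near the middle of the Range.
```

As the Duration and N grow, the spread of the averaged Stream keeps shrinking, which is the sense in which its Data Stream Density approaches one while the raw Random Stream's Density stays at zero.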

Live Data Streams & their Averages

In The Experiment the Duration was 30 Data Bytes, which were combined into an average that became the Data Byte for the Stream. Ironically, Random Averages are much more stable than Live Averages. While the Data Stream Density of the Random Average Stream approaches one, this is not at all true for Live Data Streams. Because of the vagaries of Life, Live Data Streams can change unpredictably at any time. Live Data makes large, radical changes because of the inherent nature of Life. This is also true of Live Average Data Streams. While the predictability of Random and Random Average Data Streams differs dramatically, the predictability of Live and Live Average Data Streams is variable. If the Random Averages and the Live Averages are generated the same way, then the Random Average Stream is much more predictable.

C. Emergence of Different Durations

Emergence in Random Number Streams

Although two Data Streams may be based upon the same numbers, when the Duration is different they may have totally different characteristics. In the case of Random numbers, when the Duration was only one, i.e. each Data Point was its own Duration, the Stream was totally unpredictable and the Data Stream Density was zero. When, however, the Duration was 30, i.e. 30 Random Numbers went into making each Data Point, the Data Stream Density was exactly opposite, approaching one. From random and chaotic on one level, the Random Data Stream emerges as ordered and organized on another level. What looks chaotic up close has a rigid order viewed far away. Emergence of Order from Chaos with just Random Numbers. {See Random Average Notebook for more detail.}

Too big, nothing distinguishable: Too small, Static

This emergent property of Data Streams also applies to Live Data Streams. If a big enough Duration is chosen with broad enough criteria, then nothing exists, practically speaking, because nothing can be differentiated from the rest. Everything is thrown into the box and no individual Data Streams exist. Likewise if a small enough Duration is chosen with equally refined Criteria, then Chaos emerges, as no Pattern can exist. For instance if we are studying Sleep and choose a Duration of minutes with brain waves as the criterion, the multiplicity of the Data destroys the pattern. All that is perceived is Static. Each Duration and Criterion creates different emergent patterns from the same data.

Different Durations yield independent results: an Emptiness Principle example

This variability is not to be perceived as a shortcoming of Data Stream analysis. Each of these emergent properties exists on the level of Duration and criterion that it was generated on. We'll see in the Emptiness Principle Notebook that the Emptiness Principle applied to consciousness on the daily level, i.e. with a Duration of one day, yields sleep. The Emptiness Principle applied to consciousness on a weekly level yields weekends. The Emptiness Principle applied to consciousness on an annual level yields vacations. The same principle applied to the same phenomenon on a decade level yields sabbaticals, which share a root word with the weekly break, the Sabbath. Each of these needs for emptiness is mutually independent of the rest, although they are based ultimately upon the same Data.

A Data Stream follows Data Stream Rules & Mechanisms

Regardless of the variations, Data Streams still follow Data Stream rules. With a Duration of 1, the Standard Deviation of the Random Data Stream was at a maximum. As the Duration increases, i.e. as Random Numbers are combined into mean averages in larger and larger groups, the Standard Deviation decreases, approaching zero. These variations are emergent. Regardless of variation, a Data Stream is a Data Stream and must follow the probabilistic mechanisms of a Data Stream. There are also certain laws and rules that limit the behavior of the Data Stream and its derivatives. The Nature of a Data Stream has to do with its laws and mechanisms. This Study, with a capital 'S', is about the Nature of Data Streams, Live Data Streams in particular.
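To make the "approaching zero" claim concrete, here is a small illustrative loop. It is only a sketch, again assuming uniform random numbers on a Range of 0 to 100; the exact figures will vary from run to run, but the trend will not.

```python
import random
import statistics

random.seed(2)

def averaged_stream_sd(duration, n_points=500, low=0.0, high=100.0):
    """SD of a Stream whose Data Points are means of `duration` random numbers."""
    points = [
        statistics.mean(random.uniform(low, high) for _ in range(duration))
        for _ in range(n_points)
    ]
    return statistics.stdev(points)

for duration in (1, 10, 30, 100, 300):
    print(f"Duration {duration:4d} -> SD {averaged_stream_sd(duration):6.2f}")
# Duration 1 gives the maximum SD (about 28.9 for this Range); larger Durations
# shrink the SD roughly as 1 / sqrt(Duration), heading toward zero.
```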

D. Killing a Data Stream vs. a Data Stream that Dies Naturally

Killing a Data Stream merits Nobel Prize consideration

A Live Data Stream is semi-predictable on all levels, unless it dies of its own accord; being killed is another matter. A Live Data Stream is killed when it is functionalized, turned into a Dead Data Stream. If a Live enough Data Stream is killed, the Scientist/Hunter is in the running for a Nobel Prize. This is why mathematicians don't get Nobel Prizes. They study systems, but they are not hunters. They do not kill Live Data Streams, although their systems allow others to kill Live Data Streams.

A Data Stream that dies

However, we are not talking about killing Data Streams here. We are merely talking about Data Streams that die. A Data Stream that dies is one that ends with an infinite string of zeros. Once a Data Stream comes into existence, its measures never completely die. They all approach zero. This is predictable. Because we as finite beings can never complete an infinite string of zeros, we can only assume, by looking to the Source, that a Data Stream has died. Looking only to the Data is not enough. Data Streams that have been comatose for years can spring back to life, unannounced. This is one of the features of unpredictability.
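As a tiny numerical illustration of "approaching zero without completely dying", here is a sketch that uses a plain running mean as the measure; the Study's actual measures (decaying averages and the like) differ in detail but behave the same way at the end of a Stream.

```python
# A Data Stream that dies after three Data Bytes.
stream = [5.0, 6.0, 4.0] + [0.0] * 20

running_mean, total = [], 0.0
for i, x in enumerate(stream, start=1):
    total += x
    running_mean.append(total / i)

print([f"{m:.3f}" for m in running_mean[-5:]])
# ['0.789', '0.750', '0.714', '0.682', '0.652'] -- the measure keeps shrinking
# toward zero with every new zero Byte, but it never actually gets there.
```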

E. Predictive Probability and Change

Central Tendencies Descriptive and Predictive

What have we established? We have established that each Data Stream has central tendencies. These central tendencies describe the Data Stream and also make certain predictions as to the subsequent Data Bytes belonging to the same Stream.

As an example:

Suppose someone has averaged 6.7 hours of sleep per day, computed month by month for 5 years, with an SD of 0.3. This describes the Data Set and also inherently makes a prediction about the next member of the Stream. It makes the prediction that, because roughly 70% of the previous Data Bits fell between 6.4 and 7.0, there is also roughly a 70% chance that the next Bit will fall within that range. For the same reasons it also predicts that there is over a 99% chance that the next Bit in the Stream will be between 5.8 and 7.6 hours of Sleep per day, averaged over the next month. Obviously some people sleep more than this and others sleep less. Equally obvious is the possibility that our subject could sleep more or less than these limits. At the extreme end our subject could sleep 24 hours per day, perhaps if he were injured and went into a coma. However, the likelihood of his next Data Point being beyond the prescribed range is only about 1 in 400.
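The arithmetic behind these figures can be checked against the normal curve. The sketch below assumes the monthly sleep averages are roughly normally distributed, which the example implies but never states outright:

```python
import math

def normal_within(k):
    """Probability a normal variable falls within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2.0))

mean, sd = 6.7, 0.3
for k in (1, 2, 3):
    low, high = mean - k * sd, mean + k * sd
    p = normal_within(k)
    print(f"within {k} SD ({low:.1f}-{high:.1f} h): {100 * p:5.2f}%"
          f"   beyond: about 1 in {1 / (1 - p):.0f}")
# within 1 SD (6.4-7.0 h): ~68%, the passage's "70% chance"
# within 3 SD (5.8-7.6 h): ~99.7%, so a reading beyond it is roughly
# 1 in 370 -- the passage rounds this to 1 in 400.
```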

Something has Changed

The next point that needs to be stressed is that if the next Data Point is beyond the limits prescribed by the average and SD, then most likely the source from which we've drawn our Data Bit has changed. Going back to our example: if our subject registered 8.0 hours of Sleep/day in the next month, we might find that he was very sick. If it were 5.0 hours of Sleep, we might find that our subject was in a crisis situation which created the need for a unique response. If the rise in Sleep continued, we might suspect depression or a lingering illness. If the fall in Sleep continued, we might predict a Crash in the coming months to balance the deficit.

A Diagnostic Tool

The point is that the Data Stream makes predictions independent of its Source. If, however, a new reading is beyond the levels of prediction, then something in the Source has changed. The Data Streams and their measures can therefore be used as a Diagnostic tool.

The Underlying Assumption

'If our Source has remained unchanged then we can expect our Data Streams to continue within their prescribed limits, plus or minus 3 SD from the average. Conversely if any of our Data Streams exceed the prescribed limits then something has changed in the Source.'

Once, an Accident; Twice, a Probability; Thrice, a Certainty

It is of course possible that the new Data Bit falls accidentally outside the limits, once. Twice in a row would be considered in the realm of the impossible. A single reading outside 3 SD happens 1/4% of the time, i.e. 1 in 400. Two consecutive readings outside the limits would occur only 0.25% × 0.25% = 0.0025 × 0.0025 = 0.00000625 of the time, i.e. 1/400 × 1/400 = 1 in 160,000. One reading outside the range sets up the warning flag. Two in a row sets off the alarm. Three in a row is certainty for us mortals. Change in the Source is the certainty, whether internal, external, or both.
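Under the assumption that the Source has not changed, consecutive readings are treated as independent, so the run probabilities simply multiply. A short check of the arithmetic, using the passage's rounded 1-in-400 figure for a single reading outside 3 SD:

```python
p_out = 1 / 400  # chance of one reading outside 3 SD (the passage's rounding)
for run in (1, 2, 3):
    print(f"{run} in a row outside 3 SD: about 1 in {1 / p_out ** run:,.0f}")
# 1 in 400, 1 in 160,000, 1 in 64,000,000 -- hence flag, alarm, certainty.
```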

The Data Stream Carries its own information

The Data Stream carries information independent of the Source. We do not need to know the age, sex, race, parents, grandparents, weight, eye color, job or address of our subject to make predictions from the Data Stream. This leads to some interesting conclusions. 1. The numbers themselves make certain predictions concerning human behavior, independent of neurological function, environment, heredity, race or species. 2. Emergence arises from the choice of Duration, independent of planetary, solar or biological cycles. Each Duration has its own dynamic although it describes the same activity. Although each Duration Cycle is generated independently of the others, they still interact and interrelate.

F. Beginning Data Stream Quakes

Cutoff Point: Justification

We have chosen 3 SD from the Average as our cutoff point for Probable vs. Improbable, but it could have been chosen anywhere. A radius of 3 SD from the Average contains 99.74% of the Data, while a radius of 2 SD from the Average contains 95.44% of the Data, and a radius of 1 SD from the Average contains only 68.26% of the Data. We would recommend a cutoff between two and three SD from the mean. This would contain from 95.44% to 99.74% of the Data. Remember, we are not trying to strangle the Data Stream. We want to contain it but also to give it breathing room. When the cutoff point is set too high, everything counts as probable. When the cutoff point is set too low, not enough of the Data is contained.

What constitutes a Quake in a Live Data Stream?

Remember again that because we are talking about Live Data Streams, we expect abrupt jumps outside these bounds. Because we are using this as a detection system, we need sensitive parameters. The New Data will be outside 3 SD about 1 out of 400 times. The New Data will be outside 2 SD about 1 out of 20 times. The New Data will be outside 1 SD about 1 out of 3 times. Because we are speaking of a Data Stream, not isolated pieces of Data, even 1 out of 400 times is not in the realm of the impossible. However, 2 pieces of Data in a row outside 3 SD will occur only about 1 out of 160,000 times, definitely in the realm of the miraculous. 2 pieces of Data in a row beyond 2 SD from the mean will occur only about 1 out of 400 times. Curiously, 2 Data Bits in a row beyond 2 SD has the same chance as 1 Data Bit beyond 3 SD. 2 pieces of Data in a row beyond 1 SD from the mean will occur about 1 out of 9 times, and 3 pieces of Data in a row beyond 1 SD from the mean will occur about 1 out of 27 times, not at all improbable. Who cares? This is well within the range of the ordinary and reveals nothing. Because we are just trying to determine when there has been a change, we will stick with any Data Reading beyond 3 SD, or 2 in a row beyond 2 SD from the mean.
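Stated as a rule of thumb, the detector sketched below (a hypothetical helper, not from the original text) flags a Quake on any single reading more than 3 SD from the average, or on two consecutive readings more than 2 SD from it:

```python
def is_quake(latest, previous, mean, sd):
    """Flag a Quake: one reading beyond 3 SD, or two in a row beyond 2 SD."""
    def beyond(x, k):
        return abs(x - mean) > k * sd
    if beyond(latest, 3):
        return True                                    # about 1 in 400 by chance
    return beyond(latest, 2) and beyond(previous, 2)   # also about 1 in 400 by chance

# With the sleep Stream from the earlier example: average 6.7 h, SD 0.3 h.
print(is_quake(8.0, 6.8, 6.7, 0.3))   # True  -- single reading beyond 3 SD
print(is_quake(7.4, 7.4, 6.7, 0.3))   # True  -- two in a row beyond 2 SD
print(is_quake(7.1, 6.5, 6.7, 0.3))   # False -- ordinary variation
```

Either trigger happens by pure chance only about once in 400 readings, which is the sensitivity the passage settles on.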

Bound Systems

These distinctions are especially useful when dealing with bound systems, where all is possible but not much is probable. As an example: the Bound System of the 24-hour day is broken into many parts, each of which can be anywhere between 0 and 24 hours; witness the Guinness Book of World Records, where people have even been known to kiss for over 24 hours. While any recorded activity can possibly fall anywhere between the limits of 0 and 24 hours in this bound system, it is unlikely or improbable that any activity will fall outside a tightly prescribed area. While 24 hours is possible, it is not probable.

When New Data falls consistently outside the Realm

One of the goals of this study is to identify when a Quake has occurred. When the New Data Bytes fall repeatedly outside the Realm defined by the Deviation, then we know from probability that something has changed in the Source's response to his environment. (Or, as mentioned previously, in the experimenter's response to his Data, i.e. the experimenter could change the way he collects Data. However we will ignore that possibility in this theoretical discussion.)

Too Much Stability? Alive & Dead Measures

We spoke of probability associated with Standard Deviations and Averages. We spoke of stability in relationship to N, the number of trials, i.e. the number of elements in the Data Stream. Once again, we don't want to Kill our Stream. An N that is too large creates a Stream that is so stable that it dies. An N that is too small yields a Data Stream that is too Alive because it is so changeable, and therefore more unpredictable. {See Decaying Averages Notebook.}
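As a toy illustration of the trade-off, the sketch below uses a decaying average of the simple exponential kind, where the newest Data Byte is weighted by 1/N; this is only one common reading of the term, and the Decaying Averages Notebook may define its measure differently.

```python
def decaying_average(stream, n):
    """Running average that weights each new Data Byte by 1/n.

    Large n -> very stable, nearly Dead; small n -> very changeable, very Alive.
    """
    avg = stream[0]
    history = []
    for x in stream[1:]:
        avg += (x - avg) / n
        history.append(avg)
    return history

# A Stream that jumps from 6.7 to 8.0 halfway through its 20 Data Bytes.
stream = [6.7] * 10 + [8.0] * 10
for n in (2, 5, 30):
    print(f"N={n:2d} -> final value {decaying_average(stream, n)[-1]:.2f}")
# N=2 has almost caught the jump by the end; N=30 has barely begun to move.
# Too large an N deadens the Stream; too small an N leaves it jittery.
```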

G. Underlying Assumptions

Underlying Assumptions, #1: Choice exists

Once again, we must emphasize the postulates or underlying assumptions of this study. We acknowledge the possibility that we have only one Choice, or that we have an indeterminate number. But we are going to assume that many choices exist, that Live Data Sets exist. {See the Live Data Set Notebook.} An underlying assumption is that the truth lies somewhere between the polarities of Determinism and Free Will, and that hence Change is possible but within certain boundaries. These boundaries are what the study is about.

Underlying Assumptions, #2: Data Streams have an independent life

A second and perhaps more controversial assumption is that the Mathematics of the Data Streams generated by our living, willing subjects has a life of its own. The Data Stream describes, predicts, diagnoses, and prescribes from its nature as a Number, untied to any biological function or any other characteristic of the source of the Data Stream.

If these Assumptions are unbelievable then Shut Down

If the Reader rejects either of these hypotheses, out of hand, then stop. Turn off your computer. Change disks. Go back to your mindless Computer games, to pop culture. If you believe that enough equations can predict any situation, if you believe that God knows what is going to happen next, if you believe that you are independent of your Past, then please shut down and return to your latest distraction. If however you are willing to extend me these possibilities, then fasten your seat belts and Hold on for the ride of your life.

 
