Home   Science Page   Data Stream Momentum   Directionals   Root Beings   The Experiment

2.1 The Complexities of Data Accumulation

A. Generating Data

Data Streams & Data Sets: preliminary definitions

This Notebook is all about Data Streams. What is a Data Stream? A Data Stream is a growing set of Data. This distinguishes it from a Data Set, which is a static set of Data. All of life and existence consists of a multitude of growing Data Streams. Scientists, for ease of study, have broken this multitude of Data Streams into Data Sets. This breaks a dynamic system into static subsets; the whole growing phenomenon is broken into parts. Divide and conquer. We can then study the dead parts easily with our super-duper electron-laser-optical instruments. This study, however, is about the growing, changing aspects of Life rather than its static, dead side. We have the same problem with multiplicity that any scientist has. We, however, have broken the Phenomenon of Life into a series of living Data Streams instead of a series of static Data Sets.

Definition: Data is a number that quantifies experience.

We keep talking about Data. Before getting into the dynamics of Data Streams, we need to look at Data itself. What is it and how is it generated? First, what is it? Data is defined as a number that quantifies experience. (We realize that there are many other forms of data qualifiers, i.e. color, smell, and taste for instance, but even these must be turned into numbers to deal with them mathematically. Since this is a mathematical study, we only deal with data after it has been converted to a number.)

First: Build a Data Box with Criteria to distinguish inside from outside

For a Data Stream to exist, a box must first be made to hold the data. It can be a large box or a small box, but the box is necessary to determine what goes inside and what stays outside. With no box there is no data, because data, by its nature, delimits, qualifies, and quantifies. Information comes in. Does it go in the box or does it stay out of the box? This is the first question. The boundaries of the box are the Criteria. How do we know what goes into the Box and what stays out? The Criteria. They are like a filter: they block out certain data and allow other data to enter.

Second: What number do we put in it?

We will soon see the limits of Criteria, but first let us take an example. We build a box whose boundaries, the Criteria, determine when someone is awake and when asleep. Let us put anything in the box that is awake and keep everything out of the box that is asleep. The second question is what number we should associate with this asleep/awake phenomenon. At the simplest, the number could be a one if the person-data-generator is awake and a zero if the data-generator is asleep.

Third: What Duration of time do we base the number on?

This leads to the third question: How often do we put a one in the box? This is the question of time duration. We know to assign a 1 to awake and a zero to asleep, but how often do we throw a 1 into the Data Box? Continuously, you say, having taken Calculus and feeling very smart. But if a 1 is thrown into the box for every instant, no matter how short, the total immediately adds up to infinity because of the density of the real number line. We are not talking about integrals of continuous functions here. We are talking about the generation of data. We must decide whether we will put a one in the box every moment, second, minute, or hour. For simplicity let us throw a 1 in the box every hour that our subject is awake.
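The hourly either/or scheme above can be sketched in a few lines of code. This is only an illustration: the criterion function and the sample wake/sleep schedule are assumptions, not anything specified in the text.

```python
# A minimal sketch of the hourly either/or Data Box described above.
# Each hour contributes a 1 if the subject is awake, a 0 if asleep.

def hourly_reading(is_awake):
    """Criterion: a 1 goes into the Data Box for an awake hour, a 0 otherwise."""
    return 1 if is_awake else 0

# A hypothetical day: asleep hours 0-6, awake hours 7-22, asleep hour 23.
awake_hours = [7 <= hour <= 22 for hour in range(24)]
data_stream = [hourly_reading(state) for state in awake_hours]

print(sum(data_stream))  # total awake hours recorded for the day -> 16
```

The Data Stream here is just the growing list of hourly readings; summing it recovers the total time awake.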

B. Beginning Fuzzy Data

Difficulties with fuzzy borders

Sounds simple? Try accumulating some data. Do we put a 1 in the Box if someone falls asleep halfway through the hour? What happens if the person is groggy, half asleep and half awake, punch drunk, or falling in and out of sleep? It doesn't simplify things to take a smaller time increment, because we still have the transitional zones and transitional states. This is where Data Density comes in, with an assist from Fuzzy Logic.

How to deal with Transitional States

To deal with these transitional zones and states, let us increase our choice of numbers. Let us instead throw a number between 0.0 and 1.0 into the Data Box at the end of every hour, whichever best represents our assessment of the state of sleep/wake for that hour. If the subject was asleep for half the hour, throw a 0.5 into the box. If the delirious patient is only half awake, throw a 0.5 into the box for that hour. (Subjective, yes, but all live data is somewhat subjective. See the discussion of Live and Dead Data below.)
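The fractional scheme is a one-line change to the earlier sketch: each hour now receives an assessment between 0.0 and 1.0 instead of a bare 0 or 1. The sample assessments below are hypothetical.

```python
# Sketch: instead of 0 or 1, throw a fraction between 0.0 and 1.0 into the
# Data Box each hour -- our (subjective) assessment of wakefulness that hour.

def fuzzy_reading(fraction_awake):
    """Clamp the hourly assessment to the allowed range [0.0, 1.0]."""
    return max(0.0, min(1.0, fraction_awake))

# One hypothetical stretch: fully asleep, half asleep, groggy, fully awake.
assessments = [0.0, 0.5, 0.75, 1.0]
data_stream = [fuzzy_reading(a) for a in assessments]
print(data_stream)  # [0.0, 0.5, 0.75, 1.0]
```

The clamp simply enforces the rule that no hourly assessment may fall below 0.0 or exceed 1.0.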

'Both/And' Data rather than 'Either/Or'

Now our Data can more accurately reflect the state of affairs. Instead of being required to say that the subject was completely awake or completely asleep we can say that he was awake or asleep to a certain degree. The percent of sleep plus the percent of awake always equals 100 percent. Now the number thrown into the Data Box each hour will reflect the density of the Awakeness that hour. We can now define Data Density.

Definition: Data Density

Data Density is defined as the ratio between the real Data and the potential Data. The potential Data is the maximum value the real Data could be. Thus the Data Density can never be below 0 or greater than 1, and so it can be expressed as a percentage. If our subject were only half awake, his Data Density for that hour would be 0.50, or 50%. Data Density will be an important concept in the development of Fuzzy Science.
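The definition translates directly into a small function; the range check mirrors the rule that the Density can never fall below 0 or rise above 1. The function name and error handling are illustrative choices.

```python
# Sketch of the Data Density definition: real Data / potential Data.
# By construction the ratio lies between 0 and 1, so it reads as a percentage.

def data_density(real, potential):
    """Ratio of the recorded Data to the maximum it could have been."""
    if potential <= 0:
        raise ValueError("potential Data must be positive")
    density = real / potential
    if not 0.0 <= density <= 1.0:
        raise ValueError("real Data must lie between 0 and the potential")
    return density

print(data_density(0.5, 1.0))  # half awake for the hour -> 0.5, i.e. 50%
print(data_density(30, 60))    # same hour scored out of 60 minutes -> 0.5
```

Note that the same half-awake hour yields the same Density whether it is scored against a maximum of 1 or of 60.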

A Fractal Aside

We threw 0.50 into the Data Box when our subject was half awake. We can further divide the Awake Data Box into smaller boxes. Perhaps our subject woke up at 6:30 and then read for half an hour. Let us create a Read Box inside the Awake Data Box. Any reading done when the Awake Box is turned on will be thrown into the Read Box inside the Awake Box. The rest of the Awake Data will be placed outside the Read Box but inside the Awake Box. For that half hour, Reading had a 100% data density in the Awake Box but only 50% density for the Hour. Each of the Boxes can be broken into infinitely many interior boxes. While each data point represents a pure value, it can be broken into myriad sub-values. The closer the experimenter looks, the more complex the details become. It is like a fractal pattern, where the closer you look, the more you see. This is also true of our example of Sleep, which can likewise be broken down into many sub-states. Each Live Data Stream is made up of many other Live Data Streams.
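The nested boxes can be sketched as nested values: a Read figure kept inside the Awake figure. The dictionary layout is a hypothetical representation; the 6:30 wake-up numbers come from the example above.

```python
# Sketch of the fractal aside: a Read Box nested inside the Awake Box.
# Times are fractions of the hour, as in the 6:30 wake-up example above.

hour = {
    "awake": 0.5,                 # awake for half of the hour
    "awake_sub": {"read": 0.5},   # all of that awake time spent reading
}

# Density of Reading within the Awake Box (100% in the example):
read_in_awake = hour["awake_sub"]["read"] / hour["awake"]
# Density of Reading within the whole hour (50% in the example):
read_in_hour = hour["awake_sub"]["read"]

print(read_in_awake, read_in_hour)  # 1.0 0.5
```

Each inner box could itself hold further boxes, mirroring the fractal nesting described in the text.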

C. Equivalent Data Densities

A note on Fuzzy Logic

The idea of Data Density incorporates the concept of Fuzzy Logic, which says that the phenomenon is partly in the box and partly out of it. The only thing that needs to be quantified is what percentage is in the Box. (Instead of the word Box, we could just as easily have used the word Set.) Under our sleep/wake example, when a 0.6 is thrown into the box, it means that for that hour 60% of the data is Awake data. It does not necessarily mean that the person was fully awake for 60% of the hour. It could just mean that our subject seemed a little more awake than asleep for that hour. It is not as if the subject is either fully awake or fully asleep; he could be in a transitional state, neither here nor there. The idea of Data Density allows us to talk about transitional states whether they are based upon the concepts of Fuzzy Logic or on traditional Either-Or, On-Off Logic. Following is a graph that exhibits the difference between the two.

Notice that the same value would be thrown into the Data Box after the hour, but that they represent different phenomena. Notice also that each Data Bit has two dimensions rather than one. This conception includes the ideas behind Fuzzy thinking as well as traditional Logic: a marriage rather than a divorce, Both-And & Either-Or. Finally, be aware that the dimensions are flexible. The Data Number thrown into the box could be anything, but the Data Density would always be 0.5, regardless of the limits.

Data Marriage & Beyond

The graph on the left indicates a type of Marriage between the Either/Or & Both/And Data types. The subject is fully awake and then falls completely asleep. It also has a Density of 0.5, but is different again from the preceding Data Bits. Finally, the graph on the right also has a Data Density of 0.5 but has a completely different type of representation from the other data bits. I will not try to speculate what it could mean in terms of our Wake/Sleep Data Box, but I want to introduce this novel approach to Data now so that I can reintroduce it at a later date. All four data bits would throw the same 0.5 into our Data Box and so are equivalent for our purposes.

There are two ideas that are important from this discussion. The first is that the Data in our Stream is two-dimensional. The second is that equivalent bits of data can have very different manifestations. This factor becomes very important when we begin discussing the Fractal Nature of Behavior. {See the corresponding Notebook on Fractal Behavior.}

2D Data, Bi-Dimensional Numbers, and a Variable Limit

Although each of our Data Bits is an individual number, it represents an area or density, as represented above. Although the maximum limit of each of these four graphs is 1, it could have been assigned any number. These are Unit Data Bytes, i.e., the Duration and Potential Data are both one. We agreed that 1 would be the maximum number that we would throw in the Box if the subject were fully awake for an entire hour. It could just as easily have been 60 or 3600 or any other number.

The Constancy of Data Density

When the duration and the maximum reading are both 1, then the Data Density and the Data are the same. If we threw 60 into the box whenever our subject was fully awake for the whole hour, we would only throw 30 into the box if our subject was asleep for half the hour and fully awake the rest. The Data would be 30 but the Data Density would be 50% or 0.50. The Data Density is the same no matter the limits, while the Data Byte could vary.
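The constancy claim is easy to demonstrate: score the same half-awake hour against several ceilings and recover the same Density each time. The function name and the sample ceilings are illustrative.

```python
# Sketch: the Data Byte depends on the chosen maximum, but the Data Density
# does not. The same half-awake hour is scored against several ceilings.

def reading(fraction_awake, maximum):
    """The Data thrown into the box under a given potential maximum."""
    return fraction_awake * maximum

half_awake = 0.5
for maximum in (1, 60, 3600):
    data = reading(half_awake, maximum)
    density = data / maximum  # recovers the same 0.5 every time
    print(maximum, data, density)
```

With a maximum of 60 the Data is 30, with a maximum of 1 it is 0.5, yet the Density stays 50% throughout, just as the text states.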

The Dimensions of the Data

Our Live Data has two dimensions, Duration and Density. The Density moving across the Duration is an Event. A precise beginning and end define the Duration. An Event is that which occurs between these limits. A Data Reading occurs at the beginning and end of each Duration. Each time a Reading is taken, a number quantifies an Event. This number tells what is as well as what isn't. (This concept becomes important, later on, in defining the Null Stream.)

Diagram: the First 3 Events of a Data Stream

Below is a diagram illustrating our words. Each square represents the whole potential event, perhaps a square hour of time. The shaded area represents the percent of time during that hour that the specified behavior was participated in: this is the Data Density. The Data Reading would be the product of Data Density and Duration.
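The product relation stated above can be written out directly. The choice of units (a Duration measured in minutes, so the Reading comes out in minutes) is an assumption for the example.

```python
# Sketch of the relation stated above: Data Reading = Data Density x Duration.
# Units are an assumption here: Duration in minutes, so the Reading is minutes.

def data_reading(density, duration):
    """The Data Reading as the product of Density and Duration."""
    return density * duration

# An hour-long Event in which the behavior filled 40% of the time:
print(data_reading(0.4, 60))  # -> 24.0 minutes
```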

Behavior Studies

This discussion of Data Density refers primarily to Behavior Studies. Behavior Studies are the study of behavior, i.e., time spent per Duration. The Criterion determines when the behavior is or is not occurring. Or, on a more general level, the Criterion determines what percent of the set Duration the subject participates in the behavior. Our Data does not quantify performance. It only quantifies the amount of time spent.

The Event Box must have consistent limits

For Behavior studies, the potential upper limit of the data and the time duration must be consistent for each Event. The Experimenter could take a reading every second, every minute, every hour, every day, every week, or whatever Duration was acceptable. He could record the number of minutes, seconds, or years, whatever was convenient. But each Duration must be equal. When taking a Reading, one asks what percent was the Total Potential Event participated in during that Duration. If the Duration is an hour and the Readings are going to be taken in minutes, then the maximum Reading would always be 60 for each Duration.

The Ratio of Time to Time

Hence each data point has two dimensions. Data in general can have many different types of dimensions, e.g. height, speed, position. But for Behavior studies the two dimensions are both time dimensions: minutes/hour, hours/day, hours/month. The units of the Data Reading could be minutes/hour, shifts/week, hours/day, or many other combinations. Because both dimensions are time, the number reduces to a pure number, although the two dimensions must be kept in mind to make sense of the data. For our Action studies, our units are hours per day.

D. Back to Data Density

The Undifferentiated Tao

Earlier in this notebook we saw some Data Bit diagrams. We briefly discussed Fractal Data. Let us return to the discussion. First is the undifferentiated Whole: the Flow without names, without distinction, the Tao. Then comes the polarity, differentiation: either this or that, or a mixture of the two, Yin and Yang, Black and White, maybe even Gray. But then come the Colors. Remember, White consists of all the colors of the rainbow, unless you're working with crayons, of course. As any 4-year-old artist knows, all the colors together then create Mud, or Black.

The Day

We start out with the undifferentiated 24-hour Day, flowing endlessly. We then set up the first polarity, Waking and Sleeping, White & Black. But then the Waking hours are further broken into myriad categories, the beginning of Color. So our Data Bit assumes much more complexity. Let the duration of our Data Bit be a Day, which we break into half-hour segments. It would look like this.

The Diagram of a Day

Note in the diagram on the right that there are large blocks of continuous patterns, as well as the same pattern occurring at different discrete spots in the diagram. The idea of Data Density ignores the fact that the patterns are separated, and only counts the number of ¼ hours that are covered with the same pattern when throwing a data bit into the appropriate box.
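Counting covered segments regardless of where they fall is exactly what a tally over labeled slots does. In the sketch below the day is divided into 96 quarter-hour slots; the category labels and the particular day are hypothetical.

```python
# Sketch of the Diagram of the Day as data: 96 quarter-hour slots, each
# labeled with a category. Data Density ignores where a pattern falls in
# the day; it only counts how many slots carry that pattern.

from collections import Counter

# A hypothetical day: sleep, then writing and other awake time, with the
# same "write" pattern appearing at two separate spots in the day.
day = (["sleep"] * 32 + ["write"] * 8 + ["awake"] * 20
       + ["write"] * 4 + ["awake"] * 32)

counts = Counter(day)
densities = {label: n / len(day) for label, n in counts.items()}

print(counts["write"])     # 12 quarter hours of writing, wherever they fall
print(densities["sleep"])  # 32/96 of the day asleep
```

The two separated "write" blocks contribute to a single count, just as the text describes.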

Other Notes on the Diagram of the Day

Note also the white ¼-hour squares that are unpatterned. These are undifferentiated Awake times. Although the individual is awake, the ¼ hour doesn't fit into any of the Awake boxes. It also doesn't fall into the Sleep box. Finally, be aware that each of these quarter-hour boxes can also be fuzzy. The experimenter has called each box a distinct name and given it a corresponding pattern, but in reality each quarter-hour square has some element of uncertainty associated with it, unless, of course, it is Dead Data. {See the Live Data Sets Notebook for more info.}

Diagram of Undifferentiated Days

If we assume linear time, then our days link together consecutively in an undifferentiated manner. Data can be collected but it has no meaning if a Duration is not defined.

Diagram of the Differentiated Week

Following is the Diagram of the Week with one-day Durations. Each Day has its own Data associated with it and the corresponding Data Density, e.g., 3 hours of Writing in day one. How the hours are spread out in the day makes no difference. Although daily readings are taken, it doesn't really matter when the activities occur in the week, if our Duration is one week.

 

A Further Step Into Density

Then, taking the Density a step further, we can say 2.5 hours of Writing per day for the week. This one number can characterize the Writing for the designated week. If we had a series of numbers expressed in the same form which added up to 24 hours, then these numbers would completely characterize the entire week with nothing left over. In The Experiment, for reasons we will discuss in the Fuzzy Data Notebook, the Duration we chose was one month. Although readings were taken daily, the readings were thrown into a monthly container and then averaged out. One number expresses the entire month's contents as an average daily amount.
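The averaging step can be sketched in one function: daily readings go into a container (a week here; a month in The Experiment) and collapse to a single average daily amount. The daily Writing figures are hypothetical, chosen so day one has 3 hours as in the weekly example.

```python
# Sketch of the averaging step: daily readings are thrown into a larger
# container and averaged, so one number characterizes the whole Duration.

def average_per_day(daily_hours):
    """Collapse a container of daily readings into one average daily amount."""
    return sum(daily_hours) / len(daily_hours)

# Hypothetical hours of Writing for each day of one week (3 hours on day one):
writing = [3.0, 2.0, 2.5, 3.5, 2.0, 2.5, 2.0]
print(average_per_day(writing))  # -> 2.5 hours of Writing per day for the week
```

A monthly container works the same way: roughly 30 daily readings in, one average daily amount out.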

The Data Bit for a Month

Above we showed a Data Bit for a Day when a Day is the Duration. Below is a Data Bit for a Month, if that is the Duration. In this scenario as in the Daily scenario, it doesn't matter how the Data is distributed. The only thing that matters is how many hours are in each box at the end of the month.

 
