Statistics
Glossary C-E

These are Glossary entries C-E.

C

Centrality. Some, not all, data sets have a tendency to cluster around a central value. This tendency is called "centrality." There are three common measures of centrality in data sets which possess it: the Mean, Median, and Mode. None is always better than the others. Each has uses in different situations, depending on how the numbers in the data set are distributed; see also Clustering. The mean is usually preferred in statistical practice, though sometimes for reasons having more to do with the ease of calculating it than with its appropriateness for the problem. For the complementary measure, of how widely data points are scattered around a central value, see Dispersion.
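
As a rough illustration of how the three measures can disagree, the sketch below uses Python's standard statistics module on a small made-up data set (the numbers are an assumption, chosen so that one outlier pulls the mean away from the cluster).

```python
import statistics

# Hypothetical data set: most values cluster near 2, with one large outlier.
data = [1, 2, 2, 3, 14]

mean = statistics.mean(data)      # arithmetic average; pulled upward by the outlier
median = statistics.median(data)  # middle value once the data are sorted
mode = statistics.mode(data)      # most frequently occurring value

print("mean:", mean, "median:", median, "mode:", mode)
```

Here the median and mode stay with the cluster while the mean does not, which is one situation in which the mean may not be the most appropriate measure.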

Certainty. The degree to which we are sure that some result is significant; that is, not due to chance. The point of much statistical analysis is to separate out, from the random background noise of chance variation, any effects that are probably produced by other causes. The level of certainty chosen by a given experimenter, called the Alpha Level, determines the boundary between the "probably random" and the "probably nonrandom." In these Lessons, we will mostly use the level α = 0.99 (meaning that we have 99% certainty that something nonrandom is going on), but see the discussion in Lesson 1.

Chi Squared. A test of "goodness of fit" between two data sets. Though often misapplied, it is useful within its proper limits. The Greek letter chi (χ, pronounced "kye") was chosen by Karl Pearson for this test, which he invented in 1900, because it graphically suggests "crossover" or "cross-comparison." We will write it as χ² only in formulas. In one's own research notes, and even in print, χ² tends to look a lot like x². Pearson should have chosen another symbol. There have been various improvements on the original χ² formula; see for example Cramer's test.
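
This entry does not state the formula itself. Assuming the usual textbook form of Pearson's statistic, the sum of (observed - expected)² / expected over matching categories, a minimal sketch might look like the following. The function name and the die-rolling counts are hypothetical.

```python
def chi_squared(observed, expected):
    """Sum of (observed - expected)^2 / expected over paired counts."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical example: a die rolled 60 times, so a fair die expects 10 per face.
observed = [8, 9, 12, 11, 6, 14]
expected = [10] * 6
statistic = chi_squared(observed, expected)
print(statistic)
```

A larger value indicates a poorer fit. Deciding whether a given value is significant requires a table or function that takes the degrees of freedom into account, which this sketch leaves out.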

Clustering. The tendency of data points to occur in bunches, rather than evenly spaced over their range. A data set which tends to bunch only in the middle is said to possess Centrality. Data sets which bunch in several places do not possess Centrality. What they do possess has not been very much studied, and there are no infallible methods for locating and describing more than one cluster in a data set (the problem is much worse when some of the clusters overlap). The existence of clustered data sets is one easy refutation of the notion that everything in the world can be mapped onto the Normal or Gaussian paradigm. For a lawful but also non-normal distribution, see Cauchy Distribution.

Coefficient. A multiplier. In the expression 3x, 3 is the coefficient of x. There is no usable etymological connection between "coefficient" and the concept of "efficient." Treat it as an opaque term that must be learned as such.

Combinations. The number of ways n items may be combined together, irrespective of order. When counting combinations, the coin toss results HHT, HTH, and THH count as one, since they have the same constituents (2H, 1T) and differ only in the order in which those constituents are arranged. Compare Permutations, where the order of the constituents does make a difference. At least one great mathematician was confused by the difference between combinations and permutations. Take time to get the distinction clear in your mind.
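
The coin toss example can be checked by brute force. The sketch below lists every ordered result of three tosses, then discards order by sorting the letters of each result; the variable names are illustrative only.

```python
import math
from itertools import product

# All ordered results of three coin tosses: HHH, HHT, HTH, ...
ordered = ["".join(t) for t in product("HT", repeat=3)]

# Ignore order by sorting the letters, so HHT, HTH, and THH all become "HHT".
unordered = {"".join(sorted(r)) for r in ordered}

# The orderings that share the constituents 2H, 1T.
two_heads = [r for r in ordered if r.count("H") == 2]

print(len(ordered), "ordered results;", len(unordered), "when order is ignored")
print(two_heads, "count as one combination")
print(math.comb(3, 2), "ways to choose which 2 of the 3 tosses are heads")
```

The last line uses math.comb, available in Python 3.8 and later, to confirm that the number of orderings collapsed into the single result (2H, 1T) matches the count found by listing them.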

Conditional Probability. The probability of event B, given that a previous event A has already occurred. We will use the notation P(B < A). The conventional form is P(B|A).

This concept is central to the Bayesian approach, but as far as is known, any problem in conditional probability can be solved without using Bayesian methods, or accepting Bayesian philosophical assumptions. We will on the whole adopt this noncommitted approach in these pages.
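
As an illustration that needs no Bayesian machinery, a conditional probability can be found by simple counting: restrict attention to the outcomes in which A occurred, and ask what fraction of them also satisfy B. The two dice and the events A and B below are hypothetical choices made for this sketch.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely results of rolling two dice.
outcomes = list(product(range(1, 7), repeat=2))

def probability(event):
    """Fraction of outcomes for which the event function returns True."""
    favorable = sum(1 for o in outcomes if event(o))
    return Fraction(favorable, len(outcomes))

def event_a(o):  # hypothetical event A: the first die shows 6
    return o[0] == 6

def event_b(o):  # hypothetical event B: the two dice total 10 or more
    return o[0] + o[1] >= 10

p_a = probability(event_a)
p_a_and_b = probability(lambda o: event_a(o) and event_b(o))
p_b = probability(event_b)

# Probability of B given A: the share of A's outcomes that also satisfy B.
p_b_given_a = p_a_and_b / p_a

print("P(B) =", p_b, " P(B given A) =", p_b_given_a)
```

Knowing that A has occurred changes the probability of B considerably in this example, which is the whole point of the concept.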

Cube Root. The cube root of n is here written *3√(n). As with powers, the asterisk is meant to indicate that the following number should be a superscript. See Roots.

D

Data. The set of information points we have to work with. There are different types, which it is crucial to distinguish. See Nominal, Ordinal, Interval.

Data Set. Any group of related values which we are considering together, such as the results of several trials of the same experiment, or several measurements of the same quantity.

Dispersion. A measure of the degree to which the data points are spread out from the central value. There are three such measures in common use: the mean deviation, the interquartile range, and, by far the most common, the standard deviation. For the other descriptor of a set of data points, see Centrality.
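
The three measures can be computed side by side. The sketch below makes several assumptions: the data set is invented, the mean deviation is taken around the mean (some texts use the median), and the standard deviation treats the data as a whole population rather than a sample.

```python
import statistics

# Hypothetical data set with mean 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]
center = statistics.mean(data)

# Mean deviation: average absolute distance from the mean.
mean_deviation = sum(abs(x - center) for x in data) / len(data)

# Interquartile range: distance between the first and third quartiles.
# statistics.quantiles follows one of several quartile conventions;
# other conventions give slightly different cut points.
q1, _q2, q3 = statistics.quantiles(data, n=4)
interquartile_range = q3 - q1

# Standard deviation, treating the data as a complete population.
standard_deviation = statistics.pstdev(data)

print("mean deviation:", mean_deviation)
print("interquartile range:", interquartile_range)
print("standard deviation:", standard_deviation)
```

The three results differ because each measure weights distant points differently; the standard deviation, which squares the distances, is the most sensitive to outlying values.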

E

e [Greek e or "epsilon"] = 2.7182818 . . . , an irrational number which appears in equations having to do with growth. It is the base of natural or Napierian logarithms. The procedures of statistics can't be carried out without the aid of irrational numbers like e. This amounts to saying that statistics can't avoid considering the way in which quantities grow. See also π.
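
One familiar way in which this number arises from growth is compound interest: one unit growing at 100% per period, with the growth divided into n equal steps, approaches e as n increases. The sketch below is only an illustration of that limit; the function name is hypothetical.

```python
import math

def compound_growth(n):
    """Value of 1 unit after one period at 100% growth, compounded n times."""
    return (1 + 1 / n) ** n

for n in (1, 10, 1000, 1_000_000):
    print(n, compound_growth(n))

print("math.e:", math.e)
```

The printed values rise toward the library constant math.e without ever reaching it.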

Euler's Constant. The number 0.5772156 . . . , which arises in such formulas as Euler's Approximation for the partial sum of a harmonic series.
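
The connection with the harmonic series can be seen numerically. Assuming the usual statement that the partial sum 1 + 1/2 + ... + 1/n is approximately ln(n) plus this constant, the difference between the partial sum and ln(n) should settle toward the constant as n grows. The function name below is hypothetical.

```python
import math

def harmonic_minus_log(n):
    """Partial sum 1 + 1/2 + ... + 1/n of the harmonic series, minus ln(n)."""
    harmonic = sum(1 / k for k in range(1, n + 1))
    return harmonic - math.log(n)

for n in (10, 1000, 1_000_000):
    print(n, harmonic_minus_log(n))
```

The difference shrinks toward the constant from above, and the remaining gap is roughly 1/(2n).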

Expectation. The expectation in a wager is the value of the prize times the chance of getting it. If the value of a prize is \$10 and you have a 1/10 chance (p = 0.1000) of winning it, your expectation is \$1. This has meaning for many people making the same wager over time; the total winnings of such a group, divided by the number of people betting, will be approximately \$1. The professional gambler occupies a similar position. But for the individual betting once or rarely, a different calculus of desirability will apply. For such an individual, the either/or options are more drastic, and any risk of losing may offset even a considerable chance of winning. This fact contradicts some standard "economic man" assumptions, not to mention the gambling metaphor that is basic to the argument for Bayesian Statistics. See also Pascal's Wager.
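
The contrast between the group and the individual can be made concrete with a small simulation. The prize and probability come from the example above; the number of bettors, the fixed random seed, and the function name are assumptions made for this sketch.

```python
import random

PRIZE = 10.0   # value of the prize, from the example above
P_WIN = 0.1    # chance of winning it

expectation = PRIZE * P_WIN

def average_winnings(bettors, seed=0):
    """Simulate many people each making the wager once; return winnings per person."""
    rng = random.Random(seed)  # fixed seed (an assumption) so runs are repeatable
    total = sum(PRIZE for _ in range(bettors) if rng.random() < P_WIN)
    return total / bettors

print("expectation:", expectation)
print("average over 100000 bettors:", average_winnings(100_000))
```

The group average lands close to the expectation, yet no single bettor in the simulation ever receives that amount: each one gets either the full prize or nothing.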

Exponent. A number indicating the power to which another number is to be raised; that is, the number of times that number is to be multiplied together. It is normally written as a superscript, for example x² = (x)(x), the product of 2 x's. It is only possible to write a few exponents as actual superscripts on Internet pages, and we here write most exponents as numbers or quantities preceded by an asterisk. Thus x² is here often written as x*2, and x³ as x*3. More complex exponents are invariably in the form x*(n + 1), and so on. The raised position of the asterisk is intended to remind readers to visualize the number or quantity which follows as being in a typographically raised, or superscript, position. Negative exponents are here usually avoided by the use of reciprocals.
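
For readers who check these calculations in a programming language, note that the notation differs. In Python, for instance, a single asterisk means multiplication and a doubled asterisk means exponentiation, so what is written here as x*2 would be typed as x ** 2. The sketch below, with a hypothetical helper function, shows an exponent as repeated multiplication and a negative exponent as a reciprocal.

```python
from fractions import Fraction

def power(base, exponent):
    """Raise base to a non-negative whole-number exponent by repeated multiplication."""
    result = 1
    for _ in range(exponent):
        result *= base
    return result

x = 3
print(power(x, 2), x ** 2)   # written x*2 in this glossary's notation
print(power(x, 3), x ** 3)   # written x*3 in this glossary's notation

# A negative exponent is the reciprocal of the corresponding positive power.
print(Fraction(x) ** -2, 1 / Fraction(power(x, 2)))
```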