Statistics
Glossary K-N

These are Glossary entries K-N. For Additional Notes, click on the bar within or at the end of the entry.

L M N

L

Least Squares. A method for determining the line that comes nearest to passing through a set of data points. The squares come in because of Pythagoras' theorem about triangles. The method aims to minimize (hence the word "least") the sum of the differences from the data points to the line in question.

Line. A straight line as drawn on graph paper has the equation y = ax + b, where a and b are any positive or negative numbers. The quantity a gives the slope of the line (the angle it makes with the x axis), and b gives the intercept, or the point at which the line cuts the y axis.

ln(x) = natural logarithm of x.

Logarithm. The power to which 10 [or some other base number] must be raised to give the desired number. 10x² = 100, so the logarithm of 100 is 2. The generalization of this definition to fractional powers of 10 is slightly counterintuitive. For an explanation and some basic formulas see Logarithms in the Math section.

log(x) = logarithm of x.

Lucus a non lucendo (Lt) "it is called a grove because it is not light," a Roman grammarian's absurd attempt to find the etymology of lucus "grove of trees" in lux, lucis "light." Used of any reverse derivation, usually pejoratively. There is an especially pejorative use in Problem 8a of Lesson 16.

M

m = the sequence of integers, considered as beginning with 0 (compare n). Thus m = 0, 1, 2, 3, 4, 5, . . . The first value of the sequence, m(1), equals 0.

m is also used here for the mean of the values in a data set (as distinct from the mean of a population; see m). See Mean for a definition, and see Standard Deviation for the use of m in a formula. The usual symbol for m in this sense is x-bar.

Mean. The mean or average of a group of n numbers a, b, c, . . . is their sum divided by n, thus m = (a + b + c . . .) / n. Example, the mean (or average) of 7, 8, 9 is (7 + 8 + 9) / 3 = 24 / 3 = 8. The mean is the commonest measure of centrality. It is appropriate when the values tend to cluster toward a center, and when there are no or few outlying extreme values. When most values cluster but extreme values occur, the Median is a better measure of central tendency. Where there are no or few middling values, it is sometimes possible to use the Mode, but it is questionable whether the concept of "central tendency" properly applies in such cases.

Median. The middle member of a set of data values arranged in increasing order of size. If the number of data values is even, there will be two middle values; in this case we take the average of those two values as the median. The median has the property that half the values in the set (half minus 1, if the median itself is one of the values) lie above it, and half lie below it. It is more useful than the Mean in situations where the values cluster toward a center but extreme values are likely to occur. The corresponding measure of disperson is the Interquartile Range. See also Mode.

Minus Sign (-). See under Absolute Value.

Mode. The commonest value in a data set. In the set 1, 2, 2, 3, 4, 7, the value 2 occurs most often and is therefore the mode, or modal value. If a set has more than one mode (as does 1, 2, 2, 3, 7, 7), it is said to be bimodal. The mode may be adopted as a centrality measure in situations where the mean is distorted by one or two extreme values, and the median is meaningless because there are in effect no middle values.

m (Greek: mu). The usual symbol for the Mean of a distribution, one of the two parameters which define a particular version of the normal distribution curve. One should carefully distinguish mu or m (the mean of all the values of x in the population, or population mean) from m or x-bar (the mean of a sample of values of x drawn from that population).

N

n = the sequence of integers, considered as beginning with 1 (compare m). Thus n = 1, 2, 3, 4, 5, . . . The first value in the sequence, n(1), equals 1. Compare m.

Napierian logarithm = natural logarithm. The implied tribute to John Napier is not undeserved, but it is anomalous, and we will use the term "natural logarithm" instead.

Natural Logarithms have the base e = 2.7182818. . .

Nominal Data. "Name data:" information bits whose labels simply serve to distinguish them, and don't imply any size or sequence relationships. The names Andy, Bert, Charlie, Dave, and Ernie (plus their associated heights in inches) just tell us where the associated measurements come from; they don't help us interpret those measurements. We don't know how they finished in the sack race. Compare Ordinal.

Nonparametric. Problems for which a distribution curve cannot be drawn, either because the parameters of the equation are not known, or because there is no equation at all. Nonparametric statistical tests are typically simpler, and also more robust, than those of parametric statistics.

Normal. The technical meaning of this word in statistics is "governed by a norm." The data numbers are assumed to be more or less successful approximations of that norm; there is nothing going on in the set except a more or less successful effort to reach the norm. Also popularly used in the sense of "corresponding to the norm," being average for the set, or "within the usual limits" (as a "normal" blood pressure). Blood pressures, however, are not in the technical sense normally distributed, and this usage is thus treacherous for the student of statistics.

Normal Distribution. One of several distributions characterized by data points with most values close to the center or norm, and fewer and fewer occurring as we move further and further out from the norm. Among those distributions, the normal distribution is what results if you let grains of sand fall through a pinhole onto a level surface: few will bounce far away. If a data set closely fits a normal distribution curve, the implication is that nothing is going on except random variation around a norm.

Null Hypothesis. The assumption that nothing other than chance is operating to produce the effect which we see in a particular data set. The null hypothesis is rejected if a particular outcome, or the data set as a whole, is very unlikely to have been produced by chance. No particular alternative hypothesis is thereby proved; the only conclusion is that something is going on. See Lesson 1.