I just uploaded a new Video, "Variance". It is part of the playlist on Variation, Variability, Dispersion, and Spread. Below are the 6 Keys to Understanding for this concept. See the videos page of this website for the latest status of available and planned videos.
Statistics Tip: In ANOVA, Sum of Squares Within (SSW) is the sum of Variations within each of several datasets or Groups.
In ANOVA, Sum of Squares Total (SST) equals Sum of Squares Within (SSW) plus Sum of Squares Between (SSB). That is, SST = SSW + SSB. In this Tip, we'll talk about Sum of Squares Within, SSW.
The following illustrations are not numerically precise. Conceptually, though, they portray Sum of Squares Within as the width of the “meaty” part of a Distribution curve – the part without the skinny tails on either side.
Here, SSW = SS1 + SS2 + SS3
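To make the arithmetic concrete, here is a minimal Python sketch. The three Groups and their values are made up for illustration, and the helper name `sum_of_squares` is our own, not something from the book or video. It adds up the three Sums of Squares to get SSW and then checks that SST = SSW + SSB for this data.

```python
from statistics import mean

def sum_of_squares(values, center=None):
    """Sum of squared deviations from `center` (default: the values' own Mean)."""
    if center is None:
        center = mean(values)
    return sum((x - center) ** 2 for x in values)

# Illustrative data only: three Groups of three values each.
groups = [[4, 5, 6], [6, 8, 10], [10, 11, 12]]

ss_each = [sum_of_squares(g) for g in groups]  # SS1, SS2, SS3
ssw = sum(ss_each)                             # SSW = SS1 + SS2 + SS3

# SSB and SST, computed only to check the identity SST = SSW + SSB.
all_values = [x for g in groups for x in g]
overall_mean = mean(all_values)
ssb = sum(len(g) * (mean(g) - overall_mean) ** 2 for g in groups)
sst = sum_of_squares(all_values, overall_mean)

print("SS per Group:", ss_each)
print("SSW:", ssw, "SSB:", ssb, "SST:", sst)
```

Note that SSW depends only on how spread out each Group is around its own Mean. Moving a whole Group up or down would change SSB and SST but leave SSW untouched.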
Sorry to have to say this, but errors have been found in the book. The errors and corrections are shown on the new "Errata" page on this website.
The concept of Null Hypothesis can be confusing for many of us. It is a statement of nonexistence. For example:
"There is no (Statistically Significant) difference between the Means of these two Populations."
And we normally think in terms of what exists, not what doesn't. It would be more natural for most of us to start by asking a question instead:
"Is there a (Statistically Significant) difference between the Means of these two Populations?"
Then, we could rephrase it as a Negative Statement to produce a Null Hypothesis, as shown below.
Sums of Squares is an important concept in Variation, playing a major role in ANOVA and Regression. 4 Keys to Understanding and plenty of graphics help to give the viewer a good conceptual understanding.
For the latest status of available and planned videos, visit the Videos page of this website.
A method for performing 1-way (single factor) ANOVA can be illustrated in 7 steps, as shown in this concept flow diagram:
A Sum of Squares, SS, is a measure of Variation within one Sample. In fact, it is the numerator in the formula for Variance.
SSB and SSW are described further in two of my recent Tips of the Week. And this 7-Step method is covered in my video: ANOVA - Part 3 (of 4): 1-Way aka Single Factor.
Step 1. Calculate the Sum of Squares (SS) for each Sample.
SS is a measure of Variation within one Sample. In fact, it is the numerator in the formula for Variance.
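As a quick check of the "numerator of Variance" point, here is a small Python sketch with a made-up Sample. It assumes the Sample Variance formula, which divides SS by n - 1, and compares the result with `statistics.variance` from the Python standard library.

```python
import math
import statistics

# Illustrative Sample only.
sample = [6, 8, 10]

sample_mean = statistics.mean(sample)
ss = sum((x - sample_mean) ** 2 for x in sample)  # Sum of Squares for this Sample

# Sample Variance = SS / (n - 1)
variance_from_ss = ss / (len(sample) - 1)

print("SS:", ss)
print("Variance from SS:", variance_from_ss)
print("statistics.variance:", statistics.variance(sample))
```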
Step 2. Add all these up for all Samples to get the Sum of Squares Within.
SSW is a measure of the Variation within all the Samples.
Step 3. Calculate the Overall Mean of all the data values in all Samples.
Forget which data values go with which Samples; just put them all in one bucket and calculate the Mean.
Step 4: Sum up the squared differences between each Sample Mean (X-bar) and the Overall Mean (X double bar), multiplying each squared difference by the number of data values in that Sample, to get Sum of Squares Between (SSB).
Step 5: Calculate the Mean Sum of Squares Within (MSW) and Between (MSB).
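Sums of squared differences (like SSW and SSB) provide a gross measure of Variation. But it is often not meaningful to compare sums of different numbers of things. Averages (Means) are generally more meaningful than totals. So, we calculate MSW and MSB by dividing each Sum of Squares by its Degrees of Freedom: MSW = SSW / (N - k) and MSB = SSB / (k - 1), where N is the total number of data values and k is the number of Samples.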
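Steps 3 through 5 can be sketched in a few lines of Python. The data is made up for illustration, and the variable names are our own. The sketch assumes the usual Degrees of Freedom for 1-way ANOVA: N - k for Within and k - 1 for Between, where N is the total number of data values and k is the number of Samples.

```python
from statistics import mean

# Illustrative data only: k = 3 Samples of 3 values each.
samples = [[4, 5, 6], [6, 8, 10], [10, 11, 12]]

k = len(samples)                        # number of Samples
n_total = sum(len(s) for s in samples)  # N, total number of data values
sample_means = [mean(s) for s in samples]

# Steps 1 and 2: SS for each Sample, added up to get SSW.
ssw = sum((x - m) ** 2 for s, m in zip(samples, sample_means) for x in s)

# Step 3: Overall Mean, with all data values in one bucket.
overall_mean = mean([x for s in samples for x in s])

# Step 4: SSB, each squared difference weighted by its Sample size.
ssb = sum(len(s) * (m - overall_mean) ** 2 for s, m in zip(samples, sample_means))

# Step 5: Mean Sums of Squares.
msw = ssw / (n_total - k)
msb = ssb / (k - 1)

print("Overall Mean:", overall_mean)
print("SSW:", ssw, "SSB:", ssb)
print("MSW:", msw, "MSB:", msb)
```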
Step 6: Perform an F-test
The crux of ANOVA is comparing the Variation Within groups to the Variation Between (Among) groups. A group is a dataset like a population, process, or sample. The best way to do a comparison is to calculate a ratio. The F-statistic is a ratio of two Variances, MSB and MSW.
Note that this is a different concept from the usual F-test comparing Variances of two Samples. In that case, the Null Hypothesis would be that there is not a Statistically Significant difference between the Variances of two Samples. Although MSB and MSW have formulas like Variances, MSB and MSW contain information about the differences between the Means of the several groups. They contain no information about the Variances of the groups.
In the F-Test within ANOVA, the ANOVA Null Hypothesis is that there is not a Statistically Significant difference between MSB and MSW – that is, there is not a Statistically Significant difference among the Means of the several Groups.
Our choice of the Significance Level, Alpha (most commonly 5%), determines the value of F-critical, and the F-statistic (calculated from the Sample data) determines the value of the Probability p. Comparing p to α is equivalent to comparing F to F-critical.
If F ≥ F-critical (equivalently, p ≤ α), then there is a Statistically Significant difference between the Means of the groups. (Reject the ANOVA Null Hypothesis.)
If F < F-critical (p > α), then there is not a Statistically Significant difference between the Means of the groups. (Accept/Fail to Reject the ANOVA Null Hypothesis.)
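Here is a rough Python sketch of the F-test decision, using made-up data. The function name `f_statistic` is our own. The value of F-critical is an assumption read from a standard F table for Alpha = 5% with 2 and 6 Degrees of Freedom (about 5.14); in practice, statistical software would supply it along with p.

```python
from statistics import mean

def f_statistic(samples):
    """Return F = MSB / MSW for a list of Samples (1-way ANOVA)."""
    k = len(samples)
    n_total = sum(len(s) for s in samples)
    sample_means = [mean(s) for s in samples]
    overall_mean = mean([x for s in samples for x in s])
    ssw = sum((x - m) ** 2 for s, m in zip(samples, sample_means) for x in s)
    ssb = sum(len(s) * (m - overall_mean) ** 2 for s, m in zip(samples, sample_means))
    return (ssb / (k - 1)) / (ssw / (n_total - k))

# Illustrative data only: 3 Samples of 3 values, so the Degrees of Freedom are 2 and 6.
samples = [[4, 5, 6], [6, 8, 10], [10, 11, 12]]
f_stat = f_statistic(samples)

# Assumption: F-critical for Alpha = 0.05 with (2, 6) Degrees of Freedom,
# taken from a standard F table.
f_critical = 5.14

reject_null = f_stat >= f_critical
print("F:", f_stat, "F-critical:", f_critical)
print("Reject the ANOVA Null Hypothesis:", reject_null)
```

If SciPy is available, `scipy.stats.f_oneway` takes the Samples as separate arguments and returns both the F-statistic and p, so p can be compared to Alpha directly.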
Statistics Tip of the Week: For Nuisance Factors in Designed Experiments, Block what you can and Randomize what you can't.
In our Statistics Tip of the Week for April 6, 2017, we described the difference between Common Cause Variation in a process and Special Cause Variation. Common Cause Variation is like random noise in a process that is under control. Special Cause Variation comes from external factors outside the process, like the effect that the ambient temperature in a factory rising through most of the workday has on a chemical reaction. This type of factor is sometimes called a "Nuisance Factor" in the discipline of Design of Experiments.
And, we said that any Special Cause Variation must be eliminated before one can attempt to narrow the range of Common Cause Variation. Narrowing the range of Common Cause Variation is a major objective of process improvement disciplines like Six Sigma.
Factors -- the inputs -- are denoted by x's, and y is the output -- also known as the Response. Briefly, statistical software for Design of Experiments can provide us the number of trials ("Runs") to do and the levels of x for each trial. We might get test results that look like the following. (There is a lot to explain here, more than we can cover in this blog post.)
Our Tip for April 12, 2018 said that Designed Experiments, together with Regression Analysis, can provide strong evidence of Causation. When we're doing these experiments, we are often not able to easily get rid of the Special Cause (Nuisance Factor) Variation, but we can try to reduce or eliminate its effect on the experiment.
A known Nuisance Factor can often be Blocked. To “Block” in this context means to group into a Block. By so doing, we try to remove the effect of Variation of the Nuisance Factor. In this example, we Block the effect of the daily rise in ambient temperature by performing all our experimental Runs within a narrow Block of time. And, if it takes several days to complete all the Runs, we do them all at a similar time of day in order to have the same ambient temperature. We thus minimize the Variation in y caused by the Nuisance Factor.
There can also be Factors affecting y which we don’t know about. Obviously, we can’t Block what we don’t know. But we can often avoid the influence of Unknown Factors (also known as “Lurking” Variables) by Randomizing the order in which the experimental combinations are tested.
For example – unbeknownst to us – the worker performing the steps in a process may get tired over time, or, conversely, they might “get in a groove” and perform better over time. So, we need to Randomize the order in which we test the combinations of Factors. Statistical software can provide us with the random sequences to use in the experiment.
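To give a feel for what such a random sequence looks like, here is a minimal Python sketch. It is not how any particular statistical package does it. The Factor names, their levels, and the function name `randomized_run_order` are all hypothetical, chosen only for illustration. The sketch lists every combination of Factor levels and then shuffles the order of the Runs.

```python
import itertools
import random

# Hypothetical Factors (x's), each with two levels.
factors = {
    "temperature": ["low", "high"],
    "pressure": ["low", "high"],
    "catalyst": ["A", "B"],
}

def randomized_run_order(factors, seed=None):
    """Return every combination of Factor levels, in a random Run order."""
    rng = random.Random(seed)  # a seed makes the sequence reproducible
    names = list(factors)
    runs = [dict(zip(names, levels))
            for levels in itertools.product(*factors.values())]
    rng.shuffle(runs)
    return runs

run_order = randomized_run_order(factors, seed=2018)
for run_number, run in enumerate(run_order, start=1):
    print(run_number, run)
```

Fixing the seed is a design choice for this sketch: it lets us record and repeat the exact Run order. Leaving `seed=None` gives a fresh random sequence each time.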
Andrew A. (Andy) Jawlik is the author of the book, Statistics from A to Z -- Confusing Concepts Clarified, published by Wiley.