The concept of ANOVA can be confusing in several respects. To start with, its name is an acronym for "ANalysis Of VAriance", but it is not used for analyzing Variances. (F and Chi-Square tests are used for that.) ANOVA is used for analyzing Means. The internal calculations that it uses to do so involve analyzing Variances, hence the name.
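The point above, that ANOVA compares Means by way of a ratio of Variances, can be sketched in a few lines of code. This is a minimal illustration with hypothetical data (the groups and numbers are mine, not from the book): the F statistic is the "between-groups" Variance (how far the group Means are from the grand Mean) divided by the "within-groups" Variance.

```python
# A minimal sketch of the idea behind one-way ANOVA, using hypothetical data.
# ANOVA tests whether group Means differ, but the statistic it computes is a
# ratio of two Variances (Mean Squares) -- hence "ANalysis Of VAriance".
from statistics import mean

groups = [[4, 5, 6], [6, 7, 8], [9, 10, 11]]   # hypothetical samples
grand = mean(x for g in groups for x in g)     # grand Mean of all data
k = len(groups)                                # number of groups
n = sum(len(g) for g in groups)                # total number of observations

# "Between" Variance: spread of the group Means around the grand Mean
ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
ms_between = ss_between / (k - 1)

# "Within" Variance: spread of the data around each group's own Mean
ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
ms_within = ss_within / (n - k)

f_statistic = ms_between / ms_within
print(f_statistic)
```

A large F means the group Means are spread out much more than the noise within groups would explain, which is evidence that the Means differ.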
For more details on ANOVA, I have a 6-video playlist on YouTube.
I just uploaded a new video to YouTube: https://youtu.be/gGkRkDBlICU For the latest status of my videos completed and planned, see the videos page on this website.
There are a number of seesaws (aka "teeter-totters" or "totter boards") like this in statistics. Here, we see that, as the Probability of an Alpha Error goes down, the Probability of a Beta Error goes up. Likewise, as the Probability of an Alpha Error goes up, the Probability of a Beta Error goes down. This being statistics, it would not be confusing enough if there were just one name for a concept. So, you may know Alpha and Beta Errors by different names: an Alpha Error is also called a "Type I Error" or "False Positive", and a Beta Error is also called a "Type II Error" or "False Negative".
The seesaw effect is important when we are selecting a value for Alpha (α) as part of a Hypothesis test. Most commonly, α = 0.05 is selected. This gives us a 1 – 0.05 = 0.95 (95%) Probability of avoiding an Alpha Error.
Since the person performing the test is the one who gets to select the value for Alpha, why don't we always select α = 0.000001 or something like that? The answer is that selecting a low value for Alpha comes at a price: reducing the risk of an Alpha Error increases the risk of a Beta Error, and vice versa. There is an article in the book devoted to further comparing and contrasting these two types of errors. Some time in the future, I hope to add a video on the subject. (I am currently working on a playlist of videos about Regression; I just uploaded the first one, on Covariance.) See the Videos page of this website for the latest status of videos completed and planned.
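The seesaw can be seen numerically. Here is a minimal sketch (the effect size and test setup are hypothetical assumptions of mine) for a one-sided z-test: lowering Alpha pushes the critical value further out, which raises the Probability of a Beta Error for the same true effect.

```python
# A minimal sketch of the Alpha/Beta seesaw for a one-sided z-test,
# using a hypothetical true effect of 2.5 standard errors.
from statistics import NormalDist

def beta_error(alpha, effect=2.5):
    """P(Beta Error) when the true Mean sits `effect` standard errors
    above the null Mean (hypothetical scenario for illustration)."""
    z_crit = NormalDist().inv_cdf(1 - alpha)   # cutoff implied by Alpha
    return NormalDist().cdf(z_crit - effect)   # chance we fail to reject

print(beta_error(0.05))       # modest Beta risk at the usual Alpha
print(beta_error(0.000001))   # tiny Alpha, much larger Beta risk
```

Pushing Alpha from 0.05 down to 0.000001 makes the Beta Error risk climb dramatically: the seesaw in action.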
Most users of statistics are familiar with the F-test for Variances. But there is also a Chi-Square Test for the Variance. What's the difference? The F-test compares the Variances from 2 different Populations or Processes. It basically divides one Variance by the other and uses the appropriate F Distribution to determine whether there is a Statistically Significant difference. If you're familiar with t-tests, the F-test is analogous to the 2-Sample t-test. The F-test is a Parametric test: it requires that the data from each of the 2 Samples be roughly Normal. The following compare-and-contrast table may help clarify these concepts: Chi-Square (like z, t, and F) is a Test Statistic. That is, it has an associated family of Probability Distributions.
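The F-test calculation really is as simple as described: one Sample Variance divided by the other. A minimal sketch with hypothetical data:

```python
# A minimal sketch of the F-test statistic, using hypothetical samples.
# The resulting F is compared to an F Distribution with (n1-1, n2-1)
# degrees of freedom to judge Statistical Significance.
from statistics import variance

sample_1 = [2, 6, 10, 6, 6]   # hypothetical data; Sample Variance = 8
sample_2 = [3, 5, 7, 5, 5]    # hypothetical data; Sample Variance = 2

# Conventionally the larger Variance goes on top, so F >= 1
f_statistic = max(variance(sample_1), variance(sample_2)) / \
              min(variance(sample_1), variance(sample_2))
print(f_statistic)   # 4.0
```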
The Chi-Square Test for the Variance compares the Variance from a single Population or Process to a Variance that we specify. That specified Variance could be a target value, a historical value, or anything else. Since there is only 1 Sample of data from the single Population or Process, the Chi-Square test is analogous to the 1-Sample t-test. In contrast to the F-test, the Chi-Square test is Nonparametric. It has no restrictions on the data. Videos: I have published the following relevant videos on my YouTube channel, "Statistics from A to Z".
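For contrast with the F-test, here is a minimal sketch of the Chi-Square Test for the Variance statistic. The data and the specified (target) Variance are hypothetical assumptions of mine:

```python
# A minimal sketch of the Chi-Square Test for the Variance statistic:
# (n - 1) times the Sample Variance, divided by the specified Variance.
from statistics import variance

sample = [5, 7, 4, 6, 8, 5, 7]   # hypothetical data from one Process
target_variance = 4.0            # the Variance we specify (e.g. a target)

n = len(sample)
chi_square = (n - 1) * variance(sample) / target_variance
print(chi_square)   # compared to a Chi-Square Distribution with n-1 df
```

Note that, unlike the F-test, only one Sample appears here; the second "Variance" is a number we supply, not data.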
There are 3 categories of numerical properties which describe a Probability Distribution (e.g. the Normal or Binomial Distributions).
Skewness is a case in which common usage of a term is the opposite of statistical usage. If the average person saw the Distribution on the left, they would say that it's skewed to the right, because that is where the bulk of the curve is. However, in statistics, it's the opposite. The Skew is in the direction of the long tail.
If you can remember these drawings, think of "the tail wagging the dog."

New Video: Standard Error.
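The "Skew is in the direction of the long tail" rule can be checked numerically. Here is a minimal sketch (hypothetical data) showing that a distribution with its bulk on the left and a long right tail has positive Skewness:

```python
# A minimal sketch: data with a long right tail has positive Skewness,
# even though the bulk of the values sits on the left.
from statistics import mean, pstdev

def skewness(data):
    """Population Skewness: average cubed deviation, in standard-deviation units."""
    m, s = mean(data), pstdev(data)
    return sum(((x - m) / s) ** 3 for x in data) / len(data)

right_tailed = [1, 2, 2, 3, 3, 3, 4, 12]   # bulk on the left, long right tail
print(skewness(right_tailed))              # positive: skewed right
```

The lone large value (12) stretches the right tail, so the cubed deviations come out positive on balance: the tail wags the dog.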
See the Videos page of this website for a listing of available and planned videos.

Many folks are confused about this, especially since the names for these tests themselves can be misleading. What we're calling the "2-Sample t-test" is sometimes called the "Independent Samples t-test". And what we're calling the "Paired t-test" is then called the "Dependent Samples t-test", implying that it involves more than one Sample. But that is not the case. It is more accurate, and less confusing, to call it the Paired t-test. First of all, notice that the 2-Sample test, on the left, does have 2 Samples. We see that there are two different groups of test subjects involved (note the names are different): the Trained and the Not Trained. The 2-Sample t-test will compare the Mean score of the people who were not trained with the Mean score of the different people who were trained.
The story with the Paired t-test is very different. We only have one set of test subjects, but 2 different conditions under which their scores were collected. For each person (test subject), a pair of scores, Before and After, was collected. (Before-and-After comparisons appear to be the most common use for the Paired test.) Then, for each individual, the difference between the two scores is calculated. The values of the differences are the Sample (in this case: 4, 7, 8, 3, 8). And the Mean of those differences is compared by the test to a Mean of zero. For more on the subject, you can view my video, "t, the Test Statistic and its Distributions".

One would think that a chemical engineer would be pretty adept at technical things. And one with a PhD even more so. But, it seems that the confusion sown by statistics knows no bounds. A PhD in chemical engineering recently told me, "I never did get the hang of statistics."
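The Paired t-test calculation on the differences above (4, 7, 8, 3, 8) can be sketched directly: it is just a 1-Sample t-test of the Mean of the differences against zero.

```python
# A minimal sketch of the Paired t-test, using the differences from the
# example above. The resulting t is compared to a t Distribution with
# n - 1 degrees of freedom.
from math import sqrt
from statistics import mean, stdev

differences = [4, 7, 8, 3, 8]   # After-minus-Before score for each subject
n = len(differences)

standard_error = stdev(differences) / sqrt(n)
t_statistic = mean(differences) / standard_error   # tests Mean diff vs. zero
print(t_statistic)
```

The Mean difference here is 6, well away from zero relative to its Standard Error, so the test would find a Statistically Significant Before-and-After change.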

Andrew A. (Andy) Jawlik is the author of the book Statistics from A to Z: Confusing Concepts Clarified, published by Wiley.