Variation is also known as "Variability", "Dispersion", "Spread", and "Scatter". (5 names for one thing is one more example of why statistics is confusing.) Variation is 1 of 3 major categories of measures describing a Distribution or data set. The others are Center (aka "Central Tendency"), with measures like Mean, Median, and Mode, and Shape, with measures like Skew and Kurtosis. Variation measures how "spread out" the data is.
There are a number of different measures of Variation. This compare-and-contrast table shows the relative merits of each.
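To make the different measures concrete, here is a minimal sketch computing four common measures of Variation on a small made-up dataset, using only Python's standard library:

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data values

range_ = max(data) - min(data)                # Range: maximum minus minimum
var = statistics.pvariance(data)              # Population Variance
sd = statistics.pstdev(data)                  # Population Standard Deviation
q1, q2, q3 = statistics.quantiles(data, n=4)  # Quartiles
iqr = q3 - q1                                 # Interquartile Range (IQR)

print(range_, var, sd, iqr)
```

Each measure trades off differently: the Range uses only 2 data values, the IQR ignores outliers, and the Standard Deviation uses every data value.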
Alpha is the Significance Level of a statistical test. We select a value for Alpha based on the level of Confidence we want that the test will avoid a False Positive (aka Alpha aka Type I) Error. In the diagrams below, Alpha is split in half and shown as shaded areas under the right and left tails of the Distribution curve. This is for a 2-tailed, aka 2-sided test.
In the left graph above, we have selected the common value of 5% for Alpha. A Critical Value is the point on the horizontal axis where the shaded area ends. The Margin of Error (MOE) is half the distance between the two Critical Values.
If we want to make Alpha even smaller, the distance between Critical Values would get even larger, resulting in a larger Margin of Error.
The right diagram shows that if we want to make the MOE smaller, the price is a larger Alpha. This illustrates the Alpha-MOE see-saw effect. But what if we wanted a smaller MOE without making Alpha larger? Is that possible? It is -- by increasing n, the Sample Size. (It should be noted that, after a certain point, continuing to increase n yields diminishing returns. So, it's not a universal cure for these errors.)
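The see-saw described above can be sketched in a few lines of Python, using the standard Normal Distribution from the standard library. (The sigma and n values below are made-up numbers for illustration.)

```python
from statistics import NormalDist
from math import sqrt

z = NormalDist()  # standard Normal: Mean 0, Standard Deviation 1

def critical_value(alpha):
    """Critical Value for a 2-tailed test: Alpha is split between the 2 tails."""
    return z.inv_cdf(1 - alpha / 2)

def margin_of_error(alpha, sigma, n):
    """MOE shrinks as the Sample Size n grows, for a fixed Alpha."""
    return critical_value(alpha) * sigma / sqrt(n)

# Smaller Alpha pushes the Critical Values farther apart (larger MOE) ...
print(critical_value(0.05))                    # Alpha = 5%
print(critical_value(0.01))                    # Alpha = 1%: a larger Critical Value
# ... but increasing n shrinks the MOE without touching Alpha --
# with diminishing returns, because of the square root.
print(margin_of_error(0.05, sigma=10, n=25))
print(margin_of_error(0.05, sigma=10, n=100))
```

Note that quadrupling n (25 to 100) only halves the MOE, which is the diminishing-returns effect mentioned above.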
If you'd like to learn more about Alpha, I have 2 YouTube videos which may be of interest:
Continuing the playlist on Regression, I have uploaded a new video to YouTube: Regression -- Part 4: Multiple Linear. There are 5 Keys to Understanding, here is the 3rd. See the Videos pages of this website for more info on available and planned videos.
Categorical Variables are used in ANOM, ANOVA, with Proportions, and in the Chi-Square Tests for Independence and Goodness of Fit. Categorical Variables are also known as "Nominal" (named) Variables and "Attributes" Variables.
The concept can be confusing, because the values of a Categorical Variable are not numbers, but names of categories. The numbers associated with Categorical Variables come from counts of the data values within a named category. Here's how it works:
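Here's a minimal sketch of that idea in Python, with a made-up Categorical Variable: the values are category names, and the numbers come from Counts within each named category.

```python
from collections import Counter

# Hypothetical data: the Categorical Variable "color" for 8 observed cars.
# The values are names of categories, not numbers.
colors = ["red", "blue", "red", "green", "blue", "red", "blue", "blue"]

# The numbers are the Counts of data values within each named category.
counts = Counter(colors)

print(counts["blue"])
print(counts["red"])
print(counts["green"])
```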
Continuing the playlist on Regression, I have uploaded a new video to YouTube: Regression Part 3: Analysis Basics. It talks about things that are required for all 3 types of Regression covered in the book -- Simple Linear, Multiple Linear, and Simple Nonlinear Regression. Topics include clip levels for R squared, Residuals, establishing Cause and Effect, and the dangers of Extrapolation. See the videos page of this website for the status of completed and planned videos.
In Hypothesis Testing, before the data is collected, a value for Alpha, the Level of Significance, is selected. The person performing the test selects the value. Most commonly, 5% is selected.
Alpha is a Cumulative Probability -- the Probability of a range of values. It is shown as a shaded area under the curve of the Distribution of a Test Statistic, such as z.
If we have the Distribution of a Test Statistic and a Cumulative Probability at one or both tails of the curve of the Distribution, software or tables will tell us the value of the Test Statistic which forms the boundary of the Cumulative Probability.
In the above concept flow diagram, we show how selecting Alpha = 5% for a one-tailed (right tailed) test results in the Critical Value being 1.645.
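That concept flow can be reproduced with a one-line calculation in Python's standard library: give the software Alpha, and it gives back the Critical Value.

```python
from statistics import NormalDist

alpha = 0.05
# For a right-tailed test, the entire 5% sits under the right tail, so the
# Critical Value is the z with 95% Cumulative Probability to its left.
critical_z = NormalDist().inv_cdf(1 - alpha)
print(round(critical_z, 3))  # 1.645, as in the diagram
```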
I earlier uploaded videos on the statistical concepts mentioned above to my YouTube channel: "Statistics from A to Z -- Confusing Concepts Clarified"
Continuing the playlist on Regression, I have uploaded a new video to YouTube: Regression -- Part 2: Simple Linear. See the videos page of this website for the status of completed and planned videos.
A Boxplot, also known as a Box and Whiskers Plot, is a good way to visually depict Variation in a dataset (e.g., a Sample or Population). And showing several Boxplots vertically is useful for comparing Variation among several datasets.
The boxes depict the middle 50% of the data in each dataset.
In this illustration, a higher score is better. Treatment A has the highest individual score, but it has considerably more Variation in results than Treatments B and C. The Medians for Treatments A, B, and C are fairly close, while Treatment D's is noticeably lower. So, we can see at a glance that Treatment D can be eliminated from consideration. Treatment B has the highest Median and gives very consistent results (small Variation). So, this plot may be all we need to select B as the best treatment.
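Here is a minimal sketch of the numbers behind one box in a Boxplot, using made-up scores for a single hypothetical treatment. The box runs from the 1st Quartile (Q1) to the 3rd Quartile (Q3), spanning the middle 50% of the data, with a line at the Median.

```python
import statistics

scores = [55, 61, 64, 68, 70, 71, 74, 78, 82, 90]  # hypothetical scores

q1, median, q3 = statistics.quantiles(scores, n=4)
box_height = q3 - q1  # Interquartile Range: the span of the box

print(median, box_height)
```

A taller box means more Variation, which is how the plot above lets us compare consistency among treatments at a glance.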
Andrew A. (Andy) Jawlik is the author of the book, Statistics from A to Z -- Confusing Concepts Clarified, published by Wiley.