This tip is about what one might call an "asymmetry" in concept names.

- Beta, *β*, is the probability of a Beta ("False Negative") Error.
- So, one would think that "Alpha", *α*, would be the Probability of an Alpha ("False Positive") Error. Right?
- Wrong! *p*, the *p*-value, is the Probability of an Alpha Error.

So, what is Alpha? First of all, the person performing a statistical test selects the value of Alpha. Alpha is called the "Significance Level". It is 1 minus the Confidence Level.

Alpha is the maximum value for *p* (the probability of an Alpha Error) which the tester is willing to tolerate and still call the test results "Statistically Significant".
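To make the relationship between Alpha and *p* concrete, here is a minimal sketch in Python. It assumes a two-sided z-test purely for illustration; the test statistic `z = 2.2` and the helper names `two_sided_p_value` and `is_significant` are hypothetical and not from any particular library.

```python
from statistics import NormalDist


def two_sided_p_value(z: float) -> float:
    """p-value of a two-sided z-test: P(|Z| >= |z|) under the Null Hypothesis."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def is_significant(p: float, alpha: float) -> bool:
    """Results are Statistically Significant when p does not exceed Alpha."""
    return p <= alpha


confidence_level = 0.95           # selected by the tester before the test
alpha = 1.0 - confidence_level    # Alpha, the Significance Level

z = 2.2                           # hypothetical test statistic (assumption)
p = two_sided_p_value(z)

print(f"alpha = {alpha:.2f}, p = {p:.4f}, significant: {is_significant(p, alpha)}")
```

Alpha is fixed up front by the tester, while *p* is calculated from the Sample data; the comparison in `is_significant` is the only place where the two meet.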

For more on Alpha and *p*, you can view 2 videos on my YouTube channel __http://bit.ly/2dD5H5f__

- Alpha, the Significance Level: https://youtu.be/rl_J9UTXiMA
- p, p-value: https://youtu.be/vyX4m89VkyI

Let's say we're comparing 3 Groups (Populations or Processes) from which we've taken Samples of data.

**ANOM retains the identity of the source of each of these variations** (#1, #2, and #3), and it displays this graphically in an ANOM chart like the one below. In this ANOM chart, we are comparing the defect rates in a Process at 7 manufacturing plants.

The dotted horizontal lines, the **Upper Decision Line, UDL, and Lower Decision Line, LDL, define a Confidence Interval**, in this case, for α = 0.05. Our conclusion is that only Eastpointe (on the low side) and Saginaw (on the high side) exhibit a Statistically Significant difference in their Mean defect rates. So **ANOM tells us** __not only__ __whether__ any plants are Significantly different, but __which__ ones are.

In ANOVA, however, the individual identities of the Groups are lost during the calculations.

The 3 individual variations __Between__ the individual Means and the Overall Mean are summarized into one Statistic, MSB, the Mean Sum of Squares Between. And the 3 variations __Within__ each Group are summarized into another Statistic, MSW, the Mean Sum of Squares Within.

- The formulas for MSB and MSW are specific implementations of the generic formula for Variance.
- So, MSB divided by MSW is the ratio of two Variances.
- The Test Statistic F is the ratio of two Variances.
- ANOVA uses an F-Test (F = MSB/MSW) to come to a conclusion.
- If F ≥ F-Critical, then we conclude that the Mean(s) of one or more Groups have a Statistically Significant difference from the others.
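The steps in the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three Samples are made-up numbers, `anova_f` is a hypothetical helper name, and the F-Critical value of about 3.89 is taken from a standard F table for α = 0.05 with 2 and 12 Degrees of Freedom rather than calculated here.

```python
from statistics import mean


def anova_f(groups):
    """Return (MSB, MSW, F) for a one-way ANOVA on a list of Samples."""
    k = len(groups)                           # number of Groups
    n_total = sum(len(g) for g in groups)     # total number of data values
    overall_mean = mean(x for g in groups for x in g)

    # Sum of Squares Between: each Group Mean versus the Overall Mean
    ssb = sum(len(g) * (mean(g) - overall_mean) ** 2 for g in groups)
    # Sum of Squares Within: each value versus its own Group Mean
    ssw = sum((x - mean(g)) ** 2 for g in groups for x in g)

    msb = ssb / (k - 1)           # divide by the Degrees of Freedom Between
    msw = ssw / (n_total - k)     # divide by the Degrees of Freedom Within
    return msb, msw, msb / msw


# Made-up Samples from 3 Groups (assumption, for illustration only)
groups = [
    [4, 5, 6, 5, 5],
    [6, 7, 8, 7, 7],
    [5, 6, 7, 6, 6],
]

msb, msw, f = anova_f(groups)
f_critical = 3.89   # from a standard F table: alpha = 0.05, df = (2, 12)

print(f"MSB = {msb}, MSW = {msw}, F = {f}")
print("Statistically Significant difference:", f >= f_critical)
```

Note that the function returns only MSB, MSW, and F. Nothing in its output says which Group is responsible for a large F, which is the loss of identity described above.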

This is the 4th video in a planned playlist on Statistical Tests. See the Videos page on this website for a list of the available and planned videos.

Correlation mainly uses the Correlation Coefficient,

Correlation analysis and Linear Regression both attempt to discern whether 2 Variables vary in synch.

In Correlation, we ask to what degree the plotted data forms a shape that seems to follow an imaginary line that would go through it. But we don't try to specify that line.

Correlation Analysis does not attempt to identify a Cause-Effect relationship.
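As a small illustration of measuring how closely plotted data follows a line without specifying that line, here is a sketch that calculates a Correlation Coefficient. It assumes Pearson's *r* (the text above does not say which coefficient is meant), and the data values and the name `pearson_r` are made up for the example.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson Correlation Coefficient of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How x and y vary together, and how each varies on its own
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Made-up data (assumption, for illustration only)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = pearson_r(xs, ys)
print(f"r = {r:.4f}")
```

The calculation never produces a slope or an intercept. It only reports the degree to which the two Variables vary together, on a scale from -1 to +1.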

The data can be distributed in any way. For example -- as shown above -- it can be double-peaked and asymmetrical, or it can have the same number of points for every value of *x*. If we take many sufficiently large Samples of data with any Distribution, the Distribution of the Means (x-bars) of these Samples will approximate the Normal Distribution.

There is something intuitive about the CLT. The Mean of a Sample taken from any Distribution is very unlikely to be at the far left or far right of the range of the Distribution. Means (averages), by their very definition, tend to average-out extremes. So, their Probabilities would be highest in the center of a Distribution and lowest at the extreme left or right.
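A quick simulation can make this intuition tangible. The sketch below is only a demonstration under assumptions that are not from the text: it draws from a strongly skewed Exponential Distribution with a Mean of 1.0, uses a Sample Size of 40, and takes 2000 Samples. The name `sample_means` is hypothetical.

```python
import random
from statistics import mean, stdev

random.seed(0)  # fixed seed so the run is repeatable


def sample_means(draw, n, num_samples):
    """Take num_samples Samples of size n using draw() and return their Means."""
    return [mean(draw() for _ in range(n)) for _ in range(num_samples)]


def draw():
    """One value from a skewed Exponential Distribution with Mean 1.0 (assumption)."""
    return random.expovariate(1.0)


means = sample_means(draw, n=40, num_samples=2000)

center, spread = mean(means), stdev(means)
# For a Normal Distribution, roughly 68% of values lie within 1 Standard Deviation
within_one_sd = sum(abs(m - center) <= spread for m in means) / len(means)

print(f"Mean of the Sample Means: {center:.3f}")
print(f"Share within 1 Standard Deviation: {within_one_sd:.3f}")
```

Even though the individual values pile up near zero with a long right tail, the Sample Means cluster around the center of the Distribution, and the share within 1 Standard Deviation should land near the figure expected for a Normal Distribution.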

Less intuitively obvious is that the CLT applies to Proportions as well as to Means.

Let's say that *p* is the Proportion of the count of a category of items in a Sample, say the Proportion of green jelly beans in a candy bin. We take many Samples, with replacement, of the same size *n*, and we calculate the Proportion for each Sample. When we graph these Proportions, they will approximate a Normal Distribution.
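The jelly bean experiment can be imitated with a short simulation. This is a sketch under assumptions: the true share of green jelly beans (0.3), the Sample Size (50), and the number of Samples (2000) are made-up values, and `sample_proportion` is a hypothetical helper. It compares the spread of the simulated Proportions with the standard formula for the Standard Error of a Proportion, sqrt(p(1 - p)/n).

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(1)  # fixed seed so the run is repeatable


def sample_proportion(p_true, n):
    """Proportion of 'green' draws in one Sample of size n, with replacement."""
    return sum(random.random() < p_true for _ in range(n)) / n


p_true = 0.3    # hypothetical share of green jelly beans in the bin
n = 50          # Sample Size
proportions = [sample_proportion(p_true, n) for _ in range(2000)]

center, spread = mean(proportions), stdev(proportions)
expected_spread = sqrt(p_true * (1 - p_true) / n)   # Standard Error of a Proportion

print(f"Mean of the Sample Proportions: {center:.3f}")
print(f"Observed spread: {spread:.4f}, expected spread: {expected_spread:.4f}")
```

Plotting a histogram of `proportions` would show the familiar bell shape centered near the true Proportion.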

How large of a Sample Size, *n*, is "sufficiently large"? It depends on the use and the statistic. For Means and most uses, *n* > 30 is considered large enough. But for Proportions, it's a little more complicated -- it depends on what the value of *p* is. *n* is large enough if *np* > 5 __and__ *n*(1 - *p*) > 5.
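The two conditions for Proportions are easy to check in code. Here is a minimal sketch; the function names `is_large_enough` and `min_sample_size` are hypothetical, and the values of *p* in the loop are chosen only for illustration.

```python
def is_large_enough(n: int, p: float, threshold: float = 5.0) -> bool:
    """True when both n*p and n*(1 - p) exceed the threshold."""
    return n * p > threshold and n * (1 - p) > threshold


def min_sample_size(p: float, threshold: float = 5.0) -> int:
    """Smallest n meeting both conditions, found by a simple upward search."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    n = 1
    while not is_large_enough(n, p, threshold):
        n += 1
    return n


for p in (0.1, 0.5, 0.9):
    print(f"p = {p}: minimum n = {min_sample_size(p)}")
```

Because both conditions must hold, the smaller of *p* and 1 - *p* is what drives the minimum Sample Size.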

The practical effect of this is:

- If the value of *p* is near 0.5, then we can get by with a smaller Sample Size.
- If the value of *p* is close to 0 or 1, then we need a larger *n*.

This table gives us the specifics; the minimum Sample Size, *n*, is shown in the middle row.

Let's say we are inspecting shirts at the end of the manufacturing line. We may be interested in the number of defective Units – shirts, because any defective shirt is likely to be rejected by our customer. However, one defective shirt can contain more than one defect. So, we are also interested in the Count of individual defects – the Occurrences – because that tells us how much of a quality problem we have in our manufacturing process.

For example, if 1 shirt has 3 defects, that would be 3 Occurrences of a defect, but only 1 Unit counted as defective.

We would use the Poisson Distribution in analyzing Probabilities of Occurrences of defects. To analyze the Probability of Units, we could use the Binomial or the Hypergeometric Distribution.
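To show the difference between analyzing Occurrences and analyzing Units, here is a small sketch using the standard Poisson and Binomial probability formulas. All the numbers (an average of 0.2 defects per shirt, a 5% chance that a shirt is defective, 20 shirts inspected) are made up for illustration, and the function names are hypothetical. The Hypergeometric Distribution is not covered here.

```python
from math import comb, exp, factorial


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k Occurrences when the average count is lam."""
    return exp(-lam) * lam ** k / factorial(k)


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k defective Units among n inspected Units."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


# Hypothetical numbers, for illustration only
avg_defects_per_shirt = 0.2   # Occurrences: average count of defects on a shirt
p_defective = 0.05            # Units: chance that any one shirt is defective
shirts_inspected = 20

# Occurrences: Probability that a single shirt has exactly 3 defects
p_three_defects = poisson_pmf(3, avg_defects_per_shirt)

# Units: Probability that none of the inspected shirts is defective
p_no_defective_shirts = binomial_pmf(0, shirts_inspected, p_defective)

print(f"P(3 defects on one shirt) = {p_three_defects:.5f}")
print(f"P(0 defective shirts in {shirts_inspected}) = {p_no_defective_shirts:.4f}")
```

The Poisson calculation counts defects, so a shirt with 3 defects contributes 3 to the count. The Binomial calculation counts shirts, so that same shirt contributes only 1.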

There is an article in the book focusing on the Poisson Distribution. There is also a __video__ on my YouTube channel, Statistics from A to Z.
