This was uploaded to my YouTube channel Statistics From A to Z, Confusing Concepts Clarified. It will be part of a playlist on Samples and Sampling.

See the __Videos__ page of this website for the latest status of my statistics videos, completed and planned.

A coin flip illustrates this perfectly. You either get "heads" or "tails", and the Probability of each coin flip (Binomial Trial) is always 50% heads (or 50% tails).

In a Binomial Trial, each trial is counted as either a success or failure. And a success is defined as what we want to count. Let's say we are performing quality control in a manufacturing process. We are counting defects. Every time we find a defect, we add 1 to the count of "successes".
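For instance (with hypothetical numbers), the count of "successes" in a fixed number of Binomial Trials follows the Binomial Distribution, and the Probability of any given count can be computed directly:

```python
from math import comb

def binomial_prob(n, k, p):
    """P(exactly k successes in n Binomial Trials, each with success probability p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Hypothetical quality-control numbers: 10 items inspected,
# each with a 10% chance of being defective ("success" = defect found).
prob_2_defects = binomial_prob(10, 2, 0.10)
print(round(prob_2_defects, 4))  # about a 19% chance of exactly 2 defects
```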

I always found that confusing, so in the book, instead of saying "success" or "failure", I suggest saying "yes" or "no".

Here are the 6 Keys to Understanding the concept. This 17-minute video explains them in detail.

In this example, we are testing 2 Factors for their effect on the *y* Variable, Cleanliness.


- Factor A is the Detergent type. There are 2 "levels": Detergent #1 and Detergent #2.
- Factor B is the Water Temperature. There are 3 levels: Cold, Warm, and Hot.

We see from the graph that, for all three levels of Factor B, Detergent #1 cleans better than Detergent #2. The lines are substantially separated, indicating that the difference is Statistically Significant. (The ANOVA numbers will tell us for sure.)

If Factor B had an effect, the lines would be slanted. Again, separated lines tell us that Factor A has an effect.

If the lines were not separated, as below, then Factor A does not have an effect.
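A minimal numerical sketch of what the graph shows, using hypothetical Cleanliness scores (Detergent #1 consistently 10 points better, Water Temperature making no difference):

```python
from statistics import mean

# Hypothetical Cleanliness scores, one per (Detergent, Temperature) cell
scores = {
    ("Detergent #1", "Cold"): 85, ("Detergent #1", "Warm"): 85, ("Detergent #1", "Hot"): 85,
    ("Detergent #2", "Cold"): 75, ("Detergent #2", "Warm"): 75, ("Detergent #2", "Hot"): 75,
}

def level_mean(factor_index, level):
    """Mean Cleanliness for one level of one Factor (0 = Detergent, 1 = Temperature)."""
    return mean(v for key, v in scores.items() if key[factor_index] == level)

# Factor A (Detergent): separated lines -> a real difference between the level Means
effect_a = level_mean(0, "Detergent #1") - level_mean(0, "Detergent #2")

# Factor B (Temperature): flat lines -> all level Means are the same
temp_means = [level_mean(1, t) for t in ("Cold", "Warm", "Hot")]

print(effect_a)    # Factor A has an effect
print(temp_means)  # all equal: Factor B does not
```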

The Regression analysis comes up with the values for b and a.
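For simple linear Regression (*y* = b*x* + a), a sketch of how least squares arrives at b and a, using hypothetical data:

```python
from statistics import mean

# Hypothetical (x, y) data that happens to fit y = 2x + 1 exactly
x = [1, 2, 3, 4]
y = [3, 5, 7, 9]

x_bar, y_bar = mean(x), mean(y)

# Least-squares slope b and intercept a
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) \
    / sum((xi - x_bar) ** 2 for xi in x)
a = y_bar - b * x_bar

print(b, a)  # slope and intercept of the fitted line
```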

**Residuals represent the Error in the Regression Model**. They represent the Variation in the *y* variable which is __not explained__ by the Regression Model.

So, **Residuals must be Random.** If not (if Residuals form a pattern), that is evidence that one or more additional factors (*x*'s) influence *y*.

A Scatterplot of Residuals against *y*-values should illustrate Randomness:

Being Random means that the Residuals:

- are Normally distributed
- have constant Variance
- show no patterns when graphed
- have no unexplained Outliers

Here are some patterns which indicate the Regression Model is incomplete.
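One such pattern, curvature, can be demonstrated with a small sketch: fit a straight line to hypothetical data that is actually quadratic, and the Residuals come out positive at the ends and negative in the middle (a U shape), signaling that the straight-line Model is incomplete:

```python
from statistics import mean

# Hypothetical data that is actually quadratic: y = x^2
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]

x_bar, y_bar = mean(x), mean(y)

# Fit a straight line y = bx + a by least squares
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) \
    / sum((xi - x_bar) ** 2 for xi in x)
a = y_bar - b * x_bar

# Residual = observed y minus the y predicted by the line
residuals = [yi - (b * xi + a) for xi, yi in zip(x, y)]
print(residuals)  # positive at the ends, negative in the middle: a U pattern
```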

This tip is about what one might call an "asymmetry" in concept names.

- Beta, *β*, is the Probability of a Beta ("False Negative") Error.
- So, one would think that "Alpha", *α*, would be the Probability of an Alpha ("False Positive") Error. Right?
- Wrong! *p*, the *p*-value, is the Probability of an Alpha Error.

So, what is Alpha? First of all, the person performing a statistical test selects the value of Alpha. Alpha is called the "Significance Level". It is 1 minus the Confidence Level.

Alpha is the maximum value for *p* (the Probability of an Alpha Error) which the tester is willing to tolerate and still call the test results "Statistically Significant".
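That decision rule is just a comparison, sketched here with hypothetical numbers:

```python
def is_significant(p_value, alpha=0.05):
    """Statistically Significant if the Probability of an Alpha Error (p)
    does not exceed the tolerance the tester selected (Alpha)."""
    return p_value <= alpha

print(is_significant(0.03))        # True: p is within the selected tolerance
print(is_significant(0.08))        # False: p exceeds Alpha
print(is_significant(0.08, 0.10))  # True under a looser Alpha
```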

For more on Alpha and *p*, you can view 2 videos on my YouTube channel: __http://bit.ly/2dD5H5f__

- Alpha, the Significance Level: __https://youtu.be/rl_J9UTXiMA__
- *p*, the *p*-value: __https://youtu.be/vyX4m89VkyI__

Let's say we're comparing 3 Groups (Populations or Processes) from which we've taken Samples of data.

**ANOM retains the identity of the source of each of these variations** (#1, #2, and #3), and it displays this graphically in an ANOM chart like the one below. In this ANOM chart, we are comparing the defect rates in a Process at 7 manufacturing plants.

The dotted horizontal lines, the **Upper Decision Line (UDL) and Lower Decision Line (LDL), define a Confidence Interval**, in this case for α = 0.05. Our conclusion is that only Eastpointe (on the low side) and Saginaw (on the high side) exhibit a Statistically Significant difference in their Mean defect rates. So **ANOM tells us** __not only whether__ any plants are Significantly different, but __which__ ones are.
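A rough sketch of the mechanics behind such a chart, using hypothetical data. Note that the critical value h would normally come from ANOM tables for the chosen α, the number of Groups, and the Degrees of Freedom; the value plugged in here is an assumption for illustration only:

```python
from math import sqrt
from statistics import mean

# Hypothetical measurements from k = 3 groups ("plants"), n = 4 each
groups = {
    "A": [4, 5, 6, 5],
    "B": [6, 7, 6, 7],
    "C": [8, 9, 10, 9],
}
k = len(groups)
n = 4

grand_mean = mean(v for vals in groups.values() for v in vals)

# Pooled standard deviation from the within-group sums of squares
ss_within = sum(sum((v - mean(vals)) ** 2 for v in vals) for vals in groups.values())
s = sqrt(ss_within / (k * n - k))

h = 2.96  # ASSUMED critical value; real ANOM looks this up in tables
half_width = h * s * sqrt((k - 1) / (k * n))
udl = grand_mean + half_width
ldl = grand_mean - half_width

# Any group whose Mean falls outside (LDL, UDL) is Significantly different
flagged = {name: ("high" if mean(vals) > udl else "low")
           for name, vals in groups.items()
           if not (ldl <= mean(vals) <= udl)}
print(flagged)  # which groups differ, and in which direction
```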

In ANOVA, however, the individual identities of the Groups are lost during the calculations.

The 3 individual variations __Between__ the individual Means and the Overall Mean are summarized into one Statistic, MSB, the Mean Sum of Squares Between. And the 3 variations __Within__ each Group are summarized into another Statistic, MSW, the Mean Sum of Squares Within.

- The formulas for MSB and MSW are specific implementations of the generic formula for Variance.
- So, MSB divided by MSW is the ratio of two Variances.
- The Test Statistic F is the ratio of two Variances.
- ANOVA uses an F-Test (F = MSB/MSW) to come to a conclusion.
- If F ≥ F-Critical, then we conclude that the Mean(s) of one or more Groups have a Statistically Significant difference from the others.
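The chain of reasoning in the bullets above can be sketched end to end with hypothetical data (comparing F to F-Critical would normally use an F table or software, so that step is omitted):

```python
from statistics import mean

# Hypothetical Samples from k = 3 Groups, n = 3 each
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
k = len(groups)
n = len(groups[0])
N = k * n

group_means = [mean(g) for g in groups]
grand_mean = mean(group_means)  # with equal n, this equals the Overall Mean

# MSB: variation Between the Group Means and the Overall Mean
ssb = n * sum((m - grand_mean) ** 2 for m in group_means)
msb = ssb / (k - 1)

# MSW: variation Within each Group, pooled
ssw = sum(sum((v - m) ** 2 for v in g) for g, m in zip(groups, group_means))
msw = ssw / (N - k)

f_statistic = msb / msw  # the ratio of two Variances
print(msb, msw, f_statistic)  # F is about 13 for this data
```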

This is the 4th video in a planned playlist on Statistical Tests. See the Videos page on this website for a list of the available and planned videos.
