Confusing language and terminology are a big part of what makes statistics confusing. Each Binomial Trial -- also known as a Bernoulli Trial -- is a random experiment with only 2 possible outcomes, called "success" and "failure". The Probability of success is the same every time the experiment is conducted.
A coin flip illustrates this perfectly. You either get "heads" or "tails", and the Probability of each coin flip (Binomial Trial) is always 50% heads (or 50% tails).
In a Binomial Trial, the outcome is counted as either a success or a failure, and a success is defined as whatever we want to count. Let's say we are performing quality control in a manufacturing process, and we are counting defects. Every time we find a defect, we add 1 to the count of "successes".
I always found that confusing, so in the book, instead of saying "success" or "failure", I suggest saying "yes" or "no".
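To make the counting concrete, here is a minimal sketch in Python of the defect-counting example, treating each inspection as one Bernoulli Trial and each defect as a "yes". The 3% defect rate and the function name are made up for illustration:

```python
import random

random.seed(42)  # fixed seed so the sketch is repeatable

# Assumed defect rate for this hypothetical manufacturing process.
P_DEFECT = 0.03

def inspect_item(p=P_DEFECT):
    """One Bernoulli (Binomial) Trial: "yes" (defect found) or "no"."""
    return "yes" if random.random() < p else "no"

# Inspect 1,000 items; each inspection has the same Probability of "yes".
results = [inspect_item() for _ in range(1000)]
defects = results.count("yes")
print(f"Defects found: {defects} out of 1000")
```

With a 3% defect rate, the count of "yes" answers will typically land near 30, but it varies from run to run -- that variation is exactly what the Binomial Distribution describes.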
I just uploaded a new video, "Samples, Sampling", to my YouTube channel. Here's the link to the video: https://youtu.be/bvglAgAZtXE. It is the first in a playlist on Samples. The other two videos will be about how to determine the Sample Size for Proportions and Count Data, and the Sample Size for Measurement/Continuous Data.
Here are the 6 Keys to Understanding the concept. This 17-minute video explains them in detail.
In this example, we are testing 2 Factors for their effect on the y Variable, Cleanliness.
We see from the graph that -- for all three levels of Factor B -- Detergent #1 cleans better than Detergent #2. The lines are substantially separated, indicating that the difference is Statistically Significant. (The ANOVA numbers will tell us for sure.)
If Factor B did have an effect, the lines would be slanted; since they are roughly level, Factor B does not have an effect here. Again, it is the separated lines that tell us Factor A has an effect.
If the lines were not separated, as below, then Factor A does not have an effect.
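The same comparisons can be sketched numerically. In this hypothetical sketch (all cleanliness scores are invented for illustration), the widely separated means for the two Detergents mirror the separated lines, and the unchanging means across Factor B's levels mirror the flat lines:

```python
# Hypothetical mean cleanliness scores for each Detergent (Factor A)
# at three levels of Factor B (say, water temperature).
scores = {
    ("Detergent #1", "Low"): 86, ("Detergent #1", "Med"): 86, ("Detergent #1", "High"): 86,
    ("Detergent #2", "Low"): 71, ("Detergent #2", "Med"): 71, ("Detergent #2", "High"): 71,
}

def level_mean(factor_a=None, factor_b=None):
    """Mean score over all cells matching the given factor level(s)."""
    vals = [v for (a, b), v in scores.items()
            if (factor_a is None or a == factor_a)
            and (factor_b is None or b == factor_b)]
    return sum(vals) / len(vals)

# Separated lines: Detergent #1's mean sits well above Detergent #2's.
print(level_mean(factor_a="Detergent #1"))  # 86.0
print(level_mean(factor_a="Detergent #2"))  # 71.0

# Flat lines: Factor B's level means do not change, so B shows no effect.
for b in ("Low", "Med", "High"):
    print(b, level_mean(factor_b=b))
```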
The purpose of Regression analysis is to develop a cause and effect "Model" in the form of an equation. To keep things simple, let's talk about Simple Linear Regression, in which the equation is
y = bx + a
The Regression analysis comes up with the values for b and a.
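For illustration, here is a sketch of how those values can be computed with the standard least-squares formulas, using made-up data:

```python
# Made-up (x, y) data for the sketch.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Least-squares formulas: b = Sxy / Sxx, a = mean_y - b * mean_x
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
b = sxy / sxx
a = mean_y - b * mean_x
print(f"y = {b:.2f}x + {a:.2f}")  # y = 1.99x + 0.05
```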
Residuals represent the Error in the Regression Model -- the Variation in the y Variable which is not explained by the Model.
So, Residuals must be Random. If not -- if Residuals form a pattern -- that is evidence that one or more additional factors (x's) influence y.
A Scatterplot of Residuals against y-values should illustrate Randomness:
Being Random means that the Residuals show no pattern -- they are scattered unpredictably, roughly evenly above and below zero.
Here are some patterns which indicate the Regression Model is incomplete.
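One such pattern is easy to produce in a sketch: fit a straight line to data that actually follow a curve, and the Residuals trace a U-shape instead of being Random (the data here are made up for illustration):

```python
# Data generated from a quadratic relationship, which a line cannot explain.
xs = [1, 2, 3, 4, 5]
ys = [x * x for x in xs]   # 1, 4, 9, 16, 25

# Least-squares line through the data (same formulas as before).
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
a = my - b * mx

# Residual = actual y minus the y predicted by the Model.
residuals = [y - (b * x + a) for x, y in zip(xs, ys)]
print([round(r, 1) for r in residuals])  # [2.0, -1.0, -2.0, -1.0, 2.0]
```

The signs run positive, negative, positive -- a curved pattern, which is evidence that something beyond a straight line (here, the squared term) is influencing y.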
This is the final video in a 6-video playlist on Statistical Tests. youtu.be/MpQAxe6vM00
A Key to Understanding, compare-and-contrast tables, and a 6-step process help you gain a clear understanding of this concept. https://youtu.be/MnnOK0I2PLE
Statistics Tip: Beta is the Probability of a Beta Error, but Alpha is not the Probability of an Alpha Error
One of the things which make statistics confusing is the confusing language. There is the triple negative of "Fail to Reject the Null Hypothesis". There are several instances where a single concept has multiple names.
This tip is about what one might call an "asymmetry" in concept names.
So, what is Alpha? First of all, the person performing the statistical test selects the value of Alpha. Alpha is also called the "Significance Level". It is 1 minus the Confidence Level.
Alpha is the maximum value for p (the Probability of an Alpha Error) which the tester is willing to tolerate and still call the test results "Statistically Significant".
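As a hypothetical illustration, suppose we flip a supposedly fair coin 20 times and get 15 heads. Here is a sketch of an exact one-sided Binomial test, comparing the resulting p to an Alpha selected before the test (the scenario and numbers are invented):

```python
from math import comb

def binom_p_at_least(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): the one-sided p-value for k successes."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

alpha = 0.05                          # chosen by the tester BEFORE the test
p_value = binom_p_at_least(15, 20)    # 15 heads in 20 flips of a "fair" coin

print(f"p = {p_value:.4f}")           # p = 0.0207
if p_value <= alpha:
    print("Statistically Significant: Reject the Null Hypothesis")
else:
    print("Fail to Reject the Null Hypothesis")
```

Here p (about 0.0207) is below the tolerated maximum Alpha of 0.05, so the result is called Statistically Significant.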
In the Tip for September 8, 2018, we listed a number of things that ANOVA can and can't do. One of these was that ANOVA can tell us whether or not there is a Statistically Significant difference among several Means, but it cannot tell us which ones are different from the others to a Statistically Significant amount.
Let's say we're comparing 3 Groups (Populations or Processes) from which we've taken Samples of data.
ANOM calculates the Overall Mean of all the data from all Samples, and then it measures the variation of each Group Mean from that. In the conceptual diagram below, each Sample is depicted by a Normal curve. The distance between each Sample Mean and the Overall Mean is identified as a "variation".
ANOM retains the identity of the source of each of these variations (#1, #2, and #3), and it displays this graphically in an ANOM chart like the one below. In this ANOM chart, we are comparing the defect rates in a Process at 7 manufacturing plants.
The dotted horizontal lines, the Upper Decision Line, UDL and Lower Decision Line, LDL, define a Confidence Interval, in this case, for α = 0.05. Our conclusion is that only Eastpointe (on the low side) and Saginaw (on the high side) exhibit a Statistically Significant difference in their Mean defect rates. So ANOM tells us not only whether any plants are Significantly different, but which ones are.
In ANOVA, however, the individual identities of the Groups are lost during the calculations.
The 3 individual variations Between the individual Means and the Overall Mean are summarized into one Statistic, MSB, the Mean Sum of Squares Between. And the 3 variations Within each Group are summarized into another Statistic, MSW, the Mean Sum of Squares Within.
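A minimal sketch of those two summary Statistics, using three small made-up Groups:

```python
# Three hypothetical Groups (Samples) of measurements.
groups = [
    [10, 12, 11],   # Group 1
    [14, 15, 16],   # Group 2
    [10, 9, 11],    # Group 3
]

all_data = [x for g in groups for x in g]
overall_mean = sum(all_data) / len(all_data)
k = len(groups)        # number of Groups
N = len(all_data)      # total number of observations

# MSB: variation of the Group Means around the Overall Mean.
ssb = sum(len(g) * (sum(g) / len(g) - overall_mean) ** 2 for g in groups)
msb = ssb / (k - 1)

# MSW: variation of the data points around their own Group Mean.
ssw = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
msw = ssw / (N - k)

f_statistic = msb / msw
print(f"MSB = {msb:.2f}, MSW = {msw:.2f}, F = {f_statistic:.2f}")
```

Notice that once the individual variations are totaled into SSB and SSW, nothing in MSB or MSW records which Group contributed which variation -- the Group identities are gone, which is why ANOVA alone cannot say which Means differ.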
4 Keys to Understanding and illustrated examples help you gain an intuitive understanding of this concept. https://youtu.be/VYCibUmLUic
This is the 4th video in a planned playlist on Statistical Tests. See the Videos page on this website for a list of the available and planned videos.
You can't use Linear Regression unless there is a Linear Correlation. The following compare-and-contrast table may help in understanding both concepts.
Correlation analysis describes the present or past situation. It uses Sample data to infer a property of the source Population or Process. There is no looking into the future. The purpose of Linear Regression, on the other hand, is to define a Model (a linear equation) which can be used to predict the results of Designed Experiments.
Correlation mainly uses the Correlation Coefficient, r. Regression also uses r, but employs a variety of other Statistics.
Correlation analysis and Linear Regression both attempt to discern whether 2 Variables vary in synch. Linear Correlation is limited to 2 Variables, which can be plotted on a 2-dimensional x-y graph. Linear Regression can go to 3 or more Variables/dimensions.
In Correlation, we ask to what degree the plotted data forms a shape that seems to follow an imaginary line that would go through it. But we don't try to specify that line. In Linear Regression, that line is the whole point. We calculate a best-fit line through the data: y = a + bx.
Correlation Analysis does not attempt to identify a Cause-Effect relationship; Regression does.
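For reference, the Correlation Coefficient r mentioned above can be computed directly from the data; here is a sketch with made-up numbers:

```python
from math import sqrt

# Made-up (x, y) data for the sketch.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n

# Pearson Correlation Coefficient: r = Sxy / sqrt(Sxx * Syy)
sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
sxx = sum((x - mx) ** 2 for x in xs)
syy = sum((y - my) ** 2 for y in ys)
r = sxy / sqrt(sxx * syy)
print(f"r = {r:.4f}")
```

An r this close to 1 indicates a strong positive Linear Correlation -- the precondition for using Linear Regression on the same data.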
Andrew A. (Andy) Jawlik is the author of the book, Statistics from A to Z -- Confusing Concepts Clarified, published by Wiley.