Hypothesis Testing is confusing for many, if not most, people. The graphical concepts of Acceptance and Rejection Regions may help clarify the confusion.
Sample data is collected and the t-test is run. The test calculates a value for p. Like Alpha, p is a Cumulative Probability shown as an area under the curve of the t-Distribution. The illustration below shows the two possible results. Here, we show p as a hatched area and Alpha as a shaded area.

The middle diagram shows the result in which p is calculated to be less than or equal to Alpha. In that case, the hatched area representing p fits entirely within the shaded Rejection Region which represents Alpha. So, the conclusion of the test is to Reject the Null Hypothesis.

The right diagram shows the case in which p is larger than Alpha. The hatched area representing p is larger than the shaded Rejection Region, and it extends into the unshaded Acceptance Region. So, the conclusion of the test is to Accept -- that is, Fail to Reject -- the Null Hypothesis.

If you would like more information on Alpha, p, Null Hypothesis, Fail to Reject, and other concepts in Hypothesis Testing, there are individual videos on each concept in my YouTube channel, which has the same name as the book: "Statistics from A to Z -- Confusing Concepts Clarified."
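The comparison of p with Alpha described above can be sketched in a few lines of Python. This is only an illustration, not code from the book. It assumes a 1-tailed (right-tail) test, and the function names `t_pdf`, `right_tail_p`, and `conclude` are made up for this example. The area under the t-Distribution is approximated numerically with Simpson's rule, so no statistics package is needed.

```python
import math

def t_pdf(x, df):
    """Height of the t-Distribution curve at x for the given Degrees of Freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def right_tail_p(t, df, steps=2000):
    """Area under the curve to the right of t (assumed 1-tailed test).

    Simpson's rule gives the area between 0 and |t|; `steps` must be even.
    """
    b = abs(t)
    if b == 0:
        return 0.5
    h = b / steps
    total = t_pdf(0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3
    # Half of the total area lies to the right of 0.
    return 0.5 - central if t >= 0 else 0.5 + central

def conclude(p, alpha=0.05):
    """Apply the rule: p <= Alpha means p fits inside the Rejection Region."""
    if p <= alpha:
        return "Reject the Null Hypothesis"
    return "Fail to Reject the Null Hypothesis"

# Hypothetical test result: t = 2.5 with 9 Degrees of Freedom.
p = right_tail_p(2.5, 9)
print(round(p, 4), conclude(p, alpha=0.05))
```

For a 2-tailed test, p would be twice the tail area, while Alpha is split evenly between the two tails.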
The 10th in our irregular "You are not alone ..." series.
Even statisticians are not immune to misinterpretations of Null Hypothesis Significance Tests. http://bit.ly/2hdr11o

A Statistic is a numerical property calculated from Sample data. A Test Statistic is one which has an associated Probability Distribution. Given a value for a Test Statistic, the Probability Distribution will tell us the Probability of that value occurring. How this is used in statistical tests and Hypothesis Testing is described in my video on the concept of Test Statistic.

There are 4 commonly-used Test Statistics -- z, t, F, and Chi-Square. They are used in different types of tests, as summarized in the table below:

Both t and z can be used in comparing Means. The test will tell you whether there is a Statistically Significant difference between the Means. But z has some shortcomings, especially when the Sample Size, n, is not large. So, it's probably best to use t for comparing Means. There are 3 different types of t-tests:
- 1-Sample t-test
- 2-Sample t-test
- Paired t-test
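To make the three t-tests listed above a little more concrete, here is a minimal sketch of how the value of t might be calculated in each case. It is an illustration only: the function names and the data are made up, and the 2-Sample version uses Welch's form (which does not assume equal Variances), which is an assumption on my part rather than something specified in this post.

```python
import math
import statistics

def one_sample_t(sample, specified_mean):
    """t for comparing a Sample Mean to a specified Mean."""
    std_error = statistics.stdev(sample) / math.sqrt(len(sample))
    return (statistics.mean(sample) - specified_mean) / std_error

def two_sample_t(sample_a, sample_b):
    """t for comparing the Means of 2 independent Samples (Welch's form, an assumption)."""
    std_error = math.sqrt(
        statistics.variance(sample_a) / len(sample_a)
        + statistics.variance(sample_b) / len(sample_b)
    )
    return (statistics.mean(sample_a) - statistics.mean(sample_b)) / std_error

def paired_t(before, after):
    """t for Paired data: a 1-Sample t-test on the differences, against a Mean of 0."""
    differences = [y - x for x, y in zip(before, after)]
    return one_sample_t(differences, 0.0)

# Hypothetical measurements on the same 5 subjects, before and after a treatment.
before = [10, 12, 9, 11, 13]
after = [12, 14, 10, 13, 16]

print(round(one_sample_t(after, 11), 3))
print(round(two_sample_t(before, after), 3))
print(round(paired_t(before, after), 3))
```

Note how the same data give a much larger t when treated as Paired than when treated as 2 independent Samples. The value of t is then compared to a Critical Value, or converted to p, to reach a conclusion.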
The 1-Sample t-test compares a specified Mean to the Mean calculated from 1 Sample of data. The specified Mean can be a target value, a historical value, an estimate, or anything else. The difference between the 2-Sample and Paired t-test is explained in my first blog post, back on Sept. 22, 2016.

The Mean is one Statistic. The Variance is another. There are two different Test Statistics used with Variances: F and Chi-Square.

If we want to determine if there is a Statistically Significant difference in the Variance of 2 Populations or Processes, we use the Test Statistic F and an F-Test. This is analogous to the 2-Sample t-test.

If, on the other hand, we want to compare the Variance of a Population or Process to a specified Variance, we use the Chi-Square Test Statistic and the Chi-Square Test for the Variance. This test is analogous to the 1-Sample t-test.

Chi-Square is a versatile Test Statistic. It is used in 2 other types of statistical tests:
- Chi-Square Test for Independence
- Chi-Square Test for Goodness of Fit.
The Chi-Square Test for Independence can tell us, for example, whether or not gender and ice-cream preference are independent (males and females show similar preferences) or dependent (one gender likes a given flavor and the other gender likes another). The test is needed to determine if any observed difference is Statistically Significant.

And the Chi-Square Test for Goodness of Fit can tell us whether there is a Statistically Significant difference between a set of expected or predicted Frequencies (percentages converted to Counts) and the actual Frequencies shown in a Sample of data. For example, we might predict the set of percentages of customers each day as shown in the "Expected" row in the table below. And the "Observed" counts would be the number of customers who actually came. Is the expected/predicted set of percentages a good fit with the actual? A "good fit" means that there is not a Statistically Significant difference between Expected and Observed.

The Test Statistic z can be used to determine whether there is a Statistically Significant difference between the Proportions of 2 Populations or Processes. It can also give us a Confidence Interval estimate of a Population or Process Proportion. For example, "The Proportion of voters who favor Candidate A is 55% plus or minus 2%."
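Two of the calculations described above are short enough to sketch in Python. This is an illustration under stated assumptions, not the book's method: the function names are made up, the customer counts and the Sample Size of 2,400 voters are hypothetical, and the Confidence Interval uses the simple normal-approximation formula.

```python
import math
from statistics import NormalDist

def chi_square_gof(observed_counts, expected_percentages):
    """Chi-Square Test Statistic for Goodness of Fit.

    Expected percentages are first converted to Counts, then compared to Observed.
    """
    total = sum(observed_counts)
    expected_counts = [total * pct / 100 for pct in expected_percentages]
    return sum((o - e) ** 2 / e for o, e in zip(observed_counts, expected_counts))

def proportion_confidence_interval(p_hat, n, confidence=0.95):
    """Confidence Interval for a Proportion using z (normal approximation, an assumption)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Hypothetical customers per day, Monday through Friday.
expected_pct = [10, 15, 20, 25, 30]
observed = [12, 14, 22, 23, 29]
print(round(chi_square_gof(observed, expected_pct), 2))

# Hypothetical poll: 55% of 2,400 voters favor Candidate A.
low, high = proportion_confidence_interval(0.55, 2400)
print(round(low, 3), round(high, 3))
```

The Chi-Square value would then be compared to a Critical Value for the appropriate Degrees of Freedom. With the hypothetical poll size used here, the margin works out to roughly plus or minus 2%.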
The 80/20 "rule" is a bit of folk wisdom that appears to be widely (although roughly) applicable to many situations. One usage is that 80% of the effects come from 20% of the causes. This is often the case in Statistical Process Control, in which control charts and other tools are used to identify the causes or sources of defects in a process.

In the example below, we show a simplified version of a Failure Mode Effects Analysis (FMEA). It calculates an Impact Score for each source of defects. The Impact of a source of defects is defined by its Severity multiplied by the number of times it was a source of a defect. We use this information to identify which -- and how many -- causes of defects to address. To make this obvious, and to aid in communication, we will display the Impact Scores in a Pareto Chart.

A Pareto Chart is actually two charts overlaid on each other: a bar chart and a line chart. The combined chart below -- the Pareto Chart -- has 2 vertical axes. The vertical axis on the left is for the bars. The vertical axis on the right is for the line. The line shows the cumulative percentage (of the Impact Score) for the first column, the first two columns, the first 3 columns, etc. There's nothing sacred about 80%. From this combined chart, we can see that we can address 74.1% of the total Impact Score by going after just 3 causes (the colored bars). After that, diminishing returns set in.
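The numbers behind a Pareto Chart are easy to produce. Here is a minimal sketch, with made-up causes, Severities, and counts (not the data from the chart above), of how the Impact Scores and the cumulative percentage line might be calculated. The function name `pareto_table` is hypothetical.

```python
def pareto_table(causes):
    """Return (cause, Impact Score, cumulative %) rows, largest Impact first.

    `causes` maps each cause to a (Severity, number of defects) pair.
    Impact Score = Severity x number of times the cause was a source of a defect.
    """
    impacts = {name: severity * count for name, (severity, count) in causes.items()}
    total = sum(impacts.values())
    rows = []
    running = 0
    for name, impact in sorted(impacts.items(), key=lambda item: item[1], reverse=True):
        running += impact
        rows.append((name, impact, round(100 * running / total, 1)))
    return rows

# Hypothetical sources of defects: cause -> (Severity, count).
causes = {
    "A": (5, 20),
    "B": (8, 10),
    "C": (3, 20),
    "D": (2, 15),
    "E": (1, 30),
}

for name, impact, cumulative_pct in pareto_table(causes):
    print(name, impact, cumulative_pct)
```

The Impact Scores would be drawn as the bars (left axis) and the cumulative percentages as the line (right axis). In this made-up data, the first 3 causes account for 80% of the total Impact Score.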
Here, we used the Pareto Chart to prioritize sources of defects. But it can be used to prioritize anything. Use it early and often where appropriate in your analysis. And it can be very helpful in communicating the conclusions to others.

This video, like the article in the book on which it is based, explains 5 Keys to Understanding the concept. It is part of a playlist on Distributions which may include as many as 14 videos. For more on this, see the videos page of this website.

For 2-sided tests using the Test Statistics z and t, which have symmetrical Distributions, there is only one Critical Value. That Critical Value is added or subtracted from the Mean.

Since Chi-Square’s Distributions are not symmetric, the areas under the curve at the left tail and the right tail have different shapes for a given value of that area. So, there are two different Critical Values -- an Upper and a Lower -- for a 2-sided Chi-Square test.

Unlike z and t, we do not add or subtract these Critical Values from the Mean. The two Critical Values of Chi-Square produced by tables, spreadsheets, or software are the final values to be used.

For example, for df = 15 and 𝛼 = 0.05:
- Lower Critical: 6.26
- Upper Critical: 27.49
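For readers who would rather calculate than look up a table, here is a rough sketch of how the two Critical Values for df = 15 and 𝛼 = 0.05 can be reproduced with nothing but Python's standard library. It is an illustration, not how any particular table or spreadsheet does it: the function names are made up, the Cumulative Probability uses a standard series for the incomplete gamma function, and the Critical Value is found by simple bisection.

```python
import math

def chi2_cdf(x, df):
    """Cumulative Probability (area to the left of x) for a Chi-Square Distribution.

    Uses the series expansion of the regularized lower incomplete gamma function.
    """
    if x <= 0:
        return 0.0
    a = df / 2
    half_x = x / 2
    term = 1 / a
    total = term
    n = 1
    while term > total * 1e-15:
        term *= half_x / (a + n)
        total += term
        n += 1
    return total * math.exp(-half_x + a * math.log(half_x) - math.lgamma(a))

def chi2_critical(left_area, df):
    """Find the Chi-Square value with the given area to its left, by bisection."""
    low, high = 0.0, 10.0 * df + 100.0
    for _ in range(200):
        mid = (low + high) / 2
        if chi2_cdf(mid, df) < left_area:
            low = mid
        else:
            high = mid
    return (low + high) / 2

alpha = 0.05
df = 15
lower_critical = chi2_critical(alpha / 2, df)      # area of 𝛼/2 in the left tail
upper_critical = chi2_critical(1 - alpha / 2, df)  # area of 𝛼/2 in the right tail
print(round(lower_critical, 2), round(upper_critical, 2))  # 6.26 27.49
```

Note that the two values are not the same distance from the Mean of the Distribution (which equals df, or 15 here), which is the asymmetry described above.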
The Upper and Lower Critical Values are looked up for 𝛼/2 and for 1 − 𝛼/2. Sometimes two different tables are provided for Upper and Lower Critical Values. Or spreadsheets or software will do this for you.

The "Simple" means that there is only one x variable: y = f(x). So the curve exists in two dimensions and can be plotted on a screen or a sheet of paper. The "non-linear" means that, when we graph the data, it does not roughly follow a straight line, so we must look for an appropriate curve.

The following are probably the most common non-linear curves used:

Exponential and Logarithmic curves have rapid accelerations or decelerations
in the slope. Power curves have a more gradual change. Polynomial functions can be used for more complex curves, which change direction, as shown in our Tip of the Week for May 17, 2017.

## Statistics Tip of the Week: an Exponential Distribution can be used with problems involving time.
8/17/2017

The Exponential Distribution is useful for solving problems involving time to an event or time between events -- for example, time between emergency calls or time between equipment failures. It is especially useful with events that are relatively rare. If one were to analyze rare events per time period, using the Poisson Distribution, for example, the Counts might include a lot of zeros and an occasional 1. It may be more meaningful to think in terms of the time between events and measure the data that way. Then the Exponential Distribution could be used.

An individual Exponential Distribution can be specified by just one Parameter -- either the Mean (µ), or the Rate (λ): λ = 1/µ. (If the Mean time to an event is 8 hours, then the Rate at which the events occur is 1/8 per hour.)

An interesting fact about all Exponential Distributions: The Mean always splits the Cumulative Probabilities (areas under the curve) into 63% and 37%. That is because the Cumulative Probability up to the Mean is 1 − e^(−1) ≈ 0.63, whatever the value of µ.

In our March 2, 2017 Tip of the Week, we discussed 2-Factor (aka 2-Way) ANOVA. We said that:
- Separated lines indicate that Factor A has an Effect, and
- Slanted lines indicate that Factor B has an Effect

Parallel lines indicate that the two Factors do not interact. For example, in the left diagram, above, both detergents respond the same way to a change in water temperature -- they show no change in their Effect -- cleanliness. Likewise, the middle and right diagrams show both detergents behaving the same -- a parallel increase in Effect.

But what if we got something like the two graphs below? In the example at left, Detergent #1 shows a substantial increase in effectiveness as the water temperature is increased. But for Detergent #2, heating the water has the opposite effect: its effectiveness is decreased. In the example on the right, both detergents show an increase in effectiveness as water temperature increases. But Detergent #2's increase is fairly minor. In fact, its increase may not be Statistically Significant and the Interaction may not be Statistically Significant.
In either case, we do have reason to suspect an Interaction, so 2-Way ANOVA Without Replication cannot be used. We must use 2-Way ANOVA With Replication.

The With Replication method repeats (Replicates) the experiment several times for each combination of Factor A and B Values. This can provide sufficient data to quantify an Interaction.

The number of Replications required to achieve a specified level of accuracy is determined by the methods of Design of Experiments, DOE. The Design also specifies the levels of each Factor to be used in each replication, the order of replication, and other specifics of the experiment. The book has a 3-part series on DOE, and eventually there may be a video series on it as well.

There is currently a video on the book's YouTube channel with more information about the subject of this post: ANOVA -- Part 4 (of 4): 2-Way (aka 2-Factor).

The video, like the article in the book, covers 6 Keys to Understanding the concept of Binomial Distribution. It is part of a playlist which is eventually planned to include the following videos. The first 3 are done. See the videos page of this website for the latest status.
- Probability Distributions -- Part 1 (of 3): What they are
- Probability Distributions -- Part 2 (of 3): How they are used
- Binomial Distribution
- Poisson Distribution
- Hypergeometric Distribution
- Normal Distribution
- t Distribution
- F Distribution
- Chi-Square Distribution
- Exponential Distribution
- Probability Distributions -- Part 3 (of 3): Which to use when
- Sampling Distribution
- Skew, Skewness
- Variation/Variability/Dispersion/Spread
## Author
Andrew A. (Andy) Jawlik is the author of the book, Statistics from A to Z -- Confusing Concepts Clarified, published by Wiley.