In earlier blog posts, we've seen how different types of charts can be used in different ways to increase our understanding of the data and to communicate that understanding to others:
Here's another example: a Line Chart uses lines to connect points that have adjacent values on the horizontal axis. It is often used to illustrate trends, with the horizontal axis representing time.
It is also used to graph cause-and-effect, in which the x Variable (horizontal axis) is the Factor which causes the Effect in the y Variable (vertical axis). In the chart below, an increase in the Factor Variable, water temperature, causes an increase in the Effect Variable, cleanliness. This is used in Regression analysis and in the Designed Experiments which are conducted to test a Regression Model.
The following chart combines two line charts into one. It has the same x and y Variables as the previous chart, but it adds a second Factor (x) Variable, Detergent. So, there are two lines, connecting two sets of data points. In 2-Way ANOVA, crossing lines indicate that there is an Interaction between the two Factors. In this case, an increase in temperature has the opposite effect for the two detergent types – it makes Detergent #1 do better, and it makes Detergent #2 do worse. If the lines were parallel, there would be no Interaction; lines that are not parallel suggest some Interaction even if they do not cross.
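To make the idea of Interaction concrete, here is a small Python sketch. The cell means are made-up numbers chosen to mimic the pattern described (they are not read from the chart), and the function names are hypothetical. It compares the temperature effect for each detergent: a difference of zero corresponds to parallel lines. It is only descriptive; the formal test of the Interaction comes from the 2-Way ANOVA itself, which this sketch does not perform.

```python
# Hypothetical cell means (made-up, not read from the chart): average
# cleanliness score for each detergent at each water temperature.
cell_means = {
    ("Detergent 1", "cold"): 60.0,
    ("Detergent 1", "hot"): 80.0,
    ("Detergent 2", "cold"): 75.0,
    ("Detergent 2", "hot"): 55.0,
}


def temperature_effect(means, detergent):
    """Change in the response going from cold to hot water for one detergent."""
    return means[(detergent, "hot")] - means[(detergent, "cold")]


def interaction_contrast(means):
    """Difference between the two detergents' temperature effects.

    0 means parallel lines (no Interaction); the further from 0,
    the stronger the Interaction.
    """
    return temperature_effect(means, "Detergent 1") - temperature_effect(means, "Detergent 2")


def lines_cross(means):
    """True if the gap between the detergents changes sign from cold to hot."""
    gap_cold = means[("Detergent 1", "cold")] - means[("Detergent 2", "cold")]
    gap_hot = means[("Detergent 1", "hot")] - means[("Detergent 2", "hot")]
    return gap_cold * gap_hot < 0


print(temperature_effect(cell_means, "Detergent 1"))  # 20.0
print(temperature_effect(cell_means, "Detergent 2"))  # -20.0
print(interaction_contrast(cell_means))               # 40.0
print(lines_cross(cell_means))                        # True
```

Note that opposite-signed effects alone do not guarantee crossing lines, which is why `lines_cross` checks whether the gap between the detergents changes sign.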
In a similar fashion, a Line Chart can help differentiate between Observed and Expected Frequencies in a Chi-Square test for Goodness of Fit.
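The Chi-Square Goodness of Fit statistic adds up (Observed - Expected)² / Expected over all the categories; each term is one of the gaps that a Line Chart of Observed vs. Expected makes visible. A minimal Python sketch, using hypothetical counts from 60 rolls of a die (the data and function name are illustrative assumptions):

```python
def chi_square_gof(observed, expected):
    """Chi-Square Goodness of Fit statistic: sum of (O - E)^2 / E over categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of categories")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts from 60 rolls of a die; a fair die expects 10 per face.
observed = [8, 12, 9, 11, 6, 14]
expected = [10] * 6

statistic = chi_square_gof(observed, expected)
degrees_of_freedom = len(observed) - 1
print(round(statistic, 3), degrees_of_freedom)  # 4.2 5
```

With 5 Degrees of Freedom, the critical value at α = 0.05 is about 11.07, so these made-up counts would not be judged a poor fit to a fair die.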
Reproduced by permission of John Wiley and Sons, Inc.
from the book, Statistics from A to Z -- Confusing Concepts Clarified
A "Statistic" is a measure of a property of a Sample, for example, the Sample Mean or Sample Standard Deviation. The corresponding term for a Population or Process is "Parameter".
The most commonly used statistical tests are "Parametric", that is, they require that one or more Parameters meet certain conditions or "assumptions". Most frequently, the assumption is that the Distribution of the Population or Process is roughly Normal. Roughly equal Variance is also a common assumption.
If these conditions are not met, the Parametric test cannot be used, and a Nonparametric test must be used instead. This table shows the Nonparametric test that can be used in place of several common Parametric tests.
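As one illustration, the Mann-Whitney U test (also called the Wilcoxon rank-sum test) is a widely used Nonparametric alternative to the 2-Sample t-test. It works on ranks instead of raw values, so it does not assume Normality. The Python sketch below computes only the U statistic, giving tied values the average of their ranks; getting a p-value would need tables or a Normal approximation, which it omits. The function names and data are hypothetical; libraries such as SciPy offer a complete version (`scipy.stats.mannwhitneyu`).

```python
def average_ranks(values):
    """Rank values 1..N (smallest = 1); tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # convert 0-based positions to 1-based ranks
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def mann_whitney_u(sample_a, sample_b):
    """Return the Mann-Whitney U statistic (the smaller of U_a and U_b)."""
    n_a, n_b = len(sample_a), len(sample_b)
    ranks = average_ranks(list(sample_a) + list(sample_b))
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return min(u_a, u_b)


# Hypothetical measurements from two groups.
group_a = [1, 2, 3]
group_b = [2, 3, 4]
print(mann_whitney_u(group_a, group_b))  # 2.0
```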
You are not alone if you are confused by statistics #11: Statistics software creator struggled with statistics in college.
Jay Arthur is the author of the books Lean Six Sigma Demystified and Lean Six Sigma for Hospitals, as well as the creator of the QI Macros software for statistical process control.
He says, "In college, I struggled with statistics. Professors seemed to want to teach us the 'what' and 'how' of statistics, but not the 'why.' They used 'not' language to describe results: 'We cannot reject the null hypothesis.' People struggle with understanding the meaning of sentences containing the word 'not'. I confess, I am one of them."
A nuclear fusion physicist at MIT told me he borrowed my book from the MIT Library -- which had 2 copies. He was very complimentary.
For a Process output, y, which is a function of several Factors (x's), that is, for

$$y = f(x_1, x_2, \ldots, x_k),$$

the Design of Experiments (DOE) discipline can design the most efficient and effective experiments to determine the values of the x's which produce the optimal value for -- or the minimal Variation in -- the Response Variable, y.
DOE is active and controlling. (This can be done with Processes, but usually not with Populations.) DOE doesn’t collect or measure existing data with pre-existing values for y and the x’s. Instead, DOE specifies Combinations of values for the inputs (Factors) and then measures the resulting values of the outputs (Responses). This is the Design of the Experiment.
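A minimal Python sketch of the simplest kind of Design, a full factorial: every combination of the chosen Factor levels becomes one experimental run. The Factor names and levels are hypothetical, and real DOE software does much more (fractional designs, replication, blocking, analysis of the Responses). The run order is optionally shuffled with a fixed seed, since randomizing run order is common practice.

```python
import random
from itertools import product


def full_factorial(levels, seed=None):
    """Return one run (a dict of Factor -> level) per combination of levels.

    If seed is given, the run order is randomized reproducibly.
    """
    names = list(levels)
    runs = [dict(zip(names, combo)) for combo in product(*(levels[name] for name in names))]
    if seed is not None:
        random.Random(seed).shuffle(runs)
    return runs


# Hypothetical Factors, each at 2 levels: 2 x 2 x 2 = 8 runs.
factors = {
    "temperature_C": [30, 60],
    "detergent": ["#1", "#2"],
    "cycle_minutes": [20, 40],
}

for run_number, run in enumerate(full_factorial(factors, seed=42), start=1):
    print(run_number, run)
```

Each printed run tells the experimenter which input values to set; the Response, y, is then measured for that run.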
Statistical software packages perform DOE calculations which specify the elements which make up the Design:
This is the 6th and final video in a playlist on ANOVA and related concepts. youtu.be/qcXzfVrj54E
ANOM (Analysis of Means) does something ANOVA cannot do. It not only tells us whether there is a statistically significant difference among several Means; it also tells us which Means are different.
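A rough Python sketch of the ANOM idea for groups of equal size, assuming the usual decision limits: grand mean ± h · s · sqrt((k - 1) / (k · n)), where s is the pooled standard deviation, k the number of groups, and n the group size. The critical value h depends on α, k, and the Degrees of Freedom and normally comes from ANOM tables or software; the h = 2.5 used below is an arbitrary placeholder, not a table value. The data and function names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def anom_decision_limits(groups, h):
    """Lower and upper ANOM decision limits (equal group sizes assumed)."""
    k = len(groups)
    n = len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes equal group sizes")
    grand_mean = mean(mean(g) for g in groups)
    # With equal n, the pooled variance is the average of the group variances.
    pooled_sd = sqrt(mean(variance(g) for g in groups))
    half_width = h * pooled_sd * sqrt((k - 1) / (k * n))
    return grand_mean - half_width, grand_mean + half_width


def flag_groups(groups, h):
    """Indexes of groups whose Means fall outside the decision limits."""
    lower, upper = anom_decision_limits(groups, h)
    return [i for i, g in enumerate(groups) if not lower <= mean(g) <= upper]


# Hypothetical data: three groups of four measurements each.
groups = [[10, 11, 9, 10], [12, 13, 11, 12], [15, 16, 14, 15]]
print(anom_decision_limits(groups, h=2.5))
print(flag_groups(groups, h=2.5))  # [0, 2]
```

Here the first and third group Means fall outside the limits, while the second stays inside: that "which ones" answer is what ANOM adds beyond ANOVA's single overall verdict.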
For a current status of available and planned videos, see the "Videos" page on this website.
Statistics Tip of the Week: In Simple Nonlinear Regression, use a polynomial if the curve changes direction.
The "Simple" in "Simple Nonlinear" means that there is only one x Variable in the formula, e.g., y = f(x). The "Nonlinear" means that we have determined that a straight line will not fit the data. We need to use some kind of curve -- e.g., Exponential, Logarithmic, Power, Polynomial, or some other type.
A Polynomial has a formula of the form

$$y = a + b_1x + b_2x^2 + b_3x^3 + \cdots + b_kx^k$$

Note that there is just one x Variable, but it is raised to various powers, up to a highest power, k, of at least 2. (If there were only a power of 1, the equation would be that of a straight line.) The b's are Coefficients and the a is an Intercept.
A "2nd degree", also known as "2nd order" or "Quadratic", Polynomial is of the form

$$y = a + b_1x + b_2x^2$$
A 2nd order Polynomial has 1 change in direction. As x increases, y increases and then decreases (or y decreases and then increases). Two examples are pictured above. These shapes are Parabolas.
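Statistical software finds the Coefficients by least squares. The pure-Python sketch below does the same for small examples by solving the normal equations with Gaussian elimination; that approach is fine for low-degree toy data but can become numerically fragile at higher degrees (one more reason to keep the order low). The data are generated from a known Parabola, y = 3 + 2x - 0.5x², so the recovered a and b's can be checked; the function names are illustrative.

```python
def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        known = sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (aug[r][n] - known) / aug[r][r]
    return solution


def fit_polynomial(xs, ys, degree):
    """Least-squares Polynomial fit; returns [a, b1, ..., b_degree]."""
    size = degree + 1
    # Normal equations (X^T X) c = X^T y, where column j of X holds x**j.
    xtx = [[sum(x ** (r + c) for x in xs) for c in range(size)] for r in range(size)]
    xty = [sum(y * x ** r for x, y in zip(xs, ys)) for r in range(size)]
    return solve(xtx, xty)


# Data generated from y = 3 + 2x - 0.5x^2, a Parabola that rises, then falls.
xs = [0, 1, 2, 3, 4, 5]
ys = [3 + 2 * x - 0.5 * x ** 2 for x in xs]
coefficients = fit_polynomial(xs, ys, degree=2)
print([round(c, 6) for c in coefficients])  # approximately [3.0, 2.0, -0.5]
```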
A "3rd degree", aka "3rd order", aka "Cubic" Polynomial has an x-cubed term and can change direction up to twice. (y = x³, for example, never changes direction.)
A kth degree Polynomial has at most k – 1 changes in direction.
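One way to check these direction-change rules for a particular Polynomial is to count how often its curve switches between rising and falling. This Python sketch does that numerically on a fine grid (a simplification; solving for the roots of the derivative would be exact). Coefficients are listed as [a, b1, b2, ...]; the function names and the plotting ranges are illustrative choices, and turning points outside the chosen range are not counted.

```python
def poly_value(coeffs, x):
    """Evaluate a + b1*x + b2*x**2 + ... for coeffs = [a, b1, b2, ...]."""
    return sum(c * x ** power for power, c in enumerate(coeffs))


def direction_changes(coeffs, lo, hi, steps=10_000):
    """Count how often the curve switches between rising and falling on [lo, hi].

    Uses a fine grid, so turning points outside [lo, hi] are not counted.
    """
    step = (hi - lo) / steps
    ys = [poly_value(coeffs, lo + i * step) for i in range(steps + 1)]
    changes, previous_sign = 0, 0
    for y_left, y_right in zip(ys, ys[1:]):
        diff = y_right - y_left
        sign = (diff > 0) - (diff < 0)  # +1 rising, -1 falling, 0 flat
        if sign != 0:
            if previous_sign != 0 and sign != previous_sign:
                changes += 1
            previous_sign = sign
    return changes


print(direction_changes([3, 2, -0.5], -10, 10))  # quadratic: 1
print(direction_changes([0, -3, 0, 1], -3, 3))   # x**3 - 3x: 2
print(direction_changes([0, 0, 0, 1], -3, 3))    # x**3: 0
```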
Simpler is better. It is usually not necessary to go beyond the 3rd order. Higher orders are harder to work with. Also, they may fit the idiosyncrasies of the data in a particular Sample too closely, so they may not be generally applicable to data in other Samples from the same Population or Process.
Reproduced by permission of John Wiley and Sons, Inc.
from the book, Statistics from A to Z -- Confusing Concepts Clarified
You are not alone if you are confused by statistics #10: Ten statistical terms designed to confuse non-statisticians.
From the Minitab Blog
10 Statistical Terms Designed to Confuse Non-Statisticians
A Statistic is a numerical property of a Sample, for example, the Sample Mean or Sample Variance. A Statistic is an estimate of the corresponding property (“Parameter”) in the Population or Process from which the Sample was drawn. Being an estimate, it will likely not have the exact same value as its corresponding population Parameter. The difference is the error in the estimation.
So, if we calculate a Statistic entirely from data values, there is a certain amount of error. For example, the Sample Mean is calculated entirely from the values of the Sample data. It is the sum of all the data values in the Sample divided by the number, n, of items in the Sample. There is one source of error in its formula – the fact that it is an estimate because it does not use all the data in the Population or Process.
If we then use that Statistic to calculate another Statistic, it brings its own estimation error into the calculation of the second Statistic. This error is in addition to the second Statistic’s estimation error. This happens in the case of the Sample Variance.
The numerator of the formula for Sample Variance includes the Sample Mean. It takes each data value (the x’s) in the Sample, subtracts the Sample Mean from it, and squares the result. Then it sums all those squared differences.
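So, the Sample Variance has two sources of error: its own estimation error (it is calculated from a Sample, not from all the data in the Population or Process), and the estimation error carried in by the Sample Mean used in its formula.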
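That is why the formula for the Sample Variance divides by n - 1, its Degrees of Freedom, rather than by n, and why the Degrees of Freedom for the Chi-Square Test for the Variance is also n - 1. Subtracting 1 from the n in the denominator results in a larger value for the Variance. This compensates for the two sources of error.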
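The Python sketch below first computes the Sample Variance step by step (subtract the Sample Mean, square, sum, divide by n - 1) and checks it against the standard library's `statistics.variance`. It then runs a small simulation: many Samples of size 5 are drawn from a Normal distribution whose true Variance is 4, and the "divide by n" and "divide by n - 1" estimates are each averaged. Dividing by n tends to come out low, near (n - 1)/n × 4 = 3.2, while dividing by n - 1 averages close to 4. The sample size, σ, number of repetitions, seed, and function names are arbitrary choices for illustration.

```python
import random
from statistics import mean, variance


def sum_of_squared_deviations(data):
    """Subtract the Sample Mean from each value, square, and sum."""
    sample_mean = sum(data) / len(data)  # itself an estimate
    return sum((x - sample_mean) ** 2 for x in data)


def sample_variance(data):
    """Sample Variance: divide the sum of squared deviations by n - 1."""
    if len(data) < 2:
        raise ValueError("need at least two data values")
    return sum_of_squared_deviations(data) / (len(data) - 1)


def compare_denominators(n=5, sigma=2.0, reps=20_000, seed=1):
    """Average the 'divide by n' and 'divide by n - 1' estimates over many Samples."""
    rng = random.Random(seed)
    by_n, by_n_minus_1 = [], []
    for _ in range(reps):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        ss = sum_of_squared_deviations(sample)
        by_n.append(ss / n)
        by_n_minus_1.append(ss / (n - 1))
    return mean(by_n), mean(by_n_minus_1)


data = [4.0, 7.0, 6.0, 3.0, 5.0]
print(sample_variance(data), variance(data))  # 2.5 2.5

low, corrected = compare_denominators()
# True Variance is sigma**2 = 4.0; expect roughly 3.2 and 4.0 respectively.
print(f"divide by n: {low:.2f}   divide by n - 1: {corrected:.2f}")
```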
Here are the formulas for Degrees of Freedom for some Statistics and tests:
Andrew A. (Andy) Jawlik is the author of the book, Statistics from A to Z -- Confusing Concepts Clarified, published by Wiley.