The most commonly used statistical tests are "Parametric", that is, they require that one or more Parameters meet certain conditions or "assumptions". Most frequently, the assumption is that the Distribution of the Population or Process is roughly Normal. Roughly equal Variance is also a common assumption.

If these conditions are not met, the Parametric test cannot be used, and a Nonparametric test must be used instead. This table shows the Nonparametric test that can be used in place of several common Parametric tests.



**Another way that Degrees of Freedom is described is "The number of independent pieces of information that go into the calculation of a Statistic."** To illustrate, let's say we have a Sample of n = 5 data values: 2, 4, 6, 8, and 10.

When we calculate the Sample Mean, we have 5 independent pieces of information – the five values of the data. They are independent because none of the values depends on any of the others. So, for the Mean, df = 5.

Sample Mean = (2 + 4 + 6 + 8 + 10) / 5 = 30 / 5 = 6

But, when we calculate the Sample Variance, we use the Mean as well as the 5 data values. The Mean is not an independent piece of information, because it is dependent on the other 5 values.

Also, when we include the Mean, we only have 4 independent pieces of information left. If we know that the Mean is 6 (so the 5 values must sum to 30), and we have the data values 2, 4, 6, and 8, then we can calculate that the last data value has to be 10. So, 10 no longer brings independent information to the table.
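The arithmetic behind that last step can be sketched in a few lines of Python. This is only an illustration using the five data values from the text; the variable names are made up for the sketch.

```python
# Once the Sample Mean and 4 of the 5 values are known,
# the fifth value is no longer free to vary.
n = 5
sample_mean = 6                # (2 + 4 + 6 + 8 + 10) / 5
known_values = [2, 4, 6, 8]    # any 4 of the 5 data values

# The 5 values must sum to n * mean = 30, so the last one is forced.
last_value = n * sample_mean - sum(known_values)
print(last_value)  # 10
```

Only 4 of the values can be chosen freely once the Mean is fixed, which is where df = n – 1 = 4 comes from.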

**If we then use that Statistic to calculate another Statistic, it brings its own estimation error into the calculation of the second Statistic.** This error is in addition to the second Statistic's estimation error. This happens in the case of the Sample Variance.

__Example: Sample Variance__

**Numerator for Sample Variance:**

The numerator of the formula for Sample Variance includes the Sample Mean. It takes each data value (the x's) in the Sample and subtracts from it the Sample Mean. Then it squares each of those differences and sums the squares.
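As a rough sketch, here is the numerator worked out in Python for the same five data values. It uses the standard sum-of-squared-deviations form of the numerator; the variable names are only for illustration.

```python
data = [2, 4, 6, 8, 10]
sample_mean = sum(data) / len(data)              # 6.0

# Subtract the Sample Mean from each data value.
deviations = [x - sample_mean for x in data]     # [-4.0, -2.0, 0.0, 2.0, 4.0]

# The deviations always sum to zero, so any 4 of them determine the 5th.
print(sum(deviations))                           # 0.0

# Numerator of the Sample Variance: the sum of the squared deviations.
numerator = sum(d ** 2 for d in deviations)
print(numerator)                                 # 40.0
```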

So, **the Sample Variance has two sources of error:**

- it is an estimate from Sample data
- the estimation error from the Sample Mean

**The Degrees of Freedom is intended to adjust for the additional error introduced when one Statistic is used to calculate another.**

We don't need to make this adjustment for the Sample Mean, but we do need to do so for the Sample Variance. We divide by n – 1, instead of n.
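To see what dividing by n – 1 does in practice, here is a small sketch using Python's standard `statistics` module and the five data values from the earlier example. The variable names are illustrative only.

```python
import statistics

data = [2, 4, 6, 8, 10]
n = len(data)
mean = statistics.mean(data)                      # 6

sum_of_squares = sum((x - mean) ** 2 for x in data)   # 40

variance_n_minus_1 = sum_of_squares / (n - 1)     # 40 / 4 = 10.0 (Sample Variance)
variance_n = sum_of_squares / n                   # 40 / 5 = 8.0  (no adjustment)

# statistics.variance() divides by n - 1; statistics.pvariance() divides by n.
print(variance_n_minus_1 == statistics.variance(data))   # True
print(variance_n == statistics.pvariance(data))          # True
```

Dividing by the smaller number, n – 1, gives a slightly larger Variance, which makes allowance for the estimation error carried in by the Sample Mean.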

The value of *p* contains the same information as the Test Statistic value:

- Sample data is used to calculate a value for a Test Statistic, say, *z*.
- This value of *z* forms the boundary for the area under the curve which represents the Cumulative Probability, *p*.
- From this, tables or calculations give us the value of *p*.

Similarly, *α* contains the same information as the Critical Value.

So, comparing *p* and *α* is the same as comparing the Test Statistic value and the Critical Value. But the comparison symbols (">" and "<") point in opposite directions. That's because *p* and the Test Statistic have an inverse relation: a smaller value for *p* means that the Test Statistic value must be larger.
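The equivalence can be checked numerically. The sketch below assumes a right-tailed *z* test on the standard Normal Distribution, and the *z* values and α = 0.05 are made up for illustration. The function name `compare` is hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist()   # standard Normal: mean 0, standard deviation 1

def compare(z, alpha=0.05):
    """Return the two comparisons for a right-tailed z test (an assumed setup)."""
    p = 1 - std_normal.cdf(z)                    # area under the curve beyond z
    z_critical = std_normal.inv_cdf(1 - alpha)   # boundary that cuts off an area of alpha
    return p < alpha, z > z_critical

# Hypothetical Test Statistic values
for z in (0.5, 1.2, 1.7, 2.1, 3.0):
    p_says_reject, z_says_reject = compare(z)
    print(z, p_says_reject, z_says_reject)       # the two answers always match
```

Note the direction of the symbols: the test rejects when *p* is less than α, which is the same thing as the Test Statistic being greater than the Critical Value.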

A Dot Plot can be used to picture Variation if the number of data points is relatively small. Each individual point is shown as a dot, and you can show exactly how many go into each bin.
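For a feel of how a Dot Plot is built, here is a minimal text-only sketch in Python. The data values are made up, the function name `dot_plot` is hypothetical, and each whole-number value is treated as its own bin.

```python
from collections import Counter

def dot_plot(values):
    """Return a text Dot Plot: one row per bin, one 'o' per data point."""
    counts = Counter(values)
    rows = []
    for value in range(min(counts), max(counts) + 1):
        # Counter returns 0 for bins with no data points, leaving the row empty.
        rows.append(f"{value:>3} | {'o' * counts[value]}")
    return "\n".join(rows)

# Hypothetical small data set
data = [3, 4, 4, 5, 5, 5, 6, 6, 8]
print(dot_plot(data))
```

Because every data point gets its own dot, you can read the exact Count in each bin straight off the plot.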

Boxplots, also known as Box and Whiskers Plots, can very effectively provide a detailed picture of Variation. In an earlier Statistics Tip, we showed how several Box and Whiskers Plots can enable you to visually choose the most effective of several treatments. Here's an illustration of the anatomy of a Box and Whiskers Plot.

In the example above, the IQR box represents the InterQuartile Range, which is a useful measure of Variation. This plot shows us that 50% of the data points (those between the 25th and 75th Percentiles) were within the range of 40 – 60 centimeters. 25% were below 40 and 25% were above 60. The Median, denoted by the vertical line in the box, is about 48 cm.

Any data point more than 1.5 box lengths from the box is called an Outlier. Here, the Outlier with a value of 2 cm is shown by a circle. Although not shown above, some plots define an Extreme Outlier as one that is more than 3 box lengths outside the box. Those can be shown by an asterisk.
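The box length and the Outlier rule can be sketched with Python's standard `statistics` module. The measurements below are made up for illustration (they are not the data behind the plot), and the "inclusive" quartile method is just one of several conventions, so other software may give slightly different Quartiles.

```python
import statistics

# Hypothetical measurements in centimeters, made up for this sketch
data = [2, 38, 40, 42, 45, 48, 50, 55, 60, 62, 65]

# The three cut points: 25th Percentile, Median, 75th Percentile
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1                      # the box length (InterQuartile Range)

# Outlier: more than 1.5 box lengths outside the box
outliers = [x for x in data if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]

# Extreme Outlier: more than 3 box lengths outside the box
extreme_outliers = [x for x in data if x < q1 - 3 * iqr or x > q3 + 3 * iqr]

print(q1, median, q3, iqr)         # 41.0 48.0 57.5 16.5
print(outliers)                    # [2]
print(extreme_outliers)            # []
```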


I just uploaded a new video to YouTube: __Margin of Error__. It's part of a playlist on Errors in Statistics.

Both Bar Charts and Histograms use the height of bars (rectangles of the same width) to visually depict data. So, they look similar.

But, they

- differ in whether the bars are __separated__ or placed together (__contiguous__),
- depict different __types of data__, and
- are used for different __purposes__

1. __Separated or contiguous__

- Bar Charts: separated
- Histograms: contiguous

2. __Types of data__

- Bar Charts: Counts or Percentages of Nominal (also known as "Categorical") data.
- Histograms: Counts or Percentages or Probabilities of the number of data points within a Range

3. __How Used__

- Bar Charts are used to display the relative __Sizes__ (which named item has the highest (or lowest) Count)
- Histograms are used to display the __Shape__ of the Distribution of the data. The data in the illustration above is roughly Normally Distributed.
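The difference in the types of data can be sketched in Python. Both data sets below are made up for illustration, and the bin width of 10 is an arbitrary choice for the sketch.

```python
from collections import Counter

# Bar Chart data: Counts of Nominal (Categorical) items
colors = ["red", "blue", "red", "green", "blue", "red"]
bar_counts = Counter(colors)
for name, count in bar_counts.most_common():
    print(f"{name:<6} {'#' * count}")       # separate bars, one per named item

print()

# Histogram data: Counts of numerical data points within each Range (bin)
heights = [41, 47, 52, 55, 58, 61, 63, 68, 74]
bin_width = 10
hist_counts = Counter((h // bin_width) * bin_width for h in heights)
for start in sorted(hist_counts):
    # Each bin ends where the next begins, so the bars are contiguous.
    print(f"{start}-{start + bin_width}  {'#' * hist_counts[start]}")
```

The Bar Chart rows can be listed in any order, since the categories are just names. The Histogram bins have to stay in numerical order, because their order is what shows the Shape of the Distribution.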