Blog Archives

Statistics Tip: Use Boxplots to compare Variation in several datasets.

10/31/2018

A Boxplot, also known as a Box and Whiskers Plot, is a good way to visually depict Variation in a dataset (e.g., a Sample or Population). Showing several vertical Boxplots side by side is useful for comparing Variation among several datasets.

The boxes depict the range within which 50% of the data falls for each dataset.

The bottom of the box identifies the 25th percentile (25% of the data is below)
The line in the middle is the Median (50th percentile)
The top of the box is the 75th percentile
The line segments (the "whiskers") at the top and bottom extend toward the highest and lowest values of the dataset, but only as far as 1.5 box lengths beyond the edge of the box. (If there are no data points that far out, the whisker ends at the farthest point.) Points more than 1.5 box lengths beyond the box are termed "Outliers". Points more than 3 box lengths beyond the box are called "Extremes" or "Extreme Outliers".
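
The rules above can be sketched in a few lines of Python. This is only an illustration, not the method of any particular plotting package: the function name `boxplot_summary` and the scores are made up, and the "inclusive" quartile method is an assumption, since software packages compute percentiles in slightly different ways.

```python
from statistics import quantiles

def boxplot_summary(data):
    """Return the values a Boxplot displays, using the 1.5 and 3 box-length rules."""
    values = sorted(data)
    # 25th, 50th and 75th percentiles ("inclusive" is one common convention).
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    box_length = q3 - q1  # height of the box

    # The farthest a whisker is allowed to reach: 1.5 box lengths past the box.
    lower_limit = q1 - 1.5 * box_length
    upper_limit = q3 + 1.5 * box_length

    # Each whisker ends at the farthest data point that is still within its limit.
    within = [v for v in values if lower_limit <= v <= upper_limit]

    # Points past 1.5 box lengths are Outliers; past 3 box lengths, Extremes.
    outliers = [v for v in values if v < lower_limit or v > upper_limit]
    extremes = [v for v in values
                if v < q1 - 3 * box_length or v > q3 + 3 * box_length]

    return {
        "q1": q1, "median": median, "q3": q3,
        "whisker_low": within[0], "whisker_high": within[-1],
        "outliers": outliers, "extremes": extremes,
    }

# Hypothetical scores, invented for this example only.
scores = [2, 4, 5, 6, 7, 8, 9, 10, 12, 40]
summary = boxplot_summary(scores)
print(summary)
```

With these made-up scores, the box runs from 5.25 to 9.75, the whiskers end at 2 and 12, and the value 40 is flagged as both an Outlier and an Extreme.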

In this illustration, a higher score is better. Treatment A has the highest individual score, but it has considerably more Variation in results than Treatments B and C. The Medians for Treatments A, B, and C are fairly close. So, we can see at a glance that Treatment D can be eliminated from consideration. Treatment B has the highest Median and gives very consistent results (small Variation). So, this plot may be all we need to select B as the best treatment.

You are not alone if you are confused by statistics -- #22

10/22/2018

Statistics Tip: Sampling with Replacement is required when using the Binomial Distribution

10/7/2018

One of the requirements for using the Binomial Distribution is that each trial must be independent. One consequence of this is that the Sampling must be With Replacement.

To illustrate this, let's say we are doing a study in a small lake to determine the Proportion of lake trout. Each trial consists of catching and identifying 1 fish. If it's a lake trout, we count 1. The population of fish is finite. We don't know the numbers, but let's say there are 100 fish in total: 70 lake trout and 30 other fish.

Each time we catch a fish, we throw it back before catching another fish. This is called Sampling With Replacement. The Proportion of lake trout then remains at 70%, so the Probability of catching a lake trout is 70% on every trial.

If, on the other hand, we keep each fish we catch, then we are Sampling Without Replacement. Let's say that the first 5 fish that we catch (and keep) are lake trout. Then there are now 95 fish in the lake, of which 65 are lake trout. The percentage of lake trout is now 65/95 = 68.4%. This is a change from the original 70%.
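
The arithmetic above can be checked with a short Python sketch. It uses the same assumed numbers as the example (100 fish, 70 of them lake trout) and supposes that each of the first 5 fish kept is a lake trout. The variable names are made up for illustration.

```python
from fractions import Fraction

trout, total = 70, 100  # assumed lake population from the example

# Probability of a lake trout on each catch when the previous catches
# were all lake trout and none were thrown back.
probabilities = []
for kept in range(6):  # 0 to 5 trout already kept
    probabilities.append(Fraction(trout - kept, total - kept))

for kept, p in enumerate(probabilities):
    print(f"after keeping {kept} trout: {p} = {float(p):.1%}")

before = probabilities[0]  # 70/100
after = probabilities[5]   # 65/95
```

The first line printed is 70.0% and the last is 68.4%, so the Probability drifts a little with every fish that is kept.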

So, we don't have the same Probability each time of catching a lake trout. Sampling Without Replacement has caused the trials to not be independent. So, we can't use the Binomial Distribution. We must use the Hypergeometric Distribution instead.
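
To see how much the choice of Distribution matters here, this sketch computes the Probability that all 5 fish caught are lake trout under each approach. The two helper functions are written from the standard textbook formulas, and their names are made up for this example. The lake numbers are the assumed ones from above.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of k successes in n independent trials (With Replacement)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def hypergeometric_pmf(k, n, successes, population):
    """Probability of k successes in n draws Without Replacement."""
    return (comb(successes, k) * comb(population - successes, n - k)
            / comb(population, n))

# Assumed lake: 100 fish, 70 of them lake trout; we catch 5 fish.
with_replacement = binomial_pmf(5, 5, 0.7)
without_replacement = hypergeometric_pmf(5, 5, 70, 100)

print(f"Binomial (With Replacement):          {with_replacement:.3f}")   # about 0.168
print(f"Hypergeometric (Without Replacement): {without_replacement:.3f}")  # about 0.161
```

The two answers differ because, Without Replacement, each lake trout we keep makes the next one slightly less likely.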

For more on the Binomial Distribution, see my YouTube video.

New Video: Regression -- Part 1: Sums of Squares

10/1/2018

I just uploaded a new video. It's the third in a playlist on Regression. To see the current status of my completed and planned videos, please visit the Videos page on this website.
