STATISTICS FROM A TO Z -- CONFUSING CONCEPTS CLARIFIED
  • Home
    • Why This Book Is Needed
    • Articles List, Additional Concepts
    • Examples: 1-Page Summaries
    • Examples: Concept Flow Diagram
    • Examples: Compare and Contrast Tables
    • Examples: Cartoons
    • Example: Which to Use When Article
  • Buy
  • Blog
  • Sample Articles
  • Videos
  • Author
  • Communicate
  • Files
  • Errata

Statistics Tip: Don't extrapolate your conclusions beyond the range of your data.

3/13/2019

In Regression, we attempt to fit a line or curve to the data. Let's say we're doing Simple Linear Regression, in which we are trying to fit a straight line to a set of (x,y) data.

We test a number of subjects with dosages from 0 to 3 pills, and we find a straight-line relationship, y = 3x, between the number of pills (x) and a measure of the subjects' health (y). So, we can say this:
Picture
But we cannot make a statement like the following:
Picture
This is called extrapolating the conclusions of your Regression Model beyond the range of the data used to create it. The data tell us nothing about how y behaves outside the tested range, so there is no mathematical basis for doing that. It can also have negative consequences, as this little cartoon from my book illustrates.
Picture
In the graphs below, the dots are data points. In the graph on the left, it is clear that there is a linear correlation between the drug dosage (x) and the health outcome (y) for the range we tested, 0 to 3 pills. And we can interpolate between the measured points. For example, we might reasonably expect that 1.5 pills would yield a health outcome halfway between that of 1 pill and 2 pills.
Picture
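
As a rough sketch of the idea, the Python below fits a straight line to made-up data that follows the y = 3x example, then refuses to predict outside the tested range of 0 to 3 pills. The data values and the function names `fit_line` and `predict` are assumptions for illustration only; they are not from the book or the videos.

```python
# Hypothetical data: dosages tested (0 to 3 pills) and a made-up health
# measure that follows the y = 3x relationship from the example.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 3.0, 6.0, 9.0]


def fit_line(xs, ys):
    """Least-squares slope and intercept for Simple Linear Regression."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, xs, ys):
    """Predict y only when x lies inside the range of the data used for the fit."""
    lo, hi = min(xs), max(xs)
    if not lo <= x <= hi:
        raise ValueError(f"x={x} is outside the tested range [{lo}, {hi}]")
    slope, intercept = fit_line(xs, ys)
    return slope * x + intercept


print(predict(1.5, xs, ys))  # interpolation: 1.5 pills is inside 0 to 3
try:
    predict(10, xs, ys)      # extrapolation: 10 pills was never tested
except ValueError as err:
    print(err)
```

Raising an error is just one possible design choice. A softer version could return the prediction along with a warning that it lies outside the range of the data.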
For more on this and other aspects of Regression, you can see the YouTube videos in my playlist on Regression. (See my channel: Statistics from A to Z - Confusing Concepts Clarified.)

New Video: Residuals

3/10/2019

Picture
This is the 9th and final video in my playlist on Regression. Residuals represent the error in a Regression Model. That is, Residuals represent the Variation in the outcome Variable y which is not explained by the Regression Model. Residuals must be analyzed in several ways to ensure that they are random and that they do not represent the Variation caused by some unidentified x-factor.
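
To make the definition concrete, here is a minimal sketch that computes the Residuals (observed y minus predicted y) for a least-squares line. The (x, y) values are made up for illustration, and this is not the analysis shown in the video.

```python
# Hypothetical (x, y) data, made up for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Residual = observed y minus the y predicted by the Regression line.
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]

for x, r in zip(xs, residuals):
    print(f"x={x:.0f}  residual={r:+.3f}")

# For a least-squares line with an intercept, the Residuals sum to
# (approximately) zero, apart from floating-point rounding.
print(f"sum of residuals = {sum(residuals):.6f}")
```

A sum near zero is only a property of the fitting method. Checking for randomness would still mean looking at the Residuals themselves, for example by plotting them against x and looking for patterns.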

See the videos page in this website for a listing of available and planned videos.

Statistics Tip of the Week: the Binomial Shapeshifter

2/27/2019

The Binomial Distribution is used with Count data. It displays the Probabilities of Counts from Binomial Experiments. In a Binomial Experiment,
  • There is a fixed number of trials (e.g. coin flips).
  • Each trial can have only 1 of 2 outcomes.
  • The Probability of a given outcome is the same for each trial.
  • Each trial is Independent of the others.

There are many Binomial Distributions. Each one is defined by a pair of values for two Parameters, n and p. n is the number of trials, and p is the Probability of a given outcome (e.g. heads) on each trial.

The graphs below show the effect of varying n, while keeping the Probability the same at 50%. The Distribution retains its shape as n varies. But obviously, the Mean gets larger. 

Picture
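
As a hedged sketch, the Python below computes the Binomial Probabilities for a few example values of n with p fixed at 50%, and checks two things: the Mean equals n times p (the standard formula), and the Distribution stays symmetric. The values of n are chosen only for illustration and may not match the graphs.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k occurrences of the outcome in n trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


p = 0.5
for n in (5, 10, 20):  # example values of n, chosen for illustration
    pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]
    mean = sum(k * prob for k, prob in enumerate(pmf))
    symmetric = all(abs(pmf[k] - pmf[n - k]) < 1e-12 for k in range(n + 1))
    print(f"n={n:2d}  mean={mean:.2f}  n*p={n * p:.2f}  symmetric={symmetric}")
```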
The effect of varying the Probability, p, is more dramatic.
Picture
For small values of p, the bulk of the Distribution is heavier on the left. However, as described in my post of July 25, 2018, statistics describes this as being skewed to the right, that is, having a positive skew. (The skew is in the direction of the long tail.) For large values of p, the skew is to the left, because the bulk of the Distribution is on the right. 
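
The direction of the skew can be checked numerically. The sketch below computes the skewness directly from the Binomial Probabilities, as the third central moment divided by the Variance raised to the power 1.5. The choice of n = 10 and the three values of p are assumptions for illustration.

```python
from math import comb


def binomial_skewness(n, p):
    """Skewness computed directly from the Binomial Probabilities."""
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    mean = sum(k * prob for k, prob in enumerate(pmf))
    var = sum((k - mean) ** 2 * prob for k, prob in enumerate(pmf))
    third = sum((k - mean) ** 3 * prob for k, prob in enumerate(pmf))
    return third / var**1.5


n = 10  # example number of trials
for p in (0.1, 0.5, 0.9):
    skew = binomial_skewness(n, p)
    if skew > 1e-9:
        direction = "right (positive skew)"
    elif skew < -1e-9:
        direction = "left (negative skew)"
    else:
        direction = "neither (symmetric)"
    print(f"p={p}: skewness={skew:+.3f}, skewed to: {direction}")
```

A small p should give a positive value (long tail on the right), and a large p should give a negative value (long tail on the left).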

New Video: Simple Nonlinear Regression

2/17/2019

New video: Simple Nonlinear Regression.
Picture
This is the 7th in a playlist on Regression. For a complete list of my available and planned videos, please see the Videos page on this website.

You are not alone if you are confused by Statistics -- #22.

2/13/2019

Picture

Statistics Tip:  A comparison of various measures of Variation

1/30/2019

Variation is also known as "Variability", "Dispersion", "Spread", and "Scatter". (Having 5 names for one thing is one more example of why statistics is confusing.) Variation is 1 of 3 major categories of measures describing a Distribution or data set. The others are Center (aka "Central Tendency"), with measures like Mean, Mode, and Median, and Shape, with measures like Skew and Kurtosis. Variation measures how "spread out" the data is.
Picture
There are a number of different measures of Variation. This compare-and-contrast table shows the relative merits of each.
Picture
  • The Range is probably the least useful in statistics. It just tells you the distance between the highest and lowest values of a data set, and nothing about what's in between.
  • The Interquartile Range (IQR) can be quite useful for visualizing the distribution of the data and for comparing several data sets -- as described in a recent post on this blog.
  • Variance is the square of the Standard Deviation, and it is used as an interim step in the calculation of the latter. This squaring overly emphasizes the effects of very high or very low values. Another drawback is that it is in units of the data squared (e.g. square kilograms, which can be meaningless). There is a Chi-Square Test for the Variance, and Variances are used in F tests and the calculations in ANOVA.
  • The Mean Absolute Deviation is the average (unsquared) distance of the data points from the Mean. It is used when it is desirable to avoid emphasizing the effects of high and low values.
  • The Standard Deviation, being the square root of the Variance, does not overly emphasize the high and low values as the Variance does. Another major benefit is that it is in the same units as the data.
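
As an illustration, the sketch below computes each of these measures for a small made-up data set that contains one unusually high value. It uses Python's `statistics` module and the population versions of Variance and Standard Deviation (dividing by n); the sample versions, `statistics.variance` and `statistics.stdev`, divide by n - 1 instead. The data values are hypothetical.

```python
import statistics

# Hypothetical data set in kilograms, with one unusually high value (21.0).
data = [2.0, 4.0, 4.0, 5.0, 6.0, 7.0, 21.0]

mean = statistics.mean(data)

data_range = max(data) - min(data)                   # Range, in kg
q1, _median, q3 = statistics.quantiles(data, n=4)    # quartile cut points
iqr = q3 - q1                                        # Interquartile Range, in kg
variance = statistics.pvariance(data)                # Variance, in kg squared
std_dev = statistics.pstdev(data)                    # Standard Deviation, in kg
mad = sum(abs(x - mean) for x in data) / len(data)   # Mean Absolute Deviation, in kg

print(f"Range                   = {data_range:.3f}")
print(f"Interquartile Range     = {iqr:.3f}")
print(f"Variance                = {variance:.3f}")
print(f"Standard Deviation      = {std_dev:.3f}")
print(f"Mean Absolute Deviation = {mad:.3f}")
```

Note that `statistics.quantiles` offers more than one method for computing quartiles, so the IQR from other software may differ slightly for the same data.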

Statistics Tip: the Alpha and Margin of Error Seesaw

1/3/2019

Picture
Alpha is the Significance Level of a statistical test. We select a value for Alpha based on the level of Confidence we want that the test will avoid a False Positive (aka Alpha aka Type I) Error.  In the diagrams below, Alpha is split in half and shown as shaded areas under the right and left tails of the Distribution curve. This is for a 2-tailed, aka 2-sided test.
Picture
In the left graph above, we have selected the common value of 5% for Alpha. A Critical Value is a value on the horizontal axis which forms the boundary of one of the shaded areas. The Margin of Error (MOE) is half the distance between the two Critical Values.

If we want to make Alpha smaller, the distance between the Critical Values gets larger, resulting in a larger Margin of Error.

The right diagram shows that if we want to make the MOE smaller, the price would be a larger Alpha. This illustrates the Alpha - MOE see-saw effect. But what if we wanted a smaller MOE without making Alpha larger? Is that possible? It is -- by increasing n, the Sample Size. (It should be noted that, after a certain point, continuing to increase n yields diminishing returns. So, it's not a universal cure for these errors.)
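
The see-saw can be sketched numerically. The example below assumes one specific case that the post does not spell out: a 2-sided, z-based interval for a Mean with a known Standard Deviation, where MOE = z * sigma / sqrt(n). The value of sigma, the function name `margin_of_error`, and the combinations of Alpha and n are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(alpha, sigma, n):
    """MOE for a 2-sided z-based interval for a Mean: z * sigma / sqrt(n)."""
    # Critical Value that leaves alpha/2 in the right tail of the
    # Standard Normal Distribution.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * sigma / sqrt(n)


sigma = 10.0  # hypothetical, assumed-known Standard Deviation

for alpha, n in [(0.10, 100), (0.05, 100), (0.01, 100), (0.05, 400)]:
    moe = margin_of_error(alpha, sigma, n)
    print(f"alpha={alpha:.2f}  n={n:3d}  MOE={moe:.3f}")
```

Under this assumption, the MOE shrinks in proportion to 1 / sqrt(n), so the Sample Size must be quadrupled to cut the MOE in half. That is one way to see the diminishing returns from increasing n.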

If you'd like to learn more about Alpha, I have 2 YouTube videos which may be of interest:
  • Alpha, α, the Significance Level
  • Alpha, p, Critical Value and Test Statistic -- How They Work Together

New Video: Regression -- Part 4: Multiple Linear

12/29/2018

Continuing the playlist on Regression, I have uploaded a new video to YouTube: Regression -- Part 4: Multiple Linear. There are 5 Keys to Understanding; here is the 3rd. See the Videos page of this website for more info on available and planned videos.
Picture

You are not alone if you are confused by Statistics -- #23

12/26/2018

Picture

Statistics Tip: the values of a Categorical Variable are names of categories.

12/13/2018

Categorical Variables are used in ANOMA, ANOVA, with Proportions, and in the Chi-Square Tests for Independence and Goodness of Fit. Categorical Variables are also known as "Nominal" (named) Variables and "Attributes" Variables.

The concept can be confusing, because the values of a Categorical Variable are not numbers, but names of categories. The numbers associated with Categorical Variables come from counts of the data values within a named category. Here's how it works:
Picture
  • In this example there are two Categorical Variables,  "Gender" and "Ice Cream (flavor)". 
  • The values of the two Categorical Variables are the names of the categories for the Variable. For example, the Categorical Variable "Gender" has 2 possible values: "female" and "male".
  • If we're going to use these Variables in a Chi-Square Test for Independence, for example, we need to have some numbers. The numbers are the counts of the data values in each category.  For example, the count of persons whose gender is "female" and whose favorite ice cream flavor is "vanilla" is 25.
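
As a small sketch of how names turn into numbers, the Python below counts made-up survey records into a table of counts by gender and flavor. The handful of records is hypothetical and is not the data behind the table in this post.

```python
from collections import Counter

# Hypothetical survey records: (gender, favorite ice cream flavor).
# These rows are made up for illustration only.
records = [
    ("female", "vanilla"),
    ("female", "chocolate"),
    ("male", "vanilla"),
    ("female", "vanilla"),
    ("male", "strawberry"),
    ("male", "chocolate"),
    ("male", "vanilla"),
]

# The values are category names; the numbers come from counting them.
counts = Counter(records)

genders = sorted({gender for gender, _ in records})
flavors = sorted({flavor for _, flavor in records})

# Print a simple table of counts: one row per gender, one column per flavor.
print("gender".ljust(8) + "".join(flavor.rjust(12) for flavor in flavors))
for gender in genders:
    row = "".join(str(counts[(gender, flavor)]).rjust(12) for flavor in flavors)
    print(gender.ljust(8) + row)
```

A table of counts like this is the kind of input a Chi-Square Test for Independence works from.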

    Author

    Andrew A. (Andy) Jawlik is the author of the book, Statistics from A to Z -- Confusing Concepts Clarified, published by Wiley.
