Categorical Variables are used in ANOMA, ANOVA, with Proportions, and in the Chi-Square Tests for Independence and Goodness of Fit. Categorical Variables are also known as "Nominal" (named) Variables and "Attributes" Variables. The concept can be confusing, because the values of a Categorical Variable are not numbers, but names of categories. The numbers associated with Categorical Variables come from counts of the data values within a named category. Here's how it works:
- In this example there are two __Categorical Variables__, "Gender" and "Ice Cream (flavor)".
- The __values__ of the two Categorical Variables are the __names of the categories__ for the Variable. For example, the Categorical Variable "Gender" has 2 possible values: "female" and "male".
- If we're going to use these Variables in a Chi-Square Test for Independence, for example, we need to have some numbers. The __numbers are the counts__ of the data values in each category. For example, the count of persons whose gender is "female" and whose favorite ice cream flavor is "vanilla" is 25.
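
To see how names turn into numbers, here is a minimal sketch in Python. It assumes a small made-up survey: only the female/vanilla count of 25 comes from the example above, and the "chocolate" flavor and all other counts are invented for illustration.

```python
from collections import Counter

# Hypothetical survey records: (gender, favorite ice cream flavor).
# Only the female/vanilla count of 25 comes from the text; the other
# flavor name and counts are invented for illustration.
records = (
    [("female", "vanilla")] * 25
    + [("female", "chocolate")] * 20
    + [("male", "vanilla")] * 15
    + [("male", "chocolate")] * 30
)

# The values are category names; the numbers come from counting them.
cell_counts = Counter(records)
gender_counts = Counter(gender for gender, _ in records)

print(cell_counts[("female", "vanilla")])  # 25
print(gender_counts["female"])             # 45
```

The cell counts (one per combination of gender and flavor) are the kind of numbers a Chi-Square Test for Independence would work with.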
Continuing the playlist on Regression, I have uploaded a new video to YouTube: Regression Part 3: Analysis Basics. It talks about things that are required for all 3 types of Regression covered in the book -- Simple Linear, Multiple Linear, and Simple Nonlinear Regression. Topics include clip levels for R squared, Residuals, establishing Cause and Effect, and the dangers of Extrapolation. See the videos page of this website for the status of completed and planned videos.

## Statistics Tip of the Week: How the selection of a value for Alpha specifies the Critical Value
11/29/2018

In Hypothesis Testing, before the data is collected, a value for Alpha, the Level of Significance, is selected. The person performing the test selects the value. Most commonly, 5% is selected. Alpha is a Cumulative Probability -- the Probability of a range of values. It is shown as a shaded area under the curve of the Distribution of a Test Statistic, such as z.

If we have the Distribution of a Test Statistic and a Cumulative Probability at one or both tails of the curve of the Distribution, software or tables will tell us the value of the Test Statistic which forms the boundary of the Cumulative Probability. In the above concept flow diagram, we show how selecting Alpha = 5% for a one-tailed (right-tailed) test results in the Critical Value being 1.645. That is, 5% of the area under the z curve lies to the right of z = 1.645.

I earlier uploaded videos on the statistical concepts mentioned above to my YouTube channel: "Statistics from A to Z -- Confusing Concepts Clarified".

Continuing the playlist on Regression, I have uploaded a new video to YouTube: Regression -- Part 2: Simple Linear. See the videos page of this website for the status of completed and planned videos.

A Boxplot, also known as a Box and Whiskers Plot, is a good way to visually depict Variation in a dataset (e.g., a Sample or Population). And showing several Boxplots vertically is useful for comparing Variation among several datasets. The boxes depict the range within which 50% of the data falls for each dataset.
- The bottom of the box identifies the 25th percentile (25% of the data is below)
- The line in the middle is the Median (50th percentile)
- The top of the box is the 75th percentile
- The line segments (the "whiskers") at the top and bottom extend toward the highest and lowest values of the dataset, but they are drawn only as far as 1.5 box lengths. (If there are no data points that far out, the whisker ends at the farthest point.) Points beyond 1.5 box lengths are termed "Outliers". Points beyond 3 box lengths are called "Extremes" or "Extreme Outliers".
In this illustration, a higher score is better. Treatment A has the highest individual score, but it has considerably more Variation in results than Treatments B and C. The Medians for Treatments A, B, and C are fairly close. So, we can see at a glance that Treatment D can be eliminated from consideration. Treatment B has the highest Median and gives very consistent results (small Variation). So, this plot may be all we need to select B as the best treatment.
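
The pieces of a Boxplot described above can be computed directly. The following is a rough sketch in Python, not a definitive recipe: the function name `boxplot_summary` and the scores are made up, the 1.5 and 3 box lengths are assumed to be measured from the edges of the box, and the quartiles use the "inclusive" method of Python's `statistics.quantiles` (other software may use a slightly different quartile convention).

```python
import statistics

def boxplot_summary(data):
    """Return the box, whisker ends, and flagged points for one dataset."""
    values = sorted(data)
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    box_length = q3 - q1  # the range holding the middle 50% of the data

    # Assumption: distances are measured from the edges of the box.
    inner_low, inner_high = q1 - 1.5 * box_length, q3 + 1.5 * box_length
    outer_low, outer_high = q1 - 3 * box_length, q3 + 3 * box_length

    # Whiskers end at the farthest points that are within 1.5 box lengths.
    inside = [v for v in values if inner_low <= v <= inner_high]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": [
            v for v in values
            if outer_low <= v < inner_low or inner_high < v <= outer_high
        ],
        "extremes": [v for v in values if v < outer_low or v > outer_high],
    }

# Hypothetical scores, invented for illustration.
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 18, 30]
summary = boxplot_summary(scores)
print(summary)
```

With these made-up scores, the box runs from 3.5 to 8.5, the whiskers end at 1 and 9, the point 18 is flagged as an Outlier, and 30 is flagged as an Extreme.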
## Statistics Tip: Sampling with Replacement is required when using the Binomial Distribution
10/7/2018

One of the requirements for using the Binomial Distribution is that each trial must be independent. One consequence of this is that the Sampling must be With Replacement.

To illustrate this, let's say we are doing a study in a small lake to determine the Proportion of lake trout. Each trial consists of catching and identifying 1 fish. If it's a lake trout, we count 1. The population of the fish is finite. We don't know this, but let's say it's 100 total fish: 70 lake trout and 30 other fish.

Each time we catch a fish, we throw it back before catching another fish. This is called Sampling With Replacement. Then, the Proportion of lake trout remains at 70%, and the Probability for any one trial is 70% for lake trout.

If, on the other hand, we keep each fish we catch, then we are Sampling Without Replacement. Let's say that the first 5 fish which we catch (and keep) are lake trout. Then, there are now 95 fish in the lake, of which 65 are lake trout. The percentage of lake trout is now 65/95 = 68.4%. This is a change from the original 70%. So, we don't have the same Probability each time of catching a lake trout. Sampling Without Replacement has caused the trials to not be independent. So, we can't use the Binomial Distribution. We must use the Hypergeometric Distribution instead.

For more on the Binomial Distribution, see my YouTube video.

The concept of ANOVA can be confusing in several aspects. To start with, its name is an acronym for "ANalysis Of VAriance", but it is not used for analyzing Variances. (F and Chi-square tests are used for that.) ANOVA is used for analyzing Means. The internal calculations that it uses to do so involve analyzing Variances -- hence the name.
- The 1st column in the following table describes what ANOVA does do.
- The 2nd column says what ANOVA does __not__ do.
- The 3rd column tells what to use if we want to do what's in the 2nd column.
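
To make the point about ANOVA's internal calculations concrete, here is a minimal sketch in Python of a one-way ANOVA F statistic. The function name `one_way_f` and the three groups of numbers are made up for illustration. The sketch compares the Means of the groups, but it does so by taking a ratio of two Variances: the Variation between the group Means and the Variation within the groups.

```python
from statistics import fmean

def one_way_f(groups):
    """F statistic for a one-way ANOVA on a list of groups of numbers."""
    all_values = [x for group in groups for x in group]
    grand_mean = fmean(all_values)
    k, n = len(groups), len(all_values)

    # Variation of the group Means around the grand Mean.
    ss_between = sum(
        len(group) * (fmean(group) - grand_mean) ** 2 for group in groups
    )
    # Variation of the data values around their own group Mean.
    ss_within = sum(
        (x - fmean(group)) ** 2 for group in groups for x in group
    )

    ms_between = ss_between / (k - 1)  # "between groups" Variance
    ms_within = ss_within / (n - k)    # "within groups" Variance
    return ms_between / ms_within

# Hypothetical data, invented for illustration.
groups = [[1, 2, 3], [2, 3, 4], [6, 7, 8]]
f_statistic = one_way_f(groups)
print(f_statistic)  # 21.0
```

A large F means the group Means differ by more than the Variation within the groups would explain. Turning F into a conclusion still requires comparing it with a Critical Value (or computing a p-value), which this sketch leaves out.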
For more details on ANOVA, I have a 6-video playlist on YouTube.

I just uploaded a new video to YouTube: https://youtu.be/gGkRkDBlICU

For the latest status of my videos completed and planned, see the videos page on this website.
## Author
Andrew A. (Andy) Jawlik is the author of the book, Statistics from A to Z -- Confusing Concepts Clarified, published by Wiley.