Statistics Tip of the Week: Nonparametric Methods can use Signs, Ranks, Signed Ranks, etc.

2/24/2018

In our Tip of the Week for 12/14/2017, we said that Nonparametric Methods (like Wilcoxon, Mann-Whitney, and Kruskal-Wallis) can work with data that Parametric Methods (like t-tests and ANOVA) cannot. Nonparametric methods are often called “distribution-free,” because they do not assume that the data come from a particular Distribution, such as the Normal Distribution.

They can do this because they convert the data into things like Signs, Ranks, Signed Ranks, and Rank Sums. Here's how that works:

Signs
We’ll be comparing Sample data to a value we specify. It could be a target, a historical value, an industry standard, etc. Let’s say that the historical Median time to complete an operation in an industrial process has been 30 seconds. We collect a Sample of 10 time measurements: 28, 31, 30, 33, 32, 28, 30, 31, 27, 32

If a time is less than the Median of 30 seconds, we give it a negative sign.
If it is 30 seconds, we give it a zero.
If it is greater than 30 seconds, we give it a plus sign.

Count of plusses: 5
Count of minuses: 3

We can use the Counts of these signs, instead of the original data, in a Nonparametric method called the Sign Test.
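The sign assignment above can be sketched in a few lines of Python. The data and the Median of 30 come from the example; the function name `sign` and the p-value step are illustrative additions. In particular, the p-value assumes one common convention for the Sign Test that this Tip does not spell out: zeros are dropped, and the plusses are compared to a Binomial Distribution with p = 0.5 in a two-sided test.

```python
from math import comb

times = [28, 31, 30, 33, 32, 28, 30, 31, 27, 32]
historical_median = 30


def sign(value, reference):
    """Return '+' above the reference, '-' below it, and '0' when equal."""
    if value > reference:
        return "+"
    if value < reference:
        return "-"
    return "0"


signs = [sign(t, historical_median) for t in times]
plusses = signs.count("+")
minuses = signs.count("-")
zeros = signs.count("0")
print(plusses, minuses, zeros)  # 5 3 2

# Assumption (not stated in the Tip): drop the zeros and run a two-sided
# exact Binomial test on the remaining signs, with p = 0.5 under the Null.
n = plusses + minuses
k = max(plusses, minuses)
upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
p_value = min(1.0, 2 * upper_tail)
print(round(p_value, 4))
```

With only 5 plusses against 3 minuses, the Counts are close to an even split, so this sketch gives a large p-value and no evidence that the Median has moved away from 30.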

Ranks
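Let’s take that same Sample of data, and order it from low to high: 27, 28, 28, 30, 30, 31, 31, 32, 32, 33. Next, assign a Rank from low to high. For ties, give each tied value the average of the Ranks they would occupy. For example, the 27 gets Rank 1, and then there are two 28’s. These occupy the two Ranks after 1 (a 2 and a 3), so we give them both a 2.5. The next Rank would be a 4, but there’s another tie (the two 30’s occupy Ranks 4 and 5), so we mark both as 4.5’s. Continuing the same way, the 31’s both get 6.5, the 32’s both get 8.5, and the 33 gets Rank 10.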
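As a rough sketch, the tie-averaging rule can be written in Python as follows. The function name `average_ranks` is made up for this illustration; statistical packages provide their own ranking functions.

```python
def average_ranks(values):
    """Rank values from low to high; tied values share the average of their Ranks."""
    ordered = sorted(values)
    positions = {}
    # Collect the 1-based positions each distinct value occupies once sorted.
    for position, value in enumerate(ordered, start=1):
        positions.setdefault(value, []).append(position)
    # A tied value gets the average of the positions it occupies.
    rank_of = {value: sum(p) / len(p) for value, p in positions.items()}
    return [rank_of[value] for value in values]


times = [28, 31, 30, 33, 32, 28, 30, 31, 27, 32]
ranks = average_ranks(times)
for value, rank in sorted(set(zip(times, ranks))):
    print(value, rank)
```

A quick check: the Ranks of 10 values always add up to 1 + 2 + ... + 10 = 55, whether or not there are ties.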

Signed Ranks
Signed Ranks, as you might guess, combine the concepts of Signs and Ranks. But there is a change in how Signs are assigned, and one step uses absolute values, so we’ll use a different example with some negative numbers.

Let’s say we are doing an analysis of the effect of a training program on employee productivity. If we were doing a Parametric test, we’d use the Paired t-test (aka Dependent Samples t-test). We count the number of transactions that each employee processes in an hour, once before the training and once after. For each employee, we subtract their Before Training number from their After Training number. The information we are capturing is the difference.

Instead of plus and minus signs, we’ll use +1 and 0. We compare the data values to a specified value, as we did in our example of the historical Median of 30. Here, each Sample data value is an employee’s After number minus their Before number.
We’ll be testing the Null Hypothesis that there is zero difference, so the specified value is zero.

Step 1: Sign: For each data value, assign a Sign:

If the data value is greater than the specified value (0 in this example), then the Sign = +1
If it’s less than or equal to the specified value, then the Sign = 0

Step 2: Calculate the Absolute Values

Step 3: Rank the Absolute Values to produce the Absolute Ranks

Step 4: Signed Rank: Multiply the Sign times the Absolute Ranks
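Here is a minimal Python sketch of Steps 1 through 4. The differences are made-up numbers for illustration only (they are not real training data), and the names `average_ranks` and `differences` are hypothetical. The sketch follows the steps exactly as written above, so a zero difference gets Sign = 0 and is still ranked with the other Absolute Values; statistical software often treats zero differences differently.

```python
def average_ranks(values):
    """Rank values from low to high; tied values share the average of their Ranks."""
    ordered = sorted(values)
    positions = {}
    for position, value in enumerate(ordered, start=1):
        positions.setdefault(value, []).append(position)
    rank_of = {value: sum(p) / len(p) for value, p in positions.items()}
    return [rank_of[value] for value in values]


# Hypothetical After-minus-Before differences for 8 employees.
differences = [3, -1, 4, 0, -2, 5, 2, -3]
specified_value = 0  # Null Hypothesis: zero difference

# Step 1: Sign is +1 above the specified value, otherwise 0.
signs = [1 if d > specified_value else 0 for d in differences]

# Step 2: Absolute Values.
absolute_values = [abs(d) for d in differences]

# Step 3: Rank the Absolute Values (ties share the average Rank).
absolute_ranks = average_ranks(absolute_values)

# Step 4: Signed Rank = Sign times Absolute Rank.
signed_ranks = [s * r for s, r in zip(signs, absolute_ranks)]

print(signed_ranks)
print(sum(signed_ranks))  # sum of the Ranks belonging to positive differences
```

Because the Signs are +1 and 0, adding up the Signed Ranks gives the total of the Ranks that belong to the positive differences.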

Signed Rank tests are the Nonparametric counterpart to the Dependent Samples (aka Paired Samples) t-test.

Rank Sum tests are the Nonparametric counterpart of the Independent Samples (aka 2-Samples) t-test. Since the Rank Sum method requires a somewhat lengthy description and this Tip is getting a bit long, we'll save Rank Sums for another Statistics Tip of the Week.

2 Comments

Alan Hutson

3/2/2018 10:30:48 am

The reason rank-based tests such as the Wilcoxon, Mann-Whitney, and Kruskal-Wallis (BTW, the Wilcoxon rank-sum and Mann-Whitney tests are the same test) “can work with data that Parametric Methods, like t-tests and ANOVA, cannot” has nothing to do with converting the data to ranks. The type I error control is maintained under the assumption of exchangeability, since all of the tests above are permutation tests. In fact, one can also perform permutation tests on means, such as a permutation t-test, and also have an exact level alpha test under the assumption of exchangeability without any parametric assumptions. Furthermore, even under non-normality a classic two-sample t-test controls the type I error very well under exchangeability assumptions since it approximates a t-test.

The more appropriate rationale for using rank-based tests over parametric tests is that they are oftentimes more statistically powerful, e.g. under non-normality the Wilcoxon rank-sum test is more powerful than the two-sample t-test in virtually all scenarios under the specific case of location shifts assuming constant scale. One may also make a case for robustness against outliers. In general, a Wilcoxon rank-sum test and a t-test are testing two distinct null hypotheses. The former is that two distribution functions are equivalent and the latter is that two distribution functions have the same mean, i.e. the Wilcoxon rank-sum test may be significant if two distributions have the same means but different standard deviations. This is an important distinction to make as well.

Finally, extra caution should be taken in the one-sample setting, where for example the Wilcoxon signed-rank test is really semi-parametric. There is an implicit assumption of symmetry under the null hypothesis in order to have a valid test that appropriately controls the type I error for both the sign test and the Wilcoxon signed-rank test. You completely fail to point out this critical feature of the test, which is often the pitfall of many data analysts.

Bottom line: Converting data to ranks does not magically protect one against understanding what assumptions need to be met and what hypotheses are actually being tested. If data are non-exchangeable under the null the Wilcoxon signed-rank test can have inflated type I error if one is testing about a shift change in location as the null hypothesis.

Ana Carol

4/12/2023 04:33:37 pm

Excellent post on nonparametric methods! It's great to see a clear explanation of how signs, ranks, and signed ranks can be used to analyze data in nonparametric tests. The visuals and examples make it easy to understand. Well done!
