# Step 5

# Data Analysis 1

# Descriptive Statistics

Imagine you have collected your data and you now know what type of data you have. You can now use descriptive statistics. Descriptive statistics describe the data you have collected in a simplified, numerical way. For example, imagine you have collected 1,000 results from 1,000 different participants. If you presented all 1,000 individual results, the data would be too complicated for anyone to see similarities, trends and so on. Using a descriptive statistic, you can summarise the 1,000 results as a few numbers, for example the average result of all 1,000 participants represented as one number. You can choose between the MEAN WITH STANDARD DEVIATION, the MODE or the MEDIAN.

## MEAN

The mean is calculated by adding all the results together and then dividing by the number of results. For example, to find the mean of 3, 2 and 7, add them (3 + 2 + 7 = 12) and then divide by 3 (12 / 3 = 4).
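The calculation can be checked with a few lines of Python. This is a minimal sketch using only the standard library: `statistics.mean` is Python's built-in function, while `mean_by_hand` is a hypothetical helper written for illustration that follows the steps described above.

```python
from statistics import mean


def mean_by_hand(values):
    """Add all the results together, then divide by how many there are."""
    total = sum(values)          # 3 + 2 + 7 = 12
    return total / len(values)   # 12 / 3 = 4.0


results = [3, 2, 7]
print(mean_by_hand(results))  # 4.0
print(mean(results))          # also 4
```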

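The mean is a useful descriptive statistic because it uses every value in the data you have collected. This is not true for the median and mode: the median depends only on the middle value once the results are put in order, and the mode depends only on the most common value.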
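To see why using every value matters, the sketch below changes one result from the earlier example (7 becomes 70, a made-up value chosen purely for illustration) and compares the mean with the median using Python's standard `statistics` module.

```python
from statistics import mean, median

original = [3, 2, 7]
changed = [3, 2, 70]  # same results, except the largest is much bigger

# The mean uses every value, so changing one result moves it.
print(mean(original), mean(changed))      # 4 25

# The median only looks at the middle value once sorted, so it stays the same.
print(median(original), median(changed))  # 3 3
```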

For example, imagine a research project that compares the running times of 20 participants before and after taking a supplement. The mean would describe all of the participants' running times before the supplement as one number (e.g. 5.3 seconds) and all of their running times after the supplement as one number (e.g. 5.1 seconds).

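The mean works with ratio and interval data. It is sometimes used with ordinal data recorded as numbers (for example, a 1 to 5 rating scale), but this needs care because the gaps between ranks are not necessarily equal.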

## STANDARD DEVIATION

The standard deviation describes how spread out the results are. It is used alongside the mean and tells you roughly how far a typical result lies from the mean. For example, the mean of [3, 3, 4, 5, 7, 5] is 4.5 and the standard deviation is about 1.5. This tells you that most results fall within about 1.5 of the mean, between roughly 3 (4.5 - 1.5) and 6 (4.5 + 1.5). This shows the spread of the results. If the standard deviation is large compared with the mean (for example, almost as large as the mean itself), the results are widely spread and the mean on its own may not describe them well.
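The example can be worked through with Python's standard `statistics` module. This is a sketch, not the only way to do it: `stdev` divides by n - 1 (the sample standard deviation, usual when your participants are a sample of a larger group), while `pstdev` divides by n (the population standard deviation). The figure of about 1.5 matches the sample version.

```python
from statistics import mean, stdev, pstdev

results = [3, 3, 4, 5, 7, 5]

avg = mean(results)              # 27 / 6 = 4.5
sample_sd = stdev(results)       # divides by n - 1: about 1.52
population_sd = pstdev(results)  # divides by n: about 1.38

print(avg, round(sample_sd, 2), round(population_sd, 2))  # 4.5 1.52 1.38

# Results lying within one (sample) standard deviation of the mean, i.e. about 3 to 6.
within_one_sd = [x for x in results if avg - sample_sd <= x <= avg + sample_sd]
print(within_one_sd)  # [3, 3, 4, 5, 5]
```

Only the 7 lies outside that range, which is why the spread is described as moderate.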

For example, take the research project comparing 20 participants' running times before and after taking a supplement. If the mean time before the supplement was 5.3 seconds and the standard deviation was 0.5 seconds, the spread would be low: most people ran at a similar speed. If the mean was 5.3 seconds but the standard deviation was 4.2 seconds, the spread would be high: some people were very quick and some were very slow. In that case the mean alone would describe the group poorly, and it would be worth checking the data for errors or unusual results.
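The idea that two groups can share the same mean but have very different spreads can be sketched with two small sets of running times. These times are invented for illustration only (they are not the 20 participants' results); both sets average 5.3 seconds.

```python
from statistics import mean, stdev

# Hypothetical running times in seconds, made up for illustration.
similar_times = [5.0, 5.3, 5.6]
spread_out_times = [1.3, 5.3, 9.3]

for label, times in [("similar", similar_times), ("spread out", spread_out_times)]:
    # Same mean, very different standard deviation.
    print(label, round(mean(times), 2), round(stdev(times), 2))
# similar 5.3 0.3
# spread out 5.3 4.0
```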


## MODE

The mode is the most common result in the data you have collected. For example, the mode of [4, 3, 4, 3, 2, 2, 4] is 4 because it occurs most often (three times).
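Python's standard `statistics` module provides the mode directly. The sketch below also uses `multimode` (available from Python 3.8), which lists every value tied for most common; the tied list `[4, 3, 4, 3, 2]` is a made-up example.

```python
from statistics import mode, multimode

results = [4, 3, 4, 3, 2, 2, 4]
print(mode(results))  # 4 (it appears three times)

# When two or more values are equally common, multimode returns all of them.
print(multimode([4, 3, 4, 3, 2]))  # [4, 3]
```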

The mode is a useful descriptive statistic when the data you collect is not numerical, for example favourite sport.

For example, if a survey asks 100 participants which sports they play, the results would be a collection of sports' names. You could then use the mode to find the sport played most often.
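For non-numerical answers like sports' names, the mode is just a count of how often each answer appears. This sketch uses `collections.Counter` from the standard library; the list of answers is hypothetical and much shorter than a real 100-person survey.

```python
from collections import Counter

# Hypothetical survey answers, made up for illustration.
answers = ["football", "tennis", "football", "swimming", "netball", "football", "tennis"]

counts = Counter(answers)
# most_common(1) returns a list holding the single (answer, count) pair seen most often.
most_common_sport, times = counts.most_common(1)[0]
print(most_common_sport, times)  # football 3
```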

Mode can work with all types of data.

## MEDIAN

To find the median, arrange all of the data collected in numerical order and then select the middle number. For example, the median of 4, 2 and 7 is 4, because when the numbers are put in order (2, 4, 7) the number 4 is in the middle. If there is an even number of results, the median is the mean of the two middle numbers.
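The sort-then-pick-the-middle procedure can be sketched as a short Python function. `median_by_hand` is a hypothetical helper for illustration; for an even number of results it uses the usual convention of averaging the two middle values, which is also what Python's `statistics.median` does.

```python
def median_by_hand(values):
    """Sort the results and take the middle one (or the mean of the middle two)."""
    if not values:
        raise ValueError("median of an empty list is undefined")
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd number of results: there is a single middle value.
        return ordered[middle]
    # Even number of results: average the two middle values.
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median_by_hand([4, 2, 7]))     # 4
print(median_by_hand([4, 2, 7, 9]))  # 5.5
```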

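The median is a useful descriptive statistic for separating the lower results from the higher results: half of the results lie below it and half lie above it. Because it depends only on the middle of the ordered data, it is also less affected by a few extreme results than the mean is.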

For example, imagine a research project that looks at the number of headed goals scored by each football team across 4 different divisions. Here the median would split the teams into those with fewer headed goals and those with more. If the median number of headed goals was 15, you could treat a total below 15 as a lower score and a total above 15 as a higher score.
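Splitting teams into lower and higher scorers around the median can be sketched as follows. The goal totals are hypothetical, invented for illustration, and chosen so that the median comes out at 15.

```python
from statistics import median

# Hypothetical headed-goal totals for seven teams, made up for illustration.
headed_goals = [12, 9, 22, 15, 30, 4, 18]

mid = median(headed_goals)  # sorted: 4, 9, 12, 15, 18, 22, 30 -> 15
lower = [g for g in headed_goals if g < mid]   # below the median
higher = [g for g in headed_goals if g > mid]  # above the median
print(mid, lower, higher)  # 15 [12, 9, 4] [22, 30, 18]
```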

The median works with ratio, interval and ordinal data, because all three can be put in order.