Sample Variance

    Introduction

    Sample variance measures how much the values in a sample differ from the sample mean (the average). A sample is a smaller set of data chosen to represent a larger group. The larger the sample variance, the more spread out the values are around the mean. It's also called estimated variance because it is used to estimate how much the data in the larger group vary.

    Since data can come in different forms—grouped and ungrouped—there are different formulas to find the sample variance. If you take the square root of the sample variance, you get something called the sample standard deviation, which is another way to measure how spread out the data is. In this article, we'll talk about sample variance and illustrate it with some examples to make it clearer.

     

    What Is Sample Variance?

    Sample variance helps us understand variability in a dataset. Imagine you have a big group of data points—this is called the population. But sometimes, dealing with the whole population can be tricky, especially if it's huge. So instead, we take a smaller group, called a sample, to represent the larger population. The sample variance is a way to calculate how much the numbers in this smaller group vary from the group’s average. It gives us a glimpse into how the entire population might behave based on this sample.

     

    Defining Sample Variance

    Sample variance is a measure of the dispersion of the values in a dataset around their sample mean. It's calculated by finding how far each data point is from the mean (or average), squaring those differences, adding them up, and dividing the total by one less than the number of data points. The result is close to the average squared deviation from the mean, so it shows how spread out the data is.

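    The sample variance is commonly denoted by ` s^2 `. The Greek letter sigma squared, written as ` \sigma^2 `, conventionally denotes the variance of a whole population. The formula for sample variance depends on the type of data we are handling: ungrouped data or grouped data.

    Below is the formula for the sample variance of ungrouped data.

    `s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}`

    Here `x_i` are the individual data points, `\bar{x}` is the sample mean, and `n` is the number of observations.

    For grouped data, we use the following formula.

    `s^2 = \frac{\sum_{i=1}^{k}f_i(m_i - \bar{x})^2}{n-1}`

    Here `k` is the number of groups, `f_i` is the frequency of each group, `m_i` is the midpoint of each group, and `n` is the total number of observations (the sum of the frequencies).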

    How to Calculate Sample Variance?

    Calculating sample variance involves several steps, depending on the type of data you have. There are two types of data - grouped and ungrouped. When data is given as distinct data points it is called ungrouped data. When data is sorted into groups, categories, classes, etc, and given in tabular form, it is known as grouped data. 

    Let’s use the formula based on the data we are given to find the sample variance.

    Calculating sample variance of ungrouped data:

    Example: Calculate the sample variance of the dataset `{3, 8, 12}`.

    Solution: 

    Step `1`: First, find the mean of the dataset. This means adding up all the numbers and then dividing by the total count of numbers. For `{3, 8, 12}`, the mean is `(3 + 8 + 12) / 3 = 23/3 \approx 7.67`. Keeping the exact value `23/3` in the next steps avoids rounding errors.

    Step `2`: Next, subtract the mean from each data point. This gives us the deviations from the mean. For our example, they are `3 - 23/3 = -14/3, 8 - 23/3 = 1/3,` and `12 - 23/3 = 13/3`.

    Step `3`: After that, square each of these deviations. For our example, that gives `(-14/3)^2 = 196/9, (1/3)^2 = 1/9,` and `(13/3)^2 = 169/9`.

    Step `4`: Now, add up all the squared deviations obtained in the previous step. In our example, that is `196/9 + 1/9 + 169/9 = 366/9 = 122/3 \approx 40.67`.

    Step `5`: Finally, to find the sample variance, divide the sum of squared deviations by one less than the total number of observations. In our case, that's `(122/3) / (3 - 1) = 61/3 \approx 20.33`. So, for this dataset, the sample variance is approximately `20.33`.
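
    The five steps can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name `sample_variance` is made up for this example, and the result is cross-checked against `statistics.variance` from the standard library.

```python
import math
import statistics


def sample_variance(data):
    """Sample variance of ungrouped data, computed step by step."""
    n = len(data)
    if n < 2:
        raise ValueError("sample variance needs at least two observations")
    mean = sum(data) / n                         # Step 1: mean of the data
    deviations = [x - mean for x in data]        # Step 2: deviation of each point
    squared = [d ** 2 for d in deviations]       # Step 3: squared deviations
    total = sum(squared)                         # Step 4: sum of squared deviations
    return total / (n - 1)                       # Step 5: divide by n - 1


data = [3, 8, 12]
result = sample_variance(data)
print(f"sample variance of {data}: {result:.4f}")

# Cross-check against the standard library's implementation.
assert math.isclose(result, statistics.variance(data))
```

    Writing each step as its own line keeps the code aligned with the manual calculation, at the cost of a few temporary lists.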

     

    Calculating sample variance of grouped data:

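    For grouped data, the sample variance is ` s^2 = \frac{\sum_{i=1}^{k} f_i (m_i - \bar{x})^2}{n - 1} `, where ` k ` is the number of groups, ` f_i ` is the frequency of group ` i `, ` m_i ` is the midpoint of that group, ` \bar{x} ` is the sample mean, and ` n ` is the total number of observations (the sum of the frequencies).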

    Example: The table below shows the scores from Mr. Beckham’s class. Calculate the variance.

    Solution:

    Given:

    • ` n ` (total number of observations) `= 5 + 6 + 7 + 8 + 4 = 30 `
    • ` f_i ` (frequency of each interval): `5, 6, 7, 8, 4`
    • ` m_i ` (midpoint of each interval): `23, 20, 17, 14, 11`

     

    To calculate variance for grouped data, we use the sample variance formula:

    `s^2 = \frac{\sum_{i=1}^{k}f_i(m_i - \bar{x})^2}{n-1}`

    Where:

    • ` k ` is the number of intervals (here `5`)
    • ` \bar{x} ` (sample mean) `= \frac{\sum_{i=1}^{k}f_i \cdot m_i}{n} `

    First, we need to find the sample mean ` \bar{x} `:

    `\bar{x} = \frac{5 \times 23 + 6 \times 20 + 7 \times 17 + 8 \times 14 + 4 \times 11}{30}`

    `= \frac{115 + 120 + 119 + 112 + 44}{30}`

    `= \frac{510}{30}`

    `= 17`

    Now, let's calculate ` (m_i - \bar{x})^2 ` for each interval and then multiply it by the frequency ` f_i `:

    `5 \times (23 - 17)^2 = 5 \times 36 = 180`

    `6 \times (20 - 17)^2 = 6 \times 9 = 54`

    `7 \times (17 - 17)^2 = 7 \times 0 = 0`

    `8 \times (14 - 17)^2 = 8 \times 9 = 72`

    `4 \times (11 - 17)^2 = 4 \times 36 = 144`

    Now, sum up ` f_i \cdot (m_i - \bar{x})^2 `:

    \( \sum_{i=1}^{k}f_i \cdot (m_i - \bar{x})^2 = 180 + 54 + 0 + 72 + 144 \)

    \( = 450 \)

    Finally, plug the values into the formula for variance:

    `s^2 = \frac{450}{30 - 1}`

    `= \frac{450}{29}`

    \( \approx 15.52 \)

    The sample variance of the test scores is approximately `15.52`.
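
    The grouped-data calculation can also be sketched in Python. This is a minimal illustration: the function name `grouped_sample_variance` is made up for this example, and it assumes each interval is represented by its midpoint, as in the formula above. The midpoints and frequencies are the ones used in the worked solution.

```python
def grouped_sample_variance(midpoints, frequencies):
    """Sample variance of grouped data from interval midpoints and frequencies."""
    if len(midpoints) != len(frequencies):
        raise ValueError("midpoints and frequencies must have the same length")
    n = sum(frequencies)  # total number of observations
    if n < 2:
        raise ValueError("need at least two observations")
    # Sample mean: sum of f_i * m_i divided by n.
    mean = sum(f * m for m, f in zip(midpoints, frequencies)) / n
    # Sum of f_i * (m_i - mean)^2 over all intervals.
    weighted_squares = sum(
        f * (m - mean) ** 2 for m, f in zip(midpoints, frequencies)
    )
    return weighted_squares / (n - 1)


midpoints = [23, 20, 17, 14, 11]
frequencies = [5, 6, 7, 8, 4]
score_variance = grouped_sample_variance(midpoints, frequencies)
print(round(score_variance, 2))  # 15.52
```

    Because every value in an interval is treated as if it sat at the midpoint, the result is an approximation of the variance of the underlying raw scores.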

     

    Sample Variance vs Population Variance

    Sample variance and population variance both help us understand how spread out the numbers in a dataset are compared to the average. However, there are some key differences between them.

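    In simple terms, sample variance is used when we have only a portion of the data, while population variance is used when we have access to every value in the population. The formulas reflect this difference: population variance divides the sum of squared deviations from the population mean by the population size `N`, whereas sample variance divides the sum of squared deviations from the sample mean by `n - 1`.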
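
    As a rough illustration, Python's standard `statistics` module provides both calculations: `pvariance` treats the data as a whole population, and `variance` treats it as a sample. The values below are the study hours from Example 1 in the Solved Examples section, written as floats.

```python
import math
import statistics

hours = [3.0, 4.0, 5.0, 6.0, 7.0]
n = len(hours)

population_var = statistics.pvariance(hours)  # sum of squared deviations / n
sample_var = statistics.variance(hours)       # sum of squared deviations / (n - 1)

print(f"treated as a population: {population_var}")
print(f"treated as a sample:     {sample_var}")

# Both use the same sum of squared deviations, so they differ
# only by the factor n / (n - 1).
assert math.isclose(sample_var, population_var * n / (n - 1))
```

    For the same numbers, the sample version is always the larger of the two, and the gap shrinks as the number of observations grows.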

     

    Applications of Sample Variance

    • Sample variance helps assess product consistency and quality, aiding manufacturers in identifying deviations from standards and taking corrective actions.
       
    • In finance, sample variance measures investment return volatility, enabling investors to gauge risk levels and make informed decisions about asset allocation.
       
    • Sample variance aids in analyzing variations in patient outcomes and treatment effectiveness, facilitating personalized healthcare interventions.
       
    • Sample variance is used to study fluctuations in environmental variables like temperature and rainfall, aiding climate change research and ecosystem management.
       
    • Sample variance assists in analyzing consumer preferences and behavior, guiding marketers in developing targeted strategies to meet customer demands effectively.

     

    Solved Examples

    Example `1`: A student collects data on the number of hours spent studying for an exam over a week. The data collected is as follows: `3, 4, 5, 6, 7`. Calculate the sample variance.

    Solution:  

    First, find the mean of the data set:  

    Mean `=  \frac{3 + 4 + 5 + 6 + 7}{5} = \frac{25}{5} = 5 `  

    Next, subtract the mean from each data point and square the result:  

    `(3 - 5)^2 = 4`

    `(4 - 5)^2 = 1`  

    `(5 - 5)^2 = 0`  

    `(6 - 5)^2 = 1`  

    `(7 - 5)^2 = 4`  

    Now, sum up these squared differences:  

    ` 4 + 1 + 0 + 1 + 4 = 10 `  

    Finally, divide the sum by one less than the total number of observations:  

    Sample Variance `=  \frac{10}{5 - 1} = \frac{10}{4} = 2.5 `  

    So, the sample variance is `2.5`.

     

    Example `2`: A researcher records the temperatures (in degrees Celsius) in a city for five consecutive days: `20, 22, 21, 24, 23`. Find the sample variance.

    Solution:  

    First, calculate the mean of the data set:  

    Mean `= \frac{20 + 22 + 21 + 24 + 23}{5} = \frac{110}{5} = 22 `  

    Next, find the squared differences from the mean:  

    `(20 - 22)^2 = 4` 

    `(22 - 22)^2 = 0`  

    `(21 - 22)^2 = 1`  

    `(24 - 22)^2 = 4`  

    `(23 - 22)^2 = 1`  

    Now, sum up the squared differences:  

    ` 4 + 0 + 1 + 4 + 1 = 10 `  

    Finally, divide the sum by one less than the total number of observations:  

    Sample Variance `= \frac{10}{5 - 1} = \frac{10}{4} = 2.5 `  

    Thus, the sample variance is `2.5`.

     

    Example `3`: A biologist records the number of fish in a pond over five days: `10, 15, 12, 17, 14`. Determine the sample variance.

    Solution:  

    First, compute the mean of the dataset:  

    Mean `= \frac{10 + 15 + 12 + 17 + 14}{5} = \frac{68}{5} = 13.6 `  

    Next, find the squared differences from the mean:  

    `(10 - 13.6)^2 = 12.96` 

    `(15 - 13.6)^2 = 1.96`  

    `(12 - 13.6)^2 = 2.56`  

    `(17 - 13.6)^2 = 11.56` 

    `(14 - 13.6)^2 = 0.16`  

    Now, sum up the squared differences:  

    `12.96 + 1.96 + 2.56 + 11.56 + 0.16 = 29.2`  

    Finally, divide the sum by one less than the total number of observations:  

    Sample Variance `= \frac{29.2}{5 - 1} = \frac{29.2}{4} = 7.3 `  

    Therefore, the sample variance is `7.3`.

     

    Example `4`: A company records the number of customer calls received in a week: `120, 140, 110, 130, 125`. Find the sample variance.

    Solution:  

    First, calculate the mean of the dataset:  

    Mean `= \frac{120 + 140 + 110 + 130 + 125}{5} = \frac{625}{5} = 125 `  

    Next, find the squared differences from the mean:  

    `(120 - 125)^2 = 25`  

    `(140 - 125)^2 = 225`  

    `(110 - 125)^2 = 225`  

    `(130 - 125)^2 = 25`  

    `(125 - 125)^2 = 0` 

    Now, sum up the squared differences:  

    `25 + 225 + 225 + 25 + 0 = 500`  

    Finally, divide the sum by one less than the total number of observations:  

    Sample Variance `= \frac{500}{5 - 1} = \frac{500}{4} = 125`  

    Thus, the sample variance is `125`.

     

    Example `5`: A survey collects data on the monthly incomes (in thousands of dollars) of employees in a company. The incomes are grouped into intervals as follows.
    Calculate the sample variance for the monthly incomes.

    Solution:

    Given:

    • ` n ` (total number of observations) `= 5 + 8 + 12 + 10 + 6 = 41 `
    • ` f_i ` (frequency of each interval): `5, 8, 12, 10, 6`
    • ` m_i ` (midpoint of each interval): `22, 27, 32, 37, 42`

    To calculate sample variance for grouped data, we use the formula:

    `s^2 = \frac{\sum_{i=1}^{k}f_i(m_i - \bar{x})^2}{n-1}`

    Where:

    • ` k ` is the number of intervals (here `5`)
    • ` \bar{x} ` (sample mean) `= \frac{\sum_{i=1}^{k}f_i \cdot m_i}{n} `

    First, we need to find the sample mean ` \bar{x} `:

    `\bar{x} = \frac{5 \times 22 + 8 \times 27 + 12 \times 32 + 10 \times 37 + 6 \times 42}{41}`

    `= \frac{110 + 216 + 384 + 370 + 252}{41}`

    `= \frac{1332}{41}`

    \( \approx 32.49 \)

    Now, let's calculate ` (m_i - \bar{x})^2 ` for each interval and then multiply it by the frequency ` f_i `, using the unrounded mean ` \frac{1332}{41} ` to limit rounding error:

    `5 \times (22 - \bar{x})^2 \approx 549.970`

    `8 \times (27 - \bar{x})^2 \approx 240.928`

    `12 \times (32 - \bar{x})^2 \approx 2.855`

    `10 \times (37 - \bar{x})^2 \approx 203.599`

    `6 \times (42 - \bar{x})^2 \approx 542.891`

    Now, sum up ` f_i \cdot (m_i - \bar{x})^2 `:

    \( \sum_{i=1}^{k}f_i \cdot (m_i - \bar{x})^2 \approx 549.970 + 240.928 + 2.855 + 203.599 + 542.891 \)

    \( \approx 1540.24 \)

    Finally, plug the values into the formula for sample variance:

    `s^2 = \frac{1540.24}{41 - 1}`

    `= \frac{1540.24}{40}`

    \( \approx 38.51 \)

    The sample variance for the monthly incomes is approximately `38.51` (in squared thousands of dollars).

     

    Practice Problems  

    Q`1`. A student records the scores of his last five math quizzes: `80, 85, 90, 85, 95`. Calculate the sample variance for the quiz scores.

    a. `25`  
    b. `32.5`  
    c. `16.5`  
    d. `20`  

    Answer: b

     

    Q`2`. A researcher measures the temperatures (in degrees Celsius) of five different cities: `25, 30, 35, 30, 28`. Determine the sample variance for the temperatures.

    a. `18.2`  
    b. `15.1`  
    c. `12.5` 
    d. `13.3`

    Answer: d

     

    Q`3`. A survey collects the ages (in years) of a group of individuals: `30, 40, 35, 45, 50`. Find the sample variance for the ages.

    a. `62.5`  
    b. `75`  
    c. `50.75`  
    d. `25`  

    Answer: a

     

    Q`4`. A company records the weights (in kg) of samples taken from four different products: `0.5, 0.6, 0.4, 0.7`. Calculate the sample variance for the weights, rounded to `2` decimal places.

    a. `0.02`  
    b. `0.04`  
    c. `0.06`  
    d. `0.08` 

    Answer: a  

     

    Q`5`. A scientist measures the reaction times (in seconds) of a group of participants: `1.2, 1.5, 1.4, 1.3, 1.6`. Determine the sample variance for the reaction times.

    a. `0.025`  
    b. `0.04`  
    c. `0.063`  
    d. `0.082`  

    Answer: a
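
    To check your own working on the practice problems, one option is a short Python script built on the standard library's `statistics.variance`, which divides by `n - 1`. The dictionary name `practice_data` is just a label chosen for this sketch.

```python
import statistics

# Datasets copied from practice questions Q1 to Q5 above.
practice_data = {
    "Q1": [80, 85, 90, 85, 95],
    "Q2": [25, 30, 35, 30, 28],
    "Q3": [30, 40, 35, 45, 50],
    "Q4": [0.5, 0.6, 0.4, 0.7],
    "Q5": [1.2, 1.5, 1.4, 1.3, 1.6],
}

results = {q: statistics.variance(values) for q, values in practice_data.items()}

for question, value in results.items():
    print(f"{question}: {value:.3f}")
```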

     

    Frequently Asked Questions

    Q`1`. What is sample variance, and why is it important?  

    Answer: Sample variance measures how much the data points in a sample deviate from the sample mean. It's crucial in statistics because it helps assess the dispersion or spread of data within a sample, providing insights into the variability of the dataset.

     

    Q`2`. How is sample variance different from population variance?  

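    Answer: Sample variance is calculated using a subset of data, known as a sample, while population variance uses the entire dataset. Population variance divides the sum of squared deviations by the number of values `N`. Sample variance divides by `n-1` instead of `n`, where `n` is the number of observations in the sample. Because deviations are measured from the sample mean rather than the true population mean, dividing by `n` would tend to underestimate the population variance, and using `n-1` corrects for this.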
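
    The effect of the denominator can be checked on a tiny made-up population. The sketch below lists every equally likely sample of size `2` drawn with replacement and averages the two candidate estimates. The population values and the sample size are hypothetical, chosen only for illustration. Under these assumptions, the `n-1` version averages out to the population variance, while the `n` version comes out smaller.

```python
import itertools
import math
import statistics

population = [2.0, 4.0, 9.0]   # hypothetical population, for illustration only
n = 2                          # hypothetical sample size

true_variance = statistics.pvariance(population)

# All equally likely samples of size n drawn with replacement.
samples = list(itertools.product(population, repeat=n))

# Average of the estimates that divide by n - 1.
avg_with_n_minus_1 = sum(statistics.variance(s) for s in samples) / len(samples)
# Average of the estimates that divide by n.
avg_with_n = sum(statistics.pvariance(s) for s in samples) / len(samples)

print(f"population variance:       {true_variance:.4f}")
print(f"average estimate (n - 1):  {avg_with_n_minus_1:.4f}")
print(f"average estimate (n):      {avg_with_n:.4f}")

assert math.isclose(avg_with_n_minus_1, true_variance)
assert avg_with_n < true_variance
```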

     

    Q`3`. How to find the sample variance for grouped and ungrouped data?

    Answer: For ungrouped data, the sample variance equation is:  

    `s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}`

    For grouped data, it is:  

    `s^2 = \frac{\sum_{i=1}^{k}f_i(m_i - \bar{x})^2}{n-1}`

    where `x_i` are individual data points, `\bar{x}` is the sample mean, `n` is the total number of observations, `k` is the number of groups, `f_i` is the frequency of occurrence for each group, and `m_i` is the midpoint of each group.

     

    Q`4`. How do I interpret the value of sample variance?  

    Answer: A higher sample variance indicates greater variability or spread of data points around the mean, while a lower sample variance suggests that data points are closer to the mean. It's essential to consider the context of the data and the specific characteristics of the dataset when interpreting sample variance.
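
    As a small illustration, the two made-up datasets below share the same mean of `50` but have very different spreads, and their sample variances (`2.5` and `1000`) reflect that. The values are hypothetical and chosen only to make the contrast obvious.

```python
import statistics

tight = [48, 49, 50, 51, 52]   # hypothetical values clustered near the mean
wide = [10, 30, 50, 70, 90]    # hypothetical values spread far from the mean

for name, values in (("tight", tight), ("wide", wide)):
    mean = statistics.mean(values)
    var = statistics.variance(values)
    print(f"{name}: mean = {mean}, sample variance = {var}")
```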

     

    Q`5`. Can sample variance be negative?  

    Answer: No, sample variance cannot be negative. Since it is built from squared differences from the mean, every term is zero or positive, so the result is at least zero (and equals zero only when all the values are identical). If the sample variance calculation results in a negative value, it points to an error in the computation.