Variance & Data Spread
Last reviewed: May 2026
Standard deviation measures how far individual values fall from the mean on average. The 68-95-99.7 rule: in a normal distribution, 68% of data falls within ±1 SD, 95% within ±2, 99.7% within ±3. SD is the foundation of statistical inference, quality control, risk assessment, and investment analysis.1
| Type | Formula | When to use |
|---|---|---|
| Population SD (σ) | √(Σ(x−μ)² / N) | Entire population data |
| Sample SD (s) | √(Σ(x−x̄)² / (n−1)) | Sample from larger population |
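The n−1 denominator (Bessel's correction) compensates for measuring deviations from the sample mean x̄ rather than the true population mean μ, which would otherwise make the estimate run slightly low.2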
Standard deviation quantifies how spread out data points are from the mean (average). A low standard deviation means data clusters tightly around the mean; a high standard deviation means data is widely dispersed. For example, test scores of [78, 80, 82] have a mean of 80 and a very small population SD (~1.6). Scores of [60, 80, 100] also average 80 but have a much larger population SD (~16.3). Both datasets have the same average, but the second is far more spread out, and standard deviation captures this difference.
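The two score lists can be checked with Python's standard `statistics` module. This is a minimal sketch that treats each list as a complete population (dividing by N); the variable names are illustrative only.

```python
from statistics import mean, pstdev

tight = [78, 80, 82]
wide = [60, 80, 100]

for scores in (tight, wide):
    # pstdev divides by N, i.e. it treats the list as the whole population
    print(scores, "mean =", mean(scores), "SD =", round(pstdev(scores), 1))
```

Both lists report the same mean, and only the SD tells them apart.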
Two formulas exist. Population SD (σ) divides by N, the total count, and is used when you have data for the entire population. Sample SD (s) divides by n−1 (Bessel's correction) and is used when your data is a sample from a larger population, which is almost always the case in practice. The n−1 correction accounts for the fact that a sample, measured against its own mean, tends to underestimate population variability. For large samples (n > 30), the difference is negligible. For small samples, using the wrong formula matters: use sample SD unless you genuinely have the complete population.
For normally distributed data (bell curve): approximately 68% of values fall within 1 SD of the mean, 95% within 2 SDs, and 99.7% within 3 SDs. This is enormously useful: if exam scores have a mean of 75 and SD of 10, then 68% of students scored 65-85, 95% scored 55-95, and virtually everyone scored 45-105. A score of 95 is 2 SDs above the mean — better than approximately 97.5% of students. This rule enables quality control (manufacturing tolerances), grading curves, financial risk assessment, and scientific hypothesis testing.
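One way to see the rule in action is a small simulation. The sketch below draws hypothetical exam scores from a normal distribution with mean 75 and SD 10 (the seed and sample size are arbitrary choices) and counts how many land within 1, 2, and 3 SDs of the mean.

```python
import random
from statistics import mean, pstdev

rng = random.Random(42)  # fixed seed so the run is repeatable
scores = [rng.gauss(75, 10) for _ in range(50_000)]

m, sd = mean(scores), pstdev(scores)

def share_within(k):
    """Fraction of simulated scores within k SDs of the simulated mean."""
    lo, hi = m - k * sd, m + k * sd
    return sum(lo <= x <= hi for x in scores) / len(scores)

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.3f}")
```

The printed shares should sit close to 0.68, 0.95, and 0.997, with small differences from run to run if the seed is changed.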
In investing, standard deviation measures volatility, the risk dimension of returns. The S&P 500 has a historical annual SD of approximately 15-20%. If returns were normally distributed, roughly two out of three years would land within ±15-20 percentage points of the average return. A fund with 8% average return and 5% SD (steady performer) is very different from one with 8% average return and 25% SD (roller coaster). The Sharpe ratio, (return − risk-free rate) ÷ SD, quantifies return per unit of risk, enabling fair comparison between investments with different risk profiles.
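The Sharpe ratio comparison can be sketched in a few lines. The 3% risk-free rate below is a hypothetical value chosen for illustration, and the function name is not from any library.

```python
def sharpe_ratio(avg_return, risk_free_rate, sd):
    """Excess return per unit of volatility; all inputs in percent per year."""
    if sd <= 0:
        raise ValueError("sd must be positive")
    return (avg_return - risk_free_rate) / sd

RISK_FREE = 3.0  # hypothetical risk-free rate, percent per year

steady = sharpe_ratio(8.0, RISK_FREE, 5.0)           # 8% return, 5% SD
roller_coaster = sharpe_ratio(8.0, RISK_FREE, 25.0)  # 8% return, 25% SD

print(f"steady fund:    {steady:.2f}")
print(f"roller coaster: {roller_coaster:.2f}")
```

With identical average returns, the steadier fund earns five times as much excess return per unit of SD under this assumed risk-free rate.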
Confusing SD with variance: Variance = SD². SD is in the same units as the data (dollars, points, inches); variance is in squared units (dollars², points²), which is harder to interpret.
Assuming normality: The 68-95-99.7 rule only applies to normal distributions. Skewed data (income, home prices) doesn't follow this pattern.
Ignoring outliers: A single extreme value can dramatically inflate SD. Always examine data for outliers before interpreting SD as representative of typical spread.
Standard deviation measures how spread out values are from the average. A small standard deviation means data points cluster tightly around the mean; a large one means they're scattered widely. Consider two classes with the same average test score of 75%: Class A has scores of 70, 73, 75, 77, 80 (population SD ≈ 3.4), while Class B has scores of 50, 60, 75, 90, 100 (population SD ≈ 18.4). The averages are identical, but the teaching implications are completely different: Class A needs minor adjustments, while Class B suggests fundamentally different skill levels that require differentiated instruction. In investing, standard deviation measures volatility: for a stock with 8% average annual return and 15% SD, roughly two-thirds of years will fall between -7% and +23%, while a bond fund with 4% return and 3% SD stays between 1% and 7% about as often. The Sharpe ratio (return minus the risk-free rate, divided by SD) quantifies return per unit of risk, making it possible to compare investments with different risk profiles.
The distinction between population (σ) and sample (s) standard deviation trips up students and professionals alike. Population SD divides by N (the total count) when you have every data point in the group: all students' scores in a class, all products manufactured in a batch, all temperatures recorded in a year. Sample SD divides by n−1, a correction called Bessel's correction, when your data is a subset drawn from a larger population. Why n−1? Deviations are measured from the sample mean, which sits closer to the sampled values than the true population mean does, so dividing by n systematically underestimates variability. Dividing by n−1 inflates the result slightly to compensate. The practical impact: for large samples (n > 30), the difference is negligible, since dividing by 29 rather than 30 barely changes the result. For small samples (n = 5), it matters: dividing by 4 instead of 5 raises the SD by about 12%. When in doubt, use n−1 (sample SD), since most real-world data represents samples from larger populations.
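The size of the correction depends only on the number of data points: for the same data, the sample SD equals the population SD multiplied by √(n / (n−1)). A short sketch (the helper name is illustrative) shows how quickly that factor shrinks:

```python
from math import sqrt

def sample_to_population_ratio(n):
    """Ratio of the n-1 (sample) SD to the n (population) SD for the same data."""
    if n < 2:
        raise ValueError("need at least two data points")
    return sqrt(n / (n - 1))

for n in (5, 30, 1000):
    extra = (sample_to_population_ratio(n) - 1) * 100
    print(f"n = {n:>4}: sample SD is {extra:.1f}% larger")
```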
For data following a normal (bell curve) distribution, standard deviation creates predictable zones. Approximately 68% of values fall within 1 SD of the mean, 95% within 2 SDs, and 99.7% within 3 SDs. This rule has immediate practical applications. In manufacturing, if bolts should be 10.00 mm with SD of 0.02 mm, then 95% of bolts measure between 9.96 and 10.04 mm (mean ± 2 SD). Setting tolerance limits at ± 3 SD (9.94-10.06 mm) captures 99.7% of production, so only about 3 in 1,000 bolts fall outside. Six Sigma quality methodology pushes tolerance to ± 6 SD, targeting just 3.4 defects per million (a figure that conventionally allows for the process mean drifting by 1.5 SD over time). IQ scores are defined with mean 100 and SD 15: a score of 130 is 2 SD above average (top 2.3%), and 145 is 3 SD above (top 0.13%). Height in adult males follows a similar pattern: mean ~5'9" (175 cm) with SD ~2.8" (7 cm), so about 95% of men are between 5'3" and 6'3".
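The bolt and IQ figures can be reproduced with `statistics.NormalDist` from the standard library. This sketch assumes the measurements are exactly normal, which real processes only approximate.

```python
from statistics import NormalDist

# Bolt diameters: mean 10.00 mm, SD 0.02 mm
bolts = NormalDist(mu=10.00, sigma=0.02)
within_2sd = bolts.cdf(10.04) - bolts.cdf(9.96)
outside_3sd = 1 - (bolts.cdf(10.06) - bolts.cdf(9.94))

# IQ scores: mean 100, SD 15
iq = NormalDist(mu=100, sigma=15)
above_130 = 1 - iq.cdf(130)
above_145 = 1 - iq.cdf(145)

print(f"bolts within 9.96-10.04 mm:  {within_2sd:.1%}")
print(f"bolts outside 9.94-10.06 mm: {outside_3sd:.2%}")
print(f"IQ above 130: {above_130:.2%}")
print(f"IQ above 145: {above_145:.2%}")
```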
The formula looks intimidating but follows a logical sequence. For the dataset {4, 8, 6, 5, 3}: (1) Find the mean: (4+8+6+5+3)/5 = 5.2. (2) Find each deviation from the mean: (4-5.2)=-1.2, (8-5.2)=2.8, (6-5.2)=0.8, (5-5.2)=-0.2, (3-5.2)=-2.2. (3) Square each deviation: 1.44, 7.84, 0.64, 0.04, 4.84. Squaring eliminates negative signs and gives extra weight to extreme values. (4) Average the squared deviations (variance): (1.44+7.84+0.64+0.04+4.84)/5 = 2.96 for population; divide by 4 for sample = 3.70. (5) Take the square root: population SD = √2.96 = 1.72; sample SD = √3.70 = 1.92. The square root returns the result to the original units — if you measured in centimeters, your SD is in centimeters, not squared centimeters (that's variance).
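The five steps translate almost line for line into Python. The sketch below follows them by hand and then cross-checks against the standard library; variable names are illustrative.

```python
from math import sqrt
from statistics import pstdev, stdev

data = [4, 8, 6, 5, 3]
n = len(data)

avg = sum(data) / n                       # (1) mean
deviations = [x - avg for x in data]      # (2) deviation of each value from the mean
squared = [d ** 2 for d in deviations]    # (3) squared deviations
pop_var = sum(squared) / n                # (4) population variance (divide by n)
samp_var = sum(squared) / (n - 1)         #     sample variance (divide by n - 1)
pop_sd = sqrt(pop_var)                    # (5) square root restores the original units
samp_sd = sqrt(samp_var)

print(f"mean = {avg}, population SD = {pop_sd:.2f}, sample SD = {samp_sd:.2f}")

# Cross-check against the library implementations
assert abs(pop_sd - pstdev(data)) < 1e-9
assert abs(samp_sd - stdev(data)) < 1e-9
```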
Standard deviation is easiest to interpret when the distribution is roughly symmetric. For skewed data (income, home prices, insurance claims), the SD can be misleading. US household income has a mean of roughly $105,000 with a high SD, but the median is about $75,000 because a small number of very high incomes pull the mean up and inflate the SD. Reporting "average income ± 1 SD" can suggest a range that extends to negative income at the low end, which is obviously impossible. For skewed distributions, interquartile range (IQR, the range of the middle 50%) or median absolute deviation (MAD) better represent typical spread. Outliers also distort SD dramatically: adding a single data point of 100 to the set {4, 8, 6, 5, 3} changes the population SD from 1.72 to 35.4, a roughly twenty-fold increase from one extreme value. Always visualize your data before interpreting standard deviation; a histogram reveals whether the normal distribution assumption is reasonable.
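The outlier effect is easy to reproduce, and comparing SD with the median absolute deviation shows why a robust measure can be preferable. The `mad` helper below is a hand-written sketch, not a library function; IQR is left out because quartile conventions differ noticeably on samples this small.

```python
from statistics import median, pstdev

def mad(values):
    """Median absolute deviation: the median distance from the median."""
    med = median(values)
    return median([abs(x - med) for x in values])

base = [4, 8, 6, 5, 3]
with_outlier = base + [100]

for label, values in (("original", base), ("with outlier", with_outlier)):
    print(f"{label:>12}: SD = {pstdev(values):.2f}, MAD = {mad(values):.1f}")
```

The SD jumps by a factor of about twenty, while the MAD only moves from 1 to 2.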
Investment professionals use standard deviation as the primary measure of portfolio risk. Historical SD calculated from monthly returns over 3-5 years shows how volatile an asset or portfolio has been. The S&P 500's long-term annualized SD is approximately 15-16%, meaning that in roughly two out of three years returns fall within about 15 percentage points of the average in either direction. Bond funds typically show SDs of 3-6%, and money market funds approach 0%. Portfolio diversification works specifically because combining assets with low correlation reduces overall SD: two assets each with 15% SD but a correlation of 0.3, held in equal weights, produce a combined portfolio SD of roughly 12%, reducing risk by about 19% without necessarily reducing expected return. The efficient frontier in modern portfolio theory plots the maximum expected return achievable at each level of SD, helping investors identify portfolios that offer the best return per unit of risk taken. Understanding your personal risk tolerance in terms of SD helps frame investment decisions: as a rough guide, if you can't stomach a 20% portfolio decline in a bad year, you need a portfolio with SD below approximately 10-12%.
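The diversification effect follows from the standard two-asset variance formula, σₚ² = (wₐσₐ)² + (w_bσ_b)² + 2·wₐ·w_b·ρ·σₐ·σ_b. The sketch below assumes equal weights unless told otherwise; the function name and default are illustrative.

```python
from math import sqrt

def two_asset_sd(sd_a, sd_b, correlation, weight_a=0.5):
    """Portfolio SD for two assets; weight_b is whatever weight_a leaves over."""
    weight_b = 1.0 - weight_a
    variance = (
        (weight_a * sd_a) ** 2
        + (weight_b * sd_b) ** 2
        + 2 * weight_a * weight_b * correlation * sd_a * sd_b
    )
    return sqrt(variance)

combined = two_asset_sd(15.0, 15.0, correlation=0.3)
reduction = 1 - combined / 15.0

print(f"combined SD: {combined:.1f}%")
print(f"risk reduction vs. a single asset: {reduction:.0%}")
```

With a correlation of 1.0 the same function returns 15%, confirming that the benefit comes entirely from imperfect correlation.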
→ Use sample SD by default. Unless you truly have the entire population.
→ Report SD alongside mean. Mean without SD is incomplete.
→ Use coefficient of variation for comparison. SD÷mean normalizes across different scales.
→ Check for normality. The 68-95-99.7 rule only applies to approximately normal distributions.
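The coefficient of variation tip can be illustrated with the same measurements expressed in two units. The heights below are hypothetical values made up for the example.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Sample SD divided by the mean; unitless, so scales can be compared."""
    m = mean(values)
    if m == 0:
        raise ValueError("mean is zero; CV is undefined")
    return stdev(values) / m

heights_cm = [160, 170, 175, 180, 190]      # hypothetical heights
heights_in = [h / 2.54 for h in heights_cm]  # same people, measured in inches

cv_cm = coefficient_of_variation(heights_cm)
cv_in = coefficient_of_variation(heights_in)

print(f"SD in cm: {stdev(heights_cm):.2f}, SD in inches: {stdev(heights_in):.2f}")
print(f"CV in cm: {cv_cm:.4f}, CV in inches: {cv_in:.4f}")
```

The SDs differ because the units differ, but the CV is the same in both.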