Data is everywhere — in your business reports, medical test results, news articles, and personal finances. But raw data is meaningless without the tools to analyze and interpret it correctly. This guide covers the essential concepts you need to make sense of data: measures of central tendency, variability, correlation, probability, and the most common mistakes people make when drawing conclusions from numbers.
The mean, median, and mode all answer the same basic question, “what is the typical value?”, but they answer it differently, and choosing the wrong one can be deeply misleading.
| Measure | Definition | Best When | Weakness |
|---|---|---|---|
| Mean | Sum of all values ÷ count | Data is symmetric, no extreme outliers | Pulled heavily by outliers |
| Median | Middle value when sorted | Data is skewed or has outliers | Ignores magnitude of extremes |
| Mode | Most frequent value | Categorical data, bimodal distributions | May not exist or not be unique |
Real-world example: U.S. household income. The mean household income is approximately $105,000; the median is approximately $75,000. The mean is pulled up by high earners at the top. Saying “the average American household earns $105,000” creates a misleading impression, since half of all households earn less than about $75,000. Skewed quantities such as income, home prices, company revenue, and wealth should almost always be summarized with the median. Use the Mean, Median, Mode Calculator to compute all three from any dataset.
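To see how a single extreme value separates the mean from the median, here is a minimal Python sketch using the standard `statistics` module. The income figures are hypothetical and chosen only for illustration; they are not real survey data.

```python
from statistics import mean, median, multimode

# Hypothetical household incomes in dollars (illustrative, not real data).
# One very high earner is included on purpose to skew the distribution.
incomes = [32_000, 45_000, 45_000, 58_000, 75_000,
           82_000, 96_000, 120_000, 950_000]

income_mean = mean(incomes)        # sum of values divided by count
income_median = median(incomes)    # middle value of the sorted list
income_modes = multimode(incomes)  # list of the most frequent value(s)

print(f"mean:   {income_mean:,.0f}")    # mean:   167,000
print(f"median: {income_median:,.0f}")  # median: 75,000
print(f"mode:   {income_modes}")        # mode:   [45000]
```

In this made-up sample, eight of the nine households earn less than the mean, which is why the median is the more honest summary here.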
Knowing the average is only half the picture. You also need to know how spread out the data is. Two datasets can have the same mean but wildly different distributions.
Standard deviation (SD) measures how far data points typically fall from the mean (formally, it is the square root of the average squared deviation). In a normal (bell-curve) distribution, approximately 68% of data falls within 1 SD of the mean, 95% within 2 SDs, and 99.7% within 3 SDs. This is the “68-95-99.7 rule.”
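The 68-95-99.7 figures can be checked directly against the normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `share_within` is ours, introduced only for this illustration.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0.0, sigma=1.0)

def share_within(k: float) -> float:
    """Fraction of a normal distribution lying within k SDs of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")
# within 1 SD: 68.3%
# within 2 SD: 95.4%
# within 3 SD: 99.7%
```

The rule only holds for data that is approximately normal; for heavily skewed data the shares can be quite different.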
Example: Two investment funds both return an average of 8% annually. Fund A has a standard deviation of 3% (most years: 5–11%). Fund B has a standard deviation of 15% (most years: −7% to 23%). Same average, very different risk profiles. Use the Standard Deviation Calculator to quantify variability in any dataset.
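The two funds can be compared numerically with a short sketch. It takes the means and standard deviations from the example and assumes, purely for illustration, that annual returns are normally distributed; real returns often are not. The helper names are hypothetical.

```python
from statistics import NormalDist

# Means and SDs (in percent) come from the example above.
# ASSUMPTION: annual returns follow a normal distribution.
fund_a = NormalDist(mu=8.0, sigma=3.0)
fund_b = NormalDist(mu=8.0, sigma=15.0)

def one_sd_range(fund: NormalDist) -> tuple[float, float]:
    """Range covering mean minus 1 SD to mean plus 1 SD."""
    return (fund.mean - fund.stdev, fund.mean + fund.stdev)

def prob_of_loss(fund: NormalDist) -> float:
    """Probability that a year's return is below 0% under the assumption."""
    return fund.cdf(0.0)

for name, fund in (("Fund A", fund_a), ("Fund B", fund_b)):
    low, high = one_sd_range(fund)
    print(f"{name}: typical range {low:.0f}% to {high:.0f}%, "
          f"chance of a losing year {prob_of_loss(fund):.1%}")
```

Under this assumption Fund B has a losing year roughly three times in ten, while Fund A does so well under once in a hundred years, even though both average 8%.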
Z-scores: comparing apples to oranges. A z-score tells you how many standard deviations a data point is from the mean: z = (value − mean) ÷ SD. This lets you compare across different scales. An SAT score of 1400 and an ACT score of 32 are hard to compare directly, but converting both to z-scores puts them on the same scale. A z-score of +2.0 means the value is 2 standard deviations above the mean, which in a normal distribution places it above roughly 97.7% of values. Use the Z-Score Calculator to convert.
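A minimal sketch of the z-score formula follows. The test means and standard deviations used here are hypothetical round numbers picked for illustration, not official SAT or ACT statistics; substitute published figures before drawing any real conclusion.

```python
from statistics import NormalDist

def z_score(value: float, mean: float, sd: float) -> float:
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / sd

# ASSUMPTION: hypothetical population parameters, not official figures.
sat_z = z_score(1400, mean=1050, sd=200)
act_z = z_score(32, mean=20, sd=6)

# Share of a normal distribution that falls below z = +2.0.
share_below_two = NormalDist().cdf(2.0)

print(f"SAT z-score: {sat_z:.2f}")                 # SAT z-score: 1.75
print(f"ACT z-score: {act_z:.2f}")                 # ACT z-score: 2.00
print(f"share below z=2: {share_below_two:.1%}")   # share below z=2: 97.7%
```

With these assumed parameters the ACT score sits slightly further above its mean than the SAT score does, a comparison the raw numbers cannot give you.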
The correlation coefficient (r) measures the strength and direction of the linear relationship between two variables. It ranges from −1 to +1. At +1, variables move perfectly together (as X increases, Y increases). At −1, they move perfectly opposite. At 0, there is no linear relationship.
| Correlation (\|r\|) | Strength | Real-World Example |
|---|---|---|
| 0.00–0.19 | Very weak | Shoe size and intelligence |
| 0.20–0.39 | Weak | Income and happiness (above basic needs) |
| 0.40–0.59 | Moderate | Height and weight |
| 0.60–0.79 | Strong | SAT scores and college GPA |
| 0.80–1.00 | Very strong | Height at age 3 and adult height |
Use the Correlation Calculator to compute r from any paired dataset. Remember: correlation measures linear relationships only. Two variables can be strongly related in a nonlinear way and still show low correlation.
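The point about nonlinear relationships is easy to demonstrate. The sketch below computes Pearson's r from its definition (the function name `pearson_r` is ours) and applies it to a perfectly linear relationship and a perfectly quadratic one.

```python
from math import sqrt

def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cov / sqrt(var_x * var_y)

xs = [-3, -2, -1, 0, 1, 2, 3]
linear = [2 * x + 1 for x in xs]   # y depends on x linearly
parabola = [x * x for x in xs]     # y depends on x exactly, but not linearly

r_linear = pearson_r(xs, linear)
r_parabola = pearson_r(xs, parabola)
print(f"linear:   r = {r_linear:.2f}")    # linear:   r = 1.00
print(f"parabola: r = {r_parabola:.2f}")  # parabola: r = 0.00
```

In the second case y is completely determined by x, yet r is zero, because the upward and downward halves of the parabola cancel out. Plotting the data before trusting r is the usual safeguard.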
This is perhaps the most important concept in data analysis. Just because two variables are correlated does not mean one causes the other. Three alternative explanations exist for any correlation:
Confounding variable: A third factor drives both. Ice cream sales and drowning deaths correlate because hot weather drives both. Countries with more Nobel laureates also consume more chocolate per capita — the confounder is national wealth, which funds both research and chocolate imports.
Reverse causation: The direction might be backward. Studies show a correlation between hospital visits and death. Hospitals do not cause death — being near death causes hospital visits.
Pure coincidence: With enough variables, random correlations appear. The divorce rate in Maine correlates almost perfectly with per capita margarine consumption. This is pure noise.
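The confounding case can be simulated. In the sketch below, temperature drives both simulated ice cream sales and simulated drownings, and neither affects the other. All numbers are invented for illustration, and the helper names are hypothetical. Removing the temperature effect from both series (by correlating the residuals of a straight-line fit) should make the apparent relationship largely disappear.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mx) ** 2 for x in xs)
                      * sum((y - my) ** 2 for y in ys))

def residuals(ys, xs):
    """What is left of ys after subtracting a least-squares line on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [(y - my) - slope * (x - mx) for x, y in zip(xs, ys)]

rng = random.Random(42)  # fixed seed so the run is repeatable

# Simulated data: temperature is the only driver of both outcomes.
temps = [rng.uniform(10, 35) for _ in range(500)]
ice_cream = [2.0 * t + rng.gauss(0, 5) for t in temps]
drownings = [0.5 * t + rng.gauss(0, 2) for t in temps]

raw_r = pearson_r(ice_cream, drownings)
adjusted_r = pearson_r(residuals(ice_cream, temps),
                       residuals(drownings, temps))

print(f"raw correlation:                {raw_r:.2f}")
print(f"after controlling for weather:  {adjusted_r:.2f}")
```

The raw correlation comes out strong while the adjusted one sits near zero, which is the signature of a confounder. In real data you can only adjust for confounders you have thought to measure.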
Small samples produce unreliable conclusions. If you flip a coin 4 times and get 3 heads, that does not mean the coin is biased. If you flip it 10,000 times and get 7,500 heads, something is definitely wrong with the coin.
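The coin-flip intuition can be put into numbers with the binomial distribution. This sketch assumes independent flips of a fair coin; the function names are ours.

```python
from math import comb, sqrt

def prob_at_least(k: int, n: int, p: float = 0.5) -> float:
    """Probability of k or more heads in n independent flips."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i)
               for i in range(k, n + 1))

def sds_from_expected(heads: int, n: int, p: float = 0.5) -> float:
    """How many standard deviations the head count is from n * p."""
    return (heads - n * p) / sqrt(n * p * (1 - p))

p_small = prob_at_least(3, 4)            # 3+ heads in 4 flips
z_large = sds_from_expected(7500, 10_000)

print(f"P(3+ heads in 4 fair flips) = {p_small:.4f}")   # 0.3125
print(f"7,500 heads in 10,000 flips is {z_large:.0f} SDs above expected")  # 50
```

A fair coin gives 3 or more heads in 4 flips about 31% of the time, so that result is unremarkable. A count 50 standard deviations above expectation is not something chance can plausibly produce. The proportion of heads is 75% in both cases; only the sample size differs.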
A confidence interval quantifies this uncertainty. A poll showing 52% support with a ±3% margin of error at 95% confidence means: if we repeated this poll 100 times and computed an interval from each result, approximately 95 of those intervals would contain the true level of support. The result is NOT “definitely 52%”; it is “plausibly anywhere between 49% and 55%.” Use the Confidence Interval Calculator and Sample Size Calculator to plan studies properly.
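Margin of error and sample size are two sides of the same formula. The sketch below uses the common normal-approximation formula for a proportion, which is one of several methods and assumes a simple random sample. The poll's sample size is not given in the example, so the figure of 1,068 respondents is an assumption.

```python
from math import ceil, sqrt
from statistics import NormalDist

def margin_of_error(p_hat: float, n: int, confidence: float = 0.95) -> float:
    """Normal-approximation margin of error for a sample proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return z * sqrt(p_hat * (1 - p_hat) / n)

def sample_size_for(moe: float, p_hat: float = 0.5,
                    confidence: float = 0.95) -> int:
    """Smallest n giving at most the requested margin of error.

    p_hat = 0.5 is the most conservative choice (largest required n).
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return ceil(z * z * p_hat * (1 - p_hat) / moe**2)

needed = sample_size_for(0.03)           # respondents for a ±3% margin
moe = margin_of_error(0.52, n=1068)      # ASSUMPTION: poll of 1,068 people

print(f"respondents needed for ±3%: {needed}")               # 1068
print(f"interval: {0.52 - moe:.1%} to {0.52 + moe:.1%}")     # 49.0% to 55.0%
```

Because the margin shrinks with the square root of n, halving it to ±1.5% would take roughly four times as many respondents.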