
Data Analysis Basics: Mean, Median, Correlation, and How to Avoid Common Statistical Traps

By Derek Jordan, BA Business Marketing  ·  Updated May 2026  ·  Reviewed for accuracy

Data is everywhere — in your business reports, medical test results, news articles, and personal finances. But raw data is meaningless without the tools to analyze and interpret it correctly. This guide covers the essential concepts you need to make sense of data: measures of central tendency, variability, correlation, probability, and the most common mistakes people make when drawing conclusions from numbers.

Measures of Central Tendency: Mean, Median, Mode

These three metrics answer the same basic question — “what is the typical value?” — but they answer it differently, and choosing the wrong one can be deeply misleading.

Measure | Definition | Best When | Weakness
Mean | Sum of all values ÷ count | Data is symmetric, no extreme outliers | Pulled heavily by outliers
Median | Middle value when sorted | Data is skewed or has outliers | Ignores magnitude of extremes
Mode | Most frequent value | Categorical data, bimodal distributions | May not exist or not be unique

Real-world example: U.S. household income. The mean household income is approximately $105,000; the median is approximately $75,000. The mean is pulled up by high earners at the top. Saying “the average American household earns $105,000” creates a misleading impression, since half of all households earn less than about $75,000. Income, home prices, company revenue, and wealth data should almost always be summarized with the median. Use the Mean, Median, Mode Calculator to compute all three from any dataset.
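To see how a single extreme value separates the mean from the median, here is a minimal sketch using Python's standard `statistics` module. The income figures are made up for illustration and are not Census data.

```python
from statistics import mean, median, mode

# Hypothetical household incomes in dollars (illustrative only).
# The last value is a deliberate outlier.
incomes = [42_000, 55_000, 55_000, 68_000, 75_000, 82_000, 96_000, 1_200_000]

print(f"mean:   {mean(incomes):,.0f}")    # pulled far above most households
print(f"median: {median(incomes):,.0f}")  # stays near the middle of the pack
print(f"mode:   {mode(incomes):,.0f}")    # the most frequent value

# Dropping the outlier changes the mean a lot and the median very little.
without_outlier = incomes[:-1]
print(f"mean without outlier:   {mean(without_outlier):,.0f}")
print(f"median without outlier: {median(without_outlier):,.0f}")
```

In this toy dataset the mean is roughly three times the median, even though seven of the eight households earn under $100,000.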

Variability: Standard Deviation and Range

Knowing the average is only half the picture. You also need to know how spread out the data is. Two datasets can have the same mean but wildly different distributions.

Standard deviation (SD) measures how far data points typically sit from the mean (formally, the square root of the average squared deviation). In a normal (bell-curve) distribution, approximately 68% of data falls within 1 SD of the mean, about 95% within 2 SDs, and about 99.7% within 3 SDs. This is the “68-95-99.7 rule.”
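The percentages in the rule can be checked directly from the normal distribution's cumulative distribution function. A short sketch, assuming a standard normal distribution and using `statistics.NormalDist` from the Python standard library (the helper name `share_within` is ours):

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

def share_within(k: float) -> float:
    """Fraction of a normal distribution lying within k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")
```

The rule only holds for data that is approximately normal; skewed data such as income can behave quite differently.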

Example: Two investment funds both return an average of 8% annually. Fund A has a standard deviation of 3%, so if returns are roughly normal, about two-thirds of years land between 5% and 11%. Fund B has a standard deviation of 15%, so the same share of years spans −7% to 23%. Same average, very different risk profiles. Use the Standard Deviation Calculator to quantify variability in any dataset.
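The same idea can be sketched in code. The yearly returns below are invented so that both funds average exactly 8%; they are not real fund data. Note that `statistics.stdev` computes the sample standard deviation (dividing by n − 1), while `statistics.pstdev` treats the data as the whole population.

```python
from statistics import mean, stdev

# Hypothetical annual returns in percent (illustrative only).
fund_a = [5, 11, 8, 6, 10, 8]
fund_b = [-10, 26, 8, -4, 20, 8]

for name, returns in (("Fund A", fund_a), ("Fund B", fund_b)):
    print(f"{name}: mean = {mean(returns):.1f}%, SD = {stdev(returns):.1f}%")
```

Both lists have the same mean, but Fund B's standard deviation is several times larger, which is what the wider year-to-year swings look like in a single number.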

Z-scores: comparing apples to oranges. A z-score tells you how many standard deviations a data point is from the mean: z = (value − mean) ÷ SD. This lets you compare across different scales. An SAT score of 1400 and an ACT score of 32 are hard to compare directly, but converting both to z-scores puts them on the same scale. A z-score of +2.0 means the value is 2 standard deviations above the mean; in a normal distribution, that places it above roughly 97.7% of values. Use the Z-Score Calculator to convert.
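Here is a rough sketch of the SAT and ACT comparison. The means and standard deviations are placeholder values chosen for illustration, not official test statistics, and the percentile step assumes scores are approximately normally distributed.

```python
from statistics import NormalDist

def z_score(value: float, mean: float, sd: float) -> float:
    """Number of standard deviations that value lies above the mean."""
    return (value - mean) / sd

# Assumed parameters for illustration only; look up current figures before relying on them.
SAT_MEAN, SAT_SD = 1050, 200
ACT_MEAN, ACT_SD = 21, 5.5

sat_z = z_score(1400, SAT_MEAN, SAT_SD)
act_z = z_score(32, ACT_MEAN, ACT_SD)

for label, z in (("SAT 1400", sat_z), ("ACT 32", act_z)):
    percentile = NormalDist().cdf(z)  # share of a normal distribution below z
    print(f"{label}: z = {z:+.2f}, above about {percentile:.1%} of scores")
```

Under these assumed parameters the ACT score sits slightly further above its mean than the SAT score does, which is a comparison the raw scores alone cannot show.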

Correlation: When Two Variables Move Together

The correlation coefficient (r) measures the strength and direction of the linear relationship between two variables. It ranges from −1 to +1. At +1, variables move perfectly together (as X increases, Y increases). At −1, they move perfectly opposite. At 0, there is no linear relationship.

Correlation (absolute value of r) | Strength | Real-World Example
0.00–0.19 | Very weak | Shoe size and intelligence
0.20–0.39 | Weak | Income and happiness (above basic needs)
0.40–0.59 | Moderate | Height and weight
0.60–0.79 | Strong | SAT scores and college GPA
0.80–1.00 | Very strong | Height at age 3 and adult height

Use the Correlation Calculator to compute r from any paired dataset. Remember: correlation measures linear relationships only. Two variables can be strongly related in a nonlinear way and still show low correlation.
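A minimal sketch of the Pearson correlation coefficient, written out by hand so the arithmetic is visible. The function name `pearson_r` and the toy data are ours. The second dataset shows the nonlinear caveat: y is completely determined by x, yet r comes out as zero.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return covariation / sqrt(spread_x * spread_y)

xs = [-3, -2, -1, 0, 1, 2, 3]
linear = [2 * x + 1 for x in xs]   # perfect straight-line relationship
parabola = [x ** 2 for x in xs]    # perfect relationship, but not linear

print(f"linear:   r = {pearson_r(xs, linear):.2f}")
print(f"parabola: r = {pearson_r(xs, parabola):.2f}")
```

Plotting the data before trusting r is the simplest safeguard against this kind of hidden curve.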

Correlation Does Not Imply Causation

This is perhaps the most important concept in data analysis. Just because two variables are correlated does not mean one causes the other. Three alternative explanations exist for any correlation:

Confounding variable: A third factor drives both. Ice cream sales and drowning deaths correlate because hot weather drives both. Countries with more Nobel laureates also consume more chocolate per capita; a plausible confounder is national wealth, which funds both research and chocolate imports.

Reverse causation: The direction might be backward. Studies show a correlation between hospital visits and death. Hospitals do not cause death — being near death causes hospital visits.

Pure coincidence: With enough variables, random correlations appear. The divorce rate in Maine correlates almost perfectly with per capita margarine consumption. This is pure noise.
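The coincidence case is easy to reproduce. The sketch below generates 100 short series of pure random noise and searches every pair for the strongest correlation. The series count, series length, and seed are arbitrary choices for illustration.

```python
import random
from itertools import combinations
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariation / sqrt(spread_x * spread_y)

rng = random.Random(42)          # fixed seed so the run is repeatable
NUM_SERIES, LENGTH = 100, 10     # 100 unrelated variables, 10 observations each

series = [[rng.gauss(0, 1) for _ in range(LENGTH)] for _ in range(NUM_SERIES)]

# 100 series give 4,950 distinct pairs to compare.
best_r = max(abs(pearson_r(a, b)) for a, b in combinations(series, 2))
print(f"strongest correlation among unrelated series: |r| = {best_r:.2f}")
```

With this many pairs and so few observations per series, the strongest pair will usually look "very strong" by the table above, even though every series is independent noise by construction.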

Sample Size and Confidence

Small samples produce unreliable conclusions. If you flip a coin 4 times and get 3 heads, that does not mean the coin is biased. If you flip it 10,000 times and get 7,500 heads, something is definitely wrong with the coin.
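The coin example can be made concrete with a little arithmetic. This sketch computes the exact chance of getting at least 3 heads in 4 fair flips, then expresses both results as a number of standard deviations away from the expected count. The helper names are ours.

```python
from math import comb, sqrt

def prob_at_least(heads: int, flips: int) -> float:
    """Exact probability of at least `heads` heads in `flips` fair coin flips."""
    favorable = sum(comb(flips, k) for k in range(heads, flips + 1))
    return favorable / 2 ** flips

def heads_z_score(heads: int, flips: int) -> float:
    """How many SDs the head count lies from what a fair coin would give."""
    expected = flips * 0.5
    sd = sqrt(flips * 0.25)  # binomial SD with p = 0.5
    return (heads - expected) / sd

print(f"P(at least 3 heads in 4 flips) = {prob_at_least(3, 4):.4f}")
print(f"3 of 4 heads:          z = {heads_z_score(3, 4):.1f}")
print(f"7,500 of 10,000 heads: z = {heads_z_score(7_500, 10_000):.1f}")
```

Both results are 75% heads, but the small sample is only 1 standard deviation from the fair-coin expectation and happens by chance nearly a third of the time, while the large sample is 50 standard deviations away.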

A confidence interval quantifies this uncertainty. A poll showing 52% support with a ±3% margin of error at 95% confidence gives an interval of 49% to 55%. The 95% describes the method: if we repeated this poll many times and built an interval each time, approximately 95% of those intervals would contain the true level of support. The result is NOT “definitely 52%”; it is “plausibly anywhere between 49% and 55%.” Use the Confidence Interval Calculator and Sample Size Calculator to plan studies properly.
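A margin of error like the one in the poll example is commonly computed with the normal-approximation formula for a proportion, z × √(p(1 − p) ÷ n). The sketch below assumes a simple random sample and uses that formula in both directions: from sample size to margin, and from a target margin to the sample size needed. The function names are ours, and real polls often apply further adjustments such as weighting.

```python
from math import ceil, sqrt
from statistics import NormalDist

def z_critical(confidence: float) -> float:
    """Two-sided critical value, e.g. about 1.96 for 95% confidence."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)

def margin_of_error(p: float, n: int, confidence: float = 0.95) -> float:
    """Normal-approximation margin of error for a sample proportion p."""
    return z_critical(confidence) * sqrt(p * (1 - p) / n)

def required_sample_size(margin: float, confidence: float = 0.95, p: float = 0.5) -> int:
    """Smallest n that achieves the target margin; p = 0.5 is the worst case."""
    z = z_critical(confidence)
    return ceil(z ** 2 * p * (1 - p) / margin ** 2)

moe = margin_of_error(p=0.52, n=1_000)
print(f"52% support, n = 1,000: ±{moe:.1%}")
print(f"interval: {0.52 - moe:.1%} to {0.52 + moe:.1%}")
print(f"n needed for ±3%: {required_sample_size(0.03)}")
print(f"n needed for ±5%: {required_sample_size(0.05)}")
```

Because the margin shrinks with the square root of n, halving the margin of error takes roughly four times as many respondents.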

Frequently Asked Questions

When should I use mean vs median?
Use the mean when data is symmetrically distributed without extreme outliers. Use the median when data is skewed (income, home prices, wealth). If one extreme value would dramatically change the average, the median is the better measure of “typical.”
What does correlation vs causation mean?
Correlation means two variables move together. Causation means one directly causes the other. Correlation does not prove causation — the relationship could be driven by a hidden third variable, reversed direction, or pure coincidence. Controlled experiments are needed to establish causation.
What is a p-value?
The probability of seeing results at least this extreme if there were no real effect (that is, if the null hypothesis were true). P < 0.05 is the conventional threshold for “statistical significance.” But a p-value does not tell you the probability your hypothesis is true, the size of the effect, or whether the result matters practically.
How large does my sample size need to be?
It depends on the expected effect size, confidence level, and population variability. At 95% confidence, a simple random survey needs roughly 400 responses for a ±5% margin of error and roughly 1,100 for ±3%. A/B tests often need 1,000+ per group. Use a sample size calculator for your specific situation.
What is standard deviation?
A measure of how spread out data is from the mean. Small SD = data clustered tightly. Large SD = widely dispersed. In a normal distribution, 68% of data falls within 1 SD, 95% within 2 SDs, and 99.7% within 3 SDs of the mean.

Analyze Your Data

Calculate mean, median, standard deviation, and more from any dataset. Use the free Statistics Calculator to analyze your numbers — no signup required.

Related tools: Mean, Median, Mode Calculator · Standard Deviation Calculator · Correlation Calculator · Z-Score Calculator · Confidence Interval Calculator · Sample Size Calculator · P-Value Calculator

📚 Sources: [1] Khan Academy — Statistics and Probability [2] OpenStax — Introductory Statistics [3] U.S. Census Bureau — Income Data [4] American Statistical Association — Statement on P-Values