What is a p-value and what does it actually tell you?

A p-value is the probability of seeing results at least as extreme as the observed data if there were truly no effect (that is, if the null hypothesis were true). A p-value of 0.03 means that, if there were no real effect, random chance alone would produce results this extreme or more extreme about 3% of the time. The conventional threshold for “statistical significance” is p < 0.05, but this is a convention, not a magic number. A p-value does NOT tell you the probability that your hypothesis is true, the size of the effect, or whether the result is practically meaningful. A statistically significant result can be trivially small, and a non-significant result does not prove that no effect exists.
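To make the last two sentences concrete, here is a minimal Python sketch of a one-sample z-test. It assumes the population SD is known, which real analyses rarely have, and the 0.1-point effect and sample sizes are hypothetical. The same trivially small effect is nowhere near significant with 100 people but is overwhelmingly “significant” with a million.

```python
from statistics import NormalDist

def two_sided_p_value(observed_diff: float, sd: float, n: int) -> float:
    """Two-sided p-value for a one-sample z-test of 'true difference = 0'.

    Assumes the population SD is known and the sample mean is roughly normal.
    """
    standard_error = sd / n ** 0.5
    z = observed_diff / standard_error
    # Chance of a z at least this far from 0 (in either direction) if the null is true
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical: a 0.1-point average gain on a test whose SD is 15 points
tiny_effect, sd = 0.1, 15
p_small_n = two_sided_p_value(tiny_effect, sd, n=100)
p_huge_n = two_sided_p_value(tiny_effect, sd, n=1_000_000)
effect_size = tiny_effect / sd  # standardized effect size, identical in both cases

print(f"n=100:       p = {p_small_n:.3f}")
print(f"n=1,000,000: p = {p_huge_n:.2e}")
print(f"standardized effect size = {effect_size:.4f}")
```

The effect size never changes; only the sample size does. That is why a p-value should always be reported alongside the size of the effect.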

What is standard deviation and why does it matter?

Standard deviation measures how spread out data points are from the mean. A small standard deviation means the data is clustered tightly around the average; a large one means the data is widely dispersed. If the average test score is 75 with an SD of 5 and scores are roughly normally distributed, about two-thirds of scores fall between 70 and 80. With an SD of 15, that same two-thirds spreads across 60 to 90. In a normal distribution, about 68% of data falls within 1 SD of the mean, 95% within 2 SDs, and 99.7% within 3 SDs. Standard deviation is essential for judging whether differences between groups are meaningful or just normal variation.
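A short Python sketch of how standard deviation is actually computed, using a handful of hypothetical test scores. It shows both the population formula (divide by n) and the sample formula (divide by n − 1), and cross-checks them against Python's built-in `statistics` module.

```python
import math
import statistics

# Hypothetical test scores, chosen so the mean is exactly 75
scores = [70, 72, 75, 78, 80]

mean = sum(scores) / len(scores)
squared_deviations = [(x - mean) ** 2 for x in scores]

# Population SD: square root of the average squared deviation
population_sd = math.sqrt(sum(squared_deviations) / len(scores))
# Sample SD: divide by n - 1 when the data is a sample from a larger population
sample_sd = math.sqrt(sum(squared_deviations) / (len(scores) - 1))

print(mean, round(population_sd, 3), round(sample_sd, 3))
# Cross-check against the standard library
print(statistics.pstdev(scores), statistics.stdev(scores))
```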

Data Analysis Basics: Mean, Median, Correlation, and How to Avoid Common Statistical Traps

Q: When should I use mean vs median?

Use the mean (average) when data is roughly symmetrically distributed without extreme outliers. Use the median (middle value) when data is skewed or contains outliers. Classic example: in a room of 10 people each earning $50,000, the mean and median income are both $50,000. If one of those 10 people instead earns $10 million, the mean jumps to $1,045,000 while the median stays at $50,000. The median better represents what a 'typical' person in the room earns. Income, home prices, and wealth data almost always call for the median.
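The room example can be checked directly with Python's `statistics` module. The sketch assumes, as in the example, that the millionaire replaces one of the ten people rather than joining as an eleventh.

```python
import statistics

# Ten people earning $50,000 each
incomes = [50_000] * 10
mean_before, median_before = statistics.mean(incomes), statistics.median(incomes)

# One of those ten is replaced by someone earning $10 million
incomes[-1] = 10_000_000
mean_after, median_after = statistics.mean(incomes), statistics.median(incomes)

print(f"before: mean ${mean_before:,.0f}, median ${median_before:,.0f}")
print(f"after:  mean ${mean_after:,.0f}, median ${median_after:,.0f}")
```

A single extreme value moves the mean by almost $1 million, while the median, which depends only on the middle of the sorted list, does not move at all.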

Q: What does correlation vs causation mean?

Correlation means two variables move together: when one increases, the other tends to increase (positive correlation) or decrease (negative correlation). Causation means one variable directly causes the other to change. Correlation does not prove causation. Ice cream sales and drowning deaths are correlated (both increase in summer), but ice cream does not cause drowning; a hidden variable, hot weather, drives both. The most reliable way to establish causation is a controlled (ideally randomized) experiment; observational data showing a correlation is not enough on its own.

Q: How large does my sample size need to be?

It depends on the expected effect size, the desired confidence level, and the variability in the population. Larger effects need smaller samples to detect. As a rough guide: surveys estimating a percentage typically need about 400–1,100 respondents for a 3–5% margin of error at 95% confidence. A/B tests in marketing often need 1,000+ users per group to detect a 5-percentage-point difference in conversion rate, and small relative lifts can require tens of thousands per group. Medical trials range from dozens of participants (large drug effects) to thousands (small effects). Use a sample size calculator to determine the right number for your specific situation: too small a sample risks missing real effects, while too large a sample wastes resources.
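A minimal Python sketch of the standard normal-approximation formulas behind most sample size calculators. The 95% confidence, 80% power, and 10% baseline conversion rate are assumptions chosen for illustration, and the survey formula ignores the finite-population correction. The A/B part contrasts a 5-percentage-point difference with a 5% relative lift, since “5%” can mean either, and the required sample sizes differ enormously.

```python
import math
from statistics import NormalDist

def survey_sample_size(margin_of_error: float, confidence: float = 0.95, p: float = 0.5) -> int:
    """Respondents needed to estimate a proportion within +/- margin_of_error.

    p = 0.5 is the worst case (largest variance), the usual default when
    nothing is known about the true proportion.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * p * (1 - p) / margin_of_error ** 2)

def ab_test_sample_size(p1: float, p2: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Users per group to detect a change from conversion rate p1 to p2
    with a two-sided two-proportion z-test (simple normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)

print(survey_sample_size(0.05))          # +/- 5 points
print(survey_sample_size(0.03))          # +/- 3 points
print(ab_test_sample_size(0.10, 0.15))   # 10% -> 15%: a 5-percentage-point difference
print(ab_test_sample_size(0.10, 0.105))  # 10% -> 10.5%: a 5% relative lift
```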

By Derek Jordan, BA Business Marketing · Updated May 2026 · Reviewed for accuracy

Data is everywhere — in your business reports, medical test results, news articles, and personal finances. But raw data is meaningless without the tools to analyze and interpret it correctly. This guide covers the essential concepts you need to make sense of data: measures of central tendency, variability, correlation, probability, and the most common mistakes people make when drawing conclusions from numbers.

Measures of Central Tendency: Mean, Median, Mode

These three metrics answer the same basic question — “what is the typical value?” — but they answer it differently, and choosing the wrong one can be deeply misleading.

Measure	Definition	Best When	Weakness
Mean	Sum of all values ÷ count	Data is symmetric, no extreme outliers	Pulled heavily by outliers
Median	Middle value when sorted	Data is skewed or has outliers	Ignores magnitude of extremes
Mode	Most frequent value	Categorical data, bimodal distributions	May not exist or not be unique
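A quick Python illustration of the mode's strengths and weaknesses from the table, using hypothetical data. `statistics.multimode` (Python 3.8+) returns every value tied for most frequent, in the order first seen, which exposes bimodal data and the case where no value repeats.

```python
import statistics

# Hypothetical categorical data: averaging colors is meaningless, but a mode exists
favorite_colors = ["blue", "red", "blue", "green", "red", "blue"]
print("most common color:", statistics.mode(favorite_colors))

# Hypothetical bimodal data: two clusters of shoe sizes
shoe_sizes = [7, 7, 7, 8, 9, 10, 10, 10]
print("mean:", statistics.mean(shoe_sizes), "median:", statistics.median(shoe_sizes))
print("modes:", statistics.multimode(shoe_sizes))  # both peaks, in first-seen order

# No repeated values: every value ties, so the mode tells you nothing
print("modes:", statistics.multimode([3, 1, 4, 5, 9]))
```

For the shoe sizes, the mean and median both land on 8.5, a size that sits between the two clusters, while the modes reveal the two peaks.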

Real-world example: U.S. household income. The mean household income is approximately $105,000; the median is approximately $75,000. The mean is pulled up by high earners at the top. Saying “the average American household earns $105,000” creates a misleading impression: half of households earn less than about $75,000, and most earn less than that $105,000 “average.” Income, home prices, company revenue, and wealth data should almost always use the median. Use the Mean, Median, Mode Calculator to compute all three from any dataset.

Variability: Standard Deviation and Range

Knowing the average is only half the picture. You also need to know how spread out the data is. Two datasets can have the same mean but wildly different distributions.

Standard deviation (SD) measures the typical distance of data points from the mean (technically, the square root of the average squared distance from the mean). In a normal (bell-curve) distribution, approximately 68% of data falls within 1 SD of the mean, 95% within 2 SDs, and 99.7% within 3 SDs. This is the “68-95-99.7 rule.”
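The 68-95-99.7 percentages can be recovered from the normal distribution's cumulative distribution function. This Python sketch uses `statistics.NormalDist` (Python 3.8+); the exact shares are about 68.3%, 95.4%, and 99.7%, so the “95” is a rounded figure. The test-score distribution at the end is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Share of a normal distribution within k SDs of the mean, for k = 1, 2, 3
shares = {k: standard_normal.cdf(k) - standard_normal.cdf(-k) for k in (1, 2, 3)}
for k, share in shares.items():
    print(f"within {k} SD: {share:.1%}")

# The rule holds for any normal distribution, e.g. hypothetical test scores
# with mean 75 and SD 5: the share between 70 and 80 equals the 1-SD share.
scores = NormalDist(mu=75, sigma=5)
print(f"scores between 70 and 80: {scores.cdf(80) - scores.cdf(70):.1%}")
```

The rule only applies to data that is roughly bell-shaped; heavily skewed data (such as income) can have a very different share within 1 SD.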

Example: Two investment funds both return an average of 8% annually. Fund A has a standard deviation of 3%, so if returns are roughly normal, about two-thirds of years land between 5% and 11%. Fund B has a standard deviation of 15%, putting about two-thirds of years between −7% and 23%. Same average, very different risk profiles. Use the Standard Deviation Calculator to quantify variability in any dataset.
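A small Python sketch of the two-fund comparison. The five years of returns per fund are invented so that both average exactly 8%; with only five data points, the computed SDs will not match the 3% and 15% in the example exactly, but the contrast is the same.

```python
import statistics

# Hypothetical annual returns (%) over five years; both funds average exactly 8%
fund_a = [5, 11, 8, 6, 10]
fund_b = [-7, 23, 8, 20, -4]

for name, returns in (("Fund A", fund_a), ("Fund B", fund_b)):
    mean = statistics.mean(returns)
    sd = statistics.stdev(returns)  # sample SD: treats these years as a sample
    print(f"{name}: mean {mean}%, SD {sd:.1f}%, worst year {min(returns)}%")
```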

Z-scores: comparing apples to oranges. A z-score tells you how many standard deviations a data point is from the mean: z = (value − mean) ÷ SD. This lets you compare values measured on different scales. An SAT score of 1400 and an ACT score of 32 are hard to compare directly, but converting both to z-scores puts them on the same scale. A z-score of +2.0 means the value is 2 standard deviations above the mean; in a normal distribution, that places it above roughly 97.7% of values. Use the Z-Score Calculator to convert.
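A Python sketch of the SAT/ACT comparison. The means and SDs below are round illustrative assumptions, NOT official SAT or ACT statistics; substitute published figures for a real comparison. The percentile step also assumes scores are roughly normally distributed.

```python
from statistics import NormalDist

def z_score(value: float, mean: float, sd: float) -> float:
    """How many standard deviations `value` lies above (+) or below (-) the mean."""
    return (value - mean) / sd

# Illustrative population figures (assumptions, not official statistics)
sat_z = z_score(1400, mean=1050, sd=200)
act_z = z_score(32, mean=21, sd=5)

# Share of a normal distribution lying below each z-score
for label, z in (("SAT 1400", sat_z), ("ACT 32", act_z)):
    print(f"{label}: z = {z:+.2f}, above ~{NormalDist().cdf(z):.1%} of test-takers")
```

Under these assumed figures, the ACT 32 is the relatively stronger score because it sits further above its own mean, measured in that test's SDs.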

Correlation: When Two Variables Move Together

The correlation coefficient (r) measures the strength and direction of the linear relationship between two variables. It ranges from −1 to +1. At +1, variables move perfectly together (as X increases, Y increases). At −1, they move perfectly opposite. At 0, there is no linear relationship.

Correlation (|r|)	Strength	Real-World Example
0.00–0.19	Very weak	Shoe size and intelligence
0.20–0.39	Weak	Income and happiness (above basic needs)
0.40–0.59	Moderate	Height and weight
0.60–0.79	Strong	SAT scores and college GPA
0.80–1.00	Very strong	Height at age 3 and adult height

Use the Correlation Calculator to compute r from any paired dataset. Remember: correlation measures linear relationships only. Two variables can be strongly related in a nonlinear way and still show low correlation.
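A from-scratch Python sketch of Pearson's r (Python 3.10+ also ships `statistics.correlation`, which computes the same thing). The second dataset illustrates the warning above: y = x² is completely determined by x, yet over a range symmetric around zero the linear correlation is exactly 0.

```python
import math

def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation: covariance divided by the product of the SDs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

xs = [-3, -2, -1, 0, 1, 2, 3]
linear = [2 * x + 1 for x in xs]  # perfect straight-line relationship
parabola = [x ** 2 for x in xs]   # perfect, but curved, relationship

print(pearson_r(xs, linear))    # r = 1: a perfect linear relationship
print(pearson_r(xs, parabola))  # r = 0: y depends entirely on x, yet r misses it
```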

Correlation Does Not Imply Causation

This is perhaps the most important concept in data analysis. Just because two variables are correlated does not mean one causes the other. Besides a direct causal link, three common explanations for a correlation are:

Confounding variable: A third factor drives both. Ice cream sales and drowning deaths correlate because hot weather drives both. Countries with more Nobel laureates also consume more chocolate per capita; a likely confounder is national wealth, which can fund both research and chocolate imports.

Reverse causation: The direction might be backward. Hospital visits correlate with death, but hospitals do not generally cause death; being near death causes hospital visits, not the other way around.

Pure coincidence: With enough variables, random correlations appear. The divorce rate in Maine correlates almost perfectly with per capita margarine consumption. This is pure noise.

Sample Size and Confidence

Small samples produce unreliable conclusions. If you flip a coin 4 times and get 3 heads, that does not mean the coin is biased. If you flip it 10,000 times and get 7,500 heads, something is almost certainly wrong with the coin.
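A short Python sketch quantifying the two coin experiments. For 4 flips it computes the exact probability that a fair coin gives 3 or more heads (5/16, about 31%, so unremarkable). For 10,000 flips it reports how many binomial standard deviations 7,500 heads lies from the expected 5,000.

```python
from fractions import Fraction
from math import comb, sqrt

def z_for_heads(heads: int, flips: int) -> float:
    """Standard deviations between observed heads and the fair-coin expectation."""
    expected = flips / 2
    sd = sqrt(flips * 0.5 * 0.5)  # binomial SD with p = 0.5
    return (heads - expected) / sd

def prob_at_least(heads: int, flips: int) -> Fraction:
    """Exact chance that a fair coin gives `heads` or more heads in `flips` flips."""
    return Fraction(sum(comb(flips, k) for k in range(heads, flips + 1)), 2 ** flips)

print(z_for_heads(3, 4), float(prob_at_least(3, 4)))  # small sample
print(z_for_heads(7_500, 10_000))                     # large sample
```

Three heads in four flips is only 1 SD above expectation, while 7,500 heads in 10,000 flips is 50 SDs above it, a result a fair coin would essentially never produce.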

A confidence interval quantifies this uncertainty. Suppose a poll shows 52% support with a ±3% margin of error at 95% confidence, giving an interval of 49% to 55%. The 95% describes the method, not this one poll: if the poll were repeated many times, each with a fresh random sample, about 95% of the intervals calculated this way would contain the true level of support. So the result is NOT “definitely 52%”; it is “somewhere around 49% to 55%, with 52% as the best single estimate.” Use the Confidence Interval Calculator and Sample Size Calculator to plan studies properly.
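A Python sketch of what “95% confidence” refers to. The poll size of 1,065 is a hypothetical choice that makes the margin of error about ±3 points for 52% support, and the interval uses the simple normal-approximation (Wald) formula; other methods, such as the Wilson interval, exist. The simulation then runs many polls where the true support really is 52% and counts how often each poll's interval captures it.

```python
import math
import random

def proportion_ci(p_hat: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation (Wald) 95% confidence interval for a proportion."""
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

low, high = proportion_ci(0.52, 1_065)  # hypothetical poll of 1,065 people
print(f"52% support, 95% CI: {low:.1%} to {high:.1%}")

# Coverage check: simulate many polls where the TRUE support is 52%
rng = random.Random(1)
true_p, n, polls = 0.52, 1_065, 2_000
hits = 0
for _ in range(polls):
    p_hat = sum(rng.random() < true_p for _ in range(n)) / n
    ci_low, ci_high = proportion_ci(p_hat, n)
    hits += ci_low <= true_p <= ci_high  # did this poll's interval capture the truth?
print(f"{hits / polls:.1%} of intervals contained the true value")
```

Individual polls land at different estimates with different intervals, but roughly 95% of those intervals contain the true 52%.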

Frequently Asked Questions

When should I use mean vs median?

Use the mean when data is symmetrically distributed without extreme outliers. Use the median when data is skewed (income, home prices, wealth). If one extreme value would dramatically change the average, the median is the better measure of “typical.”

What does correlation vs causation mean?

Correlation means two variables move together. Causation means one directly causes the other. Correlation does not prove causation: the relationship could be driven by a hidden third variable, reversed direction, or pure coincidence. Controlled (ideally randomized) experiments are the most reliable way to establish causation.

What is a p-value?

The probability of seeing results at least this extreme if there were no real effect. P < 0.05 is the conventional threshold for “statistical significance.” But a p-value does not tell you the probability your hypothesis is true, the size of the effect, or whether the result matters practically.

How large does my sample size need to be?

It depends on the expected effect size, confidence level, and population variability. Surveys typically need about 400–1,100 respondents for a 3–5% margin of error at 95% confidence. A/B tests often need 1,000+ per group. Use a sample size calculator for your specific situation.

What is standard deviation?

A measure of how spread out data is from the mean. Small SD = data clustered tightly. Large SD = widely dispersed. In a normal distribution, about 68% of data falls within 1 SD, 95% within 2 SDs, and 99.7% within 3 SDs of the mean.

Analyze Your Data

Calculate mean, median, standard deviation, and more from any dataset. Use the free Statistics Calculator to analyze your numbers — no signup required.
