About Prevalence and Incidence Statistics
Statistical information such as prevalence, incidence, deaths,
and other data comes from numerous sources
and is subject to numerous provisos.
Nevertheless, we hope it is useful,
even if it is not completely accurate.
Prevalence versus incidence:
Prevalence and incidence are different measures
of a disease's occurrence.
The "prevalence" of a condition means the number of people
who currently have the condition,
whereas "incidence" refers to the number of people
who newly develop the condition each year.
These two measures are very different.
A chronic incurable disease
can have a low incidence
but high prevalence, because prevalence accumulates
the new cases from past years for as long as those people still have the disease.
A short-duration curable condition
such as the common cold
can have a high incidence but low prevalence,
because many people get a cold each year,
but few people actually have a cold at any given time (so prevalence is low and is not
a very useful statistic).
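A common rule of thumb, which is not taken from the sources used here, is that when
case numbers are roughly stable from year to year, prevalence is approximately the annual incidence
multiplied by the average duration of the condition. The sketch below illustrates this
with made-up numbers; the function name and all figures are hypothetical.

```python
def approx_prevalence(annual_incidence, avg_duration_years):
    """Steady-state rule of thumb: prevalence ~ incidence x average duration.

    This is an approximation that assumes stable case numbers from year to year.
    """
    return annual_incidence * avg_duration_years


# Hypothetical numbers for illustration only, not real statistics.
# Short-duration condition: many new cases, each lasting about a week.
cold = approx_prevalence(annual_incidence=1_000_000, avg_duration_years=7 / 365)

# Chronic condition: few new cases, each lasting about 30 years.
chronic = approx_prevalence(annual_incidence=10_000, avg_duration_years=30)

print("short-duration condition:", round(cold))
print("chronic condition:", round(chronic))
```

With these illustrative inputs, the chronic condition ends up with far higher prevalence
despite having one hundredth of the incidence.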
To understand prevalence versus incidence, consider these examples (which over-simplify but
are still hopefully useful):
- Short-duration disease: A person who has a common cold for one day would be added to the incidence statistics,
but (theoretically anyway) shouldn't be on the prevalence list.
- Newly diagnosed chronic disease: A person diagnosed with diabetes will be on the incidence numbers and prevalence numbers in that first year,
but then only on the prevalence numbers for second or later years.
A person who dies from a disease drops out of the prevalence data for the current year
and all later years (unless the prevalence statistics cover a period in which the person was still alive).
That person is counted in the incidence numbers only for the year they were diagnosed,
and not in the year they die if they had the disease for more than a year.
A death from a short disease like flu does get included in incidence, but not prevalence.
A death after many years from a long-term disease like diabetes removes that person
from prevalence numbers (and they should have appeared in the incidence data only in the year of first diagnosis).
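The examples above can be sketched as a small year-by-year count. This is a simplified
illustration only: the Case record, the function names, and the sample years are hypothetical,
and prevalence is taken here as the number of cases still active at the end of the year.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Case:
    onset_year: int           # year the condition was diagnosed
    end_year: Optional[int]   # year of cure or death; None if still ongoing


def incidence(cases, year):
    """New cases diagnosed during the given year."""
    return sum(1 for c in cases if c.onset_year == year)


def prevalence_at_year_end(cases, year):
    """Cases still active at the end of the given year."""
    return sum(
        1
        for c in cases
        if c.onset_year <= year and (c.end_year is None or c.end_year > year)
    )


# Hypothetical cases, for illustration only.
cases = [
    Case(2000, 2000),  # a cold that starts and resolves within 2000
    Case(2000, None),  # diabetes diagnosed in 2000 and ongoing
    Case(1995, 2001),  # chronic disease diagnosed in 1995, death in 2001
]

for year in (2000, 2001):
    print(year, incidence(cases, year), prevalence_at_year_end(cases, year))
```

The cold counts toward incidence in 2000 but never toward year-end prevalence, while the
person who dies in 2001 leaves the prevalence count without adding to that year's incidence.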
Maximum of prevalence or incidence:
Taking the larger of
the prevalence and incidence numbers for
a disease gives a reasonably useful indicator
that is used in certain places throughout this information.
It is a kind of "people affected" measure that gives
an approximate value to the number of people
who would have to deal with a condition in any given year.
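As a minimal sketch, this indicator is simply the larger of the two figures. The function name
is hypothetical, and the handling of a missing figure (using whichever one is available)
is an assumption for illustration rather than a documented rule of this site.

```python
def people_affected(prevalence=None, incidence=None):
    """Return the larger of the available prevalence and incidence figures.

    Assumption: if only one figure is known, it is used on its own.
    """
    values = [v for v in (prevalence, incidence) if v is not None]
    if not values:
        raise ValueError("need at least one of prevalence or incidence")
    return max(values)


# Hypothetical figures for illustration only.
print(people_affected(prevalence=500, incidence=2000))       # short-duration condition
print(people_affected(prevalence=300_000, incidence=10_000)) # chronic condition
```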
Problems with prevalence data:
Prevalence attempts to measure the number of people
affected by a condition at any given time.
There are various possible problems with prevalence data:
- Diagnosed versus undiagnosed prevalence:
Two estimates of prevalence are not necessarily comparable.
Some estimates attempt to quantify the number
of diagnosed people.
Other prevalence estimates attempt to include
undiagnosed people who unknowingly have the condition.
Some prevalence numbers include only symptomatic conditions
whereas others may include latent infections.
- Different methods of gathering prevalence data:
Prevalence numbers may also have been computed via
various estimate methods ranging from research studies
to phone surveys.
- Prevalence and "cured" or "remission" conditions:
Conditions that go into "remission" but
are not necessarily "cured", such as cancer,
cause problems for prevalence data.
Some such estimates use 5-year prevalence or 10-year prevalence
figures, which include only people who were diagnosed with cancer
within the previous 5 or 10 years (even if they are "cured").
This effectively assumes that a remission becomes a cure after
5 or 10 years,
so the person is then excluded from the prevalence numbers.
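A limited-duration prevalence count of this kind can be sketched as follows. The function name
and the sample diagnosis years are hypothetical, and the input is assumed to list only
people who are still alive in the current year.

```python
def limited_duration_prevalence(diagnosis_years, current_year, window_years):
    """Count living people diagnosed within the last `window_years` years."""
    return sum(
        1 for year in diagnosis_years
        if 0 <= current_year - year < window_years
    )


# Hypothetical diagnosis years of living patients, for illustration only.
diagnosis_years = [1990, 1996, 1999, 2003, 2004]

five_year = limited_duration_prevalence(diagnosis_years, 2004, 5)
ten_year = limited_duration_prevalence(diagnosis_years, 2004, 10)
print(five_year, ten_year)
```

The same group of patients yields different prevalence figures depending on the window chosen,
which is one reason two prevalence estimates may not be comparable.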
Problems with incidence data:
Incidence data attempts to measure the number of people
who become affected with a condition each year.
Incidence includes only new conditions,
not ongoing treatment of existing conditions.
The actual number of people affected by a condition
in a year can be lower than the reported incidence
in cases where people get the condition more than once in a year (e.g. common cold).
Two incidence rates are not necessarily comparable.
Some incidence data uses government notifications,
other data is based on physician or hospital diagnoses,
and still other data uses various other methods.
Some estimates of incidence for under-diagnosed conditions
attempt to justify a larger incidence rate than
is reported by doctors or medical authorities,
whereas other rates may use only the official reported rates.
Rates of incidence/prevalence calculations:
This site attempts to convert prevalence and incidence
data into more relevant forms,
such as to report the percent of the population affected,
total number of people affected nationally,
or the odds in a "1 in 1000" format.
These computations are based on population data
for the relevant reporting region (usually the USA as a whole).
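These conversions are simple arithmetic on a case count and a population figure.
The sketch below is illustrative only: the function names are hypothetical, and the population
of 300 million is a round placeholder rather than the figure actually used on this site.

```python
def percent_of_population(count, population):
    """Express a case count as a percentage of the population."""
    return 100.0 * count / population


def one_in_n(count, population):
    """Express a case count as odds in '1 in N' form, returning N."""
    return round(population / count)


def count_from_percent(percent, population):
    """Convert a reported percentage back into a number of people."""
    return round(population * percent / 100.0)


# Placeholder figures for illustration only.
population = 300_000_000
count = 3_000_000

print(percent_of_population(count, population), "percent")
print("1 in", one_in_n(count, population))
print(count_from_percent(1.0, population), "people")
```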
Details of abbreviations used for sources of statistics can be found on our sources page.
Some computation rates use different base data: prevalence, incidence,
or maximum of prevalence/incidence.
In some cases where the data is reported as a word
such as "common", "rare", "uncommon" or similar phrase,
an arbitrary numerical percentage has been applied
to this information.
Data that is reported based on births,
such as 1-in-3000 births,
has either been left as is (for chronic conditions)
or modified by an estimate of the number of births.
Data reported as a percent of pregnancies or pregnant women
has been calculated using an estimate of the number of pregnancies annually.
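Birth-based and pregnancy-based rates can be turned into annual case counts in the same way.
In this sketch the function names are hypothetical and the annual numbers of births and
pregnancies are round placeholders, not the estimates actually used on this site.

```python
def cases_from_birth_rate(one_in_n_births, annual_births):
    """Annual cases for a rate reported as '1 in N births'."""
    return round(annual_births / one_in_n_births)


def cases_from_pregnancy_percent(percent, annual_pregnancies):
    """Annual cases for a rate reported as a percent of pregnancies."""
    return round(annual_pregnancies * percent / 100)


# Placeholder totals for illustration only.
annual_births = 4_000_000
annual_pregnancies = 6_000_000

print(cases_from_birth_rate(3000, annual_births))
print(cases_from_pregnancy_percent(5, annual_pregnancies))
```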
Lifetime risk data:
Some sources report the risk of having
a condition at some point in your lifetime.
For example, cancer is widely reported to affect about 1 in 3
people in their lifetime.
These rates are naturally much higher than either prevalence
or incidence data,
because they are effectively the cumulative risk
of incidence/prevalence over multiple years.
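To see why lifetime figures are so much higher, consider a deliberately over-simplified model
in which a person faces the same small risk of a first diagnosis every year, independently
of other years. Real diseases do not behave this way (risk varies strongly with age),
and the function name and the numbers below are hypothetical.

```python
def lifetime_risk(annual_risk, years):
    """Cumulative risk over `years`, assuming a constant independent annual risk."""
    return 1.0 - (1.0 - annual_risk) ** years


# Hypothetical inputs: a 0.5% annual risk over an 80-year lifetime.
risk = lifetime_risk(annual_risk=0.005, years=80)
print(f"lifetime risk: {risk:.1%}")
```

Under these assumptions, an annual risk of half a percent accumulates to a lifetime risk
of roughly one in three.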
General problems with the data:
In addition to the above discussion,
there are various general qualifiers with
regard to prevalence, incidence,
and any of the other types of data.
Use of the data may incur the old apples-and-oranges comparison
problem because of data differences.
Problems with using the data include:
- Unclear sources: there are numerous statistics reported
in articles and on the internet,
and determining the actual study or survey on which
an estimate is based is often difficult,
even for statistics reported by health authorities or government agencies.
- Data ranges: where a rate is reported as a range,
such as "3 to 5 million people",
the lower number is arbitrarily chosen and used here.
This is a conservative assumption,
but may cause some estimates to be lower than they should.
- Different definitions of prevalence: some prevalence numbers
use estimates of people diagnosed,
others also try to include estimates of undiagnosed people,
and some use different values like 5-year prevalence
or 10-year prevalence data.
- Different sources: data has been collected from numerous sources,
and the reputability and accuracy of each source cannot
reasonably be completely confirmed.
- Different study methodologies: the data comes from various studies
that used different methodologies.
Some data comes from government notification bodies,
other data from patient phone surveys,
others using various methods of estimation,
and so on.
Many estimates are computed from a small sample
and then extrapolated to a larger population group,
and this method has various inherent limitations to its accuracy.
- Different disease categories: some data may use different
categorization arrangements to determine who has a particular condition.
Some studies use the ICD categories, others do not,
and there are actually small variations
in the different ICD categorizations
in any case.
For example, should wheezing be part of asthma or separate?
- Different years: data may come from numerous different years.
- Different locations: data may come from different countries,
states, or areas.
- Different age groups: data may refer to a particular age group,
such as "3% of adults",
and may not necessarily reflect the overall prevalence
in the entire population of all ages.
- Different racial factors: some data may reflect a particular race
more accurately and not apply to the entire population.
- Inherent reporting bias: although most reputable organizations use official
independent statistics, some organizations may tend to quote higher
numbers, either because (a) they see the medical condition every day
and assume it is highly prevalent, or
(b) they want to make the conditions they monitor seem more important,
such as to justify funding levels or seek donations.
- Country-specific information:
Most of the data is reported from USA sources,
and may be of limited value to other countries.
For example, certain conditions have a much higher
prevalence worldwide, especially in developing countries,
than in industrialized nations like the USA.