About Prevalence and Incidence Statistics

Statistical information such as prevalence, incidence, and deaths is drawn from numerous sources and is subject to numerous caveats. It is intended to be useful even where it is not completely accurate.

Prevalence versus incidence: Prevalence and incidence are different measures of a disease's occurrence. The "prevalence" of a condition is the number of people who currently have the condition, whereas "incidence" is the number of people who newly develop or are newly diagnosed with the condition each year. These two measures are very different. A chronic incurable disease like diabetes can have a low incidence but high prevalence, because prevalence accumulates the new cases from all past years, less the people who have since died. A short-duration curable condition such as the common cold can have a high incidence but low prevalence, because many people get a cold each year but few have one at any given time (so prevalence is low and is not a very useful statistic). To understand prevalence versus incidence, consider these examples (which over-simplify but are hopefully still useful):

  • Short-duration disease: A person who has a common cold for one day would be added to the incidence statistics but (in theory, anyway) should not appear in the prevalence numbers.
  • Newly diagnosed chronic disease: A person diagnosed with diabetes counts toward both incidence and prevalence in the first year, but only toward prevalence in the second and later years.
  • Deaths: A person who dies from a disease is removed from the prevalence data for later years, and also for the current year unless the prevalence statistics cover a period that includes the time before the death. That person counts toward incidence only in the year of diagnosis, not in the year of death if they had the disease for more than a year. A death from a short disease like flu is included in incidence but not prevalence. A death after many years with a long-term disease like diabetes removes that person from the prevalence numbers (and they should have counted toward incidence only in the year of first diagnosis).
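The three examples above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the Case record, the function names, and all the dates are hypothetical, time is measured in fractional years, and prevalence is taken at a single instant (point prevalence).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Case:
    onset: float           # time the condition started (or was diagnosed), in years
    end: Optional[float]   # time of cure or death; None if still ongoing


def incidence(cases, year):
    """Count new cases whose onset falls within [year, year + 1)."""
    return sum(1 for c in cases if year <= c.onset < year + 1)


def point_prevalence(cases, t):
    """Count cases that are active at the instant t."""
    return sum(
        1 for c in cases
        if c.onset <= t and (c.end is None or c.end > t)
    )


# Hypothetical cases, one per example in the list above.
cases = [
    Case(onset=2.10, end=2.10 + 1 / 365),  # one-day cold during year 2
    Case(onset=2.40, end=None),            # chronic disease diagnosed in year 2
    Case(onset=0.50, end=2.70),            # chronic disease since year 0, death in year 2
]

print(incidence(cases, 2))            # 2: the cold and the new chronic diagnosis
print(point_prevalence(cases, 2.5))   # 2: both chronic cases, but not the cold
print(incidence(cases, 3))            # 0: no new cases in year 3
print(point_prevalence(cases, 3.5))   # 1: only the ongoing chronic case remains
```

The one-day cold contributes to incidence but is almost never caught by a point-in-time prevalence count, while the person who dies drops out of prevalence from the time of death onward.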

Maximum of prevalence or incidence: The larger of the prevalence and incidence numbers for a disease is a reasonably useful indicator that is used in certain places throughout this information. It is a kind of "people affected" measure that approximates the number of people who would have to deal with a condition in any given year.
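As a minimal sketch of this "people affected" measure, assuming either figure may be missing for a given condition (the function name and the sample numbers are hypothetical, not taken from any source):

```python
def people_affected(prevalence=None, incidence=None):
    """Return the larger of the available figures, or None if neither is known."""
    known = [v for v in (prevalence, incidence) if v is not None]
    return max(known) if known else None


# Hypothetical chronic condition: prevalence dominates.
print(people_affected(prevalence=1_000_000, incidence=50_000))    # 1000000

# Hypothetical short-duration condition: incidence dominates.
print(people_affected(prevalence=20_000, incidence=5_000_000))    # 5000000

# Only one figure available.
print(people_affected(incidence=12_000))                          # 12000
```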

Problems with prevalence data: Prevalence attempts to measure the number of people affected by a condition at any given time. There are various possible problems with prevalence data:

  • Diagnosed versus undiagnosed prevalence: Two estimates of prevalence are not necessarily comparable. Some estimates attempt to quantify the number of diagnosed people. Other prevalence estimates attempt to include undiagnosed people who unknowingly have the condition. Some prevalence numbers include only symptomatic conditions whereas others may include latent infections.
  • Different methods of gathering prevalence data: Prevalence numbers may also have been computed via various estimation methods, ranging from research studies to phone surveys.
  • Prevalence and "cured" or "remission" conditions: Conditions that go into "remission" but are not necessarily "cured", such as cancer, cause problems for prevalence data. Some such estimates use 5-year or 10-year prevalence, which includes only people who were diagnosed with cancer within the previous 5 or 10 years (even if they are "cured"). This effectively assumes that a remission becomes a cure after 5 or 10 years, so the person is then excluded from the prevalence numbers.
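The 5-year and 10-year prevalence figures mentioned above can be illustrated with a short sketch. It assumes the input is a list of diagnosis years for people who are still alive, and that a person counts if the diagnosis falls within the stated number of years before the reference year; the function name and the years are hypothetical.

```python
def limited_duration_prevalence(diagnosis_years, as_of_year, window_years):
    """Count living people diagnosed within `window_years` before `as_of_year`."""
    return sum(
        1 for year in diagnosis_years
        if 0 <= as_of_year - year < window_years
    )


# Hypothetical diagnosis years for four living patients.
living_patients = [2001, 2006, 2008, 2009]

print(limited_duration_prevalence(living_patients, 2010, 5))    # 3
print(limited_duration_prevalence(living_patients, 2010, 10))   # 4
```

The patient diagnosed in 2001 appears in the 10-year figure but not the 5-year one, which is why two prevalence estimates with different windows are not directly comparable.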

Problems with incidence data: Incidence data attempts to measure the number of people who become affected by a condition each year. Incidence includes only new cases, not ongoing treatment of existing conditions. The actual number of people affected by a condition in a year can be less than the reported incidence where people get multiple cases (e.g. common cold). Two incidence rates are not necessarily comparable. Some incidence data is based on government notifications, some on physician or hospital diagnoses, and some on various other methods. Some estimates of incidence for under-diagnosed conditions attempt to justify a larger incidence rate than is reported by doctors or medical authorities, whereas other rates may use only the official reported rates.

Rates of incidence/prevalence calculations: This site attempts to convert prevalence and incidence data into more relevant forms, such as the percentage of the population affected, the total number of people affected nationally, or the odds in a "1 in 1000" format. These computations are based on population data for the relevant reporting region (usually the USA as a whole). Details of abbreviations used for sources of statistics can be found on our sources page. Some computed rates use different base data: prevalence, incidence, or the maximum of prevalence and incidence. In some cases where the data is reported as a word such as "common", "rare", "uncommon" or a similar phrase, an arbitrary numerical percentage has been applied to this information. Data that is reported based on births, such as 1-in-3000 births, has either been left as is (for chronic conditions) or converted using an estimate of the number of births. Data reported as a percent of pregnancies or pregnant women has been calculated using an estimate of the number of pregnancies annually.
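The conversions described above amount to simple arithmetic. The sketch below shows one plausible way to do them; the function names are hypothetical, the population and birth figures are round placeholders rather than real census data, and multiplying a per-birth rate by annual births is an assumed reading of how birth-based data might be converted.

```python
def rate_summary(count, population):
    """Express a raw count as a percentage and as '1 in N' odds."""
    if population <= 0:
        raise ValueError("population must be positive")
    percent = 100 * count / population
    one_in = round(population / count) if count > 0 else None
    return {"count": count, "percent": percent, "one_in": one_in}


def annual_cases_from_birth_rate(one_in_births, annual_births):
    """Estimate new cases per year from a '1 in N births' rate."""
    return annual_births / one_in_births


# Placeholder figures for illustration only.
population = 300_000_000
annual_births = 4_000_000

summary = rate_summary(3_000_000, population)
print(summary["percent"])   # 1.0 (percent of the population)
print(summary["one_in"])    # 100 (that is, 1 in 100)

print(round(annual_cases_from_birth_rate(3000, annual_births)))   # 1333
```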

Lifetime risk data: Some sources report the risk of having a condition at some point in your lifetime. For example, cancer is widely reported to affect about 1 in 3 people in their lifetime. These rates are naturally much higher than either prevalence or incidence data, because they are effectively the cumulative risk of incidence over many years.
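To see why lifetime risk is so much higher than annual incidence, here is a rough sketch of how risk accumulates. It assumes a constant annual risk that is independent from year to year and ignores deaths from other causes, which real lifetime-risk estimates do not; the function name and the 0.5% annual figure are hypothetical and chosen only for illustration.

```python
def cumulative_risk(annual_risk, years):
    """Chance of at least one occurrence over `years`, given a fixed annual risk."""
    if not 0 <= annual_risk <= 1:
        raise ValueError("annual_risk must be between 0 and 1")
    return 1 - (1 - annual_risk) ** years


# Hypothetical: a 0.5% annual risk sustained over an 80-year lifetime.
lifetime = cumulative_risk(0.005, 80)
print(f"{lifetime:.2f}")   # about 0.33, i.e. roughly 1 in 3
```

Under these assumptions, an annual risk of 1 in 200 compounds to a lifetime risk of roughly 1 in 3, which shows how a small yearly rate and a large lifetime figure can describe the same condition.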

General problems with the data: In addition to the above discussion, there are various general qualifiers with regard to prevalence, incidence, and any of the other types of data. Use of the data may incur the old apples-and-oranges comparison problem because of data differences. Problems with using the data include:

  • Unclear sources: there are numerous statistics reported in articles and on the internet, and determining the actual study or survey on which an estimate is based is often difficult, even for statistics reported by health authorities or government agencies.
  • Data ranges: where a rate is reported as a range, such as "3 to 5 million people", the lower number is arbitrarily chosen and used here. This is a conservative assumption, but may cause some estimates to be lower than they should be.
  • Different definitions of prevalence: some prevalence numbers use estimates of people diagnosed, others also try to include estimates of undiagnosed people, and some use different values like 5-year prevalence or 10-year prevalence data.
  • Different sources: data has been collected from numerous sources, and the reputability and accuracy of each source cannot reasonably be completely confirmed.
  • Different study methodologies: the data comes from various studies that used different methodologies. Some data comes from government notification bodies, some from patient phone surveys, some from various methods of estimation, and so on. Many estimates are computed from a small sample and then extrapolated to a larger population group, and this method has various inherent limitations to its accuracy.
  • Different disease categories: some data may use different categorization arrangements to determine who has a particular disease. Some studies use the ICD categories, others do not, and there are actually small variations in the different ICD categorizations in any case. For example, should wheezing be part of asthma or separate?
  • Different years: data may come from numerous different years.
  • Different locations: data may come from different countries, states, or areas.
  • Different age groups: data may refer to a particular age group, such as "3% of adults", and may not necessarily reflect the overall prevalence in the entire population of all ages.
  • Different racial factors: some data may reflect a particular race more accurately and not apply to the entire population.
  • Inherent reporting bias: although most reputable organizations use official independent statistics, some organizations may tend to quote higher numbers, either because (a) they see the medical condition every day and assume it is highly prevalent, or (b) they want the conditions they monitor to seem more important, such as to justify funding levels or seek donations.
  • Country-specific information: Most of the data is reported from USA sources, and may be of limited value to other countries. For example, certain conditions have a much higher prevalence worldwide, especially in developing countries, than in industrialized nations like the USA.

