Objectives

Upon completion of this lesson, you will be able to:

  • explain Bayes’ Theorem
  • know when Bayesian Inference is applicable

Overview

Bayes’ theorem is a powerful tool for reasoning under uncertainty that is widely used in fields such as statistics, machine learning, and artificial intelligence. In information science, it has applications in areas such as information retrieval, natural language processing, and data analysis. This lesson will provide an introduction to Bayes’ theorem, including its history, key concepts, and practical applications. We will explore how Bayes’ theorem can be used to make inferences, predictions, and decisions based on uncertain or incomplete information, and how it can help us improve our understanding of complex systems and phenomena.

One key reason why Bayes’ theorem is so important is that it allows us to update our beliefs or probabilities in light of new evidence or information. This is critical when dealing with uncertain or incomplete data, which is common in many real-world scenarios. Bayes’ theorem provides a framework for incorporating new data or evidence into our existing knowledge, and for revising our probabilities or predictions accordingly.

Another important aspect of Bayes’ theorem is that it provides a way to quantify the uncertainty or risk associated with a decision or prediction. By calculating probabilities and conditional probabilities, we can estimate the likelihood of different outcomes and evaluate the potential costs and benefits of different actions.

Bayes’ theorem is also important because it can help us identify patterns and relationships in data that may not be immediately obvious. By analyzing the conditional probabilities and relationships between different variables, we can gain insights into complex systems and phenomena, and make more accurate predictions and decisions.

Overall, Bayes’ theorem is a fundamental principle in probability theory that has many important practical applications in fields such as information science, healthcare, finance, and more. Its ability to incorporate new information and uncertainty into decision-making processes makes it a valuable tool for anyone dealing with complex problems and data.

By the end of this lesson, you will have a solid foundation in the principles of Bayes’ theorem and be able to apply it to a wide range of problems in information science and beyond.

Bayes’ Theorem

Bayes’ theorem provides a mathematical way to update our beliefs about the probability of an event occurring based on new information or evidence. It is named after the Reverend Thomas Bayes, an 18th-century English statistician and theologian.

Discovery of Bayes’ Theorem

The discovery of Bayes’ theorem is often attributed to the Reverend Thomas Bayes, an 18th-century English statistician and theologian. However, it is worth noting that Bayes did not publish his work on the theorem during his lifetime, and his contributions to the development of the theorem are somewhat unclear.

Another figure who is often credited with the discovery of Bayes’ theorem is Richard Price, a Welsh philosopher and mathematician who was a contemporary of Bayes. Price corresponded with Bayes and was responsible for posthumously publishing Bayes’ work on the theorem.

There is some debate among historians and scholars over the extent of Bayes’ contributions to the development of the theorem, and whether he should be considered the true discoverer of the theorem or whether his work was largely influenced by others, including Price.

Regardless of the specific contributions of Bayes and Price, it is clear that Bayes’ theorem has become an important tool in probability theory and decision-making, and has numerous practical applications in fields such as statistics, machine learning, artificial intelligence, and more. Its ability to update beliefs and probabilities based on new evidence or information, and to quantify uncertainty and risk, makes it a valuable tool for anyone dealing with uncertain or incomplete data.

Conditional Probabilities and Prior Belief

At its core, Bayes’ theorem is a formula for calculating conditional probabilities. A conditional probability is the probability of an event occurring given that another event has occurred. For example, the probability of a person having a heart attack given that they smoke is a conditional probability.

Bayes’ theorem states that:

\(P(A|B) = \frac{P(B|A) P(A)}{P(B)}\)

where P(A|B) is the probability of event A given that event B has occurred (the posterior probability), P(B|A) is the probability of event B given that event A has occurred, P(A) is the prior probability of event A, i.e., the probability of A before B is taken into account, and P(B) is the overall (marginal) probability of event B, which must be greater than zero.
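One way to see where the formula comes from is the short derivation below. It assumes the standard definition of conditional probability in terms of the joint probability \(P(A \cap B)\), a symbol not otherwise used in this lesson:

\[ P(A|B) = \frac{P(A \cap B)}{P(B)}, \qquad P(B|A) = \frac{P(A \cap B)}{P(A)} \]

Solving the second expression for the joint probability gives \(P(A \cap B) = P(B|A) P(A)\), and substituting that into the first yields:

\[ P(A|B) = \frac{P(B|A) P(A)}{P(B)} \]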

To understand how Bayes’ theorem works, consider the following example:

Suppose we want to know the probability that a person has lung cancer given that they smoke. Let A be the event that the person has lung cancer, and let B be the event that the person smokes.

We can estimate the probabilities as follows:

  • P(A) = 0.01 (the prior probability of lung cancer in the general population)
  • P(B) = 0.2 (the prior probability of smoking in the general population)
  • P(B|A) = 0.8 (the probability that a person with lung cancer smokes)

We need to find P(A|B) which is the probability that a person who smokes has lung cancer.

Using Bayes’ theorem, we can calculate P(A|B) as follows:

\(P(A|B) = \frac{P(B|A) P(A)}{P(B)} = \frac{(0.8)(0.01)}{0.2} = 0.04\)

This means that the probability of a person who smokes having lung cancer is 4%. Note that this is four times the prior probability of lung cancer in the general population, which is only 1%.
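The same arithmetic can be checked with a few lines of Python. This is only a minimal sketch: the function name `bayes_posterior` and the variable names are made up for illustration, and the input values are the ones given in the example above.

```python
def bayes_posterior(likelihood: float, prior: float, evidence: float) -> float:
    """Return P(A|B) = P(B|A) * P(A) / P(B)."""
    if evidence <= 0:
        raise ValueError("P(B) must be greater than zero")
    return likelihood * prior / evidence


# Values taken from the lung cancer example.
p_cancer = 0.01               # P(A)
p_smoker = 0.2                # P(B)
p_smoker_given_cancer = 0.8   # P(B|A)

p_cancer_given_smoker = bayes_posterior(p_smoker_given_cancer, p_cancer, p_smoker)
print(round(p_cancer_given_smoker, 4))  # 0.04
```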

Bayes’ theorem can also be used in more complex scenarios, such as in machine learning algorithms. For example, consider the problem of classifying emails as spam or not spam. Let A be the event that an email is spam, and let B be the event that an email contains certain keywords that are often associated with spam. We can estimate the probabilities as follows:

  • P(A) = 0.1 (the prior probability of an email being spam)
  • P(B|A) = 0.9 (the probability that a spam email contains certain keywords)
  • P(B|not A) = 0.01 (the probability that a non-spam email contains certain keywords)
  • P(A|B) = ? (the probability that an email containing certain keywords is spam)

Using Bayes’ theorem, we can calculate P(A|B) as follows:

\[ P(A|B) = \frac{P(B|A) P(A)}{P(B|A) P(A) + P(B|\text{not } A) P(\text{not } A)} = \frac{(0.9)(0.1)}{(0.9)(0.1) + (0.01)(0.9)} \approx 0.91 \]

This means that if an email contains certain keywords, there is a 91% chance that it is spam. Note that this probability is much higher than the prior probability of an email being spam, which is only 10%.
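When P(B) is not given directly, as in the spam example, the denominator can be built from the two conditional rates. The sketch below shows one possible way to do this in Python. The function name `posterior_from_rates` is an assumption made for this illustration, and the numbers are those from the example.

```python
def posterior_from_rates(prior: float,
                         p_evidence_given_h: float,
                         p_evidence_given_not_h: float) -> float:
    """Return P(H|E), computing P(E) with the law of total probability."""
    numerator = p_evidence_given_h * prior
    evidence = numerator + p_evidence_given_not_h * (1.0 - prior)
    return numerator / evidence


# P(A) = 0.1, P(B|A) = 0.9, P(B|not A) = 0.01, as in the spam example.
p_spam_given_keywords = posterior_from_rates(0.1, 0.9, 0.01)
print(round(p_spam_given_keywords, 2))  # 0.91
```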

Bayes’ theorem is a powerful and versatile tool that can be applied to many different problems in information science, computer science, machine learning, engineering, medicine, decision analysis, and statistics.

Prior Probability

Prior probability is the probability of an event or hypothesis before any new evidence or information is considered. It is usually based on historical data, expert knowledge, or assumptions about the system or phenomenon being studied.

Finding the prior probability requires some understanding of the context or problem at hand, and may involve collecting data or consulting experts in the relevant field. Here are some common approaches for finding prior probabilities:

Historical data: If the event or hypothesis has occurred before, the prior probability can be estimated based on historical data. For example, if we want to estimate the probability of a stock market crash, we can look at the historical frequency of market crashes and use that as a basis for our prior probability.

Expert knowledge: If there are experts in the field or domain of the event or hypothesis, they may be able to provide insights into the prior probability based on their experience and knowledge. For example, if we want to estimate the probability of a disease outbreak, we can consult epidemiologists or public health experts for their estimates.

Assumptions: If there is no historical data or expert knowledge available, we may need to make assumptions about the system or phenomenon being studied. These assumptions should be based on logical reasoning or available data, and can be used to estimate the prior probability.

It’s important to note that the choice of prior probability can have a significant impact on the final probability estimates, especially if the new evidence is weak or ambiguous. Therefore, it’s important to carefully consider the available information and knowledge when estimating the prior probability, and to update it accordingly as new evidence emerges.
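As a rough illustration of this sensitivity, the sketch below reuses the spam-keyword rates from earlier in the lesson (0.9 and 0.01) and varies only the prior. The list of trial priors and the helper name `posterior` are assumptions made for this example, not values from any dataset.

```python
def posterior(prior: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """Posterior P(H|E) via Bayes' theorem and the law of total probability."""
    numerator = p_e_given_h * prior
    return numerator / (numerator + p_e_given_not_h * (1.0 - prior))


P_KEYWORDS_GIVEN_SPAM = 0.9       # from the spam example
P_KEYWORDS_GIVEN_NOT_SPAM = 0.01  # from the spam example

# Hypothetical priors, chosen only to show the effect of the prior.
trial_priors = [0.001, 0.01, 0.1, 0.5]
posteriors = [
    posterior(p, P_KEYWORDS_GIVEN_SPAM, P_KEYWORDS_GIVEN_NOT_SPAM)
    for p in trial_priors
]

for prior, post in zip(trial_priors, posteriors):
    print(f"prior = {prior:<5}  posterior = {post:.3f}")
```

With identical evidence, a prior of 0.001 leaves the posterior below 10%, while a prior of 0.5 pushes it close to 99%.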

Worked Examples

To help us understand the application of Bayes’ Theorem, we will look at several worked examples.

Example I

Before going on vacation for a week, you ask your friend to water your plant. Without water, the plant has a 90 percent chance of dying. Even with proper watering, it has a 20 percent chance of dying. And the probability that your friend will forget to water it is 30 percent.

So, a few questions arise:

  1. Where might the probability estimates have come from?
  2. What’s the chance that your plant will survive the week?
  3. If your friend forgot to water it, what’s the chance it’ll be dead when you return?
  4. If it’s dead when you return, what’s the chance that your friend forgot to water it?

The probabilities could have come from prior observations but most likely are subjective “guesses” based on personal belief or prior knowledge.

To answer the next three questions, we need some variables and some values:

Let’s use W for watering the plant and W’ for not watering the plant; D to indicate the plant is dead, and, of course, the complement D’ that the plant is not dead (alive). We know the following from what is given:

The probability of the plant being dead given that it was not watered is \(P(D|W') = 0.9\) and likewise \(P(D|W) = 0.2\). And, finally, the probability that your friend did not water the plant is \(P(W') = 0.3\). Of course, \(P(W) = 1 - P(W') = 0.7\).

We need to determine the probability that your friend forgot to water the plant given that you observed that it is now dead, i.e., \(P(W'|D)\). To calculate this probability we need Bayes’ Theorem:

\[ \begin{equation} \label{eq:bayes} P(W'|D) = \frac{P(D|W')P(W')}{P(D)} \end{equation} \]

We do not know \(P(D)\), but we can calculate that from the information provided using the law of total probability:

\[ P(D) = P(D|W')P(W') + P(D|W)P(W) \]

Remember that W means watered and W’ means not watered, while D means dead and D’ means alive (not dead). So, now we can calculate the probability of your friend not having watered the plant given that it is dead with:

\[ \begin{equation} \label{eq:bayesfull} P(W'|D) = \frac{P(D|W')P(W')}{P(D|W')P(W') + P(D|W)P(W)} \end{equation} \]

Using the information provided, we now have:

\[ \begin{equation} \label{eq:bayesfullplugged} P(W'|D) = \frac{(0.9)(0.3)}{(0.9)(0.3) + (0.2)(0.7)}= 0.659 \end{equation} \]

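So, the probability that your friend forgot to water the plant given that it is dead is estimated at 0.659 or 65.9%.

The denominator also answers question 2: \(P(D) = (0.9)(0.3) + (0.2)(0.7) = 0.41\), so the plant survives the week with probability \(P(D') = 1 - 0.41 = 0.59\). Question 3 needs no calculation, since \(P(D|W') = 0.9\) is given directly.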
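The three numerical questions of this example can be worked through in a short Python sketch. The variable names are chosen for this illustration only, and the inputs are the probabilities stated in the problem.

```python
# Given in the problem statement.
p_not_watered = 0.3             # P(W')
p_dead_given_not_watered = 0.9  # P(D|W')
p_dead_given_watered = 0.2      # P(D|W)

p_watered = 1 - p_not_watered   # P(W)

# Law of total probability: P(D) = P(D|W')P(W') + P(D|W)P(W)
p_dead = (p_dead_given_not_watered * p_not_watered
          + p_dead_given_watered * p_watered)

# Question 2: chance that the plant survives the week.
p_survive = 1 - p_dead

# Question 3: read directly from the givens.
p_dead_if_forgotten = p_dead_given_not_watered

# Question 4: Bayes' theorem for P(W'|D).
p_not_watered_given_dead = p_dead_given_not_watered * p_not_watered / p_dead

print(round(p_survive, 3))                 # 0.59
print(round(p_dead_if_forgotten, 3))       # 0.9
print(round(p_not_watered_given_dead, 3))  # 0.659
```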

Example II

This “classic” example from healthcare is drawn from collected disease and diagnostics data for disease screening:

From collected data, it is estimated that 1% of women at age 40 who participate in routine screening have breast cancer. Furthermore, studies have shown that 80% of women who actually have breast cancer will show positive on a mammogram, although 9.6% of women without breast cancer will also get a mammogram that is positive for the disease. So, what is the probability that a 40-year-old woman does indeed have breast cancer given that she had a positive mammography in a routine screening?

Here is what we know:

  • P(Cancer) = P(C) = 0.01
  • P(No Cancer) = P(C’) = 1 - P(C) = 0.99
  • P(Mammogram+|Cancer) = P(M+|C) = 0.8
  • P(Mammogram+|No Cancer) = P(M+|C’) = 0.096

We need to determine P(Cancer | Mammogram+):

\[ \begin{equation} \label{eq:bayesCancer} P(C|M^{+}) = \frac{P(M^{+}|C)P(C)}{P(M^{+})} \end{equation} \]

Since we do not know P(M+), we need the long version of Bayes’ Theorem:

\[ \begin{equation} \label{eq:bayesfullCancer} P(C|M^{+}) = \frac{P(M^{+}|C) P(C)}{P(M^{+}|C)P(C) + P(M^{+}|C')P(C')} \end{equation} \]

So, plugging in the parameters, we get:

\[ \begin{equation} \label{eq:bayesCancerCalc} P(C|M^{+}) = \frac{(0.8)(0.01)}{(0.8)(0.01)+(0.096)(0.99)} = 0.0776 \end{equation} \]

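So, the probability that a patient with a positive mammogram actually has breast cancer is about 7.76%. The value is low despite the test’s 80% detection rate because the disease is rare in the screened population: the false positives among the 99% of women without cancer far outnumber the true positives among the 1% with it.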
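Another way to check this result is to count expected cases in a hypothetical cohort. The sketch below assumes a cohort of 10,000 women, a size chosen only for convenience, and applies the rates given in the example.

```python
cohort = 10_000  # hypothetical cohort size, chosen for easy counting

with_cancer = cohort * 0.01            # 1% prevalence -> 100 women
without_cancer = cohort - with_cancer  # 9,900 women

true_positives = with_cancer * 0.8        # 80% of cancers detected -> 80
false_positives = without_cancer * 0.096  # 9.6% false alarms -> about 950

# Share of all positive mammograms that belong to women with cancer.
p_cancer_given_positive = true_positives / (true_positives + false_positives)
print(round(p_cancer_given_positive, 4))  # 0.0776
```

Of roughly 1,030 positive mammograms in such a cohort, only 80 would come from women who have the disease.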

Example III

Here’s an example of a Bayesian analysis problem in computer science:

Suppose we want to build a spam filter for email. We have a training dataset of emails that have been labeled as either spam or not spam, and we want to use this dataset to build a classifier that can classify new emails as either spam or not spam.

One approach to building a spam filter is to use a Bayesian classifier. This involves calculating the conditional probability of an email being spam given its features, and then using Bayes’ theorem to update this probability based on the evidence provided by each new feature.

For example, we might consider the following features for an email:

  • The presence or absence of certain keywords or phrases
  • The number of misspelled words
  • The length of the email
  • The sender’s email address

We can use the training dataset to estimate the conditional probabilities of each feature given whether an email is spam or not. For example, we might estimate the probability of a certain keyword appearing in a spam email as 0.8, and the probability of the same keyword appearing in a non-spam email as 0.2.

Then, given a new email with these features, we can calculate the conditional probability of the email being spam or not spam, and use Bayes’ theorem to update this probability as we observe each new feature.
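The feature-by-feature update described above might be sketched as follows. This is not a full spam filter. It assumes the features are conditionally independent given the class (the "naive" Bayes assumption), and it uses a hypothetical prior of 0.1. Only the keyword probabilities (0.8 and 0.2) come from the text; the other feature probabilities and all names are invented for illustration.

```python
def update(prior: float, p_feature_given_spam: float,
           p_feature_given_not_spam: float) -> float:
    """One Bayesian update of P(spam) after observing a single feature."""
    numerator = p_feature_given_spam * prior
    return numerator / (numerator + p_feature_given_not_spam * (1.0 - prior))


# Each entry: feature -> (P(feature | spam), P(feature | not spam)).
observed_features = {
    "keyword present": (0.8, 0.2),    # from the text
    "many misspellings": (0.6, 0.1),  # hypothetical
    "unknown sender": (0.7, 0.3),     # hypothetical
}

p_spam = 0.1  # hypothetical prior probability of spam
for name, (p_given_spam, p_given_not_spam) in observed_features.items():
    # The posterior after one feature becomes the prior for the next.
    p_spam = update(p_spam, p_given_spam, p_given_not_spam)
    print(f"after '{name}': P(spam) = {p_spam:.3f}")
```

Under the independence assumption, the order in which the features are processed does not change the final probability.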

A Bayesian classifier of this kind has some practical advantages: it is simple to train from labeled examples, it can skip a feature that is missing from an email rather than discarding the email, and its probabilities can be updated as new labeled data becomes available.

Overall, Bayesian analysis has many applications in computer science, including machine learning, natural language processing, computer vision, and more.

Exercises

  1. John wants to build a small golf practice range. He believes the chance of building a successful range is 40%. His friend Alice suggests that he conduct a survey. There is a 0.9 probability that the survey will be favorable if the driving range is a success. There’s a 0.8 probability that the survey will be unfavorable if the driving range is not a success. What is the chance of a successful range if the market survey is favorable? Answer: P(Success|Favorable Survey) = (0.9)(0.4) / ((0.9)(0.4) + (0.2)(0.6)) = 0.75

  2. In a computer security system, the probability of a genuine user being denied access (false negative) is 0.02, while the probability of an intruder being granted access (false positive) is 0.01. According to system logs, genuine users attempt to access the system 95% of the time, and intruders attempt to access it 5% of the time. Using Bayes’ theorem, calculate the probability that a randomly selected access attempt, which was denied, was made by a genuine user. Provide your answer as a decimal rounded to four decimal places. Answer: P(Genuine|Denied) = (0.02)(0.95) / ((0.02)(0.95) + (0.99)(0.05)) = 0.2774

  3. A library uses a filtering system to classify newly added books into two categories: fiction (F) and non-fiction (NF). The system has a 90% success rate in correctly identifying fiction books, while it has an 80% success rate in correctly identifying non-fiction books. In the library’s collection, 60% of the books are fiction and 40% are non-fiction. Using Bayes’ theorem, determine the probability that a book classified as fiction by the filtering system is actually a fiction book. Provide your answer as a decimal rounded to four decimal places. Answer: P(Fiction|Classified Fiction) = (0.9)(0.6) / ((0.9)(0.6) + (0.2)(0.4)) = 0.8710
