Upon completion of this lesson, you will be able to:

- define probability and the related concepts of experiments, sample spaces, and events
- state the axioms of probability and apply the basic rules for unions, intersections, mutually exclusive events, and independent events
- compute conditional probabilities and apply Bayes’ Theorem to update probabilities based on evidence
- distinguish between theoretical, empirical, and subjective approaches to probability
- construct an empirical probability distribution from data and describe a distribution’s expectation and variance
In the realm of mathematics and statistics, there exists a fascinating field that seeks to unravel the mysteries of uncertainty and predictability. This captivating discipline is none other than probability theory. Probability serves as a fundamental pillar upon which numerous scientific and practical endeavors are built, ranging from predicting the outcome of a coin toss to estimating the likelihood of a severe weather event. At the heart of probability lies the exploration of randomness, chance, and the patterns that emerge from uncertain events.
As graduate students, you are embarking on a journey that delves deeper into the realms of advanced mathematics, data analysis, and statistical inference. It is crucial to develop a solid foundation in probability theory to effectively navigate this expansive landscape and make informed decisions in various domains. Through this course, we will embark on an intellectual voyage, beginning with the basic principles of probability and gradually progressing to more sophisticated concepts and applications.
The objective of this lesson is to introduce you to the enchanting world of probability theory and equip you with the essential tools to analyze uncertain phenomena. By mastering the concepts of probability, you will gain the ability to quantify and reason about uncertainty, enabling you to make informed judgments, devise effective strategies, and understand the inherent risks and uncertainties present in various real-world scenarios.
Throughout this course, we will explore a wide range of topics, including but not limited to the foundations of probability theory, discrete and continuous random variables, probability distributions, conditional probability, independence, expected values, and the law of large numbers. Moreover, we will apply these theoretical concepts to practical problems encountered in fields such as finance, engineering, biology, and computer science, highlighting the pervasive nature of probability across disciplines.
It is worth emphasizing that while probability theory offers valuable insights into uncertainty, it does not claim to predict the future with certainty. Instead, it provides a framework to quantify and reason about uncertainty in a systematic manner, empowering us to make informed decisions based on available evidence. This course will equip you with the tools and knowledge necessary to assess probabilities, evaluate risks, and make well-informed choices in the face of uncertainty.
As we embark on this fascinating journey, I encourage you to embrace curiosity, critical thinking, and active engagement. Together, we will unravel the intricacies of probability, discover its elegance, and explore its practical applications. By the end of this course, you will possess a solid understanding of probability theory and be well-prepared to delve into the depths of advanced statistical methodologies, armed with the power to make probabilistically sound decisions.
Let us embark on this intellectual adventure and unlock the hidden world of probability, where uncertainty becomes a realm of exploration, understanding, and informed decision-making.
Probability is a mathematical concept that measures the likelihood or chance of an event occurring, typically expressed as a number between 0 and 1 inclusive. A probability of 0 indicates that an event will not occur, while a probability of 1 suggests that the event is certain to occur.
The chalk-talk by Dr. Schedlbauer of Khoury Boston might be a valuable starting point before venturing into the remainder of the lesson.
An experiment is a procedure that can be repeated under the same conditions and produces well-defined outcomes. The set of all possible outcomes of an experiment is called the sample space, often denoted as S.
An event is a subset of the sample space. If the outcome of an experiment is in the event, we say that the event has occurred.
For example, the probability that a person is primarily right-handed is ⅓, as there are three possible outcomes in the sample space: right-handed, left-handed, and ambidextrous. Each outcome is equally likely if we do not have or consider any other information. So, mathematically,
\(P(RH) = 1 / 3\)
Often, probability is interpreted as chance and expressed as a percentage. For the above example, we can say that the chance that a randomly chosen person in a crowd is right-handed is 33%.
The mathematical formulation of probability is based on three fundamental axioms, proposed by the Russian mathematician Andrey Kolmogorov:
Non-Negativity: The probability of an event is a non-negative real number, i.e., \(P(A) \geq 0\) for any event A.
Certainty: The probability that some elementary event in the entire sample space will occur is 1, i.e., \(P(S) = 1\).
Additivity: If A and B are mutually exclusive events, then the probability of either event occurring is the sum of the probabilities of each event, i.e., if \(A \cap B = \emptyset\), then \(P(A \cup B) = P(A) + P(B)\).
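To make the axioms concrete, here is a minimal Python sketch (an illustration under the assumption of a fair six-sided die, not part of the formal theory) that checks all three axioms numerically:

```python
# Sample space for a fair six-sided die; each outcome has probability 1/6.
S = {1, 2, 3, 4, 5, 6}
p = {outcome: 1 / 6 for outcome in S}

# Axiom 1 (Non-Negativity): every probability is >= 0.
assert all(prob >= 0 for prob in p.values())

# Axiom 2 (Certainty): the probabilities over the whole sample space sum to 1.
assert abs(sum(p.values()) - 1) < 1e-9

# Axiom 3 (Additivity): for mutually exclusive events A and B,
# P(A or B) = P(A) + P(B).
A = {2, 4, 6}          # even numbers
B = {1, 3, 5}          # odd numbers (disjoint from A)
P = lambda event: sum(p[outcome] for outcome in event)
assert A & B == set()  # A and B are mutually exclusive
assert abs(P(A | B) - (P(A) + P(B))) < 1e-9
```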
“Mutually exclusive” is a term used in probability theory to describe a situation where two events cannot occur at the same time. That is, the occurrence of one event excludes the possibility of the occurrence of the other event.
Let’s consider an example. Say we are rolling a six-sided die and define two events:

- Event A: Rolling an even number (2, 4, or 6)
- Event B: Rolling an odd number (1, 3, or 5)
These events are mutually exclusive because they cannot both occur on a single roll of the die. If the die roll results in an even number, it cannot also result in an odd number, and vice versa.
In terms of probability, if events A and B are mutually exclusive, then the probability of both events occurring (the intersection of A and B, denoted as A ∩ B) is 0, by definition. This leads to a useful property of mutually exclusive events, as stated in the Additivity axiom: the probability of either event A or event B occurring (the union of A and B, denoted as A ∪ B) is equal to the sum of their individual probabilities.
If A and B are any two events (not necessarily mutually exclusive), then the probability of A or B occurring is:
\(P(A \cup B) = P(A) + P(B) - P(A \cap B)\)
The term P(A ∩ B) represents the probability that both A and B occur, i.e., the probability of their intersection or their joint probability. If A and B are mutually exclusive (i.e., they cannot both occur), P(A ∩ B) is zero, and the formula simplifies to P(A ∪ B) = P(A) + P(B).
However, if A and B are not mutually exclusive, then we have to subtract P(A ∩ B) to avoid double-counting the cases where both A and B occur.
Example
Let’s consider an example using a standard deck of 52 unique cards and define two events:

- Event A: Drawing a red card (there are 26 red cards in a deck)
- Event B: Drawing a queen (there are 4 queens in a deck)
The probabilities of these events are P(A) = 26/52 = 0.5 and P(B) = 4/52 ≈ 0.077.
Note that these events are not mutually exclusive, as two queens are red (the Queen of Hearts and the Queen of Diamonds). Therefore, P(A ∩ B) = 2/52 ≈ 0.038.
To find the probability of drawing either a red card or a queen, we use the formula for the probability of the union of two events:
P(A ∪ B) = P(A) + P(B) - P(A ∩ B) = 26/52 + 4/52 - 2/52 = 28/52 ≈ 0.538.
So, the probability of drawing either a red card or a queen is approximately 0.538.
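The same result can be obtained by brute-force counting. The sketch below is a minimal illustration that builds the 52-card deck as rank–suit pairs (a representation chosen for this example) and applies the inclusion-exclusion formula:

```python
from fractions import Fraction

# Build a standard 52-card deck as (rank, suit) pairs.
ranks = ['A', '2', '3', '4', '5', '6', '7', '8', '9', '10', 'J', 'Q', 'K']
suits = ['hearts', 'diamonds', 'clubs', 'spades']
deck = [(rank, suit) for rank in ranks for suit in suits]

red = {(r, s) for (r, s) in deck if s in ('hearts', 'diamonds')}   # event A
queens = {(r, s) for (r, s) in deck if r == 'Q'}                   # event B

P = lambda event: Fraction(len(event), len(deck))

# Inclusion-exclusion: P(A or B) = P(A) + P(B) - P(A and B)
p_union = P(red) + P(queens) - P(red & queens)
print(p_union, float(p_union))   # 7/13 ≈ 0.538

# Cross-check by counting the union directly.
assert p_union == P(red | queens)
```

Using Fraction keeps the arithmetic exact and avoids the small rounding discrepancies that appear when intermediate decimals are rounded.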
Events A and B are independent if the occurrence of one does not affect the occurrence of the other. In terms of probability, A and B are independent if:
\(P(A \cap B) = P(A) \times P(B)\)
Let’s consider an example involving a coin flip and a die roll, and define two events:

- Event A: Getting heads on the coin flip
- Event B: Rolling a 1 on the die

These two events are independent because the outcome of the coin flip does not influence the outcome of the die roll, and vice versa.
The probabilities of these events are P(A) = 1/2 (because a fair coin has two sides) and P(B) = 1/6 (because a fair six-sided die has six faces).
The probability of both events A and B happening can be calculated by multiplying the individual probabilities because these events are independent:
P(A ∩ B) = P(A)P(B) = (1/2) * (1/6) = 1/12
So, the probability of getting heads on the coin flip and a 1 on the die roll at the same time is 1/12.
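Because the joint sample space here is small (12 equally likely coin–die pairs), the independence condition can be verified exhaustively. The following sketch, with events encoded as predicates over outcomes (a choice made for this illustration), confirms that P(A ∩ B) = P(A) × P(B):

```python
from fractions import Fraction

# Joint sample space of one coin flip and one die roll: 12 equally likely pairs.
outcomes = [(coin, die) for coin in ('H', 'T') for die in range(1, 7)]

P = lambda event: Fraction(sum(event(o) for o in outcomes), len(outcomes))

A = lambda o: o[0] == 'H'              # event A: heads on the coin flip
B = lambda o: o[1] == 1                # event B: rolling a 1 on the die
A_and_B = lambda o: A(o) and B(o)      # joint event

# Independence: P(A and B) equals P(A) * P(B).
assert P(A_and_B) == P(A) * P(B) == Fraction(1, 12)
print(P(A_and_B))   # 1/12
```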
The probability of an event given that another event has occurred is called conditional probability. If A and B are two events, the conditional probability of A given B (denoted as \(P(A|B)\)) is defined as:
\(P(A|B) = \frac{P(A \cap B)}{P(B)}\), provided \(P(B) > 0\).
This formula applies to any two events A and B, regardless of whether they are mutually exclusive or not.
Let’s consider an example involving a deck of 52 cards:
- Event A: Drawing a heart (there are 13 hearts in a deck)
- Event B: Drawing a red card (there are 26 red cards in a deck)
The probabilities of these events are P(A) = 13/52 = 0.25 (since a quarter of the cards are hearts) and P(B) = 26/52 = 0.5 (since half of the cards are red).
Note that these events are not mutually exclusive, as all hearts are red. Thus, P(A ∩ B) = P(A) = 0.25.
Now, let’s compute the conditional probability of drawing a heart given that a red card was drawn:
P(A|B) = P(A ∩ B) / P(B) = 0.25 / 0.5 = 0.5
So, the conditional probability of drawing a heart, given that the card drawn is red, is 0.5, or 50%. This makes sense intuitively because half of the red cards in a deck are hearts.
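As a sanity check, the conditional probability can be computed by counting over the same hypothetical deck representation used in the earlier union example:

```python
from fractions import Fraction

ranks = ['A', '2', '3', '4', '5', '6', '7', '8', '9', '10', 'J', 'Q', 'K']
suits = ['hearts', 'diamonds', 'clubs', 'spades']
deck = [(r, s) for r in ranks for s in suits]

hearts = {(r, s) for (r, s) in deck if s == 'hearts'}               # event A
red = {(r, s) for (r, s) in deck if s in ('hearts', 'diamonds')}    # event B

P = lambda event: Fraction(len(event), len(deck))

# Conditional probability: P(A|B) = P(A and B) / P(B)
p_A_given_B = P(hearts & red) / P(red)
print(p_A_given_B)   # 1/2
```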
Bayes’ Theorem is a fundamental theorem in probability theory and statistics that describes how to update the probability of a hypothesis based on evidence. It is named after Thomas Bayes, who introduced an early version of the theorem. If A and B are events, Bayes’ Theorem is written as:
\(P(A|B) = \frac{P(B|A) \times P(A)}{P(B)}\)
where P(A) and P(B) are the probabilities of A and B independently, while P(A|B) is the conditional probability of A given B, and P(B|A) is the conditional probability of B given A.
It describes the probability of an event, based on prior knowledge of conditions that might be related to the event.
Let’s consider a medical testing example. Assume there’s a disease D that affects 1% of the population (P(D) = 0.01). Also, assume there’s a test T for this disease which has a 95% detection rate for those who have the disease (P(T|D) = 0.95). But, the test also has a false positive rate of 5% (P(T|~D) = 0.05), meaning that 5% of healthy people will test positive.
If a random person tests positive, what is the probability they actually have the disease?
Here, we’re asked to find P(D|T), the probability of having the disease given a positive test result.
Applying Bayes’ Theorem:
P(D|T) = [P(T|D) * P(D)] / P(T)
P(T) can be calculated as the sum of the probability of a true positive and the probability of a false positive:
P(T) = P(T and D) + P(T and ~D) = P(T|D) * P(D) + P(T|~D) * P(~D)
Given that P(~D) = 1 - P(D), we have:
P(T) = 0.95 * 0.01 + 0.05 * (1 - 0.01) = 0.059
Substituting these values into Bayes’ Theorem gives:
P(D|T) = [0.95 * 0.01] / 0.059 ≈ 0.16
So, despite a positive test result, there’s only a 16% chance that the person actually has the disease. This is because the disease is quite rare, and the test has a significant false positive rate.
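The whole calculation is easy to script. The sketch below simply encodes the numbers from the example; the variable names are invented for this illustration:

```python
# Prior and test characteristics from the example above.
p_d = 0.01           # P(D): prevalence of the disease
p_t_given_d = 0.95   # P(T|D): detection (true positive) rate
p_t_given_nd = 0.05  # P(T|~D): false positive rate

# Total probability of a positive test:
# P(T) = P(T|D)P(D) + P(T|~D)P(~D)
p_t = p_t_given_d * p_d + p_t_given_nd * (1 - p_d)

# Bayes' Theorem: P(D|T) = P(T|D)P(D) / P(T)
p_d_given_t = p_t_given_d * p_d / p_t
print(f"P(T)   = {p_t:.3f}")          # 0.059
print(f"P(D|T) = {p_d_given_t:.3f}")  # ≈ 0.161
```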
The probability of an event can be derived through two main approaches: theoretical and empirical.
In the theoretical approach, when each outcome in a sample space is equally likely, the probability of event A, P(A), is given by:
\(P(A) = \frac{\text{Number of outcomes in A}}{\text{Number of outcomes in S}}\)
In other words, the probability is the ratio of the number of desirable (or successful) outcomes to the number of all possible outcomes. This approach requires an understanding of the nature of the experiment.
The empirical approach is typically used when it is difficult to list all possible outcomes due to a large sample space, or when the outcomes are not equally likely. The empirical probability of an event is defined as the proportion of times that the event occurs in a large number of trials of a random experiment. It can be expressed as:
\(P(A) = \frac{\text{Frequency of A}}{\text{Total number of trials}}\)
This approach requires actual data from experiments or observations.
In the absence of observations from an experiment, and knowing that a theoretical probability based on equal likelihood would not be correct, probabilities can also be estimated subjectively, based on intuition and non-quantifiable information. In other words, a person can estimate a probability based on their “gut”.
The Law of Large Numbers (LLN) is a fundamental principle in probability and statistics. It states that as the size of a sample is increased, the empirical probability (relative frequency) of an event approaches the theoretical probability of the event. This forms the basis for frequency-based probabilistic forecasts and the empirical approach to probability.
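A quick simulation illustrates the Law of Large Numbers. The sketch below (a minimal illustration, assuming a fair coin with theoretical P(heads) = 0.5) flips increasingly many coins and prints the empirical probability at each sample size:

```python
import random

random.seed(42)  # fixed seed so the run is reproducible

# Estimate P(heads) = 0.5 for a fair coin with increasing numbers of flips.
for n in (10, 100, 1_000, 10_000, 100_000):
    heads = sum(random.random() < 0.5 for _ in range(n))
    print(f"n = {n:>6}: empirical P(heads) = {heads / n:.4f}")
# As n grows, the empirical probability settles near the theoretical 0.5.
```

The small samples typically wander well away from 0.5, while the large samples hug it; exactly the convergence the LLN describes.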
A probability distribution describes the probabilities of all possible outcomes in the sample space of a random experiment. Probability distributions can be discrete or continuous, depending on whether the sample space is discrete or continuous.
For a discrete random variable, the distribution is described by a probability mass function (pmf), which assigns a probability to each possible outcome. Examples include the Bernoulli, Binomial, and Poisson distributions.
For a continuous random variable, the distribution is described by a probability density function (pdf). A pdf assigns probabilities to intervals of outcomes rather than to individual outcomes, since the probability of any single outcome is zero; the probability of an interval is obtained by integrating the pdf over it. Examples include the Normal, Exponential, and Uniform distributions.
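For readers who want to explore these distributions numerically, SciPy exposes them through scipy.stats (assuming SciPy is installed); the short sketch below evaluates a Binomial pmf and a Normal pdf, and shows how interval probabilities in the continuous case come from the cdf rather than the pdf:

```python
from scipy import stats

# Discrete: pmf of a Binomial(n=10, p=0.5) random variable
# (e.g., the number of heads in 10 fair coin flips).
print(stats.binom.pmf(5, n=10, p=0.5))   # P(X = 5) ≈ 0.246

# Continuous: the pdf of a standard Normal at x = 0 is a density, not a
# probability; probabilities come from integrating (here via the cdf).
print(stats.norm.pdf(0))                          # density ≈ 0.399
print(stats.norm.cdf(1) - stats.norm.cdf(-1))     # P(-1 < X < 1) ≈ 0.683
```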
An empirical probability distribution is a probability distribution that is derived from observed data. It’s essentially a way to estimate the underlying probability distribution of a random variable based on a sample of observations, rather than knowing the true distribution.
The empirical probability of an event is the ratio of the number of occurrences of the event to the total number of observations. If we have a data set containing n observations, and an event occurs k times, then the empirical probability of the event is k/n.
Let’s understand this with an example.
Example:
Suppose a company tracks the number of customer support calls it receives each day over 30 days, and the data is as follows:
Number of Calls: [20, 22, 21, 23, 20, 22, 21, 24, 23, 23, 20, 21, 22, 23, 24, 24, 21, 20, 22, 23, 23, 21, 20, 20, 22, 24, 23, 23, 22, 21]
To construct an empirical probability distribution, we count the frequency of each unique value, then divide by the total number of observations.
Here are the frequencies:

- 20 calls: 6 times
- 21 calls: 6 times
- 22 calls: 6 times
- 23 calls: 8 times
- 24 calls: 4 times

So, the empirical probability distribution would be:

- P(20 calls) = 6/30 = 0.2
- P(21 calls) = 6/30 = 0.2
- P(22 calls) = 6/30 = 0.2
- P(23 calls) = 8/30 ≈ 0.267
- P(24 calls) = 4/30 ≈ 0.133
This distribution provides a simple summary of the data, allowing the company to see that the most common number of calls is 23 and to assess the relative frequencies of the different call volumes. Note that the probabilities sum to 1 (or very close to 1, allowing for rounding), as should be the case in a probability distribution.
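Constructing such a distribution programmatically is straightforward with collections.Counter; the sketch below reproduces the call-volume table from the raw data:

```python
from collections import Counter

calls = [20, 22, 21, 23, 20, 22, 21, 24, 23, 23, 20, 21, 22, 23, 24,
         24, 21, 20, 22, 23, 23, 21, 20, 20, 22, 24, 23, 23, 22, 21]

# Count the frequency of each unique value, then divide by the total
# number of observations to get the empirical probabilities.
freq = Counter(calls)
n = len(calls)
for value in sorted(freq):
    print(f"P({value} calls) = {freq[value]}/{n} = {freq[value] / n:.3f}")
```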
Two important characteristics of a probability distribution are its expectation (mean) and variance.
The expected value of a random variable is the long-run average value of many independent repetitions of the experiment. It can be considered as the “center” of the distribution.
The variance measures the dispersion of the random variable around the expected value. The square root of the variance is the standard deviation, which measures the typical deviation from the mean in the original units of the variable.
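Continuing the call-volume example, the sketch below computes the expectation, variance, and standard deviation directly from the empirical distribution, using the definitions E[X] = Σ x·P(x) and Var(X) = Σ (x − E[X])²·P(x):

```python
# Empirical distribution from the call-volume example above.
pmf = {20: 6/30, 21: 6/30, 22: 6/30, 23: 8/30, 24: 4/30}

# Expected value: E[X] = sum of x * P(x)
mean = sum(x * p for x, p in pmf.items())

# Variance: Var(X) = E[(X - E[X])^2] = sum of (x - mean)^2 * P(x)
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())
std_dev = variance ** 0.5

print(f"E[X]   = {mean:.3f}")      # ≈ 21.933
print(f"Var(X) = {variance:.3f}")  # ≈ 1.796
print(f"SD(X)  = {std_dev:.3f}")   # ≈ 1.340
```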
These concepts provide the basis for more advanced topics in probability theory, such as joint distributions, correlation, covariance, moments, and characteristic functions, among others.
Understanding and applying the principles of probability are essential for many disciplines, including statistics, physics, computer science, engineering, economics, and finance, among others. It’s crucial to acknowledge that although these tools allow us to quantify uncertainty, they don’t eliminate it. The wise application of these principles in real-world scenarios often involves thoughtful consideration of the assumptions being made and a careful interpretation of results.
Probability is a mathematical framework that quantifies the uncertainty or chance of an event occurring. Fundamental to this concept are the notions of experiments, sample spaces, and events. Probability rests on three key axioms: non-negativity, certainty, and additivity. Basic rules of probability include calculating the union of events, conditional probability, and the concept of independence.
Bayes’ Theorem provides a method to update the probability of a hypothesis based on evidence. Probabilities can be derived theoretically, assuming each outcome is equally likely, or empirically, based on the relative frequency of outcomes in empirical data.
The Law of Large Numbers states that as the number of trials increases, the empirical probability converges to the theoretical probability. Probability distributions can be discrete or continuous and are characterized by their probability mass function (pmf) or probability density function (pdf) respectively. The characteristics of a probability distribution are often described by its expectation (mean) and variance. Understanding probability is crucial across various fields including statistics, computer science, physics, and economics.