Below is a solution guide and key discussion points for instructors, aligned with the study questions and practice exercises provided earlier. This guide can assist with grading, classroom facilitation, or developing lecture supplements.
Bias-Variance Tradeoff
Bagging reduces variance by averaging over multiple unstable learners trained on bootstrapped samples. It is especially effective when base learners overfit. However, because all learners share a similar bias, bagging does little to reduce bias. Boosting primarily reduces bias (and often variance as well) by adaptively focusing on hard-to-predict examples, correcting the underfitting and systematic errors of earlier models.
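A quick simulation can make the variance-reduction argument concrete. The sketch below is illustrative rather than part of the exercises: it fits a single deep tree and a bagged average of trees to repeated training samples and compares the variance of their predictions at one test point.

```r
# Minimal sketch (assumed simulated data): variance of a single deep tree
# vs. a bagged average, measured across repeated training sets.
library(rpart)

set.seed(1)
make_data <- function(n = 200) {
  x <- runif(n, -3, 3)
  data.frame(x = x, y = sin(x) + rnorm(n, sd = 0.3))
}
x_test <- data.frame(x = 0.5)          # single test point for illustration

single_preds <- bagged_preds <- numeric(50)
for (r in 1:50) {
  train <- make_data()
  # Single unpruned tree: low bias, high variance
  single_preds[r] <- predict(rpart(y ~ x, train, cp = 0), x_test)
  # Bagging: average 25 trees fit to bootstrap resamples of the same data
  boot_fits <- replicate(25, {
    idx <- sample(nrow(train), replace = TRUE)
    predict(rpart(y ~ x, train[idx, ], cp = 0), x_test)
  })
  bagged_preds[r] <- mean(boot_fits)
}
var(single_preds)   # variance of the single tree across training sets
var(bagged_preds)   # typically noticeably smaller for the bagged average
```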
Condorcet Jury Theorem
The theorem states that if each voter (or model) is better than random and votes independently, then the majority vote becomes increasingly accurate as the group size grows. In ensemble learning, this implies that many weak learners (with accuracy > 0.5) can together produce a strong classifier—assuming their errors are not highly correlated. In practice, models are rarely independent, so diversity must be introduced intentionally.
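For a concrete illustration, the majority-vote accuracy of n independent classifiers, each correct with probability p, follows directly from the binomial distribution. The snippet below is a minimal sketch of that calculation (odd n avoids ties).

```r
# Majority-vote accuracy for n independent classifiers, each correct with
# probability p: P(more than n/2 are correct).
majority_acc <- function(n, p) 1 - pbinom(floor(n / 2), n, p)

sapply(c(1, 5, 25, 101), majority_acc, p = 0.6)
# Accuracy climbs toward 1 as n grows, as the theorem predicts --
# but only because these hypothetical voters are truly independent.
```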
Homogeneous vs. Heterogeneous Ensembles
Homogeneous ensembles (e.g., bagging, boosting with trees) are easier to implement and tune but rely on induced randomness. Heterogeneous ensembles (e.g., stacking) can combine different perspectives (e.g., linear vs. non-linear models) and tend to perform better on complex tasks. However, they are harder to interpret and manage.
Boosting Strategy
The learning rate (shrinkage) controls how much each tree contributes; a lower value improves generalization but requires more trees. Too many trees or too high a learning rate can lead to overfitting. Early stopping and cross-validation are important for regularization.
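A minimal sketch of this tradeoff, assuming the gbm package and the Boston housing data (from MASS) that the feature-importance exercise appears to use: a small shrinkage value paired with a generous number of trees, with cross-validation choosing the stopping point.

```r
# Low learning rate + many trees, with CV-based early stopping (sketch).
library(gbm)
library(MASS)                     # Boston housing data

set.seed(42)
fit <- gbm(medv ~ ., data = Boston,
           distribution = "gaussian",
           n.trees = 5000,        # deliberately generous upper bound
           shrinkage = 0.01,      # low learning rate
           interaction.depth = 3,
           cv.folds = 5)

best_iter <- gbm.perf(fit, method = "cv", plot.it = FALSE)
best_iter                         # number of trees actually needed
```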
Meta-Learning and Overfitting
Training the meta-learner on out-of-fold predictions ensures that it learns from unbiased estimates of base model performance. If the same data used to train the base learners is used for the meta-learner, the model can overfit to training errors rather than learn true generalization patterns.
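The sketch below (hypothetical fold assignment, one base model shown) illustrates how out-of-fold predictions are generated so that the meta-learner never sees a base model's in-sample fit.

```r
# Generating out-of-fold predictions for one base model (sketch).
library(MASS)        # Boston data, as in the exercises

set.seed(7)
k <- 5
folds <- sample(rep(1:k, length.out = nrow(Boston)))
oof_lm <- numeric(nrow(Boston))   # out-of-fold predictions from one base model

for (i in 1:k) {
  train <- Boston[folds != i, ]
  test  <- Boston[folds == i, ]
  oof_lm[folds == i] <- predict(lm(medv ~ ., data = train), newdata = test)
}

# The meta-learner is then trained on columns like oof_lm (one per base model)
# against the true target, rather than on in-sample fitted values.
```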
Comparison of Methods
Boosting is more powerful when the base model underfits (e.g., shallow trees) and when the data has complex structure or imbalanced classes. Bagging is preferable when base learners overfit and are sensitive to training data variation. Bagging is also easier to parallelize.
Model Evaluation
Cross-validation provides a general-purpose error estimate suitable for any model, including ensembles. Out-of-bag (OOB) error is specific to bagging/random forests and is faster to compute, but it may be biased if models are not sufficiently diverse. OOB is not available in boosting or stacking.
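As a rough illustration, the sketch below (assuming the randomForest package and the Boston data) compares the OOB estimate with a 5-fold cross-validated estimate of the same model; the two should land close together.

```r
# OOB error vs. cross-validated error for a random forest (sketch).
library(randomForest)
library(MASS)

set.seed(1)
rf <- randomForest(medv ~ ., data = Boston, ntree = 500)
rf$mse[rf$ntree]                  # OOB mean squared error, essentially free

k <- 5
folds <- sample(rep(1:k, length.out = nrow(Boston)))
cv_mse <- sapply(1:k, function(i) {
  fit <- randomForest(medv ~ ., data = Boston[folds != i, ], ntree = 500)
  mean((predict(fit, Boston[folds == i, ]) - Boston$medv[folds == i])^2)
})
mean(cv_mse)                      # typically close to the OOB estimate
```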
Random Forest Tuning
Expect students to try combinations of ntree = 100, 200, 500 and mtry = 2, 4, 6, sqrt(p). Results will show lower OOB error with more trees and an appropriate mtry. Students should plot OOB error and select the hyperparameters with the lowest error.
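A possible grid-search sketch, assuming the randomForest package and the Boston housing data used in the GBM exercise:

```r
# Grid of ntree and mtry values scored by OOB error (sketch).
library(randomForest)
library(MASS)

p <- ncol(Boston) - 1
grid <- expand.grid(ntree = c(100, 200, 500),
                    mtry  = unique(c(2, 4, 6, floor(sqrt(p)))))

set.seed(1)
grid$oob_mse <- apply(grid, 1, function(g) {
  fit <- randomForest(medv ~ ., data = Boston,
                      ntree = g["ntree"], mtry = g["mtry"])
  fit$mse[fit$ntree]              # OOB MSE at the final tree
})
grid[order(grid$oob_mse), ]       # best combinations first
```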
AdaBoost Simulation
Students build a loop that reweights samples after each iteration. Training error should decrease steadily, but test error may plateau or increase if boosting continues beyond the optimal round, which illustrates why early stopping is useful.
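One possible shape for the reweighting loop, sketched with rpart decision stumps on simulated data with labels in {-1, +1}; the exact weak learner and data are up to the student.

```r
# Hand-rolled AdaBoost loop with decision stumps (sketch).
library(rpart)

set.seed(1)
n <- 300
x1 <- runif(n); x2 <- runif(n)
y  <- ifelse(x1 + x2 + rnorm(n, sd = 0.2) > 1, 1, -1)
dat <- data.frame(x1, x2, y = factor(y))

M <- 50
w <- rep(1 / n, n)                 # uniform starting weights
alphas <- numeric(M)
stumps <- vector("list", M)

for (m in 1:M) {
  stumps[[m]] <- rpart(y ~ x1 + x2, data = dat, weights = w,
                       control = rpart.control(maxdepth = 1))
  pred <- as.numeric(as.character(predict(stumps[[m]], dat, type = "class")))
  err  <- sum(w * (pred != y)) / sum(w)
  alphas[m] <- 0.5 * log((1 - err) / err)
  w <- w * exp(alphas[m] * (pred != y))   # upweight misclassified points
  w <- w / sum(w)
}

# Ensemble prediction: weighted vote of the stumps
scores <- Reduce(`+`, lapply(1:M, function(m) {
  alphas[m] * as.numeric(as.character(predict(stumps[[m]], dat, type = "class")))
}))
mean(sign(scores) == y)            # training accuracy of the boosted ensemble
```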
Feature Importance with GBM
Students should identify features like lstat, rm, and crim as highly influential for predicting housing prices. Comparing with random forest may show similar rankings, but GBM might favor fewer features more heavily.
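A sketch of the comparison, assuming the Boston data from MASS and illustrative hyperparameters:

```r
# GBM relative influence vs. random forest importance (sketch).
library(gbm)
library(randomForest)
library(MASS)

set.seed(1)
gbm_fit <- gbm(medv ~ ., data = Boston, distribution = "gaussian",
               n.trees = 2000, shrinkage = 0.01, interaction.depth = 3)
summary(gbm_fit, plotit = FALSE)   # relative influence; lstat and rm usually lead

rf_fit <- randomForest(medv ~ ., data = Boston, importance = TRUE)
importance(rf_fit)                 # %IncMSE / IncNodePurity for comparison
```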
Stacking with caretEnsemble
Expect the stacked model to outperform individual models in accuracy or ROC AUC. Students should interpret meta-learner coefficients or performance metrics and describe how the ensemble improves on weak learners.
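A sketch of the caretEnsemble workflow; the Sonar data from mlbench is used here purely as an illustrative binary-classification problem, and the base-learner choices are examples rather than requirements.

```r
# Base learners via caretList, GLM meta-learner via caretStack (sketch).
library(caret)
library(caretEnsemble)
library(mlbench)
data(Sonar)

set.seed(1)
ctrl <- trainControl(method = "cv", number = 5,
                     savePredictions = "final",
                     classProbs = TRUE,
                     summaryFunction = twoClassSummary)

# Diverse base learners trained on identical resampling folds
base_models <- caretList(Class ~ ., data = Sonar,
                         trControl = ctrl, metric = "ROC",
                         methodList = c("glm", "rpart", "knn"))

# Meta-learner stacked on the base models' out-of-fold predictions
stack <- caretStack(base_models, method = "glm", metric = "ROC",
                    trControl = trainControl(method = "cv", number = 5,
                                             classProbs = TRUE,
                                             summaryFunction = twoClassSummary))
print(stack)   # compare the stacked ROC with each base model's resampled ROC
```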
XGBoost Experimentation
Lower eta values will usually improve test performance if combined with early stopping and sufficient nrounds. Students should create error curves across nrounds and identify optimal depth–learning rate combinations.
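A sketch of the comparison using xgb.cv, assuming the Boston data and a squared-error objective; the eta grid and max_depth are illustrative.

```r
# Cross-validated error curves for several eta values, with early stopping (sketch).
library(xgboost)
library(MASS)

X <- as.matrix(Boston[, setdiff(names(Boston), "medv")])
dtrain <- xgb.DMatrix(X, label = Boston$medv)

set.seed(1)
results <- lapply(c(0.3, 0.1, 0.01), function(eta) {
  cv <- xgb.cv(params = list(objective = "reg:squarederror",
                             eta = eta, max_depth = 4),
               data = dtrain, nrounds = 2000, nfold = 5,
               early_stopping_rounds = 50, verbose = 0)
  data.frame(eta = eta,
             best_iteration = cv$best_iteration,
             test_rmse = min(cv$evaluation_log$test_rmse_mean))
})
do.call(rbind, results)   # lower eta needs more rounds for similar or better RMSE
```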
Error Correlation Analysis
Students should compute residuals or prediction errors and calculate correlation matrices. Highly correlated base models offer limited ensemble benefit. Discuss how stacking can help when base model errors are weakly correlated.
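A minimal sketch, assuming the Boston data and two simple base models (a linear model and a random forest), of how residual correlation can be computed on a held-out split:

```r
# Correlation of base-model residuals on held-out data (sketch).
library(randomForest)
library(MASS)

set.seed(1)
idx   <- sample(nrow(Boston), 400)
train <- Boston[idx, ]; test <- Boston[-idx, ]

res_lm <- test$medv - predict(lm(medv ~ ., data = train), test)
res_rf <- test$medv - predict(randomForest(medv ~ ., data = train), test)

cor(res_lm, res_rf)   # near 1 means the models make the same mistakes;
                      # weakly correlated errors leave room for stacking to help
```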