Computational Learning Theory

Under what conditions is a learning algorithm guaranteed to perform well?

Probably Approximately Correct

The core philosophy of Probably Approximately Correct (PAC) is that we cannot expect a learner to learn a concept perfectly every single time (because we rely on a random sample of data). Instead, we aim for a learner that will probably (with high confidence) produce a hypothesis that is approximately correct (has very low error).

Formal definitions:
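
The heading calls for formal definitions; the statement below is the standard PAC formulation (my own restatement for reference, using the same ϵ and δ that appear later in this section):

```latex
% PAC-learnability (standard formulation).
% A hypothesis class H is PAC-learnable if there is an algorithm A and a
% polynomial sample bound m(1/\epsilon, 1/\delta) such that for every target
% concept, every data distribution D, and every \epsilon, \delta \in (0, 1),
% running A on m i.i.d. samples returns a hypothesis h satisfying
\Pr\big[\, \mathrm{error}_D(h) \le \epsilon \,\big] \;\ge\; 1 - \delta
% "Approximately correct": error at most \epsilon.
% "Probably": holds with probability at least 1 - \delta over the random sample.
```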

A major goal of PAC theory is determining the bound on m (the number of training samples) required to guarantee learning.

There is a fundamental trade-off between the number of samples m, the allowable error ϵ, and the confidence 1 − δ. To get higher confidence or lower error, you typically need more data.
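
For the special case of a finite hypothesis space and a learner that outputs a hypothesis consistent with the training data, a standard bound (given here as an illustration, not necessarily the exact bound intended in these notes) makes the trade-off explicit:

```latex
% Sample complexity for a finite hypothesis space H with a consistent learner:
m \;\ge\; \frac{1}{\epsilon}\left( \ln|H| + \ln\frac{1}{\delta} \right)
% Demanding a smaller error \epsilon or a smaller failure probability \delta
% (i.e. higher confidence 1 - \delta) both push the required m upward.
```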

No Free Lunch Theorem: Averaged across all possible problems, no single learner outperforms any other. A learner only performs well if its hypothesis space is suited to the problem.

Weak vs Strong Learners

A Strong Learner (PAC-learner) is a learner that can achieve an arbitrarily small error ϵ with arbitrarily high confidence 1 − δ.

A Weak Learner is a learner that performs only slightly better than random guessing with some fixed error rate and a fixed confidence.
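
Stated a little more formally (a standard way of writing the contrast; the edge γ is introduced here for illustration and is not named in the original notes):

```latex
% Strong (PAC) learner: for ANY \epsilon, \delta \in (0, 1/2),
\Pr\big[\, \mathrm{error}(h) \le \epsilon \,\big] \ge 1 - \delta
% Weak learner: only for SOME fixed edge \gamma > 0 and some fixed \delta_0 < 1,
\Pr\big[\, \mathrm{error}(h) \le \tfrac{1}{2} - \gamma \,\big] \ge 1 - \delta_0
```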

It has been mathematically proven that if a "weak" learner exists, it can be "boosted" into a strong learner. This means that strict PAC requirements are not always necessary; we just need a learner that is slightly better than a coin toss.

The original idea behind boosting involves resampling the training data (a rough code sketch follows the list below):

  1. Train a weak learner
  2. Identify the objects that were misclassified.
  3. Repeat 1-2 with the misclassified objects, some number of times.
  4. The final prediction is made by a majority vote of the collection of weak learners.
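
A minimal sketch of this resample-and-vote loop in Python, using scikit-learn decision stumps as the weak learner. The stump choice, the n_rounds parameter, and the helper names are assumptions for illustration, not part of the original notes:

```python
# Sketch of the original resampling idea behind boosting (illustrative only).
# Assumptions: binary labels y in {-1, +1}, scikit-learn available, and a
# decision stump (depth-1 tree) as the weak learner.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def boost_by_resampling(X, y, n_rounds=5):
    """Train weak learners, each focused on the points the previous one missed."""
    learners = []
    X_cur, y_cur = X, y
    for _ in range(n_rounds):
        # 1. Train a weak learner on the current (re)sampled data.
        stump = DecisionTreeClassifier(max_depth=1).fit(X_cur, y_cur)
        learners.append(stump)
        # 2. Identify the objects that were misclassified.
        missed = stump.predict(X_cur) != y_cur
        if not missed.any() or len(np.unique(y_cur[missed])) < 2:
            break  # nothing left to correct (or only one class remains)
        # 3. Repeat with the misclassified objects only.
        X_cur, y_cur = X_cur[missed], y_cur[missed]
    return learners

def predict_majority(learners, X):
    # 4. Final prediction: majority vote over the collection of weak learners.
    votes = np.sum([h.predict(X) for h in learners], axis=0)
    return np.where(votes >= 0, 1, -1)
```

With X as an (n, d) feature array and y in {-1, +1}, predict_majority(boost_by_resampling(X, y), X) gives the ensemble's vote.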

Adaptive Boosting (AdaBoost)

AdaBoost is a constructive algorithm designed to deliver on the theoretical promise of boosting: turning a weak learner into a strong PAC learner. Instead of resampling the data like the original boosting procedure, AdaBoost works by explicitly re-weighting the training instances.

The full algorithm:

  1. Initialize Weights: Start by assigning every training object an equal weight w_i = 1.
  2. Train Weak Classifier: Train a new classifier f_K that minimizes the weighted error ϵ_K on the training set. This forces the learner to focus on objects with high weights (those that were previously misclassified).