CS-GY 6923: Machine Learning

Reinforcement Learning

In this kind of learning, the computer learns to choose the best course of action in a given situation, with the aim of maximizing a reward. It makes use of continual feedback to judge the correctness of its decisions.

One issue is that this feedback is not instant: it can be delayed.

Credit Assignment is another issue: how do we determine which action should be given the credit (or blame) for the outcome?

State changes are an issue as well, because the state of the problem changes after every action is performed.

Reinforcement Learning is commonly used to learn to play games such as Chess, Checkers, and Go.

Usually, a grid world is used with $(s_t, a_t)$ pairs, i.e. the state at time $t$ and the action taken at time $t$. A policy tells the learner what action to perform in a given state. The best policy is the one that maximizes the reward. The aim of Reinforcement Learning, therefore, is to learn this optimal policy.
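
In the usual notation (an assumption here, since the notes keep this verbal), a policy is simply a mapping from states to actions:

$$\pi : S \to A, \qquad a_t = \pi(s_t)$$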

The Exploration-Exploitation Trade-off: a cost is incurred while exploring for better alternatives instead of exploiting the best action found so far. This trade-off is exemplified by the Bandit Problem.
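
As a small illustration of this trade-off (not from the lecture), here is an $\varepsilon$-greedy strategy for a multi-armed bandit sketched in Python; the arm reward means, the Gaussian reward noise, and $\varepsilon = 0.1$ are made-up values for the example.

```python
import random

def epsilon_greedy_bandit(arm_means, epsilon=0.1, n_steps=1000, seed=0):
    """Epsilon-greedy strategy: explore with probability epsilon, else exploit."""
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms           # number of pulls per arm
    estimates = [0.0] * n_arms      # running average reward per arm
    total_reward = 0.0
    for _ in range(n_steps):
        if rng.random() < epsilon:                      # explore: try a random arm
            a = rng.randrange(n_arms)
        else:                                           # exploit: best arm seen so far
            a = max(range(n_arms), key=lambda i: estimates[i])
        r = rng.gauss(arm_means[a], 1.0)                # noisy reward from the chosen arm
        counts[a] += 1
        estimates[a] += (r - estimates[a]) / counts[a]  # incremental mean update
        total_reward += r
    return estimates, total_reward

# Hypothetical arm means: arm 2 pays best, but it has to be found by exploring.
print(epsilon_greedy_bandit([0.2, 0.5, 0.8]))
```

With $\varepsilon = 0$ the agent never explores and can get stuck on a mediocre arm; increasing $\varepsilon$ pays an exploration cost on every step but makes finding the best arm more likely.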

Note that it isn't necessary to maximize the reward at each individual step; what matters is maximizing the total reward accumulated by the end.
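
Concretely, under the usual discounted formulation (assumed here, since the notes do not write it out), the quantity being maximized from time $t$ onward is the cumulative discounted reward

$$R_t = r_{t+1} + \gamma\, r_{t+2} + \gamma^2\, r_{t+3} + \cdots = \sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k+1}, \qquad 0 \le \gamma < 1.$$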

$Q^*(s,a)$ is the maximum discounted reward that can be obtained if the learner started in state $s$ and did action $a$ as its first step (the $*$ denotes optimal).
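
For a deterministic world, this definition can equivalently be written recursively (the standard Bellman optimality form; the symbols $\delta(s,a)$ for the next state and $r(s,a)$ for the immediate reward are assumed here, not defined in the notes):

$$Q^*(s,a) = r(s,a) + \gamma \max_{a'} Q^*\big(\delta(s,a),\, a'\big)$$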

Steps (Q-Learning for Deterministic Worlds)

For each $(s,a)$, initialize $\hat{Q}(s,a) = 0$. Observe the current state $s$. Do forever:

  • select an action $a$ and execute it

  • receive reward $r$

  • observe the new state $s'$

  • update the estimate of $\hat{Q}(s,a)$ as follows: $\hat{Q}(s,a) = r + \gamma \max_{a'} \hat{Q}(s', a')$

    ($\gamma$ is used for discounting the reward; the max term is the maximum discounted reward starting in $s'$, assuming $\hat{Q} = Q^*$)

  • set $s \leftarrow s'$ and repeat
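
Below is a minimal sketch of these steps in Python for a made-up $3 \times 3$ deterministic grid world; the grid layout, the goal reward of 100, $\gamma = 0.9$, and the purely random action selection are illustrative assumptions, not part of the notes.

```python
import random

# Hypothetical 3x3 deterministic grid world: states are (row, col) cells, actions
# move the agent one cell; entering the goal cell gives reward 100, everything else 0.
N, GOAL, GAMMA = 3, (2, 2), 0.9
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def step(state, action):
    """Deterministic transition: move within the grid, stay put on hitting a wall."""
    if state == GOAL:                        # goal is absorbing, no further reward
        return state, 0
    dr, dc = ACTIONS[action]
    nr, nc = state[0] + dr, state[1] + dc
    if not (0 <= nr < N and 0 <= nc < N):    # bumped into a wall
        nr, nc = state
    return (nr, nc), (100 if (nr, nc) == GOAL else 0)

# Tabular Q-learning for deterministic worlds, following the steps above.
Q = {((r, c), a): 0.0 for r in range(N) for c in range(N) for a in ACTIONS}
rng = random.Random(0)
s = (0, 0)                                   # observe current state s
for _ in range(20000):                       # "do forever", truncated for the sketch
    a = rng.choice(list(ACTIONS))            # select an action a and execute it
    s_next, r = step(s, a)                   # receive reward r, observe new state s'
    Q[(s, a)] = r + GAMMA * max(Q[(s_next, a2)] for a2 in ACTIONS)   # update rule
    # set s <- s' (restart from a random cell once the goal has been reached)
    s = s_next if s != GOAL else (rng.randrange(N), rng.randrange(N))

# Greedy policy read off from the learned Q-values.
policy = {(r, c): max(ACTIONS, key=lambda act: Q[((r, c), act)])
          for r in range(N) for c in range(N)}
print(policy[(0, 0)])                        # heads toward the goal, e.g. 'down'
```

With enough iterations $\hat{Q}$ converges to $Q^*$ on this toy world, and the greedy policy read off from it points toward the goal from every cell.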
