Deep Learning Specialization - Coursera
Introduction to Word Embeddings


Word embeddings are a way of representing words so that a machine can understand them and extract information from them.

Instead of using one-hot vectors to represent words, we use a featurized representation in which each entry is a score for the word having a certain feature. This is because the dot product of any two distinct one-hot vectors is zero, so one-hot representations give us no way to compare words using their vectors.

For example, the following are the word embeddings for man, woman, king, queen, apple and orange:

The above matrix is known as the embedding matrix.
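
A minimal NumPy sketch of such an embedding matrix (the feature scores below are made-up illustrative values, not the actual numbers from the lecture), contrasted with one-hot vectors:

```python
import numpy as np

vocab = ["man", "woman", "king", "queen", "apple", "orange"]

# One-hot vectors: the dot product of any two distinct words is zero,
# so the representation carries no notion of similarity.
one_hot = np.eye(len(vocab))
print(one_hot[vocab.index("king")] @ one_hot[vocab.index("queen")])   # 0.0

# A tiny featurized embedding matrix (columns = words, rows = hypothetical
# features such as gender, royalty, food). Illustrative values only.
E = np.array([
    #  man   woman  king   queen  apple  orange
    [-1.00,  1.00, -0.95,  0.97,  0.00,  0.01],   # "gender"
    [ 0.01,  0.02,  0.93,  0.95, -0.01,  0.00],   # "royal"
    [ 0.09,  0.01,  0.02,  0.01,  0.95,  0.97],   # "food"
])

# Related words now have similar columns, so comparing their vectors is meaningful.
e_king  = E[:, vocab.index("king")]
e_queen = E[:, vocab.index("queen")]
print(e_king @ e_queen)   # clearly non-zero, unlike the one-hot case
```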

Each column is a word vector, so the number of columns equals the vocabulary size and the number of rows is the embedding (vector) size. These vectors can be learned from a large text corpus, and pretrained embeddings can be reused on a new task via transfer learning. More on learning word embeddings in the next sub-section.

Note that the word vector for a word can be obtained by multiplying the embedding matrix with the one-hot vector of the word.
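A small sketch of that lookup, using a made-up embedding matrix E:

```python
import numpy as np

vocab = ["man", "woman", "king", "queen", "apple", "orange"]
E = np.random.randn(3, len(vocab))   # embedding matrix: (vector size) x (vocab size)

j = vocab.index("queen")
o_j = np.zeros(len(vocab))
o_j[j] = 1.0                         # one-hot vector for "queen"

e_j = E @ o_j                        # the matrix-vector product selects column j
assert np.allclose(e_j, E[:, j])     # in practice a direct column lookup is used,
                                     # since the full multiplication is wasteful
```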

It is interesting to know that King − Man + Woman ≈ Queen!

Also, cosine similarity can be used to estimate the similarity between two vectors. The result ranges from -1 (opposite) to 1 (very similar).

cosine_similarity(u, v) = \frac{u \cdot v}{|u|\,|v|}
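
A minimal implementation of cosine similarity, used here to check the King − Man + Woman ≈ Queen analogy on hypothetical embeddings (illustrative values only):

```python
import numpy as np

def cosine_similarity(u, v):
    """Ranges from -1 (opposite directions) to 1 (same direction)."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Hypothetical 3-dimensional embeddings (illustrative values only).
e = {
    "man":    np.array([-1.00,  0.01, 0.09]),
    "woman":  np.array([ 1.00,  0.02, 0.01]),
    "king":   np.array([-0.95,  0.93, 0.02]),
    "queen":  np.array([ 0.97,  0.95, 0.01]),
    "apple":  np.array([ 0.00, -0.01, 0.95]),
    "orange": np.array([ 0.01,  0.00, 0.97]),
}

# King - Man + Woman should land closest to Queen.
target = e["king"] - e["man"] + e["woman"]
candidates = {w: v for w, v in e.items() if w not in ("king", "man", "woman")}
best = max(candidates, key=lambda w: cosine_similarity(target, candidates[w]))
print(best)   # expected: queen
```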

The t-SNE algorithm lets us create a 2D visualization of word vectors, which helps reveal words that group together.
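
A sketch of such a visualization with scikit-learn's TSNE (the embedding matrix here is a random stand-in, so this particular plot will not show meaningful clusters):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

vocab = ["man", "woman", "king", "queen", "apple", "orange"]
E = np.random.randn(50, len(vocab))   # stand-in embedding matrix (50-d vectors)

# t-SNE expects one row per sample, so transpose to (vocab size, vector size).
# perplexity must be smaller than the number of points.
points = TSNE(n_components=2, perplexity=3, random_state=0).fit_transform(E.T)

plt.scatter(points[:, 0], points[:, 1])
for (x, y), word in zip(points, vocab):
    plt.annotate(word, (x, y))
plt.show()
```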

An interesting use case for word embeddings is Named Entity Recognition. Even if a word never appeared in our training set, its embedding lets us compare it to words we have seen and infer its meaning and role in a given context.
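
A rough sketch of that idea: borrow the label of the nearest known word by embedding similarity (hypothetical pretrained vectors, illustrative values only):

```python
import numpy as np

def cosine_similarity(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Labels learned from a small training set (illustrative embeddings).
labelled = {
    "apple":  ("fruit",  np.array([ 0.00, -0.01, 0.95])),
    "orange": ("fruit",  np.array([ 0.01,  0.00, 0.97])),
    "king":   ("person", np.array([-0.95,  0.93, 0.02])),
}

# "durian" never appeared in training, but a pretrained embedding is available for it.
durian = np.array([0.02, -0.02, 0.91])   # illustrative value

# Treat the unseen word like its most similar known neighbour.
nearest = max(labelled, key=lambda w: cosine_similarity(durian, labelled[w][1]))
print(labelled[nearest][0])              # expected: fruit
```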