Word Window Classification and Neural Networks
Overview
- Classification background
- Updating word vectors for classification
- Window classification & cross entropy derivation tips
- A single layer neural network
- Max-margin loss and backprop
So far, the Word2Vec methods we have seen (Skip-gram & CBOW) have been unsupervised; this lecture turns to supervised classification.
Goal: given a training dataset $\{x_i, y_i\}_{i=1}^N$ of inputs $x_i$ and labels $y_i$, learn to predict the correct class for each input.
Details of softmax
$$p(y \mid x) = \frac{\exp(f_y)}{\sum_{c=1}^{C} \exp(f_c)}$$
where $f_y = W_{y\cdot}\, x$ is the unnormalized output (score) for class $y$.
For each training example $\{x, y\}$, our objective is to maximize the probability of the correct class $y$.
Hence, we minimize the negative log probability of that class: $-\log p(y \mid x)$.
Maximizing the probability is equivalent to minimizing the negative log probability.
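A minimal NumPy sketch of the softmax probability and the per-example cross-entropy loss described above (the function and variable names are mine, not from the lecture):

```python
import numpy as np

def softmax_cross_entropy(W, x, y):
    """Cross-entropy loss for one example (x, y) with weight matrix W of shape (C, d)."""
    f = W @ x                         # unnormalized scores f_c = W_c . x
    f -= f.max()                      # shift for numerical stability
    p = np.exp(f) / np.exp(f).sum()   # softmax: p(c | x)
    return -np.log(p[y])              # negative log probability of the correct class

# toy usage: 3 classes, 5-dimensional input
W = np.random.randn(3, 5)
x = np.random.randn(5)
print(softmax_cross_entropy(W, x, y=1))
```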
Regularization term
- The really full loss function over any dataset includes regularization over all parameters $\theta$:
  $$J(\theta) = \frac{1}{N}\sum_{i=1}^{N} -\log\frac{e^{f_{y_i}}}{\sum_{c=1}^{C} e^{f_c}} + \lambda \sum_k \theta_k^2$$
- Regularization prevents overfitting when we have a lot of features (or, later, a very powerful/deep model)
[Figure: as the model becomes more powerful or trains for more iterations, training error (blue) keeps falling while test error (red) eventually rises — the gap is overfitting, which regularization controls.]
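A sketch of the full regularized objective above, assuming a plain softmax classifier whose only parameters are the weight matrix `W` (names and shapes are illustrative):

```python
import numpy as np

def regularized_loss(W, X, Y, lam=1e-3):
    """Mean cross-entropy over the dataset plus an L2 penalty on all parameters."""
    F = X @ W.T                                   # (N, C) unnormalized scores
    F -= F.max(axis=1, keepdims=True)             # numerical stability
    P = np.exp(F) / np.exp(F).sum(axis=1, keepdims=True)
    data_loss = -np.log(P[np.arange(len(Y)), Y]).mean()
    reg_loss = lam * np.sum(W ** 2)               # regularization term over all parameters
    return data_loss + reg_loss

# toy usage: 10 examples, 5 features, 3 classes
X = np.random.randn(10, 5)
Y = np.random.randint(0, 3, 10)
W = np.random.randn(3, 5)
print(regularized_loss(W, X, Y))
```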
Window Classification
- Classifying single words is rarely done
- ambiguity (antonyms, ambiguous named entities, etc.) only resolves in context
- So, we need window classification
Window classification means classifying a word within a context window of its neighboring words.
- Many possibilities exist for classifying one word in context, e.g. averaging all the word vectors in a window, but that loses position information
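Instead of averaging, the approach used below concatenates the word vectors in the window into one long vector; a minimal sketch, with illustrative window size, dimensions, and names:

```python
import numpy as np

def window_vector(word_vectors, center, window=2):
    """Concatenate the center word's vector with its neighbors' vectors."""
    idxs = range(center - window, center + window + 1)
    return np.concatenate([word_vectors[i] for i in idxs])  # length (2*window + 1) * d

# toy sentence of 7 words with d = 4 dimensional vectors
vecs = [np.random.randn(4) for _ in range(7)]
x_window = window_vector(vecs, center=3)   # classify the word at position 3 in context
print(x_window.shape)                      # (20,)
```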
Updating concatenated word vectors
See the lecture video and work through the derivation yourself; it isn't difficult.
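The key point of updating concatenated word vectors is that the gradient with respect to the window vector splits back into one chunk per word, and each word vector in the window gets its own update. A rough sketch, assuming `grad_window` comes from backprop through the classifier (the names are mine):

```python
import numpy as np

def update_word_vectors(word_vectors, center, grad_window, lr=0.01, window=2):
    """Split the gradient w.r.t. the concatenated window into per-word chunks
    and take a gradient step on each word vector in the window."""
    d = word_vectors[0].shape[0]
    for k, i in enumerate(range(center - window, center + window + 1)):
        word_vectors[i] -= lr * grad_window[k * d:(k + 1) * d]

# toy usage: update the 5 word vectors around position 3
vecs = [np.random.randn(4) for _ in range(7)]
update_word_vectors(vecs, center=3, grad_window=np.random.randn(20))
```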
- Training with only the softmax function is not a great approach
- Having only a single softmax means the model can only form a linear decision boundary
- That is why we need a neural network, as in the right-hand figure below: it can learn non-linear decision boundaries
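A minimal sketch of the single-hidden-layer window scorer $s = u^\top f(Wx + b)$ discussed in the lecture, using $\tanh$ as the non-linearity (the shapes here are illustrative):

```python
import numpy as np

def window_score(x, W, b, u):
    """Single hidden-layer network: s = u . f(W x + b)."""
    z = W @ x + b            # hidden pre-activation
    a = np.tanh(z)           # non-linearity -> non-linear decision boundary
    return u @ a             # scalar score for the window

# toy shapes: 20-dim concatenated window vector, 8 hidden units
x = np.random.randn(20)
W, b, u = np.random.randn(8, 20), np.zeros(8), np.random.randn(8)
print(window_score(x, W, b, u))
```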
Max-margin loss
Idea for the objective function: make the score of the true window larger and a corrupt window's score lower; minimize $J = \max(0,\, 1 - s + s_c)$, where $s$ is the score of the true window and $s_c$ is the score of a corrupt window (e.g. the center word replaced by a random word).
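A tiny sketch of that objective, assuming `s_true` and `s_corrupt` are scores produced by a scorer like the one above for a true window and a corrupted window:

```python
def max_margin_loss(s_true, s_corrupt):
    """J = max(0, 1 - s + s_c): push the true window's score above the corrupt one by a margin of 1."""
    return max(0.0, 1.0 - s_true + s_corrupt)

# once the true window beats the corrupt one by more than the margin, the loss (and gradient) is zero
print(max_margin_loss(s_true=2.5, s_corrupt=0.3))   # 0.0
print(max_margin_loss(s_true=0.2, s_corrupt=0.5))   # 1.3
```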
The rest of the lecture is mostly backprop in a multi-layer perceptron, so there isn't much worth summarizing here.