hchang   07 Dec 2021

BATCH NORMALIZATION



0. Intro

You should use Batch Normalization.

1) Quiz

  1. Batch Normalization has no parameters. (O / X)
  2.

2) Questions

$\bullet$ What is internal covariate shift?

$\bullet$ Covariate

Covariate: a variable, other than the independent variable under study, that affects the dependent variable in an experiment.

$\bullet$ Internal Covariate Shift

As the weights applied to a layer's inputs change during training, the distribution of that layer's outputs (the covariates seen by the following layers) also shifts, which can interfere with learning. The paper's proposed fix is the batch-normalizing transform sketched below.
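For reference, the transform normalizes each activation over its mini-batch $\mathcal{B}=\{x_{1..m}\}$ and then restores representational power with a learnable scale $\gamma$ and shift $\beta$ (this is Algorithm 1 in the paper, with a small constant $\epsilon$ for numerical stability):

$$
\mu_\mathcal{B} = \frac{1}{m}\sum_{i=1}^{m} x_i,
\qquad
\sigma_\mathcal{B}^2 = \frac{1}{m}\sum_{i=1}^{m}\left(x_i - \mu_\mathcal{B}\right)^2
$$

$$
\hat{x}_i = \frac{x_i - \mu_\mathcal{B}}{\sqrt{\sigma_\mathcal{B}^2 + \epsilon}},
\qquad
y_i = \gamma\,\hat{x}_i + \beta \equiv \mathrm{BN}_{\gamma,\beta}(x_i)
$$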

3) The authors' intent

4) Abstract

Training Deep Neural Networks is complicated by the fact that the distribution of each layer’s inputs changes during training, as the parameters of the previous layers change. This slows down the training by requiring lower learning rates and careful parameter initialization, and makes it notoriously hard to train models with saturating nonlinearities. We refer to this phenomenon as internal covariate shift, and address the problem by normalizing layer inputs. Our method draws its strength from making normalization a part of the model architecture and performing the normalization for each training mini-batch. Batch Normalization allows us to use much higher learning rates and be less careful about initialization. It also acts as a regularizer, in some cases eliminating the need for Dropout. Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin. Using an ensemble of batch-normalized networks, we improve upon the best published result on ImageNet classification: reaching 4.9% top-5 validation error (and 4.8% test error), exceeding the accuracy of human raters.

Networks can be trained faster and more stably. The authors claim that in some cases Dropout is no longer needed, but that does not seem to hold in every case.

1. Introduction

With Batch Normalization, Dropout is no longer needed, and both training speed and performance improve.

2.


3. How to use it

Check Section 3.2 of the paper; a usage sketch follows below.
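As a minimal usage sketch (assuming PyTorch; the module below and the Conv → BN → ReLU ordering are illustrative choices, not something this note prescribes), Batch Normalization is typically inserted right after the convolution/affine output and before the nonlinearity:

```python
import torch
import torch.nn as nn

# Minimal sketch: a Conv -> BatchNorm -> ReLU block, normalizing the
# convolution output before the nonlinearity.
class ConvBNReLU(nn.Module):
    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        # bias=False: the BN shift parameter (beta) makes a conv bias redundant.
        self.conv = nn.Conv2d(in_channels, out_channels,
                              kernel_size=3, padding=1, bias=False)
        # BatchNorm2d keeps one learnable scale (gamma) and shift (beta) per
        # channel, plus running mean/variance estimates used at inference time.
        self.bn = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.relu(self.bn(self.conv(x)))

block = ConvBNReLU(3, 16)
block.train()                        # mini-batch statistics are used in training mode
y = block(torch.randn(8, 3, 32, 32))
print(y.shape)                       # torch.Size([8, 16, 32, 32])
```

Switching to `block.eval()` makes the BN layer use its running mean/variance estimates instead of the current mini-batch statistics.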