1506.02626, Learning both Weights and Connections for Efficient Neural Networks

Background or Motivation
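-> Neural networks have been very successful in computer vision and voice recognition, but model size and computation have grown explosively, which makes them hard to deploy on mobile devices. Removing unnecessary connections from the network greatly reduces model size and inference cost while maintaining accuracy.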
Problem
-> DNNs are over-parameterized and therefore highly redundant.
This wastes storage, memory bandwidth, and compute resources.
-> Existing model compression techniques either lose accuracy or have limited effect on convolutional layers.
-> Simple magnitude-based pruning gives poor accuracy without retraining.
-> Unstructured sparsity is hard to speed up with ordinary dense libraries, and it incurs storage overhead.
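A rough way to see the storage overhead of unstructured sparsity is to count bytes. The sketch below assumes a CSR-style layout with 32-bit values and 32-bit indices; that layout, and the helper names dense_bytes and csr_bytes, are illustrative assumptions, not the format used in the paper.

```python
def dense_bytes(rows, cols, value_bytes=4):
    """Bytes needed to store every entry of a dense matrix."""
    return rows * cols * value_bytes


def csr_bytes(rows, nnz, value_bytes=4, index_bytes=4):
    """Bytes for a CSR-style layout: values, column indices, row pointers."""
    values = nnz * value_bytes
    col_indices = nnz * index_bytes
    row_pointers = (rows + 1) * index_bytes
    return values + col_indices + row_pointers


rows, cols = 1000, 1000
dense = dense_bytes(rows, cols)
for density in (0.5, 0.1):
    nnz = int(rows * cols * density)
    sparse = csr_bytes(rows, nnz)
    print(f"density={density}: dense={dense} B, sparse={sparse} B")
```

Under these assumptions, pruning half of the weights saves nothing, because each surviving weight also carries an index. The sparse layout only pays off once most of the weights are gone.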
Method
-> Proposes iterative pruning + retraining: keep only the important connections and prune the unnecessary ones.
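The train, prune, retrain loop can be sketched on a toy problem. The example below is a minimal illustration, not the paper's implementation: it uses a small linear model in plain Python, takes weight magnitude as the importance criterion, and the data, learning rate, pruning fraction, and function names (retrain, prune) are all made up for the example.

```python
import random


def predict(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def retrain(w, mask, data, lr=0.1, epochs=200):
    """Gradient descent on mean squared error; pruned weights stay at zero."""
    n = len(data)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in data:
            err = predict(w, x) - y
            for j, xj in enumerate(x):
                grad[j] += 2.0 * err * xj / n
        # Multiplying by the mask keeps removed connections removed.
        w = [(wj - lr * gj) * mj for wj, gj, mj in zip(w, grad, mask)]
    return w


def prune(w, mask, fraction):
    """Drop the given fraction of surviving weights with the smallest |w|."""
    alive = sorted((j for j, m in enumerate(mask) if m), key=lambda j: abs(w[j]))
    drop = set(alive[:int(len(alive) * fraction)])
    mask = [0 if j in drop else m for j, m in enumerate(mask)]
    return [wj * mj for wj, mj in zip(w, mask)], mask


# Toy data: only three of the eight inputs actually matter.
random.seed(0)
true_w = [3.0, 0.0, -2.0, 0.0, 0.0, 1.5, 0.0, 0.0]
data = []
for _ in range(200):
    x = [random.uniform(-1, 1) for _ in true_w]
    data.append((x, predict(true_w, x)))

w = [random.uniform(-0.1, 0.1) for _ in true_w]
mask = [1] * len(w)
w = retrain(w, mask, data)            # 1. train the dense model
for _ in range(2):
    w, mask = prune(w, mask, 0.4)     # 2. prune low-magnitude connections
    w = retrain(w, mask, data)        # 3. retrain the surviving connections

print("mask:", mask)
print("weights:", [round(wj, 3) for wj in w])
```

Pruning a moderate fraction per round and retraining in between gives the surviving weights a chance to compensate before the next cut, which is the point of doing it iteratively rather than in one shot.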