Finding Out Why List Comprehension Is Faster

June 21, 2020

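Anyone who uses Python for a while hears the claim that "implementing it with a List Comprehension is faster and more concise than using List Append." Few people, however, explain it in terms of what actually happens internally. How the actual implementation...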
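One way to check the claim yourself is to look at the bytecode. The sketch below assumes CPython. Opcode names vary between versions, and since 3.12 comprehensions are inlined into the enclosing function instead of getting their own code object. In the sketch, the comprehension compiles to the dedicated `LIST_APPEND` opcode, while the loop looks up `result.append` and calls it on every iteration. The function names are hypothetical, and the timings depend on the machine and interpreter version.

```python
import dis
import functools
import timeit
import types


def with_append(n):
    result = []
    for i in range(n):
        result.append(i)  # attribute lookup + method call on every iteration
    return result


def with_comprehension(n):
    return [i for i in range(n)]  # compiled to the LIST_APPEND opcode


def opnames(code):
    """Collect opcode names from a code object, including nested code objects
    (before Python 3.12 a comprehension is compiled into its own code object)."""
    names = {ins.opname for ins in dis.get_instructions(code)}
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            names |= opnames(const)
    return names


if __name__ == "__main__":
    print("comprehension uses LIST_APPEND:", "LIST_APPEND" in opnames(with_comprehension.__code__))
    print("loop uses LIST_APPEND:", "LIST_APPEND" in opnames(with_append.__code__))
    for fn in (with_append, with_comprehension):
        seconds = timeit.timeit(functools.partial(fn, 10_000), number=200)
        print(f"{fn.__name__}: {seconds:.3f}s")
```

Running `dis.dis(with_append)` and `dis.dis(with_comprehension)` prints the full listings if you want to compare them line by line.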

Tags: python

Apex's FusedLayerNorm vs Torch's LayerNorm

June 17, 2020

The BERT example in microsoft/DeepSpeedExamples uses Apex's FusedLayerNorm, and so does NVIDIA/DeepLearningExamples. So what is the difference between Apex's FusedLayerNorm and torch.nn.LayerNorm?

Tags: pytorch

Are Sixteen Heads Really Better than One? Review

May 18, 2020

Multi-head attention is said to be expressive and able to hold a lot of information, but not every head is necessary. The paper on this topic is Are Sixteen Heads Really Better Than One? (Michel et al., 2019), and arxiv...

Tags: paper

Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned Review

May 18, 2020

This paper also prunes MHA. On the English-Russian WMT dataset, pruning 38 of the 48 encoder heads reportedly cost only 0.15 BLEU. The code is available at GitHub - lena-voita/the-story-of-heads, and the Arxiv link is

Tags: paper

🤗 The Future of Natural Language Processing - Model Size and Computational Efficiency

May 11, 2020

The Future of Natural Language Processing, a slide deck and video from HuggingFace, gives a good overview of recent NLP as a whole. Of the material in this session, the parts related to Model Size and Computational Efficiency are briefly...

Tags: nlp

DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation Review

May 2, 2020

A model that trains GPT for conversational text. The paper comes from Microsoft, and the arxiv link is https://arxiv.org/abs/1911.00536. The code is available at GitHub microsoft/DialoGPT.

Tags: paper

ZeRO: Memory Optimization Towards Training A Trillion Parameter Models Review

May 1, 2020

A paper that drew attention by outperforming Megatron as a training framework for very large models. The arxiv link is https://arxiv.org/abs/1910.02054, and the PyTorch implementation is available at GitHub - microsoft/DeepSpeed.

Tags: paper

TinyBERT: Distilling BERT For Natural Language Understanding Review

May 1, 2020

TinyBERT is a paper currently under review, from Huawei Noah's Ark Lab. The code is at GitHub huawei-noah/Pretrained-Language-Model/TinyBERT, and the arxiv link is https://arxiv.org/abs/1909.10351.

Tags: paper

Layer Normalization Review

May 1, 2020

I looked up the Layer Normalization paper because the technique is used in BERT. The arxiv link is https://arxiv.org/abs/1607.06450. Its main contribution is reducing training time, and as the name suggests, it normalizes the activities of neurons. Batch Normalization plays a similar role...
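A minimal pure-Python sketch of the idea follows. It is neither the paper's code nor torch.nn.LayerNorm itself. For each sample, the mean and variance are computed over that sample's own hidden units, so, unlike Batch Normalization, no statistics are shared across the batch. The `eps` term and the default gain of 1 and bias of 0 are assumptions borrowed from common implementations; torch.nn.LayerNorm, for example, defaults to `eps=1e-5`.

```python
import math


def layer_norm(x, gamma=None, beta=None, eps=1e-5):
    """Normalize one sample's activations x (a list of H floats) over its own features."""
    h = len(x)
    mean = sum(x) / h
    var = sum((v - mean) ** 2 for v in x) / h  # biased (population) variance
    inv_std = 1.0 / math.sqrt(var + eps)       # eps guards against division by zero
    gamma = gamma if gamma is not None else [1.0] * h  # learned gain (assumed init: 1)
    beta = beta if beta is not None else [0.0] * h     # learned bias (assumed init: 0)
    return [g * (v - mean) * inv_std + b for v, g, b in zip(x, gamma, beta)]


def batch_layer_norm(batch, **kwargs):
    # Each row (sample) is normalized independently; batch size does not matter.
    return [layer_norm(row, **kwargs) for row in batch]


if __name__ == "__main__":
    for row in batch_layer_norm([[1.0, 2.0, 3.0], [100.0, 200.0, 300.0]]):
        print([round(v, 4) for v in row])
```

Because the statistics are per sample, the two rows above normalize to (almost) the same values even though their scales differ by a factor of 100. This also makes the computation identical at training and inference time.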

Tags: paper

Efficient 8-Bit Quantization of Transformer Neural Machine Language Translation Model Review

April 27, 2020

A paper from Intel that quantizes FP32 to INT8 in TensorFlow. It reports a 1.5x speedup with an accuracy drop of only 0.5 BLEU. The authors also optimized for Intel CPUs. The arxiv link is https://arxiv.org/abs/1906.00532.
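For intuition on what FP32 to INT8 quantization involves, here is a generic symmetric (max-abs) linear quantization sketch. It is an illustrative assumption, not necessarily the calibration scheme this paper uses. Each tensor gets a single scale factor, and values are rounded to integers in [-127, 127]. A dot product is then accumulated in integer arithmetic and rescaled once at the end, which is where INT8 hardware gets its speedup. All function names are hypothetical.

```python
def quantize_symmetric(values, num_bits=8):
    """Map floats to signed integers using one scale factor (max-abs calibration)."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for INT8
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer and clip to the representable range.
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Recover approximate floats; the error per element is at most scale / 2."""
    return [v * scale for v in q]


def int8_dot(qa, sa, qb, sb):
    # Accumulate the products of integers (as INT8 kernels do, using a wider
    # accumulator), then convert back to float with a single multiply.
    acc = sum(x * y for x, y in zip(qa, qb))
    return acc * sa * sb


if __name__ == "__main__":
    a, b = [0.1, -0.4, 0.7], [0.3, 0.2, -0.9]
    qa, sa = quantize_symmetric(a)
    qb, sb = quantize_symmetric(b)
    print("float dot:", sum(x * y for x, y in zip(a, b)))
    print("int8 dot: ", int8_dot(qa, sa, qb, sb))
```

The gap between the two printed values is the quantization error. Schemes that pick the clipping range more carefully than max-abs aim to shrink that error on real activation distributions.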

Tags: paper

Patient Knowledge Distillation for BERT Model Compression Review

April 16, 2020

A Model Compression paper from Microsoft, accepted at EMNLP 2019, that uses the PKD (Patient Knowledge Distillation) approach. The arxiv link is https://arxiv.org/abs/1908.09355, and the code is at GitHub - intersun/PKD-for-BERT-Model-Compression.

Tags: paper

Improving Multi-Task Deep Neural Networks via Knowledge Distillation for Natural Language Understanding Review

April 16, 2020

A paper that applies Knowledge Distillation to MT-DNN (Liu et al., 2019), which Microsoft released shortly before this paper. The arxiv link is https://arxiv.org/abs/1904.09482, and the code is available at GitHub - namisan/mt-dnn. Unusually, other...

Tags: paper

Distilling the Knowledge in a Neural Network Review

April 16, 2020

The paper by Geoffrey Hinton, Oriol Vinyals, and Jeff Dean at Google that proposed the concept of Distillation. The arxiv link is https://arxiv.org/abs/1503.02531, and it appeared at a NIPS 2014 workshop.
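The paper's core idea can be sketched in a few lines. The teacher's output is softened with a temperature T, and the student is trained on those soft targets together with the hard labels. The paper multiplies the soft-target term by T² so its gradient magnitude stays comparable as T changes. The defaults `T=4.0` and `alpha=0.5` and the function names are hypothetical choices for illustration, not values from the paper.

```python
import math


def softmax_with_temperature(logits, T=1.0):
    """Softmax over logits / T; a larger T gives a softer (more uniform) distribution."""
    scaled = [z / T for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, hard_label, T=4.0, alpha=0.5):
    # Soft-target term: cross-entropy between teacher and student, both at temperature T.
    p_teacher = softmax_with_temperature(teacher_logits, T)
    p_student_T = softmax_with_temperature(student_logits, T)
    soft = -sum(pt * math.log(ps) for pt, ps in zip(p_teacher, p_student_T))
    # Hard-target term: ordinary cross-entropy with the true label at T = 1.
    p_student = softmax_with_temperature(student_logits, 1.0)
    hard = -math.log(p_student[hard_label])
    # Scale the soft term by T**2 so its gradients keep a comparable magnitude.
    return alpha * (T ** 2) * soft + (1 - alpha) * hard


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.1]
    print("T=1:", [round(p, 3) for p in softmax_with_temperature(logits, 1.0)])
    print("T=4:", [round(p, 3) for p in softmax_with_temperature(logits, 4.0)])
    print("loss:", distillation_loss(logits, [3.0, 1.0, 0.2], hard_label=0))
```

Comparing the T=1 and T=4 distributions shows what the higher temperature exposes: the relative probabilities of the wrong classes, which is the "dark knowledge" the student learns from.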

Tags: paper

Q8BERT: Quantized 8Bit BERT Review

April 14, 2020

The Q8BERT paper from Intel, presented at NeurIPS 2019. The arxiv link is https://arxiv.org/pdf/1910.06188.pdf. It applies quantization-aware training to BERT during the fine-tuning phase to compress the model 4x, and it speeds up inference using the 8-bit operations of Intel CPUs.

Tags: paper

FastBERT: a Self-distilling BERT with Adaptive Inference Time Review

April 14, 2020

This paper likewise starts from the problem that BERT is too large to serve, and it applies self-distillation during fine-tuning. The work was funded by the 2019 Tencent Rhino-Bird Elite Training Program. The arxiv link is https://arxiv.org/abs/2004.02178.

Tags: paper