PyTorch 1.0 has been released, and these are my notes on the blog post that covers it.
PyTorch-related projects
- Horovod - a distributed training framework that makes it easy for developers to take a single-GPU program and quickly train it on multiple GPUs.
- PyTorch Geometry - a geometric computer vision library for PyTorch that provides a set of routines and differentiable modules.
- TensorBoardX - a module for logging PyTorch models to TensorBoard, allowing developers to use the visualization tool for model training.
- Translate - a library for training sequence-to-sequence models that's based on Facebook's machine translation systems (FairSeq).
It is also available on AWS, Azure, and Google Cloud Platform.
Related papers below (?)
- Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour - Facebook
- Horovod: fast and easy distributed deep learning in TensorFlow - Uber
Figure above: The "data parallel" approach to distributed training involves splitting up the data and training on multiple nodes in parallel. In synchronous cases, the gradients for different batches of data are calculated separately on each node but averaged across nodes to apply consistent updates to the model copy in each node.
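To make the caption concrete for myself, here is a minimal sketch of one synchronous data-parallel setup, simulated in plain Python with no real networking. Everything in it is an assumption for illustration: the toy model `y = w * x`, the helper names `shard_gradient`, `all_reduce_mean` and `sync_data_parallel_step`, and the learning rate. It is not Horovod's or PyTorch's API, just the averaging idea.

```python
# Toy simulation of synchronous data-parallel SGD (pure Python, single process).
# Model: y = w * x with mean squared error. All names are illustrative.

def shard_gradient(w, shard):
    """Gradient of the mean squared error over one node's shard of data."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def all_reduce_mean(values):
    """Stand-in for an allreduce: every node ends up with the same average."""
    return sum(values) / len(values)

def sync_data_parallel_step(replicas, shards, lr):
    # 1. Each node computes a gradient on its own shard with its own model copy.
    grads = [shard_gradient(w, shard) for w, shard in zip(replicas, shards)]
    # 2. The gradients are averaged across nodes.
    avg = all_reduce_mean(grads)
    # 3. Every node applies the identical update, so the copies stay in sync.
    return [w - lr * avg for w in replicas]

data = [(float(x), 3.0 * x) for x in range(1, 9)]        # true weight is 3.0
num_nodes = 4
shards = [data[i::num_nodes] for i in range(num_nodes)]  # equal-sized shards
replicas = [0.0] * num_nodes                             # one model copy per node

for _ in range(200):
    replicas = sync_data_parallel_step(replicas, shards, lr=0.01)
```

With equal-sized shards, the average of the per-node gradients equals the gradient over the whole batch, which is why every copy of the model can apply the same update and stay identical.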
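Interesting. The first time I ran into parallelized training was Hogwild!, which I came across while studying A3C. Hogwild! is asynchronous, whereas Horovod above appears to work synchronously.
The warning that used to be on the Hogwild API appears to be gone in PyTorch 1.0, so it seems usable with some care.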
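For contrast with the synchronous case, here is a rough sketch of the Hogwild!-style idea: several workers update one shared parameter with no lock, so reads can be stale and writes can overwrite each other. This is a toy written with Python threads, and the names (`params`, `worker`), the step count, and the learning rate are all my own assumptions. It is not PyTorch's Hogwild API, which as far as I understand works with processes and shared-memory tensors instead.

```python
import random
import threading

# Toy Hogwild!-style SGD: threads update a shared parameter without any lock.
# Model: y = w * x. Illustrative only, not PyTorch's API.

params = [0.0]                                       # shared parameter (one weight)
data = [(float(x), 3.0 * x) for x in range(1, 9)]    # true weight is 3.0

def worker(seed, steps, lr):
    rng = random.Random(seed)                        # per-worker sampling order
    for _ in range(steps):
        x, y = rng.choice(data)
        w = params[0]                                # read, possibly stale
        grad = 2 * (w * x - y) * x                   # gradient of (w*x - y)^2
        params[0] = w - lr * grad                    # unsynchronized write

threads = [
    threading.Thread(target=worker, args=(seed, 2000, 0.001))
    for seed in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

No worker waits for the others here, which is the trade-off against the synchronous approach: no averaging barrier, but also no guarantee that an update was computed from the latest parameters.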