Fundamentals of Deep Learning (notes)

Kuhn D. et al. Handbook of Child Psychology. Vol. 2. Cognition, Perception, and Language. Wiley, 1998.

LeCun Y., Bottou L., Bengio Y., Haffner P. Gradient-Based Learning Applied to Document Recognition // Proceedings of the IEEE. 1998. November. Vol. 86 (11). Pp. 2278–2324.

Rosenblatt F. The perceptron: A probabilistic model for information storage and organization in the brain // Psychological Review. 1958. Vol. 65. No. 6. P. 386.

Bubeck S. Convex optimization: Algorithms and complexity // Foundations and Trends® in Machine Learning. 2015. Vol. 8. No. 3–4. Pp. 231–357.

Restak R. M., Grubin D. The Secret Life of the Brain. Joseph Henry Press, 2001.

McCulloch W. S., Pitts W. A logical calculus of the ideas immanent in nervous activity // The Bulletin of Mathematical Biophysics. 1943. Vol. 5. No. 4. Pp. 115–133.

Mountcastle V. B. Modality and topographic properties of single neurons of cat’s somatic sensory cortex // Journal of Neurophysiology. 1957. Vol. 20. No. 4. Pp. 408–434.

Nair V., Hinton G. E. Rectified Linear Units Improve Restricted Boltzmann Machines // Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010.

We could compute the unknown weights exactly by solving a system of linear equations, but this approach works only for a linear neuron. For nonlinear neurons no such system can be set up and solved exactly, so training is required. Scientific editor's note.
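
As a minimal sketch of the linear case described in this note, the snippet below recovers the weights of a linear neuron by solving a system of linear equations with Gauss-Jordan elimination. The training inputs, the targets, and the function name `solve_linear_system` are hypothetical and chosen only for illustration; it assumes there are as many independent examples as weights.

```python
from fractions import Fraction


def solve_linear_system(coefficients, targets):
    """Solve coefficients * w = targets exactly by Gauss-Jordan elimination."""
    n = len(coefficients)
    # Augmented matrix [A | b] with exact rational arithmetic.
    rows = [[Fraction(x) for x in row] + [Fraction(t)]
            for row, t in zip(coefficients, targets)]
    for col in range(n):
        # Find a row with a nonzero entry in this column to use as the pivot.
        pivot = next((r for r in range(col, n) if rows[r][col] != 0), None)
        if pivot is None:
            raise ValueError("singular system: examples do not determine the weights")
        rows[col], rows[pivot] = rows[pivot], rows[col]
        # Scale the pivot row so that the pivot entry becomes 1.
        pivot_value = rows[col][col]
        rows[col] = [x / pivot_value for x in rows[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[col])]
    return [rows[i][n] for i in range(n)]


# Hypothetical data: each row is one input (x1, x2, x3) to a linear neuron,
# and each target is the neuron's output w1*x1 + w2*x2 + w3*x3.
inputs = [[2, 3, 1], [1, 1, 4], [3, 0, 2]]
outputs = [13, 9, 11]

weights = solve_linear_system(inputs, outputs)
print(weights)
```

With a nonlinear activation the outputs are no longer linear in the weights, so this elimination procedure does not apply, which is the point the note makes.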

Rosenbloom P. The method of steepest descent // Proceedings of Symposia in Applied Mathematics. 1956. Vol. 6.

Rumelhart D. E., Hinton G. E., Williams R. J. Learning representations by backpropagating errors // Cognitive Modeling. 1988. Vol. 5. No. 3. P. 1.

http://stanford.io/2pOdNhy.

Nelder J. A., Mead R. A simplex method for function minimization // The Computer Journal. 1965. Vol. 7. No. 4. Pp. 308–313.

Tikhonov A. N., Glasko V. B. Use of the regularization method in nonlinear problems // USSR Computational Mathematics and Mathematical Physics. 1965. Vol. 5. No. 3. Pp. 93–107.

Srebro N., Rennie J. D. M., Jaakkola T. S. Maximum-Margin Matrix Factorization // NIPS. 2004. Vol. 17.

Srivastava N. et al. Dropout: A Simple Way to Prevent Neural Networks from Overfitting // Journal of Machine Learning Research. 2014. Vol. 15. No. 1. Pp. 1929–1958.

https://www.tensorflow.org/.

http://deeplearning.net/software/theano/ (http://bit.ly/2jtjGea); http://torch.ch/; http://caffe.berkeleyvision.org/; https://www.nervanasys.com/technology/neon/ (http://bit.ly/2r9XugB); https://keras.io/.

In September 2017 it was announced that development of Theano would stop after the release of version 1.0 (see https://groups.google.com/forum/#!msg/theano-users/7Poq8BZutbY/rNCIfvAEAwAJ). A Python implementation of Torch, called PyTorch, has been created, and this new library is rapidly gaining popularity. Scientific editor's note.

https://www.tensorflow.org/install/.

https://www.tensorflow.org/api_docs/python/tf/Variable.

https://www.tensorflow.org/api_docs/python/tf/random_normal.

https://www.tensorflow.org/api_docs/python/tf/assign.

http://bit.ly/2rtqoIA.

https://www.tensorflow.org/api_docs/python/tf/initialize_variables.

Abadi M. et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems // arXiv preprint arXiv:1603.04467 (2016).

https://www.tensorflow.org/api_docs/python/tf/placeholder.

https://www.tensorflow.org/api_docs/python/tf/Session.

https://www.tensorflow.org/api_docs/python/tf/get_variable.

https://www.tensorflow.org/api_docs/python/tf/variable_scope.

https://www.tensorflow.org/api_docs/python/tf/device.

https://www.tensorflow.org/api_docs/python/tf/ConfigProto.

Cox D. R. The Regression Analysis of Binary Sequences // Journal of the Royal Statistical Society. Series B (Methodological). 1958. Pp. 215–242.

For each data instance in the mini-batch, the neural network outputs the probability that the instance belongs to each class (that is, the probability that the original image shows 0, 1, 2, and so on up to 9). Scientific editor's note.
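
A minimal sketch of what this note describes, assuming the network ends in a softmax layer over ten digit classes. The raw scores in `batch_logits` are made-up values rather than the output of a trained model, and the helper name `softmax` is introduced only for this example.

```python
import math


def softmax(logits):
    """Turn raw class scores into probabilities that sum to 1."""
    largest = max(logits)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores for a mini-batch of two images, one score per digit 0..9.
batch_logits = [
    [0.1, 2.0, -1.0, 0.3, 0.0, 0.5, -0.2, 1.1, 0.7, -0.5],
    [1.5, 0.0, 0.2, -0.3, 0.4, 0.1, 0.0, -1.0, 2.5, 0.6],
]

# One probability distribution over the ten classes per image.
batch_probs = [softmax(row) for row in batch_logits]

# The predicted digit is the class with the highest probability.
predicted_digits = [max(range(10), key=row.__getitem__) for row in batch_probs]
print(predicted_digits)
```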

https://www.tensorflow.org/api_docs/python/tf/summary/scalar.

https://www.tensorflow.org/api_docs/python/tf/summary/histogram.

https://www.tensorflow.org/api_docs/python/tf/summary/merge_all.

Accuracy is one of the measures of how well a neural network (or any other machine learning algorithm) performs: it is the fraction of data instances that were classified correctly. Scientific editor's note.
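
The definition in this note can be sketched in a few lines of Python. The label lists below are invented for illustration, and `accuracy` is a hypothetical helper rather than a function from any particular library.

```python
def accuracy(predicted_labels, true_labels):
    """Return the fraction of instances whose predicted label matches the true one."""
    if len(predicted_labels) != len(true_labels):
        raise ValueError("both lists must have the same length")
    if not true_labels:
        raise ValueError("accuracy is undefined for an empty dataset")
    correct = sum(p == t for p, t in zip(predicted_labels, true_labels))
    return correct / len(true_labels)


# Hypothetical predictions for ten digit images; exactly one of them is wrong.
predicted = [7, 2, 1, 0, 4, 1, 4, 9, 6, 9]
actual = [7, 2, 1, 0, 4, 1, 4, 9, 5, 9]

score = accuracy(predicted, actual)
print(score)
```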

https://www.tensorflow.org/get_started/graph_viz.

He K. et al. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification // Proceedings of the IEEE International Conference on Computer Vision. 2015.

Bengio Y. et al. Greedy Layer-Wise Training of Deep Networks // Advances in Neural Information Processing Systems. 2007. Vol. 19. P. 153.

Goodfellow I. J., Vinyals O., Saxe A. M. Qualitatively characterizing neural network optimization problems // arXiv preprint arXiv:1412.6544 (2014).

Dauphin Y. N. et al. Identifying and attacking the saddle point problem in high-dimensional non-convex optimization // Advances in Neural Information Processing Systems. 2014.

Strictly speaking, we move in the direction opposite to the gradient, because the gradient points in the direction of the function's steepest increase, whereas we need the direction of decrease. Scientific editor's note.
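
A small sketch of the point made in this note: each update subtracts the gradient, scaled by a learning rate, from the current weights. The quadratic loss, the starting point, and the hyperparameter values are assumptions chosen so that the minimum at (3, -1) is known in advance.

```python
def loss(w):
    """Toy loss with its minimum at w = (3, -1)."""
    return (w[0] - 3.0) ** 2 + (w[1] + 1.0) ** 2


def gradient(w):
    """Gradient of the toy loss; it points toward steepest increase."""
    return [2.0 * (w[0] - 3.0), 2.0 * (w[1] + 1.0)]


def gradient_descent(w, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient to decrease the loss."""
    for _ in range(steps):
        g = gradient(w)
        # Minus sign: move opposite to the gradient.
        w = [wi - learning_rate * gi for wi, gi in zip(w, g)]
    return w


start = [0.0, 0.0]
final_weights = gradient_descent(start)
print(final_weights, loss(final_weights))
```

Flipping the minus sign to a plus would move along the gradient and make the loss grow at every step.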

Sutskever I. et al. On the importance of initialization and momentum in deep learning // ICML (3). 2013. Vol. 28. Pp. 1139–1147.

Nesterov momentum is now implemented in TensorFlow: https://www.tensorflow.org/api_docs/python/tf/train/MomentumOptimizer. Scientific editor's note.
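
For readers who want to see the idea behind Nesterov momentum without the library, here is a plain-Python sketch of one common textbook formulation: the gradient is evaluated at a look-ahead point rather than at the current weights. This is not TensorFlow's implementation, and the toy loss, the function names, and the hyperparameter values are assumptions made for illustration.

```python
def gradient(w):
    """Gradient of the toy loss f(w) = (w - 3)^2, whose minimum is at w = 3."""
    return 2.0 * (w - 3.0)


def nesterov_momentum(w, learning_rate=0.1, momentum=0.9, steps=200):
    """Minimize the toy loss with a textbook Nesterov momentum update."""
    velocity = 0.0
    for _ in range(steps):
        # Peek ahead to where the accumulated velocity would carry the weight.
        lookahead = w + momentum * velocity
        # Update the velocity using the gradient at the look-ahead point.
        velocity = momentum * velocity - learning_rate * gradient(lookahead)
        w = w + velocity
    return w


result = nesterov_momentum(0.0)
print(result)
```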

Møller M. F. A Scaled Conjugate Gradient Algorithm for Fast Supervised Learning // Neural Networks. 1993. Vol. 6. No. 4. Pp. 525–533.

Broyden C. G. A new method of solving nonlinear simultaneous equations // The Computer Journal. 1969. Vol. 12. No. 1. Pp. 94–99.

Bonnans J.-F. et al. Numerical Optimization: Theoretical and Practical Aspects. Springer Science & Business Media, 2006.

Duchi J., Hazan E., Singer Y. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization // Journal of Machine Learning Research. 2011. Vol. 12 (Jul.). Pp. 2121–2159.

Tieleman T., Hinton G. Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude // COURSERA: Neural Networks for Machine Learning. 2012. Vol. 4. No. 2.

Kingma D., Ba J. Adam: A Method for Stochastic Optimization // arXiv preprint arXiv:1412.6980 (2014).

A voxel is an element of a three-dimensional image. The name is short for "volume element" and is formed by analogy with "pixel" (picture element). Scientific editor's note.

Hubel D. H., Wiesel T. N. Receptive fields and functional architecture of monkey striate cortex // The Journal of Physiology. 1968. Vol. 195. No. 1. Pp. 215–243.

Cohen A. I. Rods and Cones // Physiology of Photoreceptor Organs. Springer Berlin Heidelberg, 1972. Pp. 63–110.

Viola P., Jones M. Rapid Object Detection using a Boosted Cascade of Simple Features // Computer Vision and Pattern Recognition, 2001. CVPR 2001. Proceedings of the 2001 IEEE Computer Society Conference on. Vol. 1. IEEE, 2001.

Deng J. et al. ImageNet: A Large-Scale Hierarchical Image Database // Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference. IEEE, 2009.

Perronnin F., Sánchez J., Liu Y. Large-scale image categorization with explicit data embedding // Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference. IEEE, 2010.

Krizhevsky A., Sutskever I., Hinton G. E. ImageNet Classification with Deep Convolutional Neural Networks // Advances in Neural Information Processing Systems. 2012.

LeCun Y. et al. Handwritten Digit Recognition with a Back-Propagation Network // Advances in Neural Information Processing Systems. 1990.

Hubel D. H., Wiesel T. N. Receptive fields of single neurones in the cat’s striate cortex // The Journal of Physiology. 1959. Vol. 148. No. 3. Pp. 574–591.

https://www.tensorflow.org/api_docs/python/tf/nn/conv2d.

https://www.tensorflow.org/api_docs/python/tf/nn/max_pool.

Graham B. Fractional Max-Pooling // arXiv preprint arXiv:1412.6071 (2014).

Simonyan K., Zisserman A. Very Deep Convolutional Networks for Large-Scale Image Recognition // arXiv preprint arXiv:1409.1556 (2014).

Ioffe S., Szegedy C. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift // arXiv preprint arXiv:1502.03167 (2015).

Krizhevsky A., Hinton G. Learning Multiple Layers of Features from Tiny Images. 2009.

Maaten L. van der, Hinton G. Visualizing Data using t-SNE // Journal of Machine Learning Research. 2008. Vol. 9 (Nov.). Pp. 2579–2605.

http://cs.stanford.edu/people/karpathy/cnnembed/.

Gatys L. A., Ecker A. S., Bethge M. A Neural Algorithm of Artistic Style // arXiv preprint arXiv:1508.06576 (2015).

Karpathy A. et al. Large-scale Video Classification with Convolutional Neural Networks // Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014.

Abdel-Hamid O. et al. Applying Convolutional Neural Networks concepts to hybrid NN-HMM model for speech recognition // IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Kyoto, 2012. Pp. 4277–4280.

Hinton G. E., Salakhutdinov R. R. Reducing the Dimensionality of Data with Neural Networks // Science. 2006. Vol. 313. No. 5786. Pp. 504–507.

Vincent P. et al. Extracting and Composing Robust Features with Denoising Autoencoders // Proceedings of the 25th International Conference on Machine Learning. ACM, 2008.

Bengio Y. et al. Generalized Denoising Auto-Encoders as Generative Models // Advances in Neural Information Processing Systems. 2013.

Ranzato M. et al. Efficient Learning of Sparse Representations with an Energy-Based Model // Proceedings of the 19th International Conference on Neural Information Processing Systems. MIT Press, 2006; Ranzato M., Szummer M. Semi-supervised Learning of Compact Document Representations with Deep Networks // Proceedings of the 25th International Conference on Machine Learning. ACM, 2008.

Makhzani A., Frey B. k-Sparse Autoencoders // arXiv preprint arXiv:1312.5663 (2013).

Mikolov T. et al. Distributed Representations of Words and Phrases and their Compositionality // Advances in Neural Information Processing Systems. 2013.

Mikolov T., Chen K., Corrado G., Dean J. Efficient Estimation of Word Representations in Vector Space // ICLR Workshop, 2013.

https://www.tensorflow.org/api_docs/python/tf/nn/embedding_lookup.

Google News: https://drive.google.com/file/d/0B7XkCwpI5KDYNlNUTTlSS21pQmM/edit.

http://leveldb.org/.

http://www.cnts.ua.ac.be/conll2000/chunking/.

Nivre J. Incrementality in Deterministic Dependency Parsing // Proceedings of the Workshop on Incremental Parsing: Bringing Engineering and Cognition Together. Association for Computational Linguistics, 2004.

Chen D., Manning C. D. A Fast and Accurate Dependency Parser Using Neural Networks // EMNLP. 2014.

https://github.com/tensorflow/models/tree/master/syntaxnet.

Andor D. et al. Globally Normalized Transition-Based Neural Networks // arXiv preprint arXiv:1603.06042 (2016).

Kilian J., Siegelmann H. T. The dynamic universality of sigmoidal neural networks // Information and computation. 1996. Vol. 128. No. 1. Pp. 48–56.

If a review is shorter than 500 words, it is padded with filler symbols, as was done for feed-forward networks. Scientific editor's note.
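
The padding step in this note might look roughly like the sketch below. The filler token `"<pad>"` and the helper `pad_review` are hypothetical names, and truncating reviews longer than 500 words is an assumption that the note itself does not state.

```python
PAD_TOKEN = "<pad>"  # hypothetical filler symbol
MAX_LENGTH = 500     # fixed review length mentioned in the note


def pad_review(tokens, max_length=MAX_LENGTH, pad_token=PAD_TOKEN):
    """Return exactly max_length tokens, padding short reviews on the right."""
    # Assumption: reviews longer than max_length are cut off.
    trimmed = tokens[:max_length]
    return trimmed + [pad_token] * (max_length - len(trimmed))


short_review = ["a", "great", "movie"]
padded = pad_review(short_review)
print(len(padded), padded[:5])
```

Padding every review to the same length lets the reviews in a mini-batch be stacked into one rectangular array.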

Kiros R. et al. Skip-Thought Vectors // Advances in neural information processing systems. 2015.

she took me by the hand

"come on…"

she shook her back in the air

"i think we're at your place… i can't make you…"

he shut himself off again

"no, she will…"

kirian shook his head

Bahdanau D., Cho K., Bengio Y. Neural Machine Translation by Jointly Learning to Align and Translate // arXiv preprint arXiv:1409.0473 (2014).

This code can be found here: https://github.com/tensorflow/tensorflow/tree/r0.7/tensorflow/models/rnn/translate.

Perplexity is one of the popular measures of language model quality. The perplexity of a language model on a dataset is the inverse probability of that dataset, normalized by the number of words. It can be read as a "branching" factor: how many different tokens, on average, can follow each token in the sequence. Scientific editor's note.
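
As a sketch of this definition: if the model assigns probability p_i to the i-th of N tokens, the perplexity is the N-th root of 1 / (p_1 · … · p_N), computed below in log space for numerical stability. The probabilities are invented, and `perplexity` is a hypothetical helper name.

```python
import math


def perplexity(token_probabilities):
    """Inverse probability of the sequence, normalized by the number of tokens."""
    if not token_probabilities:
        raise ValueError("need at least one token probability")
    n = len(token_probabilities)
    # Summing logs avoids underflow when many small probabilities are multiplied.
    log_probability = sum(math.log(p) for p in token_probabilities)
    return math.exp(-log_probability / n)


# Hypothetical model that is always torn between four equally likely tokens.
uniform_over_four = [0.25] * 8
value = perplexity(uniform_over_four)
print(value)
```

A model that always hesitates between four equally likely next tokens has a perplexity of 4, which matches the "branching factor" reading in the note.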

https://mostafa-samir.github.io/.

A Turing machine is an abstract computing machine proposed by Alan Turing in 1936. It consists of a tape that is unbounded in both directions and divided into cells, and a control unit with heads for reading and writing data on the tape. The control unit can be in one of a set of predefined states. Scientific editor's note.
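
To make the description in this note concrete, here is a minimal single-head simulator sketch. The transition-table format, the state names, the blank symbol `"_"`, and the bit-inverting example program are all assumptions chosen for illustration.

```python
from collections import defaultdict

BLANK = "_"  # symbol stored in every cell that has not been written yet


def run_turing_machine(transitions, tape_input, start_state, halt_state,
                       max_steps=10_000):
    """Run the machine until it halts and return the non-blank tape contents."""
    # A defaultdict models a tape that is unbounded in both directions.
    tape = defaultdict(lambda: BLANK, enumerate(tape_input))
    head, state, steps = 0, start_state, 0
    while state != halt_state:
        if steps >= max_steps:
            raise RuntimeError("machine did not halt within max_steps")
        symbol = tape[head]
        # Each rule maps (state, read symbol) to (written symbol, move, next state).
        new_symbol, move, state = transitions[(state, symbol)]
        tape[head] = new_symbol
        head += 1 if move == "R" else -1
        steps += 1
    if not tape:
        return ""
    cells = range(min(tape), max(tape) + 1)
    return "".join(tape[i] for i in cells).strip(BLANK)


# Hypothetical program: scan right, inverting each bit, and halt at the first blank.
invert_bits = {
    ("scan", "0"): ("1", "R", "scan"),
    ("scan", "1"): ("0", "R", "scan"),
    ("scan", BLANK): (BLANK, "R", "halt"),
}

output = run_turing_machine(invert_bits, "10110", "scan", "halt")
print(output)
```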

Graves A., Wayne G., Danihelka I. Neural Turing Machines // Cornell University, 2014 // https://arxiv.org/abs/1410.5401.

Graves A., Wayne G., Reynolds M. et al. Hybrid computing using a neural network with dynamic external memory // Nature, 2016 // http://go.nature.com/2peM8m2.

https://github.com/Mostafa-Samir/DNC-tensorflow.

http://nicklocascio.com/.

Mnih V. et al. Human-level control through deep reinforcement learning // Nature. 2015. Vol. 518. No. 7540. Pp. 529–533.

Brockman G. et al. OpenAI Gym // arXiv preprint arXiv:1606.01540 (2016) // https://gym.openai.com/.

Sutton R. S. et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation // NIPS. 1999. Vol. 99.

Sorokin I. et al. Deep Attention Recurrent Q-Network // arXiv preprint arXiv:1512.01693 (2015).