An anonymous support group for heavy users of gradients and nonlinearities.

Most of our business is conducted in the Google Group. If you're interested in joining, please subscribe!

Fall 2016: Every other Wednesday in GDC 3.816 at 10am. Note that we recently changed to this room since the former room is having A/V issues.

Year | Link / Description |
---|---|
2016 | Distill: Animated graphics explaining NNs. Chris Olah's newer blog for explaining DNN papers. |
2016 | Deep Learning Summer School 2016 Video Lectures. A set of 1-2 hour introductory lectures by impressive people. |
2016 | The Goodfellow et al. Deep Learning book. A nice comprehensive textbook about modern deep learning. |
2015-2016 | A Udacity MOOC by Vincent Vanhoucke. Vincent Vanhoucke from Google did a pretty good MOOC on Deep Learning with TensorFlow. |
2015 | Yoav Goldberg's "A Primer on Neural Network Models for Natural Language Processing". The lay of the land of NNs for NLP in late 2015. |
2014-2015 | Course lectures by Nando de Freitas. Nando de Freitas (from Oxford and DeepMind) gave some pretty nice lectures about deep learning with Torch. |
2014-2015 | A great set of blog posts with visualizations for intuition. |

Here are the previous papers we've read in FLARE. This list often grows stale.

Date Read | Published | Title / Suggestor Notes |
---|---|---|
2016-11-30 | 2015 | |
2016-11-16 | 2015 | Continuous control with deep reinforcement learning. Second reading. Slides. |
2016-11-16 | 2015 | Human-level control through deep reinforcement learning. First reading. |
2016-10-26 | 2016 | WaveNet: A Generative Model for Raw Audio. Modern paper. |
2016-10-26 | 2016 | Pixel Recurrent Neural Networks. Classic paper. |
2016-10-19 | 2015 | Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. Modern paper. Slides. |
2016-10-19 | 2014 | Classic paper: GANs. Slides. |
2016-09-28 | 2015 | Modern paper: Variational Inference with Normalizing Flows. Presented by Wesley. Slides. |
2016-09-28 | 2013 | Auto-Encoding Variational Bayes. Classic paper: AE VB. |
2016-09-14 | 2016 | Modern paper: Layer Norm. |
2016-09-14 | 2015 | Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. Classic paper: Batchnorm. |
2016-05-12 | 2015 | Deep Residual Learning for Image Recognition. ResNet. |
2016-04-29 | 2016 | SR: Apply BN after *every* timestep in an RNN. Works dramatically well. |
2016-04-15 | 2016 | Learning to Communicate to Solve Riddles with Deep Distributed Recurrent Q-Networks. SR: They present a deep-Q+LSTM that is able to solve some classic riddles. First combination of deep-Q reinforcement learning with an LSTM that I've seen. |
2016-03-04 | 2015 | Net2Net: Accelerating Learning via Knowledge Transfer. Dinesh: Along the lines of the "Model Compression" idea, but now trying to transfer information from a small, easy-to-train network to accelerate the training of a larger network. |
2016-02-19 | 2015 | Ask Me Anything: Dynamic Memory Networks for Natural Language Processing. Eddie: General NN framework for explicitly representing input as a knowledge base that can be iterated over multiple times. Motivated by question-answering and transitive reasoning. General enough to be instantiated for many different tasks. |
2016-02-05 | 2016 | Mastering the game of Go with deep neural networks and tree search. SR: First AI to beat a professional Go player. |
2015-02-27 | 2014 | Teaching Deep Convolutional Neural Networks to Play Go. Matthew: Cool-looking paper outlining a neural network approach to a game that has been dominated by UCT planning. |
2015-02-13 | 2014 | Do Deep Nets Really Need to be Deep? Karl: An extended abstract (4pp) providing empirical evidence that shallow nets can do as well as deep nets with the same number of parameters. |
2014-12-05 | 2014 | SR: Fresh off the press. |
2014-11-14 | 2010 | Why Does Unsupervised Pre-training Help Deep Learning? Craig: An in-depth discussion of the title question, with experiments that illuminate what is going on in an intuitive way. The short answer seems to be data-dependent regularization. |
2014-10-31 | 2012 | Multimodal Deep Learning with Deep Boltzmann Machine. Aditya: Interesting to read how it combines sparse word frequencies from text with SIFT features from images. Since it builds generative models on text and image features, it can work even when data from one of the input sources is missing. |
2014-10-17 | 2013 | Visualizing and Understanding Convolutional Networks. Matthew: Presents a method for visualizing the different layers in convolutional networks. Analyzes and improves upon Krizhevsky's ImageNet network. |
2014-10-03 | 2013 | Leif: Apparently a maxout unit is a piecewise linear combination of its inputs, combining some of the benefits of rectified linear activations with dropout (?) |
2014-09-19 | 2013 | Provably Efficient Algorithm for Training Deep Networks. Aditya: Proposes a principled way of learning deep polynomial features. It works well on small datasets like MNIST with minimal parameter tuning. I am evaluating this for larger datasets. |
2014-04-18 | 2013 | Efficient Estimation of Word Representations in Vector Space. Aditya: Recent paper from Google folks (along with code) where they constructed high-level vector representations of text. These features can help find semantic similarities in words. |
2014-04-04 | 2011 | Generating text with recurrent neural networks. Leif: This paper describes a recurrent network that was trained on character sequences from Wikipedia, and is capable of generating character sequences that have remarkably sophisticated structure. Also interesting because the model includes multiplicative interactions. |
2014-03-21 | 2011 | Contractive Auto-Encoders: Explicit Invariance During Feature Extraction. Craig: Proposes a new way of regularizing autoencoders: an analytical penalty on the Jacobian of the encoder activations with respect to the input. This is motivated by and related to the denoising auto-encoder, which has had empirical success. |
2014-02-28 | 2011 | Multiscale scattering for audio classification. Leif: Developed by Stephane Mallat (creator of Matching Pursuit and a wavelet pioneer), this paper describes a "deep" wavelet transform that provides a nice representation for audio data classification. Particularly interesting is the use of features at all levels simultaneously. |
2014-02-14 | 2013 | Playing Atari with Deep Reinforcement Learning. Leif: A deep model is used to map from pixels in Atari games to some sort of Q-learning-like reward signal. |
2013-12-06 | 2009 | A Deep Non-Linear Feature Mapping for Large-Margin kNN Classification. Elad: Using deep models to enhance kNN. |
2013-11-15 | 2011 | Craig: Presents an interesting alternative to learning deep probabilistic models: only let the top layer be probabilistic, and have all lower layers form a deterministic feed-forward neural net. This avoids the complicated sampling or variational techniques used to train previous deep probabilistic models (e.g., DBN, DBM). |
2013-11-01 | 2010 | Modeling Pixel Means and Covariances Using Factorized Third-Order Boltzmann Machines. Leif: An important addition to the canon of RBM / energy-based models. This model (mcRBM) explicitly captures the mean and the covariance of the visible units, and gives good results on a wide variety of datasets. |
2013-10-18 | 2006 | A fast learning algorithm for deep belief networks. Leif: A foundational paper that opened up the "deep" learning field. This paper puts a nice probabilistic background on deep architectures based on RBMs. |
2013-10-04 | 2013 | Representation Learning: A Review and New Perspectives. Craig: Quite a long one, but a good overview of a field that is changing rapidly by Bengio, one of the leaders in the field. Might be a good one to start off with or have here as background reading. |
- | 2016 | Karl: Like Neural Turing Machines, but parallelizable. |
- | 2015 | Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks. Karl: Usually, when we train sequence-decoders, we feed the gold-standard output as the next step's input. This is KIND OF strange since the model doesn't necessarily know how to generate the gold output yet, so there's a mismatch between what the model knows how to predict at each timestep and what it is fed as input at the next timestep. This paper describes a method of training output sequence models in a way that reduces this mismatch. |
- | 2015 | Equilibrated adaptive learning rates for non-convex optimization. Karl: An adaptive learning-rate scheme that takes curvature information into account. |
- | 2015 | A Recurrent Latent Variable Model for Sequential Data. Karl: The authors introduce Variational Recurrent Neural Nets, which explicitly model dependencies between latent random variables across timesteps. Think, like, a Kalman filter, but with learned nonlinear dynamics between the hidden states. |
- | 2015 | Deep Residual Learning for Image Recognition. Dinesh: The 152-layer network from MSRA that won (most categories of) the ImageNet challenge in 2015. They introduce a neat trick for training such deep networks without running into vanishing gradient issues. |
- | 2015 | The Loss Surfaces of Multilayer Networks. Edward: It is shown (under certain assumptions) that local minima do not pose an issue to deep networks because the chance of finding a bad local minimum decreases exponentially with the depth of the network. I don't pretend to understand the math here, but I think the result is important. |
- | 2016 | Adaptive Computation Time for Recurrent Neural Networks. Karl: Alex Graves introduces a method of allowing an RNN to dynamically learn to perform a differing number of iterations, based on the input (instead of performing the exact same structured computation at each input). |
- | 2016 | Attend, Infer, Repeat: Fast Scene Understanding with Generative Models. SV: Folks from DeepMind do image interpretation as inference in a generative model. Given an image, you want to identify what the objects are, where they are, their relative positions, etc. But they take a generative approach to this (like DPMs, deformable parts models, but generative). They use RNNs, attend to one object at a time, and learn to use an "appropriate number" of inference steps. |
- | 2015 | A Theoretically Grounded Application of Dropout in Recurrent Neural Networks. SR: Suggests using the same dropout mask at every timestep of the recurrent layers. They actually justify it theoretically. Empirically it works. |
- | 2016 | SR: Techniques for compressing neural networks, including pruning, quantization, and Huffman coding. Accuracy preserved, two orders smaller model, ~5x faster. |
- | 2016 | Sparse Word Embeddings Using L1 Regularized Online Learning. SR: They add L1 regularization to w2v and get more interpretable dimensions. Probably not worth reading, but it's nice. |