Feature Learning and Representation Engineering

Some resources to get a grounding in Deep Learning

Year	Link / Description
2016	Distill: Animated graphics explaining NNs. Chris Olah's newer blog for explaining DNN papers.
2016	Deep Learning Summer School 2016 Video Lectures. A set of 1-2 hour introductory lectures by leading researchers in the field.
2016	The Goodfellow et al. Deep Learning book. A comprehensive textbook on modern deep learning.
2015-2016	A Udacity MOOC by Vincent Vanhoucke. Vincent Vanhoucke from Google teaches a solid MOOC on deep learning with TensorFlow.
2015	Yoav Goldberg's "A Primer on Neural Network Models for Natural Language Processing". The lay of the land of NNs for NLP in late 2015.
2014-2015	Course lectures by Nando de Freitas. Nando de Freitas (Oxford and DeepMind) gave a clear set of lectures on deep learning with Torch.
2014-2015	Colah's blog. A great set of blog posts that use visualizations to build intuition.

Previously Read Papers

Here are the previous papers we've read in FLARE. This list often grows stale.

Date Read	Published	Title / Suggester Notes
2016-11-30	2015	End-to-end Memory Networks Slides
2016-11-16	2015	Continuous control with deep reinforcement learning Second reading. Slides.
2016-11-16	2015	Human-level control through deep reinforcement learning First reading.
2016-10-26	2016	WaveNet: A Generative Model for Raw Audio Modern paper
2016-10-26	2016	Pixel Recurrent Neural Networks Classic paper
2016-10-19	2015	Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks Modern paper. Slides
2016-10-19	2014	Generative Adversarial Nets Classic paper: GANs. Slides
2016-09-28	2015	Variational Inference with Normalizing Flows Modern paper: Normalizing Flows. Presented by Wesley. Slides
2016-09-28	2013	Auto-Encoding Variational Bayes Classic paper: AEVB
2016-09-14	2016	Layer Normalization Modern paper: Layer Norm
2016-09-14	2015	Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift Classic paper: Batchnorm
2016-05-12	2015	Deep Residual Learning for Image Recognition ResNet
2016-04-29	2016	Recurrent Batch Normalization SR: Apply BN at every timestep of an RNN. Works dramatically well.
2016-04-15	2016	Learning to Communicate to Solve Riddles with Deep Distributed Recurrent Q-Networks SR: They present a deep Q-network combined with an LSTM that is able to solve some classic riddles. The first combination of deep Q-learning with an LSTM that I've seen.
2016-03-04	2015	Net2Net: Accelerating Learning via Knowledge Transfer Dinesh: Along the lines of the "Model Compression" idea, but now transferring knowledge from a small, easy-to-train network to accelerate the training of a larger network.
2016-02-19	2015	Ask Me Anything: Dynamic Memory Networks for Natural Language Processing Eddie: General NN framework for explicitly representing input as a knowledge base that can be iterated over multiple times. Motivated by question-answering and transitive reasoning. General enough to be instantiated for many different tasks.
2016-02-05	2016	Mastering the game of Go with deep neural networks and tree search SR: First AI to beat a professional Go player.
2015-02-27	2014	Teaching Deep Convolutional Neural Networks to Play Go Matthew: Cool-looking paper outlining a neural network approach to a game that has been dominated by UCT planning.
2015-02-13	2014	Do Deep Nets Really Need to be Deep? Karl: An extended abstract (4pp) providing empirical evidence that shallow nets can do as well as deep nets with the same number of parameters
2014-12-05	2014	Neural Turing Machines SR: Fresh off the press.
2014-11-14	2010	Why Does Unsupervised Pre-training Help Deep Learning? Craig: An in-depth discussion of the title question, with experiments that illuminate what is going on in an intuitive way. The short answer seems to be data-dependent regularization.
2014-10-31	2012	Multimodal Learning with Deep Boltzmann Machines Aditya: Interesting to read how it combines sparse word frequencies from text with SIFT features from images. Since it builds generative models over text and image features, it can work even when data from one of the input sources is missing.
2014-10-17	2013	Visualizing and Understanding Convolutional Networks Matthew: Presents a method for visualizing the different layers in convolutional networks. Analyzes and improves upon Krizhevsky's ImageNet network.
2014-10-03	2013	Maxout networks Leif: A maxout unit outputs the maximum over several linear functions of its inputs, making it a learned piecewise linear activation. It generalizes rectified linear units and is designed to work well with dropout.
2014-09-19	2013	A Provably Efficient Algorithm for Training Deep Networks Aditya: Proposes a principled way of learning deep polynomial features. It works well on small datasets like MNIST with minimal parameter tuning. I am evaluating this for larger datasets.
2014-04-18	2013	Efficient Estimation of Word Representations in Vector Space Aditya: Recent paper from Google folks (along with code) where they constructed high level vector representations of text. These features can help find semantic similarities in words.
2014-04-04	2011	Generating text with recurrent neural networks Leif: This paper describes a recurrent network that was trained on character sequences from Wikipedia, and is capable of generating character sequences that have remarkably sophisticated structure. Also interesting because the model includes multiplicative interactions.
2014-03-21	2011	Contractive Auto-Encoders: Explicit Invariance During Feature Extraction Craig: Proposes a new way of regularizing autoencoders: an analytical penalty on the Jacobian of the encoder activations with respect to the input. This is motivated by and related to the denoising auto-encoder, which has had empirical success.
2014-02-28	2011	Multiscale scattering for audio classification Leif: Developed by Stephane Mallat (creator of Matching Pursuit and a wavelet pioneer), this paper describes a "deep" wavelet transform that provides a nice representation for audio data classification. Particularly interesting is the use of features at all levels simultaneously.
2014-02-14	2013	Playing Atari with Deep Reinforcement Learning Leif: A deep convolutional network maps raw pixels from Atari games to Q-values, i.e., estimates of future reward for each possible action.
2013-12-06	2009	A Deep Non-Linear Feature Mapping for Large-Margin kNN Classification Elad: Using deep models to enhance kNN
2013-11-15	2011	Learning Deep Energy Models Craig: Presents an interesting alternative to learning deep probabilistic models: only let the top layer be probabilistic and all lower layers create a deterministic feed-forward neural net. This avoids the complicated sampling or variational techniques used to train previous deep probabilistic models (e.g., DBNs, DBMs).
2013-11-01	2010	Modeling Pixel Means and Covariances Using Factorized Third-Order Boltzmann Machines Leif: An important addition to the canon of RBM / energy-based models. This model (mcRBM) explicitly captures the mean and the covariance of the visible units, and gives good results on a wide variety of datasets.
2013-10-18	2006	A fast learning algorithm for deep belief networks Leif: A foundational paper that opened up the "deep" learning field. This paper puts a nice probabilistic background on deep architectures based on RBMs.
2013-10-04	2013	Representation Learning: A Review and New Perspectives Craig: Quite a long one, but a good overview of a rapidly changing field by Bengio, one of its leaders. Might be a good one to start with, or to keep here as background reading.
-	2016	Neural GPUs Learn Algorithms Karl: Like Neural Turing Machines, but parallelizable.
-	2015	Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks Karl: Usually, when we train sequence decoders, we feed the gold-standard output as the next step's input. This is KIND OF strange since the model doesn't necessarily know how to generate the gold output yet, so there's a mismatch between what the model knows how to predict at each timestep and what it is given as input at the next timestep (at test time it has to consume its own predictions). This paper reduces the mismatch by gradually shifting, over the course of training, from feeding the gold tokens to feeding the model's own predictions.
-	2015	Equilibrated adaptive learning rates for non-convex optimization Karl: An adaptive learning-rate scheme that takes curvature information into account.
-	2015	A Recurrent Latent Variable Model for Sequential Data Karl: The authors introduce Variational Recurrent Neural Nets, which explicitly model dependencies between latent random variables across timesteps. Think of a Kalman filter, but with learned nonlinear dynamics between the hidden states.
-	2015	Deep Residual Learning for Image Recognition Dinesh: The 152-layer network from MSRA that won (most categories of) the ImageNet challenge in 2015. They introduce a neat trick for training such deep networks without running into vanishing gradient issues.
-	2015	The Loss Surfaces of Multilayer Networks Edward: It is shown (under certain assumptions) that local minima do not pose an issue for deep networks, because the chance of finding a bad local minimum decreases exponentially with the size of the network. I don't pretend to understand the math here, but I think the result is important.
-	2016	Adaptive Computation Time for Recurrent Neural Networks Karl: Alex Graves introduces a method that lets an RNN learn to perform a variable number of computation steps depending on the input (instead of performing exactly the same computation at every input).
-	2016	Attend, Infer, Repeat: Fast Scene Understanding with Generative Models SV: Folks from DeepMind do image interpretation as inference in a generative model. Given an image, the goal is to identify what the objects are, where they are, their relative positions, etc. They take a generative approach to this (like DPMs, deformable parts models, but generative). They use RNNs that attend to one object at a time and learn to use an appropriate number of inference steps.
-	2015	A Theoretically Grounded Application of Dropout in Recurrent Neural Networks SR: Suggests using the same dropout mask at every timestep, including on the recurrent connections. They actually justify it theoretically (as approximate variational inference). Empirically, it works.
-	2016	Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding SR: Techniques for compressing neural networks: pruning, trained quantization, and Huffman coding. Accuracy is preserved while models become 35-49x smaller, with a 3-4x layerwise speedup.
-	2016	Sparse Word Embeddings Using L1 Regularized Online Learning SR: They add L1 regularization to word2vec and get more interpretable dimensions. Probably not worth reading, but it's nice.
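As a rough illustration of the scheduled sampling idea in Karl's note on the Scheduled Sampling paper, here is a minimal Python sketch. It assumes a toy, hypothetical predict_next function standing in for one decoder step (a real decoder would also carry hidden state and sample from its output distribution), and uses an inverse sigmoid decay, eps_i = k / (k + exp(i / k)), for the probability of feeding the gold token at training step i. The function names, the default k = 100, and the exponent cap are illustrative choices, not taken from the paper's code.

```python
import math
import random
from typing import Callable, List, Sequence


def inverse_sigmoid_decay(step: int, k: float = 100.0) -> float:
    """Probability of feeding the gold token at training step `step`.

    Inverse sigmoid schedule eps_i = k / (k + exp(i / k)) with k >= 1:
    it starts close to 1 and decays toward 0; a larger k decays more slowly.
    """
    # Cap the exponent so math.exp does not overflow for very large steps;
    # beyond the cap the probability is effectively 0 anyway.
    x = min(step / k, 700.0)
    return k / (k + math.exp(x))


def decoder_inputs(
    gold: Sequence[int],
    predict_next: Callable[[int], int],
    teacher_prob: float,
    start_token: int,
    rng: random.Random,
) -> List[int]:
    """Build the token fed to the decoder at each timestep.

    inputs[0] is the start token. For t >= 1, inputs[t] is the gold token
    gold[t - 1] with probability `teacher_prob`; otherwise it is the model's
    own prediction from the previous input. The decoder is then trained to
    produce gold[t] from inputs[t].
    """
    inputs = [start_token]
    for t in range(1, len(gold)):
        if rng.random() < teacher_prob:
            inputs.append(gold[t - 1])  # teacher forcing: use ground truth
        else:
            inputs.append(predict_next(inputs[-1]))  # feed model's own guess
    return inputs


if __name__ == "__main__":
    rng = random.Random(0)
    gold = [5, 6, 7, 8]
    # Hypothetical "model" that always predicts previous token + 1.
    toy_model = lambda tok: tok + 1
    for step in (0, 500, 1000):
        p = inverse_sigmoid_decay(step)
        print(step, round(p, 3), decoder_inputs(gold, toy_model, p, 0, rng))
```

Early in training the schedule keeps teacher_prob near 1, so the decoder sees mostly gold tokens; as it decays, the decoder increasingly consumes its own predictions, which is what it has to do at test time.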
