An anonymous support group for heavy users of gradients and nonlinearities.

Current Meetings

Most of our business is conducted in the Google Group. If you're interested in joining, please subscribe!

Fall 2016: Every other Wednesday in GDC 3.816 at 10am. Note that we recently changed to this room since the former room is having A/V issues.

Deep Learning at UT

Here is a list of some of the Deep Learning work going on within the AI Lab. A bunch of Kristen Grauman's lab uses NNs for stuff too, as does the NNRG.

Some resources to get a grounding in Deep Learning

Year Link / Description

Distill: Animated graphics explaining NNs

Chris Olah's newer blog for explaining DNN papers.


Deep Learning Summer School 2016 Video Lectures

A set of 1-2 hour introductory lectures by impressive people.


The Goodfellow et al. Deep Learning book

A nice comprehensive textbook about modern deep learning.


A Udacity MOOC by Vincent Vanhoucke

Vincent Vanhoucke from Google did a pretty good MOOC on Deep Learning with TensorFlow.


Yoav Goldberg's "A Primer on Neural Network Models for Natural Language Processing"

The lay of the land of NNs for NLP in late 2015.


Course lectures by Nando de Freitas

Nando de Freitas (from Oxford and DeepMind) gave some pretty nice lectures about deep learning with Torch.


Colah's blog

A great set of blog posts with visualizations for intuition.

Previously Read Papers

Here are the previous papers we've read in FLARE. This list often grows stale.

Date Read Published Title / Suggestor Notes
2016-11-30 2015

End-to-end Memory Networks


2016-11-16 2015

Continuous control with deep reinforcement learning

Second reading. Slides.

2016-11-16 2015

Human-level control through deep reinforcement learning

First Reading

2016-10-26 2016

WaveNet: A Generative Model for Raw Audio

Modern paper

2016-10-26 2016

Pixel Recurrent Neural Networks

Classic paper

2016-10-19 2015

Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks

Modern paper. Slides.

2016-10-19 2014

Generative Adversarial Nets

Classic paper: GANs. Slides

2016-09-28 2015

Variational Inference with Normalizing Flows

Modern paper. Presented by Wesley. Slides.

2016-09-28 2013

Auto-Encoding Variational Bayes

Classic paper: AE VB

2016-09-14 2016

Layer Normalization

Modern paper: Layer Norm

2016-09-14 2015

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

Classic paper: Batchnorm

2016-05-12 2015

Deep Residual Learning for Image Recognition


2016-04-29 2016

Recurrent Batch Normalization

SR: Apply BN after *every* timestep in an RNN. Works dramatically well.

2016-04-15 2016

Learning to Communicate to Solve Riddles with Deep Distributed Recurrent Q-Networks

SR: they present a deep-Q+LSTM which is able to solve some classic riddles. First combination of deep-Q reinforcement with an LSTM that I've seen

2016-03-04 2015

Net2Net: Accelerating Learning via Knowledge Transfer

Dinesh: Along the lines of the "Model Compression" idea, but now trying to transfer information from a small, easy-to-train network to accelerate the training of a larger network.

2016-02-19 2015

Ask Me Anything: Dynamic Memory Networks for Natural Language Processing

Eddie: General NN framework for explicitly representing input as a knowledge base that can be iterated over multiple times. Motivated by question-answering and transitive reasoning. General enough to be instantiated for many different tasks.

2016-02-05 2016

Mastering the game of Go with deep neural networks and tree search

SR: First AI to beat a professional Go player.

2015-02-27 2014

Teaching Deep Convolutional Neural Networks to Play Go

Matthew: Cool-looking paper outlining a neural network approach to a game that has been dominated by UCT planning.

2015-02-13 2014

Do Deep Nets Really Need to be Deep?

Karl: An extended abstract (4pp) providing empirical evidence that shallow nets can do as well as deep nets with the same number of parameters

2014-12-05 2014

Neural Turing Machines

SR: Fresh off the press.

2014-11-14 2010

Why Does Unsupervised Pre-training Help Deep Learning?

Craig: An in-depth discussion of the title question, with experiments that illuminate what is going on in an intuitive way. The short answer seems to be data-dependent regularization.

2014-10-31 2012

Multimodal Learning with Deep Boltzmann Machines

Aditya: Interesting to read how it combines sparse word frequencies from text with SIFT features from images. Since it builds generative models on text and image features, it can work even when data from one of the input sources is missing.

2014-10-17 2013

Visualizing and Understanding Convolutional Networks

Matthew: Presents a method for visualizing the different layers in convolutional networks. Analyzes and improves upon Krizhevsky's ImageNet network.

2014-10-03 2013

Maxout networks

Leif: Apparently a maxout unit is a piecewise linear function of its inputs (the max over several linear projections), combining some of the benefits of rectified linear activations with dropout (?)

2014-09-19 2013

Provably Efficient Algorithm for Training Deep Networks

Aditya: Proposes a principled way of learning deep polynomial features. It works well on small datasets like MNIST with minimal parameter tuning. I am evaluating this for larger datasets.

2014-04-18 2013

Efficient Estimation of Word Representations in Vector Space

Aditya: Recent paper from Google folks (along with code) where they constructed high-level vector representations of text. These features can help find semantic similarities between words.

2014-04-04 2011

Generating text with recurrent neural networks

Leif: This paper describes a recurrent network that was trained on character sequences from Wikipedia, and is capable of generating character sequences that have remarkably sophisticated structure. Also interesting because the model includes multiplicative interactions.

2014-03-21 2011

Contractive Auto-Encoders: Explicit Invariance During Feature Extraction

Craig: Proposes a new way of regularizing autoencoders: an analytical penalty on the Jacobian of the encoder activations wrt the input. This is motivated by and related to the denoising auto-encoder, which has had empirical success.

2014-02-28 2011

Multiscale scattering for audio classification

Leif: Developed by Stephane Mallat (creator of Matching Pursuit and a wavelet pioneer), this paper describes a "deep" wavelet transform that provides a nice representation for audio data classification. Particularly interesting is the use of features at all levels simultaneously.

2014-02-14 2013

Playing Atari with Deep Reinforcement Learning

Leif: A deep model is used to map from pixels in Atari games to some sort of Q-learning-like reward signal.

2013-12-06 2009

A Deep Non-Linear Feature Mapping for Large-Margin kNN Classification

Elad: Using deep models to enhance kNN

2013-11-15 2011

Learning Deep Energy Models

Craig: Presents an interesting alternative to learning deep probabilistic models: only let the top layer be probabilistic and have all lower layers form a deterministic feed-forward neural net. This avoids the complicated sampling or variational techniques used to train previous deep probabilistic models (e.g., DBN, DBM).

2013-11-01 2010

Modeling Pixel Means and Covariances Using Factorized Third-Order Boltzmann Machines

Leif: An important addition to the canon of RBM / energy-based models. This model (mcRBM) explicitly captures the mean and the covariance of the visible units, and gives good results on a wide variety of datasets.

2013-10-18 2006

A fast learning algorithm for deep belief networks

Leif: A foundational paper that opened up the "deep" learning field. This paper puts a nice probabilistic background on deep architectures based on RBMs.

2013-10-04 2013

Representation Learning: A Review and New Perspectives

Craig: Quite a long one, but a good overview of a field that is changing rapidly by Bengio, one of the leaders in the field. Might be a good one to start off with or have here as background reading.

- 2016

Neural GPUs Learn Algorithms

Karl: Like Neural Turing Machines, but parallelizable

- 2015

Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks

Karl: Usually, when we train sequence-decoders, we feed the gold-standard output as the next step's input. This is KIND OF strange since the model doesn't necessarily know how to generate the gold output yet, so there's a mismatch between what the model knows how to predict at each timestep and what it is fed as input at the next timestep. This paper describes a method of training output sequence models in a way that reduces this mismatch.

- 2015

Equilibrated adaptive learning rates for non-convex optimization

Karl: An adaptive learning-rate scheme that takes curvature information into account

- 2015

A Recurrent Latent Variable Model for Sequential Data

Karl: The authors introduce Variational Recurrent Neural Nets, which explicitly model dependencies between latent random variables across timesteps. Think, like, a Kalman filter, but with learned nonlinear dynamics between the hidden states.

- 2015

Deep Residual Learning for Image Recognition

Dinesh: The 152-layer network from MSRA that won (most categories of) the ImageNet challenge in 2015. They introduce a neat trick for training such deep networks without running into vanishing gradient issues.

- 2015

The Loss Surfaces of Multilayer Networks

Edward: It is shown (under certain assumptions) that local minima do not pose an issue to deep networks because the chance of finding a bad local minimum decreases exponentially with the depth of the network. I don't pretend to understand the math here, but I think the result is important.

- 2016

Adaptive Computation Time for Recurrent Neural Networks

Karl: Alex Graves introduces a method of allowing an RNN to dynamically learn to perform a differing number of iterations, based on the input (instead of performing the exact same structured computation at each input).

- 2016

Attend, Infer, Repeat: Fast Scene Understanding with Generative Models

SV: Folks from DeepMind do image interpretation as inference in a generative model. Given an image, you want to identify what the objects are, where they are, their relative positions, etc. They take a generative approach to this (like DPMs, deformable parts models, but generative). They use RNNs, attend to one object at a time, and learn to use an "appropriate number" of inference steps.

- 2015

A Theoretically Grounded Application of Dropout in Recurrent Neural Networks

SR: Suggests using the same dropout mask at every timestep, including on the recurrent connections. They actually justify it theoretically. Empirically it works.

- 2016

Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding

SR: Techniques for compressing neural networks, including pruning, quantization, and Huffman coding. Accuracy preserved, two orders smaller model, ~5x faster.

- 2016

Sparse Word Embeddings Using L1 Regularized Online Learning

SR: They add L1 regularization to w2v, get more interpretable dimensions. probably not worth reading, but it's nice
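
For anyone curious about the scheduled sampling entry in the list above, here is a minimal sketch of the core idea, not the authors' code. At each decoder step the model is fed the gold previous token with some probability, and its own previous prediction otherwise, and that probability is decayed over training. The function names, the linear decay, and the `floor` parameter are assumptions made for illustration; the paper considers several decay schedules.

```python
import random


def gold_input_prob(step, total_steps, floor=0.0):
    """Probability of feeding the gold token at a given training step.

    Assumed schedule: linear decay from 1.0 down to `floor`.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    return max(floor, 1.0 - step / float(total_steps))


def mix_decoder_inputs(gold_prev, model_prev, eps, rng=random):
    """Pick each decoder input: gold token with probability eps, else the model's own prediction.

    gold_prev and model_prev are equal-length sequences of previous-step tokens.
    """
    return [g if rng.random() < eps else m for g, m in zip(gold_prev, model_prev)]


if __name__ == "__main__":
    rng = random.Random(0)
    gold = ["the", "cat", "sat"]
    pred = ["a", "dog", "sat"]
    for step in (0, 500, 1000):
        eps = gold_input_prob(step, total_steps=1000)
        print(step, eps, mix_decoder_inputs(gold, pred, eps, rng))
```

Early in training `eps` is near 1, so this reduces to ordinary teacher forcing; late in training the decoder mostly sees its own outputs, which is closer to what happens at test time.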