Arxiv Sanity

A great website for sorting through research papers in machine learning is Arxiv Sanity. It was developed by Andrej Karpathy, now at Tesla. Look at the introductory video:

You can save papers that you like and look for similar papers, ranked by tf-idf similarity. What is probably missing is a social score for these papers, such as which paper is popular at the moment, though the list of top saved papers can serve as a proxy for popularity.
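To make the idea of ranking by tf-idf concrete, here is a minimal sketch in plain Python. It is not the code behind Arxiv Sanity: the whitespace tokenisation, the smoothed idf formula, the function names and the three toy abstracts are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, scaled to unit length."""
    tokenised = [doc.lower().split() for doc in docs]  # naive tokenisation (assumption)
    n_docs = len(tokenised)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        tf = Counter(tokens)
        # Smoothed idf (one common variant, assumed here): rare terms weigh more.
        vec = {
            term: (count / len(tokens)) * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
            for term, count in tf.items()
        }
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({term: w / norm for term, w in vec.items()})
    return vectors


def cosine(u, v):
    # Both vectors have unit length, so the dot product is the cosine similarity.
    return sum(w * v.get(term, 0.0) for term, w in u.items())


def most_similar(query_index, docs):
    """Indices of the other documents, most similar to the query first."""
    vecs = tfidf_vectors(docs)
    scores = [
        (cosine(vecs[query_index], v), i)
        for i, v in enumerate(vecs)
        if i != query_index
    ]
    return [i for _, i in sorted(scores, reverse=True)]


# Toy abstracts, invented for this example.
abstracts = [
    "deep residual networks for image recognition",
    "residual learning improves deep image classification networks",
    "monetary policy and inflation expectations",
]
ranking = most_similar(0, abstracts)
```

Here `ranking` lists the other abstracts in decreasing order of similarity to the first one. The second abstract shares several terms with it, while the third shares none and so gets a similarity of zero.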

Natural Language Processing with Deep Learning

Communicating and understanding are usually taken as signs of intelligence and are part of the Turing test. Indeed, the machine needs to communicate and appear to understand the interrogator's questions to seem human and pass the test. Natural Language Processing (NLP) has made great progress in the past 20 years. Stanford has an excellent course (CS224d) on Natural Language Processing with Deep Learning taught by Chris Manning and Richard Socher (now at Salesforce). It is available here:

The course material is also online here.

An interesting criticism of machine translation engines such as Google Translate, which use techniques taught in these NLP lectures, appears in the article The Shallowness of Google Translate in The Atlantic.

Machine Learning: an Applied Econometric Approach

Susan Athey’s article discussed machine learning and causal inference. The article Machine Learning: An Applied Econometric Approach by Harvard Professor Sendhil Mullainathan and Jann Spiess focuses on machine learning as an econometric tool.


Machines are increasingly doing “intelligent” things. Face recognition algorithms use a large dataset of photos labeled as having a face or not to estimate a function that predicts the presence y of a face from pixels x. This similarity to econometrics raises questions: How do these new empirical tools fit with what we know? As empirical economists, how can we use them? We present a way of thinking about machine learning that gives it its own place in the econometric toolbox. Machine learning not only provides new tools, it solves a different problem. Specifically, machine learning revolves around the problem of prediction, while many economic applications revolve around parameter estimation. So applying machine learning to economics requires finding relevant tasks. Machine learning algorithms are now technically easy to use: you can download convenient packages in R or Python. This also raises the risk that the algorithms are applied naively or their output is misinterpreted. We hope to make them conceptually easier to use by providing a crisper understanding of how these algorithms work, where they excel, and where they can stumble—and thus where they can be most usefully applied.

Sendhil also recently gave an interesting talk at the Stanford Center on Global Poverty and Development on applying machine learning to poverty alleviation:

History of Deep Learning

This paper presents a history of deep learning from Aristotle to the present time. The different milestones are summarized in this table:

So it is clear that many of these ideas date back several decades. In the article, the authors conclude:

This paper could serve two goals: 1) First, it documents the major milestones in the science history that have impacted the current development of deep learning. These milestones are not limited to the development in computer science fields. 2) More importantly, by revisiting the evolutionary path of the major milestone, this paper should be able to suggest the readers that how these remarkable works are developed among thousands of other contemporaneous publications. Here we briefly summarize three directions that many of these milestones pursue:

  • Occam’s razor: While it seems that part of the society tends to favor more complex models by layering up one architecture onto another and hoping backpropagation can find the optimal parameters, history says that masterminds tend to think simple: Dropout is widely recognized not only because of its performance, but more because of its simplicity in implementation and intuitive (tentative) reasoning. From Hopfield Network to Restricted Boltzmann Machine, models are simplified along the iterations until when RBM is ready to be piled-up.
  • Be ambitious: If a model is proposed with substantially more parameters than contemporaneous ones, it must solve a problem that no others can solve nicely to be remarkable. LSTM is much more complex than traditional RNN, but it bypasses the vanishing gradient problem nicely. Deep Belief Network is famous not due to the fact the they are the first one to come up with the idea of putting one RBM onto another, but due to that they come up an algorithm that allow deep architectures to be trained effectively.
  • Widely read: Many models are inspired by domain knowledge outside of machine learning or statistics field. Human visual cortex has greatly inspired the development of convolutional neural networks. Even the recent popular Residual Networks can find corresponding mechanism in human visual cortex. Generative Adversarial Network can also find some connection with game theory, which was developed fifty years ago.

Coming from the field of economics and game theory, we could not agree more, especially when we read the literature on reinforcement learning (RL) or generative adversarial networks. Once we talk about strategic agents with objectives and payoffs to maximize, the setting is very similar to economics. There are differences between economics and machine learning in how they approach these problems, and we will discuss some research that studies them.

Why Should I Trust You?

A challenge with complex machine learning models is developing trust in them. If a model is a black box, some users might not feel comfortable using it. Models need to be interpretable, meaning that users should be able to understand how the outputs (predictions) are generated from the inputs (features).

Different approaches have been suggested. A recent one is a technique called Local Interpretable Model-agnostic Explanations (LIME). LIME approximates a complex model locally, around the prediction to be explained, with an interpretable one, such as a linear model with a limited number of features.
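The local approximation idea can be sketched in a few lines of plain Python. This is a simplified illustration, not the LIME library or the exact procedure from the paper: it skips the feature selection step, and the `black_box` function, the sampling scale and the kernel width are hypothetical choices. The sketch perturbs the input around a point, weights each perturbed sample by its proximity to that point, and fits a weighted linear model whose slopes serve as the explanation.

```python
import math
import random


def black_box(x):
    # Hypothetical opaque model: nonlinear in both features.
    return math.sin(x[0]) + x[1] ** 2


def solve(a, b):
    """Solve a small linear system a w = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [p - factor * q for p, q in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def explain_locally(model, x0, n_samples=2000, scale=0.1, kernel_width=0.25, seed=0):
    """Fit a proximity-weighted linear surrogate to `model` around x0.

    Returns (intercept, slopes); slopes[i] is the local effect of feature i.
    """
    rng = random.Random(seed)
    d = len(x0)
    xtx = [[0.0] * (d + 1) for _ in range(d + 1)]  # weighted X'X
    xty = [0.0] * (d + 1)                          # weighted X'y
    for _ in range(n_samples):
        z = [xi + rng.gauss(0.0, scale) for xi in x0]  # perturbed input
        dist2 = sum((zi - xi) ** 2 for zi, xi in zip(z, x0))
        weight = math.exp(-dist2 / kernel_width ** 2)  # closer samples count more
        row = [1.0] + [zi - xi for zi, xi in zip(z, x0)]  # features centred on x0
        y = model(z)
        for i in range(d + 1):
            xty[i] += weight * row[i] * y
            for j in range(d + 1):
                xtx[i][j] += weight * row[i] * row[j]
    coef = solve(xtx, xty)  # weighted least squares via the normal equations
    return coef[0], coef[1:]


intercept, slopes = explain_locally(black_box, [0.0, 1.0])
```

Around the point (0, 1), the fitted slopes should be close to the true local sensitivities of this toy model, roughly 1 for the first feature and 2 for the second, even though the model is nonlinear globally.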

A short video introduces the approach.

You can read the paper here.


Gradient Boosting Machine Learning

Machine learning has a long list of methods for learning from data. Among them is gradient boosting, as taught here by Professor Trevor Hastie from Stanford University. In this video, he introduces and compares decision trees, bagging, random forests and boosting.
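For readers who want to see the mechanics, here is a minimal sketch of gradient boosting for squared loss, written in plain Python. It is not taken from the lecture: the one-split trees (stumps), the learning rate, the number of rounds and the toy sine data are assumptions made to keep the example short. Each round fits a stump to the residuals of the current ensemble and adds a shrunken version of it to the prediction.

```python
import math


def fit_stump(xs, targets):
    """Best single-split regression stump (least squares) on one feature."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # keep both sides non-empty
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        lmean = sum(left) / len(left)
        rmean = sum(right) / len(right)
        sse = (sum((t - lmean) ** 2 for t in left)
               + sum((t - rmean) ** 2 for t in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, lmean, rmean)
    _, threshold, lmean, rmean = best
    return lambda x: lmean if x <= threshold else rmean


def gradient_boost(xs, ys, n_rounds=100, learning_rate=0.1):
    """Boosting for squared loss: each stump is fit to the current residuals."""
    base = sum(ys) / len(ys)  # start from the mean prediction
    stumps = []
    preds = [base] * len(xs)
    for _ in range(n_rounds):
        # For the loss (y - p)^2 / 2, the negative gradient is the residual y - p.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)


def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Toy data: a sine curve, which a single split cannot follow.
xs = [i / 10 for i in range(60)]
ys = [math.sin(x) for x in xs]

single_stump = fit_stump(xs, ys)
boosted = gradient_boost(xs, ys)
```

Comparing `mse(single_stump, xs, ys)` with `mse(boosted, xs, ys)` shows the point of boosting: many weak learners, added one after another, fit the training data far better than any one of them alone. Bagging and random forests instead average trees that are fit independently on resampled data.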

He has co-authored an excellent book, The Elements of Statistical Learning, that you can download here.