AI at Amazon Web Services (AWS)

AWS is the cloud services subsidiary of Amazon. It provides many tools and services to develop AI and machine learning models on its platform, from data ingestion, exploration, and transformation to model training, tuning, optimization, and deployment.

Data ingestion

Amazon Athena

Amazon Athena is a serverless, fast, highly available, durable, and secure query engine for big data. It is based on Presto, an open-source distributed query engine originally created by Facebook to query its own databases with low latency. It queries data stored in Amazon S3 in formats such as CSV, JSON, ORC, Avro, or Parquet using standard SQL. It can also execute join queries against JDBC-compliant databases such as MySQL and other data stores such as Amazon Redshift.

Amazon Redshift

Amazon Redshift is a cloud-based data warehouse and relational database management system. It replaces on-premises data warehouses and database systems. It is based on the open-source PostgreSQL project but works very differently, as it is focused on very large databases.

It works with clusters of nodes and node slices to process SQL queries and retrieve the structured data stored in the nodes. A cluster has a leader node that distributes tasks to the compute nodes, which execute them in parallel. Amazon Redshift is highly scalable, with nodes added as required, and can run very fast queries on petabytes of data. It can be linked to ETL processes and feed analytical workloads (dashboards, visualization, and business intelligence tools) at the enterprise level.

Amazon Kinesis

Amazon Kinesis is a data streaming platform to ingest, process, analyze, and store real-time, high-throughput streaming data. It is comparable to Apache Kafka, the open-source streaming platform initially developed by LinkedIn for its own needs. Streaming data can be video data, transaction data, time-series data, or any data produced continuously. Contrary to batch analytics, streaming analytics allows an almost immediate reaction to new events and constantly refreshed outputs for end-users and customers. It is, for instance, ideal for price data, fraud detection, and system monitoring.

Amazon Kinesis offers four capabilities: Kinesis Video Streams for video data captured by cameras; Kinesis Data Streams to capture, process, and store streaming data from multiple sources; Kinesis Data Firehose for continuous ETL jobs and data transfer to AWS databases; and Kinesis Data Analytics for transforming and analyzing streaming data.
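As a rough sketch, a producer can push a record into a Kinesis data stream with boto3, the AWS SDK for Python; the stream name and payload below are hypothetical:

```python
import json
import boto3

# Minimal producer sketch: write one record to a (hypothetical) stream.
kinesis = boto3.client("kinesis", region_name="us-east-1")

record = {"transaction_id": "tx-123", "amount": 42.5}
kinesis.put_record(
    StreamName="transactions",                # placeholder stream name
    Data=json.dumps(record).encode("utf-8"),
    PartitionKey=record["transaction_id"],    # routes the record to a shard
)
```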

Data exploration

Amazon SageMaker Notebooks

Amazon SageMaker Notebooks are Jupyter-style notebooks that can import, process, and analyze data from AWS data stores. Usually, only small data samples are analyzed in a SageMaker Python notebook. If necessary, Spark jobs can be run through SparkMagic notebooks on an EMR Spark cluster to process the data, or Redshift and Athena can be used directly to explore it.

Amazon Athena

Since Amazon Athena is a database query engine, it can be used for data exploration, much as a conventional relational database would be.

Amazon QuickSight

Amazon QuickSight is a business intelligence tool to create interactive dashboards that can be embedded into websites, analytics reports, and emails to share ML insights with the entire organization. It connects seamlessly with all AWS storage and database solutions. It is serverless and therefore scalable: as the number of users grows, it grows along with them. It allows quick iteration when developing new ML models, as results can quickly be shared with all stakeholders.

AWS Glue

AWS Glue is a serverless extract, transform, and load (ETL) tool to prepare data and identify useful metadata and data transformations from an AWS data lake or data source (Amazon S3, Redshift, etc.). The metadata and table definitions are stored in the AWS Glue Data Catalog. It can load the final data into a data store such as Amazon Redshift. It is built on Apache Spark and automatically generates modifiable ETL code in Scala or Python.

Data preparation

Amazon SageMaker Processing Jobs

Amazon SageMaker provides notebooks that a user can use to write Python scripts and access the standard data science and machine learning libraries (Pandas, Matplotlib, Seaborn, Scikit-Learn, TensorFlow, etc.). Athena and Redshift can also be accessed through these notebooks thanks to the Athena client library (PyAthena) and SQL libraries (SQLAlchemy). Complex queries can be sent directly from the notebooks.
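As an illustration, here is a minimal sketch of querying Athena into a Pandas DataFrame with PyAthena; the S3 staging bucket, region, and table name are placeholders:

```python
import pandas as pd
from pyathena import connect

# Connect to Athena; query results are staged in the given S3 location.
conn = connect(
    s3_staging_dir="s3://my-bucket/athena-results/",  # placeholder bucket
    region_name="us-east-1",
)
df = pd.read_sql(
    "SELECT product_id, COUNT(*) AS orders FROM sales GROUP BY product_id",
    conn,
)
```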

Amazon SageMaker Processing is used when the whole production dataset needs to be processed and transformed into useful features at scale. The type and number of instances must be defined to perform the processing step.
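A hedged sketch of such a job using the SageMaker Python SDK's SKLearnProcessor; the IAM role, script name, and S3 paths are placeholders:

```python
from sagemaker.processing import ProcessingInput, ProcessingOutput
from sagemaker.sklearn.processing import SKLearnProcessor

processor = SKLearnProcessor(
    framework_version="0.23-1",
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder role
    instance_type="ml.m5.xlarge",                         # instance type ...
    instance_count=2,                                     # ... and count
)
processor.run(
    code="preprocess.py",  # user-provided feature engineering script
    inputs=[ProcessingInput(source="s3://my-bucket/raw/",
                            destination="/opt/ml/processing/input")],
    outputs=[ProcessingOutput(source="/opt/ml/processing/output",
                              destination="s3://my-bucket/features/")],
)
```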

Amazon Elastic MapReduce (EMR)

Amazon EMR is a scalable data processing engine built on Apache Hadoop and Apache Spark. Apache Spark is a very popular distributed processing and analytics engine for big data. Workloads are automatically deployed to clusters and nodes. A SageMaker notebook can run Spark commands and process data on a Spark cluster. The data can be analyzed and tested with the Amazon Deequ API. Data can be tested for missing or null values, range, correct formatting, completeness, uniqueness, consistency, size, correlation, etc.
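As a sketch, such checks can be expressed with the pydeequ bindings on an existing SparkSession `spark` and DataFrame `df`; the column names are hypothetical:

```python
from pydeequ.checks import Check, CheckLevel
from pydeequ.verification import VerificationResult, VerificationSuite

check = (Check(spark, CheckLevel.Error, "data quality checks")
         .isComplete("transaction_id")   # no missing values
         .isUnique("transaction_id")     # no duplicates
         .isNonNegative("amount"))       # simple range constraint

result = VerificationSuite(spark).onData(df).addCheck(check).run()
VerificationResult.checkResultsAsDataFrame(spark, result).show()
```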

Model training

Amazon SageMaker Notebooks

Amazon SageMaker Notebooks can use standard machine learning libraries such as Scikit-Learn, TensorFlow, MXNet, or PyTorch to transform the data, do feature engineering, split the data, and train models on samples. The libraries are accessed through pre-built containers with pre-defined environments, custom training scripts, or fully customized containers.
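As a rough sketch with the SageMaker Python SDK, a Scikit-Learn training script can be run in a pre-built container; the script, role, and S3 path are placeholders:

```python
from sagemaker.sklearn.estimator import SKLearn

estimator = SKLearn(
    entry_point="train.py",  # user script that trains a Scikit-Learn model
    framework_version="0.23-1",
    instance_type="ml.m5.xlarge",
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder role
    hyperparameters={"n_estimators": 100, "max_depth": 5},
)
estimator.fit({"train": "s3://my-bucket/train/"})  # placeholder data channel
```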

Objective metrics such as accuracy have to be defined to evaluate model performance. Model hyperparameters and parameters can be saved and examined for model review and evaluation.

Amazon SageMaker Debugger

Amazon SageMaker Debugger uses rules to check training jobs for issues such as overfitting, data imbalance, or vanishing gradients. If a rule is triggered, training stops to allow debugging of the model and inspection of intermediate steps and objects.
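A sketch of attaching built-in Debugger rules matching the issues mentioned above; `estimator` refers to the training sketch earlier:

```python
from sagemaker.debugger import Rule, rule_configs

rules = [
    Rule.sagemaker(rule_configs.overfit()),
    Rule.sagemaker(rule_configs.class_imbalance()),
    Rule.sagemaker(rule_configs.vanishing_gradient()),
]
# Passed at construction time, e.g. SKLearn(..., rules=rules); if a rule
# triggers, the job status reflects it and training can be stopped to debug.
```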

Model tuning and optimization

Amazon SageMaker Hyper-Parameter Optimizer

Amazon SageMaker Hyper-Parameter Optimizer can find the best hyperparameters within given ranges to optimize an objective metric, using methods such as grid search, random search, or Bayesian optimization.
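A hedged sketch with the SageMaker Python SDK's HyperparameterTuner, reusing the estimator sketched earlier; the metric name, regex, and ranges depend on the training script and are illustrative:

```python
from sagemaker.tuner import (ContinuousParameter, HyperparameterTuner,
                             IntegerParameter)

tuner = HyperparameterTuner(
    estimator=estimator,
    objective_metric_name="validation:accuracy",
    objective_type="Maximize",
    hyperparameter_ranges={
        "learning_rate": ContinuousParameter(0.001, 0.1),
        "max_depth": IntegerParameter(3, 10),
    },
    metric_definitions=[{"Name": "validation:accuracy",
                         "Regex": "val_acc=([0-9\\.]+)"}],  # parsed from logs
    strategy="Bayesian",
    max_jobs=20,
    max_parallel_jobs=4,
)
tuner.fit({"train": "s3://my-bucket/train/",
           "validation": "s3://my-bucket/validation/"})
```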

Amazon SageMaker AutoPilot

Amazon SageMaker AutoPilot is the AutoML tool of SageMaker. It analyzes the raw data and the target to be predicted, chooses the best candidate algorithms, processes the data to create the best features, and automatically trains and tunes the models. The best hyperparameters are automatically selected for each algorithm.

Amazon SageMaker Experiment Tracking

Amazon SageMaker Experiments tracks multiple model runs and provides auditability, traceability, and reproducibility for these runs. Data, parameters, hyperparameters, and models can be accessed historically to review and reproduce feature engineering, training, tuning, and deployment results. Each experiment includes trials, each trial includes steps, and each step includes tracking information. Versioning and lineage are kept across all the trials.

Model deployment

Amazon SageMaker Model Endpoints

Amazon SageMaker Model Endpoints allow the user to interface with a model to get inference results on production data. An endpoint requires the location of the data and model artifacts (e.g. an S3 bucket), the model’s container, and some parameters and compute resource configurations. Different variants of the model can be requested to run in parallel. Endpoints are accessed through REST APIs.
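As a sketch, the estimator trained earlier can be deployed behind an endpoint and queried; the endpoint name and payload are placeholders:

```python
# Deploy the trained model behind a managed HTTPS endpoint.
predictor = estimator.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.large",
    endpoint_name="demo-model-v1",  # placeholder endpoint name
)
# Request a prediction for one (placeholder) feature vector.
result = predictor.predict([[0.1, 0.7, 1.0, 0.3]])
```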

Amazon SageMaker Model Monitoring

Amazon SageMaker Model Monitoring is used to monitor a deployed model and identify any deviations from a baseline. A baseline is created from the training data using a tool such as Amazon Deequ in Apache Spark. Model Monitoring captures the data and model inference results and checks that all the constraints are verified; if not, Amazon CloudWatch is triggered and sends warnings about the deviation. AWS CloudTrail saves all the model logs for model reviews and debugging.

Amazon SageMaker A/B Tests

A/B tests are used to improve production models and test models and hypotheses on production data. Amazon SageMaker A/B tests can be performed using Endpoints. Different training data, model versions, and compute resource configurations can be tested with Amazon SageMaker Model Endpoints. After reviewing the results, an improved model can be selected to replace the current one.
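A sketch of an endpoint configuration splitting traffic evenly between two variants with boto3; the model and configuration names are placeholders:

```python
import boto3

sm = boto3.client("sagemaker")
sm.create_endpoint_config(
    EndpointConfigName="demo-ab-config",
    ProductionVariants=[
        {"VariantName": "variant-a", "ModelName": "demo-model-a",
         "InstanceType": "ml.m5.large", "InitialInstanceCount": 1,
         "InitialVariantWeight": 0.5},  # 50% of traffic
        {"VariantName": "variant-b", "ModelName": "demo-model-b",
         "InstanceType": "ml.m5.large", "InitialInstanceCount": 1,
         "InitialVariantWeight": 0.5},  # 50% of traffic
    ],
)
```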

Amazon SageMaker Canary Rollouts

With Amazon SageMaker Canary Rollouts, a new model with a different production variant than the current model can be deployed through Endpoints to a limited number of customers and progressively expanded to more customers if the model’s performance is satisfactory.
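A sketch of progressively shifting traffic toward the canary variant; the names reuse the A/B sketch above and are placeholders:

```python
import boto3

sm = boto3.client("sagemaker")
sm.update_endpoint_weights_and_capacities(
    EndpointName="demo-endpoint",  # placeholder endpoint
    DesiredWeightsAndCapacities=[
        {"VariantName": "variant-a", "DesiredWeight": 0.8},
        {"VariantName": "variant-b", "DesiredWeight": 0.2},  # growing canary share
    ],
)
```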

Amazon SageMaker Batch Inference

Amazon SageMaker batch inference is an alternative to Endpoints when real-time results are not necessary. Amazon SageMaker reads the batch data from an S3 bucket, runs inference with a model, and delivers the results to another S3 bucket.
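A sketch of a batch job with the SageMaker SDK's transformer, reusing the earlier estimator; the S3 paths are placeholders:

```python
transformer = estimator.transformer(
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://my-bucket/predictions/",  # results bucket
)
transformer.transform(
    data="s3://my-bucket/batch-input/",         # input bucket
    content_type="text/csv",
    split_type="Line",                          # one record per line
)
transformer.wait()
```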

Model Pipeline

AWS Step Functions


Figure 1. AWS Step Functions. Source: Amazon

AWS Step Functions is an orchestration tool to coordinate the tasks of a machine learning workflow, such as processing data and running AWS Lambda functions or pre-trained models. It can be used for extract, transform, and load (ETL) processes, for breaking a complex machine learning codebase into more modular pieces, for coordinating batch processing jobs, and for triggering events and notifications. A Step Functions workflow is presented as a visual graph.
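As a sketch, a deployed state machine can be started from Python with boto3; the state machine ARN and input are placeholders:

```python
import json
import boto3

sfn = boto3.client("stepfunctions")
sfn.start_execution(
    stateMachineArn="arn:aws:states:us-east-1:123456789012:"
                    "stateMachine:ml-pipeline",  # placeholder ARN
    input=json.dumps({"training_data": "s3://my-bucket/train/"}),
)
```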

Amazon EventBridge


Figure 2. Amazon EventBridge. Source: Amazon

Amazon EventBridge connects events (changes of state) to workflows. The events can come from SaaS applications (Datadog, OneLogin, PagerDuty, Saviynt, Segment, SignalFx, SugarCRM, Symantec, Whispir, and Zendesk), customized applications, or AWS services. They trigger workflows that can include connecting to applications, microservices, or databases, running AWS Lambda functions and other AWS applications, or communicating results.
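A sketch of publishing a custom event that EventBridge can route to a workflow; the source, detail type, and payload are made up for illustration:

```python
import json
import boto3

events = boto3.client("events")
events.put_events(
    Entries=[{
        "Source": "com.example.ml",  # hypothetical event source
        "DetailType": "ModelDriftDetected",
        "Detail": json.dumps({"model": "demo-model-v1", "drift_score": 0.42}),
        "EventBusName": "default",
    }]
)
```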

AI at Tencent Holdings

Figure 1. WeChat payment

Tencent Holdings (“Tencent”) is a technology conglomerate based in China. It offers products and services in consumer internet, online gaming, social networks, media and entertainment, fintech, and cloud. Its best-known products are QQ, an instant messaging app for teenagers, and WeChat (Weixin in mainland China), a mobile messaging app that also offers other services such as digital payment, peer-to-peer payment, shopping, and games.

WeChat Pay is the digital payment service of WeChat. WeChat also includes mini-programs, which are apps within WeChat developed for third-party businesses. WeChat Pay competes directly with AliPay of Ant Financial. WeChat Pay can be used in-store at points of sale with a WeChat Pay barcode or merchant QR code, on websites, on mobile apps, on WeChat official merchant accounts, or in mini-programs hosted in WeChat. A WeChat Pay account is most commonly linked to payment cards and today can be linked to an international credit card such as Visa, Mastercard, or American Express.

Like AliPay, WeChat Pay offers wealth management services such as savings and investment products through its platform LiCaitong and is partnering with banks, mutual funds, and wealth management subsidiaries or companies, including BlackRock.


Figure 2. WeChat Pay on mobile phone

Reasons to invest in AI

WeChat has more than 1.1 billion users and WeChat Pay has more than 900 million users. Tencent’s business is all digital and consumer-oriented. Given its size, it needs to leverage AI to support its products and services at scale. IT and cloud infrastructure management, customer support and enhanced customer engagement, payment fraud prevention and detection, digital content management and monitoring, and product innovation all require advanced AI to grow.

AI initiatives

Tencent has three labs dedicated to AI: Tencent AI Lab, Youtu Lab, and WeChat AI. Tencent AI Lab is focused on fundamental research. Youtu Lab is developing applications in image processing, pattern recognition, and deep learning. WeChat AI is focused on speech recognition, natural language processing, computer vision, data mining, and machine learning for WeChat.

Tencent also invests in many AI accelerators and AI startups. It has invested in over 800 companies, 70 of which have gone public. It has an office in Palo Alto, CA to invest in non-Chinese startups. It has invested in Tesla, Spotify, and Snap.

Tencent is also involved in agriculture, healthcare, industry, and manufacturing applications of AI. 

Tencent AI Lab

Social AI

Social AI aims at developing better interactions between humans and machines. For instance, the lab has developed a smart chat application using natural language processing and understanding. The chat can be customized and used by businesses on the WeChat App or other platforms to interact with their customers.

Game AI

Game AI facilitates the interaction between the real world and the virtual world of games and continuously enhances the players’ game experience. It supports the numerous online games offered by Tencent and its partners (Riot Games’ League of Legends, Epic Games’ Fortnite, Bluehole’s PlayerUnknown’s Battlegrounds). It has recently developed an AI player named Wukong AI, which learned how to play games such as Honor of Kings through reinforcement learning, the same way DeepMind’s AlphaGo learned to play Go (Tencent has its own AI Go player named Fine Art). Humans can play against Wukong AI, and average players have difficulty beating it at higher levels.

Content AI

Content AI focuses on search, personalized recommendation, and content generation for users. It improves the content and recommendations of online video subscription services (drama series, anime series, variety shows, and short videos in the Weishi app), music (paid streaming music), reading subscription platforms (Weixin Reading app), and news (WeChat Moments newsfeed).

Platform AI

Platform AI provides tools to develop AI applications using OCR, machine translation, conversation bot, speech recognition, natural language processing, sentiment analysis, computer vision, human body and face recognition, image and video processing and enhancement.  

Intelligent Titanium Machine Learning is a one-stop, cloud-based machine learning platform for machine learning engineers and data scientists to perform model training, evaluation, and prediction. Tencent Yunzhi Tianshu Artificial Intelligence Service Platform is an AI platform service to deploy AI applications in enterprises. It connects edge devices, AI algorithms, and data through data connectors.

Youtu Lab

Youtu Lab specializes in computer vision and offers different applications in policing, person search and identification, vehicle traffic control and monitoring, face verification, graphical content monitoring, and censoring. 

WeChat AI

WeChat AI supports all the applications of AI on the WeChat platform. They include voice recognition, image scanning and QR codes, machine translation, chatbots to entertain users, music/TV features, and voice lock security. It uses speech recognition and audio processing, natural language processing, image and video processing, data mining and text understanding, and distributed machine learning.

Challenges

The most important challenges include government regulation, reputation, and competition risks. 

Tencent is exposed to significant regulatory and compliance risks as the consumer internet and AI come under greater scrutiny in most countries, including China. Privacy, data protection, and consumer protection laws apply to Tencent in its social network and gaming activities. Another set of laws and regulations in the financial sector, such as banking laws, investor protections, financial regulations and compliance, and risk regulations, applies to Tencent’s activities.

The Chinese government seems to have some control over WeChat, which presents a potential risk for Tencent’s international activities. The US government, for instance, is attempting to ban WeChat for American users because it might expose them to security risks.

Internet and gaming activities can sometimes be perceived as damaging if they lead to psychological problems such as addiction, especially among young customers, and Tencent has to be careful in evaluating the social impact of its businesses. Furthermore, some AI activities such as policing or surveillance can be controversial in some countries and present a reputation risk for Tencent.

Business competition is another challenge, as consumers can change their behaviors and adopt new platforms, products, and services offered by other firms. If Tencent does not keep up with innovation, it might lose users and market share. In fintech, Ant Financial with AliPay is, for instance, a significant competitor. Tencent is very dominant in gaming, but consumer taste can change quickly, and large investments are required to keep up with the latest technologies such as augmented reality (AR) and virtual reality (VR).

AI at Netflix

Netflix is the largest video streaming service company in the world, present in 190 countries and serving around 195 million customers. It has annual revenues of close to 20 billion dollars and a market capitalization of over 200 billion dollars. It started in 1997 with DVD rentals and sales by mail and moved into video streaming in 2007. Netflix is available on many platforms including TVs, phones, and tablets. Netflix is also involved in the production of original content and in movie production with Netflix Studio.

Reasons to Invest in AI

Netflix is mostly a digital company, with its infrastructure run in the cloud on AWS. It streams billions of hours of content every month in many countries and many languages. It collects a large amount of data from its users and strives to provide them with real-time recommendations based on viewing history and preferences. Its objective is to keep its users watching the most enjoyable shows on its platform. It needs AI to operate at this scale.

AI initiatives

On its excellent blog, Netflix describes how AI and machine learning are used in different areas of its business.

Personalization and Recommendations

Netflix needs to help its customers find content to watch on its platform. A customer can watch a film she enjoys but will then be looking for another one to watch, perhaps with the same theme (action, romantic comedy, science fiction, etc.), the same director, or the same cast.

Each user has a personalized page with recent views, trends, and recommendations by category, as well as original Netflix content. Everything on the page is customized to the viewer, including the suggested categories, films or series, and even their visuals. The image representing a film can show a particular actor or graphic that will attract the attention of the viewer.

Netflix uses several machine learning algorithms to select the content to show on the user’s home page. In particular, it is using A/B testing and contextual bandits. It runs real-time experiments with different page configurations and collects information on which configuration gets the most clicks. It knows which film the user ends up watching and whether the user has watched it to the end. It mixes predictions based on the user’s characteristics, preferences, and history with more randomized suggestions to uncover more information on the user’s preferences.
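Netflix’s contextual bandits condition the choice on user features; as a toy sketch of the underlying explore/exploit idea, here is an epsilon-greedy selection between two hypothetical artworks:

```python
import random

EPSILON = 0.1  # share of traffic used for exploration
clicks = {"artwork_a": 0, "artwork_b": 0}
shows = {"artwork_a": 0, "artwork_b": 0}

def choose_artwork() -> str:
    if random.random() < EPSILON:      # explore: try a random artwork
        return random.choice(list(clicks))
    # exploit: pick the artwork with the best observed click-through rate
    return max(clicks, key=lambda a: clicks[a] / max(shows[a], 1))

def record_feedback(artwork: str, clicked: bool) -> None:
    shows[artwork] += 1
    clicks[artwork] += int(clicked)
```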

Content and Studio

Netflix has to constantly purchase the rights to new content or produce it for its platform. For TV series, it will often agree to stream the full season without a pilot. It also needs to know how much to invest in new productions. It is using predictive modelling to forecast the demand for new shows. It looks, for instance, at similar shows, the similarity being measured by some distance between show attributes. Because it has detailed information on shows that have been popular and found an audience, it knows with some probability which new shows will be successful.

Netflix is producing its own movies with Netflix Studio. It has optimized the movie creation life-cycle from pre-production, planning, scheduling, production, and post-production to marketing using data science and machine learning. For instance, scheduling is treated as a constrained optimization problem. Given the availability of the film crew, director, actors, and locations, it can generate an optimal schedule in a very short time. It also chooses which films to produce and how much to invest in each film given the likelihood that it will attract sufficient viewers on its platform. Netflix has borrowed 20 billion dollars to finance its original productions.

Streaming

Streaming is a technical challenge, as Netflix uses over a third of the national internet bandwidth in the US. It has to monitor the quality of the streaming experience for each individual user, who is at a different location with a specific device, specific bandwidth, and specific internet provider. Even before new content is streamed, Netflix controls its quality and tries to predict whether it will have quality issues.

Marketing and Sales

Marketing messages are individualized so that they are more likely to convince non-subscribers to sign up. Netflix has to choose the marketing channel such as YouTube or Facebook or others and what content to show to a potential new member. It is using causal modelling to evaluate the effectiveness of its marketing spending. 

Challenges

A challenge for Netflix is to keep licensing and producing attractive content for its customer base. If tastes change, its models have to capture them and quickly recommend appropriate new content. Netflix competes for customer attention and has to compete with other activities such as VR video games or social networks. Netflix is not paying for the internet infrastructure per se, but if it continues to be a significant user of the national bandwidth, it might be asked to pay for it or to reduce the quality of its video streaming.

AI at DBS Bank (Singapore)

Figure 1. DBS Bank Marina Bay

DBS Bank is the largest bank in Singapore and Southeast Asia, with an international presence in China, Taiwan, Hong Kong, India, and Indonesia. It operates in consumer banking, wealth management (10.8 million customers in 2019), and institutional and SME banking (240,000 customers) across 18 markets globally.

It started a digital transformation process in 2014 to modernize its business operations and become a fully digital bank. It has since then received many awards as the best digital bank and the best bank.  

Reasons to invest in AI

It is not clear when AI became prevalent at DBS Bank but it has embraced digital transformation, the cloud, data, and analytics very early on to become more competitive and disrupt itself before being disrupted by competition from other banks, fintech companies, and foreign tech conglomerates such as Alibaba. In its early years, DBS Bank had a reputation for poor customer service.

Among the Singaporean banks, it offers more mobile apps (Figure 2) and has adopted mobile and digital banking to acquire, retain, and engage its customer base. 


Figure 2. DBS mobile apps

Analytics and AI initiatives 

DBS Bank has numerous initiatives that leverage AI and analytics:

Digital payments

DBS Bank owns DBS PayLah!, a digital wallet used by its 1.6 million customers to make payments in stores, pay bills, order meals online, book shows, travel, and taxis, and make transfers to other users. It has many platform partners and uses its insights on its users for cross-marketing initiatives.

Contextualized marketing

DBS Bank uses contextualized marketing to sell products to its customers. It calls this hyper-personalization; it is very similar to the recommendation systems (for products or ads) seen in other industries. This kind of personalized service used to be available only to high-net-worth individuals in private banking but can now be offered to all clients thanks to technology.

Sentiment analysis

DBS Bank uses sentiment analysis to understand its clients better and address their needs and requests. This lowers the cost of customer support and increases customer satisfaction. Sentiment analysis leverages recent progress in Natural Language Processing by identifying positive and negative keywords and sentences in text and speech.
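In its simplest lexicon-based form, the idea can be sketched as follows; the word lists are purely illustrative:

```python
import re

POSITIVE = {"great", "helpful", "fast", "thanks"}     # illustrative lexicons
NEGATIVE = {"slow", "unhappy", "error", "complaint"}

def sentiment_score(text: str) -> int:
    # Positive words add one, negative words subtract one.
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

print(sentiment_score("thanks, the app is great but login is slow"))  # -> 1
```

Production systems go well beyond keyword counting, using trained language models that score whole sentences and conversations.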

Algorithmic credit underwriting

In India and Indonesia, DBS Bank uses data-driven algorithmic credit underwriting models to approve small ticket-size loans to individuals through their mobile phones. These markets are much larger than Singapore and DBS Bank has to rely on automation and algorithms to service such markets. Mobile phones are also key to success because large shares of the population are under-banked. 

Credit risk assessment and monitoring

DBS Bank is developing automation and data-driven capabilities for the credit risk assessment and monitoring of the credit-sensitive assets in its portfolios, to help reduce their downgrade risk.

DBS Bank is also building a credit platform for its Institutional Banking Group to manage and modernize its credit workflow. It has rolled out the platform to several regions in Asia.

Wealth management

In wealth management, DBS Bank is using robo-advisors combined with human advice in its DBS digiPortfolio product. It is offering customized market research and insights on its DBS iWealth platform.

Financial crime

DBS Bank is using artificial intelligence models to manage financial crime risks through dashboards, advanced customer and counterparty network monitoring, and priority ranking of financial crime risk.

Platform operating model

DBS Bank adopted a Platform Operating Model strategy in 2018. These platforms let business and technology teams collaborate on common projects and share data, models, analytical tools, predictive analytics, and workflow processes. DBS Bank has deployed over 33 such platforms across its business.

Call centers

DBS Bank is using AI models in its call centers in Singapore and India to predict customer issues and route the calls more efficiently and address the issues automatically.

APIs

DBS Bank has built APIs in the real estate, education, healthcare, insurance, transport, logistics, and e-commerce sectors to connect with its ecosystem partners and cross-sell its services to their shared customers using contextualized marketing.

Challenges

As a financial institution, DBS Bank is exposed to credit and financial risk, financial crime risk, data governance and protection risk, cybersecurity risk, regulatory and reputational risk. As it expands its digital footprint, cybersecurity and data protection are becoming fundamental to the credibility of its digital operations.

Artificial Intelligence is the new PC

Artificial Intelligence is making breakthrough advancements in image recognition, natural language processing, robotics, and machine learning in general. DeepMind, a leader in AI research, created AlphaZero in 2018, an AI program that reached superhuman performance in many games including the game of Go, and more recently, in 2020, AlphaFold, which solved a protein folding problem that had preoccupied researchers for 50 years. For Stanford Professor Andrew Ng, AI is the new electricity.

A more useful comparison would be the advent of the personal computer, in particular the IBM PC in 1981 and its first killer application, the spreadsheet software Lotus 1-2-3. IBM didn’t introduce the first computer. Home computers had been available to hobbyists since 1977 from companies such as Commodore, Tandy, and Apple. The Apple II with VisiCalc was already very popular, but the IBM PC was the first affordable personal computer enthusiastically adopted by the business community.


Figure 1. IBM PC

Figure 2. Lotus 1-2-3

The novel spreadsheet software allowed flexible free-form calculations, the automation of calculations, the use of custom functions, graphics, references, and data management. Excel, the dominant spreadsheet software, is still in use more than thirty years after its introduction (with more functionalities). Before spreadsheets, people used calculators and reported results on paper. More intensive calculations were done on mainframe computers in a language such as FORTRAN, and results were printed on paper.

Today, AI is the new PC. Not adopting AI is like forgoing the PC in 1981. The impact is already very profound among the native digital companies and should be as significant for the rest of the companies.

Today, business leaders need to think about an AI strategy just as they have to think about their information technology strategy. As with the PC and the spreadsheet, they should expect all their employees to become, at some point, users of AI at work. Like the home computer, AI is already present at home with personal assistants such as Amazon Alexa, on phones with Apple Siri, and on the internet with Google. All these AI applications are now possible thanks to increasing computing power, the development of the cloud, the availability of big data, and the new deep learning paradigm.

The AI Strategy Handbook was written to help you adopt AI in your business strategy so that it creates a long-term sustainable competitive advantage for your customers, your company, your employees, and your investors.

Building Machines that Learn and Think Like People

MIT Prof. Josh Tenenbaum gave a talk on Building Machines that Learn and Think Like People at ICML 2018. His insight is that it is possible to teach a machine to learn like a child by using:

  • Game engine intuitive physics
  • Intuitive psychology
  • Probabilistic programs
  • Program induction
  • Program synthesis

This agenda is more ambitious than the current state of machine learning, though it more closely resembles old-style machine learning, and there is no guarantee that it will succeed. Still, it is refreshing that we can learn from young humans how to teach machines.

Imitation Learning

At the ICML 2018 conference, there was a very interesting tutorial on Imitation Learning by Yisong Yue and Hoang Le from Caltech. Imitation Learning is quite similar to Reinforcement Learning, but with an expert that the machine wants to imitate by inferring a policy that links states to actions. It can be applied to sequential decision-making problems handled by humans or other algorithms.

There are different categories of Imitation Learning:

  • Behavioral Cloning which is supervised learning on the state-action pairs of the expert
  • Direct Policy Learning (Interactive Imitation Learning) with interaction with an expert
  • Inverse Reinforcement Learning, which applies reinforcement learning to a reward function inferred from demonstrations

Direct Policy Learning can use sequential learning reduction algorithms such as Dataset Aggregation (DAgger) and policy aggregation (SEARN & SMILe).
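A schematic sketch of the DAgger loop, with the expert, training routine, and rollout function passed in as hypothetical stand-ins:

```python
from typing import Callable, List, Tuple

def dagger(expert_action: Callable, train: Callable, rollout: Callable,
           expert_demos: List[Tuple], n_iterations: int = 10):
    """Sketch: aggregate expert labels on the learner's own states."""
    dataset = list(expert_demos)
    policy = train(dataset)            # initialize by behavioral cloning
    for _ in range(n_iterations):
        states = rollout(policy)       # states visited by the learner's policy
        dataset += [(s, expert_action(s)) for s in states]  # expert relabels
        policy = train(dataset)        # retrain on the aggregated dataset
    return policy
```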

According to the presenters, Imitation Learning seems to be easier to implement than Reinforcement Learning. A limitation is that the machine cannot do better than the expert. The talk is available online.

Reproducibility, Reusability, and Robustness in Deep Reinforcement Learning

McGill Professor Joelle Pineau gave an insightful presentation on reproducibility in machine learning, and especially in deep reinforcement learning. It is a general trend in science that some results cannot be fully reproduced. In deep reinforcement learning, there is a stochastic component to the results, such as the present value of future rewards. She observes that results can vary for reasons that should not matter, such as the choice of a random seed (used to generate random variables), and that implementations of the same baselines by different researchers can yield different outcomes. Making the code and the data available for other researchers to reproduce paper results could alleviate some of these problems. She has introduced the Reproducibility Challenge, which could be adopted by other scientific conferences.
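As a small illustration of the seed issue, experiments typically pin every library's random seed before a run; whether GPU kernels stay fully deterministic is a separate question:

```python
import random

import numpy as np
import torch

def set_seed(seed: int) -> None:
    # Pin the common sources of randomness in a deep RL experiment.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)  # CPU and (partially) GPU determinism

set_seed(42)
```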

Is Interpretability Necessary for Machine Learning?

At the NIPS 2017 conference, there was a fascinating debate on the necessity of interpretability in machine learning. Without interpretability, mistakes can be made, for instance when correlation is simply used as a proxy for causation, as Rich Caruana illustrates with a medical example. Yann LeCun, on the other hand, thinks that interpretability is not necessary; the model just needs to work. According to LeCun, people are not really interested in looking into the intimate details of a machine learning model; they just want their models to work. Kilian Weinberger argues that between an interpretable model with a high error rate and a non-interpretable model with a low error rate, people would choose the latter.

Interpretability is closer to what an economist would require: a model that can be explained, with parameters that can be estimated and interpreted. If the model cannot be explained, there is always a risk of capturing spurious correlations, having an omitted variable bias (when an important explanatory variable is missing) or endogeneity problems (when an explanatory variable is correlated with the error term), and of being really wrong. At the same time, for real-life applications such as forecasting or medical diagnostics, model accuracy is probably more important.

Without interpretability, there is a risk that a machine learning model will make mistakes that a human with “common sense” would not make. A more serious risk is that the actual error rate will be higher on real-world data when the model is deployed in production (the model is wrong). Economists use interpretable economic models to limit this risk. In the current state of machine learning, there seems to be a trade-off between interpretability and accuracy (or effectiveness). Some promising approaches have been suggested to make machine learning models more interpretable, for instance by approximating them with simpler local models such as LIME. You can read this post.
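As a sketch of the LIME approach on tabular data, assuming a fitted classifier `model` and training matrix `X_train` (the feature and class names are placeholders):

```python
from lime.lime_tabular import LimeTabularExplainer

explainer = LimeTabularExplainer(
    X_train,
    feature_names=["age", "income", "balance"],   # placeholder features
    class_names=["no_default", "default"],        # placeholder classes
    mode="classification",
)
# Fit a simple local surrogate around one prediction and report
# the top feature contributions for that single instance.
explanation = explainer.explain_instance(
    X_train[0], model.predict_proba, num_features=3
)
print(explanation.as_list())
```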

More rigorous testing of the models can also be used, and sometimes confronting the models with “common sense” or the current state of knowledge in the field can be useful. In domains where machines have reached superhuman skill (think of AlphaGo), however, the latter approach might not be possible.

We encourage you to follow this debate.