Machine Learning Up to Date #18

Here's ML UTD #18 from the LifeWithData blog! We help you separate the signal from the noise on today's hectic front lines of software engineering and machine learning.

LifeWithData strives to deliver curated machine learning & software engineering updates that point the reader to key developments without superfluous details. This enables frequent, concise updates across the industry without information overload.



What’s new in TensorFlow Lite for NLP

A comparison of TF Lite NLP models [Source]
TensorFlow Lite has been widely adopted in many applications to provide machine learning features on edge devices such as mobile phones, microcontroller units, and Edge TPUs. Among the many applications that make people's lives easier and more productive, natural language understanding is one of the key areas that attracts attention from both the research community and industry. After the [demo](https://youtu.be/zxd3Q2gdArY?t=3000) of the on-device question-answering use case at TensorFlow World in 2019, we got a lot of interest and feedback from the community on making more such NLP models available for on-device inference. Inspired by that feedback, today we are delighted to announce end-to-end support for NLP tasks based on TensorFlow Lite. With this infrastructure work, more and more NLP models are able to run on mobile phones, and users can enjoy the advantages of NLP models while keeping their personal data on-device. In this blog, we will introduce the new features that allow: (1) using new pre-trained NLP models, (2) creating your own NLP models, (3) better support for converting TensorFlow NLP models to TensorFlow Lite format, and (4) deploying these models on mobile devices.
... keep reading
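
To make the conversion step concrete, here is a minimal sketch of exporting a TensorFlow model (e.g., a fine-tuned text classifier) to TensorFlow Lite with the standard `tf.lite.TFLiteConverter` API. The `saved_model_dir` path and the quantization choice are illustrative assumptions, not taken from the post.

```python
import tensorflow as tf

# Sketch: convert a TensorFlow SavedModel to TensorFlow Lite format
# for on-device inference. "saved_model_dir" is a placeholder path.
converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")

# Optional: default optimizations apply dynamic-range quantization,
# shrinking the model for deployment on mobile devices.
converter.optimizations = [tf.lite.Optimize.DEFAULT]

tflite_model = converter.convert()
with open("model.tflite", "wb") as f:
    f.write(tflite_model)
```

The resulting `.tflite` file can then be bundled into a mobile app and run with the TensorFlow Lite interpreter.
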

Dagster: The Data Orchestrator

A block diagram of a Dagster installation [Source]
As machine learning, analytics, and data processing become more complex and central to organizations, improving the software behind them becomes more urgent. Data within organizations is disorganized and not trusted. Engineers and practitioners feel unproductive and mired in drudgery. Collaboration between data scientists, data engineers, analysts, and other roles that build complex data systems is painful. The software that processes and produces data is unreliable and resistant to change. This state of affairs is why Dagster, which we first discussed publicly [a year ago](https://medium.com/dagster-io/introducing-dagster-dbd28442b2b7), exists.
... keep reading
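
For a feel of what orchestration code looks like, here is a minimal sketch using the `@solid` and `@pipeline` decorators from Dagster's API at the time of writing; the solid names and logic are illustrative assumptions, not from the Dagster post.

```python
from dagster import execute_pipeline, pipeline, solid


@solid
def get_name(context):
    # A trivial upstream step producing a value for downstream solids.
    return "dagster"


@solid
def hello(context, name: str):
    # Solids receive upstream outputs as typed inputs.
    context.log.info(f"Hello, {name}!")


@pipeline
def hello_pipeline():
    # The pipeline wires solids together into a dependency graph.
    hello(get_name())


if __name__ == "__main__":
    execute_pipeline(hello_pipeline)
```

The dependency graph, logging, and typed inputs shown here are the building blocks the block diagram above orchestrates at scale.
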

Imaginaire: NVIDIA + PyTorch = GANs

A quick look at Imaginaire’s offerings [Source]
Imaginaire is a [PyTorch](https://pytorch.org/) library that contains optimized implementations of several image and video synthesis methods developed at [NVIDIA](https://www.nvidia.com/en-us/). There is a tutorial for each model: click on the model name, and your browser will take you to the tutorial page for the project.
... keep reading

Introducing Dynabench: Rethinking the way we benchmark AI

An overview of the tasks that Dynabench analyzes for [Source]
Benchmarks — from [MNIST](http://yann.lecun.com/exdb/mnist/) to [ImageNet](http://www.image-net.org/) to [GLUE](https://gluebenchmark.com/) — have played a hugely important role in driving progress in AI research. They provide a target for the community to work toward; a common objective to exchange ideas around; and a clear, quantitative way to compare model performance. It is hard to imagine the progress we have made in AI in a world without these shared data sets to focus our efforts. However, benchmarks have been saturating faster and faster — especially in natural language processing (NLP). While it took the research community about 18 years to achieve human-level performance on MNIST and about six years to surpass humans on ImageNet, it took only about a year to beat humans on the GLUE benchmark for language understanding. Researchers in NLP will readily concede that while we have made good progress, we are far from having machines that can truly understand natural language. So something is amiss: While models quickly achieve human-level performance on specific NLP benchmarks, we still are far from AI that can understand language at a human level.
... keep reading

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

The layout of the transformer architecture applied to an image recognition task [Source]
While the Transformer architecture has become the de facto standard for natural language processing tasks, its applications to computer vision remain limited. In vision, attention is either applied in conjunction with convolutional networks, or used to replace certain components of convolutional networks while keeping their overall structure in place. We show that this reliance on CNNs is not necessary and a pure transformer can perform very well on image classification tasks when applied directly to sequences of image patches. When pre-trained on large amounts of data and transferred to multiple recognition benchmarks (ImageNet, CIFAR-100, VTAB, etc.), Vision Transformer attains excellent results compared to state-of-the-art convolutional networks while requiring substantially fewer computational resources to train.
... keep reading
The Rundown
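
To illustrate the core idea of treating an image as a sequence of patch "words", here is a minimal PyTorch sketch of the patch-embedding step. The dimensions (224×224 images, 16×16 patches, 768-dim embeddings) follow the paper's base configuration, but the module itself is an illustrative reimplementation, not the authors' code; the conv-as-patchify trick is one common way to implement it.

```python
import torch
import torch.nn as nn


class PatchEmbedding(nn.Module):
    """Split an image into fixed-size patches and linearly embed each one."""

    def __init__(self, image_size=224, patch_size=16, in_channels=3, embed_dim=768):
        super().__init__()
        self.num_patches = (image_size // patch_size) ** 2
        # A conv with kernel = stride = patch_size is equivalent to slicing
        # non-overlapping patches and applying a shared linear projection.
        self.proj = nn.Conv2d(in_channels, embed_dim,
                              kernel_size=patch_size, stride=patch_size)

    def forward(self, x):
        x = self.proj(x)                  # (B, embed_dim, H/16, W/16)
        x = x.flatten(2).transpose(1, 2)  # (B, num_patches, embed_dim)
        return x


patches = PatchEmbedding()(torch.randn(1, 3, 224, 224))
print(patches.shape)  # torch.Size([1, 196, 768])
```

The resulting sequence of 196 patch embeddings (plus a class token and position embeddings, omitted here) is what the standard Transformer encoder then consumes.
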

What Can We Learn from Collective Human Opinions on Natural Language Inference Data?

Accuracy on different bins of data points whose entropy values are within specific quantile ranges [Source]
Despite the subjective nature of many NLP tasks, most NLU evaluations have focused on using the majority label with presumably high agreement as the ground truth. Less attention has been paid to the distribution of human opinions. We collect ChaosNLI, a dataset with a total of 464,500 annotations to study Collective HumAn OpinionS in oft-used NLI evaluation sets. This dataset is created by collecting 100 annotations per example for 3,113 examples in SNLI and MNLI and 1,532 examples in αNLI. Analysis reveals that: (1) high human disagreement exists in a noticeable amount of examples in these datasets; (2) state-of-the-art models lack the ability to recover the distribution over human labels; (3) models achieve near-perfect accuracy on the subset of data with a high level of human agreement, whereas they can barely beat a random guess on the data with low levels of human agreement, which compose most of the common errors made by state-of-the-art models on the evaluation sets. This calls into question the validity of improving model performance on old metrics for the low-agreement portion of evaluation datasets. Hence, we argue for a detailed examination of human agreement in future data collection efforts, and for evaluating model outputs against the distribution over collective human opinions.
... keep reading
The Rundown
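
The paper bins evaluation examples by the entropy of their human label distributions (as in the figure above), so a tiny sketch may help make that quantity concrete. The annotation counts below are made-up illustrations, not ChaosNLI data.

```python
import numpy as np


def label_entropy(counts):
    """Shannon entropy (in bits) of a human label distribution.

    `counts` holds per-label annotation counts for one example, e.g.
    100 NLI annotations over entailment / neutral / contradiction.
    """
    p = np.asarray(counts, dtype=float)
    p = p / p.sum()
    p = p[p > 0]  # drop empty labels so log2 is defined
    return -(p * np.log2(p)).sum()


# High agreement -> low entropy; heavy disagreement -> high entropy.
print(label_entropy([95, 3, 2]))    # ~0.34 bits: near-unanimous example
print(label_entropy([40, 35, 25]))  # ~1.56 bits: genuinely contested example
```

Low-entropy bins are where models score near-perfectly; the high-entropy bins are where, per the paper, they barely beat a random guess.
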