Machine Learning Up to Date #30


Here's ML UTD #30 from the LifeWithData blog! We help you separate the signal from the noise on today's hectic front lines of software engineering and machine learning.

LifeWithData strives to deliver curated machine learning & software engineering updates that point the reader to key developments without superfluous details. This enables frequent, concise updates across the industry without information overload.



Designing Data Science Tools at Spotify

A workflow showing how feature changes would affect the user experience. [Source]
Up until recently, the tools Spotify’s data scientists used every day were designed mostly by engineers. There was no one dedicated to looking at the problems data scientists were experiencing holistically. This meant that a lot of the time, the tools were strung together with inefficient hacky workarounds. Throughout the past year, a design team was created to rethink the existing stack and weed out these bad practices. I’m a product designer in the R&D Community at Spotify, and I’ve been working in the data tools space for about a year — which makes me one of the longest-serving designers in the team. I was brought in to pair up with engineering squads working on platforms and experiences for data scientists. Most recently, I helped to create and launch a new data science tool that would expedite insights production, and eliminate those old, inefficient ways of working.
... keep reading

Machine learning is going real-time

A diagram showing the fundamental differences that EDAs have from request-response [Source]
After talking to machine learning and infrastructure engineers at major Internet companies across the US, Europe, and China, I noticed two groups of companies. One group has made significant investments (hundreds of millions of dollars) into infrastructure to allow real-time machine learning and has already seen returns on their investments. Another group still wonders if there’s value in real-time ML. There seems to be little consensus on what real-time ML means, and there hasn’t been a lot of in-depth discussion on how it’s done in the industry. In this post, I want to share what I’ve learned after talking to about a dozen companies that are doing it.
... keep reading

Salesforce: Why We’re Investing (Even More) in Automation

MuleSoft Composer for Salesforce automatically brings in data from disparate apps [Source]
The problem is: businesses aren’t equipped to move at this breakneck pace. Whether they can adapt, and do it quickly, will define success. To make rapid progress towards their digital transformation goals, they need to increase employee speed of work to keep customers happy. That requires smart automation. When automation is done right, it accelerates productivity and empowers employees. It also frees up humans to do the work they’re uniquely meant to do by taking away “the tedious, repetitive steps that are soul sucking, and that people don’t enjoy doing,” says Kucera. “Automation gives people the bandwidth and breathing room to do more interesting, more inspiring, and more valuable work that moves the business forward such as building customer relationships or making hard decisions on what to do next.”
... keep reading

Natural Language Processing Tutorials for Deep Learning

A snapshot of the current contents of the repo [Source]
`nlp-tutorial` is a tutorial repository for anyone studying NLP (Natural Language Processing) using PyTorch or TensorFlow. Most of the models in NLP were implemented with fewer than 100 lines of code.
... keep reading
The Rundown

Knocking on Turing’s door: Quantum Computing and Machine Learning

The possible transformations applied on a qubit can be represented by rotations of a Bloch sphere [Source]
Zeros and ones. Bits and pieces. Positives and negatives. And above all, switches, some on and others off. We have all grown accustomed to seeing and using a contemporary computer. Each year, industry behemoths like Intel, AMD, ARM, and NVIDIA release the next generation of their top-of-the-line silicon, locking horns and pushing the envelope of the traditional computers that we know today. If we critically evaluate these multitudes of new multi-core CPUs, GPUs, and mammoth compute clusters hosted on the cloud, we will soon realize that faster processors do not necessarily result in increased computational power. Granted, the speed of computation has increased exponentially in the past decades, and so has the amount of data we can handle and process. We can store and analyze exabytes of data on the internet, train deep learning models like OpenAI’s GPT-3, and enable the computational intelligence needed to defeat champions and grandmasters at complex games like Go and Chess. But have all these technological advances expanded what we can fundamentally do with computers beyond where we started? Or simply put, have we changed our traditional model of computing?
... keep reading
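
The caption above says that transformations of a qubit can be pictured as rotations of a Bloch sphere. As a rough illustration of that idea (our own sketch, not code from the linked article), the snippet below applies the standard rotation about the x-axis to the state |0> and converts the result to Bloch-sphere coordinates. The helper names `rx`, `apply_gate`, and `bloch_vector` are hypothetical and chosen only for this example.

```python
import math

def rx(theta):
    """2x2 matrix for a rotation by theta about the Bloch sphere's x-axis."""
    c = math.cos(theta / 2)
    s = math.sin(theta / 2)
    return [[c, -1j * s],
            [-1j * s, c]]

def apply_gate(gate, state):
    """Multiply a 2x2 gate by a single-qubit state (a, b)."""
    a, b = state
    return (gate[0][0] * a + gate[0][1] * b,
            gate[1][0] * a + gate[1][1] * b)

def bloch_vector(state):
    """Map normalized amplitudes (a, b) to (x, y, z) on the Bloch sphere."""
    a, b = state
    ab = a.conjugate() * b
    return (2 * ab.real, 2 * ab.imag, abs(a) ** 2 - abs(b) ** 2)

zero = (1 + 0j, 0j)  # the state |0>, i.e. the north pole (0, 0, 1)

for angle in (0.0, math.pi / 2, math.pi):
    x, y, z = bloch_vector(apply_gate(rx(angle), zero))
    print(f"theta={angle:.2f}  x={x:+.3f}  y={y:+.3f}  z={z:+.3f}")
```

A half turn (theta = pi) carries the north pole to the south pole, which is the Bloch-sphere picture of flipping |0> to |1> up to a global phase.
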
The Rundown

Dataset Security for Machine Learning: Data Poisoning, Backdoor Attacks, and Defenses

A taxonomy of training-only data poisoning attacks [Source]
As machine learning systems grow in scale, so do their training data requirements, forcing practitioners to automate and outsource the curation of training data in order to achieve state-of-the-art performance. The absence of trustworthy human supervision over the data collection process exposes organizations to security vulnerabilities; training data can be manipulated to control and degrade the downstream behaviors of learned models. The goal of this work is to systematically categorize and discuss a wide range of dataset vulnerabilities and exploits, approaches for defending against these threats, and an array of open problems in this space. In addition to describing various poisoning and backdoor threat models and the relationships among them, we develop their unified taxonomy.
... keep reading
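
To make "training data can be manipulated to degrade the downstream behaviors of learned models" a little more concrete, here is a toy sketch of label flipping, one simple form of data poisoning. It is our own illustration and is not taken from the paper: the 1-D synthetic data, the nearest-centroid classifier, the 80% flip rate, and every function name are assumptions made for the example.

```python
import random

def make_data(n, rng):
    """Synthetic 1-D data: class 0 centered at -1, class 1 centered at +1."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 1.0 if label == 1 else -1.0
        data.append((rng.gauss(center, 0.5), label))
    return data

def flip_labels(data, fraction, rng):
    """Poisoning step: relabel roughly `fraction` of class-1 points as class 0."""
    poisoned = []
    for x, y in data:
        if y == 1 and rng.random() < fraction:
            y = 0
        poisoned.append((x, y))
    return poisoned

def train_centroids(data):
    """Nearest-centroid 'model': the mean feature value of each class.
    Assumes both classes are present in the data."""
    centroids = {}
    for c in (0, 1):
        xs = [x for x, y in data if y == c]
        centroids[c] = sum(xs) / len(xs)
    return centroids

def accuracy(centroids, data):
    """Fraction of points whose nearest centroid matches their label."""
    correct = 0
    for x, y in data:
        pred = min(centroids, key=lambda c: abs(x - centroids[c]))
        correct += (pred == y)
    return correct / len(data)

rng = random.Random(0)          # fixed seed so runs are repeatable
train = make_data(2000, rng)
test = make_data(2000, rng)     # the test set is never poisoned

clean_acc = accuracy(train_centroids(train), test)
poisoned_acc = accuracy(train_centroids(flip_labels(train, 0.8, rng)), test)
print(f"clean: {clean_acc:.3f}  poisoned: {poisoned_acc:.3f}")
```

The flipped labels drag the class-0 centroid toward class 1, which shifts the decision boundary and lowers accuracy on clean test data even though the attacker only touched the training set.
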
The Rundown