Machine Learning Up to Date #28

Here's ML UTD #28 from the LifeWithData blog! We help you separate the signal from the noise in today's hectic front lines of software engineering and machine learning.

LifeWithData strives to deliver curated machine learning & software engineering updates that point the reader to key developments without superfluous details. This enables frequent, concise updates across the industry without information overload.



Building a Gigascale ML Feature Store

In benchmarks, using Redis hashes yields a five-fold improvement in CPU efficiency [Source]
When a company with millions of consumers such as DoorDash builds machine learning (ML) models, the amount of feature data can grow to billions of records, with millions actively retrieved during model inference under low-latency constraints. These challenges warrant a deeper look into the selection and design of a feature store — the system responsible for storing and serving feature data. The decisions made here can prevent overrunning cost budgets, compromising runtime performance during model inference, and curbing model deployment velocity. [...] Below, we will explain the challenges posed by operating a large-scale feature store. Then, we will review how we were able to quickly identify Redis as the right key-value store for this task. We will then dive into the optimizations we made to Redis to triple its capacity, while also improving read performance by choosing a custom serialization scheme built around strings, protocol buffers, and the Snappy compression algorithm.
... keep reading
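For a concrete picture of that layout, here is a minimal sketch of hash-per-entity storage with compressed values. It assumes a local Redis instance plus the redis and python-snappy packages; JSON stands in for the protocol-buffer encoding the post describes, and the entity and feature names are purely illustrative, not DoorDash's actual schema.

```python
# Minimal sketch: one Redis hash per entity, one field per feature,
# values compressed with Snappy before writing. JSON stands in for the
# protobuf value encoding used in the article; names are illustrative.
import json

import redis
import snappy

r = redis.Redis(host="localhost", port=6379)


def write_features(entity_id, features):
    """Store all features for one entity as fields of a single hash."""
    payload = {
        name: snappy.compress(json.dumps(value).encode("utf-8"))
        for name, value in features.items()
    }
    r.hset(f"feature:{entity_id}", mapping=payload)


def read_features(entity_id, names):
    """Fetch a subset of features with one HMGET instead of many GETs."""
    raw = r.hmget(f"feature:{entity_id}", names)
    return {
        name: json.loads(snappy.decompress(value))
        for name, value in zip(names, raw)
        if value is not None
    }


# Example usage (hypothetical feature names)
write_features("store_42", {"avg_prep_time": 12.5, "orders_last_hour": 37})
print(read_features("store_42", ["avg_prep_time", "orders_last_hour"]))
```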

Pinterest Visual Signals Infrastructure: Evolution from Lambda to Kappa Architecture

The stream-focused "Kappa" architecture adopted by Pinterest [Source]
With the growing need for machine learning signals from Pinterest’s huge visual dataset, we decided to take a closer look at our infrastructure that produces and serves these signals. A few parameters we were particularly interested in were signal availability, infra complexity and cost optimization, tech integration, developer velocity, and monitoring. In this post, we will describe our journey from a Lambda architecture to the new real-time signals infrastructure inspired by Kappa architecture.
... keep reading
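For context on what the Kappa approach looks like in practice, here is a generic minimal sketch (not Pinterest's actual infrastructure) in which a single streaming code path serves both live processing and full recomputation by replaying the log. It assumes a Kafka topic and the kafka-python package; the topic and consumer-group names are hypothetical.

```python
# Kappa-style sketch: one streaming code path; "batch" recomputation is
# just a replay of the same log from the earliest offset.
from kafka import KafkaConsumer


def process(event):
    # Placeholder for signal extraction / enrichment logic.
    print(len(event))


def run(reprocess=False):
    consumer = KafkaConsumer(
        "visual-signals",
        bootstrap_servers="localhost:9092",
        # A fresh consumer group plus earliest offsets replays the whole
        # log; that replay replaces a separate batch layer.
        group_id="signals-backfill" if reprocess else "signals-live",
        auto_offset_reset="earliest" if reprocess else "latest",
    )
    for message in consumer:
        process(message.value)
```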

Managing Uber’s Data Workflows at Scale

The distributed "Piper" architecture designed by Uber [Source]
[...] Making this data actionable involves ingestion, transformation, dispersal, and orchestration so that it can be applied widely across areas such as traditional business intelligence, machine learning, model training, visualization, and reporting. However, during Uber’s early years of rapid growth, we onboarded a broad range of data workflow systems, with users having to choose from several overlapping tools for each use case. While this large toolbox allowed for agile and responsive growth, it proved difficult to manage and maintain, requiring engineers to learn duplicative data workflow systems as they took on different projects. We needed a central tool that could author, manage, schedule, and deploy data workflows. Leveraging a variety of previously deployed tools at Uber, including an Airflow-based platform, we began developing a system in line with Uber’s scale. This work led us to develop Piper, Uber’s centralized workflow management system, which has allowed us to democratize data workflows at Uber and enable everyone from city operations teams to machine learning engineers to carry out their work faster and more efficiently.
... keep reading
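Since Piper grew out of an Airflow-based platform, the authoring experience it centralizes resembles the familiar DAG-definition pattern sketched below. This is a minimal illustration assuming Airflow 2.x; the DAG id, tasks, and schedule are hypothetical, not Uber's actual pipelines.

```python
# Minimal Airflow-style workflow definition: two Python tasks with an
# explicit dependency, scheduled daily.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def ingest():
    """Placeholder: pull raw data from an upstream source."""
    print("ingesting")


def transform():
    """Placeholder: clean and aggregate the ingested data."""
    print("transforming")


with DAG(
    dag_id="daily_trip_metrics",
    start_date=datetime(2020, 12, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    ingest_task = PythonOperator(task_id="ingest", python_callable=ingest)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)

    # Orchestration: transform only runs after ingest succeeds.
    ingest_task >> transform_task
```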
The Rundown

The Far-Reaching Impact of Dr. Timnit Gebru

An image from one of Google's sample Model Cards, for a facial recognition model [Source]
[...] Her impact goes far beyond her own research. She is one of the founders of the ACM Conference on Fairness, Accountability, and Transparency (FAccT), one of the most prestigious and well-known conferences related to machine learning ethics. As co-founder of Black in AI, she helped increase the number of Black attendees at NeurIPS from just 6 in 2016 to 500 in 2017, a nearly 100-fold increase in just one year. After more than half of Black in AI speakers could not get visas to Canada for NeurIPS 2018, she successfully advocated and organized to have ICLR 2020 held in Ethiopia, which would have been one of the first major AI conferences held on the African continent (it had to switch to a remote format due to COVID-19). While Gebru was already well known among academics working on computer vision, AI ethics, and fairness, a much broader audience has learned her name in the past week after Google fired her from her role as a manager of its AI ethics research team, a move covered by outlets including the BBC, NBC, the Guardian, and the New York Times. As of the time of this post, 2,278 Googlers and 3,114 academic, industry, and civil society supporters have signed a letter protesting Google’s actions and supporting Gebru. While her termination has sparked crucial discussions regarding industry censorship of unfavorable research, racial discrimination in tech, corporate diversity efforts, and the failings of our current AI ethics framing, here I will focus primarily on Gebru’s research and contributions to machine learning.
... keep reading

AlphaFold: a solution to a 50-year-old grand challenge in biology

The main NN model architecture for AlphaFold [Source]
Proteins are essential to life, supporting practically all its functions. They are large complex molecules, made up of chains of amino acids, and what a protein does largely depends on its unique 3D structure. Figuring out what shapes proteins fold into is known as the “protein folding problem”, and has stood as a grand challenge in biology for the past 50 years. In a major scientific advance, the latest version of our AI system AlphaFold has been recognized as a solution to this grand challenge by the organizers of the biennial Critical Assessment of protein Structure Prediction (CASP). This breakthrough demonstrates the impact AI can have on scientific discovery and its potential to dramatically accelerate progress in some of the most fundamental fields that explain and shape our world.
... keep reading

Research at Microsoft 2020: Addressing the present while looking to the future

Microsoft's graphic used in the article [Source]
Microsoft researchers pursue the big questions about what the world will be like in the future and the role technology will play. Not only do they take on the responsibility of exploring the long-term vision of their research, but they must also be ready to react to the immediate needs of the present. This year in particular, they were asked to use their roles as futurists to address pressing societal challenges. In early 2020, as countries began responding to COVID-19 with stay-at-home orders and business operations moved from offices into homes, researchers sprang into action to identify ways their skills and projects could help while also making personal and professional adjustments of their own. In some cases, they pivoted to directly address the pandemic. A team from Microsoft Research Asia developed the COVID Insights website to promote scientific analysis and understanding of the disease, while the Socially Intelligent Meetings program expanded its work in telepresence technologies to include the Meetings During COVID-19 project. From responses provided by employee volunteers, these researchers are piecing together the effects of taking meetings almost entirely via screens.
... keep reading