In this tutorial, we’ll give a basic overview of Ray Core and a new beta library called Ray Datasets. First, we’ll use Ray Core to parallelize a simple Python script. Next, we’ll explain some of the current challenges in data loading for ML frameworks and show how Ray Datasets can simplify and scale this application. [Slides][Tutorial]
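To make the first step concrete, here is a minimal sketch of parallelizing a function with Ray Core and then transforming data with Ray Datasets. The function `slow_square` and the dataset contents are illustrative, and the snippet assumes the Ray 1.x-era Datasets API that was in beta when this talk was given.

```python
import ray

ray.init()  # start Ray on the local machine

@ray.remote
def slow_square(x):
    # Stand-in for an expensive computation.
    return x * x

# Each .remote() call launches a task and returns a future immediately,
# so the eight tasks run in parallel; ray.get() collects the results.
futures = [slow_square.remote(i) for i in range(8)]
print(ray.get(futures))  # [0, 1, 4, ..., 49]

# Ray Datasets: create a distributed dataset of integers and apply a
# transformation across its blocks.
ds = ray.data.range(1000)
squared = ds.map(lambda x: x * x)
print(squared.take(5))
```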
Tune is an open-source framework that makes it easy for ML developers to use state-of-the-art hyperparameter tuning methods at scale, without having to worry about resource allocation or time-consuming code changes. We will walk through some of our recent research on hyperparameter tuning systems for the cloud, along with how to integrate Tune into your existing training code. [Slides][Tutorial]
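As a taste of what the integration looks like, here is a minimal sketch using Ray Tune's `tune.run` API from the Ray 1.x era; the training function and search space are illustrative stand-ins.

```python
from ray import tune

def trainable(config):
    # Stand-in for a real training loop.
    score = (config["lr"] * 100) ** 0.5
    tune.report(mean_score=score)  # report metrics back to Tune

# Tune launches one trial per point in the search space and handles
# scheduling and resource allocation across them.
analysis = tune.run(
    trainable,
    config={"lr": tune.grid_search([0.001, 0.01, 0.1])},
)
print(analysis.get_best_config(metric="mean_score", mode="max"))
```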
Reinforcement learning (RL) algorithms involve the deep nesting of highly irregular computation patterns, each of which typically exhibits opportunities for distributed computation. We argue for distributing RL components in a composable way by adapting algorithms for top-down hierarchical control, thereby encapsulating parallelism and resource requirements within short-running compute tasks. We demonstrate the benefits of this principle through RLlib: a library that provides scalable software primitives for RL. These primitives enable a broad range of algorithms to be implemented with high performance, scalability, and substantial code reuse. RLlib is available as part of the open source Ray project. [Slides][Tutorial]
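For reference, here is a minimal sketch of what these primitives look like to a user: training PPO on CartPole with RLlib's Ray 1.x-era Trainer API (newer releases have since reorganized this under `ray.rllib.algorithms`).

```python
import ray
from ray.rllib.agents.ppo import PPOTrainer

ray.init()

# The Trainer encapsulates parallel rollout workers and the learner
# behind one object; num_workers controls rollout parallelism.
trainer = PPOTrainer(
    env="CartPole-v0",
    config={"num_workers": 2, "framework": "torch"},
)

for i in range(3):
    result = trainer.train()  # one training iteration
    print(i, result["episode_reward_mean"])
```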
In this talk, we’ll present ralf, a feature store for computing and serving features for ML pipelines. We’ll provide background on ML features, feature stores, and ralf itself, then walk through a tutorial on setting up a simple featurization pipeline over streaming data in ralf. [Slides][Tutorial]
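ralf's actual API is not reproduced here; as a purely hypothetical illustration of the core idea behind a streaming feature store, the sketch below maintains a per-key feature that is updated incrementally as events arrive and served by point lookup.

```python
from collections import defaultdict

class StreamingAverageFeature:
    """Keeps a running-average feature per key (e.g., per user)."""
    def __init__(self):
        self.count = defaultdict(int)
        self.mean = defaultdict(float)

    def update(self, key, value):
        # Incrementally fold each new event into the feature value,
        # rather than recomputing over all historical data.
        self.count[key] += 1
        self.mean[key] += (value - self.mean[key]) / self.count[key]

    def serve(self, key):
        # Point lookup used by an online model at inference time.
        return self.mean[key]

feature = StreamingAverageFeature()
for user, rating in [("u1", 4.0), ("u2", 3.0), ("u1", 5.0)]:
    feature.update(user, rating)
print(feature.serve("u1"))  # 4.5
```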
Melih will present the NumS project: what NumS is (and is not), key concepts, and benchmarks, followed by a hands-on tutorial that uses NumS to address a Kaggle challenge. [Slides][Tutorial] [Feedback Form]
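As a taste of the hands-on portion, here is a minimal sketch assuming NumS's NumPy-compatible `nums.numpy` module; the shapes are illustrative, and exact function coverage varies by version.

```python
import nums.numpy as nps

# Arrays are partitioned into blocks and operations execute as Ray tasks.
X = nps.random.rand(10_000, 10)

col_means = nps.mean(X, axis=0)  # distributed reduction
G = X.T @ X                      # distributed matrix multiply

# .get() materializes a block array as a local NumPy array.
print(col_means.get())
print(G.get().shape)  # (10, 10)
```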
Attendees will learn how to use MC2 for secure collaborative learning: mutually distrustful data owners can use MC2 to jointly train a model on their combined data without revealing their individual data to each other. [Slides][Tutorial]
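The snippet below is not MC2's actual API; it is a runnable toy illustration of the workflow, with a trivial placeholder cipher standing in for MC2's authenticated encryption and a stub class standing in for the hardware enclave (to which, in the real system, keys are provisioned via remote attestation), so that the data flow between distrustful parties is visible.

```python
def toy_encrypt(values, key):
    # Placeholder cipher: XOR each value (NOT secure; illustration only).
    return [v ^ key for v in values]

def toy_decrypt(values, key):
    return [v ^ key for v in values]

class ToyEnclave:
    """Stands in for the trusted enclave that jointly trains a model."""
    def train(self, encrypted_parts, keys):
        # Inside the enclave boundary, each party's data is decrypted
        # and combined; plaintext never leaves this boundary.
        combined = []
        for part, key in zip(encrypted_parts, keys):
            combined.extend(toy_decrypt(part, key))
        return sum(combined) / len(combined)  # the "model" is just a mean

# Two mutually distrustful parties encrypt locally with their own keys...
key_a, key_b = 0x5A, 0x3C
part_a = toy_encrypt([1, 2, 3], key_a)
part_b = toy_encrypt([7, 8, 9], key_b)

# ...and share only ciphertexts; the enclave produces the joint model.
print(ToyEnclave().train([part_a, part_b], [key_a, key_b]))  # 5.0
```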
Software organizations are increasingly incorporating machine learning (ML) into their product offerings, driving a need for new data management tools. Many of these tools facilitate the initial development and deployment of ML applications, contributing to a crowded landscape of disconnected solutions targeted at different stages, or components, of the ML lifecycle. A lack of end-to-end ML pipeline visibility makes it hard to address any issues that may arise after a production deployment, such as unexpected output values or lower-quality predictions. We introduce our prototype and our vision for mltrace, a platform-agnostic system that provides observability to ML practitioners by (1) executing predefined tests and monitoring ML-specific metrics at component runtime, (2) tracking end-to-end data flow, and (3) allowing users to ask arbitrary post-hoc questions about pipeline health. This tutorial specifically focuses on using mltrace to build and execute tests for ML pipelines. [Slides][Demo]
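As a flavor of the instrumentation step, here is a minimal sketch assuming mltrace's 2021-era decorator API (`create_component` and `register`); the component name, variables, and cleaning logic are illustrative, and the testing API covered in the tutorial has evolved across versions.

```python
from mltrace import create_component, register

# Declare a pipeline component once, with an owner for accountability.
create_component(
    name="cleaning",
    description="Drops rows with missing values",
    owner="data-team",
)

# Each run of the decorated function is logged with the named input and
# output variables, enabling end-to-end data-flow tracing and post-hoc
# questions about pipeline health.
@register(component_name="cleaning",
          input_vars=["raw_rows"], output_vars=["clean_rows"])
def clean(raw_rows):
    # The variable names listed above must exist in this function's
    # scope so mltrace can log their values.
    clean_rows = [r for r in raw_rows if r is not None]
    return clean_rows

clean([1, None, 2])  # this run is now traceable end to end
```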