Friday, November 5th (Times listed are in PDT)
8:45 AM – 12:00 PM: Opening Remarks by Ion Stoica and Ray Tutorial
- 9 AM – 9:45 AM: Ray Core Tutorial with Stephanie Wang
- In this tutorial, we’ll give a basic overview of Ray Core and a new beta library called Ray Datasets. First, we’ll use Ray Core to parallelize a simple Python script. Next, we’ll explain some of the current challenges in data loading for ML frameworks, and we’ll see how we can use Ray Datasets to simplify and scale this application. [Slides] [Tutorial]
- 9:45 AM – 10:15 AM: Ray Tune Tutorial with Lisa Dunlap
- Tune is an open-source framework that lets ML developers use state-of-the-art hyperparameter tuning methods at scale without having to worry about resource allocation or time-consuming code changes. We will go through some of our recent research in hyperparameter tuning systems for the cloud, along with how to integrate Tune into your existing training code. [Slides] [Tutorial]
- 10:15 AM – 10:30 AM: Break
- 10:30 AM – 11:15 AM: RLlib Tutorial with Michael Luo
- Reinforcement learning (RL) algorithms involve the deep nesting of highly irregular computation patterns, each of which typically exhibits opportunities for distributed computation. We argue for distributing RL components in a composable way by adapting algorithms for top-down hierarchical control, thereby encapsulating parallelism and resource requirements within short-running compute tasks. We demonstrate the benefits of this principle through RLlib: a library that provides scalable software primitives for RL. These primitives enable a broad range of algorithms to be implemented with high performance, scalability, and substantial code reuse. RLlib is available as part of the open-source Ray project. [Slides] [Tutorial]
- 11:15 AM – 12:00 PM: ralf Feature Store Tutorial with Sarah Wooders and Simon Mo
- In this talk, we’ll present ralf, a feature store for computing and serving features for ML pipelines. We’ll provide background on ML features, feature stores, and ralf, then walk through setting up a simple featurization pipeline with streaming data in ralf. [Slides] [Tutorial]
12 PM – 12:30 PM: Lunch Break
12:30 PM – 1:30 PM: Modin Overview and Tutorial
- 12:30 PM – 12:45 PM: Modin: Introduction and New Development, with Dixin Tang
- Dixin Tang will give an overview of the Modin project and the latest developments since the last RISECamp. [Slides]
- 12:45 PM – 1:30 PM: Modin Tutorial with Rehan Durrani
- Rehan Durrani will present an interactive demo to introduce Modin and its core functionality. [Slides] [Tutorial]
1:30 PM – 2:30 PM: New Project: NumS
- 1:30 PM – 2:20 PM: NumS Tutorial with Melih Elibol
- Melih will present the NumS project: what NumS is and what it is NOT, key concepts, and benchmarks, followed by a hands-on tutorial that uses NumS to address a Kaggle challenge. [Slides] [Tutorial] [Feedback Form]
- 2:20 PM – 2:30 PM: NumS Demo with Melih Elibol
- Melih will demo using NumS to scale out NumPy operations on a larger cluster (8x r5 nodes).
2:30 PM – 3:30 PM: MC2 Tutorial
- 2:30 PM – 3:30 PM: MC2 Tutorial with Rishabh Poddar and Chester Leung
3:30 PM – 3:45 PM: Break
3:45 PM – 4:45 PM: New Project: mltrace
- 3:45 PM – 4:45 PM: mltrace Demo with Shreya Shankar
- Software organizations are increasingly incorporating machine learning (ML) into their product offerings, driving a need for new data management tools. Many of these tools facilitate the initial development and deployment of ML applications, contributing to a crowded landscape of disconnected solutions targeted at different stages, or components, of the ML lifecycle. A lack of end-to-end ML pipeline visibility makes it hard to address any issues that may arise after a production deployment, such as unexpected output values or lower-quality predictions. We introduce our prototype and our vision for mltrace, a platform-agnostic system that provides observability to ML practitioners by (1) executing predefined tests and monitoring ML-specific metrics at component runtime, (2) tracking end-to-end data flow, and (3) allowing users to ask arbitrary post-hoc questions about pipeline health. This tutorial specifically focuses on using mltrace to build and execute tests for ML pipelines. [Slides] [Demo]