Part 1: Why Machine Learning in Systems?
- Making the case
- Learned Operating Systems, Yiying Zhang and Yutong Huang, SIGOPS OSR 2019
- On Foundation Models for Operating Systems, Divyanshu Saxena, Nihal Sharma, Donghyun Kim, Rohit Dwivedula, Jiayi Chen, Chenxi Yang, Sriram Ravula, Zichao Hu, Aditya Akella, Joydeep Biswas, Swarat Chaudhuri, Isil Dillig, Alex Dimakis, Daehyeok Kim, Chris Rossbach
- Toward ML-Centric Cloud Platforms, Ricardo Bianchini, Marcus Fontoura, Eli Cortez, Anand Bonde, Alexandre Muzio, Ana-Maria Constantin, Thomas Moscibroda, Gabriel Magalhaes, Girish Bablani, Mark Russinovich, CACM 2020
- On the Promise and Challenges of Foundation Models for Learning-based Cloud Systems Management, Haoran Qiu, Weichao Mao, Chen Wang, Hubertus Franke, Zbigniew T. Kalbarczyk, Tamer Basar, Ravishankar K. Iyer
- Architecture 2.0 Workshop: How Machine Learning Will Redefine Computer Architecture and Systems
- Machine Learning for Databases, Guoliang Li, Xuanhe Zhou, and Lei Cao, AIMLSystems '21
- Self-Supervised Learning
Part 2: Use Cases of Learning in Systems
- Learned systems data structures
- Indices
- The Case for Learned Index Structures, Tim Kraska, Alex Beutel, Ed H. Chi, Jeffrey Dean, Neoklis Polyzotis. SIGMOD'18
- ALEX: An Updatable Adaptive Learned Index, Jialin Ding, Umar Farooq Minhas, Jia Yu, Chi Wang, Jaeyoung Do, Yinan Li, Hantian Zhang, Badrish Chandramouli, Johannes Gehrke, Donald Kossmann, David Lomet, and Tim Kraska. SIGMOD'20 (Optional)
- OS data structures
- Learned algorithmic decisions
- Scheduling
- Resource allocation
- Routing and flow problems
- DOTE: Rethinking (Predictive) WAN Traffic Engineering, Yarin Perry, Felipe Vieira Frujeri, Chaim Hoch, Srikanth Kandula, Ishai Menache, Michael Schapira, and Aviv Tamar. USENIX NSDI 2023
- Teal: Learning-Accelerated Optimization of WAN Traffic Engineering, Zhiying Xu, Francis Y. Yan, Rachee Singh, Justin T. Chiu, Alexander M. Rush, and Minlan Yu. SIGCOMM 2023. (Optional)
- Prefetching
- Learning Memory Access Patterns, Milad Hashemi, Kevin Swersky, Jamie Smith, Grant Ayers, Heiner Litz, Jichuan Chang, Christos Kozyrakis, Parthasarathy Ranganathan. ICML 2018
- Caching
- HALP: Heuristic Aided Learned Preference Eviction Policy for YouTube Content Delivery Network, Zhenyu Song, Kevin Chen, Nikhil Sarda, Deniz Altinbüken, Eugene Brevdo, Jimmy Coleman, Xiao Ju, Pawel Jurczyk, Richard Schooler, and Ramki Gummadi. NSDI 2023
- Learning configurations
- Cloud and big data
- SelfTune: Learning-based Cluster Managers, Ajaykrishna Karthikeyan, Nagarajan Natarajan, Gagan Somashekar, Lei Zhao, Ranjita Bhagwan, Rodrigo Fonseca, Tatiana Racheva, and Yogesh Bansal. NSDI 2023
- CherryPick: Adaptively Unearthing the Best Cloud Configurations for Big Data Analytics, Omid Alipourfard, Hongqiang Harry Liu, Jianshu Chen, Shivaram Venkataraman, Minlan Yu, and Ming Zhang. NSDI'17 (Optional)
- CDN Caches
- Learned controllers
- Congestion control
- Microservice controllers
- FIRM: An Intelligent Fine-grained Resource Management Framework for SLO-Oriented Microservices, Haoran Qiu, Subho S. Banerjee, Saurabh Jha, Zbigniew T. Kalbarczyk, Ravishankar K. Iyer. OSDI'20.
- Sinan: ML-based and QoS-aware resource management for cloud microservices, Yanqi Zhang, Weizhe Hua, Zhuangzhuang Zhou, G. Edward Suh, Christina Delimitrou, ASPLOS'21. (Optional)
- Learning to manage
- NetVigil: Robust and Low-Cost Anomaly Detection for East-West Data Center Security, Kevin Hsieh, Mike Wong, Santiago Segarra, Sathiya Kumaran Mani, Trevor Eberl, Anatoliy Panasyuk, Ravi Netravali, Ranveer Chandra, and Srikanth Kandula, NSDI'24.
- Murphy: Performance Diagnosis of Distributed Cloud Applications, Vipul Harsh, Wenxuan Zhou, Sachin Ashok, Radhika Niranjan Mysore, P. Brighten Godfrey, Sujata Banerjee, SIGCOMM'23.
- Learning to mimic
- Trace generation
- Practical GAN-based synthetic IP header trace generation using NetShare, Yucheng Yin, Zinan Lin, Minhao Jin, Giulia Fanti, Vyas Sekar. SIGCOMM'22
- Generating Complex, Realistic Cloud Workloads using Recurrent Neural Networks, Shane Bergsma, Timothy Zeyl, Arik Senderovich, J. Christopher Beck. SOSP'21
- Simulation
- MimicNet: fast performance estimates for data center networks with machine learning, Qizhen Zhang, Kelvin K. W. Ng, Charles Kazer, Shen Yan, João Sedoc, Vincent Liu. SIGCOMM'21
- CausalSim: A Causal Framework for Unbiased Trace-Driven Simulation, Abdullah Alomar, Pouya Hamadanian, Arash Nasr-Esfahany, Anish Agarwal, Mohammad Alizadeh, Devavrat Shah. NSDI'23
Part 3: Why Does Today's Use of ML in Systems Fall Short?
- System support for learning in systems
- Towards a Machine Learning-Assisted Kernel with LAKE, Henrique Fingler, Isha Tarte, Hangchen Yu, Ariel Szekely, Bodun Hu, Aditya Akella, Christopher J. Rossbach.
- LiteFlow: towards high-performance adaptive neural networks for kernel datapath, Junxue Zhang, Chaoliang Zeng, Hong Zhang, Shuihai Hu, Kai Chen. SIGCOMM'22
- Learning and guarantees
- Verifying learning systems
- Integrating with training
- Learning composable decisions