Suggested Term Project Topics

Note

You may pick whichever topic sounds most interesting to you. All of the following topics describe open research problems, and you should aim to develop your ideas into a conference paper or even an MS thesis. You may also propose another suitable topic to the instructor.
For all projects, you are required to report your experience (e.g., any problems, failures, or bugs) with whichever infrastructure you choose to use (VCL, Amazon EC2, or Google App Engine). You will receive extra credit for each specific bug you report.

Research Projects Supervised by Dr. Gu

  • Automatic System Management using Unsupervised Machine Learning: [slides]
  • A Hybrid Approach to Cloud System Performance Bug Detection and Diagnosis: [slides]

Topic 1: Virtual Machine Management in Distributed Computing Environments

  • Project description: Virtualization is one of the foundational technologies of modern data centers and cloud computing systems such as Amazon EC2. The goal of this project is to explore virtualization techniques (e.g., Xen) to achieve various system management goals, such as resource management, in distributed computing environments such as VCL.
  • References:
  1. AGILE: Elastic Distributed Resource Scaling for Infrastructure-as-a-Service
    Hiep Nguyen, Zhiming Shen, Xiaohui Gu, Sethuraman Subbiah, John Wilkes
    Proc. of USENIX International Conference on Autonomic Computing (ICAC), San Jose, CA, June, 2013.
  2. CloudScale: Elastic Resource Scaling for Multi-Tenant Cloud Systems
    Zhiming Shen, Sethuraman Subbiah, Xiaohui Gu, John Wilkes
    Proc. of ACM Symposium on Cloud Computing (SOCC) in conjunction with SOSP, Cascais, Portugal, October, 2011.
  3. PRESS: PRedictive Elastic ReSource Scaling for Cloud Systems
    Zhenhuan Gong, Xiaohui Gu, John Wilkes
    IEEE International Conference on Network and Services Management (CNSM), Niagara Falls, Canada, October, 2010.
  4. PAC: Pattern-driven Application Consolidation for Efficient Cloud Computing
    Zhenhuan Gong, Xiaohui Gu
    IEEE/ACM International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS), Miami Beach, Florida, August, 2010.
  5. Xen and the Art of Virtualization
    Paul Barham, Boris Dragovic, Keir Fraser, Steven Hand, Tim Harris, Alex Ho, Rolf Neugebauer, Ian Pratt, Andrew Warfield
    Proc. of SOSP, 2003.
  • Experiment environment: VCL
  • Related software: KVM, Xen, Hadoop, RUBiS, IBM System S
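As one possible starting point for the resource-management goal, the Python sketch below recommends a CPU cap for a VM from its recent usage samples. It assumes usage is reported as a percentage of one core per interval, predicts the next interval's demand as the maximum of a short sliding window, and adds fractional headroom. The function name, window size, padding, and cap limits are all hypothetical; this is a simple reactive baseline for comparison, not the scaling method of AGILE, CloudScale, or PRESS.

```python
from collections import deque


def make_cpu_allocator(window: int = 5, padding: float = 0.2,
                       min_cap: float = 10.0, max_cap: float = 100.0):
    """Return a function that maps each new CPU-usage sample (percent of a
    core) to a recommended CPU cap for the next interval.

    Hypothetical baseline: predict the next demand as the maximum usage seen
    in the last `window` samples, add `padding` (fractional headroom), and
    clamp the result to [min_cap, max_cap].
    """
    history = deque(maxlen=window)  # sliding window of recent samples

    def next_cap(usage: float) -> float:
        history.append(usage)
        predicted = max(history)           # conservative demand estimate
        cap = predicted * (1.0 + padding)  # headroom to absorb bursts
        return min(max(cap, min_cap), max_cap)

    return next_cap


if __name__ == "__main__":
    allocate = make_cpu_allocator(window=3, padding=0.25)
    for sample in [20.0, 40.0, 30.0, 10.0, 10.0, 10.0]:
        print(f"usage={sample:5.1f}%  ->  cap={allocate(sample):5.1f}%")
```

A larger window or padding makes the cap more conservative (fewer under-provisioning events) at the cost of idle resources; a project would typically replace `max(history)` with a real predictor and measure both kinds of error.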

Topic 2: System Monitoring & Behavior Learning & Anomaly Management

  • Project description: The goal of this project is to collect monitoring data for a system anomaly and to develop an algorithm that predicts or diagnoses that anomaly.
  • References:
  1. FChain: Toward Black-box Online Fault Localization for Cloud Systems
    Hiep Nguyen, Zhiming Shen, Yongmin Tan, Xiaohui Gu
    Proc. of IEEE International Conference on Distributed Computing Systems (ICDCS), Philadelphia, PA, July, 2013.
  2. UBL: Unsupervised Behavior Learning for Predicting Performance Anomalies in Virtualized Cloud Systems
    Daniel Dean, Hiep Nguyen, Xiaohui Gu
    Proc. of ACM International Conference on Autonomic Computing (ICAC), San Jose, CA, September, 2012.
  3. PREPARE: Predictive Performance Anomaly Prevention for Virtualized Cloud Systems
    Yongmin Tan, Hiep Nguyen, Zhiming Shen, Xiaohui Gu, Chitra Venkatramani, Deepak Rajan
    Proc. of IEEE International Conference on Distributed Computing Systems (ICDCS), Macau, China, June, 2012.
  4. Adaptive Runtime Anomaly Prediction for Dynamic Hosting Infrastructures
    Yongmin Tan, Xiaohui Gu, Haixun Wang
    ACM Symposium on Principles of Distributed Computing (PODC), Zurich, Switzerland, July, 2010. (acceptance rate: 21%)
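Before building a learned model, it can help to have a simple statistical baseline to compare against. The Python sketch below flags a monitoring sample (for example, CPU utilization) as anomalous when it deviates from the mean of the preceding `window` samples by more than `k` population standard deviations. The function name, parameters, and sample trace are hypothetical; this rolling z-score rule is a generic baseline, not the algorithm used in any of the papers listed above.

```python
import statistics
from collections import deque


def rolling_zscore_anomalies(samples, window=10, k=3.0):
    """Return indices of samples whose z-score, relative to the previous
    `window` samples, exceeds `k`. Hypothetical baseline detector."""
    history = deque(maxlen=window)
    anomalies = []
    for i, x in enumerate(samples):
        if len(history) == window:
            mean = statistics.fmean(history)
            std = statistics.pstdev(history)
            # A perfectly flat window has std == 0; treat any deviation
            # from it as anomalous instead of dividing by zero.
            if std == 0:
                if x != mean:
                    anomalies.append(i)
            elif abs(x - mean) / std > k:
                anomalies.append(i)
        history.append(x)
    return anomalies


if __name__ == "__main__":
    # Hypothetical CPU-utilization trace (%) with one spike at index 10.
    cpu = [50, 52, 49, 51, 50, 48, 52, 50, 49, 51, 95, 50, 51]
    print(rolling_zscore_anomalies(cpu, window=10, k=3.0))  # prints [10]
```

Once a spike enters the window it inflates the standard deviation, so this detector is briefly less sensitive afterward; it also reacts only after a sample arrives, whereas the prediction work in the references aims to raise alerts before an anomaly occurs.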

Topic 3: System Diagnosis Using Console Logs or Traces

  • Project description: The goal of this project is to detect and diagnose runtime system problems using logs or system traces.
  • References:
  1. ELT: Efficient Log-based Troubleshooting System for Cloud Computing Infrastructures
    Kamal Kc, Xiaohui Gu
    Proc. of IEEE International Symposium on Reliable Distributed Systems (SRDS), Madrid, Spain, October, 2011.
  2. Detecting Large-Scale System Problems by Mining Console Logs
    Wei Xu, Ling Huang, Armando Fox, David Patterson, Michael Jordan
    Proc. of SOSP 2009.
  3. DScope: Detecting Real-World Data Corruption Hang Bugs in Cloud Server Systems
    Ting Dai, Jingzhu He, Xiaohui Gu, Shan Lu, Peipei Wang
    Proc. of SOCC 2018.
  4. TScope: Automatic Timeout Bug Identification for Server Systems
    Jingzhu He, Ting Dai, Xiaohui Gu
    Proc. of ICAC 2018.
  5. Hytrace: A Hybrid Approach to Performance Bug Diagnosis in Production Cloud Infrastructures
    Ting Dai, Daniel Dean, Peipei Wang, Xiaohui Gu, Shan Lu
    IEEE Transactions on Parallel and Distributed Systems (TPDS), 2018.
  6. TFix: Automatic Timeout Bug Fixing in Production Server Systems
    Jingzhu He, Ting Dai, Xiaohui Gu
    Proc. of ICDCS 2019.
  7. HangFix: Automatically Fixing Software Hang Bugs for Production Cloud Systems
    Jingzhu He, Ting Dai, Xiaohui Gu and Guoliang Jin
    Proc. of SOCC 2020.
  8. CDL: Classified Distributed Learning for Detecting Security Attacks in Containerized Applications
    Yuhang Lin, Olufogorehan Tunde-Onadele, and Xiaohui Gu
    Proc. of ACSAC 2020.
  9. Self-Patch: Beyond Patch Tuesday for Containerized Applications
    Olufogorehan Tunde-Onadele, Yuhang Lin, Jingzhu He, and Xiaohui Gu
    Proc. of ACSOS 2020.
  10. SHIL: Self-Supervised Hybrid Learning for Security Attack Detection in Containerized Applications
    Yuhang Lin, Olufogorehan Tunde-Onadele, Xiaohui Gu, Jingzhu He, and Hugo Latapie
    Proc. of ACSOS 2022.
  • Experiment environment: VCL, Amazon EC2, Google App Engine
  • Related software: Hadoop, VCL
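A common first step in log-based diagnosis is to reduce raw log lines to message templates by masking variable fields, then count how often each template occurs. The Python sketch below does this with regular expressions and reports templates that occur no more than `max_count` times as candidates for inspection. The masking patterns, the sample log lines, and the rarity rule are hypothetical illustrations; Xu et al., for example, instead parse logs using source-code analysis and apply PCA to message count vectors.

```python
import re
from collections import Counter

# Hypothetical masking rules, applied in order: replace variable fields with
# placeholders so lines printed by the same log statement share a template.
_MASKS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}(?::\d+)?\b"), "<IP>"),  # IPv4[:port]
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<HEX>"),                   # hex values
    (re.compile(r"\b\d+\b"), "<NUM>"),                              # integers
]


def to_template(line: str) -> str:
    """Mask IP addresses, hex values, and integers in one log line."""
    for pattern, placeholder in _MASKS:
        line = pattern.sub(placeholder, line)
    return line.strip()


def rare_templates(lines, max_count=1):
    """Return (template, count) pairs occurring at most `max_count` times,
    sorted by count and then alphabetically."""
    counts = Counter(to_template(line) for line in lines)
    rare = [(t, c) for t, c in counts.items() if c <= max_count]
    return sorted(rare, key=lambda tc: (tc[1], tc[0]))


if __name__ == "__main__":
    # Hypothetical console-log excerpt.
    logs = [
        "Received block 1024 from 10.0.0.5:50010",
        "Received block 2048 from 10.0.0.7:50010",
        "Received block 4096 from 10.0.0.9:50010",
        "Served block 1024 to 10.0.0.8",
        "Served block 2048 to 10.0.0.8",
        "Exception in receiveBlock for block 4096: timeout after 60000 ms",
    ]
    for template, count in rare_templates(logs, max_count=1):
        print(count, template)
```

Rarity alone produces many false alarms in real logs, so a project would typically group templates by an identifier (such as a request or block ID) and model the resulting per-identifier count vectors.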