Hot Fixing Software

A hot fix is an unplanned improvement to a specific time-critical issue deployed to a software system in production.

This project is maintained by carolhanna01

This website hosts the referenced publications from the paper "Hot Fixing Software: A Comprehensive Review of Terminology, Techniques, and Applications" by Carol Hanna, Justyna Petke, David Clark, and Federica Sarro from University College London.

While hot fixing is an essential and common activity in software maintenance, it has never been surveyed as a research activity. Thus, such a review is long overdue. In this work, we conduct a comprehensive literature review of work on hot fixing. We highlight the fields where this topic has been addressed, inconsistencies we identified in the terminology, gaps in the literature, and directions for future work. Our search yielded 87 papers on the topic published between 2000 and 2022. The papers found encompass many different research areas such as log analysis, runtime patching, and automated repair, as well as varying application domains such as security, mobile, and video games.

We find many directions that can take hot fix research forward, such as unifying existing terminology, establishing a benchmark set of hot fixes, researching the costs and frequency of hot fixes, and researching the possibility of end-to-end automation of detection, mitigation, and propagation. We discuss these avenues in detail to inspire the community to systematize hot fixing as a software engineering activity.

We hope that this work streamlines the existing body of work and drives research in the area forward.

ID Title Authors Year Venue Tags
1 A case study of measuring degeneration of software architectures from a defect perspective Li, Zude and Long, Jun 2011 Proceedings - Asia-Pacific Software Engineering Conference, APSEC Empirical
2 A Case Study of Open Source Software Development: The Apache Server Mockus, Audris and Fielding, Roy T and Herbsleb, James 2000 ICSE '00: Proceedings of the 22nd international conference on Software engineering Empirical
3 A debugging approach for live Big Data applications Marra, Matteo and Polito, Guillermo and Gonzalez Boix, Elisa 2020 Science of Computer Programming Debug Assistance,Detection
4 A Scalable Framework for Provisioning Large-Scale IoT Deployments Vogler, Michael and Schleicher, Johannes M. and Inzinger, Christian and Dustdar, Schahram 2016 ACM Transactions on Internet Technology (TOIT) Propagation
5 A Semi-Distributed Self-Healing Protocol for Run-Time Repairs of Time-Triggered Schedules Pozo, Francisco and Rodriguez-Navas, Guillermo 2019 IEEE International Conference on Emerging Technologies and Factory Automation, ETFA Hotpatch Generation
6 Advanced Tools for Operators at Amazon.com Bodik, Peter and Fox, Armando and Jordan, Michael I and Patterson, David and Banerjee, Ajit and Jagannathan, Ramesh and Su, Tina and Tenginakai, Shivaraj and Turner, Ben and Ingalls, Jon 2006 Hot Topics in Autonomic Computing (HotAC) Detection,Operator Tooling
7 An automation framework for configuration management to reduce manual intervention Karale, Supriya V. and Kaushal, Vishal 2016 ACM International Conference Proceeding Series Configuration Management
8 An empirical study of emergency updates for top android mobile apps Hassan, Safwat and Shang, Weiyi and Hassan, Ahmed E. 2017 Empirical Software Engineering Empirical
9 An Empirical Study on Quality Issues of Production Big Data Platform Zhou, Hucheng and Lou, Jian-Guang and Zhang, Hongyu and Lin, Haibo and Lin, Haoxiang and Qin, Tingting 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering Empirical
10 An entropy evaluation approach for triaging field crashes: A case study of Mozilla Firefox Khomh, Foutse and Chan, Brian and Zou, Ying and Hassan, Ahmed E. 2011 Proceedings - Working Conference on Reverse Engineering, WCRE Detection
11 App Store 2.0: From Crowdsourced Information to Actionable Feedback in Mobile Ecosystems Gomez, Maria and Adams, Bram and Maalej, Walid and Monperrus, Martin and Rouvoy, Romain 2017 IEEE Software E2E Tool
12 Applicable Micropatches and Where to Find Them: Finding and Applying New Security Hot Fixes to Old Software Malone, Mac and Wang, Yicheng and Snow, Kevin and Monrose, Fabian 2021 Proceedings - 2021 IEEE 14th International Conference on Software Testing, Verification and Validation, ICST 2021 Empirical
13 AppSealer: Automatic Generation of Vulnerability-Specific Patches for Preventing Component Hijacking Attacks in Android Applications Zhang, Mu and Yin, Heng 2014 NDSS Patch Generation,Security
14 Automated atomicity-violation fixing Jin, Guoliang and Song, Linhai and Zhang, Wei and Lu, Shan and Liblit, Ben 2011 Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI) E2E Tool
15 Automated known problem diagnosis with event traces Yuan, Chun and Lao, Ni and Wen, Ji Rong and Li, Jiwei and Zhang, Zheng and Wang, Yi Min and Ma, Wei Ying 2006 Proceedings of the 2006 EuroSys Conference Detection
16 Automatic Hot Patch Generation for Android Kernels Xu, Zhengzi and Zhang, Yulong and Zheng, Longri and Xia, Liangzhao and Bao, Chenfu and Wang, Zhi and Liu, Yang 2020 SEC'20: Proceedings of the 29th USENIX Conference on Security Symposium Binary-level,Hotpatch Generation,Runtime,Security
17 Automatically patching errors in deployed software Perkins, Jeff H. and Kim, Sunghun and Larsen, Sam and Amarasinghe, Saman and Bachrach, Jonathan and Carbin, Michael and Pacheco, Carlos and Sherwood, Frank and Sidiroglou, Stelios and Sullivan, Greg and Wong, Weng Fai and Zibin, Yoav and Ernst, Michael D. and Rinard, Martin 2009 SOSP'09 - Proceedings of the 22nd ACM SIGOPS Symposium on Operating Systems Principles E2E Tool
18 Autonomous hot patching for web-based applications Huang, Hai and Tsai, Wei Tek and Chen, Yinong 2005 Proceedings - International Computer Software and Applications Conference Binary-level,E2E Tool,Patch Generation
19 AutoPaG: Towards automated software patch generation with source code root cause identification and repair Lin, Zhiqiang and Jiang, Xuxian and Xu, Dongyan and Mao, Bing and Xie, Li 2007 Proceedings of the 2nd ACM Symposium on Information, Computer and Communications Security, ASIACCS '07 Patch Generation,Security
20 Auto-patching DOM-based XSS at scale Parameshwaran, Inian and Budianto, Enrico and Shinde, Shweta and Dang, Hung and Sadhu, Atul and Saxena, Prateek 2015 2015 10th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, ESEC/FSE 2015 - Proceedings Patch Generation,Security
21 Band-aid Patching Sidiroglou, Stelios and Ioannidis, Sotiris and Keromytis, Angelos D. 2007 Third Workshop on Hot Topics in System Dependability (HotDep'07) Binary-level,Hotpatch Generation
22 Binary Quilting to Generate Patched Executables without Compilation Saieva, Anthony and Kaiser, Gail 2020 FEAST 2020 - Proceedings of the 2020 ACM Workshop on Forming an Ecosystem Around Software Transformation Binary-level,Hotpatch Generation
23 Building a Reactive Immune System for Software Services Sidiroglou, Stelios and Locasto, Michael E. and Boyd, Stephen W. and Keromytis, Angelos D. 2005 Proceedings of the USENIX Annual Technical Conference E2E Tool
24 Continuous deployment at Facebook and OANDA Savor, Tony and Douglas, Mitchell and Gentili, Michael and Williams, Laurie and Beck, Kent and Stumm, Michael 2016 Proceedings - International Conference on Software Engineering Empirical
25 Continuous release and upgrade of component-based software Van Der Storm, Tijs 2005 Proceedings of the 12th International Workshop on Software Configuration Management, SCM 2005 Propagation
26 CRANE: Failure prediction, change analysis and test prioritization in practice - Experiences from windows Czerwonka, Jacek and Das, Rajiv and Nagappan, Nachiappan and Tarvo, Alex and Teterev, Alex 2011 Proceedings - 4th IEEE International Conference on Software Testing, Verification, and Validation, ICST 2011 Detection
27 Cross-Stack Threat Sensing for Cyber Security and Resilience Araujo, Frederico and Taylor, Teryl and Zhang, Jialong and Stoecklin, Marc Ph 2018 Proceedings - 48th Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops, DSN-W 2018 Detection
28 Dataflow analysis for known vulnerability prevention system Qin, Lifang and Li, Yichao and Yue, Cao 2008 2008 IEEE International Conference on Cybernetics and Intelligent Systems, CIS 2008 Detection,Hotpatch Generation,Security
29 Debugging in the (very) large: Ten years of implementation and experience Glerum, Kirk and Kinshumann, Kinshuman and Greenberg, Steve and Aul, Gabriel and Orgovan, Vince and Nichols, Greg and Grant, David and Loihle, Gretchen and Hunt, Galen 2009 SOSP'09 - Proceedings of the 22nd ACM SIGOPS Symposium on Operating Systems Principles Bug reporting,Operator Tooling
30 Do faster releases improve software quality? An empirical case study of Mozilla Firefox Khomh, Foutse and Dhaliwal, Tejinder and Zou, Ying and Adams, Bram 2012 IEEE International Working Conference on Mining Software Repositories Empirical
31 Embroidery: Patching vulnerable binary code of fragmentized android devices Zhang, Xuewen and Zhang, Yuanyuan and Li, Juanru and Hu, Yikun and Li, Huayi and Gu, Dawu 2017 Proceedings - 2017 IEEE International Conference on Software Maintenance and Evolution, ICSME 2017 Binary-level,Hotpatch Generation,Security
32 Ensembles of models for automated diagnosis of system performance problems Zhang, Steve and Cohen, Ira and Goldszmidt, Moises and Symons, Julie and Fox, Armando 2005 Proceedings of the International Conference on Dependable Systems and Networks Detection
33 Exploring the relationship of a file's history and its fault-proneness: An empirical study Illes-Seifert, Timea and Paech, Barbara 2008 Proceedings - Testing: Academic and Industrial Conference Practice and Research Techniques, TAIC PART 2008 Empirical
34 Exterminator: Automatically correcting memory errors with high probability Novark, Gene and Berger, Emery D. and Zorn, Benjamin G. 2007 Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI) Detection,Patch Generation
35 Field studies of computer system administrators: analysis of system management tools and practices Barrett, Rob and Kandogan, Eser and Maglio, Paul P. and Haber, Eben M. and Takayama, Leila A. and Prabaker, Madhu 2004 CSCW '04: Proceedings of the 2004 ACM conference on Computer supported cooperative work Empirical
36 Fingerprinting the datacenter: Automated classification of performance crises Bodik, Peter and Goldszmidt, Moises and Fox, Armando and Woodard, Dawn B. and Andersen, Hans 2010 EuroSys'10 - Proceedings of the EuroSys 2010 Conference Detection,Operator Tooling
37 First-aid: Surviving and preventing memory management bugs during production runs Gao, Qi and Zhang, Wenbin and Tang, Yan and Qin, Feng 2009 Proceedings of the 4th ACM European Conference on Computer Systems, EuroSys'09 Operator Tooling,Patch Generation
38 Handling vulnerabilities with mobile agents in order to consider the delay and disruption tolerant characteristic of military networks Aurisch, Thorsten and Jacke, Andreas 2018 2018 International Conference on Military Communications and Information Systems, ICMCIS 2018 Detection,Patch Generation,Security
39 Healing online service systems via mining historical issue repositories Ding, Rui and Fu, Qiang and Lou, Jian Guang and Lin, Qingwei and Zhang, Dongmei and Shen, Jiajun and Xie, Tao 2012 2012 27th IEEE/ACM International Conference on Automated Software Engineering, ASE 2012 - Proceedings Patch Generation
40 High-impact defects: A study of breakage and surprise defects Shihab, Emad and Mockus, Audris and Kamei, Yasutaka and Adams, Bram and Hassan, Ahmed E. 2011 SIGSOFT/FSE 2011 - Proceedings of the 19th ACM SIGSOFT Symposium on Foundations of Software Engineering Detection
41 Hot-patching a web server: A case study of ASAP code repair Payer, Mathias and Gross, Thomas R. 2013 2013 11th Annual Conference on Privacy, Security and Trust, PST 2013 Propagation,Runtime
42 Identifying impactful service system problems via log analysis He, Shilin and Lin, Qingwei and Lou, Jian Guang and Zhang, Hongyu and Lyu, Michael R. and Zhang, Dongmei 2018 ESEC/FSE 2018 - Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering Bug reporting,Detection
43 Improving cybersecurity hygiene through JIT patching Araujo, Frederico and Taylor, Teryl 2020 ESEC/FSE 2020 - Proceedings of the 28th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering Propagation,Runtime,Security
44 InstaGuard: Instantly Deployable Hot-patches for Vulnerable System Programs on Android Chen, Yaohui and Li, Yuping and Lu, Long and Lin, Yueh-Hsun and Vijayakumar, Hayawardh and Wang, Zhi and Ou, Xinming 2018 Network and Distributed System Security Symposium (NDSS'18) Hotpatch Generation,Propagation,Security
45 Katana: A hot patching framework for ELF executables Ramaswamy, Ashwin and Bratus, Sergey and Smith, Sean W. and Locasto, Michael E. 2010 ARES 2010 - 5th International Conference on Availability, Reliability, and Security Binary-level,Hotpatch Generation,Propagation
46 Keepers of the Machines: Examining How System Administrators Manage Software Updates Li, Frank and Chetty, Marshini and Rogers, Lisa and Mathur, Arunesh and Malkin, Nathan 2019 Fifteenth Symposium on Usable Privacy and Security (SOUPS 2019) Empirical,Operator Tooling
47 LEONORE - Large-scale provisioning of resource-constrained IoT deployments Vogler, Michael and Schleicher, Johannes M. and Inzinger, Christian and Nastic, Stefan and Sehic, Sanjin and Dustdar, Schahram 2015 Proceedings - 9th IEEE International Symposium on Service-Oriented System Engineering, IEEE SOSE 2015 Propagation
48 Mining historical issue repositories to heal large-scale online service systems Ding, Rui and Fu, Qiang and Lou, Jian Guang and Lin, Qingwei and Zhang, Dongmei and Xie, Tao 2014 Proceedings of the International Conference on Dependable Systems and Networks Patch Generation
49 Onion: Identifying incident-indicating logs for cloud systems Zhang, Xu and Xu, Yong and Qin, Si and He, Shilin and Qiao, Bo and Li, Ze and Zhang, Hongyu and Li, Xukun and Dang, Yingnong and Lin, Qingwei and Chintalapati, Murali and Rajmohan, Saravanakumar and Zhang, Dongmei 2021 ESEC/FSE 2021 - Proceedings of the 29th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering Bug reporting,Operator Tooling
50 Online Model-Based Clustering for Crisis Identification in Distributed Computing Woodard, Dawn B. and Goldszmidt, Moises 2012 Journal of the American Statistical Association Bug reporting,Detection
51 Patch me if you can: A study on the effects of individual user behavior on the end-host vulnerability state Sarabi, Armin and Zhu, Ziyun and Xiao, Chaowei and Liu, Mingyan and Dumitras, Tudor 2017 Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) Empirical
52 PatchDroid: Scalable third-party security patches for android devices Mulliner, Collin and Oberheide, Jon and Robertson, William and Kirda, Engin 2013 ACM International Conference Proceeding Series Propagation,Security
53 Performance issue diagnosis for online service systems Fu, Qiang and Lou, Jian Guang and Lin, Qing Wei and Ding, Rui and Zhang, Dongmei and Ye, Zihao and Xie, Tao 2012 Proceedings of the IEEE Symposium on Reliable Distributed Systems Detection
54 Poster: AutoPatch: Automatic Hotpatching of Real-Time Embedded Devices Salehi, Mohsen and Pattabiraman, Karthik 2022 Proceedings of the ACM Conference on Computer and Communications Security Hotpatch Generation,Security
55 Precise and Accurate Patch Presence Test for Binaries Zhang, Hang and Qian, Zhiyun 2018 27th USENIX Security Symposium (USENIX Security 18) Binary-level,Detection,Security
56 Predicting bug-fixing time: An empirical study of commercial software projects Zhang, Hongyu and Gong, Liang and Versteeg, Steve 2013 Proceedings - International Conference on Software Engineering Empirical
57 ProbeGuard: Mitigating Probing Attacks Through Reactive Program Transformations Bhat, Koustubha and Van Der Kouwe, Erik and Bos, Herbert and Giuffrida, Cristiano 2019 International Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS Detection,Hotpatch Generation,Security
58 R2C: Robust Rolling-Upgrade in Clouds Sun, Daniel and Fekete, Alan and Gramoli, Vincent and Li, Guoqiang and Xu, Xiwei and Zhu, Liming 2018 IEEE Transactions on Dependable and Secure Computing Detection,Propagation
59 Recovery from failures due to Mandelbugs in IT systems Trivedi, Kishor S. and Mansharamani, Rajesh and Kim, Dong Seong and Grottke, Michael and Nambiar, Manoj 2011 Proceedings of IEEE Pacific Rim International Symposium on Dependable Computing, PRDC Empirical
60 Recursive restartability: Turning the reboot sledgehammer into a scalpel Candea, George and Fox, Armando 2001 Proceedings of the Workshop on Hot Topics in Operating Systems - HOTOS Propagation,Runtime
61 ReDAC - Dynamic reconfiguration of distributed component-based applications with cyclic dependencies Rasche, Andreas and Polze, Andreas 2008 Proceedings - 11th IEEE Symposium on Object/Component/Service-Oriented Real-Time Distributed Computing, ISORC 2008 Configuration Management,Runtime
62 Rx: Treating bugs as allergies - A safe method to survive software failures Qin, Feng and Tucek, Joseph and Sundaresan, Jagadeesan and Zhou, Yuanyuan 2005 Proceedings of the 20th ACM Symposium on Operating Systems Principles, SOSP 2005 Symptom Mitigation
63 Security vulnerabilities in Javascript hotpatching in iOS with a commercial and open-source tool Ford, Sarah and Olmsted, Aspen 2018 International Conference on Information Society, i-Society 2017 Propagation,Security
64 Security-related vulnerability life cycle analysis Marconato, Geraldine Vache and Nicomette, Vincent and Kaaniche, Mohamed 2012 7th International Conference on Risks and Security of Internet and Systems, CRiSIS 2012 Empirical
65 ShieldGen: Automatic data patch generation for unknown vulnerabilities with informed probing Cui, Weidong and Peinado, Marcus and Wang, Helen J. and Locasto, Michael E. 2007 Proceedings - IEEE Symposium on Security and Privacy Detection,Patch Generation,Security
66 Software analytics for incident management of online services: An experience report Lou, Jian Guang and Lin, Qingwei and Ding, Rui and Fu, Qiang and Zhang, Dongmei and Xie, Tao 2013 2013 28th IEEE/ACM International Conference on Automated Software Engineering, ASE 2013 - Proceedings Bug reporting,Operator Tooling,Symptom Mitigation
67 Source Code and Binary Level Vulnerability Detection and Hot Patching Xu, Zhengzi 2020 Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering Detection,Hotpatch Generation,Runtime,Security
68 SPIDER: Enabling fast patch propagation in related software repositories MacHiry, Aravind and Redini, Nilo and Camellini, Eric and Kruegel, Christopher and Vigna, Giovanni 2020 Proceedings - IEEE Symposium on Security and Privacy Patch Generation,Security
69 STOP: Socio-temporal opportunistic patching of short range mobile malware Tang, John and Kim, Hyoungshick and Mascolo, Cecilia and Musolesi, Mirco 2012 2012 IEEE International Symposium on a World of Wireless, Mobile and Multimedia Networks, WoWMoM 2012 - Digital Proceedings Propagation,Security
70 Striving for Failure: An Industrial Case Study About Test Failure Prediction Anderson, Jeff and Salem, Saeed and Do, Hyunsook 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering Empirical
71 Studying the urgent updates of popular games on the Steam platform Lin, Dayi and Bezemer, Cor Paul and Hassan, Ahmed E. 2017 Empirical Software Engineering Empirical
72 Sweeper: A lightweight end-to-end system for defending against fast worms Tucek, Joseph and Lu, Shan and Huang, Chengdu and Xanthos, Spiros and Zhou, Yuanyuan and Newsome, James and Brumley, David and Song, Dawn 2007 Operating Systems Review (ACM) Detection,Security,Symptom Mitigation
73 Synergistic debug-repair of heap manipulations Verma, Sahil and Roy, Subhajit 2017 Proceedings of the ACM SIGSOFT Symposium on the Foundations of Software Engineering Debug Assistance,Hotpatch Generation
74 Tales of software updates: The process of updating software Vaniea, Kami and Rashidi, Yasmeen 2016 Conference on Human Factors in Computing Systems - Proceedings Empirical
75 Talos: Neutralizing Vulnerabilities with Security Workarounds for Rapid Response Huang, Zhen and Dangelo, Mariana and Miyani, Dhaval and Lie, David 2016 Proceedings - 2016 IEEE Symposium on Security and Privacy, SP 2016 Security,Symptom Mitigation
76 The empirical commit frequency distribution of open source projects Kolassa, Carsten and Riehle, Dirk and Salim, Michel A. 2013 Proceedings of the 9th International Symposium on Open Collaboration, WikiSym + OpenSym 2013 Empirical
77 TJOSConf: Automatic and Safe System Environment Operations Platform Wang, Yida and Jiang, Shuangshuang and Cui, Bin 2022 ACM International Conference Proceeding Series Operator Tooling
78 Toward Just-in-Time Patching for Containerized Applications Tunde-Onadele, Olufogorehan and Lin, Yuhang and He, Jingzhu and Gu, Xiaohui 2020 Proceedings of the 7th Symposium on Hot Topics in the Science of Security Detection,Patch Generation,Security
79 Towards release strategy optimization for apps in Google Play Shen, Sheng and Lu, Xuan and Hu, Ziniu and Liu, Xuanzhe 2017 ACM International Conference Proceeding Series Empirical
80 Triage: Diagnosing production run failures at the user's site Tucek, Joseph and Lu, Shan and Huang, Chengdu and Xanthos, Spiros and Zhou, Yuanyuan 2007 Operating Systems Review (ACM) Detection,Patch Generation
81 Troubleshooting Transiently-Recurring Errors in Production Systems with Blame-Proportional Logging Luo, Liang and Nath, Suman and Sivalingam, Ravindranath and Musuvathi, Madan and Ceze, Luis 2018 USENIX Annual Technical Conference (USENIX ATC 18) Detection,Operator Tooling
82 Update with care: Testing candidate bug fixes and integrating selective updates through binary rewriting Saieva, Anthony and Kaiser, Gail 2022 Journal of Systems and Software Bug reporting,Detection,Propagation
83 Using pre-release test failures to build early post-release defect prediction models Herzig, Kim 2014 Proceedings - International Symposium on Software Reliability Engineering, ISSRE Empirical
84 Virtual machine preserving host updates for zero day patching in public cloud Russinovich, Mark and Govindaraju, Naga and Raghuraman, Melur and Hepkin, David and Schwartz, Jamie and Kishan, Arun 2021 EuroSys 2021 - Proceedings of the 16th European Conference on Computer Systems Propagation,Security
85 VPatcher: VMI-based transparent data patching to secure software in the cloud Zhang, Hao and Zhao, Lei and Xu, Lai and Wang, Lina and Wu, Deming 2015 Proceedings - 2014 IEEE 13th International Conference on Trust, Security and Privacy in Computing and Communications, TrustCom 2014 Detection,Security
86 We'll Fix It in Post: What Do Bug Fixes in Video Game Update Notes Tell Us? Truelove, Andrew and Santana de Almeida, Eduardo and Ahmed, Iftekhar 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE) Empirical
87 When App Stores Listen to the Crowd to Fight Bugs in the Wild Gomez, Maria and Martinez, Matias and Monperrus, Martin and Rouvoy, Romain 2015 Proceedings - International Conference on Software Engineering E2E Tool
88 Binary change set composition Van Der Storm, Tijs 2007 Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) Propagation
89 Self-patch: Beyond patch tuesday for containerized applications Tunde-Onadele, Olufogorehan and Lin, Yuhang and He, Jingzhu and Gu, Xiaohui 2020 Proceedings - 2020 IEEE International Conference on Autonomic Computing and Self-Organizing Systems, ACSOS 2020 Detection,Hotpatch Generation,Security
90 Microreboot - A Technique for Cheap Recovery Candea, George and Kawamoto, Shinichi and Fujiki, Yuichi and Friedman, Greg and Fox, Armando 2004 arXiv Propagation
91 Production-driven patch generation Durieux, Thomas and Hamadi, Youssef and Monperrus, Martin 2017 Proceedings - 2017 IEEE/ACM 39th International Conference on Software Engineering: New Ideas and Emerging Results Track, ICSE-NIER 2017 E2E Tool
92 Recovery from Software Failures Caused by Mandelbugs Grottke, Michael and Kim, Dong Seong and Mansharamani, Rajesh and Nambiar, Manoj and Natella, Roberto and Trivedi, Kishor S. 2016 IEEE Transactions on Reliability Empirical
93 An Empirical Investigation of Incident Triage for Online Service Systems Chen, Junjie and He, Xiaoting and Lin, Qingwei and Xu, Yong and Zhang, Hongyu and Hao, Dan and Gao, Feng and Xu, Zhangwei and Dang, Yingnong and Zhang, Dongmei 2019 Proceedings - 2019 IEEE/ACM 41st International Conference on Software Engineering: Software Engineering in Practice, ICSE-SEIP 2019 Empirical
94 An Empirical Study on Change-induced Incidents of Online Service Systems Wu, Yifan and Chai, Bingxu and Li, Ying and Liu, Bingchang and Li, Jianguo and Yang, Yong and Jiang, Wei 2023 Proceedings - International Conference on Software Engineering Empirical
95 Assess and Summarize: Improve Outage Understanding with Large Language Models Jin, Pengxiang and Zhang, Shenglin and Ma, Minghua and Li, Haozhe and Kang, Yu and Li, Liqun and Liu, Yudong and Qiao, Bo and Zhang, Chaoyun and Zhao, Pu and He, Shilin and Sarro, Federica and Dang, Yingnong and Rajmohan, Saravan and Lin, Qingwei and Zhang, Dongmei 2023 ESEC/FSE 2023 - Proceedings of the 31st ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering Debug Assistance,Empirical,Operator Tooling
96 Automatically and Adaptively Identifying Severe Alerts for Online Service Systems Zhao, Nengwen and Jin, Panshi and Wang, Lixin and Yang, Xiaoqin and Liu, Rong and Zhang, Wenchi and Sui, Kaixin and Pei, Dan 2020 Proceedings - IEEE INFOCOM Operator Tooling,Triage and Response
97 AutoTSG: learning and synthesis for incident troubleshooting Shetty, Manish and Bansal, Chetan and Upadhyayula, Sai Pramod and Radhakrishna, Arjun and Gupta, Anurag 2022 ESEC/FSE 2022 - Proceedings of the 30th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering Empirical,Operator Tooling
98 Why Do Computers Stop and What Can Be Done About It? Gray, J 1986 Symposium on Reliability in Distributed Software and Database Systems Empirical
99 Why Do Internet Services Fail, and What Can Be Done About It? Oppenheimer, David and Ganapathi, Archana and Patterson, David A 2003 USENIX Symposium on Internet Technologies and Systems (USITS '03) Empirical
100 Why does the cloud stop computing? Lessons from hundreds of service outages Gunawi, Haryadi S. and Hao, Mingzhe and Suminto, Riza O. and Laksono, Agung and Satria, Anang D. and Adityatama, Jeffry and Eliazar, Kurnia J. 2016 Proceedings of the 7th ACM Symposium on Cloud Computing, SoCC 2016 Empirical
101 ConfSeer: leveraging customer support knowledge bases for automated misconfiguration detection Potharaju, Rahul and Chan, Joseph and Hu, Luhui and Nita-Rotaru, Cristina and Wang, Mingshi and Zhang, Liyuan and Jain, Navendu 2015 Proceedings of the VLDB Endowment Detection
102 Continuous incident triage for large-scale online service systems Chen, Junjie and He, Xiaoting and Lin, Qingwei and Zhang, Hongyu and Hao, Dan and Gao, Feng and Xu, Zhangwei and Dang, Yingnong and Zhang, Dongmei 2019 Proceedings - 2019 34th IEEE/ACM International Conference on Automated Software Engineering, ASE 2019 Triage and Response
103 Critical event prediction for proactive management in large-scale computer clusters Sahoo, R K and Oliner, A J and Rish, I and Gupta, M and Moreira, J E and Ma, S and Vilalta, R and Sivasubramaniam, A 2003 Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining Detection
104 DeCaf: Diagnosing and triaging performance issues in large-scale cloud services Bansal, Chetan and Renganathan, Sundararajan and Asudani, Ashima and Midy, Olivier and Janakiraman, Mathru 2020 Proceedings - International Conference on Software Engineering Debug Assistance,Triage and Response
105 DeepTriage: Automated Transfer Assistance for Incidents in Cloud Services Pham, Phuong and Jain, Vivek and Dauterman, Lukas and Ormont, Justin and Jain, Navendu 2020 Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining Triage and Response
106 Efficient customer incident triage via linking with system incidents Gu, Jiazhen and Wen, Jiaqi and Wang, Zijian and Zhao, Pu and Luo, Chuan and Kang, Yu and Zhou, Yangfan and Yang, Li and Sun, Jeffrey and Xu, Zhangwei and Qiao, Bo and Li, Liqun and Lin, Qingwei and Zhang, Dongmei 2020 ESEC/FSE 2020 - Proceedings of the 28th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering Triage and Response
107 Empowering Practical Root Cause Analysis by Large Language Models for Cloud Incidents Chen, Yinfang and Xie, Huaibing and Ma, Ming-Jie and Kang, Yu and Gao, Xin and Shi, Liu and Cao, Yunjie and Gao, Xue-Chao and Fan, Hao and Wen, Ming and Zeng, Jun and Ghosh, Supriyo and Zhang, Xuchao and Zhang, Chaoyun and Lin, Qingwei and Rajmohan, S and Zhang, Dongmei 2023 arXiv.org Operator Tooling,Triage and Response
108 Experience report on applying software analytics in incident management of online service Lou, Jian Guang and Lin, Qingwei and Ding, Rui and Fu, Qiang and Zhang, Dongmei and Xie, Tao 2017 Automated Software Engineering Operator Tooling,Triage and Response
109 Failure Recovery: When the Cure Is Worse Than the Disease Guo, Zhenyu and McDirmid, Sean and Yang, Mao and Zhuang, Li and Zhang, Pu and Luo, Yingwei and Bergan, Tom and Musuvathi, Madan and Zhang, Zheng and Zhou, Lidong 2013 Proceedings of the Workshop on Hot Topics in Operating Systems - HOTOS Empirical
110 Fast outage analysis of large-scale production clouds with service correlation mining Wang, Yaohui and Li, Guozheng and Wang, Zijian and Kang, Yu and Zhou, Yangfan and Zhang, Hongyu and Gao, Feng and Sun, Jeffrey and Yang, Li and Lee, Pochian and Xu, Zhangwei and Zhao, Pu and Qiao, Bo and Li, Liqun and Zhang, Xu and Lin, Qingwei 2021 Proceedings - International Conference on Software Engineering Debug Assistance,Triage and Response
111 Fighting the Fog of War: Automated Incident Detection for Cloud Systems Li, Liqun and Zhang, Xu and Zhao, Xin and Zhang, Hongyu and Kang, Yu and Zhao, Pu and Qiao, Bo and He, Shilin and Lee, Pochian and Sun, Jeffrey and Gao, Feng and Yang, Li and Lin, Qingwei and Rajmohan, Saravanakumar and Xu, Zhangwei and Zhang, Dongmei 2021 Proceedings - International Conference on Software Engineering Detection
112 Hot-patching Platform for Executable and Linkable Format Binary Application for System Resilience Jeong, Haegeon and Kang, Kyungtae and An, Jinsung 2023 Proceedings of the ACM Symposium on Applied Computing Hotpatch Generation
113 How Incidental are the Incidents? Characterizing and Prioritizing Incidents for Large-Scale Online Service Systems Chen, Junjie and Zhang, Shu and He, Xiaoting and Lin, Qingwei and Zhang, Hongyu and Hao, Dan and Kang, Yu and Gao, Feng and Xu, Zhangwei and Dang, Yingnong and Zhang, Dongmei 2020 Proceedings - 2020 35th IEEE/ACM International Conference on Automated Software Engineering, ASE 2020 Empirical,Triage and Response
114 How Long Will it Take to Mitigate this Incident for Online Service Systems? Wang, Weijing and Chen, Junjie and Yang, Lin and Zhang, Hongyu and Zhao, Pu and Qiao, Bo and Kang, Yu and Lin, Qingwei and Rajmohan, Saravanakumar and Gao, Feng and Xu, Zhangwei and Dang, Yingnong and Zhang, Dongmei 2021 Proceedings - International Symposium on Software Reliability Engineering, ISSRE Empirical,Operator Tooling
115 How to Fight Production Incidents? An Empirical Study on a Large-scale Cloud Service Ghosh, Supriyo and Shetty, Manish and Bansal, Chetan and Nath, Suman 2022 SoCC 2022 - Proceedings of the 13th Symposium on Cloud Computing Empirical
116 How to Manage Change-Induced Incidents? Lessons from the Study of Incident Life Cycle Zhao, Yujin and Jiang, Ling and Tao, Ye and Zhang, Songlin and Wu, Changlong and Wu, Yifan and Jia, Tong and Li, Ying and Wu, Zhonghai 2023 Proceedings - International Symposium on Software Reliability Engineering, ISSRE Empirical
117 How to mitigate the incident? An effective troubleshooting guide recommendation technique for online service systems Jiang, Jiajun and Lu, Weihai and Chen, Junjie and Lin, Qingwei and Zhao, Pu and Kang, Yu and Zhang, Hongyu and Xiong, Yingfei and Gao, Feng and Xu, Zhangwei and Dang, Yingnong and Zhang, Dongmei 2020 ESEC/FSE 2020 - Proceedings of the 28th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering Empirical,Operator Tooling
118 How to tame your online services Lin, Qingwei and Lou, Jian Guang and Zhang, Hongyu and Zhang, Dongmei 2016 Perspectives on Data Science for Software Engineering Triage and Response
119 iDice: Problem identification for emerging issues Lin, Qingwei and Lou, Jian Guang and Zhang, Hongyu and Zhang, Dongmei 2016 Proceedings - International Conference on Software Engineering Debug Assistance,Operator Tooling
120 iFeedback: Exploiting user feedback for real-time issue detection in large-scale online service systems Zheng, Wujie and Lu, Haochuan and Zhou, Yangfan and Liang, Jianming and Zheng, Haibing and Deng, Yuetang 2019 Proceedings - 2019 34th IEEE/ACM International Conference on Automated Software Engineering, ASE 2019 Detection
121 Learning a hierarchical monitoring system for detecting and diagnosing service issues Nair, Vinod and Raul, Ameya and Khanduja, Shwetabh and Bahirwani, Vikas and Shao, Qihong and Sundararajan, S. and Keerthi, Sathiya and Herbert, Steve and Dhulipalla, Sudheer 2015 Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining Detection
122 Leveraging Large Language Models for the Auto-remediation of Microservice Applications: An Experimental Study Sarda, Komal and Namrud, Zakeya and Litoiu, Marin and Shwartz, Larisa and Watts, Ian 2024 FSE Companion - Companion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering Patch Generation
123 LogFlash: Real-time Streaming Anomaly Detection and Diagnosis from System Logs for Large-scale Software Systems Jia, Tong and Wu, Yifan and Hou, Chuanjia and Li, Ying 2021 Proceedings - International Symposium on Software Reliability Engineering, ISSRE Detection
124 Xpert: Empowering Incident Management with Query Recommendations via Large Language Models Jiang, Yuxuan and Zhang, Chaoyun and He, Shilin and Yang, Zhihao and Ma, Minghua and Qin, Si and Kang, Yu and Dang, Yingnong and Rajmohan, Saravan and Lin, Qingwei and Zhang, Dongmei 2023 Proceedings - International Conference on Software Engineering Operator Tooling
125 Mining Root Cause Knowledge from Cloud Service Incident Investigations for AIOps Saha, Amrita and Hoi, Steven C. H. 2022 IEEE Aerospace Conference Proceedings Debug Assistance,Triage and Response
126 Neural knowledge extraction from cloud service incidents Shetty, Manish and Bansal, Chetan and Kumar, Sumit and Rao, Nikitha and Nagappan, Nachiappan and Zimmermann, Thomas 2021 Proceedings - International Conference on Software Engineering Triage and Response
127 Not as easy as just update: Survey of System Administrators and Patching Behaviours Jenkins, Adam and Wolters, Maria and Liu, Linsen and Vaniea, Kami 2024 Conference on Human Factors in Computing Systems - Proceedings Empirical
128 Outage prediction and diagnosis for cloud service systems Chen, Yujun and Yang, Xian and Lin, Qingwei and Zhang, Dongmei and Dong, Hang and Xu, Yong and Li, Hao and Kang, Yu and Zhang, Hongyu and Gao, Feng and Xu, Zhangwei and Dang, Yingnong 2019 The Web Conference 2019 - Proceedings of the World Wide Web Conference, WWW 2019 Detection,Operator Tooling
129 Predicting remediations for hardware failures in large-scale datacenters Lin, Fred and Davoli, Antonio and Akbar, Imran and Kalmanje, Sukumar and Silva, Leandro and Stamford, John and Golany, Yanai and Piazza, Jim and Sankar, Sriram 2020 Proceedings - 50th Annual IEEE/IFIP International Conference on Dependable Systems and Networks: Supplemental Volume, DSN-S 2020 Patch Generation
130 Predictive and Adaptive Failure Mitigation to Avert Production Cloud VM Interruptions Zheng, Lianmin and Jia, Chengfan and Sun, Minmin and Wu, Zhao and Yu, Cody Hao and Haj-Ali, Ameer and Wang, Yida and Yang, Jun and Zhuo, Danyang and Sen, Koushik and Gonzalez, Joseph E and Stoica, Ion 2020 Proceedings - 50th Annual IEEE/IFIP International Conference on Dependable Systems and Networks: Supplemental Volume, DSN-S 2020 Detection
131 RAPID: Real-Time Alert Investigation with Context-aware Prioritization for Efficient Threat Discovery Liu, Yushan and Shu, Xiaokui and Sun, Yixin and Jang, Jiyong and Mittal, Prateek 2022 ACM International Conference Proceeding Series Operator Tooling,Triage and Response
132 Real-time incident prediction for online service systems Zhao, Nengwen and Chen, Junjie and Wang, Zhou and Peng, Xiao and Wang, Gang and Wu, Yong and Zhou, Fang and Feng, Zhen and Nie, Xiaohui and Zhang, Wenchi and Sui, Kaixin and Pei, Dan 2020 ESEC/FSE 2020 - Proceedings of the 28th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering Detection
133 Recommending Root-Cause and Mitigation Steps for Cloud Incidents using Large Language Models Ahmed, Toufique and Ghosh, Supriyo and Bansal, Chetan and Zimmermann, Thomas and Zhang, Xuchao and Rajmohan, Saravan 2023 Proceedings - International Conference on Software Engineering Triage and Response
134 RESIN: A Holistic Service for Dealing with Memory Leaks in Production Cloud Infrastructure Lou, Chang and Chen, Cong and Huang, Peng and Dang, Yingnong and Qin, Si and Yang, Xinsheng and Li, Xukun and Lin, Qingwei and Chintalapati, Murali 2022 Proceedings - International Conference on Software Engineering Detection,Patch Generation
135 Simple Testing Can Prevent Most Critical Failures: An Analysis of Production Failures in Distributed Data-Intensive Systems Yuan, Ding and Luo, Yu and Zhuang, Xin and Rodrigues, Guilherme Renna and Zhao, Xu and Zhang, Yongle and Jain, Pranay U. and Stumm, Michael 2014 Proceedings - International Conference on Software Engineering Detection
136 Six Years and 184 Tickets: The Vast Scope of the Mars Science Laboratory's Ultimate Flight Software Release Holloway, Alexandra and Denison, Jonathan and Patel, Neel and Maimone, Mark and Rankin, Arturo 2023 IEEE Aerospace Conference Proceedings Empirical
137 Testing Configuration Changes in Context to Prevent Production Failures Zheng, Lianmin and Jia, Chengfan and Sun, Minmin and Wu, Zhao and Yu, Cody Hao and Haj-Ali, Ameer and Wang, Yida and Yang, Jun and Zhuo, Danyang and Sen, Koushik and Gonzalez, Joseph E and Stoica, Ion 2020 IEEE Aerospace Conference Proceedings Detection
138 Towards intelligent incident management: why we need it and how we make it Chen, Zhuangbin and Kang, Yu and Li, Liqun and Zhang, Xu and Zhang, Hongyu and Xu, Hui and Zhou, Yangfan and Yang, Li and Sun, Jeffrey and Xu, Zhangwei and Dang, Yingnong and Gao, Feng and Zhao, Pu and Qiao, Bo and Lin, Qingwei and Zhang, Dongmei and Lyu, Michael R. 2020 Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering Empirical
139 TraceArk: Towards Actionable Performance Anomaly Alerting for Online Service Systems Zeng, Zhengran and Zhang, Yuqun and Xu, Yong and Ma, Minghua and Qiao, Bo and Zou, Wentao and Chen, Qingjun and Zhang, Meng and Zhang, Xu and Zhang, Hongyu and Gao, Xuedong and Fan, Hao and Rajmohan, Saravan and Lin, Qingwei and Zhang, Dongmei 2023 Proceedings - International Conference on Software Engineering Detection
140 What bugs cause production cloud incidents? Liu, Haopeng and Lu, Shan and Musuvathi, Madan and Nath, Suman 2019 Proceedings of the Workshop on Hot Topics in Operating Systems, HotOS 2019 Empirical