This website hosts the referenced publications from the paper "Hot Fixing Software: A Comprehensive Review of Terminology, Techniques, and Applications" by Carol Hanna, Justyna Petke, David Clark, and Federica Sarro from University College London.
While hot fixing is an essential and common activity in software maintenance, it has never been surveyed as a research activity. Thus, such a review is long overdue. In this work, we conduct a comprehensive
literature review of work on hot fixing. We highlight the fields where this topic has been addressed, inconsistencies we identified in the terminology, gaps in the literature, and directions for future work.
Our search yielded 87 papers on the topic published between 2000 and 2022. The papers found encompass many different research areas such as log analysis, runtime patching, and automated repair,
as well as varying application domains such as security, mobile, and video games.
We find many directions that can take hot fix research forward, such as unifying existing terminology, establishing a benchmark set of hot fixes, researching costs and
frequency of hot fixes, and researching the possibility of end-to-end automation of detection, mitigation, and propagation. We discuss these avenues in detail to inspire the community to
systematize hot fixing as a software engineering activity.
We hope that this work streamlines the existing body of work
and drives research in the area forward.
ID | Title | Authors | Year | Venue | Tags |
---|---|---|---|---|---|
1 | A case study of measuring degeneration of software architectures from a defect perspective | Li, Zude and Long, Jun | 2011 | Proceedings - Asia-Pacific Software Engineering Conference, APSEC | Empirical |
2 | A Case Study of Open Source Software Development: The Apache Server | Mockus, Audris and Fielding, Roy T and Herbsleb, James | 2000 | ICSE '00: Proceedings of the 22nd international conference on Software engineering | Empirical |
3 | A debugging approach for live Big Data applications | Marra, Matteo and Polito, Guillermo and Gonzalez Boix, Elisa | 2020 | Science of Computer Programming | Debug Assistance,Detection |
4 | A Scalable Framework for Provisioning Large-Scale IoT Deployments | Vogler, Michael and Schleicher, Johannes M. and Inzinger, Christian and Dustdar, Schahram | 2016 | ACM Transactions on Internet Technology (TOIT) | Propagation |
5 | A Semi-Distributed Self-Healing Protocol for Run-Time Repairs of Time-Triggered Schedules | Pozo, Francisco and Rodriguez-Navas, Guillermo | 2019 | IEEE International Conference on Emerging Technologies and Factory Automation, ETFA | Hotpatch Generation |
6 | Advanced Tools for Operators at Amazon.com | Bodik, Peter and Fox, Armando and Jordan, Michael I and Patterson, David and Banerjee, Ajit and Jagannathan, Ramesh and Su, Tina and Tenginakai, Shivaraj and Turner, Ben and Ingalls, Jon | 2006 | Hot Topics in Autonomic Computing (HotAC) | Detection,Operator Tooling |
7 | An automation framework for configuration management to reduce manual intervention | Karale, Supriya V. and Kaushal, Vishal | 2016 | ACM International Conference Proceeding Series | Configuration Management |
8 | An empirical study of emergency updates for top android mobile apps | Hassan, Safwat and Shang, Weiyi and Hassan, Ahmed E. | 2017 | Empirical Software Engineering | Empirical |
9 | An Empirical Study on Quality Issues of Production Big Data Platform | Zhou, Hucheng and Lou, Jian-Guang and Zhang, Hongyu and Lin, Haibo and Lin, Haoxiang and Qin, Tingting | 2015 | IEEE/ACM 37th IEEE International Conference on Software Engineering | Empirical |
10 | An entropy evaluation approach for triaging field crashes: A case study of Mozilla Firefox | Khomh, Foutse and Chan, Brian and Zou, Ying and Hassan, Ahmed E. | 2011 | Proceedings - Working Conference on Reverse Engineering, WCRE | Detection |
11 | App Store 2.0: From Crowdsourced Information to Actionable Feedback in Mobile Ecosystems | Gomez, Maria and Adams, Bram and Maalej, Walid and Monperrus, Martin and Rouvoy, Romain | 2017 | IEEE Software | E2E Tool |
12 | Applicable Micropatches and Where to Find Them: Finding and Applying New Security Hot Fixes to Old Software | Malone, Mac and Wang, Yicheng and Snow, Kevin and Monrose, Fabian | 2021 | Proceedings - 2021 IEEE 14th International Conference on Software Testing, Verification and Validation, ICST 2021 | Empirical |
13 | AppSealer: Automatic Generation of Vulnerability-Specific Patches for Preventing Component Hijacking Attacks in Android Applications | Zhang, Mu and Yin, Heng | 2014 | NDSS | Patch Generation,Security |
14 | Automated atomicity-violation fixing | Jin, Guoliang and Song, Linhai and Zhang, Wei and Lu, Shan and Liblit, Ben | 2011 | Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI) | E2E Tool |
15 | Automated known problem diagnosis with event traces | Yuan, Chun and Lao, Ni and Wen, Ji Rong and Li, Jiwei and Zhang, Zheng and Wang, Yi Min and Ma, Wei Ying | 2006 | Proceedings of the 2006 EuroSys Conference | Detection |
16 | Automatic Hot Patch Generation for Android Kernels | Xu, Zhengzi and Zhang, Yulong and Zheng, Longri and Xia, Liangzhao and Bao, Chenfu and Wang, Zhi and Liu, Yang | 2020 | SEC'20: Proceedings of the 29th USENIX Conference on Security Symposium | Binary-level,Hotpatch Generation,Runtime,Security |
17 | Automatically patching errors in deployed software | Perkins, Jeff H. and Kim, Sunghun and Larsen, Sam and Amarasinghe, Saman and Bachrach, Jonathan and Carbin, Michael and Pacheco, Carlos and Sherwood, Frank and Sidiroglou, Stelios and Sullivan, Greg and Wong, Weng Fai and Zibin, Yoav and Ernst, Michael D. and Rinard, Martin | 2009 | SOSP'09 - Proceedings of the 22nd ACM SIGOPS Symposium on Operating Systems Principles | E2E Tool |
18 | Autonomous hot patching for web-based applications | Huang, Hai and Tsai, Wei Tek and Chen, Yinong | 2005 | Proceedings - International Computer Software and Applications Conference | Binary-level,E2E Tool,Patch Generation |
19 | AutoPaG: Towards automated software patch generation with source code root cause identification and repair | Lin, Zhiqiang and Jiang, Xuxian and Xu, Dongyan and Mao, Bing and Xie, Li | 2007 | Proceedings of the 2nd ACM Symposium on Information, Computer and Communications Security, ASIACCS '07 | Patch Generation,Security |
20 | Auto-patching DOM-based XSS at scale | Parameshwaran, Inian and Budianto, Enrico and Shinde, Shweta and Dang, Hung and Sadhu, Atul and Saxena, Prateek | 2015 | 2015 10th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, ESEC/FSE 2015 - Proceedings | Patch Generation,Security |
21 | Band-aid Patching | Sidiroglou, Stelios and Ioannidis, Sotiris and Keromytis, Angelos D. | 2007 | Third Workshop on Hot Topics in System Dependability (HotDep'07) | Binary-level,Hotpatch Generation |
22 | Binary Quilting to Generate Patched Executables without Compilation | Saieva, Anthony and Kaiser, Gail | 2020 | FEAST 2020 - Proceedings of the 2020 ACM Workshop on Forming an Ecosystem Around Software Transformation | Binary-level,Hotpatch Generation |
23 | Building a Reactive Immune System for Software Services | Sidiroglou, Stelios and Locasto, Michael E. and Boyd, Stephen W. and Keromytis, Angelos D. | 2005 | USENIX Annual Technical Conference (USENIX ATC 05) | E2E Tool |
24 | Continuous deployment at Facebook and OANDA | Savor, Tony and Douglas, Mitchell and Gentili, Michael and Williams, Laurie and Beck, Kent and Stumm, Michael | 2016 | Proceedings - International Conference on Software Engineering | Empirical |
25 | Continuous release and upgrade of component-based software | Van Der Storm, Tijs | 2005 | Proceedings of the 12th International Workshop on Software Configuration Management, SCM 2005 | Propagation |
26 | CRANE: Failure prediction, change analysis and test prioritization in practice - Experiences from windows | Czerwonka, Jacek and Das, Rajiv and Nagappan, Nachiappan and Tarvo, Alex and Teterev, Alex | 2011 | Proceedings - 4th IEEE International Conference on Software Testing, Verification, and Validation, ICST 2011 | Detection |
27 | Cross-Stack Threat Sensing for Cyber Security and Resilience | Araujo, Frederico and Taylor, Teryl and Zhang, Jialong and Stoecklin, Marc Ph | 2018 | Proceedings - 48th Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops, DSN-W 2018 | Detection |
28 | Dataflow analysis for known vulnerability prevention system | Qin, Lifang and Li, Yichao and Yue, Cao | 2008 | 2008 IEEE International Conference on Cybernetics and Intelligent Systems, CIS 2008 | Detection,Hotpatch Generation,Security |
29 | Debugging in the (very) large: Ten years of implementation and experience | Glerum, Kirk and Kinshumann, Kinshuman and Greenberg, Steve and Aul, Gabriel and Orgovan, Vince and Nichols, Greg and Grant, David and Loihle, Gretchen and Hunt, Galen | 2009 | SOSP'09 - Proceedings of the 22nd ACM SIGOPS Symposium on Operating Systems Principles | Bug reporting,Operator Tooling |
30 | Do faster releases improve software quality? An empirical case study of Mozilla Firefox | Khomh, Foutse and Dhaliwal, Tejinder and Zou, Ying and Adams, Bram | 2012 | IEEE International Working Conference on Mining Software Repositories | Empirical |
31 | Embroidery: Patching vulnerable binary code of fragmentized android devices | Zhang, Xuewen and Zhang, Yuanyuan and Li, Juanru and Hu, Yikun and Li, Huayi and Gu, Dawu | 2017 | Proceedings - 2017 IEEE International Conference on Software Maintenance and Evolution, ICSME 2017 | Binary-level,Hotpatch Generation,Security |
32 | Ensembles of models for automated diagnosis of system performance problems | Zhang, Steve and Cohen, Ira and Goldszmidt, Moises and Symons, Julie and Fox, Armando | 2005 | Proceedings of the International Conference on Dependable Systems and Networks | Detection |
33 | Exploring the relationship of a file's history and its fault-proneness: An empirical study | Illes-Seifert, Timea and Paech, Barbara | 2008 | Proceedings - Testing: Academic and Industrial Conference Practice and Research Techniques, TAIC PART 2008 | Empirical |
34 | Exterminator: Automatically correcting memory errors with high probability | Novark, Gene and Berger, Emery D. and Zorn, Benjamin G. | 2007 | Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI) | Detection,Patch Generation |
35 | Field studies of computer system administrators: analysis of system management tools and practices | Barrett, Rob and Kandogan, Eser and Maglio, Paul P. and Haber, Eben M. and Takayama, Leila A. and Prabaker, Madhu | 2004 | CSCW '04: Proceedings of the 2004 ACM conference on Computer supported cooperative work | Empirical |
36 | Fingerprinting the datacenter: Automated classification of performance crises | Bodik, Peter and Goldszmidt, Moises and Fox, Armando and Woodard, Dawn B. and Andersen, Hans | 2010 | EuroSys'10 - Proceedings of the EuroSys 2010 Conference | Detection,Operator Tooling |
37 | First-aid: Surviving and preventing memory management bugs during production runs | Gao, Qi and Zhang, Wenbin and Tang, Yan and Qin, Feng | 2009 | Proceedings of the 4th ACM European Conference on Computer Systems, EuroSys'09 | Operator Tooling,Patch Generation |
38 | Handling vulnerabilities with mobile agents in order to consider the delay and disruption tolerant characteristic of military networks | Aurisch, Thorsten and Jacke, Andreas | 2018 | 2018 International Conference on Military Communications and Information Systems, ICMCIS 2018 | Detection,Patch Generation,Security |
39 | Healing online service systems via mining historical issue repositories | Ding, Rui and Fu, Qiang and Lou, Jian Guang and Lin, Qingwei and Zhang, Dongmei and Shen, Jiajun and Xie, Tao | 2012 | 2012 27th IEEE/ACM International Conference on Automated Software Engineering, ASE 2012 - Proceedings | Patch Generation |
40 | High-impact defects: A study of breakage and surprise defects | Shihab, Emad and Mockus, Audris and Kamei, Yasutaka and Adams, Bram and Hassan, Ahmed E. | 2011 | SIGSOFT/FSE 2011 - Proceedings of the 19th ACM SIGSOFT Symposium on Foundations of Software Engineering | Detection |
41 | Hot-patching a web server: A case study of ASAP code repair | Payer, Mathias and Gross, Thomas R. | 2013 | 2013 11th Annual Conference on Privacy, Security and Trust, PST 2013 | Propagation,Runtime |
42 | Identifying impactful service system problems via log analysis | He, Shilin and Lin, Qingwei and Lou, Jian Guang and Zhang, Hongyu and Lyu, Michael R. and Zhang, Dongmei | 2018 | ESEC/FSE 2018 - Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering | Bug reporting,Detection |
43 | Improving cybersecurity hygiene through JIT patching | Araujo, Frederico and Taylor, Teryl | 2020 | ESEC/FSE 2020 - Proceedings of the 28th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering | Propagation,Runtime,Security |
44 | InstaGuard: Instantly Deployable Hot-patches for Vulnerable System Programs on Android | Chen, Yaohui and Li, Yuping and Lu, Long and Lin, Yueh-Hsun and Vijayakumar, Hayawardh and Wang, Zhi and Ou, Xinming | 2018 | Network and Distributed System Security Symposium (NDSS'18) | Hotpatch Generation,Propagation,Security |
45 | Katana: A hot patching framework for ELF executables | Ramaswamy, Ashwin and Bratus, Sergey and Smith, Sean W. and Locasto, Michael E. | 2010 | ARES 2010 - 5th International Conference on Availability, Reliability, and Security | Binary-level,Hotpatch Generation,Propagation |
46 | Keepers of the Machines: Examining How System Administrators Manage Software Updates | Li, Frank and Chetty, Marshini and Rogers, Lisa and Mathur, Arunesh and Malkin, Nathan | 2019 | Fifteenth Symposium on Usable Privacy and Security (SOUPS 2019) | Empirical,Operator Tooling |
47 | LEONORE - Large-scale provisioning of resource-constrained IoT deployments | Vogler, Michael and Schleicher, Johannes M. and Inzinger, Christian and Nastic, Stefan and Sehic, Sanjin and Dustdar, Schahram | 2015 | Proceedings - 9th IEEE International Symposium on Service-Oriented System Engineering, IEEE SOSE 2015 | Propagation |
48 | Mining historical issue repositories to heal large-scale online service systems | Ding, Rui and Fu, Qiang and Lou, Jian Guang and Lin, Qingwei and Zhang, Dongmei and Xie, Tao | 2014 | Proceedings of the International Conference on Dependable Systems and Networks | Patch Generation |
49 | Onion: Identifying incident-indicating logs for cloud systems | Zhang, Xu and Xu, Yong and Qin, Si and He, Shilin and Qiao, Bo and Li, Ze and Zhang, Hongyu and Li, Xukun and Dang, Yingnong and Lin, Qingwei and Chintalapati, Murali and Rajmohan, Saravanakumar and Zhang, Dongmei | 2021 | ESEC/FSE 2021 - Proceedings of the 29th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering | Bug reporting,Operator Tooling |
50 | Online Model-Based Clustering for Crisis Identification in Distributed Computing | Woodard, Dawn B. and Goldszmidt, Moises | 2012 | Journal of the American Statistical Association | Bug reporting,Detection |
51 | Patch me if you can: A study on the effects of individual user behavior on the end-host vulnerability state | Sarabi, Armin and Zhu, Ziyun and Xiao, Chaowei and Liu, Mingyan and Dumitras, Tudor | 2017 | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | Empirical |
52 | PatchDroid: Scalable third-party security patches for android devices | Mulliner, Collin and Oberheide, Jon and Robertson, William and Kirda, Engin | 2013 | ACM International Conference Proceeding Series | Propagation,Security |
53 | Performance issue diagnosis for online service systems | Fu, Qiang and Lou, Jian Guang and Lin, Qing Wei and Ding, Rui and Zhang, Dongmei and Ye, Zihao and Xie, Tao | 2012 | Proceedings of the IEEE Symposium on Reliable Distributed Systems | Detection |
54 | Poster: AutoPatch: Automatic Hotpatching of Real-Time Embedded Devices | Salehi, Mohsen and Pattabiraman, Karthik | 2022 | Proceedings of the ACM Conference on Computer and Communications Security | Hotpatch Generation,Security |
55 | Precise and Accurate Patch Presence Test for Binaries | Zhang, Hang and Qian, Zhiyun | 2018 | 27th USENIX Security Symposium (USENIX Security 18) | Binary-level,Detection,Security |
56 | Predicting bug-fixing time: An empirical study of commercial software projects | Zhang, Hongyu and Gong, Liang and Versteeg, Steve | 2013 | Proceedings - International Conference on Software Engineering | Empirical |
57 | ProbeGuard: Mitigating Probing Attacks Through Reactive Program Transformations | Bhat, Koustubha and Van Der Kouwe, Erik and Bos, Herbert and Giuffrida, Cristiano | 2019 | International Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS | Detection,Hotpatch Generation,Security |
58 | R2C: Robust Rolling-Upgrade in Clouds | Sun, Daniel and Fekete, Alan and Gramoli, Vincent and Li, Guoqiang and Xu, Xiwei and Zhu, Liming | 2018 | IEEE Transactions on Dependable and Secure Computing | Detection,Propagation |
59 | Recovery from failures due to Mandelbugs in IT systems | Trivedi, Kishor S. and Mansharamani, Rajesh and Kim, Dong Seong and Grottke, Michael and Nambiar, Manoj | 2011 | Proceedings of IEEE Pacific Rim International Symposium on Dependable Computing, PRDC | Empirical |
60 | Recursive restartability: Turning the reboot sledgehammer into a scalpel | Candea, George and Fox, Armando | 2001 | Proceedings of the Workshop on Hot Topics in Operating Systems - HOTOS | Propagation,Runtime |
61 | ReDAC - Dynamic reconfiguration of distributed component-based applications with cyclic dependencies | Rasche, Andreas and Polze, Andreas | 2008 | Proceedings - 11th IEEE Symposium on Object/Component/Service-Oriented Real-Time Distributed Computing, ISORC 2008 | Configuration Management,Runtime |
62 | Rx: Treating bugs as allergies - A safe method to survive software failures | Qin, Feng and Tucek, Joseph and Sundaresan, Jagadeesan and Zhou, Yuanyuan | 2005 | Proceedings of the 20th ACM Symposium on Operating Systems Principles, SOSP 2005 | Symptom Mitigation |
63 | Security vulnerabilities in Javascript hotpatching in iOS with a commercial and open-source tool | Ford, Sarah and Olmsted, Aspen | 2018 | International Conference on Information Society, i-Society 2017 | Propagation,Security |
64 | Security-related vulnerability life cycle analysis | Marconato, Geraldine Vache and Nicomette, Vincent and Kaaniche, Mohamed | 2012 | 7th International Conference on Risks and Security of Internet and Systems, CRiSIS 2012 | Empirical |
65 | ShieldGen: Automatic data patch generation for unknown vulnerabilities with informed probing | Cui, Weidong and Peinado, Marcus and Wang, Helen J. and Locasto, Michael E. | 2007 | Proceedings - IEEE Symposium on Security and Privacy | Detection,Patch Generation,Security |
66 | Software analytics for incident management of online services: An experience report | Lou, Jian Guang and Lin, Qingwei and Ding, Rui and Fu, Qiang and Zhang, Dongmei and Xie, Tao | 2013 | 2013 28th IEEE/ACM International Conference on Automated Software Engineering, ASE 2013 - Proceedings | Bug reporting,Operator Tooling,Symptom Mitigation |
67 | Source Code and Binary Level Vulnerability Detection and Hot Patching | Xu, Zhengzi | 2020 | Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering | Detection,Hotpatch Generation,Runtime,Security |
68 | SPIDER: Enabling fast patch propagation in related software repositories | MacHiry, Aravind and Redini, Nilo and Camellini, Eric and Kruegel, Christopher and Vigna, Giovanni | 2020 | Proceedings - IEEE Symposium on Security and Privacy | Patch Generation,Security |
69 | STOP: Socio-temporal opportunistic patching of short range mobile malware | Tang, John and Kim, Hyoungshick and Mascolo, Cecilia and Musolesi, Mirco | 2012 | 2012 IEEE International Symposium on a World of Wireless, Mobile and Multimedia Networks, WoWMoM 2012 - Digital Proceedings | Propagation,Security |
70 | Striving for Failure: An Industrial Case Study About Test Failure Prediction | Anderson, Jeff and Salem, Saeed and Do, Hyunsook | 2015 | IEEE/ACM 37th IEEE International Conference on Software Engineering | Empirical |
71 | Studying the urgent updates of popular games on the Steam platform | Lin, Dayi and Bezemer, Cor Paul and Hassan, Ahmed E. | 2017 | Empirical Software Engineering | Empirical |
72 | Sweeper: A lightweight end-to-end system for defending against fast worms | Tucek, Joseph and Lu, Shan and Huang, Chengdu and Xanthos, Spiros and Zhou, Yuanyuan and Newsome, James and Brumley, David and Song, Dawn | 2007 | Operating Systems Review (ACM) | Detection,Security,Symptom Mitigation |
73 | Synergistic debug-repair of heap manipulations | Verma, Sahil and Roy, Subhajit | 2017 | Proceedings of the ACM SIGSOFT Symposium on the Foundations of Software Engineering | Debug Assistance,Hotpatch Generation |
74 | Tales of software updates: The process of updating software | Vaniea, Kami and Rashidi, Yasmeen | 2016 | Conference on Human Factors in Computing Systems - Proceedings | Empirical |
75 | Talos: Neutralizing Vulnerabilities with Security Workarounds for Rapid Response | Huang, Zhen and Dangelo, Mariana and Miyani, Dhaval and Lie, David | 2016 | Proceedings - 2016 IEEE Symposium on Security and Privacy, SP 2016 | Security,Symptom Mitigation |
76 | The empirical commit frequency distribution of open source projects | Kolassa, Carsten and Riehle, Dirk and Salim, Michel A. | 2013 | Proceedings of the 9th International Symposium on Open Collaboration, WikiSym + OpenSym 2013 | Empirical |
77 | TJOSConf: Automatic and Safe System Environment Operations Platform | Wang, Yida and Jiang, Shuangshuang and Cui, Bin | 2022 | ACM International Conference Proceeding Series | Operator Tooling |
78 | Toward Just-in-Time Patching for Containerized Applications | Tunde-Onadele, Olufogorehan and Lin, Yuhang and He, Jingzhu and Gu, Xiaohui | 2020 | Proceedings of the 7th Symposium on Hot Topics in the Science of Security | Detection,Patch Generation,Security |
79 | Towards release strategy optimization for apps in Google Play | Shen, Sheng and Lu, Xuan and Hu, Ziniu and Liu, Xuanzhe | 2017 | ACM International Conference Proceeding Series | Empirical |
80 | Triage: Diagnosing production run failures at the user's site | Tucek, Joseph and Lu, Shan and Huang, Chengdu and Xanthos, Spiros and Zhou, Yuanyuan | 2007 | Operating Systems Review (ACM) | Detection,Patch Generation |
81 | Troubleshooting Transiently-Recurring Errors in Production Systems with Blame-Proportional Logging | Luo, Liang and Nath, Suman and Sivalingam, Ravindranath and Musuvathi, Madan and Ceze, Luis | 2018 | USENIX Annual Technical Conference (USENIX ATC 18) | Detection,Operator Tooling |
82 | Update with care: Testing candidate bug fixes and integrating selective updates through binary rewriting | Saieva, Anthony and Kaiser, Gail | 2022 | Journal of Systems and Software | Bug reporting,Detection,Propagation |
83 | Using pre-release test failures to build early post-release defect prediction models | Herzig, Kim | 2014 | Proceedings - International Symposium on Software Reliability Engineering, ISSRE | Empirical |
84 | Virtual machine preserving host updates for zero day patching in public cloud | Russinovich, Mark and Govindaraju, Naga and Raghuraman, Melur and Hepkin, David and Schwartz, Jamie and Kishan, Arun | 2021 | EuroSys 2021 - Proceedings of the 16th European Conference on Computer Systems | Propagation,Security |
85 | VPatcher: VMI-based transparent data patching to secure software in the cloud | Zhang, Hao and Zhao, Lei and Xu, Lai and Wang, Lina and Wu, Deming | 2015 | Proceedings - 2014 IEEE 13th International Conference on Trust, Security and Privacy in Computing and Communications, TrustCom 2014 | Detection,Security |
86 | We'll Fix It in Post: What Do Bug Fixes in Video Game Update Notes Tell Us? | Truelove, Andrew and Santana de Almeida, Eduardo and Ahmed, Iftekhar | 2021 | IEEE/ACM 43rd International Conference on Software Engineering (ICSE) | Empirical |
87 | When App Stores Listen to the Crowd to Fight Bugs in the Wild | Gomez, Maria and Martinez, Matias and Monperrus, Martin and Rouvoy, Romain | 2015 | Proceedings - International Conference on Software Engineering | E2E Tool |
88 | Binary change set composition | Van Der Storm, Tijs | 2007 | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | Propagation |
89 | Self-patch: Beyond patch tuesday for containerized applications | Tunde-Onadele, Olufogorehan and Lin, Yuhang and He, Jingzhu and Gu, Xiaohui | 2020 | Proceedings - 2020 IEEE International Conference on Autonomic Computing and Self-Organizing Systems, ACSOS 2020 | Detection,Hotpatch Generation,Security |
90 | Microreboot-A Technique for Cheap Recovery | Candea, George and Kawamoto, Shinichi and Fujiki, Yuichi and Friedman, Greg and Fox, Armando | 2004 | arXiv | Propagation |
91 | Production-driven patch generation | Durieux, Thomas and Hamadi, Youssef and Monperrus, Martin | 2017 | Proceedings - 2017 IEEE/ACM 39th International Conference on Software Engineering: New Ideas and Emerging Results Track, ICSE-NIER 2017 | E2E Tool |
92 | Recovery from Software Failures Caused by Mandelbugs | Grottke, Michael and Kim, Dong Seong and Mansharamani, Rajesh and Nambiar, Manoj and Natella, Roberto and Trivedi, Kishor S. | 2016 | IEEE Transactions on Reliability | Empirical |
93 | An Empirical Investigation of Incident Triage for Online Service Systems | Chen, Junjie and He, Xiaoting and Lin, Qingwei and Xu, Yong and Zhang, Hongyu and Hao, Dan and Gao, Feng and Xu, Zhangwei and Dang, Yingnong and Zhang, Dongmei | 2019 | Proceedings - 2019 IEEE/ACM 41st International Conference on Software Engineering: Software Engineering in Practice, ICSE-SEIP 2019 | Empirical |
94 | An Empirical Study on Change-induced Incidents of Online Service Systems | Wu, Yifan and Chai, Bingxu and Li, Ying and Liu, Bingchang and Li, Jianguo and Yang, Yong and Jiang, Wei | 2023 | Proceedings - International Conference on Software Engineering | Empirical |
95 | Assess and Summarize: Improve Outage Understanding with Large Language Models | Jin, Pengxiang and Zhang, Shenglin and Ma, Minghua and Li, Haozhe and Kang, Yu and Li, Liqun and Liu, Yudong and Qiao, Bo and Zhang, Chaoyun and Zhao, Pu and He, Shilin and Sarro, Federica and Dang, Yingnong and Rajmohan, Saravan and Lin, Qingwei and Zhang, Dongmei | 2023 | ESEC/FSE 2023 - Proceedings of the 31st ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering | Debug Assistance,Empirical,Operator Tooling |
96 | Automatically and Adaptively Identifying Severe Alerts for Online Service Systems | Zhao, Nengwen and Jin, Panshi and Wang, Lixin and Yang, Xiaoqin and Liu, Rong and Zhang, Wenchi and Sui, Kaixin and Pei, Dan | 2020 | Proceedings - IEEE INFOCOM | Operator Tooling,Triage and Response |
97 | AutoTSG: learning and synthesis for incident troubleshooting | Shetty, Manish and Bansal, Chetan and Upadhyayula, Sai Pramod and Radhakrishna, Arjun and Gupta, Anurag | 2022 | ESEC/FSE 2022 - Proceedings of the 30th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering | Empirical,Operator Tooling |
98 | Why Do Computers Stop and What Can Be Done About It? | Gray, J | 1986 | Symposium on Reliability in Distributed Software and Database Systems | Empirical |
99 | Why Do Internet Services Fail, and What Can Be Done About It? | Oppenheimer, David and Ganapathi, Archana and Patterson, David A | 2003 | USENIX Symposium on Internet Technologies and Systems (USITS '03) | Empirical |
100 | Why does the cloud stop computing? Lessons from hundreds of service outages | Gunawi, Haryadi S. and Hao, Mingzhe and Suminto, Riza O. and Laksono, Agung and Satria, Anang D. and Adityatama, Jeffry and Eliazar, Kurnia J. | 2016 | Proceedings of the 7th ACM Symposium on Cloud Computing, SoCC 2016 | Empirical |
101 | ConfSeer: leveraging customer support knowledge bases for automated misconfiguration detection | Potharaju, Rahul and Chan, Joseph and Hu, Luhui and Nita-Rotaru, Cristina and Wang, Mingshi and Zhang, Liyuan and Jain, Navendu | 2015 | Proceedings of the VLDB Endowment | Detection |
102 | Continuous incident triage for large-scale online service systems | Chen, Junjie and He, Xiaoting and Lin, Qingwei and Zhang, Hongyu and Hao, Dan and Gao, Feng and Xu, Zhangwei and Dang, Yingnong and Zhang, Dongmei | 2019 | Proceedings - 2019 34th IEEE/ACM International Conference on Automated Software Engineering, ASE 2019 | Triage and Response |
103 | Critical event prediction for proactive management in large-scale computer clusters | Sahoo, R K and Oliner, A J and Rish, I and Gupta, M and Moreira, J E and Ma, S and Vilalta, R and Sivasubramaniam, A | 2003 | Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining | Detection |
104 | DeCaf: Diagnosing and triaging performance issues in large-scale cloud services | Bansal, Chetan and Renganathan, Sundararajan and Asudani, Ashima and Midy, Olivier and Janakiraman, Mathru | 2020 | Proceedings - International Conference on Software Engineering | Debug Assistance,Triage and Response |
105 | DeepTriage: Automated Transfer Assistance for Incidents in Cloud Services | Pham, Phuong and Jain, Vivek and Dauterman, Lukas and Ormont, Justin and Jain, Navendu | 2020 | Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining | Triage and Response |
106 | Efficient customer incident triage via linking with system incidents | Gu, Jiazhen and Wen, Jiaqi and Wang, Zijian and Zhao, Pu and Luo, Chuan and Kang, Yu and Zhou, Yangfan and Yang, Li and Sun, Jeffrey and Xu, Zhangwei and Qiao, Bo and Li, Liqun and Lin, Qingwei and Zhang, Dongmei | 2020 | ESEC/FSE 2020 - Proceedings of the 28th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering | Triage and Response |
107 | Empowering Practical Root Cause Analysis by Large Language Models for Cloud Incidents | Chen, Yinfang and Xie, Huaibing and Ma, Ming-Jie and Kang, Yu and Gao, Xin and Shi, Liu and Cao, Yunjie and Gao, Xue-Chao and Fan, Hao and Wen, Ming and Zeng, Jun and Ghosh, Supriyo and Zhang, Xuchao and Zhang, Chaoyun and Lin, Qingwei and Rajmohan, S and Zhang, Dongmei | 2023 | arXiv.org | Operator Tooling,Triage and Response |
108 | Experience report on applying software analytics in incident management of online service | Lou, Jian Guang and Lin, Qingwei and Ding, Rui and Fu, Qiang and Zhang, Dongmei and Xie, Tao | 2017 | Automated Software Engineering | Operator Tooling,Triage and Response |
109 | Failure Recovery: When the Cure Is Worse Than the Disease | Guo, Zhenyu and McDirmid, Sean and Yang, Mao and Zhuang, Li and Zhang, Pu and Luo, Yingwei and Bergan, Tom and Musuvathi, Madan and Zhang, Zheng and Zhou, Lidong | 2013 | 14th Workshop on Hot Topics in Operating Systems (HotOS XIV) | Empirical |
110 | Fast outage analysis of large-scale production clouds with service correlation mining | Wang, Yaohui and Li, Guozheng and Wang, Zijian and Kang, Yu and Zhou, Yangfan and Zhang, Hongyu and Gao, Feng and Sun, Jeffrey and Yang, Li and Lee, Pochian and Xu, Zhangwei and Zhao, Pu and Qiao, Bo and Li, Liqun and Zhang, Xu and Lin, Qingwei | 2021 | Proceedings - International Conference on Software Engineering | Debug Assistance,Triage and Response |
111 | Fighting the Fog of War: Automated Incident Detection for Cloud Systems | Li, Liqun and Zhang, Xu and Zhao, Xin and Zhang, Hongyu and Kang, Yu and Zhao, Pu and Qiao, Bo and He, Shilin and Lee, Pochian and Sun, Jeffrey and Gao, Feng and Yang, Li and Lin, Qingwei and Rajmohan, Saravanakumar and Xu, Zhangwei and Zhang, Dongmei | 2021 | Proceedings - International Conference on Software Engineering | Detection |
112 | Hot-patching Platform for Executable and Linkable Format Binary Application for System Resilience | Jeong, Haegeon and Kang, Kyungtae and An, Jinsung | 2023 | Proceedings of the ACM Symposium on Applied Computing | Hotpatch Generation |
113 | How Incidental are the Incidents? Characterizing and Prioritizing Incidents for Large-Scale Online Service Systems | Chen, Junjie and Zhang, Shu and He, Xiaoting and Lin, Qingwei and Zhang, Hongyu and Hao, Dan and Kang, Yu and Gao, Feng and Xu, Zhangwei and Dang, Yingnong and Zhang, Dongmei | 2020 | Proceedings - 2020 35th IEEE/ACM International Conference on Automated Software Engineering, ASE 2020 | Empirical,Triage and Response |
114 | How Long Will it Take to Mitigate this Incident for Online Service Systems? | Wang, Weijing and Chen, Junjie and Yang, Lin and Zhang, Hongyu and Zhao, Pu and Qiao, Bo and Kang, Yu and Lin, Qingwei and Rajmohan, Saravanakumar and Gao, Feng and Xu, Zhangwei and Dang, Yingnong and Zhang, Dongmei | 2021 | Proceedings - International Symposium on Software Reliability Engineering, ISSRE | Empirical,Operator Tooling |
115 | How to Fight Production Incidents? An Empirical Study on a Large-scale Cloud Service | Ghosh, Supriyo and Shetty, Manish and Bansal, Chetan and Nath, Suman | 2022 | SoCC 2022 - Proceedings of the 13th Symposium on Cloud Computing | Empirical |
116 | How to Manage Change-Induced Incidents? Lessons from the Study of Incident Life Cycle | Zhao, Yujin and Jiang, Ling and Tao, Ye and Zhang, Songlin and Wu, Changlong and Wu, Yifan and Jia, Tong and Li, Ying and Wu, Zhonghai | 2023 | Proceedings - International Symposium on Software Reliability Engineering, ISSRE | Empirical |
117 | How to mitigate the incident? an effective troubleshooting guide recommendation technique for online service systems | Jiang, Jiajun and Lu, Weihai and Chen, Junjie and Lin, Qingwei and Zhao, Pu and Kang, Yu and Zhang, Hongyu and Xiong, Yingfei and Gao, Feng and Xu, Zhangwei and Dang, Yingnong and Zhang, Dongmei | 2020 | ESEC/FSE 2020 - Proceedings of the 28th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering | Empirical,Operator Tooling |
118 | How to tame your online services | Lin, Qingwei and Lou, Jian Guang and Zhang, Hongyu and Zhang, Dongmei | 2016 | Perspectives on Data Science for Software Engineering | Triage and Response |
119 | iDice: Problem identification for emerging issues | Lin, Qingwei and Lou, Jian Guang and Zhang, Hongyu and Zhang, Dongmei | 2016 | Proceedings - International Conference on Software Engineering | Debug Assistance,Operator Tooling |
120 | iFeedback: Exploiting user feedback for real-time issue detection in large-scale online service systems | Zheng, Wujie and Lu, Haochuan and Zhou, Yangfan and Liang, Jianming and Zheng, Haibing and Deng, Yuetang | 2019 | Proceedings - 2019 34th IEEE/ACM International Conference on Automated Software Engineering, ASE 2019 | Detection |
121 | Learning a hierarchical monitoring system for detecting and diagnosing service issues | Nair, Vinod and Raul, Ameya and Khanduja, Shwetabh and Bahirwani, Vikas and Shao, Qihong and Sundararajan, S. and Keerthi, Sathiya and Herbert, Steve and Dhulipalla, Sudheer | 2015 | Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining | Detection |
122 | Leveraging Large Language Models for the Auto-remediation of Microservice Applications: An Experimental Study | Sarda, Komal and Namrud, Zakeya and Litoiu, Marin and Shwartz, Larisa and Watts, Ian | 2024 | FSE Companion - Companion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering | Patch Generation |
123 | LogFlash: Real-time Streaming Anomaly Detection and Diagnosis from System Logs for Large-scale Software Systems | Jia, Tong and Wu, Yifan and Hou, Chuanjia and Li, Ying | 2021 | Proceedings - International Symposium on Software Reliability Engineering, ISSRE | Detection |
124 | Xpert: Empowering Incident Management with Query Recommendations via Large Language Models | Jiang, Yuxuan and Zhang, Chaoyun and He, Shilin and Yang, Zhihao and Ma, Minghua and Qin, Si and Kang, Yu and Dang, Yingnong and Rajmohan, Saravan and Lin, Qingwei and Zhang, Dongmei | 2023 | Proceedings - International Conference on Software Engineering | Operator Tooling |
125 | Mining Root Cause Knowledge from Cloud Service Incident Investigations for AIOps | Saha, Amrita and Hoi, Steven C. H. | 2022 | IEEE Aerospace Conference Proceedings | Debug Assistance,Triage and Response |
126 | Neural knowledge extraction from cloud service incidents | Shetty, Manish and Bansal, Chetan and Kumar, Sumit and Rao, Nikitha and Nagappan, Nachiappan and Zimmermann, Thomas | 2021 | Proceedings - International Conference on Software Engineering | Triage and Response |
127 | Not as easy as just update: Survey of System Administrators and Patching Behaviours | Jenkins, Adam and Wolters, Maria and Liu, Linsen and Vaniea, Kami | 2024 | Conference on Human Factors in Computing Systems - Proceedings | Empirical |
128 | Outage prediction and diagnosis for cloud service systems | Chen, Yujun and Yang, Xian and Lin, Qingwei and Zhang, Dongmei and Dong, Hang and Xu, Yong and Li, Hao and Kang, Yu and Zhang, Hongyu and Gao, Feng and Xu, Zhangwei and Dang, Yingnong | 2019 | The Web Conference 2019 - Proceedings of the World Wide Web Conference, WWW 2019 | Detection,Operator Tooling |
129 | Predicting remediations for hardware failures in large-scale datacenters | Lin, Fred and Davoli, Antonio and Akbar, Imran and Kalmanje, Sukumar and Silva, Leandro and Stamford, John and Golany, Yanai and Piazza, Jim and Sankar, Sriram | 2020 | Proceedings - 50th Annual IEEE/IFIP International Conference on Dependable Systems and Networks: Supplemental Volume, DSN-S 2020 | Patch Generation |
130 | Predictive and Adaptive Failure Mitigation to Avert Production Cloud VM Interruptions | Zheng, Lianmin and Jia, Chengfan and Sun, Minmin and Wu, Zhao and Yu, Cody Hao and Haj-Ali, Ameer and Wang, Yida and Yang, Jun and Zhuo, Danyang and Sen, Koushik and Gonzalez, Joseph E and Stoica, Ion | 2020 | Proceedings - 50th Annual IEEE/IFIP International Conference on Dependable Systems and Networks: Supplemental Volume, DSN-S 2020 | Detection |
131 | RAPID: Real-Time Alert Investigation with Context-aware Prioritization for Efficient Threat Discovery | Liu, Yushan and Shu, Xiaokui and Sun, Yixin and Jang, Jiyong and Mittal, Prateek | 2022 | ACM International Conference Proceeding Series | Operator Tooling,Triage and Response |
132 | Real-time incident prediction for online service systems | Zhao, Nengwen and Chen, Junjie and Wang, Zhou and Peng, Xiao and Wang, Gang and Wu, Yong and Zhou, Fang and Feng, Zhen and Nie, Xiaohui and Zhang, Wenchi and Sui, Kaixin and Pei, Dan | 2020 | ESEC/FSE 2020 - Proceedings of the 28th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering | Detection |
133 | Recommending Root-Cause and Mitigation Steps for Cloud Incidents using Large Language Models | Ahmed, Toufique and Ghosh, Supriyo and Bansal, Chetan and Zimmermann, Thomas and Zhang, Xuchao and Rajmohan, Saravan | 2023 | Proceedings - International Conference on Software Engineering | Triage and Response |
134 | RESIN: A Holistic Service for Dealing with Memory Leaks in Production Cloud Infrastructure | Lou, Chang and Chen, Cong and Huang, Peng and Dang, Yingnong and Qin, Si and Yang, Xinsheng and Li, Xukun and Lin, Qingwei and Chintalapati, Murali | 2022 | Proceedings - International Conference on Software Engineering | Detection,Patch Generation |
135 | Simple Testing Can Prevent Most Critical Failures: An Analysis of Production Failures in Distributed Data-Intensive Systems | Yuan, Ding and Luo, Yu and Zhuang, Xin and Rodrigues, Guilherme Renna and Zhao, Xu and Zhang, Yongle and Jain, Pranay U. and Stumm, Michael | 2014 | Proceedings - International Conference on Software Engineering | Detection |
136 | Six Years and 184 Tickets: The Vast Scope of the Mars Science Laboratory's Ultimate Flight Software Release | Holloway, Alexandra and Denison, Jonathan and Patel, Neel and Maimone, Mark and Rankin, Arturo | 2023 | IEEE Aerospace Conference Proceedings | Empirical |
137 | Testing Configuration Changes in Context to Prevent Production Failures | Zheng, Lianmin and Jia, Chengfan and Sun, Minmin and Wu, Zhao and Yu, Cody Hao and Haj-Ali, Ameer and Wang, Yida and Yang, Jun and Zhuo, Danyang and Sen, Koushik and Gonzalez, Joseph E and Stoica, Ion | 2020 | IEEE Aerospace Conference Proceedings | Detection |
138 | Towards intelligent incident management: why we need it and how we make it | Chen, Zhuangbin and Kang, Yu and Li, Liqun and Zhang, Xu and Zhang, Hongyu and Xu, Hui and Zhou, Yangfan and Yang, Li and Sun, Jeffrey and Xu, Zhangwei and Dang, Yingnong and Gao, Feng and Zhao, Pu and Qiao, Bo and Lin, Qingwei and Zhang, Dongmei and Lyu, Michael R. | 2020 | Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering | Empirical |
139 | TraceArk: Towards Actionable Performance Anomaly Alerting for Online Service Systems | Zeng, Zhengran and Zhang, Yuqun and Xu, Yong and Ma, Minghua and Qiao, Bo and Zou, Wentao and Chen, Qingjun and Zhang, Meng and Zhang, Xu and Zhang, Hongyu and Gao, Xuedong and Fan, Hao and Rajmohan, Saravan and Lin, Qingwei and Zhang, Dongmei | 2023 | Proceedings - International Conference on Software Engineering | Detection |
140 | What bugs cause production cloud incidents? | Liu, Haopeng and Lu, Shan and Musuvathi, Madan and Nath, Suman | 2019 | Proceedings of the Workshop on Hot Topics in Operating Systems, HotOS 2019 | Empirical |