Optimised fitness functions for automated improvement of software's execution time

Nov 16, 2025

Dimitrios Stamatios Bouras

Carol Hanna

Justyna Petke

Abstract

Precise measurement of software execution time is challenging due to environmental variability and measurement overheads, an issue that is critical for search-based software improvement systems, which evaluate thousands of variants. While more precise measurements yield more reliable fitness values, they often introduce a significant time overhead. To understand which measures are most effective as fitness functions in search-based software optimisation, we conducted an empirical study of 21 approximations of execution time. These included hardware-level counters from perf, RAPL energy, and a custom measure based on weighted instruction cycles. To improve reliability, we evaluated each fitness function up to five times, using the median to reduce noise. We integrated the 13 most promising measures into a search-based software optimisation framework called MAGPIE. We evaluated these fitness functions plus Time, already present in MAGPIE, on 7 benchmarks using both code-level and parameter-level mutations. To assess generalisability, we tested the best-performing measures with the parameter tuning tool ParamILS and analysed how tool and search strategy affect outcomes. Our results show that perf’s cycles measure yields the best overall performance, outperforming Time by 5.1%. Sampling three times balances reliability and exploration. Energy and the weight-based measure excel in specific scenarios, with weights being the best for parameter optimisation on MAGPIE, but both are better suited to longer searches due to their overhead. We highlight a trade-off: low-overhead measures like Time work well for short runs, while robust measures such as cycles and weights pay off in longer ones.
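The repeated-sampling idea in the abstract (take several readings of a measure and use the median as the fitness value) can be sketched roughly as follows. This is only an illustrative sketch, not the code used in the paper or in MAGPIE: the names `median_fitness` and `wall_clock` are hypothetical, and wall-clock time stands in for any of the measures studied.

```python
import statistics
import time
from typing import Callable


def median_fitness(measure: Callable[[], float], repeats: int = 3) -> float:
    """Take `repeats` readings of `measure` and return their median.

    `measure` is a hypothetical callable that runs one software variant
    once and returns a single reading (seconds, cycles, joules, ...).
    """
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    readings = [measure() for _ in range(repeats)]
    # The median is less sensitive than the mean to a single noisy reading.
    return statistics.median(readings)


def wall_clock(workload: Callable[[], object]) -> Callable[[], float]:
    """Wrap a workload so that each call returns its elapsed seconds."""
    def measure() -> float:
        start = time.perf_counter()
        workload()
        return time.perf_counter() - start
    return measure


if __name__ == "__main__":
    # Hypothetical workload standing in for one software variant.
    variant = lambda: sum(range(100_000))
    fitness = median_fitness(wall_clock(variant), repeats=3)
    print(f"median wall-clock time over 3 samples: {fitness:.6f} s")
```

Each extra sample multiplies the cost of a fitness evaluation, so a larger `repeats` leaves less of a fixed search budget for exploring new variants; this is the reliability versus exploration balance the abstract refers to.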

Type

Conference paper

Publication

In Symposium on Search-Based Software Engineering