Research from LIX presented at AAAI 2022
While one of the tasks of computer scientists is to design useful algorithms, an equally important one is to ensure that these algorithms work as intended. This guarantee comes from theoretical understanding of the algorithm machinery and mathematical proofs that everything works fine. “But for almost all algorithms used for real-world applications, such proofs don’t exist” explains Benjamin Doerr, professor at the Computer Science Laboratory of the Ecole Polytechnique (*LIX). Together with Weijie Zheng from Southern University of Science and Technology in China and Yufei Liu, bachelor student at Ecole Polytechnique, they have provided a first theoretical analysis of an algorithm called NSGA-II.
NSGA-II stands for “Non-Dominated Sorting Genetic Algorithm II” and is the most intensively used multi-objective evolutionary algorithm. It provides solutions to optimization problems, where several competing goals come into play. A simple example: if you want to buy an electric car with a low price and a large range at the same time, what are the best sets of parameters for the car (battery capacity, size of the wheels, etc.)?
NSGA-II is a genetic algorithm, inspired by the principles of biological evolution. It starts with a fixed-size “population” of possible solutions, i.e., sets of car parameters. By applying small changes to the parameters of these solutions, akin to "mutations" in living organisms, the algorithm creates an equal number of new possible solutions. Then it selects the best solutions regarding the goals to obtain a new population with the same size as the starting population. Subsequently, these steps are repeated.
Selecting the new population of solutions from the old one is the core of NSGA-II. First, it picks all the “non-dominated” solutions, for which no other solution is better for both objectives. These solutions are directly included in the new population. Then it selects the ones that are not dominated by any of the remaining solutions, and so on. Eventually, it reaches the point where some solutions need to be selected for the next generation, but not all of them. To sort them, NSGA-II uses a mathematical tiebreaker called “crowding distance”.
These sorting steps were a challenge to analyse for Benjamin Doerr and his colleagues. “Large population sizes are always more complicated. It is impossible to precisely follow the complicated population dynamics. However, small differences in the population can have drastic effects. This is particularly true for the selection step” he explains. Nevertheless, in the work presented at AAAI 2022, the team was able to perform a runtime analysis of the NSGA-II, meaning that they theoretically determined how long the NSGA-II takes to find all the relevant solutions for two classic benchmark problems. Interestingly, they show that the algorithm may reject relevant solutions even when the population size is as large as the number of relevant solutions. This may not be an issue for today’s practical use of NSGA-II, as starting with a larger population can overcome this problem. “However, it means that the tiebreaking mechanism is not perfect,” explains Benjamin Doerr. “These are important insights, in particular for the future development of this algorithm, and we will keep working on it.”
*LIX: a joint research unit CNRS, École Polytechnique - Institut Polytechnique de Paris