This platform contains the results of benchmarking a range of open source and commercial optimization solvers on a collection of problems arising from energy system models. For each benchmark run, we measure the solver's runtime and memory consumption, along with other metrics that check solution quality across solvers.

Note that we run all solvers with their default options, with some exceptions – see full details on our Methodology page. For each problem, we also gather information such as the number of variables and constraints, along with information about the scenario being modelled. This information, along with download links to each problem, can be found on our Benchmark Set page.

This page presents the main takeaways from our benchmark platform in an introductory and accessible manner. Advanced users and those wishing to dig into more details can visit the full results in our interactive dashboards.

How good is each solver, and for what cases?

To find out how good each solver is overall, we plot below the average (SGM) runtime of each solver, relative to the fastest solver, on all the LP and MILP problems in our benchmark set. A problem on which a solver timed out or errored is assumed to have a runtime equal to the timeout with which it was run. (More details, and other ways of handling timeouts and errors, can be found on our main dashboard.) We group our problems by size, and we also categorize certain problems as realistic if they arise from, or share model features with, models used in real-world energy planning studies. Hovering over any bar in the plots below shows the average runtime of that solver on the corresponding subset of benchmarks, along with the percentage of benchmarks it solved within the time limit.
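As a concrete illustration, the SGM with timeout substitution described above can be sketched as follows. The shift value of 10 seconds and the function names are our assumptions for illustration; the platform's exact formula may differ (see the Methodology page).

```python
import math

def sgm_runtime(times, statuses, timeout, shift=10.0):
    """Shifted geometric mean of runtimes. Runs that timed out or
    errored are charged the full timeout; shift=10 s is a common
    convention that damps the influence of very short runtimes."""
    charged = [t if s == "ok" else timeout for t, s in zip(times, statuses)]
    log_sum = sum(math.log(t + shift) for t in charged)
    return math.exp(log_sum / len(charged)) - shift

def relative_sgm(sgms):
    """Each solver's SGM divided by the SGM of the fastest solver."""
    best = min(sgms.values())
    return {solver: sgm / best for solver, sgm in sgms.items()}
```

A solver with a relative SGM of 2.0 is, on average, twice as slow as the fastest solver on that subset of benchmarks.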

Runtime relative to fastest solver - LP
Runtime relative to fastest solver - MILP

The next plot shows the concrete performance of each solver on a few representative realistic problems from several modelling frameworks in our benchmark set. Hover over a problem name to see more details about the benchmark's features and why we consider it representative of that modelling framework. Solvers that timed out or errored on a particular problem are indicated by red text above the corresponding bar.

Runtime relative to fastest solver

3 out of the 5 problems can be solved by at least one open source solver, with different solvers providing the best performance on different problems.

Note: As with all benchmarks, our results provide only an indication of which solvers might be good for your problems. Our benchmark set is not yet as diverse and comprehensive as we would like; see the What benchmark problems do we have section below for the gaps in our set. We encourage users to run our scripts to benchmark solvers on their own problems before picking a solver, and we encourage modellers to contribute problems that make our benchmark set more representative and diverse. Reach out to us if you'd like to contribute!

How are solvers evolving over time?

This plot shows the average runtime of each year's final released solver version, relative to the best solver version ever measured, over all S- and M-size benchmarks in our set. This shows how the solvers' performance has evolved relative to one another.

SGM Runtime (Relative to Best Ever Measured)
Solver:

The plot below shows the performance evolution of the selected solver individually, relative to the first version of that solver that we benchmarked. The bars denote the number of unsolved problems in our benchmark set (the fewer, the better). The red line shows the reduction in average runtime over the set relative to the first version, i.e. the speedup factor.
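The speedup factor shown by the red line can be sketched as follows; the function and input names are illustrative assumptions, and the dashboard computes this from the full benchmark results.

```python
def speedup_vs_first(sgm_by_version):
    """Speedup of each solver version relative to the first benchmarked
    version: SGM(first) / SGM(version). Expects versions in release
    order (dicts preserve insertion order in Python 3.7+)."""
    first = next(iter(sgm_by_version.values()))
    return {version: first / sgm for version, sgm in sgm_by_version.items()}
```

A value of 4.0 for the latest version means the average runtime over the set has dropped to a quarter of what the first benchmarked version needed.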

More detailed statistics regarding performance evolution of solvers can be seen in our Performance History dashboard, which also allows calculating performance statistics on any subset of benchmarks that are of interest.

What is feasible for open source solvers?

Here are the largest LP and MILP problems that open source solvers can solve, from each modelling framework in our set. Please note that we did not generate or collect benchmark problems with the intention of finding the largest ones solvable by open solvers, so there could be larger solvable problems than those in our set. This section can still give an idea of the kinds of spatial and temporal resolutions that open source solvers can handle in reasonable time, and we encourage the community to contribute more benchmark problems so we can identify the boundary of feasibility more accurately.

Clicking on any benchmark problem name takes you to the benchmark details page that contains more information on the model scenario, various size instances, full results on that problem, and download links to the problem LP/MPS file and solver logs and solution files.

[Table of the largest solved LP problems per framework: Model Framework · LP Benchmark · Num. variables · Num. constraints · Spatial resolution · Temporal resolution · Solver · Runtime]
Note: There are several important caveats to consider when comparing spatial and temporal resolutions across different modelling frameworks.

Spatial resolution: regions and nodes do not represent the same concept and therefore cannot be directly compared. In general, some models use nodes to disaggregate the spatial scale when a higher level of detail is required, particularly for power sector analyses and for capturing feedbacks from other sectors on the electricity grid. By contrast, regions are often employed to facilitate data aggregation from energy-use statistics and to support analyses with a broader system-level focus, rather than a focus on the physical structure of the (electricity, and more broadly energy) network. Moreover, some nodal models adopt hybrid approaches in which nodes are partially aggregated to reflect a regional perspective.

Temporal resolution: time slices represent aggregations of time periods with similar energy production and consumption characteristics. Consequently, to be equivalent to a model operating at hourly resolution, a time-slice-based model would in principle require 8,760 time slices per year, each associated with distinct input data and, therefore, potentially different results.

Given the limitations of our benchmark set, the strongest observable influence on runtime is model size, measured by the number of variables and constraints (see What factors affect solver performance below). This holds even though the above problems share few features and are built with different spatial/temporal resolutions and time horizons. It is also interesting that a realistic TEMOA-based problem like temoa-US_9R_TS_SP (9-12) does not have a runtime similar to the largest solved TIMES-based model, times-ireland-noco2-counties (26-1ts), despite both having > 1e6 variables.

[Table of the largest solved MILP problems per framework: Model Framework · MILP Benchmark · Num. variables · Num. constraints · Spatial resolution · Temporal resolution · Solver · Runtime]

We note that we do not yet have large problem instances from some modelling frameworks in our benchmark set. We welcome contributions to fill these gaps!

What factors affect solver performance?

The most obvious driver of solver performance is the number of variables in the LP/MILP problem, and the plot below shows the correlation between runtime and number of variables. The toggle allows you to select open source solvers only, or all solvers. Each green dot represents the runtime of the fastest (open source) solver on a problem with a given number of variables, and a red X denotes that no (open source) solver could solve the problem within the timeout (1 hr for small and medium problems, and 24 hrs for large problems). The plot gives an indication of the order of magnitude at which solvers start to hit the timeout.
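One simple way to read off the order of magnitude at which solvers start to hit the timeout is to bin problems by variable count and compute the fraction solved per bin. A minimal sketch, where the input format (a list of (num_variables, solved) pairs) is an assumption for illustration:

```python
import math
from collections import defaultdict

def solve_rate_by_magnitude(problems):
    """Fraction of problems solved within the timeout, grouped by the
    order of magnitude of the variable count. `problems` is a list of
    (num_variables, solved) pairs; keys of the result are the lower
    bound of each decade (1e3, 1e4, ...)."""
    bins = defaultdict(list)
    for n_vars, solved in problems:
        bins[int(math.log10(n_vars))].append(solved)
    return {10 ** k: sum(v) / len(v) for k, v in sorted(bins.items())}
```

A sharp drop in the solve rate between two adjacent decades indicates the size range where the timeout begins to bite.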


The rest of this section examines the effect of different model features on solver performance. You can use the toggles to select between open source solvers only, or all solvers. Hovering over a model name shows you the details of the model scenario, including application type, constraints, LP/MILP, etc.

Effect of increasing spatial resolutions on PyPSA models

While HiGHS is the only open solver able to solve the smallest instance (10-1h), Gurobi's runtimes at higher node counts show the expected nonlinear growth in computational effort as model size and complexity expand.

Runtime of fastest solver
Effect of increasing temporal resolutions on PyPSA models

Again, HiGHS is the only open solver that can solve the smallest instance (50-168h). As temporal resolution increases (from 168h to 24h and finer), Gurobi's runtime escalates dramatically: the weekly aggregation solves in seconds, the daily resolution already requires nearly an hour, and finer resolutions hit the time limit (1 hour).

Runtime of fastest solver
Effect of unit commitment (UC) on GenX models

genx-10_IEEE_9_bus_DC_OPF (9-1h) is an MILP problem that adds UC as an extra constraint to the power sector model genx-10_IEEE_9_bus_DC_OPF-no_uc (9-1h), an LP problem. Adding unit commitment (UC) transforms the LP DC-OPF into an MILP and fundamentally changes solver performance. In the LP case, runtimes are on the order of seconds with HiGHS (which also outperforms Gurobi here), while the MILP formulation introduces a dramatic increase in computational effort. In this benchmark, Gurobi is the fastest solver for the UC case (28 seconds), whereas the fastest open-source solver (SCIP) requires around 40 minutes, illustrating the substantial performance gap that can emerge once integer variables are introduced. All solvers are run with default settings except for a fixed relative MIP gap tolerance.
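For reference, a relative MIP gap can be fixed across solvers via each solver's documented parameter. The option names below are the solvers' real parameter names; the gap value itself is an assumption for illustration, and the platform's actual tolerance is documented on its Methodology page.

```python
# Hedged sketch: pinning the relative MIP gap across solvers.
MIP_GAP = 1e-4  # assumed value; the platform's actual tolerance may differ

solver_options = {
    "highs":  {"mip_rel_gap": MIP_GAP},   # HiGHS option name
    "gurobi": {"MIPGap": MIP_GAP},        # Gurobi parameter name
    "scip":   {"limits/gap": MIP_GAP},    # SCIP parameter name
}
```

Fixing the gap matters for fair comparisons: otherwise a solver with a looser default gap can stop earlier with a worse incumbent and appear faster.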

Runtime of fastest solver
Effect of unit commitment (UC) on PyPSA models

pypsa-power+ely-ucgas (1-1h) is an MILP problem that adds UC as an extra model constraint to the power-only model pypsa-power+ely (1-1h) (LP problem). The LP version solves in a few seconds with both Gurobi and HiGHS, while the MILP version requires significantly more time. Gurobi maintains relatively strong performance in the UC case, whereas open-source solvers exhibit a more pronounced slowdown. All solvers are run with default settings except for a fixed relative MIP gap tolerance.

Runtime of fastest solver
Effect of transmission expansion and CO2 constraints on GenX models

With open-source solvers, all three GenX variants hit the time limit, providing no indication of the effect of transmission expansion optimization and CO2 constraints. When Gurobi is included, the models become solvable within reasonable time, but runtimes vary significantly: adding both transmission expansion and CO2 constraints leads to the longest solve time (2h 45min), while models with only one of the two features solve faster (around 1h 30min). This highlights how stacking structural constraints can materially increase computational complexity, even when the formulation remains linear.

Runtime of fastest solver
Effect of increasingly stringent CO2 constraints on TEMOA models

Increasing the stringency of CO2 constraints affects solver performance differently across solver families. Under open-source solvers, runtime increases when moving from the base case to constrained scenarios, with the NDC case being particularly challenging. When including all solvers, the models solve in a few minutes and runtimes increase only moderately as constraints become more stringent.

Runtime of fastest solver

Benchmark problems corresponding to representative model use-cases

All our technical dashboards can be filtered to the application domain or problem type of interest, and all plots and results are generated on the fly when you select a filter option. Since this may be overwhelming for some users, we highlight in the table below some filter combinations that correspond to representative problems arising from common use-cases of each modelling framework. Click any benchmark problem name to see more details about it and to view its results.

Framework | Problem Class | Application                         | MILP Features   | Realistic | Example
GenX      | LP            | Infrastructure & Capacity Expansion | None            | Yes       |
GenX      | MILP          | Infrastructure & Capacity Expansion | Unit commitment | Yes       |
PyPSA     | LP            | Infrastructure & Capacity Expansion | None            | Yes       |
TEMOA     | LP            | Infrastructure & Capacity Expansion | None            | Yes       |
TIMES     | LP            | Infrastructure & Capacity Expansion | None            | Yes       |
Switch    | LP            | Infrastructure & Capacity Expansion | None            | Yes       |

What benchmark problems do we have (and which are missing)?

This section breaks down our current benchmark set by modelling framework, problem type, application domain, and model features. It highlights the kinds of energy models that we test solvers on, but it also serves as a useful warning about the gaps in our collection.

Category | Values
Problem Classes | LP; MILP
Applications | Infrastructure & Capacity Expansion; Operational; Production cost modelling; DC Optimal Power Flow; Steady-state Optimal Power Flow; Resource Adequacy
MILP Features | None; Unit commitment; Transmission switching; Modularity; Binary transmission investment decisions; Piecewise fuel usage; Piecewise-linear part-load efficiency modeling; Piecewise efficiency; Modelling of fixed costs; NonConvex operation
Realistic | Realistic

For version 2 of our platform, we plan to have a public call for benchmarks to address the gaps above. In particular, we welcome benchmark problem contributions that cover:

  • “Application”: DC optimal power flow, operational, and production cost modelling analyses for most modelling frameworks.
  • “Time horizon”: multi-period analyses; these are particularly important, as problems with multiple time horizons are more challenging to solve.
  • “MILP features”: unit commitment for TIMES and TEMOA (though, admittedly, these frameworks do not focus particularly on power sector modelling); other MILP features, such as piecewise fuel usage, transmission switching, and modularity, are missing for most frameworks.
  • “Realistic”: realistic problems are missing for PowerModels and Sienna.
  • Large problem instances are also missing for many model frameworks; see the section What is feasible for open source solvers above.

Reach out to us if you'd like to contribute any benchmark problems that can fill the above gaps!