Refactor/micro opts #255

krypt-n · 2019-07-09T09:53:24Z

Issue

Tasks

check the performance improvement on different benchmarks and different hardware
review

I had some old commits lying around that apply some micro-optimizations with the goal of improving the performance of vroom somewhat. I think the slight loss in maintainability and modularity is justified by the speedup.

Benchmark results on the Solomon VRPTW instances:

master:
,Gaps,Computing times
Min,0.0,106
First decile,0.0,171
Lower quartile,0.84,264
Median,3.51,358
Upper quartile,5.49,914
Ninth decile,7.65,1218
Max,11.68,2230

this branch:
,Gaps,Computing times
Min,0.0,71
First decile,0.0,141
Lower quartile,0.84,193
Median,3.51,280
Upper quartile,5.49,632
Ninth decile,7.65,786
Max,11.68,870

The computed solutions are exactly the same. I expect similar computing time gains on other benchmarks, but I have not measured them yet.
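To put the two "Computing times" columns side by side, the relative reduction per row can be computed with a small helper. This is only an illustrative sketch: `reduction_percent` is a hypothetical name and not part of vroom, and the inputs are simply the values copied from the tables above.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper, not part of vroom: relative reduction in percent
// when a measurement goes from `before` (master) to `after` (this branch).
double reduction_percent(double before, double after) {
  assert(before > 0.0); // avoid dividing by zero
  return 100.0 * (before - after) / before;
}
```

With the figures above, the median goes from 358 to 280 (about a 21.8% reduction) and the maximum from 2230 to 870 (about 61.0%), so the largest gains are in the slowest instances.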

jcoupey · 2019-07-09T13:00:30Z

@krypt-n thanks for the PR, this looks great, especially the worst-case computing time improvement. I'll run some experiments on my side with various VRPTW/CVRP benchmarks and report. Only TSP won't be affected because it has its own logic.

I'm fine with all the changes, but I'd like to understand the std::vector<bool> part thoroughly. I've read your comment in 485d609 but I don't quite understand what makes it faster in this case, despite the various casts when filling _vehicle_to_job_compatibility and accessing values from vehicle_ok_with_jobs?

Also is there a reason for not applying the same logic to the _vehicle_to_vehicle_compatibility vector?

krypt-n · 2019-07-09T13:20:28Z

https://godbolt.org/z/LiwMb1 shows the difference between the two variants. std::vector<bool> is specialized to store 8 booleans in a byte and thus requires some bit-arithmetic to extract values. std::vector<unsigned char> is not and thus can extract values with a simple address dereference.

> despite the various casts when filling _vehicle_to_job_compatibility and accessing values from vehicle_ok_with_jobs

vroom reads the values far more often than it writes them. And as can be seen in the link above, the cast just results in a comparison with 0.
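The byte-per-flag layout described above might look roughly like the following. This is a minimal sketch under stated assumptions: `CompatibilityTable` and `set` are hypothetical names for illustration, and the real vroom members (`_vehicle_to_job_compatibility`, `vehicle_ok_with_job`) may be organized differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the compatibility storage; not vroom's actual class.
class CompatibilityTable {
public:
  CompatibilityTable(std::size_t vehicles, std::size_t jobs)
    : _table(vehicles, std::vector<unsigned char>(jobs, 0)) {
  }

  // Writes are rare: the bool is cast once when the table is filled.
  void set(std::size_t v, std::size_t j, bool ok) {
    _table[v][j] = static_cast<unsigned char>(ok);
  }

  // Reads are frequent: one byte load plus a comparison with 0, with no
  // bit masking as std::vector<bool> would need. Defining the accessor in
  // the class body makes it implicitly inline, and std::size_t indices
  // need no widening before the address computation.
  bool vehicle_ok_with_job(std::size_t v, std::size_t j) const {
    return static_cast<bool>(_table[v][j]);
  }

private:
  // One byte per flag instead of the packed bits of std::vector<bool>.
  std::vector<std::vector<unsigned char>> _table;
};
```

The trade-off is memory: this layout uses eight times the space of a packed bit vector, which is usually acceptable for a table that is read in hot loops.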

> Also is there a reason for not applying the same logic to the _vehicle_to_vehicle_compatibility vector?

_vehicle_to_vehicle_compatibility didn't show up as important during profiling.

I measured with a single thread, by the way; multiple threads may show a smaller speed gain.

jcoupey · 2019-07-09T14:45:44Z

Thanks for the explanation. I'll report when I'm able to run some tests on my side.

jcoupey · 2019-07-12T09:05:28Z

I did compare current master at 4486868 with this PR using 8c49bdb, running on my usual test machine with -t 8 across all values of -x on all CVRP + VRPTW benchmarks.

All solutions are strictly identical
Average computing times for CVRP instances are consistently down by ~25%
Average computing times for VRPTW instances are a bit further reduced (around 26-27% for Homberger, up to 30-33% for Solomon)

This is great!

@krypt-n I think we should mention this refactor in the changelog.

krypt-n · 2019-07-23T13:02:55Z

That's great

I added a changelog entry with the last commit. Apologies for the delay

krypt-n added 9 commits July 9, 2019 10:59

Make vehicle_ok_with_job faster and inlineable

485d609

faster: vector<bool> is space-efficient but not fast, since single bits need to be extracted from memory. Indexing with size_t removes the need for zero-extension before indexing.

Make job index() inlineable

f6b6504

Make operator constructor inlineable

0fe516c

speeds up constructors of all operator implementations

Reduce allocations in try_job_additions

3e9fa69

Reduce allocations in run_ls_step

c11ab28

Reduce allocations in update_nearest_job_rank_in_routes

cf04181

Reduce allocations in update_amounts

f92d32c

Make input.get_matrix() inlineable

e4ac67d

get_matrix is called in some pretty hot functions. Getting rid of the call overhead shaves off up to 5% of the runtime for Solomon VRPTW instances.

Format changes

8c49bdb
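The thread does not show what the "Reduce allocations" commits changed. One common way to cut allocations in a hot loop is to reuse a caller-owned buffer instead of building a fresh vector on every call. The sketch below illustrates that general technique only; `prefix_sums_fresh` and `prefix_sums_reuse` are hypothetical names and not taken from vroom.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Allocating variant: every call builds and returns a new vector.
std::vector<int> prefix_sums_fresh(const std::vector<int>& amounts) {
  std::vector<int> sums;
  sums.reserve(amounts.size());
  int running = 0;
  for (int a : amounts) {
    running += a;
    sums.push_back(running);
  }
  return sums;
}

// Reusing variant: the caller keeps `out` alive across calls, so once it
// has grown large enough, later calls need no further heap allocation.
void prefix_sums_reuse(const std::vector<int>& amounts, std::vector<int>& out) {
  out.clear(); // drops the elements but keeps the capacity
  out.reserve(amounts.size());
  int running = 0;
  for (int a : amounts) {
    running += a;
    out.push_back(running);
  }
}
```

Both variants produce the same values. The second only pays off when the function is called repeatedly with a buffer that outlives the loop, which is the situation profiling would need to confirm first.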

jcoupey added enhancement refactor labels Jul 9, 2019

jcoupey added this to the v1.5.0 milestone Jul 9, 2019

Add changelog entry for speed up

f42315a

jcoupey merged commit f42315a into VROOM-Project:master Aug 5, 2019

jcoupey mentioned this pull request Aug 5, 2019

Performance improvement opportunities #254

Closed
