Skip to content

Conversation

krypt-n
Copy link
Contributor

@krypt-n krypt-n commented Jul 9, 2019

Issue

#254

Tasks

  • check the performance improvement on different benchmarks and different hardware
  • review

I had some old commits lying around that apply some micro optimizations with the goal of improving the performance of vroom somewhat. I think slight loss in maintainability and modularity is justified by the gained speedup.

Benchmark results on the Solomon VRPTW instances:

master:
,Gaps,Computing times
Min,0.0,106
First decile,0.0,171
Lower quartile,0.84,264
Median,3.51,358
Upper quartile,5.49,914
Ninth decile,7.65,1218
Max,11.68,2230

this branch:
,Gaps,Computing times
Min,0.0,71
First decile,0.0,141
Lower quartile,0.84,193
Median,3.51,280
Upper quartile,5.49,632
Ninth decile,7.65,786
Max,11.68,870

The computed solutions are exactly the same. I do expect similar computing time gains on other benchmarks, but I have not measured it yet

krypt-n added 9 commits July 9, 2019 10:59
faster: vector<bool> is space efficient but not fast since single bits
need to be extracted from memory. Indexing with size_ts removes the need
of zero-extending before indexing.
speeds up constructors of all operator implementations
get_matrix is called in some pretty hot functions. Getting rid of the
call overhead shaves of up to 5% runtime for solomon vrptw instances
@jcoupey jcoupey added this to the v1.5.0 milestone Jul 9, 2019
@jcoupey
Copy link
Collaborator

jcoupey commented Jul 9, 2019

@krypt-n thanks for the PR, this looks great, especially the worst-case computing time improvement. I'll run some experiments on my side with various VRPTW/CVRP benchmarks and report. Only TSP won't be affected because it has it's own logic.

I'm fine with all the changes, but I'd like to understand the std::vector<bool> part thoroughly. I've read your comment in 485d609 but I don't quite understand what makes it faster in this case, despite the various casts when filling _vehicle_to_job_compatibility and accessing values from vehicle_ok_with_jobs?

Also is there a reason for not applying the same logic to the _vehicle_to_vehicle_compatibility vector?

@krypt-n
Copy link
Contributor Author

krypt-n commented Jul 9, 2019

https://godbolt.org/z/LiwMb1 shows the difference between the two variants. std::vector<bool> is specialized to store 8 booleans in a byte and thus requires some bit-arithmetic to extract values. std::vector<unsigned char> is not and thus can extract values with a simple address dereference.

despite the various casts when filling _vehicle_to_job_compatibility and accessing values from vehicle_ok_with_jobs

vroom far more often reads the values then it writes them. And as can be seen in the link above, the cast just results in a comparison with 0.

Also is there a reason for not applying the same logic to the _vehicle_to_vehicle_compatibility vector?

_vehicle_to_vehicle_compatiblity didn't show up as important during profiling

I measured with single thread by the way, multiple threads may show less speed gain.

@jcoupey
Copy link
Collaborator

jcoupey commented Jul 9, 2019

Thanks for the explanation. I'll report when I'm able to run some tests on my side.

@jcoupey
Copy link
Collaborator

jcoupey commented Jul 12, 2019

I did compare current master at 4486868 with this PR using 8c49bdb, running on my usual test machine with -t 8 across all values of -x on all CVRP + VRPTW benchmarks.

  • All solutions are strictly identical
  • Average computing times for CVRP instances are consistently down by ~25%
  • Average computing times for VRPTW instances are a bit further reduced (around 26-27% for Homberger, up to 30-33% for Solomon)

This is great!

@krypt-n I think we should mention this refactor in the changelog.

@krypt-n
Copy link
Contributor Author

krypt-n commented Jul 23, 2019

That's great

I added a changelog entry with the last commit. Apologies for the delay

@jcoupey jcoupey merged commit f42315a into VROOM-Project:master Aug 5, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants