Skip to content
This repository was archived by the owner on Nov 19, 2020. It is now read-only.

Speed optimization: Replace multi-dimentional with jagged arrays in I… #825

Merged
merged 2 commits into from
Sep 7, 2017

Conversation

volgunin
Copy link

Hi,

I optimized the performance of SURF feature detector a little bit by using jagged instead of multi-dimensional arrays in IntegralImage and SURF ResponseLayer.

My tests show ~10-15% performance gain, which is due to the poor implementation of multi-dimensional arrays in CLR. Microsoft recommends using jagged arrays instead.

This is apparently a braking change as some public properties in IntegralImage and ResponseLayer class were affected.

Regards,
Alexander

@CatchemAL
Copy link
Collaborator

Hi Alexander,

Thank you very much for your contribution. It's great to see you speeding things up! - and fixing the typo :)

As this is a breaking change, the cost of the speed performance will need to be weighed up against the disruption caused to existing users. @cesarsouza will be best placed to comment there. However, it would be helpful if you could indicate what parts in particular you have seen a speedup in (is it a particular method or do you have a code snippet you could post? etc.).

@cesarsouza - I had a very quick read through of this code (so I may very well have missed the bottleneck) but I noticed that these lines here and here could be sped up by using the (mostly pre-existing) unsafe code more effectively. For instance, in the second link, the bounds check on the image could be done once outside the loop and then then the IntegralImage arrays could be fixed - both examples traverse row-wise through the arrays which is ideal. Also - perhaps this can be optimized a little but, as I am not familiar with the underlying code, it is difficult to identify the obvious bottlenecks without a little more info.

Thanks again
Alex

@cesarsouza
Copy link
Member

cesarsouza commented Aug 31, 2017

Hi everyone!

Alexander, thanks a lot for the contribution! And Alex thanks a lot for doing a first pass over it and raising those concerns!

@volgunin: This is a very nice contribution, and I think it does not have to be a breaking change. The first observation is that the ResponseLayer is an internal class, so we can modify it as needed without breaking user code. And the second is that, instead of modifying the return type of the InternalData property of IntegralImage, we can add another property such as public uint[][] Matrix to return the current integralImage, then update the InternalData property to convert integralImage to a jagged matrix using the ToJagged() method and finally mark it as obsolete. At least this would not result in compilation errors for the end user. However, please also see the my alternative suggestion for speeding up this class below.

As @AlexJCross, I also think that using unsafe code should also help. However, I think it will only help if we pin the integralImage array at the class constructor to avoid fixing it every time the GetRectangleSum() is called.

Let me explain: The main method of the IntegralImage class is the GetRectangleSum() method that is often called multiple times at once (i.e. such as in the Compute(IntegralImage image) method of ResponseLayer linked by @AlexJCross). If we were to keep the matrices as multidimensional but use unsafe code to avoid the bound checks, it means that we would need to either fix the array every time GetRectangleSum is called or we would have to pin the matrix at the class constructor then release it using a Dispose pattern. In my experience, there is a considerable overhead in using fixed so I think we would really need to pin the matrix at construction time to have an advantage over jagged arrays. In any case, both implementations would need to be tested and compared against using jagged arrays if we would like to be sure we are optimizing the code properly.

@AlexJCross: Just to address some of the other points you had raised, the "CheckBounds" method you see is actually marked with a [Conditional("DEBUG")] attribute, meaning it is not there when the project is compiled in release mode. So it doesn't really have a cost, even being inside the for loop. Regarding the H.Inverse(inPlace: true).Dot(d), actually the dot operation could have been computed without any loops at all since the Hessian is always 3x3 (if you take a look at the source code for Inverse, you can see that it actually has a special case for 3x3 matrices). However, if we were to start adding special cases for 2 or 3-dimensional vectors and matrices, perhaps we should also take a look at using SSE operations (it has actually been some time since I last looked into it, and at the time, .NET's SIMD implementation was not really ready for prime time - but surely things have improved since then, so it might be worth taking a look again!)

Regards,
Cesar

@CatchemAL
Copy link
Collaborator

Nice one @cesarsouza - a lot of good points there. I should have read the code a bit slower :) I missed both the internal and conditional! I never knew the Inverse(...) handled 3x3's like that - that's a really cool touch. Of the two, perhaps the jagged implementation proposed by @volgunin is the easiest and best way to go.

@CatchemAL CatchemAL closed this Aug 31, 2017
@CatchemAL CatchemAL reopened this Aug 31, 2017
@CatchemAL
Copy link
Collaborator

oops, sorry i closed and reopened this. that's just my fat fingers hitting the wrong button - I do it all the time! As I was saying, jaggeds are definitely faster for general use so it sounds like the way to go here. I don't have massive familiarity with this code but if it's mission critical and we'd like to find out the results of constructor pinning, I'm sure I could cook up a benchmark. let me know.

Finally, @cesarsouza, I have also not looked at SIMD for a while. From my recollection, I thought it was rather limited to the float data type (not double) but if that's changed or I've misunderstood, definitely create an issue and I'll get researching SIMD!

Thanks,
Alex

@volgunin
Copy link
Author

Thanks everyone for looking into this.

I missed that ResponseLayer is internal. There should not be any braking problems there.

For IntegralImage, as @cesarsouza has suggested, I can keep the old InternalData property interface intact with added lazy initialization/conversion from jagged data just in case someone uses it and mark it as obsolete. That would increase memory usage in apps that use InternalData property, but all existing code would continue to compile. The new public uint[][] Matrix property will provide access to jagged data.

The method that benefits from change is SpeededUpRobustFeaturesDetector.ProcessImage as it accesses IntegralImage and ResponseLayer intensively. FastRetinaKeypointDescriptor would also benefit, but I am not using it in my app and do not know the measure of acceleration.

@cesarsouza
Copy link
Member

Hi @volgunin,

Thanks a lot, after those small changes the PR could be accepted right away. Thanks a lot for the contribution!

And @AlexJCross thanks a lot for the help!

Regards,
Cesar

@volgunin
Copy link
Author

volgunin commented Sep 7, 2017

Hi everyone!

I just added Matrix property as promised.

Regards,
Alexander

@cesarsouza
Copy link
Member

Hi @volgunin,

It is amazing! Thanks a lot for your contribution!

Regards,
Cesar

@cesarsouza cesarsouza merged commit 8b4a429 into accord-net:development Sep 7, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants