Skip to content

Conversation

jetersen
Copy link
Contributor

@jetersen jetersen commented Apr 3, 2022

Reason for change:
Almost 5x saving for single decode
After further changes by adding a optimized splitter using Span<char> it is only a 2x saving.
By adding a specific GetNumberFrom that handles a single number we avoid creating unnecessary array for long[] and can simply pass back long

I think it would greatly benefit to also actually have a GenerateHashFrom that takes single int/long that would mean single encode decode could be 100% allocation free.

BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19044.1620 (21H2)
AMD Ryzen 9 3950X, 1 CPU, 32 logical and 16 physical cores
.NET SDK=6.0.103
  Job-RCMHEX : .NET 6.0.3 (6.0.322.12309), X64 RyuJIT

|               Method |        Job |            Runtime |       Mean |     Error |    StdDev | Code Size |  Gen 0 | Allocated |
|--------------------- |----------- |------------------- |-----------:|----------:|----------:|----------:|-------:|----------:|
|        DecodeNumbers | Job-RCMHEX |           .NET 6.0 |   864.4 ns |   3.99 ns |   3.74 ns |   1,359 B | 0.0181 |     152 B |
|   DecodeSingleNumber | Job-RCMHEX |           .NET 6.0 |   799.2 ns |   5.65 ns |   5.28 ns |     190 B | 0.0038 |      32 B |

@jetersen
Copy link
Contributor Author

jetersen commented Apr 3, 2022

I may have missed the purpose of the _guards split in the GetNumbersFrom and whether it applies to single.

@KeterSCP
Copy link
Contributor

KeterSCP commented Apr 4, 2022

The following code throws NoResultException in your PR if no _guards were used:

var hashids = new HashidsNet.Hashids("salt", minHashLength: 4);

var hash = hashids.Encode(123);
Console.WriteLine(hash);

var decoded = hashids.DecodeSingle(hash); // <-- Exception here
Console.WriteLine(decoded);

@jetersen
Copy link
Contributor Author

jetersen commented Apr 4, 2022

Thanks I'll take a look.

@jetersen
Copy link
Contributor Author

jetersen commented Apr 4, 2022

I think I managed to find a way to actually improve Decode for multiple numbers without allocating string arrays for guards 😄

@jetersen
Copy link
Contributor Author

jetersen commented Apr 4, 2022

@KeterSCP bc5f785 could also apply to multiple decoding.

@KeterSCP
Copy link
Contributor

KeterSCP commented Apr 4, 2022

The following code throws NoResultException in your PR if no _guards were used:

var hashids = new HashidsNet.Hashids("salt", minHashLength: 4);

var hash = hashids.Encode(123);
Console.WriteLine(hash);

var decoded = hashids.DecodeSingle(hash); // <-- Exception here
Console.WriteLine(decoded);

@jetersen this still throws with minHashLength=5

@jetersen
Copy link
Contributor Author

jetersen commented Apr 4, 2022

Damn empty entries 🤣🤣🤣🤣

@jetersen
Copy link
Contributor Author

jetersen commented Apr 4, 2022

BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19044.1620 (21H2)
AMD Ryzen 9 3950X, 1 CPU, 32 logical and 16 physical cores
.NET SDK=6.0.103
  [Host]     : .NET 6.0.3 (6.0.322.12309), X64 RyuJIT
  Job-AEAOHU : .NET 6.0.3 (6.0.322.12309), X64 RyuJIT

OutlierMode=DontRemove  Runtime=.NET 6.0  MemoryRandomization=True  

The slight speed difference I think is only down to outliers... Which is set to not remove 🤔

Before: a7478cd

|               Method |       Mean |    Error |   StdDev | Code Size |  Gen 0 | Allocated |
|--------------------- |-----------:|---------:|---------:|----------:|-------:|----------:|
|        RoundtripInts | 3,785.7 ns | 18.53 ns | 17.33 ns |      50 B | 0.0610 |     512 B |
|       RoundtripLongs | 4,287.0 ns | 18.31 ns | 17.13 ns |   3,330 B | 0.0610 |     512 B |
|         RoundtripHex | 3,725.4 ns | 22.88 ns | 21.41 ns |      49 B | 0.1602 |   1,344 B |
|         SingleNumber |   844.0 ns |  3.69 ns |  3.45 ns |     145 B | 0.0076 |      64 B |
|        DecodeNumbers |   862.9 ns |  4.09 ns |  3.82 ns |   1,359 B | 0.0181 |     152 B |
|      RoundtripSingle | 1,255.6 ns |  8.97 ns |  8.39 ns |     304 B | 0.0076 |      64 B |
| SingleNumberAsParams |   862.7 ns |  4.93 ns |  4.61 ns |   2,040 B | 0.0191 |     160 B |

After: a7478cd

|               Method |       Mean |    Error |   StdDev | Code Size |  Gen 0 | Allocated |
|--------------------- |-----------:|---------:|---------:|----------:|-------:|----------:|
|        RoundtripInts | 3,827.1 ns | 25.90 ns | 24.22 ns |      50 B | 0.0305 |     264 B |
|       RoundtripLongs | 4,356.8 ns | 34.19 ns | 31.98 ns |   3,612 B | 0.0229 |     224 B |
|         RoundtripHex | 3,817.9 ns | 13.13 ns | 12.28 ns |      49 B | 0.1335 |   1,128 B |
|         SingleNumber |   846.4 ns |  5.44 ns |  5.09 ns |     145 B | 0.0076 |      64 B |
|        DecodeNumbers |   864.6 ns |  3.67 ns |  3.43 ns |   1,641 B | 0.0076 |      64 B |
|      RoundtripSingle | 1,248.2 ns |  7.87 ns |  7.36 ns |     304 B | 0.0076 |      64 B |
| SingleNumberAsParams |   855.6 ns |  3.93 ns |  3.67 ns |   2,040 B | 0.0191 |     160 B |

@KeterSCP
Copy link
Contributor

KeterSCP commented Apr 4, 2022

Hmm, the following code produces different exception for main and PR branches:

var hashids = new HashidsNet.Hashids("salt", minHashLength: 0, alphabet: "0123456789ABCDEF");

var hash = hashids.EncodeLong(long.MaxValue);
Console.WriteLine(hash);

var decoded = hashids.DecodeSingleLong(hash + "1"); // HashidsNet.NoResultException on main branch and HashidsNet.MultipleResultsException on PR branch
Console.WriteLine(decoded);

@jetersen
Copy link
Contributor Author

jetersen commented Apr 5, 2022

@KeterSCP That's true because I decided to check for separators ahead of time. Previous code was doing after the decode had returned multiple numbers.

So that's a toss up and this behavior was only added in 1.5.0 so which behavior is more correct?

See #57

@KeterSCP
Copy link
Contributor

KeterSCP commented Apr 5, 2022

@KeterSCP That's true because I decided to check for separators ahead of time. Previous code was doing after the decode had returned multiple numbers.

So that's a toss up and this behavior was only added in 1.5.0 so which behavior is more correct?

See #57

I've checked the 1.4.1 version and following sample produces array of 0 length:

var hashids = new HashidsNet.Hashids("salt", minHashLength: 0, alphabet: "0123456789ABCDEF");

var hash = hashids.EncodeLong(1000);
Console.WriteLine(hash);

var decodedArray = hashids.DecodeLong(hash + "1");
Console.WriteLine(decodedArray.Length); // <-- 0 (No results decoded)

While on the PR branch this gives a MultipleResultsException.
Generally speaking, I support such perf changes as long as they mirror old behavior, otherwise, it would introduce a breaking change.

@manigandham
Copy link
Collaborator

@jetersen

Can you please rebase on master to use the latest tests?

Also agree with @KeterSCP that the behavior shouldn't change. If there are no results decoded then it should not be returning a MultipleEntries exception. Is your code finding multiple integers in that hash?

@jetersen
Copy link
Contributor Author

jetersen commented Apr 5, 2022

I took your example @KeterSCP

You specifically added a 1 to decode and with your setup that is one of the separators:

image

So how can DecodeSingle be wrong?

I can remove the check for separators but than I should also remove the MultipleEntriesException cause there is no way to detect other than to look for separators after the result does not match the GenerateHashFrom equals check.

I tried to keep the behavior of DecodeSingleLong as it was coded in 1.5.0:

if (numbers.Length > 1)
throw new MultipleResultsException("The hash provided yielded more than one result.");

@jetersen
Copy link
Contributor Author

jetersen commented Apr 5, 2022

Rebased

@manigandham
Copy link
Collaborator

Rebased

The log doesn't show the changes. You might need to pull down the changes on master first before rebasing...

@jetersen
Copy link
Contributor Author

jetersen commented Apr 5, 2022

Rebased successfully 😅 I forgot to fetch upstream before rebasing.

@KeterSCP
Copy link
Contributor

KeterSCP commented Apr 5, 2022

I took your example @KeterSCP

You specifically added a 1 to decode and with your setup that is one of the separators:

image

So how can DecodeSingle be wrong?

I can remove the check for separators but than I should also remove the MultipleEntriesException cause there is no way to detect other than to look for separators after the result does not match the GenerateHashFrom equals check.

I tried to keep the behavior of DecodeSingleLong as it was coded in 1.5.0:

if (numbers.Length > 1)
throw new MultipleResultsException("The hash provided yielded more than one result.");

Well, I just checked how it behaves for invalid hashes and compared how it did before. If such change is ok for the author of library, then i suppose it can be merged. Code looks good.
cc: @ullmark

As this can never happen without checking for separators.
@jetersen
Copy link
Contributor Author

jetersen commented Apr 5, 2022

I created 535df3d as another way. This would result in NoResultException when trying to parse multiple when using DecodeSingle

I can remove the commit as well.

I have added my reasons for the PRs in the description.

return Array.Empty<long>();
}

private long[] NumbersFrom(string hash)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created this new method to reset stack and avoid stackoverflow 😅

@jetersen
Copy link
Contributor Author

jetersen commented Apr 8, 2022

Before:

|               Method |       Mean |    Error |   StdDev | Code Size |  Gen 0 | Allocated |
|--------------------- |-----------:|---------:|---------:|----------:|-------:|----------:|
|        RoundtripInts | 3,785.7 ns | 18.53 ns | 17.33 ns |      50 B | 0.0610 |     512 B |
|       RoundtripLongs | 4,287.0 ns | 18.31 ns | 17.13 ns |   3,330 B | 0.0610 |     512 B |
|         RoundtripHex | 3,725.4 ns | 22.88 ns | 21.41 ns |      49 B | 0.1602 |   1,344 B |
|         SingleNumber |   844.0 ns |  3.69 ns |  3.45 ns |     145 B | 0.0076 |      64 B |
|        DecodeNumbers |   862.9 ns |  4.09 ns |  3.82 ns |   1,359 B | 0.0181 |     152 B |
|      RoundtripSingle | 1,255.6 ns |  8.97 ns |  8.39 ns |     304 B | 0.0076 |      64 B |
| SingleNumberAsParams |   862.7 ns |  4.93 ns |  4.61 ns |   2,040 B | 0.0191 |     160 B |

After 99a41e4, 395a2d6 and 7567d90:

|               Method |       Mean |    Error |   StdDev |  Gen 0 | Code Size | Allocated |
|--------------------- |-----------:|---------:|---------:|-------:|----------:|----------:|
|        RoundtripInts | 3,835.8 ns | 11.98 ns | 11.21 ns | 0.0229 |      50 B |     200 B |
|       RoundtripLongs | 4,389.7 ns | 16.26 ns | 15.21 ns | 0.0153 |     862 B |     136 B |
|         RoundtripHex | 3,843.8 ns |  9.92 ns |  9.28 ns | 0.1259 |      49 B |   1,064 B |
|         SingleNumber |   875.7 ns |  1.76 ns |  1.65 ns | 0.0076 |     196 B |      64 B |
|        DecodeNumbers |   875.4 ns |  8.87 ns |  8.30 ns | 0.0038 |     501 B |      32 B |
|   DecodeSingleNumber |   814.0 ns |  2.66 ns |  2.49 ns |      - |     118 B |         - |
|      RoundtripSingle | 1,272.6 ns | 12.14 ns | 11.36 ns | 0.0038 |     283 B |      32 B |
| SingleNumberAsParams |   911.1 ns |  4.11 ns |  3.84 ns | 0.0191 |     450 B |     160 B |

I would like to see if it possible to get them to return ReadOnlySpan<char> to have return option that is only on the stack.

But GenerateHashFrom now allows us to potentially do that.

public string EncodeLong(long number) => GenerateHashFrom(stackalloc[] { number });
public string EncodeLong(long number)
{
Span<char> result = stackalloc char[20];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why there is a limit of 20 chars? Now it throws if you try to encode value with minHashLength parameter set to >20, however previously it was possible to get string of any length

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought the max length for a long was 19 plus 1 for lottery hence 20.

We should have a test that validate any length.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also potentially any length does not make sense when it comes to max url length or other serialization issues or memory optimization.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't confuse input number length (e.g var num = 1234 has length of 4) and output hash length (which can differ)

@manigandham
Copy link
Collaborator

hey @jetersen

The fundamental problem here is that DecodeSingle is still not truthful about the error. Previously it would return MultipleResultsException even if there was nothing decoded, but now the code never returns MultipleResultsException even if there are, effectively pointing users to the wrong error.

As nice as it is to have improved performance, I don't think it should come at the cost of correct results. Is there a way to fix this?

@jetersen
Copy link
Contributor Author

jetersen commented May 19, 2022

@manigandham As you can see there was an argument about this

Whether it returns NoResultException or MultipleEntriesException to me is irrelevant.

But as you can see I suggest I can revert that change in 535df3d

Before 535df3d it is possible to detect multiple entries but apparently the way I do that was not appreciated.

I was looking for any Separator characters, and throw MultipleEntriesException, see #65 (comment) and see #65 (comment)

Again MultipleEntriesException was only added in recent version as well. So in my opinion we could do either.
Revert 535df3d to throw MultipleEntriesException or keep 535df3d thereby throwing NoResultException

What difference does it make MultipleEntriesException is the same as NoResultException both exception is a non result situation.

@manigandham
Copy link
Collaborator

What difference does it make MultipleEntriesException is the same as NoResultException both exception is a non result situation.

An exception being thrown signals that there's an issue with the operation, but the actual exception type describes why there was an issue. I think it's very important for the user to know so they understand why the result didn't decode (whether there's multiple numbers or none).

I'll leave it to @ullmark to decide how to proceed.

@ullmark
Copy link
Owner

ullmark commented May 22, 2022

@manigandham If the performance increases of this library is important for the ones who uses it, and this looks good we should merge this.

tbh; it uses some syntax I've never seen or used (or had any need to) in C#, so @jetersen you are welcome to come in and maintain if you like. :)

Re. the errors, if it's not possible to detect multiple entries in a good way, just throw a "NoResultException" since that is still true, given the alphabet + salt, and how you use the library, we weren't able to get a result. 🤷‍♂️

@manigandham
Copy link
Collaborator

@jetersen if you want to change the exception to say NoResultException then I'll merge in and do the next release.

@jetersen
Copy link
Contributor Author

@manigandham it is already throwing NoResultException, see the tests in GeneralTests.cs :)

@jetersen
Copy link
Contributor Author

@ullmark I would be happy to help maintain and review PRs 😄

@manigandham manigandham merged commit 480cabe into ullmark:master May 22, 2022
@manigandham manigandham self-assigned this May 22, 2022
@jetersen jetersen deleted the feat/improveSingleDecode branch May 22, 2022 21:33
@manigandham
Copy link
Collaborator

v1.6.0 released https://www.nuget.org/packages/Hashids.net/1.6.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants