-
-
Notifications
You must be signed in to change notification settings - Fork 2.1k
RevisionGraph: improve internal structures #11268
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RevisionGraph: improve internal structures #11268
Conversation
@@ -16,9 +17,18 @@ public class RevisionGraph : IRevisionGraphRowProvider | |||
internal const int MaxLanes = 40; | |||
private const int _straightenLanesLookAhead = 20; | |||
|
|||
// Some unordered collections with raw data | |||
private ConcurrentDictionary<ObjectId, RevisionGraphRevision> _nodeByObjectId = new(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removal of this data structure is the majority of the decreased runtime.
Updating an ImmutableList will reallocate, so a lot of OH are removed.
Array.Sort(localOrderedNodesCache, (x, y) => x.Score.CompareTo(y.Score)); | ||
_orderedNodesCache = localOrderedNodesCache; | ||
if (localOrderedNodesCache.Length > 0) | ||
RevisionGraphRevision[] n = _nodeByObjectId.Values.ToArray(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using a builder was slightly slow in this scenario (85 vs 65 ms), not what I expected.
The following was about 35 ms:
_orderedNodesCache = _nodeByObjectId.Values
.OrderBy(revision => revision.Score)
.ToImmutableArray();
The array has the least allocation here, so it was chosen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reverted to this code anyway, got an exception in tests while sorting.
@@ -2,22 +2,25 @@ | |||
{ | |||
public class LaneInfo | |||
{ | |||
public LaneInfo(RevisionGraphSegment startSegment, LaneInfo? derivedFrom) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No functionality change, just separating the two cases.
/// and the first page is to be displayed, very quick. | ||
/// The second time is when all revisions are loaded, that may require 50 ms to build. | ||
/// </remarks> | ||
private ImmutableArray<RevisionGraphRevision> _orderedNodesCache = ImmutableArray<RevisionGraphRevision>.Empty; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An ImmutableArray is accessed about as fast as a normal array, and _orderedNodesCache is never changed when in use.
It is normally created two times so it seems the correct choice. (Immutable would allow parallel processing but that is not really an improvement here).
For _orderedRowCache other data structures could be a better choice. It was updated quicker in the creation scenario (like 150 ms if I recall correctly) but the scrolling could give more allocations.
I cannot see any slowdowns or obvious fragmentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Used ImmutableArray for row cache too, handle locking in a much cheaper way and only allocates when scrolling.
/// <summary> | ||
/// The number of revisions added, potentially nodes in the graph. | ||
/// </summary> | ||
public int Count => _nodeByObjectId.Count; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a potential performance issue, _nodeByObjectId.Count is locking a ConcurrentDictionary
But accesses from other threads when loading (adding) the revisions is low and there are few so critical concurrent accesses after. A separate counter does not seem to be worth it.
{ | ||
return _nodeByObjectId.TryGetValue(objectId, out revision); | ||
} | ||
=> _nodeByObjectId.TryGetValue(objectId, out revision); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CoPilot insists that Contains() is faster here but it mixes up Contains+Get vs TryGetValue
{ | ||
// This revision was added earlier, but is now found in the log. | ||
// Increase the score to the current maxScore to keep the order intact. | ||
revisionGraphRevision.EnsureScoreIsAbove(nextScore); | ||
revisionGraphRevision.GitRevision = revision; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will remove this assigment, done below (was accidentally kept when splitting the modified insert artificial for #11267
|
||
if (localOrderedRowCache is null || IsRowCacheDirty(localOrderedRowCache, orderedNodesCache)) | ||
int capacity = Math.Max(currentRowIndex, lastOrderedNodeIndex) + 1; | ||
ImmutableArray<RevisionGraphRow>.Builder localOrderedRowCache = ImmutableArray.CreateBuilder<RevisionGraphRow>(capacity); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will move this allocation to after line 349/374
Array.Sort(localOrderedNodesCache, (x, y) => x.Score.CompareTo(y.Score)); | ||
_orderedNodesCache = localOrderedNodesCache; | ||
if (localOrderedNodesCache.Length > 0) | ||
RevisionGraphRevision[] n = _nodeByObjectId.Values.ToArray(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_nodeByObjectId.Values will make a consistent snapshot
c0f3349
to
31360a4
Compare
Plan to tune this some more, but most will likely not change. (The row cache is not always invalidated.) |
31360a4
to
2913758
Compare
int i = 0; | ||
foreach (RevisionGraphRevision node in _revisionGraph._nodeByObjectId.Values) | ||
{ | ||
foreach (RevisionGraphRevision parent in node.Parents) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
int i = 0
and i++
are obsolete as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Merge issue after the first var
PR.
Fixed in the new rebase after the second var PR, removed the (new) line 656. i is used in printouts only.
d9e8c25
to
cd53095
Compare
will investigate the data structure for _orderedRowCache again before removing draft (no other changes planned) |
65b77d7
to
e28354c
Compare
I believe this is working now, will test some more myself before changing from draft. After the latest changes #9223 got worse in a certain scenario on the Linux repo. I believe this issue is now fix. Also, it may handle some of the intermittently failing tests as well. There should have been display issues when git-log sorted commits in topo order too. The first was that the ordered list were not consistently updated and could be modified in parallel. The second is more complex. Commits are added with a score to define the order. Parents are normally after the child and are just given a new score. For topo order or when artificial commits are inserted, the node cache were invalidated for the case where artificial were inserted which will cause the row cache to be updated, but the grid was still using the previous cached data. Topo order and tests did not invalidate the caches at all (so it worked if caches were not yet updated). |
e28354c
to
a577c1b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review of the first commit including its fixes
Changed the row cache to ImmutableArray. The row cache is only updated from one thread but need to be consistent when accessed in other threads. I got a few errors with the previous setup. I still get some visual issues when scrolling a lot, but better than before. Plan to squash and remove draft for this tomorrow. |
130837b
to
3ccd613
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤯🤯🤯
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Almost done
/// <summary> | ||
/// The number of revisions in <see cref="_revisionByObjectId"/>, potentially nodes in the graph. | ||
/// </summary> | ||
public int Count => _revisionCount; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❔
/// <summary> | |
/// The number of revisions in <see cref="_revisionByObjectId"/>, potentially nodes in the graph. | |
/// </summary> | |
public int Count => _revisionCount; | |
/// <summary> | |
/// The number of revisions in <see cref="_revisionByObjectId"/>, potential nodes in the graph. | |
/// </summary> | |
public int Count { get; private set; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 have not run
I would like to merge soon this to get some usage and see if the refresh issues are solved or at least improved. The slowdown in updating the grid is in StraightenLanes(), that is separate from this issue but it is nice to have this as a base when trying to improve that. (@mstv knows this better than me but he is busy but still put a good effort in reviewing this series of PRs, so I guess we will be in touch). |
General changes to improve cpu execution and to limit allocations. Some changes are very minor like using for instead of foreach, other changes the data structures. Separate storage of nodes that has no GitRevision. This allows to remove a separate node storage and to simplify the code. Use ImmutableArray instead of array and list for the ordered caches to improve lock handling. Force update data grid when revision graph nodes have been modified The order of nodes in the revision graph may change if the commits are listed out of order or if artificial commits are inserted. The revision graph was (normally) invalidated but the data grid was only refreshed if scrolling. Use separate counter for revisions to limit the lock time for _revisionByObjectId.Count Rename variables and add comments Remove unused IsRevisionRelative(). and a merge error in tests
if (minScore <= _orderedUntilScore) | ||
if (!_orderedNodesCacheInvalid) | ||
{ | ||
_reorder = true; | ||
ImmutableArray<RevisionGraphRevision> localOrderedNodesCache = _orderedNodesCache; | ||
if (localOrderedNodesCache.Length > 0 && minScore <= localOrderedNodesCache[^1].Score) | ||
{ | ||
_orderedNodesCacheInvalid = true; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have analyzed this moved code multiple times. I do still not see why it causes an empty revision grid for many seconds when switching to the Linux repo.
_orderedUntilScore = int.MinValue; | ||
_reorder = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Found it right after writing the previous comment:
The necessary replacement for _orderedUntilScore = int.MinValue;
is _orderedNodesCache = ImmutableArray<RevisionGraphRevision>.Empty;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, I did not see that
Squashed existing, added this with updated commit message
If you think it's ready and *is working* - merge it
|
General changes to improve cpu execution and to limit allocations. Some changes are very minor like using for instead of foreach, other changes the data structures. Separate storage of nodes that has no GitRevision. This allows to remove a separate node storage and to simplify the code. Use ImmutableArray instead of array and list for the ordered caches to improve locking without excessive allocations. Force update data grid when revision graph nodes have been modified The order of nodes in the revision graph may change if the commits are listed out of order or if artificial commits are inserted. The revision graph was (normally) invalidated but the data grid was only refreshed if scrolling. Use separate counter for revisions to limit the lock time for _revisionByObjectId.Count Rename variables and add comments Remove unused IsRevisionRelative().
9b6179f
to
21c7754
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
works for me
This does not completely solve #9223, but it is an improvement. There are two separate issues from here:
|
Proposed changes
General changes to improve cpu execution and to
limit allocations. Some changes are very minor like
using for instead of foreach, other changes the data structures.
Separate storage of nodes that has no GitRevision.
This allows to remove a separate node storage and
to simplify the code.
Use ImmutableArray instead of array and list for the ordered caches
to improve allocations.
Force update data grid when revision graph nodes have been modified
The order of nodes in the revision graph may change if the commits are
listed out of order or if artificial commits are inserted.
The revision graph was (normally) invalidated but the data grid was only
refreshed if scrolling.
Use separate counter for revisions to limit
the lock time for _revisionByObjectId.Count
Rename variables and add comments
Remove unused IsRevisionRelative().
--
This is partly a test of how you can use CoPilot to optimize code.
It is maybe 1/5 times you get something that works and 1/10 that something works.
Quite often CoPilot insists on something that is not an improvement.
But that is not so bad, there are improvements in the end.
This code is running when revisions are loaded, so performance is important.
Also minor changes could make a difference.
I will add some comments about decisions in an own review of the code.
Tests mostly on Linux repo, limiting loading to 100K commits.
This reduces the time RevisionDataGridView.Add() runs from about 1000-1400 ms to 400-600ms.
However, the normal load time (as reported by RevisionReader) is mostly unchanged at 3000 - 3400 ms, so it seem like git-log delivers commits slower than they are handled here.
The total time for git-log to write to a file is about 1900 ms so there is something more to it, maybe something with reading the stream (there are changes in .NET7 I will try later).
So there may be a potential for some minor improvements later.
The decreased allocation seem to give more consistent running times (less gc).
The incorrect insertion of artificial commits (that #11266 made more visible) is fixed at least.
I still get some visual issues when scrolling a lot, but better than before.
Similar with no grid when rebasing, I have not seen that lately.
More changes may be needed lock-cache-refresh handling, this is a step forward.
Test methodology
There are tests for graph loading.
Merge strategy
I agree that the maintainer squash merge this PR (if the commit message is clear).
✒️ I contribute this code under The Developer Certificate of Origin.