Skip to content

Conversation

gerhardol
Copy link
Member

@gerhardol gerhardol commented Oct 12, 2023

Proposed changes

General changes to improve cpu execution and to
limit allocations. Some changes are very minor like
using for instead of foreach, other changes the data structures.

Separate storage of nodes that has no GitRevision.
This allows to remove a separate node storage and
to simplify the code.

Use ImmutableArray instead of array and list for the ordered caches
to improve allocations.

Force update data grid when revision graph nodes have been modified

The order of nodes in the revision graph may change if the commits are
listed out of order or if artificial commits are inserted.
The revision graph was (normally) invalidated but the data grid was only
refreshed if scrolling.

Use separate counter for revisions to limit
the lock time for _revisionByObjectId.Count

Rename variables and add comments

Remove unused IsRevisionRelative().

--

This is partly a test of how you can use CoPilot to optimize code.
It is maybe 1/5 times you get something that works and 1/10 that something works.
Quite often CoPilot insists on something that is not an improvement.
But that is not so bad, there are improvements in the end.

This code is running when revisions are loaded, so performance is important.
Also minor changes could make a difference.

I will add some comments about decisions in an own review of the code.

Tests mostly on Linux repo, limiting loading to 100K commits.

This reduces the time RevisionDataGridView.Add() runs from about 1000-1400 ms to 400-600ms.
However, the normal load time (as reported by RevisionReader) is mostly unchanged at 3000 - 3400 ms, so it seem like git-log delivers commits slower than they are handled here.
The total time for git-log to write to a file is about 1900 ms so there is something more to it, maybe something with reading the stream (there are changes in .NET7 I will try later).
So there may be a potential for some minor improvements later.
The decreased allocation seem to give more consistent running times (less gc).

The incorrect insertion of artificial commits (that #11266 made more visible) is fixed at least.
I still get some visual issues when scrolling a lot, but better than before.
Similar with no grid when rebasing, I have not seen that lately.
More changes may be needed lock-cache-refresh handling, this is a step forward.

Test methodology

There are tests for graph loading.

Merge strategy

I agree that the maintainer squash merge this PR (if the commit message is clear).


✒️ I contribute this code under The Developer Certificate of Origin.

@ghost ghost assigned gerhardol Oct 12, 2023
@@ -16,9 +17,18 @@ public class RevisionGraph : IRevisionGraphRowProvider
internal const int MaxLanes = 40;
private const int _straightenLanesLookAhead = 20;

// Some unordered collections with raw data
private ConcurrentDictionary<ObjectId, RevisionGraphRevision> _nodeByObjectId = new();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removal of this data structure is the majority of the decreased runtime.
Updating an ImmutableList will reallocate, so a lot of OH are removed.

Array.Sort(localOrderedNodesCache, (x, y) => x.Score.CompareTo(y.Score));
_orderedNodesCache = localOrderedNodesCache;
if (localOrderedNodesCache.Length > 0)
RevisionGraphRevision[] n = _nodeByObjectId.Values.ToArray();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using a builder was slightly slow in this scenario (85 vs 65 ms), not what I expected.
The following was about 35 ms:

_orderedNodesCache = _nodeByObjectId.Values
                        .OrderBy(revision => revision.Score)
                        .ToImmutableArray();

The array has the least allocation here, so it was chosen.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverted to this code anyway, got an exception in tests while sorting.

@@ -2,22 +2,25 @@
{
public class LaneInfo
{
public LaneInfo(RevisionGraphSegment startSegment, LaneInfo? derivedFrom)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No functionality change, just separating the two cases.

/// and the first page is to be displayed, very quick.
/// The second time is when all revisions are loaded, that may require 50 ms to build.
/// </remarks>
private ImmutableArray<RevisionGraphRevision> _orderedNodesCache = ImmutableArray<RevisionGraphRevision>.Empty;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An ImmutableArray is accessed about as fast as a normal array, and _orderedNodesCache is never changed when in use.
It is normally created two times so it seems the correct choice. (Immutable would allow parallel processing but that is not really an improvement here).

For _orderedRowCache other data structures could be a better choice. It was updated quicker in the creation scenario (like 150 ms if I recall correctly) but the scrolling could give more allocations.
I cannot see any slowdowns or obvious fragmentation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Used ImmutableArray for row cache too, handle locking in a much cheaper way and only allocates when scrolling.

/// <summary>
/// The number of revisions added, potentially nodes in the graph.
/// </summary>
public int Count => _nodeByObjectId.Count;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a potential performance issue, _nodeByObjectId.Count is locking a ConcurrentDictionary
But accesses from other threads when loading (adding) the revisions is low and there are few so critical concurrent accesses after. A separate counter does not seem to be worth it.

{
return _nodeByObjectId.TryGetValue(objectId, out revision);
}
=> _nodeByObjectId.TryGetValue(objectId, out revision);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CoPilot insists that Contains() is faster here but it mixes up Contains+Get vs TryGetValue

{
// This revision was added earlier, but is now found in the log.
// Increase the score to the current maxScore to keep the order intact.
revisionGraphRevision.EnsureScoreIsAbove(nextScore);
revisionGraphRevision.GitRevision = revision;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will remove this assigment, done below (was accidentally kept when splitting the modified insert artificial for #11267


if (localOrderedRowCache is null || IsRowCacheDirty(localOrderedRowCache, orderedNodesCache))
int capacity = Math.Max(currentRowIndex, lastOrderedNodeIndex) + 1;
ImmutableArray<RevisionGraphRow>.Builder localOrderedRowCache = ImmutableArray.CreateBuilder<RevisionGraphRow>(capacity);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will move this allocation to after line 349/374

Array.Sort(localOrderedNodesCache, (x, y) => x.Score.CompareTo(y.Score));
_orderedNodesCache = localOrderedNodesCache;
if (localOrderedNodesCache.Length > 0)
RevisionGraphRevision[] n = _nodeByObjectId.Values.ToArray();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_nodeByObjectId.Values will make a consistent snapshot

@gerhardol gerhardol force-pushed the feature/improve-revgrid-loading branch from c0f3349 to 31360a4 Compare October 15, 2023 22:48
@gerhardol
Copy link
Member Author

Plan to tune this some more, but most will likely not change. (The row cache is not always invalidated.)

Comment on lines 653 to 584
int i = 0;
foreach (RevisionGraphRevision node in _revisionGraph._nodeByObjectId.Values)
{
foreach (RevisionGraphRevision parent in node.Parents)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

int i = 0 and i++ are obsolete as well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Merge issue after the first var PR.
Fixed in the new rebase after the second var PR, removed the (new) line 656. i is used in printouts only.

@gerhardol gerhardol force-pushed the feature/improve-revgrid-loading branch from d9e8c25 to cd53095 Compare October 19, 2023 22:49
@gerhardol
Copy link
Member Author

will investigate the data structure for _orderedRowCache again before removing draft (no other changes planned)

@gerhardol gerhardol force-pushed the feature/improve-revgrid-loading branch 2 times, most recently from 65b77d7 to e28354c Compare October 21, 2023 19:40
@gerhardol
Copy link
Member Author

I believe this is working now, will test some more myself before changing from draft.
At that point, I will merge #11267 and squash this PR before removing draft.

After the latest changes #9223 got worse in a certain scenario on the Linux repo. I believe this issue is now fix. Also, it may handle some of the intermittently failing tests as well. There should have been display issues when git-log sorted commits in topo order too.
There are two parts of it.

The first was that the ordered list were not consistently updated and could be modified in parallel.

The second is more complex. Commits are added with a score to define the order. Parents are normally after the child and are just given a new score. For topo order or when artificial commits are inserted, the node cache were invalidated for the case where artificial were inserted which will cause the row cache to be updated, but the grid was still using the previous cached data. Topo order and tests did not invalidate the caches at all (so it worked if caches were not yet updated).
To solve this, the results for modified scores were propagated to the data grid so a refresh flag could be set. The workaround for this issue was to scroll forward and then back to force the update.

Copy link
Member

@mstv mstv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review of the first commit including its fixes

@gerhardol
Copy link
Member Author

gerhardol commented Oct 23, 2023

Changed the row cache to ImmutableArray. The row cache is only updated from one thread but need to be consistent when accessed in other threads. I got a few errors with the previous setup.
This will allocate every time the cache is updated (in chunks of 1500), so there will be some garbage collection if scrolled to the end.
Still better than to copy the List<> when inserting every new revision.
I tried using a lock() instead, but that was much slower.
(Current implementation uses Immutable for the nodes list, so it allocates a new array for every revision added, not just a new allocation every 1500 if you scroll so the new implementation is some magnitudes better in the normal use case).

I still get some visual issues when scrolling a lot, but better than before.
Similar with no grid when rebasing, I have not seen that lately.
More changes may be needed lock-cache-refresh handling, this is a step forward.

Plan to squash and remove draft for this tomorrow.

@gerhardol gerhardol force-pushed the feature/improve-revgrid-loading branch from 130837b to 3ccd613 Compare October 24, 2023 21:13
@gerhardol gerhardol marked this pull request as ready for review October 24, 2023 22:36
Copy link
Member

@RussKie RussKie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤯🤯🤯

Copy link
Member

@mstv mstv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost done

Comment on lines 103 to 106
/// <summary>
/// The number of revisions in <see cref="_revisionByObjectId"/>, potentially nodes in the graph.
/// </summary>
public int Count => _revisionCount;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// <summary>
/// The number of revisions in <see cref="_revisionByObjectId"/>, potentially nodes in the graph.
/// </summary>
public int Count => _revisionCount;
/// <summary>
/// The number of revisions in <see cref="_revisionByObjectId"/>, potential nodes in the graph.
/// </summary>
public int Count { get; private set; }

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

Copy link
Member

@mstv mstv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 have not run

@gerhardol
Copy link
Member Author

🤯🤯🤯

👍 have not run

I would like to merge soon this to get some usage and see if the refresh issues are solved or at least improved.
I can some issues still when scrolling in a specific way in specific situations in linux repo so we may need to go back to this.

The slowdown in updating the grid is in StraightenLanes(), that is separate from this issue but it is nice to have this as a base when trying to improve that. (@mstv knows this better than me but he is busy but still put a good effort in reviewing this series of PRs, so I guess we will be in touch).

General changes to improve cpu execution and to
limit allocations. Some changes are very minor like
using for instead of foreach, other changes the data structures.

Separate storage of nodes that has no GitRevision.
This allows to remove a separate node storage and
to simplify the code.

Use ImmutableArray instead of array and list for the ordered caches
to improve lock handling.

Force update data grid when revision graph nodes have been modified

The order of nodes in the revision graph may change if the commits are
listed out of order or if artificial commits are inserted.
The revision graph was (normally) invalidated but the data grid was only
refreshed if scrolling.

Use separate counter for revisions to limit
the lock time for _revisionByObjectId.Count

Rename variables and add comments

Remove unused IsRevisionRelative().

and a merge error in tests
Comment on lines -591 to 658
if (minScore <= _orderedUntilScore)
if (!_orderedNodesCacheInvalid)
{
_reorder = true;
ImmutableArray<RevisionGraphRevision> localOrderedNodesCache = _orderedNodesCache;
if (localOrderedNodesCache.Length > 0 && minScore <= localOrderedNodesCache[^1].Score)
{
_orderedNodesCacheInvalid = true;
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have analyzed this moved code multiple times. I do still not see why it causes an empty revision grid for many seconds when switching to the Linux repo.

Comment on lines -564 to -565
_orderedUntilScore = int.MinValue;
_reorder = false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found it right after writing the previous comment:
The necessary replacement for _orderedUntilScore = int.MinValue; is _orderedNodesCache = ImmutableArray<RevisionGraphRevision>.Empty;

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, I did not see that
Squashed existing, added this with updated commit message

@RussKie
Copy link
Member

RussKie commented Oct 26, 2023 via email

General changes to improve cpu execution and to
limit allocations. Some changes are very minor like
using for instead of foreach, other changes the data structures.

Separate storage of nodes that has no GitRevision.
This allows to remove a separate node storage and
to simplify the code.

Use ImmutableArray instead of array and list for the ordered caches
to improve locking without excessive allocations.

Force update data grid when revision graph nodes have been modified

The order of nodes in the revision graph may change if the commits are
listed out of order or if artificial commits are inserted.
The revision graph was (normally) invalidated but the data grid was only
refreshed if scrolling.

Use separate counter for revisions to limit
the lock time for _revisionByObjectId.Count

Rename variables and add comments

Remove unused IsRevisionRelative().
@gerhardol gerhardol force-pushed the feature/improve-revgrid-loading branch from 9b6179f to 21c7754 Compare October 26, 2023 21:24
@gerhardol gerhardol merged commit 1ffedca into gitextensions:master Oct 27, 2023
@ghost ghost added this to the vNext milestone Oct 27, 2023
@gerhardol gerhardol deleted the feature/improve-revgrid-loading branch October 27, 2023 19:56
Copy link
Member

@mstv mstv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

works for me

@gerhardol
Copy link
Member Author

This does not completely solve #9223, but it is an improvement.

There are two separate issues from here:

  1. Find what is the problem with the refreshing. That is RevisionDataGridView interacting with RevisionGraph. Row nodes are OK but the column or RevisionGraphRow data is incorrect
  2. Speedup the lane handling that is slower after 4.1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants