This repository was archived by the owner on Nov 19, 2020. It is now read-only.
There may be an issue with the ErrorBasedPruning class: the subset of observations corresponding to a decision node may contain duplicates.
The following snippet is taken from ErrorBasedPruning's 'compute' method:
if (indices.Length == 0)
{
    // The rule employed by this node doesn't cover
    // any input points. This node could be removed.
    node.Branches = null;
    node.Output = null;
    foreach (var child in node)
        subsets[child].Clear();
    for (int i = 0; i < inputs.Length; i++)
        trackDecisions(node, inputs[i], i);
    return true;
}
This part specifically:
for (int i = 0; i < inputs.Length; i++)
    trackDecisions(node, inputs[i], i);
I think there is no reason to re-track any decisions, since no observations reach 'node'. The subset of each node is kept in a list, which does not reject repeated entries, so it might contain duplicates (even though it shouldn't). Since 'node' is now a leaf (its branches were just set to null) but is not itself null, tracking all inputs starting from 'node' adds every observation to the subset of 'node', even though none of them actually reaches it. This may result in favoring 'node' over its sibling when its parent is considered later.
I'm also not sure the foreach loop does anything. 'indices' holds the indices of the observations that are routed through 'node'. Since indices.Length == 0, no observations are routed through 'node', and therefore none are routed through its children either, so there is no reason to clear their subsets.
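
To make both points concrete, here is a minimal sketch of the behaviour I suspect. It does not use Accord.NET at all: ToyNode, RetrackDemo, Track and Run are hypothetical names, and Track is only my assumption of how trackDecisions works (it records the observation index at every node it visits, starting from the node it is given). Under that assumption, a node that no observation reaches has empty child subsets to begin with, and re-tracking from it after it has been turned into a leaf puts every index into its subset.

```csharp
using System;
using System.Collections.Generic;

// Toy stand-in for a decision node; NOT Accord.NET's DecisionNode.
public sealed class ToyNode
{
    public double Threshold;      // route Left when x < Threshold, else Right
    public ToyNode Left, Right;   // both null for a leaf
    public bool IsLeaf => Left == null && Right == null;
}

public static class RetrackDemo
{
    // Assumed behaviour of trackDecisions: record 'index' at every node
    // visited on the way down, starting from 'start' (not from the root).
    public static void Track(Dictionary<ToyNode, List<int>> subsets,
                             ToyNode start, double x, int index)
    {
        ToyNode current = start;
        while (current != null)
        {
            subsets[current].Add(index);
            if (current.IsLeaf) break;
            current = x < current.Threshold ? current.Left : current.Right;
        }
    }

    // Returns { node count before, children count, node count after, sibling count }.
    public static int[] Run()
    {
        var a = new ToyNode();
        var b = new ToyNode();
        var node = new ToyNode { Threshold = -5, Left = a, Right = b };
        var sibling = new ToyNode();
        var root = new ToyNode { Threshold = 0, Left = node, Right = sibling };

        // Every input is >= 0, so the root routes all of them to 'sibling'.
        double[] inputs = { 1.0, 2.0, 3.0 };

        var subsets = new Dictionary<ToyNode, List<int>>();
        foreach (var n in new[] { root, node, sibling, a, b })
            subsets[n] = new List<int>();

        for (int i = 0; i < inputs.Length; i++)
            Track(subsets, root, inputs[i], i);

        int nodeBefore = subsets[node].Count;               // nothing reaches 'node'
        int children = subsets[a].Count + subsets[b].Count; // so nothing reaches its children

        // Mimic the quoted branch: make 'node' a leaf, then re-track from it.
        node.Left = null;
        node.Right = null;
        for (int i = 0; i < inputs.Length; i++)
            Track(subsets, node, inputs[i], i);

        return new[] { nodeBefore, children, subsets[node].Count, subsets[sibling].Count };
    }

    public static void Main()
    {
        int[] r = Run();
        Console.WriteLine($"node before: {r[0]}, children: {r[1]}, node after: {r[2]}, sibling: {r[3]}");
        // prints "node before: 0, children: 0, node after: 3, sibling: 3"
    }
}
```

In this toy model 'node' ends up looking as if it covered as many observations as its sibling, although it covers none. If the real trackDecisions behaves the same way, simply dropping the re-tracking loop (and the foreach before it) should leave the subsets consistent.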