DecisionTrees\Pruning\ErrorBasedPruning: The subset of observations corresponding to a decision node may contain duplicates

There may be an issue with the **ErrorBasedPruning** class: **the subset of observations corresponding to a decision node may contain duplicates**.
The following snippet is taken from ErrorBasedPruning's 'compute' method:

```csharp
if (indices.Length == 0)
{
    // The rule employed by this node doesn't cover
    // any input points. This node could be removed.

    node.Branches = null;
    node.Output = null;

    foreach (var child in node)
        subsets[child].Clear();

    for (int i = 0; i < inputs.Length; i++)
        trackDecisions(node, inputs[i], i);

    return true;
}
```

This part specifically:

```csharp
for (int i = 0; i < inputs.Length; i++)
    trackDecisions(node, inputs[i], i);
```

I think there is no reason to re-track any decisions here, since no observations reach 'node'. The subset of each node is kept in a list, so it can contain duplicates (even though it shouldn't). Since 'node' is now a leaf (but not null), tracking all inputs starting from 'node' adds every observation to the subset of 'node', even though none of them actually reach it. This may result in favoring 'node' over its sibling when its parent is considered later.
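To make this concrete, here is a minimal, self-contained C# sketch on a toy two-leaf tree. `ToyNode`, `Track` and `Run` are hypothetical names for illustration only, not Accord.NET members; the sketch assumes, as described above, that each node's subset is a `List<int>` and that tracking which starts at a leaf appends the index to that leaf without checking whether the observation would be routed there.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical toy model, NOT Accord.NET's DecisionNode/ErrorBasedPruning:
// a root that splits on x < Threshold into two leaves.
public class ToyNode
{
    public double? Threshold;   // null means the node is a leaf
    public ToyNode Left;        // taken when x <  Threshold
    public ToyNode Right;       // taken when x >= Threshold
    public bool IsLeaf => Threshold == null;
}

public static class Demo
{
    // Per-node observation subsets, stored as lists (duplicates allowed).
    public static readonly Dictionary<ToyNode, List<int>> Subsets =
        new Dictionary<ToyNode, List<int>>();

    // Walks from 'start' down to a leaf, appending 'index' to every node it
    // visits. If 'start' is already a leaf, the index is appended to it
    // unconditionally, whether or not the observation would be routed there.
    public static void Track(ToyNode start, double x, int index)
    {
        ToyNode current = start;
        while (current != null)
        {
            Subsets[current].Add(index);
            if (current.IsLeaf)
                return;
            current = x < current.Threshold.Value ? current.Left : current.Right;
        }
    }

    // Tracks all inputs from the root, then repeats the questioned step
    // ("re-track every input starting from the emptied node") retrackPasses times.
    public static ToyNode Run(double[] inputs, int retrackPasses)
    {
        Subsets.Clear();
        var root = new ToyNode { Threshold = 5, Left = new ToyNode(), Right = new ToyNode() };
        foreach (var n in new[] { root, root.Left, root.Right })
            Subsets[n] = new List<int>();

        for (int i = 0; i < inputs.Length; i++)
            Track(root, inputs[i], i);           // root.Right receives nothing

        for (int pass = 0; pass < retrackPasses; pass++)
            for (int i = 0; i < inputs.Length; i++)
                Track(root.Right, inputs[i], i); // every index lands in root.Right

        return root;
    }

    public static void Main()
    {
        double[] inputs = { 1, 2, 3, 4 };        // all observations satisfy x < 5
        foreach (int passes in new[] { 0, 1, 2 })
        {
            var root = Run(inputs, passes);
            var right = Subsets[root.Right];
            Console.WriteLine(
                $"passes={passes}: root={Subsets[root].Count} left={Subsets[root.Left].Count} " +
                $"right={right.Count} (distinct {right.Distinct().Count()})");
        }
    }
}
```

In this toy, a single re-tracking pass leaves the right leaf holding all 4 indices while the left leaf also holds them, so the children's subsets (8 entries) no longer partition the parent's (4 entries); a second pass grows the right leaf to 8 entries with only 4 distinct. With zero passes, which corresponds to skipping the re-tracking loop, the right leaf's subset stays empty.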

I'm also not sure the `foreach` loop does anything. `indices` holds the indices of the observations routed through 'node'. Since `indices.Length == 0`, no observations are routed through 'node', and therefore none are routed through its children either, so there should be nothing in their subsets to clear.


DecisionTrees\Pruning\ErrorBasedPruning: The subset of observations corresponding to a decision node may contain duplicates #234
