This repository was archived by the owner on Nov 19, 2020. It is now read-only.

DecisionTrees\Pruning\ErrorBasedPruning: The subset of observations corresponding to a decision node may contain duplicates #234

@YaronK


There may be an issue with the ErrorBasedPruning class: the subset of observations corresponding to a decision node may contain duplicates.
The following snippet is taken from ErrorBasedPruning's 'compute' method:

if (indices.Length == 0)
{
    // The rule employed by this node doesn't cover
    // any input points. This node could be removed.

    node.Branches = null;
    node.Output = null;

    foreach (var child in node)
        subsets[child].Clear();

    for (int i = 0; i < inputs.Length; i++)
        trackDecisions(node, inputs[i], i);

    return true;
}

This part specifically:

for (int i = 0; i < inputs.Length; i++)
    trackDecisions(node, inputs[i], i);

I think there is no reason to re-track any decisions, since no observations reach 'node'. Each node's subset is kept in a list, so it can contain duplicates (even though it shouldn't). Because 'node' is now a leaf (but not null), tracking all inputs starting from 'node' adds every observation to its subset, even though none of them reach it. This may result in 'node' being favored over its sibling when its parent is considered later.
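To make the concern concrete, here is a minimal, self-contained sketch. It does not use Accord.NET: `ToyNode`, `Track` and `Run` are hypothetical stand-ins, and `Track` only assumes that tracking appends the observation index to the list of every node visited from the starting node downwards. Under that assumption, tracking every input starting at an unreached leaf fills that leaf's subset with observations that never get there from the root.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a decision node (not Accord.NET's DecisionNode).
class ToyNode
{
    public double Threshold;      // inputs below this value go to Left
    public ToyNode Left, Right;   // both null for a leaf
}

static class RetrackDemo
{
    // Assumed behaviour of tracking: append index i to the subset of every
    // node visited, starting at 'start' and following the split rule down.
    public static void Track(Dictionary<ToyNode, List<int>> subsets,
                             ToyNode start, double input, int i)
    {
        ToyNode current = start;
        while (current != null)
        {
            subsets[current].Add(i);
            if (current.Left == null) break;   // reached a leaf
            current = input < current.Threshold ? current.Left : current.Right;
        }
    }

    // Returns the size of the unreached leaf's subset before and after
    // re-tracking every input from that leaf.
    public static (int Before, int After) Run()
    {
        var left = new ToyNode();
        var right = new ToyNode();
        var root = new ToyNode { Threshold = 0.5, Left = left, Right = right };

        var subsets = new Dictionary<ToyNode, List<int>>
        {
            [root] = new List<int>(),
            [left] = new List<int>(),
            [right] = new List<int>()
        };

        double[] inputs = { 0.6, 0.7, 0.9 };   // every input is routed to 'right'

        for (int i = 0; i < inputs.Length; i++)
            Track(subsets, root, inputs[i], i);
        int before = subsets[left].Count;       // nothing reaches 'left'

        // Mirrors the loop in question: track all inputs starting at the empty node.
        for (int i = 0; i < inputs.Length; i++)
            Track(subsets, left, inputs[i], i);
        int after = subsets[left].Count;        // 'left' now lists every observation

        return (before, after);
    }

    static void Main()
    {
        var result = Run();
        Console.WriteLine($"before: {result.Before}, after: {result.After}");
    }
}
```

In this toy setup the leaf's subset grows from 0 to 3 entries although no input is routed to it. Switching the list for a set would only prevent repeated entries; it would not remove indices that should not be there in the first place.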

I'm also not sure the foreach loop does anything. 'indices' holds the indices of the observations that are routed through 'node'. Since indices.Length == 0, no observations are routed through 'node', and therefore none are routed through its children either, so there is no reason to clear their subsets.
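The containment argument can be checked on a toy tree. The sketch below is again independent of Accord.NET: `SplitNode`, `Route`, `ChildrenWithinParent` and `Build` are hypothetical names, and it assumes subsets are filled only by routing observations from the root. Under that assumption every child's subset is contained in its parent's, so an empty parent implies empty children.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical toy tree node (not Accord.NET's DecisionNode).
class SplitNode
{
    public double Threshold;                      // inputs below this go to Left
    public SplitNode Left, Right;                 // both null for a leaf
    public List<int> Subset = new List<int>();    // indices routed through this node
}

static class SubsetInvariant
{
    // Route one observation from the root, recording its index at each node visited.
    public static void Route(SplitNode root, double input, int index)
    {
        SplitNode n = root;
        while (n != null)
        {
            n.Subset.Add(index);
            if (n.Left == null) break;            // reached a leaf
            n = input < n.Threshold ? n.Left : n.Right;
        }
    }

    // True when, throughout the tree, each child's subset is contained in its parent's.
    public static bool ChildrenWithinParent(SplitNode n)
    {
        if (n.Left == null) return true;
        return new[] { n.Left, n.Right }.All(child =>
            child.Subset.All(i => n.Subset.Contains(i)) && ChildrenWithinParent(child));
    }

    // Builds a tree whose left branch is an internal node that no input reaches.
    public static SplitNode Build()
    {
        var unreached = new SplitNode
        {
            Threshold = 0.2,
            Left = new SplitNode(),
            Right = new SplitNode()
        };
        var root = new SplitNode { Threshold = 0.5, Left = unreached, Right = new SplitNode() };

        double[] inputs = { 0.6, 0.7, 0.9 };      // all inputs go to the right leaf
        for (int i = 0; i < inputs.Length; i++)
            Route(root, inputs[i], i);
        return root;
    }

    static void Main()
    {
        SplitNode root = Build();
        Console.WriteLine(ChildrenWithinParent(root));    // containment holds
        Console.WriteLine(root.Left.Subset.Count);        // unreached internal node
        Console.WriteLine(root.Left.Left.Subset.Count);   // its children are empty too
    }
}
```

If the real subsets can be modified by anything other than routing from the root (for example by an earlier re-tracking step), the invariant may no longer hold, which would be the one case where clearing the children has an effect.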
