This repository was archived by the owner on Nov 19, 2020. It is now read-only.

Robust multivariate regression causes IndexOutOfRangeException #602

@ndcomplete

When I train a multivariate linear regression in Accord.NET with the robust option enabled, training throws an exception.

I am attempting to train a regression from a set of inputs M to a set of outputs N. Every time it tries to Learn, it throws a System.IndexOutOfRangeException. I've traced it to the internal JaggedSingularValueDecomposition. It seems that while solving the decomposition of the matrix, there is a hard assumption that the output value y is a single column. This obviously does not work for me, or I am doing it wrong.

As an additional note, I have many more training samples than either my set of explanatory variables or my response variables.

Is there something I'm doing wrong here? Non-robust regression trains fine; the only issue is the robust least squares decomposition. I've looked at the iterative/generalized models, but none of them support multiple outputs.

Simple example:

// using Accord.Statistics.Models.Regression.Linear;

var trainingInputs = new double[][]
{
    new double[] { 1, 2, 3 },
    new double[] { 2, 3, 4 },
    new double[] { 3, 4, 5 },
    new double[] { 4, 5, 6 },
    new double[] { 5, 6, 7 },
    new double[] { 6, 7, 8 },
};

var trainingOutputs = new double[][]
{
    new double[] { 3, 4 },
    new double[] { 4, 5 },
    new double[] { 5, 6 },
    new double[] { 6, 7 },
    new double[] { 7, 8 },
    new double[] { 8, 9 },
};

// IsRobust = true switches the solver to the SVD-based path.
var ols = new OrdinaryLeastSquares() { IsRobust = true };
ols.Learn(trainingInputs, trainingOutputs); // Throws IndexOutOfRangeException

I've attempted a workaround by instead manually calculating the JaggedSingularValueDecomposition, then solving it for each column of output samples individually. This yields me an array of coefficients and inputs that I can use.
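
For reference, here is a minimal sketch of that workaround. The class and method names (`PerColumnSvd`, `SolvePerColumn`) are made up for illustration, and appending a column of ones to model the intercept is my own assumption rather than something the library does for you here. It relies on `JaggedSingularValueDecomposition` from `Accord.Math.Decompositions` and its `Solve(double[])` overload, and it assumes there are at least as many samples as columns in the design matrix.

```csharp
using System;
using Accord.Math.Decompositions;

static class PerColumnSvd
{
    // Hypothetical helper: solves the least squares problem separately
    // for each output column, reusing a single decomposition of the inputs.
    // Returns one coefficient vector per output; the last entry of each
    // vector is the intercept (assumption: intercept modelled as a column of ones).
    public static double[][] SolvePerColumn(double[][] inputs, double[][] outputs)
    {
        int rows = inputs.Length;
        int cols = inputs[0].Length;
        int outs = outputs[0].Length;

        // Build the design matrix [inputs | 1].
        var design = new double[rows][];
        for (int i = 0; i < rows; i++)
        {
            design[i] = new double[cols + 1];
            Array.Copy(inputs[i], design[i], cols);
            design[i][cols] = 1.0;
        }

        // Decompose once, computing both left and right singular vectors.
        var svd = new JaggedSingularValueDecomposition(design, true, true);

        var coefficients = new double[outs][];
        for (int j = 0; j < outs; j++)
        {
            // Extract output column j as a plain vector.
            var y = new double[rows];
            for (int i = 0; i < rows; i++)
                y[i] = outputs[i][j];

            // Least squares solution for this single output.
            coefficients[j] = svd.Solve(y);
        }

        return coefficients;
    }
}
```

Called as `PerColumnSvd.SolvePerColumn(trainingInputs, trainingOutputs)`, this returns one coefficient vector per output column. Since each output is fitted independently against the same design matrix, the decomposition only needs to be computed once.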

Is this a valid method? Are there any caveats to this?
