
DL with h2o #28

@szilard

Trying to see if DL can match RF/GBM in accuracy on the airline dataset (the training sets are sampled from years 2005-2006, while the validation and test sets are sampled disjointly from 2007). Also, some variables are intentionally kept categorical rather than encoded as ordinal, to better match the structure of business datasets.
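
For reference, a minimal sketch of how the data can be loaded into h2o with those variables kept categorical (column names are assumed from the benchmark CSVs, not taken from the script):

```r
library(h2o)
h2o.init(nthreads = -1)   # use all cores of the box

d_train <- h2o.importFile("https://s3.amazonaws.com/benchm-ml--main/train-1m.csv")
d_valid <- h2o.importFile("https://s3.amazonaws.com/benchm-ml--main/valid.csv")
d_test  <- h2o.importFile("https://s3.amazonaws.com/benchm-ml--main/test.csv")

## make sure the artificially-categorical variables are factors
## (a no-op if they were already imported as enum columns)
for (k in c("Month", "DayofMonth", "DayOfWeek")) {
  d_train[[k]] <- as.factor(d_train[[k]])
  d_valid[[k]] <- as.factor(d_valid[[k]])
  d_test[[k]]  <- as.factor(d_test[[k]])
}
```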

Recap: with 10M training records (the largest size in the benchmark), RF gets AUC 0.80 and GBM 0.81 on the test set.

So far I get AUC 0.73 with h2o's DL on both the 1M and the 10M training sets:
https://github.com/szilard/benchm-ml/blob/master/4-DL/1-h2o.R

I tried a few architectures, activations, and regularization settings, but nothing beats the default. A run takes about 2-3 minutes with early stopping (using the validation set) on a 32-core EC2 box.
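
For context, the calls I've been varying look roughly like this (hyperparameter values are illustrative, not the exact ones from the script linked above; the target column name is assumed):

```r
md <- h2o.deeplearning(
  x = setdiff(names(d_train), "dep_delayed_15min"),   # assumed target column name
  y = "dep_delayed_15min",
  training_frame = d_train,
  validation_frame = d_valid,
  activation = "RectifierWithDropout",
  hidden = c(200, 200),
  input_dropout_ratio = 0.1,
  l1 = 1e-5, l2 = 1e-5,
  epochs = 100,
  stopping_metric = "AUC",      # early stopping on validation AUC
  stopping_rounds = 3,
  stopping_tolerance = 0,
  seed = 123
)

h2o.auc(h2o.performance(md, newdata = d_test))
```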

The "problem" is DL learns very fast, the best AUC reached after 1.3 epochs on 1M rows train and 0.15 epochs on 10M (and early stopping kicks in around 9 and 0.9, rsp). On the other hand RF/GBM runs ~1hr to get good accuracy. That is the DL model seems underfitted to me.

Sure, DL might not beat GBM on this kind of data (a proxy for general business data such as credit risk or fraud detection), but it should do better than 0.73.

Datasets:
https://s3.amazonaws.com/benchm-ml--main/train-1m.csv
https://s3.amazonaws.com/benchm-ml--main/train-10m.csv
https://s3.amazonaws.com/benchm-ml--main/valid.csv
https://s3.amazonaws.com/benchm-ml--main/test.csv
