Benchmarking Tree Regressions

This repository is partially inspired by benchopt, but aims for benchmarking that is more R-user-friendly, easier to hack, and more statistically oriented.

❗❗🤖 Bot `@new-csv` ❗❗

This repository supports an automated GitHub Action bot that listens for issues with the title

@new-csv DataName DataURL Idx_of_Y

The bot will create a new commit and open a pull request to add new data for benchmarking automatically. For example, Issue #39 triggers Pull Request #38, which adds the abalone dataset for benchmarking.

@new-csv abalone https://raw.githubusercontent.com/jbrownlee/Datasets/refs/heads/master/abalone.csv 9

The response variable is the 9th column, so Idx_of_Y = 9. (Currently, the CSV file is assumed to have no header row of column names.)
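To make the title format concrete, here is a minimal sketch of how such a title could be split into its three fields. The helper name `parse_new_csv_title` and its checks are hypothetical and are not taken from the repository's workflow; the actual bot may parse titles differently.

```r
# Hypothetical helper: split an issue title of the form
#   "@new-csv DataName DataURL Idx_of_Y"
# into its three fields. Not the repository's actual implementation.
parse_new_csv_title <- function(title) {
  fields <- strsplit(trimws(title), "[[:space:]]+")[[1]]
  if (length(fields) != 4 || fields[1] != "@new-csv") {
    stop("expected a title of the form: @new-csv DataName DataURL Idx_of_Y")
  }
  idx <- suppressWarnings(as.integer(fields[4]))
  if (is.na(idx) || idx < 1) {
    stop("Idx_of_Y must be a positive integer")
  }
  list(name = fields[2], url = fields[3], idx_y = idx)
}

req <- parse_new_csv_title(
  "@new-csv abalone https://raw.githubusercontent.com/jbrownlee/Datasets/refs/heads/master/abalone.csv 9"
)
str(req)
```

The sketch rejects any title that does not have exactly four whitespace-separated fields, so a dataset name containing spaces would need a different convention.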

🌲 What is the Repo Structure?

Source Files

The source files are stored in R/, which contains three files:

datasets.R: each dataset (either simulated or real dataset) is defined as a function

For the simulation data, we consider

- different covariance structures of the design matrix
  - Independent
  - AR(1)
  - AR(1)+
  - Factor
- different data-generating models
  - Friedman
  - Checkerboard
  - Linear
  - Max
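As an illustration of how one covariance structure and one model could be combined, the sketch below draws an AR(1) Gaussian design and applies the standard Friedman #1 mean function. The function names, the choice `rho = 0.5`, and the unit noise variance are assumptions for illustration; the definitions in datasets.R may differ.

```r
# AR(1) covariance: Sigma[i, j] = rho^|i - j| (rho = 0 gives the independent case)
ar1_cov <- function(p, rho) {
  rho^abs(outer(seq_len(p), seq_len(p), "-"))
}

# Draw n rows from N(0, Sigma) using the Cholesky factor (base R only).
# chol() returns an upper-triangular R with t(R) %*% R = Sigma.
gen_design <- function(n, Sigma) {
  p <- ncol(Sigma)
  Z <- matrix(rnorm(n * p), nrow = n, ncol = p)
  Z %*% chol(Sigma)
}

# Standard Friedman #1 mean function; only the first five columns are used
friedman_mean <- function(x) {
  10 * sin(pi * x[, 1] * x[, 2]) + 20 * (x[, 3] - 0.5)^2 +
    10 * x[, 4] + 5 * x[, 5]
}

set.seed(1)
n <- 200
p <- 10
x <- gen_design(n, ar1_cov(p, rho = 0.5))
y <- friedman_mean(x) + rnorm(n)  # unit noise variance is an assumption
```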

For the real datasets, we consider

Data	Description	n	p	URL
BostonHousing	Housing values and other information about Boston census tracts.	506	13	🔗
CaliforniaHousing	Aggregated housing data from each of 20,640 neighborhoods (1990 census block groups) in California.	20640	8	🔗
CASP	Physicochemical Properties of Protein Tertiary Structure	45730	9	🔗
Energy	Appliances Energy Prediction	19735	27	🔗
AirQuality	Air Quality	9357	12	🔗
BiasCorrection	Bias correction of numerical prediction model temperature forecast	7590	21	🔗
ElectricalStability	Electrical Grid Stability Simulated Data	10000	12	🔗
GasTurbine	Gas Turbine CO and NOx Emission Data Set	36733	10	🔗
ResidentialBuilding	Residential Building	372	107	🔗
LungCancerGenomic	Lung cancer genomic data from the Chemores Cohort Study	123	945	🔗
StructureActivity	Qualitative Structure Activity Relationships (triazines)	186	60	🔗
BloodBrain	Blood Brain Barrier Data `data("BloodBrain", package = "caret")`	208	134	🔗
GSE65904	Whole-genome expression analysis of melanoma tumor biopsies from a population-based cohort.	210	47323	🔗

methods.R: each method is defined as a function, together with its tuning parameters

Here are the methods we considered:

Method	Software
Bayesian Additive Regression Trees (BART)
Accelerated BART (XBART)
Random Forests
XGBoost
Multivariate Adaptive Regression Splines (MARS)

evaluate.R: evaluates each method (with particular parameters) on each dataset, and reports the CV error and running time (the criteria can be customized)
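The idea of evaluating a method function by CV error and running time can be sketched as follows. The interface `method(x, y, xtest)`, the names `method_lm` and `evaluate_cv`, and the use of mean squared error are assumptions for illustration; `lm()` is only a base-R stand-in for a tree-based learner, and the actual code in methods.R and evaluate.R may look different.

```r
# Hypothetical method interface: fit on (x, y), return predictions for xtest.
# lm() is a base-R stand-in, not one of the benchmarked tree methods.
method_lm <- function(x, y, xtest) {
  fit <- lm(y ~ ., data = data.frame(x, y = y))
  as.numeric(predict(fit, newdata = data.frame(xtest)))
}

# K-fold CV error (mean squared error) and elapsed time for one method
evaluate_cv <- function(method, x, y, nfold = 5) {
  n <- nrow(x)
  folds <- sample(rep(seq_len(nfold), length.out = n))  # random fold labels
  sq_err <- numeric(n)
  elapsed <- system.time({
    for (k in seq_len(nfold)) {
      test <- folds == k
      pred <- method(x[!test, , drop = FALSE], y[!test],
                     x[test, , drop = FALSE])
      sq_err[test] <- (y[test] - pred)^2
    }
  })[["elapsed"]]
  list(cv_error = mean(sq_err), time = elapsed)
}

set.seed(1)
x <- matrix(rnorm(100 * 3), 100, 3)
y <- x[, 1] - 2 * x[, 2] + rnorm(100, sd = 0.1)
res <- evaluate_cv(method_lm, x, y)
str(res)
```

Passing the method as a function keeps the evaluation loop independent of any particular learner, so a different criterion or method can be swapped in without touching the fold logic.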

Shiny App

To display the results interactively, we use a Shiny app; the related files are in the folder benchmark-tree-regressions/

Note that a typical Shiny app is server-based. With the help of shinylive, we deploy the Shiny app on GitHub Pages at https://hohoweiya.xyz/benchmark.tree.regressions/.

🚀 How to Run Locally?

The repository uses renv to manage compatible versions of the R packages. To be safe, it is also better to use the same R version, 4.1.2.

Then one can install the R package dependencies via

> renv::restore()  # Restores packages from the lockfile (if using renv)

Next, you can check the scripts run-local.R and run-action.R.

Due to limited computational resources, run-action.R is set up to take less time to finish.

MyServer@ZJU

Specifically, the current local results were obtained on the Yale McCleary HPC.

$ R
> renv::restore() # restore packages from the lockfile
> system.time({source("run-local.R")})
     user    system   elapsed 
207084.40  56119.33 107226.75

How to Contribute?

Automated Way: open an issue with the title @new-csv DataName DataURL Idx_of_Y

Step by Step: Add a real data

add the data info to lst_real_data in datasets.R, and write out the metadata

real.data.meta = df_data_meta()
saveRDS(real.data.meta, file = "benchmark-tree-regressions/real-data-meta.rds")

run print_to_readme() to update the table in this README.md file
add the preparation step for the data (starting from downloading the data) as a function real_XXX in datasets.R
update the list of real data in benchmark-tree-regressions/choices.real.data.R
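A preparation function in the spirit of real_XXX might look like the sketch below. The name `real_Example`, its arguments, and the returned `list(x, y)` format are hypothetical; the structure expected by datasets.R and lst_real_data is not shown here and may differ. A temporary CSV stands in for a real download so that the example is self-contained.

```r
# Hypothetical preparation function in the spirit of real_XXX.
# `src` may be a URL or a local path; the CSV is assumed to have no header row.
real_Example <- function(src, idx_y) {
  df <- read.csv(src, header = FALSE)
  stopifnot(idx_y >= 1, idx_y <= ncol(df))
  # Split into covariates and response (this return format is an assumption)
  list(x = df[, -idx_y, drop = FALSE], y = df[[idx_y]])
}

# Self-contained check with a temporary CSV instead of a real download
tmp <- tempfile(fileext = ".csv")
write.table(data.frame(a = 1:3, b = c(0.5, 1.5, 2.5), y = c(10, 20, 30)),
            tmp, sep = ",", row.names = FALSE, col.names = FALSE)
dat <- real_Example(tmp, idx_y = 3)
str(dat)
```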

📓 Use the Repo as a Template

The current repository mainly benchmarks various tree-based regressions, but the structure is flexible enough for other tasks. If you find this structure useful, please feel free to use it as a template for your benchmarking analysis via the link 🔗.
