SC Data is a Python package and related tools that use `sparecores-crawler` to pull and standardize data on cloud compute resources. This repository also runs the crawler every 5 minutes to update spot prices, and every hour to update all cloud resources, both in an internal SCD table and in a public SQLite snapshot.
Stable version from PyPI:

```shell
pip install sparecores-data
```

Most recent version from GitHub:

```shell
pip install "sparecores-data @ git+https://git@github.com/SpareCores/sc-data.git"
```
For easy access to the most recent version of the SQLite database file, import the `db` object of the `sc_data` Python package, which runs an updater thread in the background to keep the SQLite file up-to-date:
```python
from sc_data import db

print(db.path)
```
The code above attempts to download the latest version of the database from our public S3 bucket, with a default timeout of 30 seconds (see the configuration options below). On success, `db.path` returns the path of the downloaded temporary file; otherwise it returns the path of a limited version of the database bundled with the package (without pricing information).
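The download-with-fallback behaviour described above can be sketched with the standard library alone. This is a minimal sketch, not the package's implementation: `fetch_latest_db` is a hypothetical helper, and it assumes the database URL serves a single bz2-compressed SQLite file (as the `.db.bz2` suffix of the default URL suggests) that is decompressed into a temporary file.

```python
import bz2
import os
import shutil
import tempfile
import urllib.request
from typing import Optional


def fetch_latest_db(url: str, fallback_path: str, timeout: float = 30) -> str:
    """Hypothetical helper: return the freshest database path available.

    Downloads the bz2-compressed SQLite file at `url`, decompresses it into
    a temporary file and returns that file's path. On any network, timeout
    or decompression error it returns `fallback_path` instead.
    """
    tmp_path: Optional[str] = None
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response, \
                tempfile.NamedTemporaryFile(suffix=".db", delete=False) as tmp:
            tmp_path = tmp.name
            # Stream-decompress the payload instead of loading it into memory.
            with bz2.BZ2File(response) as decompressed:
                shutil.copyfileobj(decompressed, tmp)
        return tmp_path
    except (OSError, EOFError):
        # URLError and socket timeouts are OSError subclasses; bz2 raises
        # OSError on invalid data and EOFError on a truncated stream.
        if tmp_path is not None:
            os.remove(tmp_path)  # do not leave a half-written file behind
        return fallback_path
```

Streaming through `shutil.copyfileobj` keeps memory use flat, and removing the partial temporary file on failure ensures the fallback path is the only thing a caller ever sees after an error.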
To block until the update has completed, wait on the `updated` event:

```python
from sc_data import db

db.updated.wait()
```
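The updater thread and the `updated` event can be illustrated with a small, self-contained pattern. This is a sketch under assumptions, not the package's code: `BackgroundUpdater` and its `refresh` callable are hypothetical names, and it assumes the event is a standard `threading.Event` that is set after the first successful refresh, with later refreshes repeating every `interval` seconds (the package's refresh interval defaults to 600 seconds, see the table below).

```python
import threading
from typing import Callable, Optional


class BackgroundUpdater:
    """Hypothetical sketch of an updater thread exposing an `updated` event.

    Calls `refresh()` immediately and then every `interval` seconds on a
    daemon thread; `updated` is set after the first successful refresh.
    """

    def __init__(self, refresh: Callable[[], str], interval: float = 600):
        self._refresh = refresh
        self._interval = interval
        self._stop = threading.Event()
        self.updated = threading.Event()
        self.path: Optional[str] = None
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self) -> None:
        while not self._stop.is_set():
            try:
                self.path = self._refresh()  # e.g. download a new snapshot
                self.updated.set()
            except Exception:
                pass  # keep the previous path and retry on the next cycle
            # Event.wait doubles as a sleep that stop() can interrupt.
            self._stop.wait(self._interval)

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()
```

Using a daemon thread means the updater never keeps the interpreter alive on exit. If `db.updated` is indeed a standard `threading.Event` (an assumption), `db.updated.wait(timeout=60)` returns `False` instead of blocking indefinitely when no update succeeds within 60 seconds.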
The package comes with the following default parameters, which can be overridden via Python builtins or environment variables:
| Configuration | Description | Default Value | Builtin Name | Environment Variable |
|---|---|---|---|---|
| Initial Database | The file path of the initial database to load | `data/sc-data-priceless.db` | `sc_data_db_path` | `SC_DATA_DB_PATH` |
| Disable Updates | Whether to disable automatic updates | `False` | `sc_data_no_update` | `SC_DATA_NO_UPDATE` |
| Database URL | The URL of the most recent version of the database file | `https://sc-data-public-40e9d310.s3.amazonaws.com/sc-data-all.db.bz2` | `sc_data_db_url` | `SC_DATA_DB_URL` |
| HTTP Timeout | The timeout in seconds for downloading the database file | `30` | `sc_data_http_timeout` | `SC_DATA_HTTP_TIMEOUT` |
| Refresh Interval | The interval in seconds between database updates | `600` | `sc_data_db_refresh_seconds` | `SC_DATA_DB_REFRESH_SECONDS` |
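One way such a builtin-or-environment override can be resolved is sketched below. The lookup order (builtin first, then environment variable, then default) and the accepted truthy strings in `to_bool` are assumptions for illustration, and `resolve_setting` is a hypothetical helper rather than part of `sc_data`. Under this reading, overrides would have to be set before the first `from sc_data import db`, for example `builtins.sc_data_no_update = True` or `SC_DATA_NO_UPDATE` in the environment.

```python
import builtins
import os
from typing import Any, Callable


def resolve_setting(builtin_name: str, env_name: str, default: Any,
                    cast: Callable[[str], Any] = str) -> Any:
    """Hypothetical lookup: builtin, then environment variable, then default."""
    if hasattr(builtins, builtin_name):
        # Builtins are already Python objects, so no casting is needed.
        return getattr(builtins, builtin_name)
    raw = os.environ.get(env_name)
    if raw is not None:
        # Environment variables are always strings; convert to the target type.
        return cast(raw)
    return default


def to_bool(raw: str) -> bool:
    # Assumed parsing rule for flags such as SC_DATA_NO_UPDATE.
    return raw.strip().lower() in {"1", "true", "yes", "on"}


# Example: resolving two of the settings from the table above.
timeout = resolve_setting("sc_data_http_timeout", "SC_DATA_HTTP_TIMEOUT",
                          30, cast=float)
no_update = resolve_setting("sc_data_no_update", "SC_DATA_NO_UPDATE",
                            False, cast=to_bool)
```

Casting only the environment value keeps builtin overrides type-faithful, while string-valued environment variables are converted to the type of the corresponding default.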