Spare Cores Data

Project Status: Beta | Maintenance Status: Active | License: CC-BY-SA 4.0 | NGI Search Open Call 3 beneficiary

SC Data is a Python package with related tools that uses sparecores-crawler to pull and standardize data on cloud compute resources. This repository also runs the crawler on a schedule: every 5 minutes to update spot prices, and every hour to update all cloud resources, writing the results to an internal SCD table and a public SQLite snapshot.

Installation

Stable version from PyPI:

pip install sparecores-data

Most recent version from GitHub:

pip install "sparecores-data @ git+https://git@github.com/SpareCores/sc-data.git"

Usage

For easy access to the most recent version of the SQLite database file, import the db object of the sc_data Python package, which runs an updater thread in the background to keep the SQLite file up-to-date:

from sc_data import db
print(db.path)

This attempts to download the latest version of the database from our public S3 bucket within 30 seconds (see the configuration options below). On success, db.path returns the path of the downloaded tempfile; otherwise, it returns the path of a limited version of the database that is bundled with the package (without pricing information).
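Because db.path points at a regular SQLite file, it can be read with Python's standard sqlite3 module. The following is a minimal sketch that assumes nothing about the schema: the helper name list_tables is hypothetical (it is not part of sc_data), and it only asks SQLite which tables exist. The file is opened in read-only URI mode so the snapshot is never written to.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path):
    """Return the sorted table names found in the SQLite file at db_path."""
    # Open read-only via a file: URI so the snapshot is never modified.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    con = sqlite3.connect(uri, uri=True)
    try:
        rows = con.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        con.close()
    return [name for (name,) in rows]
```

With the package installed, this would be called as list_tables(db.path).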

To block until the update completes, wait on the updated event:

db.updated.wait()
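If blocking indefinitely is undesirable, a timeout may help. This sketch assumes that db.updated behaves like a standard threading.Event (the text above only shows .wait() being called on it). The helper wait_for_update is hypothetical, and it is demonstrated on a plain threading.Event stand-in rather than on the real db object.

```python
import threading


def wait_for_update(event, timeout=None):
    """Block until event is set or timeout seconds pass; return True if set."""
    # threading.Event.wait returns True if the flag was set, False on timeout.
    return event.wait(timeout)


# Stand-in for db.updated, set from a background thread after a short delay.
updated = threading.Event()
threading.Timer(0.05, updated.set).start()

if wait_for_update(updated, timeout=5):
    status = "fresh"      # the event was set before the timeout
else:
    status = "fallback"   # timed out; the caller keeps whatever file it has
```

Under the same assumption, wait_for_update(db.updated, timeout=60) would return False instead of hanging when the download never finishes.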

Configuration

The package comes with the following set of default parameters, which can be overridden by builtins or environment variables:

| Configuration | Description | Default Value | Builtin Name | Environment Variable |
|---|---|---|---|---|
| Initial Database | The file path of the initial database to load | data/sc-data-priceless.db | sc_data_db_path | SC_DATA_DB_PATH |
| Disable Updates | Whether to disable automatic updates | False | sc_data_no_update | SC_DATA_NO_UPDATE |
| Database URL | The URL of the most recent version of the database file | https://sc-data-public-40e9d310.s3.amazonaws.com/sc-data-all.db.bz2 | sc_data_db_url | SC_DATA_DB_URL |
| HTTP Timeout | The timeout in seconds for downloading the database file | 30 | sc_data_http_timeout | SC_DATA_HTTP_TIMEOUT |
| Refresh Interval | The interval in seconds to update the database | 600 | sc_data_db_refresh_seconds | SC_DATA_DB_REFRESH_SECONDS |
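The two override mechanisms listed above might be used as follows. This is a sketch under one stated assumption: that the package reads its configuration when sc_data is first imported, so the overrides need to be in place before the import. The values 60 and 1800 are arbitrary examples, and the import itself is left as a comment so the snippet runs without the package installed.

```python
import builtins
import os

# Environment variables are always strings; set them before importing sc_data
# (assumption: the configuration is read at import time).
os.environ["SC_DATA_HTTP_TIMEOUT"] = "60"

# Builtin override, using the builtin name listed for the refresh interval.
builtins.sc_data_db_refresh_seconds = 1800

# Import only after the overrides are in place:
# from sc_data import db
```

Which mechanism takes precedence when both are set for the same option is not stated here, so the sketch sets each option through only one of them.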
