This project tracks new courses on the MIT OCW platform.
It checks for the latest 10 courses across all departments and stores
their general information in the database.
We also provide a GraphQL API to query the stored courses, and filter through them.
More details on how to use the application follow.
To set up and use the project, please refer to this document.
As you can see in the graph, there are three main components, one message bus, and one database.
Based on a cron schedule (e.g. midnight), a new job is run to scrape OCW.mit.edu and check for new courses.
For each new course, a new message is published on the message bus containing course data.
more details document
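
The publishing step described above can be sketched as follows. This is a minimal illustration, not the project's actual scraper: Python's in-process `queue.Queue` stands in for the real message bus, and `publish_new_courses`, the `id` and `title` fields, and the sample courses are all hypothetical.

```python
import json
from queue import Queue


def publish_new_courses(scraped, known_ids, bus):
    """Publish one JSON message per course whose id has not been seen before."""
    published = []
    for course in scraped:
        if course["id"] in known_ids:
            continue  # already tracked, nothing to announce
        bus.put(json.dumps(course))  # one message per new course
        known_ids.add(course["id"])
        published.append(course["id"])
    return published


# Hypothetical data standing in for a scrape result.
bus = Queue()  # assumption: stand-in for the real message bus
known = {"6.001"}
scraped = [
    {"id": "6.001", "title": "Structure and Interpretation of Computer Programs"},
    {"id": "18.06", "title": "Linear Algebra"},
]
new_ids = publish_new_courses(scraped, known, bus)
```

Only the course that is not already known ends up on the bus, so downstream workers never see duplicates from repeated runs of the job.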
These are workers that listen on the message bus for new messages.
For each new message, they attempt to parse the content into a structured form.
The result is then stored in a SQL database.
more details document
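
A rough sketch of the parse-and-store step might look like this. It assumes messages are JSON strings and uses an in-memory SQLite database in place of the project's SQL database; the `Course` fields, the `courses` table, and the function names are hypothetical and chosen only for illustration.

```python
import json
import sqlite3
from dataclasses import dataclass


@dataclass
class Course:
    course_id: str
    title: str
    department: str


def parse_message(raw: str) -> Course:
    """Turn a raw JSON message into a structured Course record."""
    data = json.loads(raw)
    return Course(
        course_id=data["id"].strip(),
        title=data["title"].strip(),
        department=data.get("department", "unknown").strip(),
    )


def store_course(conn: sqlite3.Connection, course: Course) -> None:
    """Insert the course, replacing any existing row with the same id."""
    conn.execute(
        "INSERT OR REPLACE INTO courses (course_id, title, department) "
        "VALUES (?, ?, ?)",
        (course.course_id, course.title, course.department),
    )
    conn.commit()


# Assumption: an in-memory SQLite database stands in for the real one.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE courses (course_id TEXT PRIMARY KEY, title TEXT, department TEXT)"
)
message = '{"id": "18.06", "title": " Linear Algebra ", "department": "Mathematics"}'
store_course(conn, parse_message(message))
rows = conn.execute("SELECT course_id, title, department FROM courses").fetchall()
```

Keeping parsing and storage in separate functions means a malformed message fails in `parse_message` before anything touches the database.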
The user interface: a RESTful API that users can query, providing a basic, direct interface into the database. This component is mainly a search engine over the database.
more details document
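
To illustrate the kind of filtered search this component performs, here is a minimal sketch of a query helper. It is not the project's API code: the `courses` table, its columns, the `search_courses` function, and the sample rows are assumptions, and SQLite is used only to keep the example self-contained.

```python
import sqlite3


def search_courses(conn, title_contains=None, department=None, limit=10):
    """Return (course_id, title) rows matching the optional filters."""
    clauses, params = [], []
    if title_contains:
        clauses.append("title LIKE ?")
        params.append(f"%{title_contains}%")
    if department:
        clauses.append("department = ?")
        params.append(department)
    # Only fixed clause text is interpolated; user values stay parameterized.
    where = f"WHERE {' AND '.join(clauses)}" if clauses else ""
    sql = f"SELECT course_id, title FROM courses {where} ORDER BY course_id LIMIT ?"
    return conn.execute(sql, (*params, limit)).fetchall()


# Hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE courses (course_id TEXT PRIMARY KEY, title TEXT, department TEXT)"
)
conn.executemany(
    "INSERT INTO courses VALUES (?, ?, ?)",
    [
        ("18.06", "Linear Algebra", "Mathematics"),
        ("6.001", "Structure and Interpretation of Computer Programs", "EECS"),
        ("6.006", "Introduction to Algorithms", "EECS"),
    ],
)
by_title = search_courses(conn, title_contains="Algo")
by_department = search_courses(conn, department="Mathematics")
```

Each filter is optional, so the same helper can serve both an unfiltered listing and a narrowed search.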
- Improve the database layer abstraction
- Parser instances should be spawned only when needed, with one container per message
- Currently, a scraper container is spawned when a `make up` is performed. It should only be spawned at the correct cron intervals.
- Improve the GraphQL capabilities