Cricket Match Outcome Prediction

This project investigates whether the winner of a cricket match can be predicted while the second innings is in progress. It applies several machine learning methods, including shrinkage strategies, to produce predictions that captains, coaches, and broadcasting teams could use to inform strategic decisions during live matches. This repository contains only the code for the web application.

Motivation

Real-time winning predictions during cricket matches keep the broadcast engaging for viewers. This inspired us to create a model that predicts the winner as the second innings progresses, allowing viewers to better understand the evolving match dynamics. For team management, this tool can offer insights to improve on-field decision-making. The availability of ball-by-ball datasets for public use further motivated us to utilize this data to develop and compare different machine learning models.

Objectives

The project aims to achieve the following:

Identify key factors that influence the outcome of a match during the second innings.
Develop machine learning models based on these predictors.
Deploy the models in a web-based dashboard to predict match outcomes in real time.

Data Modeling and Preprocessing

Dataset Overview

The dataset, sourced from Kaggle (Jamie Welsh), includes ball-by-ball data from T20 cricket matches between 2005 and 2023. The dataset contains 425,119 records with 34 attributes describing each delivery. After filtering for second-innings data, 200,304 observations were used in the analysis.

Problem Definition

The prediction task is modeled as a classification problem, where the target variable (Chased Successfully) is binary:

0: Chasing team lost.
1: Chasing team won.
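
As an illustration of this labeling, the sketch below derives the binary target from a finished chase. It assumes the usual convention that the target is the first-innings total plus one, so the chase succeeds once the chasing team's score reaches the target. The function and parameter names are hypothetical and are not taken from the project code.

```python
def chased_successfully(final_chasing_score: int, target_score: int) -> int:
    """Return 1 if the chasing team reached the target, otherwise 0.

    Assumption: target_score is the first-innings total plus one.
    """
    return 1 if final_chasing_score >= target_score else 0


# Hypothetical finished matches: (final chasing score, target score).
matches = [(161, 160), (148, 160), (160, 160)]
labels = [chased_successfully(score, target) for score, target in matches]
print(labels)
```

In a ball-by-ball dataset, every second-innings delivery of a match would presumably carry that match's label, so the classifier sees the same target at each stage of the chase.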

Predictor Selection

Thirteen key predictors were chosen based on their practical relevance to the match outcome:

Runs Required
Balls Remaining
Current Score
Balls Delivered
Wickets Remaining
Target Score
Current Run Rate (CRR)
Required Run Rate (RRR)
Striker's Score
Balls Faced by Striker
Non-Striker's Score
Balls Faced by Non-Striker
Runs Conceded by Bowler

Feature Engineering

Derived predictors such as Balls Delivered, Wickets Remaining, CRR, and RRR were included. Missing values in CRR and RRR were imputed with their mean and maximum values, respectively.
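
The derived predictors and the imputation rule can be sketched with Pandas as follows. This is a minimal illustration, not the project's actual preprocessing: the column names, the `add_derived_features` helper, and the assumption of a full 120-ball innings with 10 wickets are all hypothetical. Here CRR is undefined before the first ball and RRR is undefined once no balls remain, which is one plausible way the missing values arise.

```python
import numpy as np
import pandas as pd


def add_derived_features(df, total_balls=120, total_wickets=10):
    """Add Balls Delivered, Wickets Remaining, CRR, and RRR (hypothetical column names)."""
    out = df.copy()
    out["balls_delivered"] = total_balls - out["balls_remaining"]
    out["wickets_remaining"] = total_wickets - out["wickets_fallen"]

    overs_bowled = out["balls_delivered"] / 6
    overs_left = out["balls_remaining"] / 6

    # Division by zero yields NaN or inf; treat both as missing values.
    crr = (out["current_score"] / overs_bowled).replace([np.inf, -np.inf], np.nan)
    rrr = (out["runs_required"] / overs_left).replace([np.inf, -np.inf], np.nan)

    # Impute CRR with its mean and RRR with its maximum, as described above.
    out["crr"] = crr.fillna(crr.mean())
    out["rrr"] = rrr.fillna(rrr.max())
    return out


# Made-up match states: start of the chase, halfway, and after the final ball.
sample = pd.DataFrame(
    {
        "balls_remaining": [120, 60, 0],
        "current_score": [0, 80, 150],
        "runs_required": [160, 80, 10],
        "wickets_fallen": [0, 2, 7],
    }
)
features = add_derived_features(sample)
print(features[["balls_delivered", "wickets_remaining", "crr", "rrr"]])
```

Imputing RRR with the maximum treats a state with no balls left as the hardest possible chase, whereas the mean is a neutral fill for CRR at the very start of the innings.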

Tools and Libraries

The following Python libraries were used:

NumPy: For numerical operations.
Pandas: For data manipulation.
Scikit-learn: For machine learning model development.
Streamlit: For building the web-based dashboard.

Summary of Results

The table below summarizes the performance of each classifier based on accuracy when applied to the test dataset:

Classifier	Accuracy (%)
Logistic Regression	81.89
Penalized Logistic Regression (LASSO)	81.89
Shrinkage Estimation	81.94
Positive Shrinkage Estimation	81.94
Linear Shrinkage Estimation	81.89
Pretest Estimation	81.96
Shrinkage Pretest Estimation	81.96
Gradient Boosting Machine	91.66

The Gradient Boosting Machine performed best, with a test accuracy of 91.66%. The other classifiers were nearly indistinguishable from one another, with accuracies between 81.89% and 81.96%.
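
A comparison of this kind can be sketched with Scikit-learn as shown below. The data here is synthetic and generated only so the example runs on its own, so the printed accuracies will not match the table above. The choice of `GradientBoostingClassifier`, the 80/20 split, and the feature subset are assumptions, and the shrinkage and pretest estimators are not reproduced.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for second-innings match states (not the Kaggle data).
rng = np.random.default_rng(0)
n = 2000
runs_required = rng.integers(1, 200, size=n)
balls_remaining = rng.integers(1, 121, size=n)
wickets_remaining = rng.integers(1, 11, size=n)
rrr = 6.0 * runs_required / balls_remaining

# Made-up labeling rule: chases with a low required rate and wickets in hand succeed.
noise = rng.normal(0.0, 2.0, size=n)
y = (rrr < 4.0 + 0.8 * wickets_remaining + noise).astype(int)
X = np.column_stack([runs_required, balls_remaining, wickets_remaining, rrr])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

models = {
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "Gradient Boosting Machine": GradientBoostingClassifier(random_state=0),
}

# Fit each model and report test accuracy as a percentage.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    results[name] = 100.0 * accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: {results[name]:.2f}")
```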

Technologies Used

Python: Programming language used for model development.
NumPy: Numerical computation.
Pandas: Data manipulation and preprocessing.
Scikit-learn: Machine learning library.
Streamlit: Framework for building the web-based dashboard.

Usage

The Streamlit app is available in the Web Application folder, and the model notebook can be found in the Model folder. You can access the app by visiting https://match-predictor.onrender.com/.

pranathlcp/ml-methods-with-shrinkage-strategies
