GenAI Engineer | Full-Stack Developer | Open Source Contributor
I'm currently building AI infrastructure at Spheron Network, where I spend my days architecting scalable inference systems and creating AI applications that actually work in production. When I'm not wrestling with distributed systems or optimizing GPU clusters, you'll find me contributing to open source projects or organizing tech meetups (like that time I got 100+ Reddit users together for an IMAX screening!).
Right now, I'm deep in some fascinating problems around making AI actually work at scale. Here's what's keeping me busy:
- AI Infrastructure: Designing systems that can handle massive AI workloads without crashing (those NVIDIA B200/H200 clusters don't manage themselves!)
- RAG Applications: Building secure, multi-tenant systems where different users can query their own documents without seeing each other's data
- LLM Deployment: Creating platforms that make deploying large language models less of a nightmare for other developers
- Community Building: Bringing developers together because the best solutions happen when smart people collaborate
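The multi-tenant RAG point above boils down to one invariant: every chunk is tagged with its owner, and retrieval is scoped to that owner before any search happens. A minimal sketch of that idea (all names hypothetical, substring match standing in for vector search):

```python
from collections import defaultdict

# Hypothetical sketch: a per-tenant document store where every chunk is
# keyed by tenant_id, so a query only ever searches its own tenant's data.
class TenantScopedStore:
    def __init__(self):
        self._chunks = defaultdict(list)  # tenant_id -> list of chunk texts

    def add(self, tenant_id: str, text: str) -> None:
        self._chunks[tenant_id].append(text)

    def query(self, tenant_id: str, term: str) -> list[str]:
        # Scoping happens *before* the search: another tenant's documents
        # are never in the candidate set, so they can't leak into results.
        return [t for t in self._chunks[tenant_id] if term.lower() in t.lower()]

store = TenantScopedStore()
store.add("acme", "Acme's Q3 revenue report")
store.add("globex", "Globex merger memo")

print(store.query("acme", "revenue"))   # only Acme's documents
print(store.query("globex", "revenue")) # empty: no cross-tenant leakage
```

In a real deployment the same scoping is usually done with a metadata filter on the vector database rather than separate stores, but the invariant is identical.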
Primary Languages: JavaScript, TypeScript, Python, C++
Emerging Tech: Rust, Go
Web Technologies: RESTful APIs, WebSockets
Frontend: Next.js, React
Backend: FastAPI, Express.js, Node.js
LLM Platforms: Hugging Face, LiteLLM, OpenAI
AI Frameworks: RAG pipelines, Vector databases
Model Deployment: LoRA fine-tuning, Automated inference
Hardware Optimization: NVIDIA B200/H200 systems
Cloud Platforms: Google Cloud Platform, AWS
Containerization: Docker, Kubernetes, Helm
Monitoring: Prometheus, Custom dashboards
CI/CD: GitHub Actions, Automated pipelines
NoSQL: MongoDB, Scylla
SQL: PostgreSQL
Caching: Redis
Cloud Storage: S3 architecture, Secure artifact storage
LLMizer - LLM Deployment Orchestration Platform
Ever tried deploying a large language model and ended up with a mess of scripts and configs? Yeah, me too. That's why I built this.
What makes it actually useful:
- FastAPI Backend: Handles the heavy lifting for model management and inference
- Next.js Dashboard: Real-time monitoring that doesn't make your eyes bleed
- LiteLLM Integration: One interface to rule them all (OpenAI, Anthropic, you name it)
- Actually Works in Production: Because demos are easy, production is hard
Tech Stack: FastAPI, Next.js, LiteLLM, Docker, Kubernetes
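The "one interface" idea is that callers pass a provider-prefixed model name and never touch provider-specific clients. Here's a hypothetical sketch of that routing pattern (the backends are stand-ins, not the real LiteLLM API):

```python
from typing import Callable

# Stand-in backends; in the real platform these would be provider SDK calls.
def _openai_backend(model: str, prompt: str) -> str:
    return f"[openai:{model}] {prompt}"

def _anthropic_backend(model: str, prompt: str) -> str:
    return f"[anthropic:{model}] {prompt}"

PROVIDERS: dict[str, Callable[[str, str], str]] = {
    "openai": _openai_backend,
    "anthropic": _anthropic_backend,
}

def complete(model: str, prompt: str) -> str:
    # "openai/gpt-4o" -> provider "openai", model "gpt-4o"
    provider, _, name = model.partition("/")
    if provider not in PROVIDERS:
        raise ValueError(f"unknown provider: {provider}")
    return PROVIDERS[provider](name, prompt)

print(complete("openai/gpt-4o", "hello"))
print(complete("anthropic/claude-3-haiku", "hello"))
```

Swapping providers becomes a string change in config rather than a code change, which is most of what makes deployment less of a nightmare.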
Audio Craft - Professional Text-to-Speech Library
Turns out Google's Gemini API can generate surprisingly good audio. So I wrapped it in a Python library that doesn't suck.
Why you might care:
- Async-First Design: Won't block your entire app while generating audio
- Multi-Language Support: 24 languages because English isn't the only language
- Redis Queue System: For when you need to generate thousands of audio files without melting your server
- Multi-Speaker Dialogue: Create conversations that sound like actual different people talking
Tech Stack: Python, Google Gemini API, Asyncio, Redis
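The async-first queue pattern above looks roughly like this sketch: a pool of workers drains a job queue concurrently so callers never block. Here `asyncio.Queue` stands in for Redis and `synth()` stands in for the Gemini call:

```python
import asyncio

# Stand-in for the Gemini TTS call: awaits briefly, returns a filename.
async def synth(job_id: int, text: str) -> str:
    await asyncio.sleep(0.01)  # simulated network latency
    return f"audio_{job_id}.wav"

async def worker(queue: asyncio.Queue, results: list[str]) -> None:
    while True:
        job = await queue.get()
        results.append(await synth(*job))
        queue.task_done()

async def main() -> list[str]:
    queue: asyncio.Queue = asyncio.Queue()
    results: list[str] = []
    # Four workers drain the queue concurrently instead of one blocking loop.
    workers = [asyncio.create_task(worker(queue, results)) for _ in range(4)]
    for i, text in enumerate(["hi", "hola", "bonjour"]):
        queue.put_nowait((i, text))
    await queue.join()  # wait until every job is done
    for w in workers:
        w.cancel()
    return results

audio_files = sorted(asyncio.run(main()))
print(audio_files)
```

With Redis in place of the in-process queue, the same shape survives restarts and lets you scale workers across machines, which is what keeps thousands of jobs from melting one server.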
HLS Microservice Backend - Video Processing Pipeline
Video streaming is harder than it looks. You can't just throw a .mp4 file at users and hope their internet can handle it. This converts videos to HLS format so they stream smoothly.
The interesting bits:
- Microservice Architecture: Each service does one thing well (and can be debugged separately when things break)
- Queue-Based Processing: RabbitMQ handles the video conversion queue because FFmpeg takes forever
- Kubernetes Ready: Helm charts included because manually deploying microservices is a special kind of hell
- RESTful API: Upload, process, track status - the basics that actually work
Tech Stack: TypeScript, Express.js, MongoDB, RabbitMQ, FFmpeg, Docker, Kubernetes
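The conversion step amounts to handing FFmpeg an upload and asking for a segmented HLS playlist. An illustrative sketch of the invocation the queue workers build (paths and segment length are examples; the real pipeline also emits multiple bitrate renditions):

```python
# Build the FFmpeg argument list for a single-rendition HLS conversion.
def hls_command(src: str, out_dir: str, segment_seconds: int = 6) -> list[str]:
    return [
        "ffmpeg", "-i", src,
        "-c:v", "h264", "-c:a", "aac",              # web-friendly codecs
        "-hls_time", str(segment_seconds),          # target segment length
        "-hls_playlist_type", "vod",                # write the full playlist
        "-hls_segment_filename", f"{out_dir}/seg_%03d.ts",
        f"{out_dir}/index.m3u8",                    # the playlist players fetch
    ]

cmd = hls_command("upload.mp4", "/tmp/out")
print(" ".join(cmd))
```

Short segments are why HLS streams smoothly: the player fetches a few seconds at a time and can switch quality between segments instead of choking on one giant .mp4.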
Plex-rclone - Cloud Media Server Solution
Have terabytes of media stored in the cloud but want to stream it through Plex? This Docker image has you covered.
What it solves:
- Secure Cloud Mounting: Safely connects to Google Drive, Dropbox, whatever you use
- Streaming Optimized: Configured so your movies don't buffer every 30 seconds
- Flexible Setup: Works with pretty much any cloud storage provider rclone supports
Tech Stack: Docker, Plex, rclone, Linux, Shell Scripting
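The "streaming optimized" part mostly comes down to the rclone mount flags the container sets up before Plex starts. A sketch of that invocation, built in Python for illustration (remote name and values are examples, not the image's exact defaults):

```python
# Assemble a streaming-tuned rclone mount command; the real image runs
# the equivalent from its startup shell script.
def mount_command(remote: str, mountpoint: str) -> list[str]:
    return [
        "rclone", "mount", f"{remote}:", mountpoint,
        "--vfs-cache-mode", "writes",   # cache writes locally, stream reads
        "--buffer-size", "64M",         # read-ahead so playback doesn't stall
        "--dir-cache-time", "72h",      # cheap directory listings for scans
        "--daemon",
    ]

mount_cmd = mount_command("gdrive", "/media")
print(" ".join(mount_cmd))
```

The buffer and cache settings are the levers to tune per provider; a slow remote wants a bigger read-ahead buffer, a rate-limited one wants longer directory caching.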
I've been lucky to learn from amazing open source projects, so I try to give back when I can. Here are some projects where my PRs actually got merged:
Helicone AI Gateway - Rust, Tokio
- Enhanced Error Handling: Replaced generic error types with specific variants for precise debugging
- Code Quality Improvements: Led major refactoring to decompose monolithic functions
- Code Deduplication: Eliminated 100+ lines of duplicated code by creating reusable abstractions
Arsky Project - TypeScript Migration
- TypeScript Migration: Improved type safety and developer experience
- Code Standards: Implemented ESLint and Prettier for consistent code quality
- CI/CD Enhancement: Upgraded deployment pipeline with GitHub Actions
Spheron CLI - Node.js, TypeScript
- Developer Experience: Updated TypeScript definitions for better IDE support
- Configuration Accuracy: Fixed deployment script configurations
Gdu - Go
- Reliability: Fixed potential nil pointer crashes in file handling
- Defensive Programming: Added proper nil checks for critical variables
- Reddit VP Recognition: Acknowledged by Reddit's VP of International Growth for successfully organizing the Interstellar IMAX community event for 100+ users
- Payment Processing: Handled ₹2 lakh in transactions through secure Razorpay integration
- Academic Excellence: Graduated with 8.9 CGPA in Computer Science Engineering
- Production Impact: Built systems handling high-scale AI inference and real-time user interactions
Here's the thing about building software: anyone can make something work on their laptop. The real challenge (and the fun part) is making it work for thousands of users without breaking, scaling gracefully, and actually solving real problems people have.
What really gets me going:
- Making AI Less Intimidating: Too many great AI tools are stuck behind complex setups. I want to change that.
- Building Community: Some of my best learning has happened in Slack channels and Discord servers with other developers. I try to create those spaces.
- Code That Doesn't Break at 3 AM: Because nobody wants to debug a production issue during their weekend (been there, done that).
- Open Source: If I figure out something useful, why not share it? The best developers I know learned by reading other people's code.
Always up for a good conversation about tech, AI, or that weird bug you've been chasing for three days. Seriously, hit me up.
- Email: shivambansal.in30@gmail.com
- GitHub: You're already here! Feel free to poke around my repos
- Phone: +91 9953165877
Fair warning: I might get excited and talk your ear off about whatever I'm currently building.