Abhishek Shah Portfolio

About Me

Senior Data Scientist

I am a technology leader with 13+ years of experience across process engineering, data analytics, and enterprise AI, specializing in production-grade ML systems, LLM applications, and cloud-native AI solutions. I have a proven track record of leading cross-functional teams to deliver business impact through advanced analytics and machine learning. Outside of work, I enjoy hiking and spending time outdoors, finding inspiration in nature's complexity and beauty.

Experience

2024 - Present

Data Scientist - BMW

Led the development and training of an enterprise LLM platform serving 100,000+ users.
Spearheaded responsible AI use cases aligned with strategic objectives, increasing business value and improving customer outcomes.
Collaborated with executive leadership and cross-functional teams to deploy transformative AI solutions across BMW’s global manufacturing network.

2019 - 2024

Data Scientist - Intel

Built a RAG pipeline with OpenAI and a vector database to automate statement-of-work (SOW) analysis for 45 engineers, saving each engineer 10 hours/week and cutting costs by $100K/month.
Developed an ML model for FOUP (front-opening unified pod) defect detection using Random Forest and PCA, optimized with OpenVINO, reducing scrapped wafers and saving $250K/year.
Implemented AI-driven quality control at Audi, analyzing data from 900 welding robots to enable real-time inspection of 5M+ welds and cutting labor costs by 50%.
Built a Power BI dashboard with Azure Data Factory (ADF) to track 4,000+ pipelines, enabling early failure detection and saving the analytics team 1 hour/day of troubleshooting.

2017 - 2019

Data Analytics Process Engineer - Westlake Chemical

Used data analytics and statistical process control to design safeguards for brine recovery, reducing raw material usage by 20%.
Conducted a feasibility analysis for a $4M chlorine liquefaction system, selecting an environmentally friendly alternative that increased throughput by 15%.
Applied predictive modeling to redesign acid discharge pump systems, reducing acid consumption by 25% and improving efficiency.
Developed a safe work procedure for HCl burner sampling, saving $20K annually in lab consultation costs.
Led hazard and operability (HAZOP) studies, using big data to assess the safety and environmental impact of process changes.
Managed three plant shutdown projects, using data analytics to optimize schedules and shorten timelines by 5%.

2012 - 2017

Staff Data Analytics Process Engineer & Project Manager - Tate & Lyle

Managed $5M+ capital projects, using data analytics for research, construction planning, and cost estimation.
Developed key performance indicators (KPIs) to monitor plant efficiency, improving production planning and reducing downtime.
Created a real-time dashboard using PI ProcessBook for enhanced process visibility and troubleshooting.
Led process hazard analysis and risk mitigation strategies to ensure safe equipment installation and operation.
Supervised up to 20 contractors during plant outages, optimizing resource allocation and minimizing costs through data analytics.

Education

2022 - 2025

M.S. in Artificial Intelligence & Machine Learning - University of Michigan

Concentration: NLP | Relevant Courses: Advanced Deep Learning, ML Engineering, Cloud AI Systems

2018 - 2020

M.S. in Engineering and Technology Management - Washington State University

Focus: Data-Driven Decision Making, Technology Strategy

2007 - 2012

Bachelor of Science in Chemical Engineering - University of Minnesota

Tech Stack

AI/ML

TensorFlow, PyTorch, HuggingFace, LangChain, OpenAI GPT, Claude

Cloud & DevOps

AWS SageMaker, Azure ML, Kubernetes, Docker, Airflow, Terraform

Data Engineering

Spark, Snowflake, Databricks, SQL, NoSQL, DataRobot, MLflow

Latest Projects

Local RAG with DeepSeek and Ollama

A Streamlit app for AI-driven search, analysis, and trend comparison of SEC filings (10-K, 10-Q, 8-K). It combines a locally served DeepSeek model (via Ollama) with OpenAI embeddings and Pinecone for retrieval-augmented generation (RAG).
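A RAG app like this one has a retrieval step: it ranks stored chunk embeddings by how similar they are to the query embedding. The minimal pure-Python sketch below assumes the embeddings have already been computed. In the project they come from OpenAI embeddings stored in Pinecone. The toy 3-dimensional vectors, the chunk names, and the `top_k_chunks` helper are hypothetical and are not taken from the project's code.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(query_vec, chunk_vecs, k=2):
    """Hypothetical helper: (chunk_id, score) pairs for the k chunks most similar to the query."""
    scored = [(cid, cosine_similarity(query_vec, vec)) for cid, vec in chunk_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings for filing chunks (real embeddings have far more dimensions)
chunks = {
    "10-K risk factors": [0.9, 0.1, 0.0],
    "10-Q revenue table": [0.1, 0.9, 0.2],
    "8-K executive change": [0.0, 0.2, 0.9],
}
query = [0.2, 0.8, 0.1]  # e.g. an embedded question about quarterly revenue

for chunk_id, score in top_k_chunks(query, chunks):
    print(f"{chunk_id}: {score:.3f}")
```

The top-ranked chunks would then be inserted into the prompt sent to the locally served model. A vector database such as Pinecone replaces this linear scan with an indexed nearest-neighbor search.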

LangGraph Multi-Agent AI Travel Planner

A multi-agent travel planning system built with LangGraph, Google Gemini 2.0 Flash, and DuckDuckGo Search.

Customer Service Using a Multi-Agent Swarm

This project implements a multi-agent system (Agent Swarm) that processes user requests through specialized agents working together.

Agentic RAG Financial Analysis AI Assistant

This project uses the Agno agentic framework to build AI agents that fetch web search results and financial data and then analyze them.

Customer Support Intelligence with NLP and Gemini AI

An interactive Streamlit app that uses NLP embeddings and Gemini AI to analyze and classify customer support issues and surface insights from them.

RAG with a Neo4j Knowledge Graph and OpenAI

This project implements a high-performance NLP pipeline for scientific document analysis, integrating a Neo4j knowledge graph for structured storage and retrieval. The Retrieval-Augmented Generation (RAG) system enables semantic search and contextual querying of scientific literature.

Legal Document Search Using NLP

A Streamlit-based legal document search system using PySpark, BM25, and TF-IDF for efficient full-text retrieval. It preprocesses legal texts, builds an inverted index, and ranks results dynamically, enabling fast, AI-powered search and analysis.
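The indexing and ranking steps can be illustrated with a small in-memory BM25 scorer built on an inverted index. This is a minimal single-machine sketch, not the project's PySpark pipeline. The simple regex tokenizer, the sample clauses, the Lucene-style non-negative IDF, and the common defaults k1 = 1.5 and b = 0.75 are all assumptions.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase and split on non-alphanumeric characters (a deliberately simple preprocessor)."""
    return re.findall(r"[a-z0-9]+", text.lower())


class BM25Index:
    def __init__(self, docs: list[str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.doc_lens: list[int] = []
        # Inverted index: term -> {doc_id: term frequency in that document}
        self.postings: dict[str, dict[int, int]] = defaultdict(dict)
        for doc_id, text in enumerate(docs):
            tokens = tokenize(text)
            self.doc_lens.append(len(tokens))
            for term, tf in Counter(tokens).items():
                self.postings[term][doc_id] = tf
        self.n_docs = len(docs)
        self.avg_len = sum(self.doc_lens) / self.n_docs if self.n_docs else 0.0

    def idf(self, term: str) -> float:
        # Non-negative BM25 IDF variant (the form used by Lucene)
        df = len(self.postings.get(term, {}))
        return math.log((self.n_docs - df + 0.5) / (df + 0.5) + 1)

    def search(self, query: str, top_n: int = 3) -> list[tuple[int, float]]:
        scores: dict[int, float] = defaultdict(float)
        for term in tokenize(query):
            idf = self.idf(term)
            # Only documents in the term's posting list can receive a score for it
            for doc_id, tf in self.postings.get(term, {}).items():
                length_norm = 1 - self.b + self.b * self.doc_lens[doc_id] / self.avg_len
                scores[doc_id] += idf * tf * (self.k1 + 1) / (tf + self.k1 * length_norm)
        return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top_n]


docs = [
    "The lessee shall pay rent on the first day of each month.",
    "This agreement may be terminated by either party with written notice.",
    "Termination of the lease requires thirty days written notice to the lessee.",
]
index = BM25Index(docs)
for doc_id, score in index.search("written notice termination"):
    print(doc_id, round(score, 3))
```

The inverted index means a query touches only the posting lists of its own terms instead of every document. In a distributed version, those postings would typically come from a word count over the corpus, and the scoring formula would stay the same.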

Lyft Dynamic Pricing

A Streamlit-based Lyft Trip Cost Predictor using Linear Regression, scikit-learn, pandas, and joblib to estimate ride prices based on distance, time, and peak hours. 🚕 Optimized for dynamic pricing analysis and real-time cost estimation.
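The model behind such a predictor is ordinary least squares: fare ≈ w0 + w1·distance + w2·duration + w3·peak. The dependency-free sketch below fits that model by solving the normal equations. Several details are assumptions for illustration: the feature encoding (miles, minutes, and a 0/1 peak-hour flag), the trip data, and synthetic fares generated from a known formula. The project itself relies on scikit-learn rather than a hand-rolled solver like this one.

```python
def solve(A, c):
    """Solve A w = c by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [ci] for row, ci in zip(A, c)]  # augmented matrix [A | c]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    w = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        w[i] = (M[i][n] - sum(M[i][k] * w[k] for k in range(i + 1, n))) / M[i][i]
    return w


def fit_linear_regression(X, y):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y, with an intercept."""
    rows = [[1.0] + list(x) for x in X]  # constant column for the intercept w0
    n_params = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(n_params)] for i in range(n_params)]
    c = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(n_params)]
    return solve(A, c)


def predict(w, x):
    """Fare estimate: w0 + w1*distance + w2*duration + w3*peak."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))


# Hypothetical trips: (distance in miles, duration in minutes, peak-hour flag 0/1)
trips = [(1, 5, 0), (2, 12, 0), (3, 10, 1), (5, 20, 0), (4, 15, 1), (8, 25, 1)]
# Synthetic fares from a known formula, so the fitted weights can be checked
fares = [2.0 + 1.5 * d + 0.25 * t + 3.0 * peak for d, t, peak in trips]

w = fit_linear_regression(trips, fares)
print([round(v, 3) for v in w])          # ≈ [2.0, 1.5, 0.25, 3.0]
print(round(predict(w, (6, 18, 1)), 2))  # 2 + 9 + 4.5 + 3 = 18.5
```

With scikit-learn, the equivalent fit is `LinearRegression().fit(X, y)`. The fitted model exposes the intercept as `intercept_` and the weights as `coef_`.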

Financial Doc Analyser

AI-powered SEC filing analysis using OpenAI embeddings, Pinecone, and Streamlit for fast, structured search. Retrieve, process, and analyze 10-K, 10-Q, and 8-K filings with instant, context-aware insights.

Credit Risk Modeling

A Streamlit-based loan default prediction app using XGBoost, Logistic Regression, Pandas, and NumPy. It analyzes loan characteristics and predicts default risk in real time based on user inputs.

Fake Image/Video Detector Using Deep Learning and Gemini

The AI Fake Image & Video Detector is a Streamlit-based application that identifies whether images or videos are AI-generated or authentic. Using deep learning and Gemini, it helps detect synthetic media created by popular generative models such as DALL-E, Midjourney, Stable Diffusion, and others.

Stock Market Analysis

A stock market prediction system that uses RNN, LSTM, GRU, DNN, KNN, and Random Forest models to forecast next-day closing prices. Built with Yahoo Finance data, Pandas, Scikit-learn, and Matplotlib, it compares the accuracy of the deep learning and machine learning models.
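Every model in such a comparison needs the same supervised framing: a window of past closing prices as features and the next day's close as the target. The minimal sketch below assumes a 3-day window and uses hypothetical prices, whereas the project pulls real data from Yahoo Finance. The persistence baseline (tomorrow's close equals today's close) is an added reference point and is not part of the project description.

```python
def make_supervised_windows(closes: list[float], window: int = 3):
    """Turn a price series into (previous `window` closes -> next-day close) training pairs."""
    X, y = [], []
    for i in range(len(closes) - window):
        X.append(closes[i:i + window])  # features: the previous `window` closing prices
        y.append(closes[i + window])    # target: the following day's close
    return X, y


def persistence_mae(X, y):
    """Mean absolute error of the naive forecast: tomorrow's close = today's close."""
    return sum(abs(target - features[-1]) for features, target in zip(X, y)) / len(y)


closes = [101.0, 102.5, 101.8, 103.2, 104.0, 103.5]  # hypothetical daily closes
X, y = make_supervised_windows(closes, window=3)
for features, target in zip(X, y):
    print(features, "->", target)
print("persistence MAE:", round(persistence_mae(X, y), 3))
```

A trained model is only useful if it beats the persistence baseline on held-out days. For the RNN, LSTM, and GRU models, each window would also be reshaped to (samples, timesteps, features) before training.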

Publications

June 11, 2023 | Technical Article

Battle of Pathfinding Algorithms: A*, Branch & Bound, and Dijkstra’s Showdown in the 4 Knights…

Embark on an intriguing journey through the implementation of three search algorithms—A*, Branch and Bound (BnB), and Dijkstra—in the context of the 4 Knights problem.
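The three algorithms differ mainly in how they order the search frontier. As a hedged illustration, the sketch below applies Dijkstra's algorithm to one common formulation of a four-knights puzzle: Guarini's swap on a 3x3 board. Whether this matches the article's exact setup is an assumption. Every move costs 1, so Dijkstra here behaves like breadth-first search. A* would add an admissible heuristic to each state's priority. Branch and bound would prune any path whose cost already exceeds the best solution found.

```python
import heapq

# Assumed formulation (Guarini's puzzle): two white knights on the top corners and two
# black knights on the bottom corners of a 3x3 board; the goal is to swap them.
# Squares are numbered 0..8 row by row, so 0 and 2 are the top corners, 6 and 8 the bottom.
KNIGHT_OFFSETS = [(1, 2), (2, 1), (-1, 2), (-2, 1), (1, -2), (2, -1), (-1, -2), (-2, -1)]


def knight_moves(square: int) -> list[int]:
    """Squares a knight can reach from `square` on a 3x3 board."""
    r, c = divmod(square, 3)
    return [
        (r + dr) * 3 + (c + dc)
        for dr, dc in KNIGHT_OFFSETS
        if 0 <= r + dr < 3 and 0 <= c + dc < 3
    ]


def dijkstra_knights(start: str, goal: str) -> int:
    """Fewest single-knight moves from start to goal; boards are 9-char strings of 'W', 'B', '.'."""
    dist = {start: 0}
    heap = [(0, start)]
    while heap:
        cost, board = heapq.heappop(heap)
        if board == goal:
            return cost
        if cost > dist[board]:
            continue  # stale heap entry
        for sq, piece in enumerate(board):
            if piece == ".":
                continue
            for target in knight_moves(sq):
                if board[target] != ".":
                    continue  # a knight cannot land on an occupied square
                cells = list(board)
                cells[sq], cells[target] = ".", piece
                nxt = "".join(cells)
                if cost + 1 < dist.get(nxt, float("inf")):
                    dist[nxt] = cost + 1
                    heapq.heappush(heap, (cost + 1, nxt))
    return -1  # goal unreachable


start = "W.W" "..." "B.B"
goal = "B.B" "..." "W.W"
print(dijkstra_knights(start, goal))  # 16
```

The knight graph on a 3x3 board is a single 8-cycle, since the center square has no legal moves. The knights can never pass one another on that cycle, so each one must travel four steps. That is why the minimum is 16 moves.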


Recommender System for Matching Healthcare Professionals with Jobs Using Cosine Similarity

Implementation of Generalized Linear Regression Model Using Moore-Penrose Inverse

The Human Touch: Navigating the Intersection of Technology and Humanity

Comparison of AutoEncoders vs. Variational Autoencoders

YARN (Yet Another Resource Negotiator) Architecture

Computational Learning Theory In Machine Learning

Advancing Fusion Energy Research With Machine Learning

Reshaping the Dataset For Neural Networks

DFS vs BFS Algorithms for Graph Traversal

Learning Optimizers in Deep Learning

Feature Engineering using Random Forest Classifier in Machine Learning

Decision Tree and Ensemble Learning Algorithms in Machine Learning

Generalization Error in Machine Learning (Bias vs. Variance)

Overcoming overfitting a model in Machine Learning

My Podcasts

AI Powered SEC Analyzer

The Decade Ahead in AI

Customer Support Intelligence with NLP and Gemini AI

NLP Pipeline with Neo4j Knowledge Graph for Scientific Literature

Cracking the Protein Code: How AlphaFold Earned the 2024 Nobel Prize in Chemistry

A Survey of Dynamic Programming Algorithms

Retrieval Interleaved Generation (RIG) using LLM: What is It and How It Works?

Influence of a Large Language Model on Diagnostic Reasoning

Scaling Laws for Neural Language Models

Latent Dirichlet Allocation
