Sales Playbook Optimization

Machine Learning • XGBoost • Streamlit • Docker • Business Analytics

This project tackled the challenge of improving B2B sales strategy using predictive analytics. I led the end-to-end development of a machine learning-driven playbook that forecasts deal outcomes and segments high-value customers, transforming static sales processes into dynamic, data-informed strategies.

📊 Data Preparation & Exploration

We worked with three anonymized HubSpot datasets (Companies, Deals, Tickets), cleaning over 100 features and handling columns with a high share of missing values. EDA revealed that the data was US-centric and dominated by small-to-midsize companies. We engineered duration-based features and addressed severe imbalance across industry and customer-type categories.
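A minimal pandas sketch of two of the preparation steps described above: dropping sparse columns and deriving a duration feature. The column names (`create_date`, `close_date`, `notes`), the helper names, and the 50% missingness cutoff are illustrative assumptions, not the project's actual schema or threshold.

```python
import pandas as pd

# Toy stand-in for the Deals export; all column names are hypothetical.
deals = pd.DataFrame({
    "deal_id": [1, 2, 3, 4],
    "create_date": ["2023-01-01", "2023-01-10", "2023-02-01", "2023-02-15"],
    "close_date": ["2023-01-31", "2023-02-09", None, "2023-03-17"],
    "notes": [None, None, None, "call back"],  # mostly empty column
})


def drop_sparse_columns(df, max_missing=0.5):
    """Keep only columns whose share of missing values is <= max_missing."""
    missing_share = df.isna().mean()
    return df.loc[:, missing_share <= max_missing]


def add_days_to_close(df):
    """Add a duration feature: days between deal creation and close."""
    out = df.copy()
    created = pd.to_datetime(out["create_date"])
    closed = pd.to_datetime(out["close_date"])
    # Open deals have no close date, so their duration stays missing (NaN).
    out["days_to_close"] = (closed - created).dt.days
    return out


prepared = add_days_to_close(drop_sparse_columns(deals))
```

Leaving the duration missing for open deals, rather than filling it with zero, keeps "not closed yet" distinct from "closed the same day".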

🛠 Feature Engineering & Modeling

After merging company and deal data via mapping dictionaries, I engineered business-relevant features such as region zone, deal size category, and revenue brackets. I used a Random Forest to identify the top predictors and selected 10 robust features spanning behavioral and demographic attributes.
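The steps above can be sketched roughly as follows, assuming pandas and scikit-learn. The tables, the state-to-zone mapping, the bin edges, and the `top_k_features` helper are all hypothetical, and ranking by impurity-based importance is one plausible way to use a Random Forest for selection, not necessarily the one used in the project.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical toy tables; real HubSpot exports use different column names.
companies = pd.DataFrame({
    "company_id": [10, 11, 12],
    "state": ["CA", "NY", "TX"],
})
deals = pd.DataFrame({
    "deal_id": [1, 2, 3, 4],
    "company_id": [10, 11, 12, 10],
    "amount": [500, 7_500, 60_000, 12_000],
})

# Mapping dictionaries: company -> state, then state -> region zone (assumed zones).
company_to_state = dict(zip(companies["company_id"], companies["state"]))
state_to_zone = {"CA": "West", "NY": "East", "TX": "South"}
deals["state"] = deals["company_id"].map(company_to_state)
deals["region_zone"] = deals["state"].map(state_to_zone)

# Deal size category via fixed bins; the cut points are assumptions.
deals["deal_size"] = pd.cut(
    deals["amount"],
    bins=[0, 1_000, 10_000, np.inf],
    labels=["small", "medium", "large"],
)


def top_k_features(X, y, k=10, seed=0):
    """Rank columns of X by Random Forest importance and return the top k names."""
    forest = RandomForestClassifier(n_estimators=200, random_state=seed)
    forest.fit(X, y)
    ranked = pd.Series(forest.feature_importances_, index=X.columns)
    return ranked.sort_values(ascending=False).head(k).index.tolist()


# Synthetic check: `signal` fully determines the label, `noise` is unrelated.
rng = np.random.default_rng(0)
X_demo = pd.DataFrame({
    "signal": rng.normal(size=300),
    "noise": rng.normal(size=300),
})
y_demo = (X_demo["signal"] > 0).astype(int)
best = top_k_features(X_demo, y_demo, k=1)
```

Using `Series.map` with a dictionary leaves unmatched companies as missing values instead of silently dropping rows, which makes gaps in the mapping easy to spot.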

I trained and evaluated six models: Random Forest, XGBoost, AdaBoost, KNN, Logistic Regression, and a Decision Tree. The boosted ensembles, XGBoost and AdaBoost, delivered high AUC-ROC scores and high accuracy. To avoid target leakage, I excluded features that encode the deal outcome, such as deal probability and deal stage.
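A simplified sketch of this evaluation loop, under stated assumptions: the data is synthetic, the column names are hypothetical, and only three scikit-learn models are shown (XGBoost, KNN, and the Decision Tree are left out to keep the example short and limited to one library). It illustrates dropping leaky columns before the split and comparing models by AUC-ROC on a held-out set.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; column names are hypothetical.
rng = np.random.default_rng(42)
n = 600
df = pd.DataFrame({
    "days_to_close": rng.normal(30, 10, n),
    "employee_count": rng.normal(200, 50, n),
})
signal = (df["days_to_close"] - 30) / 10 + rng.normal(0, 0.5, n)
df["is_won"] = (signal < 0).astype(int)

# Leaky columns: they encode the outcome, so they must not be model inputs.
df["deal_probability"] = df["is_won"] * 0.9 + 0.05
df["deal_stage"] = np.where(df["is_won"] == 1, "closedwon", "closedlost")
LEAKY_COLUMNS = ["deal_probability", "deal_stage"]

X = df.drop(columns=LEAKY_COLUMNS + ["is_won"])
y = df["is_won"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=42),
    "adaboost": AdaBoostClassifier(random_state=42),
}

auc_by_model = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # AUC-ROC needs scores for the positive class, not hard labels.
    scores = model.predict_proba(X_test)[:, 1]
    auc_by_model[name] = roc_auc_score(y_test, scores)
```

Had `deal_probability` been left in `X`, every model here would score a near-perfect AUC without learning anything useful, which is the signature of leakage worth checking for.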

📈 Insights & Strategy

Smaller companies and industries such as Manufacturing and Retail had the highest win rates. Customer types labeled “In Trial” or “Partner” converted at over 80%, while “Prospect” and “Vendor” had the lowest conversion rates. These insights shaped our lead-scoring and segmentation logic.
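Segment win rates like these can be computed with a simple group-by. The sketch below assumes a binary `is_won` column and a `customer_type` column (both hypothetical names); the rows are toy data made up for illustration and do not reproduce the project's figures.

```python
import pandas as pd

# Toy rows for illustration only; not the project's data.
deals = pd.DataFrame({
    "customer_type": ["In Trial", "In Trial", "Partner", "Prospect", "Prospect", "Vendor"],
    "is_won": [1, 1, 1, 0, 1, 0],
})


def win_rate_by(df, segment_col, outcome_col="is_won"):
    """Return deal count and win rate per segment, best segments first."""
    summary = df.groupby(segment_col)[outcome_col].agg(deals="count", win_rate="mean")
    return summary.sort_values("win_rate", ascending=False)


rates = win_rate_by(deals, "customer_type")
```

Reporting the deal count next to each win rate helps flag segments whose rate rests on only a handful of deals.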

🧠 Deployment & Dashboard

I deployed the model using Streamlit and Docker. The dashboard includes five modules: deal prediction, filtered analytics, win-rate insights, segment discovery, and segment comparisons. This tool empowers sales teams to make fast, data-backed decisions and focus efforts on high-impact deals.

🚀 Tech Stack

Python • pandas • scikit-learn • XGBoost • Streamlit • Docker • Git • matplotlib • seaborn

🔗 GitHub & Live App

View Code on GitHub • Try the Dashboard