Sales Playbook Optimization with Machine Learning

Dynamic B2B Sales Strategy Powered by Predictive Analytics

XGBoost · KMeans Clustering · Streamlit · Docker

DS-5640 Machine Learning | Vanderbilt University

Traditional sales playbooks are static and reactive. This project builds a dynamic alternative using predictive analytics and classification models, delivering an intelligent, evolving guide to improve B2B deal closures. The final deliverable is an interactive Streamlit dashboard backed by a trained ML model in a Dockerized deployment.

Data Scale

  • 19,851 company records
  • 593 deal records
  • 6 models compared

Machine Learning Approach

Trained and compared 6 algorithms to find the best deal outcome predictor:

  • XGBoost (winner)
  • Random Forest
  • AdaBoost
  • KNN
  • Decision Tree
  • Logistic Regression

XGBoost was selected for its best test performance and robust generalization.
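
A comparison like this could be set up roughly as follows. This is a minimal sketch, not the project's actual training code: it assumes scikit-learn, uses a synthetic dataset in place of the HubSpot deal data, and picks the split ratio, hyperparameters, and accuracy metric purely for illustration. XGBoost is left out so the sketch depends on scikit-learn only; `xgboost.XGBClassifier` exposes the same fit/score interface and could be added to the dictionary in the same way.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the deal table (593 rows, binary won/lost label).
X, y = make_classification(
    n_samples=593, n_features=12, n_informative=6, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Illustrative settings only; the project's hyperparameters are not known.
models = {
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Decision Tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

# Fit each model on the training split and score it on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)

# Print models from highest to lowest test accuracy.
for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:20s} {acc:.3f}")
```

With a dataset of only a few hundred deals, a single split can be noisy, so cross-validated scores would be a reasonable alternative to the one held-out set used here.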

Customer Segmentation (KMeans)

Grouped companies by revenue, engagement, and age into actionable segments:

  • High-Value: Top-priority accounts
  • Active Clients: Engaged, growing potential
  • Low-Value: Deprioritize or nurture
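
The segmentation step could look something like the sketch below. It assumes scikit-learn and NumPy, generates synthetic revenue, engagement, and age values rather than using the real company records, and fixes the number of clusters at three to match the segments listed above. Naming the clusters by their mean revenue is an assumption made here for illustration; the project may have assigned the segment names differently.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 300

# Hypothetical company features (synthetic, for illustration only).
revenue = rng.lognormal(mean=13, sigma=1.0, size=n)
engagement = rng.integers(0, 50, size=n).astype(float)
age_years = rng.uniform(1, 40, size=n)

# Log-transform the skewed revenue, then standardize so that no single
# feature dominates the Euclidean distances KMeans relies on.
features = np.column_stack([np.log1p(revenue), engagement, age_years])
scaled = StandardScaler().fit_transform(features)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(scaled)

# Assumed naming rule: order the clusters by mean revenue, lowest to highest.
mean_revenue = [revenue[labels == k].mean() for k in range(3)]
order = [int(k) for k in np.argsort(mean_revenue)]
names = {
    order[0]: "Low-Value",
    order[1]: "Active Clients",
    order[2]: "High-Value",
}
segments = [names[int(k)] for k in labels]

for name in ("High-Value", "Active Clients", "Low-Value"):
    print(name, segments.count(name))
```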

Streamlit Dashboard Features

  • Deal Outcome Predictor: Predicts win/loss for new deals based on entered parameters
  • Dataset Filter Tool: Drill down into data by any column and value
  • Sales Summary: Visualize win rates and lead scoring
  • High-Value Segments: Identify top-converting industries and company sizes
  • Cross-Segment Comparison: Compare win rates across customer attributes
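
The win-rate views above reduce to a grouped aggregation over the deal table. The following is a small sketch of that calculation using pandas; the column names (`industry`, `company_size`, `won`) and the six example rows are hypothetical and do not come from the project's data.

```python
import pandas as pd

# Hypothetical deal records: won = 1 for a closed-won deal, 0 otherwise.
deals = pd.DataFrame(
    {
        "industry": ["Software", "Software", "Retail", "Retail", "Retail", "Finance"],
        "company_size": ["SMB", "Enterprise", "SMB", "SMB", "Enterprise", "Enterprise"],
        "won": [1, 0, 0, 1, 0, 1],
    }
)


def win_rate_by(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Return deal count and win rate per value of `column`, best first."""
    return (
        df.groupby(column)["won"]
        .agg(deals="count", win_rate="mean")
        .sort_values("win_rate", ascending=False)
    )


by_industry = win_rate_by(deals, "industry")
by_size = win_rate_by(deals, "company_size")
print(by_industry)
print(by_size)
```

In a Streamlit app, a table like `by_industry` could be passed to `st.dataframe` or `st.bar_chart`. Keeping the deal count next to the win rate helps flag segments whose rate rests on very few deals.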

Feature Engineering

  • Custom fields for revenue buckets, tech stack indicators, deal size categories
  • Categorical encoding, scaling, and imputation strategies
  • Leak-proof feature pipeline ensuring clean train/test split
  • Handled high-missing columns and outliers in company data
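
One common way to keep preprocessing leak-free is to place imputation, scaling, and encoding inside a scikit-learn `Pipeline`, so that their statistics are learned from the training split only. The sketch below illustrates that idea under stated assumptions: the column names, synthetic data, and logistic regression model are placeholders, not the project's actual pipeline.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

rng = np.random.default_rng(0)
n = 200

# Hypothetical deal features with some missing revenue values.
df = pd.DataFrame(
    {
        "annual_revenue": rng.lognormal(13, 1, n),
        "deal_amount": rng.lognormal(9, 1, n),
        "industry": rng.choice(["Software", "Retail", "Finance"], n),
    }
)
df.loc[rng.choice(n, 30, replace=False), "annual_revenue"] = np.nan
y = rng.integers(0, 2, n)  # random won/lost labels, illustration only

numeric = ["annual_revenue", "deal_amount"]
categorical = ["industry"]

# Numeric columns: median imputation, then scaling. Categorical: one-hot.
numeric_steps = Pipeline(
    [("impute", SimpleImputer(strategy="median")), ("scale", StandardScaler())]
)
preprocess = ColumnTransformer(
    [
        ("num", numeric_steps, numeric),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
    ]
)
clf = Pipeline(
    [("preprocess", preprocess), ("model", LogisticRegression(max_iter=1000))]
)

# Split first, then fit: medians, scales, and categories come from the
# training rows only and are merely applied to the test rows.
X_train, X_test, y_train, y_test = train_test_split(
    df, y, test_size=0.25, stratify=y, random_state=0
)
clf.fit(X_train, y_train)
test_accuracy = clf.score(X_test, y_test)
print(f"held-out accuracy: {test_accuracy:.3f}")
```

Because the labels here are random, the accuracy itself is meaningless; the point is the ordering of the split and the fit.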

Technologies Used

Python · XGBoost · Scikit-learn · KMeans · Streamlit · Docker · Pandas · HubSpot Data

Team

  • Roshan Siddartha Sivakumar
  • Xiaochen Liu
  • Anna Lorenz
  • Najma Thomas-Akpanoko