ML in Lean Six Sigma: Partner Revenue Velocity Analysis (with Synthetic Data)

DEFINE

Problem: Partner revenue velocity inconsistency

Target: 25% velocity improvement

MEASURE

Current State Metrics:

Avg Revenue: $1.3M
Velocity: 0.25 (25%)
Cycle Time: 15.16 days

ANALYZE

Root Causes:

Work order delays (1-26 days)
Approval bottlenecks (4-23 days)
Variable opportunity count (8-51)

IMPROVE

Solutions:

Standardize approval process
Reduce cycle time variation
Optimize resource allocation

CONTROL

Control Plan:

Monthly velocity tracking
SPC charts for monitoring
Process audits

Key Performance Indicators

$1.3M

Avg Revenue

↗ 15%

25%

Pipeline Velocity

→ 0%

15.16

Avg Cycle Days

↘ 8%

Opportunities

↗ 12%

Current State Process Map

Lead Generation

→

Opportunity Creation

→

Work Order

→

Approval

→

Revenue

Waiting Rework Overprocessing

Executive Summary

This Lean Six Sigma analysis examines partner revenue velocity using synthetic data spanning 100 observations from 2022-2030. The study identifies key process inefficiencies and proposes targeted improvements.

Problem Statement

Partner revenue velocity shows significant variation (0.12-0.44) with extended cycle times averaging 15.16 days, indicating process waste and inefficiency in the partner revenue generation system.

Data Analysis Results

Revenue Range: $687K - $1.9M (Mean: $1.3M)
Velocity Variation: 0.12 - 0.44 (σ = 0.08)
Cycle Time: 1-26 days (Mean: 15.16, σ = 6.2)
Opportunity Count: 8-51 (Mean: 25)

Lean Six Sigma Recommendations

Standardize Work: Implement standard operating procedures for work order processing
Reduce Variation: Apply statistical process control to minimize cycle time variation
Eliminate Waste: Remove non-value-added activities in approval processes
Continuous Monitoring: Establish control charts for ongoing performance tracking

Expected Benefits

25% improvement in pipeline velocity
30% reduction in cycle time variation
15% increase in partner satisfaction
$200K annual cost savings

R Code Implementation

# ========================================================
# ML Model to Predict Partner Delivered Revenue (Synthetic Data)
# ========================================================

# Install missing packages
install.packages(c("corrplot", "caret", "gridExtra", "scales", "randomForest"))

# Load necessary libraries
library(tidyverse)
library(corrplot)
library(caret)
library(gridExtra)
library(scales)
library(randomForest)

# ========================================================
# 1. Generate Synthetic Data
# ========================================================

set.seed(42)
n_samples <- 100

# Generate synthetic data with realistic relationships
data <- tibble(
  Month = seq(as.Date("2022-01-01"), by = "month", length.out = n_samples),
  woCreate2CompleteDays = round(rnorm(n_samples, mean = 15, sd = 5)),
  woaCreate2CompleteDays = round(rnorm(n_samples, mean = 12, sd = 4)),
  PartnerBillableBooking = round(rnorm(n_samples, mean = 500000, sd = 150000)),
  PartnerPipelineVelocity = round(rnorm(n_samples, mean = 0.25, sd = 0.08), 2),
  POapprovalCycleTime = round(rnorm(n_samples, mean = 8, sd = 3)),
  PartnerOppCount = round(rnorm(n_samples, mean = 25, sd = 8))
) %>%
  # Generate revenue with realistic relationships to predictors
  mutate(
    PartnerDeliveredRevenue = round(
      300000 + 
      PartnerBillableBooking * 0.8 +
      PartnerPipelineVelocity * 2000000 +
      PartnerOppCount * 15000 -
      woCreate2CompleteDays * 8000 -
      woaCreate2CompleteDays * 5000 -
      POapprovalCycleTime * 12000 +
      rnorm(n_samples, 0, 100000)
    )
  ) %>%
  # Ensure positive values
  mutate(
    woCreate2CompleteDays = pmax(woCreate2CompleteDays, 1),
    woaCreate2CompleteDays = pmax(woaCreate2CompleteDays, 1),
    PartnerBillableBooking = pmax(PartnerBillableBooking, 50000),
    PartnerPipelineVelocity = pmax(PartnerPipelineVelocity, 0.05),
    POapprovalCycleTime = pmax(POapprovalCycleTime, 1),
    PartnerOppCount = pmax(PartnerOppCount, 5),
    PartnerDeliveredRevenue = pmax(PartnerDeliveredRevenue, 100000)
  )

# Display structure and summary
str(data)
summary(data)

# ========================================================
# 2. Exploratory Data Analysis
# ========================================================

my_colors <- c("#2C3E50", "#E74C3C", "#3498DB", "#2ECC71", "#F1C40F", "#9B59B6")

# Time series plot
p1 <- ggplot(data, aes(x = Month, y = PartnerDeliveredRevenue)) +
  geom_line(color = "#2C3E50", size = 1) +
  geom_point(color = "#E74C3C", size = 2) +
  theme_minimal() +
  labs(title = "Partner Delivered Revenue Over Time (Synthetic Data)",
       x = "Month", y = "Revenue") +
  scale_y_continuous(labels = function(x) paste0("$", format(x/1000000, digits = 1), "M")) +
  theme(plot.title = element_text(hjust = 0.5, face = "bold"),
        axis.text.x = element_text(angle = 45, hjust = 1))

print(p1)

# Variable distributions
p2 <- data %>%
  select(-Month) %>%
  gather(key = "variable", value = "value") %>%
  ggplot(aes(x = value)) +
  geom_histogram(fill = "#3498DB", color = "#2C3E50", bins = 15) +
  facet_wrap(~ variable, scales = "free") +
  theme_minimal() +
  labs(title = "Distribution of Variables", x = "", y = "Frequency") +
  theme(plot.title = element_text(hjust = 0.5, face = "bold"))

print(p2)

# Correlation matrix
correlation_matrix <- cor(data %>% select(-Month))
corrplot(correlation_matrix, 
         method = "color", 
         type = "upper", 
         tl.col = "black",
         tl.cex = 0.8,
         col = colorRampPalette(c("#E74C3C", "#FFFFFF", "#2ECC71"))(100),
         title = "Correlation Matrix")

# ========================================================
# 3. Multiple Linear Regression Model
# ========================================================

set.seed(123)
trainIndex <- createDataPartition(data$PartnerDeliveredRevenue, p = .8, list = FALSE)
train_data <- data[trainIndex, ]
test_data <- data[-trainIndex, ]

# Build model
lm_model <- lm(PartnerDeliveredRevenue ~ woCreate2CompleteDays + woaCreate2CompleteDays + 
               PartnerBillableBooking + PartnerPipelineVelocity + POapprovalCycleTime + 
               PartnerOppCount, data = train_data)

summary(lm_model)

# Test predictions
predictions <- predict(lm_model, test_data)
test_rmse <- sqrt(mean((test_data$PartnerDeliveredRevenue - predictions)^2))
test_r2 <- cor(test_data$PartnerDeliveredRevenue, predictions)^2

cat("Test RMSE:", test_rmse, "\n")
cat("Test R-squared:", test_r2, "\n")

# ========================================================
# 4. Random Forest Model
# ========================================================

rf_model <- randomForest(
  PartnerDeliveredRevenue ~ woCreate2CompleteDays + woaCreate2CompleteDays + 
                         PartnerBillableBooking + PartnerPipelineVelocity + 
                         POapprovalCycleTime + PartnerOppCount,
  data = train_data,
  ntree = 500,
  importance = TRUE
)

print(rf_model)

# Variable importance
importance(rf_model)
varImpPlot(rf_model)

# Random Forest predictions
rf_predictions <- predict(rf_model, test_data)
rf_rmse <- sqrt(mean((test_data$PartnerDeliveredRevenue - rf_predictions)^2))
rf_r2 <- cor(test_data$PartnerDeliveredRevenue, rf_predictions)^2

cat("Random Forest Test RMSE:", rf_rmse, "\n")
cat("Random Forest Test R-squared:", rf_r2, "\n")

# ========================================================
# 5. Model Comparison
# ========================================================

# Compare predictions
comparison_df <- data.frame(
  Actual = test_data$PartnerDeliveredRevenue,
  Linear_Model = predictions,
  Random_Forest = rf_predictions
)

# Prediction vs Actual plots
p_lm <- ggplot(comparison_df, aes(x = Actual, y = Linear_Model)) +
  geom_point(color = "#3498DB", size = 2) +
  geom_abline(intercept = 0, slope = 1, color = "#E74C3C", linetype = "dashed") +
  theme_minimal() +
  labs(title = "Linear Model: Predicted vs Actual",
       x = "Actual Revenue", y = "Predicted Revenue") +
  scale_x_continuous(labels = function(x) paste0("$", format(x/1000000, digits = 1), "M")) +
  scale_y_continuous(labels = function(x) paste0("$", format(x/1000000, digits = 1), "M"))

p_rf <- ggplot(comparison_df, aes(x = Actual, y = Random_Forest)) +
  geom_point(color = "#2ECC71", size = 2) +
  geom_abline(intercept = 0, slope = 1, color = "#E74C3C", linetype = "dashed") +
  theme_minimal() +
  labs(title = "Random Forest: Predicted vs Actual",
       x = "Actual Revenue", y = "Predicted Revenue") +
  scale_x_continuous(labels = function(x) paste0("$", format(x/1000000, digits = 1), "M")) +
  scale_y_continuous(labels = function(x) paste0("$", format(x/1000000, digits = 1), "M"))

grid.arrange(p_lm, p_rf, ncol = 2)

ML in Lean Six Sigma - Partner Revenue Velocity