31  Retail and E-commerce Innovation

Note: Chapter Overview

Retail and e-commerce—from product discovery to inventory management to customer experience—operate on matching supply with demand, understanding customer preferences, and optimizing operational efficiency. This chapter applies embeddings to retail transformation:

  • Product discovery and matching using multi-modal embeddings that understand products from images, text descriptions, and behavioral signals, enabling semantic search beyond keyword matching
  • Visual search and style transfer with image embeddings that let customers find products by uploading photos or describing aesthetic preferences
  • Inventory optimization through demand embeddings that forecast stockouts and overstock situations weeks in advance
  • Customer journey analysis via sequential embeddings of touchpoints and interactions that identify friction points and conversion opportunities
  • Dynamic pricing that optimizes prices from learned product and customer representations
  • Dynamic catalog management using embedding-based product relationships to automatically create collections, recommendations, and merchandising strategies

These techniques transform retail from static catalogs and rule-based recommendations to adaptive, learned representations that capture the full complexity of product semantics, customer preferences, and market dynamics.

Building on the cross-industry patterns for security and automation (Chapter 26), embeddings enable retail and e-commerce innovation at unprecedented scale. Traditional retail systems rely on keyword search (exact text matching), manual categorization (static taxonomies), demographic segments (age, gender, location), and rule-based recommendations (frequently bought together). Embedding-based retail systems represent products, customers, and sessions as vectors, enabling semantic product discovery that understands intent rather than keywords, visual similarity that transcends categorical boundaries, hyper-personalized recommendations based on implicit preference signals, and demand forecasting that learns seasonal patterns and trend dynamics—providing competitive advantages measured in conversion rates, average order values, and customer lifetime value.

31.1 Product Discovery and Matching

E-commerce product catalogs contain millions of SKUs with heterogeneous attributes, inconsistent naming, and varying quality of metadata. Embedding-based product discovery represents products as vectors learned from images, descriptions, specifications, reviews, and behavioral signals, enabling semantic search that understands product relationships invisible to keyword matching.

31.1.1 The Product Discovery Challenge

Traditional product search faces limitations:

  • Keyword mismatch: User searches “laptop” but product titled “notebook computer”
  • Attribute explosion: Products have hundreds of attributes (color, size, material, brand)
  • Taxonomy rigidity: Products force-fit into categories (yoga pants: athletic wear or fashion?)
  • Long-tail queries: “waterproof hiking boots under $150 with good arch support”
  • Cross-lingual: Different languages, regional terminology variations
  • Visual-textual gap: User has image in mind, searches with inadequate words

Embedding approach: Learn product embeddings from multi-modal signals—images encode visual appearance, text encodes semantic meaning, behavioral signals encode utility. Products that solve similar needs cluster together even with different keywords or categories. Search becomes retrieval in embedding space: query → embedding → nearest neighbor products.

"""
Product Discovery with Multi-Modal Embeddings

Architecture:
1. Image encoder: CNN/Vision Transformer for product photos
2. Text encoder: BERT for titles, descriptions, specifications
3. Behavioral encoder: Co-purchase, co-view patterns
4. Multi-modal fusion: Combine image, text, behavioral signals
5. Query encoder: Map search queries to product embedding space

Techniques:
- Contrastive learning: Products co-purchased/co-viewed closer in space
- Hard negative mining: Similar-looking but functionally different products
- Multi-task learning: Search relevance, click-through, purchase prediction
- Cross-modal retrieval: Text query → image results, image query → text results
- Hierarchical embeddings: Category, brand, product levels

Production considerations:
- Index size: 10M-1B products, <100ms retrieval
- Freshness: New products immediately searchable
- Personalization: Adapt embeddings to user preferences
- Explainability: Why these results for this query?
- A/B testing: Measure impact on conversion, revenue
"""

from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Dict, List, Optional

import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F


@dataclass
class Product:
    """Product representation for e-commerce"""
    product_id: str
    title: str
    description: str
    category: List[str]  # Hierarchical: ["Electronics", "Computers", "Laptops"]
    brand: str
    price: float
    attributes: Dict[str, Any] = field(default_factory=dict)
    images: List[str] = field(default_factory=list)
    reviews: List[str] = field(default_factory=list)
    rating: float = 0.0
    review_count: int = 0
    inventory: int = 0
    created_at: Optional[datetime] = None
    embedding: Optional[np.ndarray] = None


@dataclass
class SearchQuery:
    """User search query"""
    query_id: str
    user_id: str
    query_text: Optional[str] = None
    query_image: Optional[str] = None
    filters: Dict[str, Any] = field(default_factory=dict)
    timestamp: Optional[datetime] = None
    session_id: Optional[str] = None
    embedding: Optional[np.ndarray] = None


class ImageEncoder(nn.Module):
    """Encode product images to embeddings using CNN backbone"""

    def __init__(self, backbone="resnet50", embedding_dim=512):
        super().__init__()
        self.embedding_dim = embedding_dim
        # Simplified CNN backbone (in production: use torchvision.models)
        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(64, 128, kernel_size=3, padding=1)
        self.conv3 = nn.Conv2d(128, 256, kernel_size=3, padding=1)
        self.global_pool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(256, embedding_dim)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        x = F.relu(self.conv1(images))
        x = self.pool(x)
        x = F.relu(self.conv2(x))
        x = self.pool(x)
        x = F.relu(self.conv3(x))
        x = self.global_pool(x)
        x = x.view(x.size(0), -1)
        x = self.fc(x)
        return F.normalize(x, p=2, dim=1)


class TextEncoder(nn.Module):
    """Encode product text to embeddings using Transformer"""

    def __init__(self, vocab_size=30000, embedding_dim=512, hidden_dim=768):
        super().__init__()
        self.embedding_dim = embedding_dim
        self.token_embedding = nn.Embedding(vocab_size, hidden_dim)
        self.position_embedding = nn.Embedding(512, hidden_dim)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=hidden_dim, nhead=8, dim_feedforward=2048, batch_first=True
        )
        self.transformer = nn.TransformerEncoder(encoder_layer, num_layers=6)
        self.fc = nn.Linear(hidden_dim, embedding_dim)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        batch_size, seq_len = token_ids.shape
        positions = torch.arange(seq_len, device=token_ids.device).unsqueeze(0)
        x = self.token_embedding(token_ids) + self.position_embedding(positions)
        x = self.transformer(x)
        x = x[:, 0, :]  # summarize via first token (assumes a [CLS]-style token is prepended at tokenization)
        x = self.fc(x)
        return F.normalize(x, p=2, dim=1)


class BehavioralEncoder(nn.Module):
    """Encode behavioral signals (co-purchase, co-view) to embeddings"""

    def __init__(self, num_products=1000000, embedding_dim=512):
        super().__init__()
        self.embedding_dim = embedding_dim
        self.product_embeddings = nn.Embedding(num_products, embedding_dim)

    def forward(self, product_ids: torch.Tensor) -> torch.Tensor:
        embeddings = self.product_embeddings(product_ids)
        return F.normalize(embeddings, p=2, dim=1)


class MultiModalProductEncoder(nn.Module):
    """Fuse image, text, and behavioral embeddings with attention"""

    def __init__(self, embedding_dim=512):
        super().__init__()
        self.embedding_dim = embedding_dim
        self.image_encoder = ImageEncoder(embedding_dim=embedding_dim)
        self.text_encoder = TextEncoder(embedding_dim=embedding_dim)
        self.behavioral_encoder = BehavioralEncoder(embedding_dim=embedding_dim)

        # Fusion network: combine modalities
        self.fusion = nn.Sequential(
            nn.Linear(embedding_dim * 3, embedding_dim * 2),
            nn.ReLU(),
            nn.Dropout(0.1),
            nn.Linear(embedding_dim * 2, embedding_dim),
        )
        # Modality attention: learn importance of each modality
        self.modality_attention = nn.Sequential(
            nn.Linear(embedding_dim * 3, 3), nn.Softmax(dim=1)
        )

    def forward(
        self,
        images: Optional[torch.Tensor] = None,
        text: Optional[torch.Tensor] = None,
        product_ids: Optional[torch.Tensor] = None,
    ) -> torch.Tensor:
        # Use any provided modality to determine batch size and device,
        # so the model also works when only one input is supplied
        # (at least one of images, text, product_ids must be non-None)
        reference = next(t for t in (images, text, product_ids) if t is not None)
        batch_size = reference.size(0)
        device = reference.device

        # Encode each available modality; zero-fill missing ones
        modality_embeddings = []
        if images is not None:
            img_emb = self.image_encoder(images)
        else:
            img_emb = torch.zeros(batch_size, self.embedding_dim, device=device)
        modality_embeddings.append(img_emb)

        if text is not None:
            txt_emb = self.text_encoder(text)
        else:
            txt_emb = torch.zeros(batch_size, self.embedding_dim, device=device)
        modality_embeddings.append(txt_emb)

        if product_ids is not None:
            beh_emb = self.behavioral_encoder(product_ids)
        else:
            beh_emb = torch.zeros(batch_size, self.embedding_dim, device=device)
        modality_embeddings.append(beh_emb)

        # Attention-weighted fusion
        concat = torch.cat(modality_embeddings, dim=1)
        attention_weights = self.modality_attention(concat)
        weighted_sum = (
            attention_weights[:, 0:1] * modality_embeddings[0]
            + attention_weights[:, 1:2] * modality_embeddings[1]
            + attention_weights[:, 2:3] * modality_embeddings[2]
        )

        fused = self.fusion(concat)
        final_embedding = (weighted_sum + fused) / 2
        return F.normalize(final_embedding, p=2, dim=1)
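
A minimal usage sketch with untrained weights and toy tensors; in production the encoder is trained contrastively and the catalog embeddings are pre-computed into an ANN index:

encoder = MultiModalProductEncoder(embedding_dim=512)
encoder.eval()

with torch.no_grad():
    # Pre-compute embeddings for a toy catalog of 32 products
    # (note: the default behavioral table of 1M x 512 allocates roughly 2 GB)
    catalog_emb = encoder(
        images=torch.randn(32, 3, 224, 224),
        text=torch.randint(0, 30000, (32, 64)),
        product_ids=torch.arange(32),
    )
    # Encode a text-only query; missing modalities are zero-filled
    query_emb = encoder(text=torch.randint(0, 30000, (1, 64)))

    # Embeddings are L2-normalized, so dot product equals cosine similarity
    scores = query_emb @ catalog_emb.T
    top5 = scores.topk(5, dim=1).indices
print(f"Top-5 product indices: {top5.tolist()}")
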
Tip: Product Discovery Best Practices

Data preparation:

  • Multi-modal alignment: Ensure images and text describe same product
  • Image quality: Multiple views (front, side, detail), consistent backgrounds
  • Text normalization: Standardize product titles, expand abbreviations
  • Attribute extraction: NER for brand, material, color, size from free text
  • Review mining: Extract product aspects from customer reviews

Modeling:

  • Pre-training: Use ImageNet for images, product corpus for text
  • Contrastive learning: (query, clicked product) positive, (query, skipped product) negative (see Chapter 15)
  • Hard negatives: Products with similar text but different visual style
  • Multi-task: Search relevance + category classification + price prediction
  • Cross-modal: Image query → text results, text query → image results

Production:

  • Indexing: FAISS/ScaNN for billion-scale ANN search (see the sketch below)
  • Freshness: New products indexed in real-time (<1 second)
  • Personalization: Adapt query embedding to user preferences
  • Diversity: Avoid returning 10 products from same brand
  • A/B testing: Measure impact on CTR, conversion, revenue

Challenges:

  • Cold start: New products with no behavioral data
  • Seasonal drift: “jacket” means different things in summer vs winter
  • Regional variation: Terminology differs by geography, language
  • Attribute sparsity: Not all products have complete metadata
  • Computational cost: Encoding products in real-time vs pre-computing
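
To make the indexing point concrete, a minimal FAISS sketch assuming pre-computed, L2-normalized product embeddings (faiss-cpu installs via pip; at billion scale an IVF or HNSW index would replace the exact flat index):

import faiss  # pip install faiss-cpu
import numpy as np

embedding_dim = 512
num_products = 100_000

# Pre-computed, L2-normalized product embeddings (random placeholders here)
product_embeddings = np.random.randn(num_products, embedding_dim).astype("float32")
product_embeddings /= np.linalg.norm(product_embeddings, axis=1, keepdims=True)

# Inner product on normalized vectors equals cosine similarity
index = faiss.IndexFlatIP(embedding_dim)
index.add(product_embeddings)

query = product_embeddings[:1]  # stand-in for an encoded search query
scores, ids = index.search(query, 10)
print(ids[0], scores[0])
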

31.2 Visual Search and Style Transfer

Traditional text search breaks down when customers know what they want visually but struggle to describe it in words. Embedding-based visual search enables customers to find products by uploading photos, screenshots, or describing visual attributes, transforming product discovery from keyword dependency to intuitive visual browsing.

31.2.1 The Visual Search Challenge

Visual product search faces unique challenges:

  • Cross-domain gap: User’s photo (outdoor, poor lighting) vs catalog photos (studio, perfect lighting)
  • Partial views: User photos show part of product (sleeve pattern, shoe detail)
  • Style description: “Something like this but more casual” requires understanding style dimensions
  • Composition: User photo has multiple items, search for specific element
  • Style transfer: “Find jeans that match this shirt’s vibe”

Embedding approach: Learn visual embeddings that capture style attributes (color, pattern, silhouette, material) independently of photography conditions. Visual similarity becomes retrieval in embedding space where style-similar products cluster together regardless of exact appearance.

"""
Visual Search and Style Transfer

Architecture:
1. Image encoder: CNN/ViT trained on product images
2. Style extractor: Disentangle content vs style (color, texture, shape)
3. Cross-domain alignment: Map user photos to catalog photo space
4. Style transfer: Generate embeddings for "product A with style of B"
"""

from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, List, Optional, Tuple

import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F


class StyleAttribute(Enum):
    """Visual style attributes"""
    COLOR = "color"
    PATTERN = "pattern"
    TEXTURE = "texture"
    SILHOUETTE = "silhouette"
    MATERIAL = "material"


class StyleAttributeExtractor(nn.Module):
    """
    Extract disentangled style attributes from images.
    Enables fine-grained style transfer: "Find dress with this color
    but different pattern" or "Same silhouette but different material"
    """

    def __init__(self, attribute_dim=128):
        super().__init__()
        self.attribute_dim = attribute_dim

        self.feature_extractor = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((7, 7)),
        )

        # Attribute-specific heads
        self.color_head = nn.Sequential(
            nn.Linear(256 * 7 * 7, 512), nn.ReLU(), nn.Linear(512, attribute_dim)
        )
        self.pattern_head = nn.Sequential(
            nn.Linear(256 * 7 * 7, 512), nn.ReLU(), nn.Linear(512, attribute_dim)
        )
        self.silhouette_head = nn.Sequential(
            nn.Linear(256 * 7 * 7, 512), nn.ReLU(), nn.Linear(512, attribute_dim)
        )
        self.material_head = nn.Sequential(
            nn.Linear(256 * 7 * 7, 512), nn.ReLU(), nn.Linear(512, attribute_dim)
        )

    def forward(self, images: torch.Tensor) -> Dict[str, torch.Tensor]:
        features = self.feature_extractor(images)
        features_flat = features.view(features.size(0), -1)
        return {
            "color": F.normalize(self.color_head(features_flat), p=2, dim=1),
            "pattern": F.normalize(self.pattern_head(features_flat), p=2, dim=1),
            "silhouette": F.normalize(self.silhouette_head(features_flat), p=2, dim=1),
            "material": F.normalize(self.material_head(features_flat), p=2, dim=1),
        }


class CrossDomainAdapter(nn.Module):
    """
    Adapt user-uploaded photos to catalog photo space.
    Bridges differences in lighting, background, angle, quality.
    """

    def __init__(self, embedding_dim=512):
        super().__init__()
        self.adapter = nn.Sequential(
            nn.Linear(embedding_dim, 512), nn.ReLU(),
            nn.Dropout(0.2), nn.Linear(512, embedding_dim)
        )

    def forward(self, user_embeddings: torch.Tensor) -> torch.Tensor:
        adapted = self.adapter(user_embeddings)
        adapted = user_embeddings + adapted  # Residual connection
        return F.normalize(adapted, p=2, dim=1)


class StyleTransferEngine(nn.Module):
    """
    Generate embedding for product A with style of B.
    Use cases: "Find jeans that match this shirt" (color coordination)
    """

    def __init__(self, embedding_dim=512, attribute_dim=128):
        super().__init__()
        self.style_extractor = StyleAttributeExtractor(attribute_dim)
        self.fusion = nn.Sequential(
            nn.Linear(embedding_dim + attribute_dim * 4, 1024),
            nn.ReLU(), nn.Dropout(0.2),
            nn.Linear(1024, embedding_dim),
        )

    def transfer_style(
        self,
        content_emb: torch.Tensor,
        style_image: torch.Tensor,
        intensity: float = 0.5,
    ) -> torch.Tensor:
        style_attrs = self.style_extractor(style_image)
        style_vector = torch.cat([
            style_attrs["color"], style_attrs["pattern"],
            style_attrs["silhouette"], style_attrs["material"]
        ], dim=1)

        combined = torch.cat([content_emb, style_vector], dim=1)
        transferred = self.fusion(combined)
        transferred = intensity * transferred + (1 - intensity) * content_emb
        return F.normalize(transferred, p=2, dim=1)
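
A usage sketch with untrained weights and toy tensors; the intensity parameter interpolates between the original content embedding and the fully restyled one:

engine = StyleTransferEngine(embedding_dim=512, attribute_dim=128)
engine.eval()

with torch.no_grad():
    content_emb = F.normalize(torch.randn(1, 512), p=2, dim=1)  # embedding of product A
    style_image = torch.randn(1, 3, 224, 224)                   # photo supplying the style
    # intensity=0 keeps the content embedding; intensity=1 is fully restyled
    blended = engine.transfer_style(content_emb, style_image, intensity=0.5)
print(blended.shape)  # torch.Size([1, 512])
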
Tip: Visual Search Best Practices

Data preparation:

  • Multi-view images: Front, side, back, detail shots for each product
  • Consistent quality: Standardize catalog photos (lighting, background, resolution)
  • User photo collection: Gather real user-uploaded images for training
  • Data augmentation: Vary lighting, angle, background for robustness
  • Object detection: Annotate bounding boxes to focus on product

Modeling:

  • Pre-training: ImageNet, fashion-specific datasets (DeepFashion)
  • Metric learning: Triplet loss with hard negative mining (see Chapter 16 and Chapter 15; sketch below)
  • Multi-task: Visual similarity + category + attributes
  • Domain adaptation: Bridge user photos and catalog photos
  • Style disentanglement: Separate color, pattern, shape, material

Production:

  • Mobile optimization: Support various aspect ratios, low-resolution
  • Real-time encoding: <200ms for uploaded images
  • Object detection: Segment products from backgrounds
  • Privacy: Process images securely, delete after encoding
  • Explainability: Show matched attributes (color, pattern, style)

Challenges:

  • Lighting invariance: Same product looks different in different lighting
  • Pose variation: Products at different angles
  • Occlusion: Partial views, items blocking each other
  • Background clutter: User photos have distracting backgrounds
  • Cross-domain gap: User photos vs professional catalog photos
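
As a sketch of the metric-learning point above: a triplet objective with in-batch hard negative mining, pairing user-photo embeddings (anchors) with their matching catalog-photo embeddings (positives), both assumed L2-normalized. The helper name is illustrative:

import torch
import torch.nn.functional as F

def triplet_loss_hard_negatives(anchor_emb, positive_emb, margin=0.2):
    """Triplet loss where each anchor's negative is the most similar
    non-matching catalog embedding in the batch."""
    sim = anchor_emb @ positive_emb.T              # (B, B) cosine similarities
    pos_sim = sim.diag()                           # matching user/catalog pairs
    # Mask the diagonal, then take the hardest (most similar) wrong match
    neg_sim = sim - torch.eye(len(sim), device=sim.device) * 1e9
    hardest_neg = neg_sim.max(dim=1).values
    return F.relu(margin - pos_sim + hardest_neg).mean()

anchors = F.normalize(torch.randn(32, 512), dim=1)    # user-photo embeddings
positives = F.normalize(torch.randn(32, 512), dim=1)  # catalog-photo embeddings
loss = triplet_loss_hard_negatives(anchors, positives)
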

31.3 Inventory Optimization

Retail inventory management faces the classic trade-off: overstock ties up capital and leads to markdowns, while stockouts lose sales and frustrate customers. Embedding-based inventory optimization learns demand patterns from product features, temporal signals, and market dynamics to forecast demand at SKU-region-week granularity, enabling optimal stock levels that balance holding costs and lost sales.

31.3.1 The Inventory Challenge

Traditional inventory management faces limitations:

  • Cold start: New products have no sales history
  • Seasonal patterns: Complex seasonality (holidays, weather, trends)
  • Substitution effects: Stockouts of product A drive sales of product B
  • Regional variation: Same product, different demand by location
  • Promotion response: How do discounts affect demand?
  • Long-tail: 80% of SKUs have sparse, noisy demand signals

Embedding approach: Represent products as embeddings that encode attributes (category, brand, price, style), learn temporal embeddings of demand patterns, and model regional preferences. Similar products have similar demand curves; new products inherit forecast from similar items; promotion effects transfer across comparable SKUs.

"""
Inventory Optimization with Demand Embeddings

Architecture:
1. Product encoder: SKU → embedding (attributes, historical demand)
2. Temporal encoder: Time series embedding (seasonality, trends)
3. Regional encoder: Location-specific demand patterns
4. Demand forecaster: Product + time + region → demand prediction
"""

from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import Any, Dict, Optional, Tuple

import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F


class DemandRegime(Enum):
    """Demand pattern categories"""
    STEADY = "steady"
    SEASONAL = "seasonal"
    TRENDING_UP = "trending_up"
    TRENDING_DOWN = "trending_down"
    VOLATILE = "volatile"


class ProductEncoder(nn.Module):
    """Encode products for demand forecasting"""

    def __init__(self, num_categories=1000, num_brands=5000, embedding_dim=256):
        super().__init__()
        self.category_emb = nn.Embedding(num_categories, 64)
        self.brand_emb = nn.Embedding(num_brands, 64)
        self.numerical_proj = nn.Linear(10, 64)
        self.demand_lstm = nn.LSTM(
            input_size=1, hidden_size=128, num_layers=2,
            batch_first=True, dropout=0.2
        )
        self.fusion = nn.Sequential(
            nn.Linear(64 + 64 + 64 + 128, 512), nn.ReLU(),
            nn.Dropout(0.2), nn.Linear(512, embedding_dim),
        )

    def forward(self, category_ids, brand_ids, numerical_features, demand_history):
        cat_emb = self.category_emb(category_ids)
        brand_emb = self.brand_emb(brand_ids)
        num_emb = self.numerical_proj(numerical_features)
        demand_history = demand_history.unsqueeze(-1)
        _, (h_n, _) = self.demand_lstm(demand_history)
        demand_emb = h_n[-1]
        combined = torch.cat([cat_emb, brand_emb, num_emb, demand_emb], dim=1)
        return F.normalize(self.fusion(combined), p=2, dim=1)


class TemporalEncoder(nn.Module):
    """Encode time-dependent patterns (seasonality, trends, events)"""

    def __init__(self, embedding_dim=128):
        super().__init__()
        self.cyclical_proj = nn.Linear(8, 64)
        self.trend_proj = nn.Linear(3, 32)
        self.event_emb = nn.Embedding(100, 32)
        self.fusion = nn.Sequential(
            nn.Linear(64 + 32 + 32, embedding_dim), nn.ReLU()
        )

    def forward(self, timestamps, trends, event_ids):
        # Cyclical time features: position within the week and within the year,
        # encoded with sin/cos so period boundaries sit next to each other
        week_frac = (timestamps % (7 * 24 * 3600)) / (7 * 24 * 3600)
        year_frac = (timestamps % (52 * 7 * 24 * 3600)) / (52 * 7 * 24 * 3600)
        cyclical = torch.stack([
            torch.sin(2 * np.pi * week_frac), torch.cos(2 * np.pi * week_frac),
            torch.sin(2 * np.pi * year_frac), torch.cos(2 * np.pi * year_frac),
        ] + [torch.zeros_like(week_frac)] * 4, dim=1)  # zero-pad to the 8 inputs cyclical_proj expects

        cyclical_emb = self.cyclical_proj(cyclical)
        trend_emb = self.trend_proj(trends)
        event_emb = self.event_emb(event_ids)
        return self.fusion(torch.cat([cyclical_emb, trend_emb, event_emb], dim=1))


class DemandForecaster(nn.Module):
    """Forecast demand with uncertainty quantification"""

    def __init__(self, embedding_dim=256):
        super().__init__()
        self.product_encoder = ProductEncoder(embedding_dim=embedding_dim)
        self.temporal_encoder = TemporalEncoder(embedding_dim=128)

        total_dim = embedding_dim + 128
        self.demand_head = nn.Sequential(
            nn.Linear(total_dim, 512), nn.ReLU(), nn.Dropout(0.2),
            nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 1),
        )
        self.uncertainty_head = nn.Sequential(
            nn.Linear(total_dim, 256), nn.ReLU(), nn.Linear(256, 1)
        )
        self.regime_head = nn.Sequential(
            nn.Linear(total_dim, 256), nn.ReLU(), nn.Linear(256, len(DemandRegime))
        )

    def forward(self, category_ids, brand_ids, numerical_features,
                demand_history, timestamps, trends, event_ids):
        product_emb = self.product_encoder(
            category_ids, brand_ids, numerical_features, demand_history
        )
        temporal_emb = self.temporal_encoder(timestamps, trends, event_ids)
        combined = torch.cat([product_emb, temporal_emb], dim=1)

        demand = F.relu(self.demand_head(combined))
        log_variance = self.uncertainty_head(combined)
        uncertainty = torch.exp(0.5 * log_variance)
        regime_logits = self.regime_head(combined)

        return demand, uncertainty, regime_logits
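
One plausible way to train the demand and uncertainty heads jointly is a Gaussian negative log-likelihood, sketched below with toy inputs shaped to match the encoders above; the loss helper is illustrative:

def gaussian_nll(demand_pred, demand_std, demand_actual):
    """NLL of actuals under N(pred, std^2): penalizes inaccurate forecasts
    and miscalibrated uncertainty estimates alike."""
    var = demand_std.pow(2) + 1e-6
    return (0.5 * torch.log(var)
            + 0.5 * (demand_actual - demand_pred).pow(2) / var).mean()

model = DemandForecaster(embedding_dim=256)
demand, uncertainty, regime_logits = model(
    category_ids=torch.randint(0, 1000, (16,)),
    brand_ids=torch.randint(0, 5000, (16,)),
    numerical_features=torch.randn(16, 10),
    demand_history=torch.rand(16, 52),               # 52 weeks of scaled sales
    timestamps=torch.rand(16) * 52 * 7 * 24 * 3600,  # seconds within the year
    trends=torch.randn(16, 3),
    event_ids=torch.randint(0, 100, (16,)),
)
loss = gaussian_nll(demand, uncertainty, demand_actual=torch.rand(16, 1))
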
Tip: Inventory Optimization Best Practices

Data preparation:

  • Historical demand: Clean sales data (remove stockouts, promotions)
  • Product hierarchy: Category → subcategory → brand → SKU
  • External factors: Weather, events, competitor pricing, trends
  • Regional data: Demographics, store traffic, local preferences
  • Supply chain: Lead times, supplier reliability, minimum order quantities

Modeling:

  • Transfer learning: Similar products share demand patterns (see Chapter 14)
  • Hierarchical forecasting: Top-down (category) + bottom-up (SKU)
  • Multi-task: Demand + stockout probability + markdown risk
  • Uncertainty quantification: Prediction intervals, not just point estimates
  • Regime detection: Identify demand pattern changes (trending, seasonal)

Production:

  • Scale: Millions of SKUs × thousands of locations
  • Freshness: Daily forecast updates with latest sales
  • Cold start: Immediate forecasts for new products (see the nearest-neighbor sketch below)
  • Explainability: Why forecast changed, which factors matter
  • Integration: Forecasts → ordering systems → fulfillment

Challenges:

  • Sparse demand: Long-tail SKUs have intermittent sales
  • Promotion effects: Discounts create demand spikes
  • Substitution: Stockouts shift demand to alternatives
  • Cannibalization: New products steal sales from existing
  • Bullwhip effect: Demand variability amplifies upstream
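
For the cold-start point above, a sketch of borrowing forecasts from embedding neighbors; it assumes new products receive an attribute-only embedding (e.g., from the category/brand/price features) and that forecasts for existing SKUs are pre-computed:

import numpy as np

def cold_start_forecast(new_product_emb, catalog_embs, catalog_forecasts, k=10):
    """Forecast a new SKU as the similarity-weighted average of its
    k nearest neighbors in product-embedding space."""
    sims = catalog_embs @ new_product_emb      # cosine similarity (all normalized)
    top_k = np.argsort(sims)[-k:]
    weights = np.maximum(sims[top_k], 0)
    weights = weights / (weights.sum() + 1e-9)
    return weights @ catalog_forecasts[top_k]

catalog_embs = np.random.randn(5000, 256)
catalog_embs /= np.linalg.norm(catalog_embs, axis=1, keepdims=True)
catalog_forecasts = np.random.rand(5000, 12)   # 12-week forecasts per existing SKU
new_emb = catalog_embs[0]                      # stand-in for a new product's embedding
forecast = cold_start_forecast(new_emb, catalog_embs, catalog_forecasts)
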

31.4 Customer Journey Analysis

E-commerce customer journeys involve dozens of touchpoints across channels (web, mobile, email, ads) before conversion. Embedding-based customer journey analysis represents sessions, user actions, and customer states as vectors, enabling identification of conversion patterns, friction points, and optimal intervention moments for hyper-personalized experiences.

31.4.1 The Customer Journey Challenge

Traditional journey analytics face limitations:

  • High dimensionality: Thousands of possible page sequences, product views, interactions
  • Variable length: Journeys range from single visit to months of browsing
  • Multi-channel: Users switch between devices, channels mid-journey
  • Individual variation: No two customers follow same path
  • Causality: Did email cause purchase or coincide with intent?
  • Real-time personalization: Must predict next action in <50ms

Embedding approach: Learn sequential embeddings where customer states evolve through session history, similar journey patterns cluster together, and distance to conversion embedding predicts purchase probability. Enables real-time journey stage detection and micro-moment personalization based on implicit signals.

"""
Customer Journey Analysis with Sequential Embeddings

Architecture:
1. Session encoder: LSTM/Transformer over user actions
2. Journey stage classifier: Browse, consider, decide, convert
3. Friction detector: Identify abandonment risk signals
4. Next action predictor: Recommend optimal intervention
"""

from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import Any, Dict, List, Optional, Set

import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F


class ActionType(Enum):
    """User action types"""
    PAGE_VIEW = "page_view"
    PRODUCT_VIEW = "product_view"
    SEARCH = "search"
    ADD_TO_CART = "add_to_cart"
    CHECKOUT_START = "checkout_start"
    PURCHASE = "purchase"


class JourneyStage(Enum):
    """Customer journey stages"""
    AWARENESS = "awareness"
    CONSIDERATION = "consideration"
    INTENT = "intent"
    PURCHASE = "purchase"
    LOYALTY = "loyalty"


class ActionEncoder(nn.Module):
    """Encode user actions to embeddings"""

    def __init__(self, num_action_types=20, num_products=1000000, embedding_dim=128):
        super().__init__()
        self.action_type_emb = nn.Embedding(num_action_types, 64)
        self.product_emb = nn.Embedding(num_products, 64)
        self.temporal_proj = nn.Linear(5, 32)
        self.context_proj = nn.Linear(10, 32)
        self.fusion = nn.Sequential(
            nn.Linear(64 + 64 + 32 + 32, embedding_dim), nn.ReLU()
        )

    def forward(self, action_types, product_ids, temporal_features, context_features):
        action_emb = self.action_type_emb(action_types)
        product_emb = self.product_emb(product_ids)
        temporal_emb = self.temporal_proj(temporal_features)
        context_emb = self.context_proj(context_features)
        combined = torch.cat([action_emb, product_emb, temporal_emb, context_emb], dim=1)
        return self.fusion(combined)


class SessionEncoder(nn.Module):
    """Encode session history to embedding using LSTM + attention"""

    def __init__(self, action_dim=128, embedding_dim=256):
        super().__init__()
        self.lstm = nn.LSTM(
            input_size=action_dim, hidden_size=embedding_dim,
            num_layers=2, batch_first=True, dropout=0.2,
        )
        self.attention = nn.MultiheadAttention(
            embed_dim=embedding_dim, num_heads=8, batch_first=True
        )

    def forward(self, action_embeddings, sequence_lengths=None):
        lstm_out, (h_n, _) = self.lstm(action_embeddings)
        attended, _ = self.attention(lstm_out, lstm_out, lstm_out)
        session_emb = (h_n[-1] + attended.mean(dim=1)) / 2
        return F.normalize(session_emb, p=2, dim=1)


class JourneyAnalyzer(nn.Module):
    """Analyze customer journey and predict outcomes"""

    def __init__(self, embedding_dim=256):
        super().__init__()
        self.action_encoder = ActionEncoder(embedding_dim=128)
        self.session_encoder = SessionEncoder(action_dim=128, embedding_dim=embedding_dim)

        self.stage_classifier = nn.Sequential(
            nn.Linear(embedding_dim, 128), nn.ReLU(),
            nn.Dropout(0.3), nn.Linear(128, len(JourneyStage)),
        )
        self.conversion_predictor = nn.Sequential(
            nn.Linear(embedding_dim, 128), nn.ReLU(),
            nn.Dropout(0.3), nn.Linear(128, 1), nn.Sigmoid(),
        )
        self.friction_detector = nn.Sequential(
            nn.Linear(embedding_dim, 128), nn.ReLU(),
            nn.Dropout(0.3), nn.Linear(128, 1), nn.Sigmoid(),
        )

    def forward(self, action_embeddings, sequence_lengths=None):
        session_emb = self.session_encoder(action_embeddings, sequence_lengths)
        return {
            "stage_logits": self.stage_classifier(session_emb),
            "conversion_prob": self.conversion_predictor(session_emb),
            "friction_score": self.friction_detector(session_emb),
            "embedding": session_emb,
        }
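
A usage sketch with untrained weights and toy tensors: actions are encoded individually, reshaped into sessions, and passed through the analyzer:

analyzer = JourneyAnalyzer(embedding_dim=256)
analyzer.eval()

with torch.no_grad():
    # Encode 40 actions, then reshape into 4 sessions of 10 actions each
    action_embs = analyzer.action_encoder(
        action_types=torch.randint(0, 20, (40,)),
        product_ids=torch.randint(0, 1000000, (40,)),
        temporal_features=torch.randn(40, 5),
        context_features=torch.randn(40, 10),
    ).view(4, 10, 128)                        # (batch, seq_len, action_dim)
    outputs = analyzer(action_embs)
print(outputs["conversion_prob"].squeeze(1))  # per-session purchase probability
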
Tip: Customer Journey & Hyper-Personalization Best Practices

Data collection:

  • Event tracking: Capture all interactions (views, clicks, time spent)
  • Cross-device: Link sessions across devices via login, fingerprinting
  • Multi-channel: Web, mobile app, email, ads, in-store
  • Temporal granularity: Millisecond timestamps for precise sequencing
  • Privacy: Anonymize PII, respect GDPR/CCPA, allow opt-out

Modeling:

  • Sequential models: LSTM/Transformer for action sequences
  • Attention mechanisms: Learn which past actions predict future
  • Multi-task learning: Stage + conversion + next action + friction
  • Transfer learning: Similar product categories share journey patterns
  • Real-time updating: Stream new actions, update embeddings incrementally (see the streaming sketch below)

Hyper-personalization:

  • Individual-level: Not segments, actual individual behavior
  • Real-time: Adapt during session, not batch overnight
  • Multi-dimensional: Content, layout, pricing, timing, channel
  • Contextual: Consider time of day, device, location, weather
  • A/B testing: Continuous testing of personalization strategies

Production:

  • Low latency: <50ms end-to-end for real-time personalization
  • Streaming: Process events as they arrive, update embeddings live
  • Scalability: Millions of concurrent sessions
  • Explainability: Why this personalization for this user?
  • Privacy: On-device processing where possible, secure data handling

Challenges:

  • Cold start: New users with no history
  • Sparse data: Many users have few interactions
  • Concept drift: User preferences change over time
  • Attribution: Which touchpoints caused conversion?
  • Privacy: Balance personalization with data protection
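
For the real-time updating point above, a sketch of incremental session encoding: carrying the LSTM hidden state per session makes each new action a single step rather than a full re-encode (the attention pooling in SessionEncoder is skipped on this streaming path; the class name is illustrative):

class StreamingSessionState:
    """Keeps one LSTM hidden state per live session so each incoming
    action is a single LSTM step instead of a full-sequence re-encode."""

    def __init__(self, session_encoder: SessionEncoder):
        self.lstm = session_encoder.lstm
        self.states = {}  # session_id -> (h, c)

    def update(self, session_id: str, action_emb: torch.Tensor) -> torch.Tensor:
        # action_emb: (1, action_dim); add a seq_len dimension of 1
        out, state = self.lstm(action_emb.unsqueeze(1), self.states.get(session_id))
        self.states[session_id] = state
        return F.normalize(state[0][-1], p=2, dim=1)  # current session embedding

encoder = SessionEncoder(action_dim=128, embedding_dim=256)
encoder.eval()
stream = StreamingSessionState(encoder)
session_emb = stream.update("session-42", torch.randn(1, 128))
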

31.5 Dynamic Pricing

Pricing is complex: it must weigh product attributes, customer willingness-to-pay, competitive positioning, inventory levels, and time-of-day demand. Embedding-based dynamic pricing represents products and customers as vectors, enabling price optimization that accounts for hundreds of implicit factors.

31.5.1 The Dynamic Pricing Challenge

Traditional pricing approaches:

  • Cost-plus: Price = cost × markup (ignores demand)
  • Competitive: Match competitor prices (race to bottom)
  • Segmented: Fixed tiers (doesn’t capture individual WTP)
  • Regression: Linear models (misses non-linear patterns)

Embedding approach: Learn product embeddings (quality, brand, features) and customer embeddings (purchase history, preferences). Price = f(product_emb, customer_emb, context). See Chapter 14 for approaches to building these embeddings.

A minimal dynamic pricing engine built on this formulation:
import torch
import torch.nn as nn
import numpy as np
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Product:
    """Product with pricing attributes."""
    product_id: str
    category: str
    brand: str
    cost: float
    base_price: float
    embedding: Optional[np.ndarray] = None


class DemandModel(nn.Module):
    """Predict purchase probability as function of price."""
    def __init__(self, embedding_dim: int = 128, context_dim: int = 10):
        super().__init__()
        input_dim = embedding_dim * 2 + 1 + context_dim
        self.demand_predictor = nn.Sequential(
            nn.Linear(input_dim, 256),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(256, 128),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(128, 1),
            nn.Sigmoid()
        )

    def forward(self, product_emb, customer_emb, price, context):
        """Predict purchase probability."""
        combined = torch.cat([product_emb, customer_emb, price, context], dim=1)
        purchase_prob = self.demand_predictor(combined)
        return purchase_prob


class DynamicPricingEngine:
    """Dynamic pricing using embeddings."""
    def __init__(self, demand_model, min_margin: float = 0.2):
        self.demand_model = demand_model
        self.min_margin = min_margin

    def optimize_price(self, product_emb, customer_emb, cost: float,
                       base_price: float, num_price_points: int = 20) -> Tuple[float, float]:
        """Optimize price for product-customer pair."""
        min_price = cost * (1 + self.min_margin)
        max_price = base_price * 1.2
        prices = np.linspace(min_price, max_price, num_price_points)

        best_price = None
        best_profit = -float('inf')

        with torch.no_grad():
            for price in prices:
                price_t = torch.tensor([[price]]).float()
                context_t = torch.zeros(1, 10).float()
                purchase_prob = self.demand_model(
                    product_emb, customer_emb, price_t, context_t
                ).item()

                expected_profit = purchase_prob * (price - cost)
                if expected_profit > best_profit:
                    best_profit = expected_profit
                    best_price = price

        return best_price, best_profit

# Usage example
demand_model = DemandModel(embedding_dim=128)
pricing_engine = DynamicPricingEngine(demand_model, min_margin=0.2)

product_emb = torch.randn(1, 128)
customer_emb = torch.randn(1, 128)
optimal_price, expected_profit = pricing_engine.optimize_price(
    product_emb, customer_emb, cost=50.0, base_price=100.0
)
print(f"Optimal price: ${optimal_price:.2f}, Expected profit: ${expected_profit:.2f}")
Optimal price: $101.05, Expected profit: $28.83
Tip: Dynamic Pricing Best Practices

Demand modeling:

  • Price elasticity: Encode in customer embedding (price sensitivity)
  • Competitive response: Monitor competitor prices, adjust accordingly
  • Temporal patterns: Time-of-day, day-of-week, seasonality
  • Inventory pressure: Increase discount as stock ages

Optimization:

  • Expected profit: P(purchase | price) × (price - cost), as computed in the engine above
  • Multi-objective: Balance revenue, margin, market share
  • Constraints: Minimum margin, maximum discount, price stability
  • A/B testing: Randomized experiments to measure elasticity

Production:

  • Real-time: Recompute prices as conditions change (hourly/daily)
  • Personalization: Different prices for different customer segments
  • Fairness: Avoid discriminatory pricing (same price for same features)
  • Transparency: Explain price changes to customers when asked

Challenges:

  • Strategic behavior: Customers learn to wait for discounts
  • Fairness: Personalized pricing can seem unfair
  • Complexity: Many factors interact non-linearly
  • Adverse selection: Low prices attract low-value customers

31.6 Dynamic Catalog Management

Retail catalogs with millions of SKUs require constant curation: which products to feature, how to organize collections, what to cross-sell, which items to discontinue. Embedding-based dynamic catalog management automates merchandising decisions by learning product relationships, trend dynamics, and customer preferences to continuously optimize product presentation and inventory composition.

31.6.1 The Catalog Management Challenge

Traditional catalog management faces limitations:

  • Manual curation: Merchandisers manually create collections, rules
  • Static taxonomies: Fixed categories don’t adapt to trends
  • Limited relationships: Can only capture explicit attributes
  • Seasonal lag: Slow to respond to emerging trends
  • Scale limitations: Can’t optimize millions of SKUs individually
  • Substitution complexity: Which products are true alternatives?

Embedding approach: Products as vectors enable automatic discovery of relationships (complementary, substitute, seasonal), trend detection through embedding drift, and dynamic collection generation based on learned preferences. Catalog structure emerges from data rather than predetermined by merchandisers.

"""
Dynamic Catalog Management with Product Embeddings

Architecture:
1. Product relationship graph: Learned from co-purchase, co-view, substitution
2. Trend detector: Identify emerging product clusters, seasonal shifts
3. Collection generator: Auto-create curated sets based on coherence
4. Merchandising optimizer: Feature products maximizing engagement + margin
"""

from collections import defaultdict
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import Any, Dict, List, Optional, Tuple

import numpy as np
import torch
import torch.nn as nn


class ProductRelationType(Enum):
    """Types of product relationships"""
    COMPLEMENT = "complement"  # Bought together (camera + lens)
    SUBSTITUTE = "substitute"  # Alternatives (two similar dresses)
    UPGRADE = "upgrade"  # Premium alternative
    ACCESSORY = "accessory"


class TrendStatus(Enum):
    """Product trend status"""
    EMERGING = "emerging"
    TRENDING = "trending"
    STABLE = "stable"
    DECLINING = "declining"


class ProductRelationshipLearner(nn.Module):
    """Learn product relationships from behavioral data"""

    def __init__(self, num_products=1000000, embedding_dim=256):
        super().__init__()
        self.product_embeddings = nn.Embedding(num_products, embedding_dim)
        self.relation_embeddings = nn.Embedding(len(ProductRelationType), embedding_dim)
        self.relation_scorer = nn.Sequential(
            nn.Linear(embedding_dim * 3, 512), nn.ReLU(), nn.Dropout(0.2),
            nn.Linear(512, 256), nn.ReLU(),
            nn.Linear(256, 1), nn.Sigmoid(),
        )

    def forward(self, product_a_ids, relation_types, product_b_ids):
        prod_a_emb = self.product_embeddings(product_a_ids)
        relation_emb = self.relation_embeddings(relation_types)
        prod_b_emb = self.product_embeddings(product_b_ids)
        combined = torch.cat([prod_a_emb, relation_emb, prod_b_emb], dim=1)
        return self.relation_scorer(combined)


class TrendDetector:
    """Detect emerging trends and product lifecycle stages"""

    def __init__(self):
        self.historical_sales: Dict[str, List[Tuple[datetime, float]]] = defaultdict(list)

    def track_product(self, product_id: str, sales: float, timestamp: datetime):
        self.historical_sales[product_id].append((timestamp, sales))

    def detect_trend(self, product_id: str) -> Tuple[TrendStatus, float]:
        if product_id not in self.historical_sales:
            return TrendStatus.STABLE, 0.0

        sales_history = self.historical_sales[product_id]
        if len(sales_history) < 4:
            return TrendStatus.STABLE, 0.0

        recent_sales = [s for _, s in sales_history[-8:]]
        first_half = np.mean(recent_sales[: len(recent_sales) // 2])
        second_half = np.mean(recent_sales[len(recent_sales) // 2 :])

        if first_half > 0:
            momentum = (second_half - first_half) / first_half
        else:
            momentum = 0.0

        if momentum > 0.3:
            return TrendStatus.EMERGING, momentum
        elif momentum > 0.1:
            return TrendStatus.TRENDING, momentum
        elif momentum < -0.2:
            return TrendStatus.DECLINING, momentum
        return TrendStatus.STABLE, momentum


@dataclass
class MerchandisingDecision:
    """Merchandising decision for product"""
    product_id: str
    action: str  # "feature", "promote", "clearance", "discontinue", "maintain"
    rationale: str
    urgency: float
    expected_impact: float


class MerchandisingOptimizer:
    """Optimize merchandising decisions based on trends and inventory"""

    def __init__(self, trend_detector: TrendDetector):
        self.trend_detector = trend_detector

    def optimize(self, product_id: str, performance: Dict, inventory: Dict) -> MerchandisingDecision:
        trend_status, momentum = self.trend_detector.detect_trend(product_id)
        stock_level = inventory.get("stock_level", 1.0)

        if trend_status == TrendStatus.EMERGING and stock_level < 0.8:
            return MerchandisingDecision(
                product_id, "feature", "Emerging trend - maximize opportunity",
                0.9, performance.get("sales_velocity", 0.5) * 2.5
            )
        elif trend_status == TrendStatus.DECLINING and stock_level > 1.2:
            return MerchandisingDecision(
                product_id, "clearance", "Declining trend with overstock",
                0.8, -performance.get("margin", 0.3) * 0.3
            )
        return MerchandisingDecision(
            product_id, "maintain", "Stable performance",
            0.2, performance.get("sales_velocity", 0.5)
        )
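
A usage sketch: eight weeks of accelerating sales for a hypothetical SKU trigger a "feature" decision:

from datetime import timedelta

detector = TrendDetector()
start = datetime(2024, 1, 1)
for week, sales in enumerate([10, 12, 11, 13, 18, 22, 27, 33]):
    detector.track_product("sku-123", float(sales), start + timedelta(weeks=week))

optimizer = MerchandisingOptimizer(detector)
decision = optimizer.optimize(
    "sku-123",
    performance={"sales_velocity": 0.7, "margin": 0.35},
    inventory={"stock_level": 0.6},  # below 0.8, so stock supports featuring
)
print(decision.action, "-", decision.rationale)  # feature - Emerging trend - maximize opportunity
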
Tip: Dynamic Catalog Management Best Practices

Data sources:

  • Behavioral: Co-purchase, co-view, cart patterns, substitution
  • Content: Product attributes, descriptions, images
  • Performance: Sales, margin, conversion, returns
  • Inventory: Stock levels, turnover rates, lead times
  • External: Trends, seasonality, competitor pricing, social media

Modeling:

  • Graph neural networks: Product relationship graphs
  • Temporal models: Track trends over time
  • Clustering: Discover natural product groups
  • Multi-objective optimization: Revenue, margin, inventory, diversity
  • Transfer learning: Apply successful patterns across categories

Production:

  • Scale: Millions of products, billions of relationships
  • Freshness: Daily updates to relationships, trends
  • Explainability: Why these products go together?
  • Business rules: Honor brand guidelines, margin requirements
  • A/B testing: Validate automated decisions

Challenges:

  • Cold start: New products with no behavioral data
  • Seasonality: Relationships change seasonally (winter coats + boots)
  • Trend timing: Early detection vs false positives
  • Cannibalization: Featuring one product hurts another
  • Strategic fit: Automated decisions must align with brand strategy
Tip: Video Analytics for Retail

For in-store video surveillance and analytics applications—including loss prevention (shoplifting detection, checkout exception monitoring), customer analytics (traffic patterns, dwell time, queue management), and operations (staffing optimization, planogram compliance)—see the Retail Loss Prevention section in Chapter 27.

31.7 Key Takeaways

  • Multi-modal product embeddings enable semantic search beyond keyword matching: Image encoders (CNN/ViT) learn visual features, text encoders (BERT) capture semantic meaning, and behavioral encoders extract implicit utility signals from co-purchase and co-view patterns, enabling discovery of products that solve similar needs even with different terminology or categories

  • Visual search transforms product discovery through style understanding: Vision models trained with metric learning can match user-uploaded photos to catalog products despite differences in lighting, angle, and background, while style disentanglement enables attribute-specific search (“this pattern but different color”) and style transfer (“jeans that match this shirt’s vibe”)

  • Embedding-based demand forecasting enables inventory optimization at scale: Product embeddings enable transfer learning where new products inherit demand patterns from similar items, solving the cold start problem, while temporal and regional embeddings capture seasonality and location-specific preferences, optimizing stock levels for millions of SKU-location-week combinations

  • Sequential embeddings power real-time customer journey analysis and hyper-personalization: LSTM/Transformer models over user action sequences learn journey stages, conversion probability, and friction points, enabling individual-level personalization that adapts content, offers, and interventions in real-time (<50ms) based on current session state rather than static demographic segments

  • Hyper-personalization operates at the individual level in real time: Unlike segment-based personalization (millennials, high-value customers), embeddings enable truly individual experiences where every customer sees personalized content, layout, pricing, and interventions based on their specific behavior patterns, current journey stage, and predicted next actions

  • Dynamic catalog management automates merchandising at scale: Graph neural networks learn product relationships (complements, substitutes, upgrades) from behavioral data, trend detection identifies emerging products before they peak, and collection generators automatically curate coherent product sets, scaling merchandising decisions across millions of SKUs

  • Retail embeddings require multi-objective optimization: Systems must balance multiple goals—conversion rate, average order value, margin, inventory turnover, customer satisfaction—rather than optimizing single metrics, requiring careful tuning of embedding losses and business rule constraints to align with strategic objectives

31.8 Looking Ahead

Part V (Industry Applications) continues with Chapter 32, which applies embeddings to manufacturing and Industry 4.0: predictive quality control through sensor embeddings that detect defects before they occur, supply chain intelligence using shipment and supplier embeddings for optimization, equipment optimization with machine embeddings that predict maintenance needs and optimize utilization, process automation using workflow embeddings to identify bottlenecks and improvement opportunities, and digital twin implementations creating virtual representations of physical assets for simulation and optimization.

31.9 Further Reading

31.9.1 Product Search and Discovery

  • Grbovic, Mihajlo, and Haibin Cheng (2018). “Real-time Personalization using Embeddings for Search Ranking at Airbnb.” KDD.
  • Covington, Paul, Jay Adams, and Emre Sargin (2016). “Deep Neural Networks for YouTube Recommendations.” RecSys.
  • Liu, Qi, et al. (2018). “Product Search Engine with Multi-modal Search Architecture.” SIGIR.
  • He, Ruining, and Julian McAuley (2016). “VBPR: Visual Bayesian Personalized Ranking from Implicit Feedback.” AAAI.

31.9.2 Visual Search and Style

  • Kiapour, M. Hadi, et al. (2015). “Where to Buy It: Matching Street Clothing Photos in Online Shops.” ICCV.
  • Liu, Ziwei, et al. (2016). “DeepFashion: Powering Robust Clothes Recognition and Retrieval with Rich Annotations.” CVPR.
  • Hsiao, Wei-Lin, and Kristen Grauman (2018). “Creating Capsule Wardrobes from Fashion Images.” CVPR.
  • Veit, Andreas, et al. (2017). “Conditional Similarity Networks.” CVPR.

31.9.3 Demand Forecasting and Inventory

  • Laptev, Nikolay, et al. (2017). “Time-series Extreme Event Forecasting with Neural Networks at Uber.” ICML Workshop.
  • Salinas, David, et al. (2020). “DeepAR: Probabilistic Forecasting with Autoregressive Recurrent Networks.” International Journal of Forecasting.
  • Rangapuram, Syama Sundar, et al. (2018). “Deep State Space Models for Time Series Forecasting.” NeurIPS.

31.9.4 Customer Journey and Personalization

  • Beutel, Alex, et al. (2018). “Latent Cross: Making Use of Context in Recurrent Recommender Systems.” WSDM.
  • Hidasi, Balázs, et al. (2016). “Session-based Recommendations with Recurrent Neural Networks.” ICLR.
  • Chen, Xu, et al. (2018). “Sequential Recommendation with User Memory Networks.” WSDM.
  • Rendle, Steffen, Christoph Freudenthaler, and Lars Schmidt-Thieme (2010). “Factorizing Personalized Markov Chains for Next-basket Recommendation.” WWW.

31.9.5 Dynamic Catalog and Merchandising

  • McAuley, Julian, et al. (2015). “Image-based Recommendations on Styles and Substitutes.” SIGIR.
  • He, Ruining, et al. (2016). “Ups and Downs: Modeling the Visual Evolution of Fashion Trends with One-Class Collaborative Filtering.” WWW.
  • Huang, Jin, et al. (2019). “Taxonomy-aware Multi-hop Reasoning Networks for Sequential Recommendation.” WSDM.
  • Wang, Xiang, et al. (2019). “Explainable Reasoning over Knowledge Graphs for Recommendation.” AAAI.

31.9.6 Hyper-Personalization Systems

  • Covington, Paul, Jay Adams, and Emre Sargin (2016). “Deep Neural Networks for YouTube Recommendations.” RecSys.
  • Agarwal, Deepak, et al. (2011). “Click Shaping to Optimize Multiple Objectives.” KDD.
  • Chapelle, Olivier, et al. (2015). “Simple and Scalable Response Prediction for Display Advertising.” ACM TIST.
  • Zhou, Guorui, et al. (2018). “Deep Interest Network for Click-Through Rate Prediction.” KDD.

31.9.7 Multi-Modal Learning for Retail

  • Kiapour, M. Hadi, et al. (2015). “Where to Buy It: Matching Street Clothing Photos in Online Shops.” ICCV.
  • Bell, Sean, and Kavita Bala (2015). “Learning Visual Similarity for Product Design with Convolutional Neural Networks.” SIGGRAPH.
  • Liu, Si, et al. (2012). “Hi, Magic Closet, Tell Me What to Wear!” ACM MM.
  • Shankar, Devashish, et al. (2017). “Deep Learning Based Large Scale Visual Recommendation and Search for E-Commerce.” arXiv:1703.02344.

31.9.8 Business Impact and ROI

  • Ding, Yi, et al. (2019). “Buying Intention Prediction and Analysis for E-commerce.” IEEE BigComp.
  • Kumar, V., and Werner Reinartz (2016). “Creating Enduring Customer Value.” Journal of Marketing.
  • Blattberg, Robert C., Byung-Do Kim, and Scott A. Neslin (2008). “Database Marketing: Analyzing and Managing Customers.” Springer.
  • Lemon, Katherine N., and Peter C. Verhoef (2016). “Understanding Customer Experience Throughout the Customer Journey.” Journal of Marketing.