
Embedding-Based Anomaly Detection for Observability

A comprehensive 9-part tutorial series on building production-ready anomaly detection systems using ResNet embeddings for OCSF (Open Cybersecurity Schema Framework) observability data.

What you’ll learn: How to build, train, and deploy a custom embedding model (TabularResNet) specifically designed for OCSF observability data. This model transforms observability logs and system metrics into vector representations. Anomaly detection happens entirely through vector database similarity search—no separate detection model needed. The system processes streaming OCSF events in near real-time to automatically identify unusual behavior.


Series Overview

This tutorial series takes you from ResNet fundamentals to deploying and monitoring a complete anomaly detection system in production. You’ll learn how to:

Target Audience: ML engineers, operations engineers, and data scientists working with observability data

Applicability: While this series uses OCSF observability logs as the running example, the TabularResNet embedding approach applies to any structured observability data:

The key requirement: Your data can be represented as rows with categorical and numerical features. If you can create a pandas DataFrame from your data, you can use this approach.
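As a minimal sketch of that requirement, the snippet below builds a small DataFrame of hypothetical, already-flattened events and splits it into the two feature groups a tabular model consumes: integer-coded categoricals and a float matrix of numericals. The column names and values are illustrative assumptions, not the series' actual schema.

```python
import pandas as pd

# Hypothetical OCSF-style events, already flattened to one row per event.
events = pd.DataFrame(
    {
        "class_uid": [3002, 3002, 4001],
        "activity_id": [1, 2, 6],
        "actor_user_name": ["alice", "bob", "alice"],
        "bytes_out": [512, 0, 20480],
        "duration_ms": [35.0, 12.5, 980.0],
    }
)

# Categorical columns will feed embedding tables; numerical columns feed linear layers.
categorical_cols = ["class_uid", "activity_id", "actor_user_name"]
numerical_cols = ["bytes_out", "duration_ms"]

# Encode each categorical column as integer indices (the input an embedding lookup expects).
cat_codes = {
    col: events[col].astype("category").cat.codes.to_numpy() for col in categorical_cols
}
# Stack numerical columns into a float32 matrix of shape (n_rows, n_numerical).
num_values = events[numerical_cols].to_numpy(dtype="float32")

print(cat_codes["actor_user_name"])  # [0 1 0]
print(num_values.shape)              # (3, 2)
```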

Prerequisites:

Key Terms (explained in detail throughout the series):

Why OCSF?

Without OCSF, you would need separate models for each log format:

With OCSF, all sources map to the same schema (class_uid, activity_id, actor.user.name), enabling one embedding model to work across all OCSF-compliant sources.
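To make the normalization concrete, here is a small sketch that maps two hypothetical raw logon records (an sshd-style record and a Windows Security record) onto the OCSF fields named above. The raw record layouts and the two mapping functions are illustrative assumptions; a real mapping populates many more OCSF attributes.

```python
# Hypothetical raw records for the same kind of action (a user logon) from two sources.
linux_sshd = {"ts": "2024-05-01T10:00:00Z", "user": "alice", "program": "sshd"}
windows_security = {"TimeCreated": "2024-05-01T10:00:05Z", "TargetUserName": "bob", "EventID": 4624}

def normalize_sshd(evt):
    """Map a hypothetical sshd login record onto the OCSF fields named in the text."""
    return {
        "class_uid": 3002,  # OCSF Authentication class
        "activity_id": 1,   # Logon
        "actor": {"user": {"name": evt["user"]}},
    }

def normalize_windows(evt):
    """Map a hypothetical Windows Security record onto the same OCSF fields."""
    return {
        "class_uid": 3002,
        "activity_id": 1 if evt["EventID"] == 4624 else 0,  # 4624 = successful logon; 0 = Unknown
        "actor": {"user": {"name": evt["TargetUserName"]}},
    }

ocsf_events = [normalize_sshd(linux_sshd), normalize_windows(windows_security)]
# Both records now share one schema, so one feature pipeline and one model can consume them.
print(ocsf_events[1])
# {'class_uid': 3002, 'activity_id': 1, 'actor': {'user': {'name': 'bob'}}}
```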


Tutorial Series

Part 1: Understanding ResNet Architecture

Learn the core concepts behind Residual Networks:

  • The degradation problem in deep networks

  • Skip connections and why they work

  • Gradient flow visualization

  • Architecture patterns (basic and bottleneck blocks)

Foundation · 35 min read
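The core idea behind skip connections fits in a few lines. The NumPy sketch below is an illustration rather than the series' implementation: weights are random, biases are omitted, and it uses the variant with no activation after the addition, so that a zero residual branch yields exactly the identity.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """Residual block y = x + F(x), with F = Linear -> ReLU -> Linear (biases omitted).

    Because the input is added back unchanged, dy/dx = I + dF/dx: the gradient
    always has an identity path, even when dF/dx is tiny.
    """
    f = relu(x @ w1) @ w2  # residual branch F(x)
    return x + f           # skip connection

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))              # batch of 4 vectors, width 8
w1 = rng.normal(scale=0.1, size=(8, 8))

# If the residual branch outputs zero, the block is exactly the identity mapping,
# so a deeper stack can fall back to what a shallower one computes by driving F toward zero.
w2_zero = np.zeros((8, 8))
print(np.allclose(residual_block(x, w1, w2_zero), x))  # True
```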

Part 2: Adapting ResNet for Tabular Data

Adapt ResNet for observability data:

  • Replace convolutions with linear layers

  • Categorical embeddings for high-cardinality features

  • Complete TabularResNet implementation

  • Design considerations for OCSF data

Architecture · 30 min read
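The two adaptations above, embedding lookups for categorical columns and linear layers instead of convolutions, combine into the model's input stage. The NumPy forward pass below is a sketch with hypothetical column names, cardinalities, and random (untrained) weights. Looking up row i of an embedding table is equivalent to multiplying a one-hot vector by that table, without materializing a 5000-wide one-hot vector for the high-cardinality `user` column.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical feature spec: cardinality per categorical column, plus numeric width.
cardinalities = {"class_uid": 10, "activity_id": 8, "user": 5000}
embed_dim = 16   # width of each categorical embedding
n_numeric = 4    # number of numerical features
d_model = 64     # hidden width of the residual blocks that would follow

# One embedding table per categorical column: row i is the vector for category index i.
tables = {
    name: rng.normal(scale=0.02, size=(card, embed_dim))
    for name, card in cardinalities.items()
}
# A linear layer (weight matrix) projects the concatenated features to d_model.
w_in = rng.normal(scale=0.02, size=(len(cardinalities) * embed_dim + n_numeric, d_model))

def encode(cat_idx, numeric):
    """cat_idx: {column: int array of shape (batch,)}; numeric: (batch, n_numeric)."""
    embedded = [tables[name][cat_idx[name]] for name in cardinalities]  # each (batch, embed_dim)
    x = np.concatenate(embedded + [numeric], axis=1)                    # (batch, 3*16 + 4)
    return x @ w_in                                                     # (batch, d_model)

cat_idx = {
    "class_uid": np.array([1, 1, 2]),
    "activity_id": np.array([0, 3, 7]),
    "user": np.array([42, 4999, 0]),
}
numeric = rng.normal(size=(3, n_numeric))
h = encode(cat_idx, numeric)
print(h.shape)  # (3, 64)
```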

Part 3: Feature Engineering for OCSF Data

Transform OCSF JSON to model input:

  • Flattening nested JSON structures

  • Temporal and derived features

  • Aggregation and rolling windows

  • High cardinality handling

  • End-to-end feature pipeline

Data Engineering · 40 min read
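Flattening nested OCSF JSON into dotted column names is a short recursive function. The sketch below is a minimal version that assumes dict-only nesting; lists (for example, arrays of observables) are left as-is and would need their own policy in a real pipeline. The event itself is hypothetical.

```python
def flatten(obj, parent_key="", sep="."):
    """Recursively flatten nested dicts: {"actor": {"user": {"name": x}}} -> {"actor.user.name": x}."""
    items = {}
    for key, value in obj.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            items.update(flatten(value, new_key, sep))  # descend into nested objects
        else:
            items[new_key] = value                      # leaf value (lists kept unchanged)
    return items

event = {
    "class_uid": 3002,
    "activity_id": 1,
    "actor": {"user": {"name": "alice", "uid": "u-123"}},
    "src_endpoint": {"ip": "10.0.0.5", "port": 52344},
    "metadata": {"product": {"name": "sshd"}},
}
flat = flatten(event)
print(sorted(flat))
# ['activity_id', 'actor.user.name', 'actor.user.uid', 'class_uid',
#  'metadata.product.name', 'src_endpoint.ip', 'src_endpoint.port']
```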

Part 4: Self-Supervised Training

Train on unlabeled data:

  • Masked Feature Prediction (MFP)

  • Contrastive learning with augmentation

  • Complete training pipeline

  • Hyperparameter tuning strategies

Training · 35 min read
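The heart of Masked Feature Prediction is hiding some feature values and scoring the model only on the hidden ones. The NumPy sketch below shows the masking step and the masked loss; the 15% mask rate, the shared vocabulary of 10 categories plus one mask token, and the random logits standing in for model output are assumptions for illustration.

```python
import numpy as np

def mask_features(x_cat, mask_rate, mask_token, rng):
    """Randomly hide a fraction of categorical values.

    x_cat: (batch, n_features) integer category indices.
    Returns the corrupted input and a boolean mask of positions the model must predict.
    """
    mask = rng.random(x_cat.shape) < mask_rate
    corrupted = np.where(mask, mask_token, x_cat)
    return corrupted, mask

def masked_cross_entropy(logits, targets, mask):
    """Mean cross-entropy over masked positions only.

    logits: (batch, n_features, n_classes); targets, mask: (batch, n_features).
    """
    if not mask.any():
        return 0.0
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    nll = -np.take_along_axis(log_probs, targets[..., None], axis=-1)[..., 0]
    return float(nll[mask].mean())

rng = np.random.default_rng(0)
x = rng.integers(0, 10, size=(32, 6))                  # 32 rows, 6 categorical features
corrupted, mask = mask_features(x, 0.15, mask_token=10, rng=rng)
logits = rng.normal(size=(32, 6, 11))                  # stand-in for model output (10 classes + mask)
loss = masked_cross_entropy(logits, x, mask)
```

Computing the loss only on masked positions forces the model to reconstruct hidden fields from the visible ones, which is what pushes co-occurrence structure into the embeddings.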

Part 5: Evaluating Embedding Quality

Validate embedding quality before deployment:

  • t-SNE and UMAP visualization

  • Cluster quality metrics (Silhouette, Davies-Bouldin)

  • Embedding robustness testing

  • Production readiness checklist

Verification · 30 min read
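The Silhouette metric can be computed directly from its definition. The function below is a brute-force O(n²) NumPy sketch suited to small samples (scikit-learn's `sklearn.metrics.silhouette_score` is the usual library choice); the two synthetic blobs are stand-ins for real embeddings.

```python
import numpy as np

def silhouette(x, labels):
    """Mean silhouette coefficient with Euclidean distance (requires at least 2 clusters).

    For each point: a = mean distance to other points in its cluster,
    b = lowest mean distance to the points of any other cluster,
    s = (b - a) / max(a, b). Values near 1 mean tight, well-separated clusters.
    """
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)  # pairwise distances (n, n)
    clusters = np.unique(labels)
    scores = np.zeros(len(x))
    for i in range(len(x)):
        same = labels == labels[i]
        n_same = same.sum() - 1          # exclude the point itself
        if n_same == 0:
            continue                     # convention: singleton clusters score 0
        a = d[i, same].sum() / n_same    # d[i, i] == 0, so including i in the sum is harmless
        b = min(d[i, labels == c].mean() for c in clusters if c != labels[i])
        scores[i] = (b - a) / max(a, b)
    return scores.mean()

rng = np.random.default_rng(0)
emb = np.vstack([rng.normal(0, 0.1, size=(50, 8)), rng.normal(5, 0.1, size=(50, 8))])
labels = np.array([0] * 50 + [1] * 50)
separated = silhouette(emb, labels)                     # close to 1
shuffled = silhouette(emb, rng.permutation(labels))     # close to 0
```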

Part 6: Anomaly Detection Methods

Apply detection algorithms:

  • Local Outlier Factor (LOF)

  • Isolation Forest

  • Distance-based methods

  • Sequence anomaly detection (LSTMs)

  • Method comparison framework

Detection · 40 min read
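The distance-based method reduces to one idea: an event far from its nearest normal neighbors is suspicious. The brute-force NumPy sketch below scores queries by their mean distance to the k nearest reference embeddings and sets a threshold at the 99th percentile of scores on held-out normal data; k=5, the percentile, and the synthetic embeddings are assumptions. A vector database answers the same k-NN query with an index instead of a full scan, and LOF builds on these k-NN distances by comparing each point's local density with that of its neighbors.

```python
import numpy as np

def knn_anomaly_scores(reference, queries, k=5):
    """Mean distance from each query to its k nearest reference embeddings.

    reference: (n, d) embeddings of normal events; queries: (m, d).
    Larger score = more anomalous.
    """
    d = np.linalg.norm(queries[:, None, :] - reference[None, :, :], axis=-1)  # (m, n)
    nearest = np.sort(d, axis=1)[:, :k]
    return nearest.mean(axis=1)

rng = np.random.default_rng(7)
reference = rng.normal(size=(500, 16))   # stand-in for embeddings of normal events
holdout = rng.normal(size=(100, 16))     # more normal events, used only to pick a threshold
threshold = np.quantile(knn_anomaly_scores(reference, holdout), 0.99)

queries = np.vstack([rng.normal(size=16), np.full(16, 6.0)])  # one normal, one far-off event
scores = knn_anomaly_scores(reference, queries)
is_anomaly = scores > threshold
```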

Part 7: Production Deployment

Deploy to production:

  • REST API with FastAPI

  • Docker containerization

  • Model versioning with MLflow

  • A/B testing framework

  • Real-time vs batch inference

Deployment · 45 min read
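One common building block for A/B testing two embedding-model versions is deterministic traffic splitting. The sketch below hashes a routing key (here a hypothetical host ID) so the same entity always reaches the same model version; the salt, the 10% treatment share, and the function name are illustrative assumptions rather than the series' framework.

```python
import hashlib

def assign_variant(routing_key: str, treatment_fraction: float = 0.1,
                   salt: str = "embed-model-ab") -> str:
    """Deterministically route a key to 'control' or 'treatment'.

    Hashing (rather than random choice) keeps the assignment stable, so the same
    host is always scored by the same model version across requests.
    """
    digest = hashlib.sha256(f"{salt}:{routing_key}".encode("utf-8")).digest()
    bucket = (int.from_bytes(digest[:8], "big") >> 11) / 2**53  # 53 bits -> float in [0, 1)
    return "treatment" if bucket < treatment_fraction else "control"

variants = [assign_variant(f"host-{i}") for i in range(10_000)]
share = variants.count("treatment") / len(variants)  # roughly 0.1
```

Changing the salt reshuffles every assignment, which is a simple way to start a fresh experiment without reusing the previous split.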

Part 8: Production Monitoring

Monitor and maintain the system:

  • Embedding drift detection

  • Alert quality metrics

  • Automated retraining triggers

  • Incident response tools

  • Cost optimization

Monitoring · 35 min read
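One simple way to quantify embedding drift is to compare a current window of embeddings with a reference window. The sketch below uses the per-dimension shift of the mean, measured in reference standard deviations; this heuristic is chosen for illustration, is not necessarily the metric Part 8 uses, and ignores changes in variance or correlation.

```python
import numpy as np

def embedding_drift_score(reference, current, eps=1e-8):
    """Average per-dimension shift of the embedding mean, in reference standard deviations.

    Near 0: the current window looks like the reference window.
    Around 1: the centroid has moved by roughly one standard deviation per dimension.
    """
    mu_ref, sigma_ref = reference.mean(axis=0), reference.std(axis=0)
    mu_cur = current.mean(axis=0)
    return float(np.mean(np.abs(mu_cur - mu_ref) / (sigma_ref + eps)))

rng = np.random.default_rng(3)
reference = rng.normal(size=(2000, 32))            # embeddings from the training period
stable = rng.normal(size=(500, 32))                # new window, same distribution
shifted = rng.normal(loc=0.5, size=(500, 32))      # new window after a behavior change
stable_score = embedding_drift_score(reference, stable)
shifted_score = embedding_drift_score(reference, shifted)
```

A retraining trigger could then compare the score with a threshold tuned on historical windows; any specific threshold value here would be an assumption.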

Part 9: Multi-Source Correlation

Extend to multiple data sources for root cause analysis:

  • Training separate models for logs, metrics, traces, config

  • Unified vector database with metadata tags

  • Temporal correlation across sources

  • Causal graph construction

  • Automated root cause ranking

Advanced · 50 min read
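Temporal correlation can start as simply as grouping anomalies from different sources that fall within a short time window. In the sketch below, the anomaly records, the five-minute window, and the grouping rule (a window anchored at each group's first anomaly) are illustrative assumptions; the earliest anomaly in a group is only a root-cause candidate, not proof of causation.

```python
from datetime import datetime, timedelta

def correlate(anomalies, window=timedelta(minutes=5)):
    """Group anomalies that start within `window` of their group's first anomaly.

    anomalies: list of (timestamp, source, description) tuples.
    Returns only groups that involve more than one source, earliest anomaly first.
    """
    groups, current = [], []
    for ts, source, desc in sorted(anomalies):
        if current and ts - current[0][0] > window:
            groups.append(current)
            current = []
        current.append((ts, source, desc))
    if current:
        groups.append(current)
    return [g for g in groups if len({src for _, src, _ in g}) > 1]

t0 = datetime(2024, 5, 1, 10, 0)
anomalies = [
    (t0 + timedelta(minutes=2), "metrics", "p99 latency spike"),
    (t0, "config", "connection pool size changed"),
    (t0 + timedelta(minutes=3), "logs", "timeout errors"),
    (t0 + timedelta(hours=2), "logs", "single failed login"),
]
incidents = correlate(anomalies)
print([src for _, src, _ in incidents[0]])  # ['config', 'metrics', 'logs']
```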

Complete Series

Total: ~6 hours of comprehensive, hands-on content

All code examples are executable and production-ready.


Appendices


What You’ll Build

By the end of this series, you’ll have:

  1. Custom TabularResNet Embedding Model: Trained from scratch on your OCSF data using self-supervised learning

  2. Embedding Service: FastAPI REST API that serves the custom TabularResNet model, generating embeddings for OCSF events via HTTP requests

  3. Vector Database: Stores embeddings and performs k-NN similarity search at scale

  4. Vector-Based Anomaly Detection: Detection through vector DB operations alone (k-NN distance, local density), with no separately trained detection model

  5. Monitoring & Alerting: Track embedding drift, detection quality, and system health

  6. Automated Retraining: Triggers retraining of the custom embedding model based on drift and performance degradation

Optional Extension (Part 9): For advanced production deployments, extend the system to correlate anomalies across multiple observability data sources (logs, metrics, traces, configuration changes) for automated root cause analysis.

System Architecture

This diagram shows the complete end-to-end system you’ll build. OCSF events stream in near real-time through the following pipeline:

  1. Preprocessing: Extract and normalize features from each OCSF event

  2. Embedding generation: TabularResNet (the only ML model) generates a vector for each event

  3. Vector DB storage: Embeddings are indexed for fast k-NN similarity search

  4. Anomaly scoring: Plain code computes scores from vector DB distances and compares them against thresholds; this step is not a separate ML model

  5. Alerting: Trigger alerts for high-scoring anomalies
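The pipeline steps above can be sketched as one function per event. This is a toy under stated assumptions: `InMemoryVectorStore` and `process_event` are hypothetical names, the store is a brute-force stand-in for a real vector database, random vectors stand in for TabularResNet embeddings, and the threshold of 8.0 is an arbitrary value that suits this synthetic data.

```python
import numpy as np

class InMemoryVectorStore:
    """Hypothetical stand-in for a vector database: stores vectors, answers brute-force k-NN."""

    def __init__(self, dim):
        self.vectors = np.empty((0, dim))

    def add(self, vec):
        self.vectors = np.vstack([self.vectors, vec[None, :]])

    def knn_distances(self, vec, k):
        d = np.linalg.norm(self.vectors - vec, axis=1)
        return np.sort(d)[:k]

def process_event(vec, store, k=5, threshold=8.0):
    """Score one embedded event against history, decide whether to alert, then index it."""
    alert = False
    if len(store.vectors) >= k:                              # need at least k neighbors to score
        score = float(store.knn_distances(vec, k).mean())    # anomaly score from k-NN distances
        alert = score > threshold                            # threshold-based alerting
    store.add(vec)                                           # index for future queries
    return alert

rng = np.random.default_rng(1)
store = InMemoryVectorStore(dim=16)
for _ in range(200):                                         # warm up with "normal" embeddings
    process_event(rng.normal(size=16), store)

print(process_event(np.full(16, 6.0), store))  # True: far from all stored embeddings
```

The event is scored before it is inserted so it cannot match itself; a production system would also exclude known-anomalous vectors from the reference set.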

The monitoring components (shown in red/purple) continuously track embedding drift and system health, triggering automatic retraining of the embedding model when needed.

Key architectural point:

Diagram legend:


Key Concepts

Why ResNet for Tabular Data?

Research by Gorishniy et al. (2021) found that ResNet:

Why Embeddings for Anomaly Detection?

Embeddings compress high-dimensional OCSF data (300+ fields) into dense vectors that:

Why a Vector Database?

A vector database makes similarity search the central mechanism for anomaly detection by:


Code Repository

All code from this series is available in executable notebooks. Each part includes:


Prerequisites


Get Started

Ready to build your anomaly detection system? Start with Part 1: Understanding ResNet Architecture!

Want to jump straight to hands-on code? See Appendix: Notebooks & Sample Data to download notebooks and sample data.


Further Reading

For deeper understanding of embedding concepts and vector databases used in this series:


References