Defense and intelligence organizations face unique challenges: processing vast streams of multi-source data under time pressure, identifying threats in adversarial environments, and making high-stakes decisions with incomplete information. This chapter applies embeddings to six national security applications: geospatial intelligence, using satellite and aerial imagery embeddings for object detection, change monitoring, and activity pattern recognition across global areas of interest; signals intelligence, using embeddings for communication analysis, entity resolution, and pattern discovery in intercepted data; open-source intelligence, aggregating and analyzing public information from news, social media, and technical sources at scale; cybersecurity and threat intelligence, using behavioral embeddings for intrusion detection, malware classification, and threat actor attribution; autonomous systems, using embeddings for perception, navigation, and coordinated operations; and command and control decision support, synthesizing multi-source intelligence into actionable insights for commanders. These techniques shift intelligence analysis from manual review toward automated pattern recognition while maintaining human oversight for critical decisions.
Following the scientific computing applications of Chapter 34, this chapter turns to defense and intelligence. Traditional intelligence analysis relies on human analysts reviewing individual reports, images, and signals, an approach overwhelmed by modern data volumes. Embedding-based intelligence systems represent diverse data sources in unified vector spaces, enabling automated triage, pattern discovery across sources, and rapid response to emerging threats while augmenting rather than replacing human judgment.
35.1 Geospatial Intelligence (GEOINT)
Geospatial intelligence encompasses satellite imagery, aerial photography, and geographic data for monitoring activities, tracking changes, and understanding terrain. Embedding-based GEOINT enables automated analysis of imagery at global scale.
35.1.1 The GEOINT Challenge
Traditional geospatial analysis faces limitations:
Data volume: Commercial satellites generate terabytes daily; analysts cannot review all imagery
Revisit frequency: Daily global coverage requires automated change detection
Object diversity: Must detect vehicles, structures, vessels, aircraft across varied terrain
Camouflage and denial: Adversaries actively conceal activities
Multi-sensor fusion: Combining optical, radar, infrared, and hyperspectral data
Embedding approach: Learn representations of geographic regions from multi-modal imagery. Similar scenes cluster together; changes manifest as embedding drift. Enable rapid search across global imagery archives.
Human-in-the-loop: Analyst verification of automated detections
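The idea that changes manifest as embedding drift can be sketched in a few lines. This is a minimal illustration, not an operational pipeline: it assumes each image tile has already been embedded by some encoder at two dates, and the function names, the toy vectors, and the 0.2 threshold are all hypothetical. In practice the threshold would be calibrated against drift caused by season, lighting, and sensor differences.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity for two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("zero-norm embedding")
    return 1.0 - dot / (norm_a * norm_b)


def flag_changed_tiles(before, after, threshold=0.2):
    """Return (tile_id, drift) pairs above threshold, largest drift first.

    before/after map tile_id -> embedding of the same area at two dates.
    The default threshold is a placeholder, not a recommended value.
    """
    flagged = []
    for tile_id in before.keys() & after.keys():
        drift = cosine_distance(before[tile_id], after[tile_id])
        if drift > threshold:
            flagged.append((tile_id, drift))
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)


# Toy 3-dimensional embeddings, made up for illustration.
before = {"tile_a": [1.0, 0.0, 0.0], "tile_b": [0.0, 1.0, 0.0]}
after = {"tile_a": [0.99, 0.05, 0.0], "tile_b": [1.0, 1.0, 0.0]}

for tile_id, drift in flag_changed_tiles(before, after):
    print(f"{tile_id}: drift={drift:.3f}, queue for analyst review")
```

Flagged tiles go to an analyst queue rather than triggering any action, which matches the human-in-the-loop verification step described above.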
35.2 Signals Intelligence (SIGINT)
Signals intelligence involves collecting and analyzing electronic communications and emissions. Embedding-based SIGINT enables automated processing of communications for entity resolution, topic discovery, and pattern analysis.
35.2.1 The SIGINT Challenge
Traditional signals analysis faces limitations:
Volume: Billions of communications daily exceed human review capacity
Languages: Content spans hundreds of languages and dialects
Encryption: Increasing use of encryption limits content access
Entity resolution: Linking identities across platforms and time
Timeliness: Intelligence value decays rapidly
Embedding approach: Learn representations of communications that capture semantic content, behavioral patterns, and network relationships. Similar communications cluster together; entity embeddings link identities across sources.
Multilingual embeddings: Unified representation across languages
Domain adaptation: Fine-tune on intelligence-relevant vocabulary
Named entity recognition: Extract persons, organizations, locations
Coreference resolution: Link mentions across documents
Translation invariance: Similar content maps to nearby embeddings regardless of language
Entity resolution:
Multi-source fusion: Link identities across platforms
Temporal consistency: Track entities over time
Behavioral signatures: Distinguish entities with similar names
Graph embeddings: Capture network position and relationships
Uncertainty quantification: Confidence in identity linkages
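One way to picture embedding-based entity resolution is as similarity linking followed by grouping. The sketch below is an assumption-laden toy: the record identifiers, vectors, and the 0.9 threshold are invented, and the raw cosine score stands in for a confidence value even though a real system would calibrate it. Linked records are merged with a small union-find structure.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def resolve_entities(records, threshold=0.9):
    """Group records whose embeddings are similar.

    records maps record_id -> embedding. Returns (clusters, links), where
    links holds (id_a, id_b, score) for every pair at or above threshold.
    The score is an uncalibrated similarity, not a probability.
    """
    parent = {rid: rid for rid in records}

    def find(rid):
        # Follow parent pointers to the cluster root, halving the path as we go.
        while parent[rid] != rid:
            parent[rid] = parent[parent[rid]]
            rid = parent[rid]
        return rid

    links = []
    for a, b in combinations(sorted(records), 2):
        score = cosine_similarity(records[a], records[b])
        if score >= threshold:
            links.append((a, b, score))
            parent[find(a)] = find(b)

    clusters = {}
    for rid in records:
        clusters.setdefault(find(rid), set()).add(rid)
    return list(clusters.values()), links


# Hypothetical records from two sources; vectors are made up.
records = {
    "source1:id_17": [0.90, 0.10, 0.40],
    "source2:id_04": [0.88, 0.12, 0.42],
    "source1:id_52": [0.10, 0.90, 0.20],
}
clusters, links = resolve_entities(records)
print(clusters)
print(links)
```

Keeping the per-link scores alongside the clusters lets a reviewer see how strong each linkage is instead of receiving only a merged identity.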
Pattern analysis:
Topic modeling: Discover themes in communication streams
Anomaly detection: Identify unusual communication patterns
Trend detection: Track emerging topics and concerns
Sentiment analysis: Gauge intent and emotional state
Network analysis: Map communication networks and hierarchies
Operational:
Real-time processing: Sub-second latency for time-sensitive intelligence
Scalability: Handle billions of communications
Privacy controls: Minimize collection on protected communications
Audit logging: Complete records of queries and access
35.3 Open-Source Intelligence (OSINT)
Open-source intelligence leverages publicly available information from news, social media, academic publications, and technical sources. Embedding-based OSINT enables comprehensive monitoring and analysis of the public information environment.
35.3.1 The OSINT Challenge
Traditional open-source analysis faces limitations:
Information overload: Millions of relevant sources publishing continuously
Verification: Distinguishing reliable from unreliable sources
Synthesis: Connecting fragments across disparate sources
Foreign language: Important sources in dozens of languages
Multimedia: Images, video, and audio alongside text
Embedding approach: Learn unified representations of documents, images, and videos from public sources. Enable semantic search across all modalities, cluster related content, and identify coordinated information operations.
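As a rough illustration of how clustering related content can surface coordinated activity, the sketch below counts how often two different accounts publish near-identical posts. Everything here is hypothetical: the account names, the toy 2-dimensional vectors, and both thresholds. Repeated near-duplicates are only a signal for further review, since unrelated accounts also share the same news item or quote.

```python
import math
from collections import Counter
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def coordinated_account_pairs(posts, sim_threshold=0.95, min_shared=2):
    """Return {(account_a, account_b): count} for account pairs that share
    at least min_shared near-duplicate posts.

    posts is a list of (account, embedding) tuples. Both thresholds are
    placeholders chosen for the toy data below.
    """
    shared = Counter()
    for (acct_a, emb_a), (acct_b, emb_b) in combinations(posts, 2):
        if acct_a == acct_b:
            continue  # Reposts by the same account are not coordination.
        if cosine_similarity(emb_a, emb_b) >= sim_threshold:
            shared[tuple(sorted((acct_a, acct_b)))] += 1
    return {pair: n for pair, n in shared.items() if n >= min_shared}


# Made-up post embeddings for three accounts.
posts = [
    ("acct_1", [1.0, 0.0]),
    ("acct_2", [0.99, 0.05]),
    ("acct_1", [0.0, 1.0]),
    ("acct_2", [0.02, 0.98]),
    ("acct_3", [0.7, 0.7]),
]
print(coordinated_account_pairs(posts))
```

The pairwise loop is quadratic in the number of posts, so at real scale this comparison would be done with an approximate nearest-neighbor index rather than exhaustively.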
35.4 Cybersecurity and Threat Intelligence
Cyber defense requires detecting intrusions, analyzing malware, and attributing attacks. Embedding-based cybersecurity enables behavioral detection, malware family classification, and threat actor profiling.
35.4.1 The Cybersecurity Challenge
Traditional cyber defense faces limitations:
Signature evasion: Attackers modify malware to evade detection
Zero-day attacks: No signatures for novel vulnerabilities
Alert fatigue: Security teams overwhelmed by false positives
Attribution: Linking attacks to threat actors is difficult
Speed: Attackers move faster than manual analysis
Embedding approach: Learn behavioral representations of network traffic, system activity, and malware that capture attack patterns. Similar attacks cluster together; novel attacks appear as anomalies. Enable attribution through technique and infrastructure embeddings.
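The claim that novel attacks appear as anomalies can be made concrete with a nearest-neighbor score: a behavior embedding that sits far from everything in the known-benign baseline gets a high score. This is a minimal sketch under stated assumptions. The baseline vectors, the value of k, and the 0.5 alert threshold are invented for illustration, and k-nearest-neighbor distance is one of several reasonable scoring choices.

```python
import math


def knn_anomaly_score(sample, baseline, k=3):
    """Mean Euclidean distance from sample to its k nearest baseline embeddings."""
    if k < 1 or k > len(baseline):
        raise ValueError("k must be between 1 and len(baseline)")
    distances = sorted(math.dist(sample, ref) for ref in baseline)
    return sum(distances[:k]) / k


# Toy embeddings of known-benign behavior (made up).
baseline = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1]]
ALERT_THRESHOLD = 0.5  # Hypothetical; would be tuned on held-out traffic.

for name, sample in [("routine", [0.05, 0.05]), ("unseen", [1.0, 1.0])]:
    score = knn_anomaly_score(sample, baseline)
    status = "ALERT" if score > ALERT_THRESHOLD else "ok"
    print(f"{name}: score={score:.3f} {status}")
```

Because the score needs no signature for the new behavior, it addresses the zero-day limitation listed above, at the cost of false positives whenever legitimate behavior shifts.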
TTP extraction: Map attacks to MITRE ATT&CK framework
Infrastructure tracking: Link C2 servers, domains, IPs
Actor profiling: Characterize threat actor capabilities and intent
Campaign correlation: Link related attacks across time
Predictive: Anticipate an actor's next moves
Operations:
Real-time detection: Sub-second alerting on threats
Automated response: Containment actions for confirmed threats
False positive reduction: Minimize analyst burden
Integration: Connect to SIEM, SOAR, threat feeds
35.5 Autonomous Systems
Defense autonomous systems include unmanned vehicles (air, ground, maritime), robotics, and semi-autonomous weapons. Embedding-based autonomy enables perception, navigation, and multi-agent coordination.
35.5.1 The Autonomous Systems Challenge
Traditional autonomy faces limitations:
Perception: Robust sensing in degraded/contested environments
Navigation: GPS-denied and dynamic environments
Coordination: Multi-agent collaboration and deconfliction
Adversarial: Resilience to jamming, spoofing, deception
Trust: Human confidence in autonomous decisions
Embedding approach: Learn representations of scenes, terrain, and mission context that enable robust perception and planning. Similar situations map to similar actions; novel situations trigger human oversight.
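The principle that similar situations map to similar actions while novel ones trigger human oversight can be sketched as a nearest-neighbor lookup with a distance gate. The situation library, action labels, and the 0.5 distance limit below are hypothetical, and a fielded system would use a learned policy with far richer safety checks. The sketch only shows the gating logic.

```python
import math

HANDOFF = "request_operator"  # Hypothetical label for the human-oversight path.


def select_action(scene, known_situations, max_distance=0.5):
    """Return (action, distance) for the nearest known situation.

    known_situations is a list of (embedding, action) pairs. If the nearest
    one is farther than max_distance, the scene is treated as novel and the
    decision is handed to a human operator.
    """
    nearest_embedding, nearest_action = min(
        known_situations, key=lambda item: math.dist(scene, item[0])
    )
    distance = math.dist(scene, nearest_embedding)
    if distance > max_distance:
        return HANDOFF, distance
    return nearest_action, distance


# Made-up situation library with 2-dimensional embeddings.
known = [
    ([0.0, 0.0], "continue_route"),
    ([1.0, 0.0], "slow_and_reroute"),
]

print(select_action([0.9, 0.1], known))  # Close to a known situation.
print(select_action([0.5, 2.0], known))  # Far from everything known.
```

Defaulting to the operator when the scene is unfamiliar is a conservative design choice: the system acts autonomously only inside the region its experience covers.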
Dynamic environments: Avoid moving obstacles, adapt to changes
Semantic mapping: Understand scene meaning, not just geometry
Long-range planning: Hierarchical planning at multiple scales
Contingency: Fallback behaviors when primary fails
Multi-agent:
Communication-limited: Function with intermittent connectivity
Decentralized coordination: No single point of failure
Task allocation: Distribute missions across heterogeneous platforms
Deconfliction: Avoid collisions and interference
Human teaming: Seamless handoff between autonomous and manned platforms
Safety:
Behavior bounds: Constrain actions to safe envelope
Monitoring: Continuous assessment of system health
Graceful degradation: Safe behavior as capabilities reduce
Human override: Operator can always intervene
Verification: Formal methods for safety-critical behaviors
35.6 Command and Decision Support
Command and control requires synthesizing intelligence from multiple sources to support decisions under uncertainty and time pressure. Embedding-based decision support aggregates information, identifies options, and presents relevant precedents.
35.6.1 The Decision Support Challenge
Traditional command support faces limitations:
Information overload: Commanders overwhelmed by data
Synthesis: Integrating intelligence from diverse sources
Timeliness: Decisions needed before complete information
Uncertainty: Acting under ambiguity and fog of war
Precedent: Learning from historical situations
Embedding approach: Learn representations of situations that capture operationally relevant features. Similar situations map to similar successful responses; enable rapid retrieval of relevant precedents and courses of action.
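Precedent retrieval reduces to a top-k similarity search over stored situation embeddings. The following is a minimal sketch, assuming situations have already been embedded by some model. The case names and vectors are made up, and the exhaustive scan would be replaced by a vector index at realistic archive sizes.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def retrieve_precedents(query, precedents, k=2):
    """Return the k most similar precedents as (name, similarity), best first.

    precedents maps a case name to its situation embedding.
    """
    scored = (
        (name, cosine_similarity(query, embedding))
        for name, embedding in precedents.items()
    )
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Hypothetical historical cases with toy embeddings.
precedents = {
    "case_alpha": [1.0, 0.0, 0.0],
    "case_bravo": [0.0, 1.0, 0.0],
    "case_charlie": [0.7, 0.7, 0.0],
}
current_situation = [0.9, 0.3, 0.0]

for name, similarity in retrieve_precedents(current_situation, precedents):
    print(f"{name}: similarity={similarity:.3f}")
```

Returning the similarity with each case lets the interface show how close a precedent really is, which supports the trust calibration and explainability goals discussed in this section.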
Drill-down: Enable exploration of supporting evidence
Collaboration: Share assessments across echelons
Human factors:
Cognitive load: Minimize information overload
Trust calibration: Appropriate confidence in AI recommendations
Explainability: Justify recommendations with evidence
Override: Human decision authority always preserved
Training: Familiarize operators before high-stakes use
Warning: Ethical Considerations
Defense applications of embeddings raise significant ethical considerations:
Lethal autonomy:
Humans must remain in the loop for lethal decisions
Embeddings for targeting require extensive verification
Fail-safe defaults when uncertainty is high
Clear accountability chains for all decisions
Surveillance:
Collection must comply with legal authorities
Minimize impact on protected populations
Implement access controls and audit trails
Regular oversight and policy review
Adversarial use:
Techniques can be used by adversaries
Defensive applications also enable offense
Responsible disclosure of vulnerabilities
International norms and arms control considerations
Bias and fairness:
Training data may embed historical biases
Evaluate performance across populations
Regular audits for discriminatory impacts
Human review of high-stakes decisions
Dual use:
Same techniques apply to civilian and military
Consider proliferation implications
Export controls on sensitive capabilities
Academic-government research partnerships
Tip: Video Surveillance Analytics
For video-based security applications—including perimeter monitoring, crowd analytics, incident detection, person re-identification, and forensic video search—see the techniques covered in Chapter 27.
35.7 Key Takeaways
Note
The performance figures below are illustrative, drawn from published research and hypothetical scenarios. They indicate what may be achievable and are not verified results from specific operational systems.
GEOINT at global scale requires automated analysis: Object detection models achieve 90%+ accuracy on military vehicles and infrastructure, change detection identifies facility activity patterns over time, and embedding-based search enables rapid retrieval across petabyte imagery archives—transforming satellite imagery from periodic review to continuous monitoring
SIGINT benefits from behavioral and semantic embeddings: Multilingual embeddings enable cross-language analysis without translation, entity resolution links identities across platforms with 85%+ precision, and pattern analysis discovers topics and networks in communication streams—handling billions of messages that exceed human review capacity
OSINT at scale requires multi-modal embeddings: Unified representations enable search across text, images, and video in any language, influence detection identifies coordinated campaigns through behavioral clustering, and verification tools assess source credibility and detect manipulated media
Cybersecurity shifts from signatures to behaviors: Behavioral embeddings detect novel attacks without prior signatures, malware family clustering enables rapid triage of new samples, and threat actor profiling supports attribution through technique and infrastructure analysis—reducing detection time from days to seconds
Autonomous systems require robust perception embeddings: Multi-sensor fusion provides reliable perception in degraded conditions, GPS-denied navigation uses learned terrain representations, and multi-agent coordination scales through distributed embeddings—enabling operations in contested environments
Decision support synthesizes multi-source intelligence: Situation embeddings capture operationally relevant features across GEOINT, SIGINT, and OSINT, precedent retrieval surfaces relevant historical cases, and risk assessment quantifies uncertainty—augmenting commander judgment without replacing human authority
Defense applications require exceptional verification: Higher stakes demand more rigorous testing, adversarial robustness is essential, human oversight must be preserved for critical decisions, and ethical considerations constrain acceptable applications
35.8 Looking Ahead
Part VI (Future-Proofing & Optimization) begins with Chapter 36, which covers performance optimization for embedding systems: hardware acceleration strategies including GPU clusters, TPUs, and specialized inference chips, memory optimization techniques for billion-parameter models, latency reduction for real-time applications, throughput scaling for batch processing, and cost optimization balancing quality against infrastructure spend.
35.9 Further Reading
35.9.1 Geospatial Intelligence
Shermeyer, Jacob, et al. (2020). “SpaceNet 6: Multi-Sensor All Weather Mapping Dataset.” CVPR Workshops.
Christie, Gordon, et al. (2018). “Functional Map of the World.” CVPR.
Gupta, Ritwik, et al. (2019). “xBD: A Dataset for Assessing Building Damage from Satellite Imagery.” CVPR Workshops.
Weir, Nicholas, et al. (2019). “SpaceNet MVOI: A Multi-View Overhead Imagery Dataset.” ICCV.
Mundhenk, T. Nathan, et al. (2016). “A Large Contextual Dataset for Classification, Detection and Counting of Cars with Deep Learning.” ECCV.
35.9.2 Signals Intelligence and Communications
Conneau, Alexis, et al. (2020). “Unsupervised Cross-lingual Representation Learning at Scale.” ACL.
Artetxe, Mikel, and Holger Schwenk (2019). “Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer.” TACL.
Mudrakarta, Pramod Kaushik, et al. (2018). “It Was the Training Data Pruning Too!” EMNLP.
Lample, Guillaume, et al. (2018). “Word Translation Without Parallel Data.” ICLR.
Huang, Haoyang, et al. (2019). “Unicoder: A Universal Language Encoder by Pre-training with Multiple Cross-lingual Tasks.” EMNLP.
35.9.3 Open-Source Intelligence
Starbird, Kate, et al. (2019). “Disinformation as Collaborative Work.” CSCW.
Wardle, Claire, and Hossein Derakhshan (2017). “Information Disorder: Toward an Interdisciplinary Framework for Research and Policy Making.” Council of Europe.
Horne, Benjamin D., and Sibel Adali (2017). “This Just In: Fake News Packs a Lot in Title.” AAAI Workshop.
Shu, Kai, et al. (2017). “Fake News Detection on Social Media: A Data Mining Perspective.” ACM SIGKDD Explorations.
Nguyen, Van-Hoang, et al. (2020). “FANG: Leveraging Social Context for Fake News Detection Using Graph Representation.” CIKM.
35.9.4 Cybersecurity
Mirsky, Yisroel, et al. (2018). “Kitsune: An Ensemble of Autoencoders for Online Network Intrusion Detection.” NDSS.
Raff, Edward, et al. (2018). “Malware Detection by Eating a Whole EXE.” AAAI Workshops.
Saxe, Joshua, and Konstantin Berlin (2015). “Deep Neural Network Based Malware Detection Using Two Dimensional Binary Program Features.” MALWARE.
Milajerdi, Sadegh M., et al. (2019). “HOLMES: Real-Time APT Detection through Correlation of Suspicious Information Flows.” IEEE S&P.
Rosenberg, Ishai, et al. (2018). “Generic Black-Box End-to-End Attack Against State of the Art API Call Based Malware Classifiers.” RAID.
35.9.5 Autonomous Systems
Bojarski, Mariusz, et al. (2016). “End to End Learning for Self-Driving Cars.” arXiv:1604.07316.
Sadeghi, Fereshteh, and Sergey Levine (2017). “CAD2RL: Real Single-Image Flight without a Single Real Image.” RSS.
Yin, Junbo, et al. (2020). “LiDAR-based Online 3D Video Object Detection with Graph-based Message Passing and Spatiotemporal Transformer Attention.” CVPR.
Loquercio, Antonio, et al. (2021). “Learning High-Speed Flight in the Wild.” Science Robotics.
Zhou, Brady, and Philipp Krähenbühl (2022). “Cross-view Transformers for Real-time Map-view Semantic Segmentation.” CVPR.
35.9.6 Decision Support and Multi-Source Fusion
Steinberg, Alan N., Christopher L. Bowman, and Franklin E. White (1999). “Revisions to the JDL Data Fusion Model.” SPIE.
Llinas, James, and David L. Hall (2009). “An Introduction to Multi-Sensor Data Fusion.” ISIF.
Castanedo, Federico (2013). “A Review of Data Fusion Techniques.” The Scientific World Journal.
Khaleghi, Bahador, et al. (2013). “Multisensor Data Fusion: A Review of the State-of-the-Art.” Information Fusion.
Rogova, Galina L., and Eugene Bosse (2010). “Information Quality in Information Fusion.” FUSION.
35.9.7 Ethics and Policy
Scharre, Paul (2018). “Army of None: Autonomous Weapons and the Future of War.” W.W. Norton.
Horowitz, Michael C. (2019). “When Speed Kills: Lethal Autonomous Weapon Systems, Deterrence and Stability.” Journal of Strategic Studies.
Altmann, Jürgen, and Frank Sauer (2017). “Autonomous Weapon Systems and Strategic Stability.” Survival.
Boulanin, Vincent, and Maaike Verbruggen (2017). “Mapping the Development of Autonomy in Weapon Systems.” SIPRI.
Roff, Heather M., and David Danks (2018). “‘Trust but Verify’: The Difficulty of Trusting Autonomous Weapons Systems.” Journal of Military Ethics.