May 6, 2026 · Research proposal

Memory-augmented Multimodal RAG for Long-horizon Agents

A research-oriented project direction connecting memory, multimodal retrieval, and reliable long-horizon interaction.

Role: Research direction and proposal development

Confidentiality: Public research direction; no internal company data included.

Memory · Multimodal RAG · Long-horizon Agent · Evaluation

Motivation

Current agents often lose context as tasks unfold across many steps, and the problem compounds when the interaction mixes text, screenshots, documents, audio, and video.

Research Question

How can an agent remember useful information, retrieve relevant multimodal context, and complete long-horizon tasks without drifting?

Possible Method

Such a system could combine:

  • Episodic memory for interaction history
  • Semantic memory for reusable knowledge
  • Multimodal retrieval over documents and media
  • A controller that decides what to remember and what to ignore
  • Evaluation over task trajectories instead of single-turn answers
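The split between episodic and semantic memory, plus a controller gating what gets remembered, can be sketched in a few lines. This is a minimal illustration under assumed names (`MemoryItem`, `AgentMemory`, `write`, `retrieve` are all hypothetical; the proposal specifies no API), and the token-overlap retrieval stands in for the multimodal embedding retrieval a real system would use:

```python
from dataclasses import dataclass


@dataclass
class MemoryItem:
    modality: str  # e.g. "text", "image_caption", "audio_transcript"
    content: str
    step: int      # position in the interaction trajectory


class AgentMemory:
    """Hypothetical sketch of episodic vs. semantic memory with a
    controller-style write gate. Not a proposed implementation."""

    def __init__(self) -> None:
        self.episodic: list[MemoryItem] = []  # full interaction history
        self.semantic: list[MemoryItem] = []  # distilled, reusable knowledge

    def write(self, item: MemoryItem, important: bool) -> None:
        # Controller decision: every item enters episodic memory;
        # only items judged important are promoted to semantic memory.
        self.episodic.append(item)
        if important:
            self.semantic.append(item)

    def retrieve(self, query: str, k: int = 3) -> list[MemoryItem]:
        # Toy retrieval: rank all memories by token overlap with the
        # query. A real system would score multimodal embeddings.
        q = set(query.lower().split())
        scored = sorted(
            self.episodic + self.semantic,
            key=lambda m: len(q & set(m.content.lower().split())),
            reverse=True,
        )
        return scored[:k]


mem = AgentMemory()
mem.write(MemoryItem("text", "user prefers PDF reports", 1), important=True)
mem.write(MemoryItem("image_caption", "screenshot shows login error 403", 2),
          important=False)
hits = mem.retrieve("login error", k=1)
```

Even in this toy form, the design choice is visible: the controller's `important` flag decides what persists as reusable knowledge, while episodic memory keeps the full trajectory for later evaluation.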

Why It Matters

This direction connects industrial agent problems with research questions in memory, retrieval, multimodal learning, and reliable reasoning.