Memory-augmented Multimodal RAG for Long-horizon Agents
A research-oriented project direction connecting memory, multimodal retrieval, and reliable long-horizon interaction.
Role: Research direction and proposal development
Confidentiality: Public research direction; no internal company data included.
Motivation
Current agents often lose context when tasks unfold across many steps; the problem compounds when the interaction spans text, screenshots, documents, audio, or video.
Research Question
How can an agent remember useful information, retrieve relevant multimodal context, and complete long-horizon tasks without drifting?
Possible Method
A candidate system might combine:
- Episodic memory for interaction history
- Semantic memory for reusable knowledge
- Multimodal retrieval over documents and media
- A controller that decides what to remember and what to ignore
- Evaluation over task trajectories instead of single-turn answers
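The components above can be sketched in miniature. The following is a minimal illustration, not a proposed implementation: the class names, the token-overlap salience score, and the fixed threshold are all placeholders (a real controller would likely be learned, and retrieval would use multimodal embeddings rather than word overlap).

```python
from dataclasses import dataclass

@dataclass
class MemoryItem:
    content: str
    modality: str = "text"  # e.g. "text", "image_caption", "audio_transcript"

class EpisodicMemory:
    """Ordered log of everything observed during one task trajectory."""
    def __init__(self):
        self.items: list[MemoryItem] = []

    def append(self, item: MemoryItem) -> None:
        self.items.append(item)

class SemanticMemory:
    """Reusable knowledge, deduplicated by content."""
    def __init__(self):
        self.items: dict[str, MemoryItem] = {}

    def add(self, item: MemoryItem) -> None:
        self.items.setdefault(item.content, item)

def token_overlap(query: str, text: str) -> float:
    """Toy relevance score: fraction of query tokens present in the text."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / max(len(q), 1)

class Controller:
    """Decides what to remember: logs every step episodically, and
    promotes items salient to the task goal into semantic memory.
    The threshold rule is a stand-in for a learned write policy."""
    def __init__(self, goal: str, threshold: float = 0.2):
        self.goal = goal
        self.threshold = threshold

    def observe(self, item: MemoryItem,
                episodic: EpisodicMemory, semantic: SemanticMemory) -> None:
        episodic.append(item)  # keep the full trajectory
        if token_overlap(self.goal, item.content) >= self.threshold:
            semantic.add(item)  # salient enough to reuse across tasks

def retrieve(query: str, episodic: EpisodicMemory,
             semantic: SemanticMemory, k: int = 3) -> list[MemoryItem]:
    """Rank both memory stores against the query; embeddings would
    replace token_overlap in any multimodal version."""
    pool = episodic.items + list(semantic.items.values())
    return sorted(pool, key=lambda it: token_overlap(query, it.content),
                  reverse=True)[:k]
```

Even this toy version makes the evaluation point concrete: correctness depends on what the controller chose to write many steps earlier, so the system must be scored over whole trajectories rather than single retrieval calls.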
Why It Matters
This direction connects industrial agent problems with research questions in memory, retrieval, multimodal learning, and reliable reasoning.