LLM-powered Autonomous Agents
Building agents with an LLM (large language model) as the core controller is a compelling concept. Several proof-of-concept demos, such as AutoGPT, GPT-Engineer, and BabyAGI, serve as inspiring examples. The potential of LLMs extends beyond generating well-written copy, stories, essays, and programs; an LLM can be framed as a powerful general problem solver.
Agent System Overview
In an LLM-powered autonomous agent system, the LLM functions as the agent’s brain, complemented by several key components:
Planning
Subgoal decomposition: The agent breaks down large tasks into smaller, manageable subgoals, enabling efficient handling of complex tasks.
Reflection and refinement: The agent can perform self-criticism and self-reflection over past actions, learn from mistakes, and refine its behavior for future steps, thereby improving the quality of final results.
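This plan–act–reflect loop can be sketched in a few lines. The `llm` and `execute` functions below are hypothetical stand-ins for a model call and an action executor, not any specific framework's API:

```python
def plan(llm, task):
    """Ask the model to break a task into subgoals, one per line."""
    response = llm(f"Decompose the task into short subgoals, one per line:\n{task}")
    return [line.strip() for line in response.splitlines() if line.strip()]

def reflect(llm, subgoal, result):
    """Self-criticism step: ask the model how to improve a past action."""
    return llm(f"Subgoal: {subgoal}\nResult: {result}\nWhat should be improved next time?")

def run_agent(llm, task, execute):
    """Plan subgoals, act on each, and reflect on every result."""
    history = []
    for subgoal in plan(llm, task):
        result = execute(subgoal)                 # act on the subgoal
        critique = reflect(llm, subgoal, result)  # refine for future steps
        history.append((subgoal, result, critique))
    return history
```

In a real agent the critiques would be fed back into later prompts; here they are simply collected to keep the sketch self-contained.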
Memory
Short-term memory: Relies on in-context learning; information supplied in the prompt acts as the model's short-term working memory, bounded by the context window.
Long-term memory: Provides the agent with the capability to retain and recall information over extended periods, often by leveraging an external vector store and fast retrieval.
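Long-term memory of this kind can be sketched as a tiny in-memory vector store. The `embed` function below is a toy character-frequency embedding standing in for a learned embedding model; real systems pair an embedding model with a store such as FAISS:

```python
import math

def embed(text):
    """Toy embedding: 26-dim character-frequency vector (stand-in for a real model)."""
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

class VectorMemory:
    """Write embedded records; recall the k most similar to a query."""
    def __init__(self):
        self.records = []  # (embedding, text) pairs

    def write(self, text):
        self.records.append((embed(text), text))

    def recall(self, query, k=1):
        q = embed(query)
        ranked = sorted(self.records, key=lambda r: cosine(q, r[0]), reverse=True)
        return [text for _, text in ranked[:k]]
```

The write/recall interface mirrors how an agent externalizes facts it cannot keep in the context window and retrieves them by similarity when needed.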
Tool Use
The agent learns to call external APIs for extra information that is missing from the model weights, including current information, code execution capability, access to proprietary information sources, and more.
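A common implementation pattern is a tool registry mapping tool names to callables: the model picks a tool by name and the agent dispatches the call. A minimal sketch with one hypothetical tool (not any specific framework's API):

```python
TOOLS = {}

def tool(name, description):
    """Decorator that registers a callable as a tool the agent may invoke."""
    def decorator(fn):
        TOOLS[name] = {"fn": fn, "description": description}
        return fn
    return decorator

@tool("calculator", "Evaluate a basic arithmetic expression.")
def calculator(expression):
    # Restricted eval: no builtins, arithmetic expressions only.
    return eval(expression, {"__builtins__": {}}, {})

def dispatch(tool_name, argument):
    """Run the tool the model selected and return its observation."""
    if tool_name not in TOOLS:
        return f"unknown tool: {tool_name}"
    return TOOLS[tool_name]["fn"](argument)
```

The descriptions registered alongside each tool are what would be shown to the model so it can decide which tool to call.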
Component One: Planning
Task Decomposition: Chain of thought (CoT) and Tree of Thoughts (ToT) techniques are used to decompose complex tasks into simpler steps.
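Tree of Thoughts explores several candidate thoughts per step and keeps only the most promising partial paths. A minimal breadth-first sketch, where the hypothetical `propose` and `score` functions stand in for model calls:

```python
def tree_of_thoughts(task, propose, score, depth=2, beam=2):
    """BFS over thought sequences, keeping the `beam` best paths per level."""
    frontier = [[]]  # each entry is a list of thoughts generated so far
    for _ in range(depth):
        candidates = []
        for path in frontier:
            for thought in propose(task, path):  # model proposes next thoughts
                candidates.append(path + [thought])
        # Keep the highest-scoring partial paths (scored by the model/heuristic).
        frontier = sorted(candidates, key=score, reverse=True)[:beam]
    return frontier[0] if frontier else []
```

Plain CoT corresponds to `beam=1` with a single proposal per step; widening the beam trades compute for the chance to recover from a bad early thought.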
Self-Reflection: ReAct and Reflexion frameworks integrate reasoning and acting within LLM, allowing it to improve iteratively by refining past action decisions and correcting previous mistakes.
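The ReAct pattern interleaves Thought / Action / Observation steps in one transcript. A minimal loop, assuming a hypothetical `llm` that emits lines like `Action: calc[2+2]` and terminates with `Finish[answer]`:

```python
import re

def react_loop(llm, question, tools, max_steps=5):
    """Alternate model reasoning and tool actions until the model finishes."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript)          # model emits Thought + Action text
        transcript += step + "\n"
        done = re.search(r"Finish\[(.*)\]", step)
        if done:
            return done.group(1)        # final answer
        act = re.search(r"Action: (\w+)\[(.*)\]", step)
        if act:
            observation = tools[act.group(1)](act.group(2))
            transcript += f"Observation: {observation}\n"  # feed result back
    return None
```

Reflexion-style self-reflection follows the same loop shape, but failed episodes are summarized and prepended to the next attempt's transcript.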
Component Two: Memory
Types of Memory:
- Sensory Memory: Briefly retains impressions of sensory information, on the order of a few seconds at most.
- Short-Term Memory (STM): Stores information needed for complex cognitive tasks.
- Long-Term Memory (LTM): Stores information for a long time with unlimited capacity.
Maximum Inner Product Search (MIPS): Utilizes algorithms like LSH, ANNOY, HNSW, FAISS, and ScaNN for fast retrieval of stored information.
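The simplest of these, LSH via random hyperplanes, fits in a few lines: each vector is hashed by the signs of random projections, so nearby vectors tend to land in the same bucket. A toy illustration, not a production index:

```python
import random

def make_lsh_hash(dim, num_bits, seed=0):
    """Build a signature function from `num_bits` random hyperplanes."""
    rng = random.Random(seed)
    planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_bits)]
    def signature(vec):
        # Each bit records which side of one hyperplane the vector falls on.
        return tuple(int(sum(p * v for p, v in zip(plane, vec)) >= 0)
                     for plane in planes)
    return signature

def lsh_index(vectors, signature):
    """Group vector indices into buckets keyed by their LSH signature."""
    buckets = {}
    for i, vec in enumerate(vectors):
        buckets.setdefault(signature(vec), []).append(i)
    return buckets
```

At query time only the query's bucket (and perhaps a few neighboring ones) is scanned, which is what makes the search approximate but fast; FAISS, HNSW, and ScaNN use more sophisticated variants of the same bucketing idea.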
Component Three: Tool Use
MRKL, TALM, and Toolformer frameworks augment LLMs with external tool APIs, enhancing their capabilities.
ChatGPT Plugins and OpenAI API function calling are practical examples of LLMs augmented with tool use capability.
HuggingGPT: Uses ChatGPT as the task planner to select models available on the HuggingFace platform and summarizes the response based on the execution results.
Proof-of-Concept Examples
AutoGPT and GPT-Engineer are notable projects demonstrating the potential of LLM-centered agents, despite some reliability issues.
Challenges
Finite context length, challenges in long-term planning and task decomposition, and the reliability of the natural language interface are common limitations in building LLM-centered agents.
Citation: Weng, Lilian. (Jun 2023). “LLM-powered Autonomous Agents”. Lil’Log. https://lilianweng.github.io/posts/2023-06-23-agent/.
The original article: https://lilianweng.github.io/posts/2023-06-23-agent/