Preface
When it comes to applications of LLMs, AI Agents are one of the hottest topics, and more and more open-source projects related to AI agents and multi-agent systems are appearing. So I wanted to do some small research on them.
MetaGPT
MetaGPT is a Multi-Agent framework that I have researched extensively. I rewrote the official Python version in TypeScript (MetaGPT-ts repo) in order to understand its details deeply.
The core idea of MetaGPT is to abstract an Agent as a Role, and a Role can have different Actions — these Actions are the actual functions that call the LLM or other tool functions. When different Roles need to “communicate”, they “subscribe” to Actions at creation time.
Execution happens on the Environment (or Team) object: register the Roles that should participate, then run. While the Environment is running, it publishes the messages produced by Roles in the previous batch into each Role’s buffer; in the next batch, each Role checks whether the message comes from a subscribed Action — if it does, it takes that message into account, forming “communication”.
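The Role/Action/Environment loop described above can be sketched in a few dozen lines. This is a minimal illustration of the publish/subscribe idea, not MetaGPT's actual API — the class and method names (Role.run, Environment.publish, cause_by, etc.) echo the framework but the signatures are simplified, and plain functions stand in for LLM-backed Actions.

```python
from dataclasses import dataclass

@dataclass
class Message:
    content: str
    cause_by: str  # name of the Action that produced this message

class Role:
    def __init__(self, name, actions, watch):
        self.name = name
        self.actions = actions      # plain callables standing in for LLM-backed Actions
        self.watch = set(watch)     # names of subscribed Actions
        self.buffer = []            # messages delivered by the Environment

    def run(self):
        # react only to messages caused by a subscribed Action
        relevant = [m for m in self.buffer if m.cause_by in self.watch]
        self.buffer.clear()
        out = []
        for msg in relevant:
            for action in self.actions:
                out.append(Message(action(msg.content), cause_by=action.__name__))
        return out

class Environment:
    def __init__(self, roles):
        self.roles = roles
        self.history = []

    def run(self, n_rounds, seed):
        # the number of rounds is fixed upfront, and results flow
        # back into the Environment rather than to the caller
        self.publish([seed])
        for _ in range(n_rounds):
            produced = []
            for role in self.roles:
                produced.extend(role.run())
            self.history.extend(produced)
            self.publish(produced)

    def publish(self, messages):
        for role in self.roles:
            role.buffer.extend(messages)

# toy demo: Dev subscribes to the PM's write_spec Action
def write_spec(text): return f"spec({text})"
def write_code(text): return f"code({text})"

pm = Role("PM", [write_spec], watch=["user_input"])
dev = Role("Dev", [write_code], watch=["write_spec"])
env = Environment([pm, dev])
env.run(2, Message("build a todo app", cause_by="user_input"))
```

Note how "communication" emerges: the Dev never talks to the PM directly, it just filters its buffer for messages whose cause_by is an Action it watches.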
The downsides I see: MetaGPT requires you to set upfront how many batches the Environment/Team should run. Also, after role.run, the Role throws the result back into the Environment instead of returning the action’s result, so any side-effect functions need to be defined directly inside the Action — which makes it harder to plug into an existing workflow.
CrewAI
CrewAI is somewhat similar to MetaGPT, but each Agent only handles one Task, and all Agents are registered under a Crew.
The Crew’s kickoff function is the entry point. It then organizes the agents under the crew and fills in any empty attributes. There are two process types: hierarchical and sequential.
- For sequential, tasks are executed in order (_run_sequential_process):
  - run each task's execute method in sequence
  - the task has self.context to remember the previous agent's output
  - if a task is defined as async, execution is handed off to a thread; if sync, it is executed directly
  - the agent in the task runs execute_task, which also fetches memory from the Crew, injects it into the prompt, and then calls the LLM
- For hierarchical, there is a Boss agent whose goal is to monitor whether the agents’ actions align with the overall goal.
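The sequential process above can be sketched as follows. This is an illustrative simplification: the names kickoff, execute, execute_task, and self.context mirror the description, but the signatures are not CrewAI's real API, and a format string stands in for the LLM call.

```python
class Agent:
    def __init__(self, role):
        self.role = role

    def execute_task(self, description, context, memory):
        # in the real framework, Crew memory is injected into the
        # prompt here and the LLM is called; a format string stands in
        prompt = f"[{self.role}] {description}"
        if context:
            prompt += f" | context: {context}"
        return f"done: {prompt}"

class Task:
    def __init__(self, description, agent):
        self.description = description
        self.agent = agent
        self.context = None  # filled with the previous task's output

    def execute(self, memory):
        return self.agent.execute_task(self.description, self.context, memory)

class Crew:
    def __init__(self, tasks):
        self.tasks = tasks
        self.memory = []  # shared across tasks, grows as they run

    def kickoff(self):
        # sequential process: run tasks in order, feeding each one
        # the previous task's output as context
        output = None
        for task in self.tasks:
            task.context = output
            output = task.execute(self.memory)
            self.memory.append(output)
        return output

crew = Crew([Task("research the topic", Agent("Researcher")),
             Task("write the article", Agent("Writer"))])
result = crew.kickoff()
```

The one-agent-one-task pairing makes the data flow simpler than MetaGPT's pub/sub: each task just reads its predecessor's output from self.context.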
Overall it looks fairly similar to MetaGPT — the differences are mostly in how Agents are defined and how memory is accessed.
LangGraph
LangGraph directly abstracts the way Agents interact as a directed graph. Although it is a project under Langchain, it is not tightly coupled to LangChain — it just uses the types defined by LangChain as the interaction Interface.
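To illustrate the directed-graph abstraction, here is a toy executor. The add_node / add_edge / invoke names echo LangGraph's StateGraph API, but this is a stripped-down sketch (no conditional edges, channels, or checkpointing), not the real library.

```python
END = "__end__"  # sentinel terminal node, analogous to LangGraph's END

class Graph:
    def __init__(self, entry):
        self.entry = entry
        self.nodes = {}  # node name -> function taking and returning state
        self.edges = {}  # node name -> name of the next node

    def add_node(self, name, fn):
        self.nodes[name] = fn

    def add_edge(self, src, dst):
        self.edges[src] = dst

    def invoke(self, state):
        # walk the graph from the entry node until END,
        # threading the state through each node function
        node = self.entry
        while node != END:
            state = self.nodes[node](state)
            node = self.edges[node]
        return state

g = Graph(entry="plan")
g.add_node("plan", lambda s: {**s, "plan": f"plan for {s['task']}"})
g.add_node("act", lambda s: {**s, "result": f"did {s['plan']}"})
g.add_edge("plan", "act")
g.add_edge("act", END)
out = g.invoke({"task": "demo"})
```

In this view an "Agent" is just a node function over shared state, which is why the framework reads more like a workflow library than a Multi-Agent system.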
That said, calling LangGraph a Multi-Agent framework feels like a stretch — it is more like a low-code library that heavily abstracts work functions. I originally wanted to dig deep into the source code, but seeing so many LangChain types throughout the LangGraph-js source, it feels like it would take more time to untangle. Leaving that for next time…
Thoughts
Today’s Multi-Agent frameworks feel more like an abstraction over real-world workflows — using prompt engineering to make the LLM play the role of a human. What differs across frameworks is how they abstract, how the details are implemented, and how Agents communicate.
That said, if there really is a business opportunity that requires abstracting a particular workflow, the engineer who understands that workflow could just call the functions and the LLM SDK directly and wire them up themselves — they probably don’t really need a Multi-Agent framework. It is similar to how Langchain wraps multiple LLM SDKs: it looks like it unifies the interface, but it also constrains how you use it and how you modify the internal prompts. LLM applications are still at an early stage, and trying to define and optimize frameworks right now feels a bit premature to me.
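For a concrete sense of the "just call the SDK directly" alternative: the sketch below wires a two-step workflow with plain functions. call_llm is a hypothetical placeholder — in practice it would wrap a real SDK call such as OpenAI's client.chat.completions.create — and the workflow names are invented for illustration.

```python
def call_llm(system, user):
    # placeholder for a real LLM SDK call; returns a stub string
    # so the sketch runs without an API key
    return f"<llm system={system!r} user={user!r}>"

def draft_spec(requirement):
    return call_llm("You are a product manager. Write a short spec.", requirement)

def implement(spec):
    return call_llm("You are an engineer. Implement the spec.", spec)

def workflow(requirement):
    # the "multi-agent" pipeline is just function composition,
    # so it slots into any existing codebase
    return implement(draft_spec(requirement))

result = workflow("a todo app")
```

Each "agent" is an ordinary function you can test, log, and rearrange, with full control over the prompts — which is exactly the flexibility a framework's abstractions tend to take away.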
Compare this to how various ORMs wrap databases: because the bulk operations across relational databases can be abstracted out (with database-specific operations like locks handled as special cases), developers can easily pick up any open-source ORM library.
Alternatively, when there are one or more very strong Agent applications built on the native LLM SDK and shared patterns emerge that can be extracted into a framework — at that point, optimizing the framework would probably yield much more benefit.
ChangeLog
- 20240712-init
- 20260501-translate by claude code