Last updated: 3/24/2025, 6:40:29 PM
AgentSociety: Technical Architecture
System Architecture Overview
AgentSociety's technical architecture is designed to support large-scale social simulations with thousands of LLM-driven agents interacting in a realistic societal environment. The architecture consists of three main components:
- Shared Services: Common infrastructure used across all simulations
- Simulation Tasks: Experiment-specific computational tasks
- GUI Component: Optional visualization and interaction interface

Shared Services
LLM API: The core intelligence behind agents, providing a standard request-response interface
- Supports public services (OpenAI, DeepSeek) or local deployment (vLLM, Ollama)
- Handles token management and response parsing
MQTT Server: High-performance messaging system for inter-agent communication
- Uses the EMQX implementation for reliability and scalability
- Enables protocol-compliant message delivery between agents
Database: PostgreSQL database for storing simulation results
- Optimized for high-performance batch writing using COPY FROM commands
- Stores agent states, interactions, and experimental outcomes
Metric Recorder: mlflow-based system for tracking experimental metrics
- Centralized server capabilities for research collaboration
- Records key performance indicators and experimental results
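The COPY-based batch writing mentioned above can be sketched in a few lines. This is a hypothetical illustration, not AgentSociety's actual schema: the `agent_states` table, its columns, and the function names are assumptions. The in-memory serialization step is what makes a single COPY far cheaper than row-by-row INSERTs.

```python
# Hypothetical sketch of batch-writing simulation results with COPY FROM.
# Table name and columns are assumptions, not the real AgentSociety schema.
import csv
import io


def rows_to_copy_buffer(rows):
    """Serialize rows into an in-memory CSV buffer suitable for COPY FROM STDIN."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerows(rows)
    buf.seek(0)
    return buf


def batch_write(conn, rows):
    """Bulk-insert all rows in one COPY command (psycopg2-style cursor assumed)."""
    buf = rows_to_copy_buffer(rows)
    with conn.cursor() as cur:
        cur.copy_expert(
            "COPY agent_states (agent_id, step, state) FROM STDIN WITH CSV",
            buf,
        )
    conn.commit()
```

A single COPY amortizes parsing and network overhead across the whole batch, which is why it is the standard approach for high-throughput PostgreSQL ingestion.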
Simulation Tasks
Each experiment corresponds to an Agent Simulation object that manages:
Environment Simulators: Run as subprocesses to maintain separation
- Urban environment (roads, POIs, transportation)
- Social environment (networks, interactions)
- Economic environment (firms, markets, policies)
Agent Groups: Organized as Ray actors operating in separate processes
- Each group contains multiple agents sharing client connections
- Enables distributed computing across multiple machines
- Balances communication costs with parallel acceleration
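The group model above can be sketched without pulling in Ray itself. In AgentSociety each group would be a Ray actor (a class decorated with `@ray.remote`); here only the dependency-free grouping logic is shown, and the class, field names, and round-robin policy are illustrative assumptions.

```python
# Hypothetical sketch of group-based agent execution. In the real system each
# AgentGroup would run as a Ray actor (@ray.remote) in its own process.
def partition(agent_ids, num_groups):
    """Distribute agents evenly across groups (round-robin)."""
    groups = [[] for _ in range(num_groups)]
    for i, agent_id in enumerate(agent_ids):
        groups[i % num_groups].append(agent_id)
    return groups


class AgentGroup:
    """One process-local group; its agents share a single set of client
    connections (LLM API, MQTT, database) instead of opening their own."""

    def __init__(self, agent_ids, shared_clients):
        self.agent_ids = agent_ids
        self.clients = shared_clients  # one connection set for the whole group

    def step(self):
        # In the real system, agent steps within a group run concurrently
        # via asyncio; here each agent just reports that it stepped.
        return [(agent_id, "stepped") for agent_id in self.agent_ids]
```

Because all agents in a group share one connection set, the number of open sockets scales with the number of groups rather than the number of agents.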
GUI Component
Backend: Connects to database and MQTT server
- Retrieves simulation data for visualization
- Processes user inputs for agent interaction
Frontend: Provides visualization and interaction interface
- Displays agent states, locations, and interactions
- Enables direct communication with agents through chat or surveys
Group-based Distributed Execution
AgentSociety addresses the challenge of scaling to thousands of agents through an innovative group-based execution model:
Agent Grouping Strategy
- Agents are evenly distributed into multiple groups
- Each group operates within a single process
- Groups share client connections to shared services
- Reduces TCP port resource consumption while maintaining agent independence
Parallel Execution
- Uses Ray framework for multi-process parallel execution
- Leverages Python's asyncio for asynchronous I/O within processes
- Enables concurrent LLM requests while maximizing CPU utilization
- Supports horizontal scaling across multiple machines
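The asyncio pattern behind the concurrent LLM requests can be sketched as follows. `fake_llm_call` is a stand-in for a real API client, and the request/response shape is an assumption for illustration; the key point is that `asyncio.gather` issues all requests at once.

```python
# Hedged sketch of concurrent LLM requests within one process via asyncio.
import asyncio


async def fake_llm_call(prompt: str) -> str:
    """Stand-in for a real LLM API call; the sleep mimics network latency."""
    await asyncio.sleep(0.01)
    return f"response to: {prompt}"


async def run_group_step(prompts):
    """Fire all requests concurrently; wall time is roughly one request's
    latency rather than the sum over all agents."""
    return await asyncio.gather(*(fake_llm_call(p) for p in prompts))
```

Combined with Ray's multi-process execution across groups, this keeps the CPU busy while agents wait on I/O-bound API responses.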

Performance Optimization
- Connection pooling for LLM API calls
- Asynchronous environment interactions
- Efficient message routing through MQTT
- Batch processing of database operations
MQTT-powered Agent Messaging System
The messaging system is a critical component enabling agent-to-agent communication and external interaction:
Topic Structure
- exps/&lt;exp_uuid&gt;/agents/&lt;agent_uuid&gt;/agent-chat: for agent-to-agent messages
- exps/&lt;exp_uuid&gt;/agents/&lt;agent_uuid&gt;/user-chat: for user-to-agent messages
- exps/&lt;exp_uuid&gt;/agents/&lt;agent_uuid&gt;/user-survey: for structured surveys
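The topic names follow a simple per-experiment, per-agent template, so they can be built with plain string formatting; no MQTT client library is needed for this sketch, and the helper's name is an assumption.

```python
# Sketch of constructing the experiment topic names; a real client
# (e.g. paho-mqtt) would publish/subscribe using these strings.
def agent_topic(exp_uuid: str, agent_uuid: str, channel: str) -> str:
    """channel is one of 'agent-chat', 'user-chat', or 'user-survey'."""
    if channel not in {"agent-chat", "user-chat", "user-survey"}:
        raise ValueError(f"unknown channel: {channel}")
    return f"exps/{exp_uuid}/agents/{agent_uuid}/{channel}"
```

Scoping topics by experiment UUID keeps concurrent experiments isolated on a shared broker, while the per-agent segment lets each agent subscribe only to its own traffic.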
Implementation Benefits
- Supports hundreds of thousands of connected agents
- Provides reliable message delivery with minimal resource consumption
- Enables publish/subscribe architecture for efficient message routing
- Facilitates external interaction through standardized interfaces
Performance Metrics
- Achieves 44,702 messages per second throughput
- Outperforms alternatives like RabbitMQ (23,667 msg/s)
- Provides built-in GUI tools for monitoring and debugging
Utilities and Toolbox
AgentSociety includes comprehensive utilities to support development and research:
Core Utilities
- LLM API Adapter: Supports multiple LLM providers with consistent interface
- Retry Mechanism: Automatically handles LLM API errors
- JSON Parser: Processes structured responses from LLMs
- Metric Recorder: Tracks statistical metrics during experiments
- Logging and Saving: Archives simulation data in AVRO format and PostgreSQL
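The retry mechanism and JSON parser can be sketched together. This is a hypothetical illustration: the function names, the fenced-JSON convention, and the backoff policy are assumptions, but the pattern (extract the JSON object from a possibly noisy LLM response, retry transient API failures with exponential backoff) matches the utilities described above.

```python
# Hypothetical sketch of the retry and JSON-parsing utilities.
import json
import re
import time


def parse_llm_json(text: str) -> dict:
    """Extract a JSON object from an LLM response, tolerating ```json fences
    and surrounding prose."""
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in response")
    return json.loads(match.group(0))


def with_retry(call, max_attempts=3, backoff=1.0):
    """Retry a flaky call with exponential backoff; re-raise on final failure."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            time.sleep(backoff * 2 ** attempt)
```

Together these let agent logic assume well-formed structured output while the infrastructure absorbs transient API errors and formatting noise.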
Social Science Toolbox
Intervention Tools:
- Agent Configuration: Modifies internal settings before simulation
- State Manipulation: Alters agent states during simulation
- Message Notification: Sends external stimuli to agents
Interview System:
- Enables direct questioning of agents
- Processes responses without interrupting ongoing actions
- Distributes questions via MQTT messaging
Survey System:
- Distributes structured questionnaires to agents
- Collects formatted responses for analysis
- Supports various response formats (multiple-choice, ranking, etc.)
Performance Evaluation
Comprehensive performance testing reveals AgentSociety's capabilities and limitations:
Environment Performance
- Successfully handles 1,000,000 agents with minimal degradation
- Mean time per simulation step scales efficiently with agent count:
  - 10³ agents: 8.578×10⁻³ seconds
  - 10⁶ agents: 0.1680 seconds
Messaging System Performance
- MQTT achieves 44,702 messages per second
- Redis Pub/Sub: 81,216 messages per second (higher throughput but lacks built-in tools)
- RabbitMQ: 23,667 messages per second
Overall System Performance
- Successfully simulates 10,000+ agents with realistic behaviors
- LLM API calls remain the primary bottleneck
- Parallel execution significantly improves performance:
  - 10⁴ agents, 8 processes: 5,681 seconds per round
  - 10⁴ agents, 32 processes: 458 seconds per round
Technical Challenges and Solutions
Challenge: TCP Port Exhaustion
Problem: Giving every agent its own process and client connections would exhaust the available TCP ports (65,535 per host)
Solution: Group-based execution with connection sharing
- Multiple agents operate within single processes
- Shared client connections to services
- Maintains agent independence through asynchronous execution
Challenge: LLM API Latency
Problem: LLM API calls introduce significant latency
Solution: Asynchronous execution and parallelization
- Concurrent LLM requests through asyncio
- Multi-process execution through Ray
- Connection pooling for efficient resource utilization
Challenge: Inter-agent Communication
Problem: Efficient message routing between thousands of agents
Solution: MQTT-based messaging system
- Lightweight publish/subscribe architecture
- Topic-based routing for efficient delivery
- Scalable to hundreds of thousands of connections
Challenge: Data Management
Problem: Storing and analyzing massive simulation data
Solution: Hybrid storage approach
- PostgreSQL for structured data with COPY FROM optimization
- AVRO format for local file storage
- mlflow for metric tracking and experiment comparison
Conclusion
AgentSociety's technical architecture represents a significant advancement in large-scale social simulation, addressing key challenges in scalability, communication, and computational efficiency. By leveraging distributed computing, asynchronous execution, and efficient messaging, the platform enables unprecedented scale and realism in agent-based social modeling.
The integration of sophisticated LLM-driven agents with a realistic societal environment and powerful simulation engine opens new possibilities for social science research, policy evaluation, and complex system modeling. As computational resources and LLM capabilities continue to advance, this architecture provides a foundation for even more ambitious simulations of human society.