Technical_Architecture

Last updated: 3/24/2025, 6:40:29 PM

AgentSociety: Technical Architecture

System Architecture Overview

AgentSociety's technical architecture is designed to support large-scale social simulations with thousands of LLM-driven agents interacting in a realistic societal environment. The architecture consists of three main components:

  1. Shared Services: Common infrastructure used across all simulations
  2. Simulation Tasks: Experiment-specific computational tasks
  3. GUI Component: Optional visualization and interaction interface

[Figure: System Architecture]

Shared Services

  • LLM API: The core intelligence behind agents, providing a standard request-response interface

    • Supports public services (OpenAI, DeepSeek) or local deployment (vLLM, Ollama)
    • Handles token management and response parsing
  • MQTT Server: High-performance messaging system for inter-agent communication

    • Uses the EMQX broker for reliability and scalability
    • Enables protocol-compliant message delivery between agents
  • Database: PostgreSQL database for storing simulation results

    • Optimized for high-performance batch writing using COPY FROM commands
    • Stores agent states, interactions, and experimental outcomes
  • Metric Recorder: MLflow-based system for tracking experimental metrics

    • Centralized server capabilities for research collaboration
    • Records key performance indicators and experimental results
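
To illustrate the batch-writing idea in the Database item above, here is a minimal sketch, not AgentSociety's actual code. It assumes a psycopg2 cursor (whose copy_expert(sql, file) method streams a file-like object to the server); the table and column names in the usage note are hypothetical.

```python
import csv
import io


def rows_to_copy_buffer(rows):
    """Serialize rows into an in-memory CSV buffer for COPY ... FROM STDIN."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerows(rows)
    buf.seek(0)  # rewind so the database driver reads from the start
    return buf


def batch_write(cursor, table, columns, rows):
    """Send all rows in one COPY statement instead of one INSERT per row.

    Assumption: `cursor` is a psycopg2 cursor. `table` and `columns` are
    interpolated into the SQL text, so they must come from trusted code,
    never from user input.
    """
    sql = f"COPY {table} ({', '.join(columns)}) FROM STDIN WITH (FORMAT csv)"
    cursor.copy_expert(sql, rows_to_copy_buffer(rows))
    return sql
```

A caller would buffer agent records in memory during a simulation step and flush them with something like batch_write(cur, "agent_status", ["agent_id", "step", "status"], rows), paying one round trip per batch rather than one per row.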

Simulation Tasks

Each experiment corresponds to an Agent Simulation object that manages:

  • Environment Simulators: Run as subprocesses to maintain separation

    • Urban environment (roads, POIs, transportation)
    • Social environment (networks, interactions)
    • Economic environment (firms, markets, policies)
  • Agent Groups: Organized as Ray actors operating in separate processes

    • Each group contains multiple agents sharing client connections
    • Enables distributed computing across multiple machines
    • Balances communication costs with parallel acceleration

GUI Component

  • Backend: Connects to database and MQTT server

    • Retrieves simulation data for visualization
    • Processes user inputs for agent interaction
  • Frontend: Provides visualization and interaction interface

    • Displays agent states, locations, and interactions
    • Enables direct communication with agents through chat or surveys

Group-based Distributed Execution

AgentSociety scales to thousands of agents by packing them into groups, each of which runs in its own process and shares client connections:

Agent Grouping Strategy

  • Agents are evenly distributed into multiple groups
  • Each group operates within a single process
  • Groups share client connections to shared services
  • Reduces TCP port resource consumption while maintaining agent independence
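
The even distribution described above can be sketched as a round-robin partition. The function name and the use of plain lists are illustrative assumptions; the source does not say which assignment rule AgentSociety uses.

```python
def partition_agents(agent_ids, num_groups):
    """Assign agents to groups round-robin so group sizes differ by at most one."""
    if num_groups <= 0:
        raise ValueError("num_groups must be positive")
    groups = [[] for _ in range(num_groups)]
    for index, agent_id in enumerate(agent_ids):
        groups[index % num_groups].append(agent_id)
    return groups
```

Each resulting list would then be handed to one process, which creates a single set of service clients for all of its agents.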

Parallel Execution

  • Uses Ray framework for multi-process parallel execution
  • Leverages Python's asyncio for asynchronous I/O within processes
  • Enables concurrent LLM requests while maximizing CPU utilization
  • Supports horizontal scaling across multiple machines
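
As a rough sketch of the in-process half of this model, the code below steps every agent of one group concurrently on a single asyncio event loop while sharing one client. SharedLLMClient is a hypothetical stand-in that simulates latency with a sleep; in the architecture described here each group would additionally run inside its own Ray actor, which is omitted so the example needs only the standard library.

```python
import asyncio


class SharedLLMClient:
    """Hypothetical stand-in for one LLM client shared by a whole group."""

    def __init__(self, max_concurrency):
        # Caps the number of requests in flight at once.
        self._sem = asyncio.Semaphore(max_concurrency)
        self.in_flight = 0
        self.peak_in_flight = 0

    async def complete(self, prompt):
        async with self._sem:
            self.in_flight += 1
            self.peak_in_flight = max(self.peak_in_flight, self.in_flight)
            await asyncio.sleep(0.01)  # simulated network latency
            self.in_flight -= 1
            return f"response to: {prompt}"


async def run_group(agent_ids, client):
    """Step every agent in the group concurrently on one event loop."""

    async def step(agent_id):
        reply = await client.complete(f"agent {agent_id} decides next action")
        return agent_id, reply

    return dict(await asyncio.gather(*(step(a) for a in agent_ids)))


def run_group_sync(agent_ids, max_concurrency=4):
    """Entry point a worker process could call for its group."""

    async def main():
        client = SharedLLMClient(max_concurrency)
        results = await run_group(agent_ids, client)
        return results, client.peak_in_flight

    return asyncio.run(main())
```

Because the agents spend most of their time waiting on the network, one process can keep many requests in flight; the semaphore is one simple way to keep that number within whatever rate limit the LLM service imposes.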

[Figure: Parallel Execution]

Performance Optimization

  • Connection pooling for LLM API calls
  • Asynchronous environment interactions
  • Efficient message routing through MQTT
  • Batch processing of database operations

MQTT-powered Agent Messaging System

The messaging system is a critical component enabling agent-to-agent communication and external interaction:

Topic Structure

  • exps/<exp_uuid>/agents/<agent_uuid>/agent-chat: For agent-to-agent messages
  • exps/<exp_uuid>/agents/<agent_uuid>/user-chat: For user-to-agent messages
  • exps/<exp_uuid>/agents/<agent_uuid>/user-survey: For structured surveys
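
The topic layout above can be made concrete with a small helper that builds topic strings and a simplified matcher for MQTT's standard wildcards ("+" for one level, "#" for all remaining levels). Both functions are illustrative; whether AgentSociety itself subscribes with wildcards is an assumption, and the matcher ignores special cases such as "$"-prefixed system topics.

```python
CHANNELS = ("agent-chat", "user-chat", "user-survey")


def agent_topic(exp_uuid, agent_uuid, channel):
    """Build the topic for one agent and one of the three channels."""
    if channel not in CHANNELS:
        raise ValueError(f"unknown channel: {channel}")
    return f"exps/{exp_uuid}/agents/{agent_uuid}/{channel}"


def topic_matches(topic_filter, topic):
    """Simplified MQTT filter match: '+' is one level, '#' is the rest."""
    filter_parts = topic_filter.split("/")
    topic_parts = topic.split("/")
    for i, part in enumerate(filter_parts):
        if part == "#":
            # '#' is only valid as the final level of a filter.
            return i == len(filter_parts) - 1
        if i >= len(topic_parts):
            return False
        if part != "+" and part != topic_parts[i]:
            return False
    return len(filter_parts) == len(topic_parts)
```

With this layout, a monitoring tool could follow every user chat in an experiment through the single filter exps/<exp_uuid>/agents/+/user-chat, while each agent subscribes only to its own three topics.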

Implementation Benefits

  • Supports hundreds of thousands of connected agents
  • Provides reliable message delivery with minimal resource consumption
  • Enables publish/subscribe architecture for efficient message routing
  • Facilitates external interaction through standardized interfaces

Performance Metrics

  • Sustains a throughput of 44,702 messages per second in the reported benchmark
  • Outperforms RabbitMQ (23,667 msg/s), although Redis Pub/Sub reaches higher raw throughput (81,216 msg/s)
  • Provides built-in GUI tools for monitoring and debugging

Utilities and Toolbox

AgentSociety includes comprehensive utilities to support development and research:

Core Utilities

  • LLM API Adapter: Supports multiple LLM providers through a consistent interface
  • Retry Mechanism: Automatically handles LLM API errors
  • JSON Parser: Processes structured responses from LLMs
  • Metric Recorder: Tracks statistical metrics during experiments
  • Logging and Saving: Archives simulation data in Avro format and PostgreSQL
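
The retry mechanism and JSON parser listed above might look roughly like the following. This is a generic sketch, not AgentSociety's code: the function names, the exponential backoff policy, and the choice to retry on any exception are assumptions made for illustration.

```python
import asyncio
import json
import re


async def call_with_retry(make_call, max_attempts=3, base_delay=0.01):
    """Await make_call(), retrying with exponential backoff on failure.

    Assumption: every exception is treated as transient. A real adapter
    would likely retry only on provider-specific timeout or rate-limit errors.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return await make_call()
        except Exception:
            if attempt == max_attempts:
                raise
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))


def extract_json(text):
    """Return the first JSON object embedded in an LLM response."""
    decoder = json.JSONDecoder()
    for match in re.finditer(r"\{", text):
        try:
            # raw_decode parses one JSON value starting at the given index
            # and ignores whatever text follows it.
            obj, _end = decoder.raw_decode(text, match.start())
            return obj
        except json.JSONDecodeError:
            continue
    raise ValueError("no JSON object found in response")
```

Scanning for the first parseable object tolerates the explanatory text that models often wrap around structured output.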

Social Science Toolbox

  • Intervention Tools:

    • Agent Configuration: Modifies internal settings before simulation
    • State Manipulation: Alters agent states during simulation
    • Message Notification: Sends external stimuli to agents
  • Interview System:

    • Enables direct questioning of agents
    • Processes responses without interrupting ongoing actions
    • Distributes questions via MQTT messaging
  • Survey System:

    • Distributes structured questionnaires to agents
    • Collects formatted responses for analysis
    • Supports various response formats (multiple-choice, ranking, etc.)

Performance Evaluation

Comprehensive performance testing reveals AgentSociety's capabilities and limitations:

Environment Performance

  • Handles 1,000,000 agents with sub-linear growth in step time: a 1,000-fold increase in agent count raises the mean step time by roughly 20 times
  • Mean time per simulation step scales efficiently with agent count:
    • 10³ agents: 8.578×10⁻³ seconds
    • 10⁶ agents: 0.1680 seconds
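
The two measurements above can be turned into a rough scaling estimate. The calculation below assumes a power-law relationship (time proportional to N raised to some exponent), which two data points cannot confirm, so the exponent is indicative only.

```python
import math

# Reported mean time per simulation step, in seconds.
t_small, n_small = 8.578e-3, 10**3
t_large, n_large = 0.1680, 10**6

# How much slower one step is at the larger scale.
slowdown = t_large / t_small

# Exponent k in time ~ N**k, assuming a power law between the two points.
exponent = math.log(slowdown) / math.log(n_large / n_small)

# Amortized environment cost per agent per step.
per_agent_small = t_small / n_small
per_agent_large = t_large / n_large

print(f"slowdown: {slowdown:.1f}x, exponent: {exponent:.2f}")
print(f"per-agent cost: {per_agent_small:.2e} s -> {per_agent_large:.2e} s")
```

An exponent well below 1 means the per-agent cost of an environment step falls as the population grows.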

Messaging System Performance

  • MQTT achieves 44,702 messages per second
  • Redis Pub/Sub: 81,216 messages per second (higher throughput but lacks built-in tools)
  • RabbitMQ: 23,667 messages per second

Overall System Performance

  • Successfully simulates 10,000+ agents with realistic behaviors
  • LLM API calls remain the primary bottleneck
  • Parallel execution cuts the time per round by about 12 times when moving from 8 to 32 processes:
    • 10⁴ agents, 8 processes: 5,681 seconds per round
    • 10⁴ agents, 32 processes: 458 seconds per round
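
For reference, the speedup implied by the two figures above works out as follows. The source does not explain why the gain exceeds the fourfold increase in process count, so the code only reports the ratios.

```python
# Reported seconds per round for 10^4 agents.
time_8_procs = 5681
time_32_procs = 458

speedup = time_8_procs / time_32_procs  # how many times faster a round runs
process_ratio = 32 / 8                  # how many times more processes
relative_efficiency = speedup / process_ratio

print(f"speedup: {speedup:.1f}x with {process_ratio:.0f}x the processes")
print(f"relative efficiency: {relative_efficiency:.2f}")
```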

Technical Challenges and Solutions

Challenge: TCP Port Exhaustion

Problem: If every agent ran in its own process with its own client connections, a large simulation would exhaust the available TCP ports (65,535 limit)

Solution: Group-based execution with connection sharing

  • Multiple agents operate within single processes
  • Shared client connections to services
  • Maintains agent independence through asynchronous execution
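
A back-of-the-envelope count shows why connection sharing matters. The sketch assumes one TCP connection per client per shared service and four services (LLM API, MQTT server, database, metric recorder); the real number of connections per client is not stated in the source.

```python
TCP_PORT_LIMIT = 65_535


def connection_count(num_agents, num_services, num_groups=None):
    """Count outbound connections, assuming one per client per service.

    With num_groups=None every agent holds its own connections; otherwise
    each group shares one connection per service among its agents.
    """
    clients = num_agents if num_groups is None else num_groups
    return clients * num_services


per_agent = connection_count(100_000, num_services=4)
per_group = connection_count(100_000, num_services=4, num_groups=32)

print(per_agent, per_agent > TCP_PORT_LIMIT)  # 400000 True
print(per_group, per_group > TCP_PORT_LIMIT)  # 128 False
```

Under these assumptions the connection count depends on the number of groups rather than the number of agents, so it stays far below the port limit as the population grows.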

Challenge: LLM API Latency

Problem: LLM API calls introduce significant latency

Solution: Asynchronous execution and parallelization

  • Concurrent LLM requests through asyncio
  • Multi-process execution through Ray
  • Connection pooling for efficient resource utilization

Challenge: Inter-agent Communication

Problem: Messages must be routed efficiently between thousands of agents

Solution: MQTT-based messaging system

  • Lightweight publish/subscribe architecture
  • Topic-based routing for efficient delivery
  • Scalable to hundreds of thousands of connections

Challenge: Data Management

Problem: Simulations produce massive volumes of data that must be stored and analyzed

Solution: Hybrid storage approach

  • PostgreSQL for structured data with COPY FROM optimization
  • Avro format for local file storage
  • MLflow for metric tracking and experiment comparison

Conclusion

AgentSociety's technical architecture addresses key challenges of large-scale social simulation: scalability, communication, and computational efficiency. By combining distributed computing, asynchronous execution, and efficient messaging, the platform supports simulations of 10,000+ LLM-driven agents with realistic behaviors.

The integration of sophisticated LLM-driven agents with a realistic societal environment and powerful simulation engine opens new possibilities for social science research, policy evaluation, and complex system modeling. As computational resources and LLM capabilities continue to advance, this architecture provides a foundation for even more ambitious simulations of human society.