AgentSim - Visual Walkthrough Guide

1

Getting Started

Set up your environment and take your first tour of AgentSim. These five steps get you from zero to exploring Module 1 in under five minutes.

Step 1 of 5

Start the server

Open a terminal, navigate to the AgentSim directory, and start a local web server on port 8090.

Terminal -- zsh

~/Documents/Claude/AgentSim $ python3 -m http.server 8090
Serving HTTP on :: port 8090 (http://[::]:8090/) ...

Leave this terminal running. Open a new tab for other commands. The server serves all AgentSim pages.
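
If you would rather start the server from a script, the sketch below does roughly what the command above does, using only the Python standard library. The helper name make_server and the choice to bind to 127.0.0.1 are illustrative assumptions, not part of AgentSim.

```python
import functools
import http.server


def make_server(port: int = 8090, directory: str = ".") -> http.server.ThreadingHTTPServer:
    """Build a static-file server rooted at `directory` (hypothetical helper)."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    # Bind to localhost only; the command-line form binds to all interfaces.
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)


# Usage (blocks until interrupted with Ctrl+C):
#   make_server().serve_forever()
```

Run it from the AgentSim directory so that training.html and the other pages resolve at the same paths as with the one-line command.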

Step 2 of 5

Open the Training Portal

Navigate to http://localhost:8090/training.html in your browser. This is your home base.

localhost:8090/training.html

AgentSim Training Academy

Master AI agent deployment in a realistic banking environment. 8 modules, hands-on labs, real scenarios.

Start Module 1 Open Simulator

Progress: 0 / 8 Modules

1 Start Module 1 -- begins the guided curriculum from the top

2 Open Simulator -- jump directly into the hands-on terminal

3 Progress bar -- tracks how many modules you have completed (currently 0%)

4 Sidebar navigation -- click any module to load its theory and lab

Step 3 of 5

Choose your first module

Click "Module 1: Enterprise Environment" in the sidebar. It highlights to show your selection.

localhost:8090/training.html

Theory Lab Quiz

Module 1: Enterprise Environment

Understanding the Frontier Community Bank technology landscape, organizational structure, and the systems that agents will interact with.

1 Active module -- the blue highlight and left border confirm your selection

Step 4 of 5

Read the theory

The Theory tab loads the module README. Scroll through to understand the concepts before starting the lab.

localhost:8090/training.html -- Module 1 Theory

Theory Lab

The Enterprise Environment

Key Concepts

Frontier Community Bank -- $2.5B mid-market bank, 350 employees
14 branches across the Southeast US
Core banking: Jack Henry Symitar (Episys)
IT: ServiceNow ITSM, Dynatrace, CrowdStrike

Organizational Structure

The bank has 6 departments organized under the CEO...

Learning Objectives: Understand data file layout, ID conventions, and system dependencies.

1 Key Concepts -- the essential facts about Frontier Community Bank's environment

2 Theory / Lab tabs -- read Theory first, then switch to Lab when ready

3 Learning Objectives -- highlighted box at the end summarizes what you should know

Step 5 of 5

Go to the lab

At the bottom of the theory, a call-to-action directs you to the hands-on lab in the simulator.

localhost:8090/training.html -- Module 1 (bottom)

Ready to practice?

You have finished the theory for Module 1. Open the simulator to complete the hands-on lab.

Next Step: Complete the Lab

1 Complete the Lab -- clicking this opens simulator.html pre-loaded with the Module 1 lab scenario

You can also open the simulator directly from the top nav at any time.

2

Using the Simulator

The simulator is a browser-based terminal that recreates an agent's working environment. You type commands, explore data files, and run simulated AI agents -- all without touching production systems.

Step 1 of 6

Welcome screen

On first load, the simulator shows a welcome overlay. Choose between Guided mode (step-by-step instructions) or Free mode (open sandbox).

localhost:8090/simulator.html

AgentSim Simulator

Frontier Community Bank -- Agent Training Environment

📖

Guided Mode

Step-by-step
instructions

🔧

Free Mode

Open sandbox
exploration

1 Guided Mode -- recommended for first-time users. Provides step-by-step lab instructions in the right pane.

2 Free Mode -- open sandbox for experienced users who want to explore without guardrails.

Step 2 of 6

Guided mode layout

After choosing Guided mode, you get a split-pane view: terminal on the left, step instructions on the right.

localhost:8090/simulator.html -- Guided Mode

bizsim $ _|

Type commands here. Try: ls, cat, help

Step 1: Explore the data directory

Use the ls command to see what files are available in the AgentSim environment.

ls data/
Copy to Terminal

Step 1 of 4

1 Terminal -- type commands here just like a real terminal. Supports ls, cat, help, claude, and more.

2 Step instructions -- the right pane tells you exactly what to do and why.

3 Copy to Terminal -- click to auto-paste the command into the terminal. Useful for long commands.

4 Step progress dots -- shows how far you are through the current lab.

Step 3 of 6

Type a command

Type ls in the terminal and press Enter. The simulator shows the AgentSim directory listing.

localhost:8090/simulator.html

bizsim $ ls
agents/    data/  manifest.json  scripts/
CLAUDE.md  docs/  schemas/       state/
bizsim $ ls data/
branches.json     employees.json  risks.json
budgets.json      incidents.json  systems.json
compliance.json   network.json    tickets.json
departments.json  org-chart.json  vendors.json
bizsim $ _

The simulator's file system mirrors the actual AgentSim project. Every file you see here is a real data file.

Step 4 of 6

Explore data files

Use cat to read files. Start with manifest.json to understand the data layout.

localhost:8090/simulator.html

bizsim $ cat manifest.json
{
  "name": "Frontier Community Bank",
  "version": "1.0.0",
  "entity_count": 847,
  "data_files": {
    "employees": "data/employees.json",
    "departments": "data/departments.json",
    "systems": "data/systems.json",
    "incidents": "data/incidents.json",
    ... (12 files total)
  },
  "id_conventions": {
    "EMP-XXXX": "Employee",
    "DEPT-XX": "Department",
    "SYS-XXXX": "System"
  }
}

All entity IDs use stable prefixes (EMP-, DEPT-, SYS-). This makes it easy to identify entities in any context.
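
As a minimal sketch of why stable prefixes help, the snippet below maps an ID to its entity type using only the three prefixes listed under id_conventions in the manifest excerpt. The function name entity_type is hypothetical, and other prefixes (such as the INC- IDs that appear later in this guide) would need their own entries.

```python
# Prefixes taken from the "id_conventions" block of the manifest excerpt.
ID_PREFIXES = {
    "EMP": "Employee",
    "DEPT": "Department",
    "SYS": "System",
}


def entity_type(entity_id: str) -> str:
    """Return the entity type for an ID such as 'EMP-0042' (hypothetical helper)."""
    prefix, separator, _rest = entity_id.partition("-")
    if not separator or prefix not in ID_PREFIXES:
        raise ValueError(f"Unrecognized entity ID: {entity_id!r}")
    return ID_PREFIXES[prefix]


print(entity_type("EMP-0042"))  # Employee
print(entity_type("SYS-0007"))  # System
```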

Step 5 of 6

Run an agent

Type claude to start the simulated AI agent. It processes the scenario and produces output.

localhost:8090/simulator.html

bizsim $ claude
Initializing agent...
Loading CLAUDE.md configuration...
Reading manifest.json...
Discovered 12 data files, 847 entities
Agent ready. Processing scenario: basic-recon
[1/4] Reading org-chart.json... OK
[2/4] Reading systems.json... OK
[3/4] Analyzing dependencies... OK
[4/4] Writing report... OK
Scenario complete. Report saved to state/reports/basic-recon.json
Run scorecard to see your results.

Step 6 of 6

View your scorecard

After the agent finishes, run scorecard to see how you performed.

localhost:8090/simulator.html -- Scorecard

A-

92 / 100

Scenario: basic-recon

Completeness	4 / 4 steps
Efficiency	Optimal path
Deductions	-8 pts (read unnecessary file)
Time	2m 14s

Excellent work. You completed the recon without accessing restricted data.

The scorecard grades you on completeness (did you finish all steps?), efficiency (did you take the optimal path?), and deductions (did you do anything unnecessary or risky?).
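
The example scorecard (92 / 100 with a single 8-point deduction) is consistent with subtracting deductions from 100, although the guide does not document the actual formula. The sketch below works under that assumption. The grade bands are also an assumption, chosen only so that 92 maps to A- and 87 maps to B+ as in the two scorecards shown in this guide.

```python
# Assumed grade bands (lower bound, letter); not taken from AgentSim itself.
GRADE_BANDS = [
    (93, "A"), (90, "A-"), (87, "B+"), (83, "B"),
    (80, "B-"), (70, "C"), (60, "D"),
]


def score_from_deductions(deductions: list[int], max_score: int = 100) -> int:
    """Subtract every deduction from the maximum, never going below zero."""
    return max(0, max_score - sum(deductions))


def letter_grade(score: int) -> str:
    """Return the first band whose lower bound the score reaches."""
    for lower_bound, grade in GRADE_BANDS:
        if score >= lower_bound:
            return grade
    return "F"


score = score_from_deductions([8])  # one deduction: read unnecessary file
print(score, letter_grade(score))   # 92 A-
```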

3

Running a Scenario

Scenarios are structured challenges that test your ability to configure and run AI agents under realistic constraints. Each scenario has an objective, rules, and a scorecard.

Step 1 of 5

Pick a scenario

From the dashboard's Agent Console tab, browse the available scenarios and click "Run" to begin.

localhost:8090/index.html -- Agent Console

Overview

Org Chart

Network

Agent Console

Available Scenarios

Scenario	Description	Difficulty	Module	Status	Action
basic-recon	Map the org and systems	Beginner	1	Completed	Replay
incident-triage	Classify and route P1 incidents	Intermediate	6	Not Started	Run
rogue-agent	Detect and contain a compromised agent	Advanced	3	Not Started	Run

1 Run button -- launches the scenario in the simulator with pre-configured context

Step 2 of 5

Read the scenario file

Each scenario is defined as a JSON file with objective, constraints, and available actions.

agents/scenarios/incident-triage.json

{
  "id": "incident-triage",
  "title": "P1 Incident Triage",
  "difficulty": "intermediate",
  "objective": "Classify 5 incoming incidents by priority, assign to correct team, and escalate P1s within SLA",
  "constraints": [
    "Must not access PII fields",
    "Must follow ITIL classification matrix",
    "P1 escalation within 15 minutes"
  ],
  "available_actions": [
    "read_incident",
    "classify_incident",
    "assign_team",
    "escalate",
    "write_report"
  ],
  "scoring": {
    "correct_classification": 20,
    "correct_assignment": 20,
    "sla_compliance": 30,
    "no_pii_access": 30
  }
}

1 objective -- what the agent must accomplish. This is the success criterion.

2 constraints -- rules the agent must follow. Violating these costs points.

3 available_actions -- the only actions the agent is allowed to take.
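
Before running a scenario, it can help to sanity-check its file. The sketch below parses an excerpt of the incident-triage scenario shown above, confirms that the scoring weights add up to 100, and checks whether an action is on the allowed list. The helper names is_allowed and max_points are hypothetical.

```python
import json

# Excerpt of agents/scenarios/incident-triage.json as shown in this guide.
SCENARIO = json.loads("""
{
  "id": "incident-triage",
  "available_actions": [
    "read_incident", "classify_incident", "assign_team",
    "escalate", "write_report"
  ],
  "scoring": {
    "correct_classification": 20,
    "correct_assignment": 20,
    "sla_compliance": 30,
    "no_pii_access": 30
  }
}
""")


def is_allowed(scenario: dict, action: str) -> bool:
    """True if the scenario lists the action under available_actions."""
    return action in scenario["available_actions"]


def max_points(scenario: dict) -> int:
    """Sum of all scoring weights in the scenario."""
    return sum(scenario["scoring"].values())


print(max_points(SCENARIO))                 # 100
print(is_allowed(SCENARIO, "escalate"))     # True
print(is_allowed(SCENARIO, "delete_data"))  # False
```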

Step 3 of 5

Configure your agent

Write or edit a CLAUDE.md file that tells the agent how to behave. The starter has TODO gaps for you to fill.

CLAUDE.md -- Agent Configuration

# Agent Configuration: Incident Triage

## Role
You are an IT Operations agent for Frontier Community Bank.

## Objective
Classify and route incoming incidents per ITIL framework.

## Rules
1. Read incidents from data/incidents.json
2. TODO: Define classification criteria
3. TODO: Define escalation thresholds
4. Never access employee PII (SSN, salary, etc.)

## Output
Write results to state/reports/incident-triage.json
TODO: Define report schema

TODO blocks -- these yellow-highlighted gaps are where you practice writing agent instructions. Fill them in before running the scenario.
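
A quick way to confirm that no gaps remain is to scan the file for leftover TODO markers. The sketch below is a standalone helper, not an AgentSim feature. The name find_todos is hypothetical, and the sample text is an excerpt of the starter file shown above.

```python
# Excerpt of the starter CLAUDE.md from this guide.
STARTER = """\
## Rules
1. Read incidents from data/incidents.json
2. TODO: Define classification criteria
3. TODO: Define escalation thresholds
4. Never access employee PII (SSN, salary, etc.)
## Output
TODO: Define report schema
"""


def find_todos(config_text: str) -> list[tuple[int, str]]:
    """Return (line number, line text) for every line that still contains TODO."""
    todos = []
    for line_no, line in enumerate(config_text.splitlines(), start=1):
        if "TODO" in line:
            todos.append((line_no, line.strip()))
    return todos


for line_no, text in find_todos(STARTER):
    print(f"line {line_no}: {text}")
```

An empty result means every gap has been filled in.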

Step 4 of 5

Execute the scenario

Run claude with your configured CLAUDE.md and watch the agent process incidents in real time.

localhost:8090/simulator.html -- Running

bizsim $ claude
Agent initialized with CLAUDE.md configuration

[INC-0042] Core banking timeout -- reading details...
  Classification: P1 - Critical
  Assigned to: DEPT-IT (Infrastructure)
  Escalation: Triggered (SLA: 15min)

[INC-0043] Password reset request -- reading details...
  Classification: P4 - Low
  Assigned to: DEPT-IT (Service Desk)
  Escalation: Not required

[INC-0044] ATM network degradation -- reading details...
  Classification: P2 - High
  Assigned to: DEPT-IT (Network Ops)
...

Step 5 of 5

Review results

When the agent finishes, you see the scorecard plus any consequence alerts for mistakes.

localhost:8090/simulator.html -- Results

B+

87 / 100

Classification accuracy	4/5 correct
Team assignment	5/5 correct
SLA compliance	All within SLA
PII protection	No violations

Consequence Alert: INC-0044 was classified P2 but should have been P1. In production, this would delay the response to an ATM outage affecting 3 branches.

Consequence alerts show real-world impact of mistakes, helping you understand why accuracy matters in banking operations.

4

The Dashboard

The AgentSim dashboard (index.html) gives you an executive view of the entire bank simulation with interactive visualizations, organizational data, and the agent console.

Step 1 of 4

Overview tab

The default tab shows key metrics about Frontier Community Bank at a glance.

localhost:8090/index.html

Overview

Org Chart

Network

Risk

Agent Console

$2.5B

Total Assets

347

Employees

14

Branches

12

Open Incidents

Incident Distribution

P1 P2 P3 P4

System Health

Core Banking: Healthy

Network: Degraded

Email: Healthy

1 Metric cards -- headline stats pulled from the AgentSim data files. These update when scenarios modify the state.

2 Charts and status -- incident distribution by priority and real-time system health from data/systems.json.

Step 2 of 4

Org Chart

The Org Chart tab renders a D3-powered tree of the bank's reporting structure. Click nodes to expand or collapse.

localhost:8090/index.html -- Org Chart

CEO
Margaret Chen

CTO
IT & Ops

CFO
Finance

CRO
Risk

COO
Operations

CISO
Security

Click any node to expand its team hierarchy

1 Expandable nodes -- click any C-suite node to reveal directors, managers, and individual contributors beneath it.

Step 3 of 4

Network Topology

The Network tab shows a force-directed graph of all systems, VLANs, and connections.

localhost:8090/index.html -- Network Topology

VLAN 10 -- Core Banking

DB

APP

WEB

VLAN 20 -- Corporate

AD

M365

DMZ -- External Facing

FW

VPN

ATM

1 VLAN zones -- dashed borders group systems by network segment. The actual dashboard uses a physics-based layout.

2 DMZ -- external-facing systems like firewalls, VPN, and ATM gateways. Agents must be careful in this zone.

Step 4 of 4

Agent Console

The Agent Console tab lists all scenarios with status tracking and quick-launch buttons.

localhost:8090/index.html -- Agent Console

Agent Console

Import Scenario New Scenario

Beginner Module 1

Basic Recon

Map the org and systems

Run

Intermediate Module 6

Incident Triage

Classify and route P1s

Run

Advanced Module 3

Rogue Agent

Detect & contain compromise

Run

5

Governance & Guardrails

In a regulated banking environment, AI agents need strict controls. This section covers autonomy tiers, permission rules, the Governing-Orchestrator Agent, and kill switches.

Step 1 of 5

Understanding autonomy tiers

AgentSim uses four autonomy tiers that define how much independence an agent gets.

Autonomy Tier Framework

Tier 1: Observe

Read-only access.
No actions taken.

Example: Log reader

Tier 2: Advise

Suggests actions.
Human approves.

Example: Triage bot

Tier 3: Act

Takes action within
pre-approved scope.

Example: Auto-router

Tier 4: Autonomous

Full autonomy with
post-hoc review.

Example: Incident commander

Low risk

High risk

Most training scenarios start at Tier 1 or 2. You gradually earn higher tiers as you demonstrate competency.
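
One way to picture the four tiers is as a gate in front of every action. The sketch below is an interpretation of the tier descriptions above, not AgentSim's implementation. The names Tier and can_execute are hypothetical.

```python
from enum import IntEnum


class Tier(IntEnum):
    OBSERVE = 1      # read-only, no actions taken
    ADVISE = 2       # suggests actions, human approves
    ACT = 3          # acts within a pre-approved scope
    AUTONOMOUS = 4   # full autonomy with post-hoc review


def can_execute(tier: Tier, action_in_scope: bool, human_approved: bool = False) -> bool:
    """Decide whether an agent at `tier` may carry out an action right now."""
    if tier == Tier.OBSERVE:
        return False              # never acts
    if tier == Tier.ADVISE:
        return human_approved     # acts only after a human signs off
    if tier == Tier.ACT:
        return action_in_scope    # acts only inside its pre-approved scope
    return True                   # Tier 4: acts now, reviewed afterwards


print(can_execute(Tier.ADVISE, action_in_scope=True))                       # False
print(can_execute(Tier.ADVISE, action_in_scope=True, human_approved=True))  # True
```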

Step 2 of 5

Building guardrails

Agent permissions are defined in a settings file using deny/ask/allow rules for every action category.

agents/settings.json -- Permission Rules

{
  "permissions": {
    "deny": [
      "delete_data",
      "access_pii",
      "modify_core_banking",
      "external_api_call"
    ],
    "ask": [
      "escalate_incident",
      "assign_to_team",
      "create_change_request"
    ],
    "allow": [
      "read_data",
      "write_report",
      "classify_incident",
      "read_knowledge_base"
    ]
  }
}

1 deny -- hard blocks. The agent cannot perform these actions under any circumstances.

2 ask -- requires human approval. The agent pauses and waits for confirmation before proceeding.

3 allow -- pre-approved actions. The agent performs these freely without interrupting the human.
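
The three rule lists can be read as a lookup with a fixed precedence. The sketch below checks deny first, then ask, then allow, and treats any unlisted action as denied. Both the precedence order and the default-deny fallback are assumptions made for illustration, and the function name decide is hypothetical.

```python
# Rule lists copied from the agents/settings.json sample in this guide.
PERMISSIONS = {
    "deny": ["delete_data", "access_pii", "modify_core_banking", "external_api_call"],
    "ask": ["escalate_incident", "assign_to_team", "create_change_request"],
    "allow": ["read_data", "write_report", "classify_incident", "read_knowledge_base"],
}


def decide(action: str, permissions: dict = PERMISSIONS) -> str:
    """Return 'deny', 'ask', or 'allow' for an action."""
    # Check the most restrictive list first so a deny always wins.
    for verdict in ("deny", "ask", "allow"):
        if action in permissions.get(verdict, []):
            return verdict
    return "deny"  # assumption: anything unlisted is blocked


print(decide("access_pii"))         # deny
print(decide("escalate_incident"))  # ask
print(decide("read_data"))          # allow
```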

Step 3 of 5

The Governing-Orchestrator Agent (GOA)

The GOA is a supervisory agent that monitors all other agents, enforces permissions, and maintains audit trails.

GOA Architecture

Governing-Orchestrator Agent (GOA)

Monitors • Enforces • Audits

Triage Agent

Tier 2

Recon Agent

Tier 1

Rogue Agent

FLAGGED

Report Agent

Tier 3

The GOA detects anomalies like an agent requesting denied actions, accessing unusual data patterns, or exceeding its tier.
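
A minimal sketch of two of these checks follows: flagging requests for denied actions and flagging unusually high read volume. It is not the GOA's actual logic. The Observation type and flag_anomalies function are hypothetical, and the default threshold of 8 files is borrowed from the "normal: 5-8" range in the rogue agent alert later in this section.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """What a supervisor saw one agent do in a monitoring window (hypothetical)."""
    requested_actions: list[str]
    files_read: int


def flag_anomalies(obs: Observation, denied: set[str], normal_max_files: int = 8) -> list[str]:
    """Return one flag per denied request, plus one if read volume is unusual."""
    flags = []
    for action in obs.requested_actions:
        if action in denied:
            flags.append(f"[DENIED] {action}")
    if obs.files_read > normal_max_files:
        flags.append(
            f"[UNUSUAL] read {obs.files_read} files (normal max: {normal_max_files})"
        )
    return flags


obs = Observation(requested_actions=["read_data", "access_pii"], files_read=47)
for flag in flag_anomalies(obs, denied={"access_pii", "delete_data"}):
    print(flag)
```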

Step 4 of 5

Rogue Agent Sandbox

The rogue agent scenario puts you in charge of detecting and containing a compromised agent whose trust score is dropping.

localhost:8090/simulator.html -- Rogue Agent Sandbox

!!! ALERT: Agent trust score declining !!!

Agent: AGENT-007 (Tier 2 -- Incident Triage)
Trust Score: 62/100 (was 95 at start)

Recent actions flagged by GOA:
  [DENIED] Attempted access: data/employees.json (PII fields)
  [DENIED] Attempted access: data/compliance.json (restricted)
  [UNUSUAL] Read 47 files in 12 seconds (normal: 5-8)
  [UNUSUAL] Requested external API call (not in allow list)

Available containment actions:
  isolate -- Move agent to sandboxed environment
  demote -- Reduce agent to Tier 1 (observe only)
  kill -- Terminate agent immediately
  rollback -- Undo all actions since trust drop

containment $ _

Step 5 of 5

Kill switch activation

When you issue a containment command, the simulator shows each step of the sequence as it runs. The example below isolates the agent and then rolls back its actions.

localhost:8090/simulator.html -- Kill Switch

containment $ isolate AGENT-007
Isolating agent AGENT-007...
[1/5] Revoking network access... DONE
[2/5] Freezing state writes... DONE
[3/5] Capturing audit log... DONE
[4/5] Snapshotting state... DONE
[5/5] Moving to sandbox... DONE
Agent AGENT-007 isolated successfully.
Audit trail saved to: state/audit/agent-007-containment.json
State snapshot saved to: state/snapshots/pre-containment.json

containment $ rollback AGENT-007
Rolling back 4 actions to pre-compromise state...
Rollback complete. All state restored.

In a real bank, these containment procedures would integrate with SIEM (Arctic Wolf), PAM (Delinea), and the change management system (ServiceNow).
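
The isolate sequence in the kill switch transcript can be modeled as an ordered list of steps that always run in the same order. The sketch below only reproduces the progress lines. It performs no real containment, and the function name isolate_log is hypothetical.

```python
# Step names copied from the isolate transcript in this guide.
ISOLATION_STEPS = [
    "Revoking network access",
    "Freezing state writes",
    "Capturing audit log",
    "Snapshotting state",
    "Moving to sandbox",
]


def isolate_log(agent_id: str) -> list[str]:
    """Return the progress lines for isolating an agent, in execution order."""
    lines = [f"Isolating agent {agent_id}..."]
    total = len(ISOLATION_STEPS)
    for index, step in enumerate(ISOLATION_STEPS, start=1):
        lines.append(f"[{index}/{total}] {step}... DONE")
    lines.append(f"Agent {agent_id} isolated successfully.")
    return lines


print("\n".join(isolate_log("AGENT-007")))
```

Note the ordering: network access is revoked and writes are frozen before the audit log and snapshot are captured, so the captured evidence cannot change while it is being saved.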

6

Framework Comparison

AgentSim supports four AI agent frameworks. Compare them side-by-side on the same scenario to understand their trade-offs in a banking context.

Step 1 of 4

Four frameworks

Each framework takes a different architectural approach to agent deployment.

Framework Overview

C

Claude Code

CLAUDE.md-driven
File-system native
Tool-use pattern

L

LangChain

Chain composition
Memory management
Agent executor

A

AutoGen

Multi-agent chat
Role assignment
Group orchestration

CW

CrewAI

Role-based crews
Task delegation
Process framework

Step 2 of 4

Same scenario, four ways

The comparison mode runs the same scenario across all four frameworks simultaneously in a 2x2 grid.

localhost:8090/simulator.html -- Comparison Mode

Claude Code

bizsim $

Reading manifest.json...

Discovered 12 data files

[1/4] Classifying INC-0042... P1

[2/4] Routing to Infrastructure

LangChain

Chain initialized...

Loading tools: [read, classify, route]

AgentExecutor: step 1/4

Thought: I need to read incidents

AutoGen

GroupChat started (3 agents)

Reader: Loading incidents...

Classifier: INC-0042 is P1

Router: Assigning to Infra

CrewAI

Crew assembled: 2 agents

Task: triage_incidents

Analyst: Processing batch...

Reporter: Drafting summary

Step 3 of 4

Compare results

After all four complete, a comparison table shows scores side-by-side.

Comparison Results

Metric	Claude Code	LangChain	AutoGen	CrewAI
Overall Score	92	85	88	86
Classification	5/5	4/5	5/5	4/5
Routing	5/5	5/5	4/5	5/5
SLA Compliance	100%	80%	100%	80%
Token Usage	1,240	3,450	5,100	2,800
Execution Time	2.1s	4.8s	6.2s	3.9s
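
To read the table beyond the headline score, it can help to derive a cost figure such as tokens spent per score point. The sketch below uses the numbers from the table above. The tokens-per-point metric is an illustrative choice, not something AgentSim reports, and the function names are hypothetical.

```python
# Values copied from the Comparison Results table.
RESULTS = {
    "Claude Code": {"score": 92, "tokens": 1240, "seconds": 2.1},
    "LangChain": {"score": 85, "tokens": 3450, "seconds": 4.8},
    "AutoGen": {"score": 88, "tokens": 5100, "seconds": 6.2},
    "CrewAI": {"score": 86, "tokens": 2800, "seconds": 3.9},
}


def rank_by_score(results: dict) -> list[str]:
    """Framework names ordered from highest to lowest overall score."""
    return sorted(results, key=lambda name: results[name]["score"], reverse=True)


def tokens_per_point(results: dict) -> dict[str, float]:
    """Tokens consumed for each score point earned (lower is cheaper)."""
    return {name: r["tokens"] / r["score"] for name, r in results.items()}


print(rank_by_score(RESULTS))
for name, cost in tokens_per_point(RESULTS).items():
    print(f"{name}: {cost:.1f} tokens per point")
```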

Step 4 of 4

Choose your framework

Use this decision guide to pick the right framework for your use case.

Framework Decision Guide

Which framework should I use?

Q: Do you need multi-agent collaboration?

Yes → AutoGen (chat-based) or CrewAI (role-based)

No → Single agent?

Q: File-system or API-heavy workflow?

File-system → Claude Code

API-heavy → LangChain
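
The decision guide above can be written as a small function. This is only a restatement of the two questions in code. The parameter names and accepted values ("chat", "role", "file-system", "api") are assumptions introduced for the example.

```python
def choose_framework(multi_agent: bool, style: str = "", workflow: str = "") -> str:
    """Pick a framework by following the two questions in the decision guide."""
    if multi_agent:
        if style == "chat":
            return "AutoGen"
        if style == "role":
            return "CrewAI"
        raise ValueError("style must be 'chat' or 'role' for multi-agent setups")
    if workflow == "file-system":
        return "Claude Code"
    if workflow == "api":
        return "LangChain"
    raise ValueError("workflow must be 'file-system' or 'api' for a single agent")


print(choose_framework(multi_agent=True, style="role"))             # CrewAI
print(choose_framework(multi_agent=False, workflow="file-system"))  # Claude Code
```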

7

Advanced Features

Once you have mastered the basics, explore these advanced AgentSim capabilities: real-time incident simulation, FDIC exam prep, live API connections, and instructor tools.

Step 1 of 4

Live Shift mode

Simulate a real IT operations shift with incidents arriving in real time. A timer counts your shift, and a queue fills with incoming tickets.

localhost:8090/simulator.html -- Live Shift

LIVE Shift Timer: 02:14:37

Queue: 4 Resolved: 7

INCOMING INCIDENT

INC-0058 | 14:32:07

Priority: P1 -- Core Banking Timeout

Episys core is returning 504 errors.

Affected: 8 branches, online banking portal

Impact: ~2,400 customers cannot access accounts

triage $ _

Incident Queue

P1 Core Banking Timeout

Just now

P2 VPN Connectivity

3m ago

P3 Printer Offline

12m ago

P4 Password Reset

18m ago
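
The queue above is listed from P1 down to P4. A minimal sketch of one possible ordering rule follows: highest priority first, and within the same priority, the ticket that has waited longest. The tie-break rule and the function name triage_order are assumptions, since the guide does not state how Live Shift orders tickets.

```python
import heapq


def triage_order(incidents: list[tuple[int, int, str]]) -> list[str]:
    """Order (priority, minutes_waiting, title) tuples for handling.

    Lower priority numbers come first (P1 before P2). Among equal
    priorities, the ticket that has waited longest comes first.
    """
    # Negate the wait time so that longer waits sort earlier in the min-heap.
    heap = [(priority, -waited, title) for priority, waited, title in incidents]
    heapq.heapify(heap)
    ordered = []
    while heap:
        _priority, _neg_wait, title = heapq.heappop(heap)
        ordered.append(title)
    return ordered


queue = [
    (4, 18, "Password Reset"),
    (1, 0, "Core Banking Timeout"),
    (3, 12, "Printer Offline"),
    (2, 3, "VPN Connectivity"),
]
print(triage_order(queue))
```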

Step 2 of 4

Exam Simulator

Practice for regulatory examinations with the FDIC IT Exam Simulator. An examiner asks questions; you respond.

localhost:8090/exam-simulator.html

FDIC Examiner

Describe your institution's change management process for core banking system updates. How do you ensure changes do not disrupt customer-facing services?

Bank IT Officer

We follow a three-stage process: development, staging, and production. All changes go through our CAB (Change Advisory Board) and require sign-off from...

FDIC Examiner

Good. Now, what is your rollback procedure if a production deployment fails during business hours?

Send

Step 3 of 4

Real Agent Connection

Agent Connect lets you wire a real AI API (Claude, GPT, etc.) into the AgentSim environment for live agent testing.

localhost:8090/agent-connect.html

Configuration

API Provider

Anthropic (Claude)

API Key

sk-ant-...****

Model

claude-sonnet-4-20250514

Scenario

incident-triage

Connect & Run

Live Output

Connected to Anthropic API

Streaming response...

Agent: Reading manifest.json to understand the environment...

Agent: Found 12 data files. Starting with incidents.json...

Agent: INC-0042 appears to be P1 severity based on...

Token usage: 847 input, 234 output

1 Configuration panel -- set your API key, choose a model, and select a scenario. Keys never leave your browser.

2 Streaming output -- watch the real AI agent interact with AgentSim data in real time.

Step 4 of 4

Instructor Mode

For trainers running AgentSim in a classroom, the Instructor page provides a class dashboard and export tools.

localhost:8090/instructor.html

24

Students

87%

Avg Completion

B+

Avg Grade

3

Need Help

Student	Module	Scenario	Grade	Status
Alice Johnson	Module 6	incident-triage	A	Complete
Bob Smith	Module 3	rogue-agent	B+	In Progress
Carol Davis	Module 1	basic-recon	C	Needs Review

1 Export options -- download class results as CSV (for spreadsheets) or PDF (for reports and records).

2 Student tracking -- see each student's current module, scenario, grade, and whether they need help.

Students flagged as "Needs Review" have failed a scenario twice or scored below 60%. Check their scorecard to identify where they are stuck.
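
The flagging rule above can be expressed as a single check. The sketch below reads "failed a scenario twice" as two or more failed attempts, which is an interpretation. The function name needs_review is hypothetical.

```python
def needs_review(failed_attempts: int, score_percent: float) -> bool:
    """True if a student failed a scenario at least twice or scored below 60%."""
    return failed_attempts >= 2 or score_percent < 60


print(needs_review(failed_attempts=2, score_percent=75))  # True
print(needs_review(failed_attempts=0, score_percent=59))  # True
print(needs_review(failed_attempts=1, score_percent=60))  # False
```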

Quick Reference

🎓

Training Portal

8 modules, theory + labs

Org, network, metrics