Training an Agent

In my Spring 2026 semester at USC, I took Applied Neural Networks as part of my specialization in Artificial Intelligence Applications. The course covered the foundations of neural network architecture, from perceptrons to deep learning models such as CNNs and RNNs, along with real-world applications in fields like computer vision and natural language processing.

For the final project, we were split into groups and tasked with training a reinforcement learning agent using a Deep Q-Network (DQN). My team built an interactive HTML dashboard to visualize training progress in real time and experimented with "fast" versus "full" training modes to weigh training speed against model performance.

Work Focus

Reinforcement Learning,

Model Evaluation,

Data Visualization

Role

Machine Learning Engineer,

Data Visualization Developer

Timeline

Spring 2026

Design Question & Problem Space

For our Applied Neural Networks final project, my team set out to train a reinforcement learning agent to solve Taxi-v3, a classic RL benchmark in which an agent must learn to navigate a grid, pick up a passenger, and drop them off at the correct destination as efficiently as possible. Rather than relying on hardcoded rules, we wanted to see whether a Deep Q-Network (DQN) could learn this task purely through trial and error guided by reward feedback, and how much training time and hyperparameter tuning actually mattered to getting there.
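Learning "through trial and error and reward feedback" comes down to one update target. In standard DQN, the network's Q-value for the action it took is regressed toward the immediate reward plus the discounted best Q-value of the next state, with no bootstrap term once an episode truly ends. The pure-Python sketch below computes those targets for a small batch. The function name `dqn_td_targets` is hypothetical, the next-state Q-values would normally come from a separate target network, and the small discount factor in the usage line is chosen only so the arithmetic is easy to follow. It illustrates the technique and is not our project code.

```python
from typing import Sequence


def dqn_td_targets(
    rewards: Sequence[float],
    next_q_values: Sequence[Sequence[float]],
    dones: Sequence[bool],
    gamma: float = 0.99,  # hypothetical discount factor
) -> list[float]:
    """Compute y = r + gamma * max_a' Q(s', a') for each transition.

    `dones` should mark true terminations (e.g. a completed drop-off), not
    time-limit truncations, which still bootstrap from the next state.
    """
    targets = []
    for r, q_next, done in zip(rewards, next_q_values, dones):
        # No next state to bootstrap from after a terminal transition.
        bootstrap = 0.0 if done else gamma * max(q_next)
        targets.append(r + bootstrap)
    return targets


# Two transitions with 6 action values each (Taxi-v3 has 6 discrete actions):
# an ordinary -1 step, then a terminal +20 drop-off.
example = dqn_td_targets(
    rewards=[-1.0, 20.0],
    next_q_values=[[0.0, 4.0, 1.0, 0.5, -1.0, 0.0], [5.0] * 6],
    dones=[False, True],
    gamma=0.5,
)
print(example)  # [1.0, 20.0]
```

The first target is -1 + 0.5 × 4.0 = 1.0; the second ignores the next-state values entirely because the episode ended with the drop-off.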

Analysis & Tools

Using the dashboard, we tracked four key metrics in both training modes: total reward per episode, epsilon decay, average loss, and steps per episode. Watching how each metric evolved let us compare the two training regimes side by side rather than relying on a single end-state number, and made it easy to see when the agent shifted from random exploration to a deliberate, learned strategy. We also used Claude and Gemini throughout the project to help debug code and refine our written analysis, reviewing and testing all AI-assisted output against lecture material to confirm it matched what we'd learned in class.
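Epsilon decay, one of the tracked metrics, controls the shift from random exploration to a learned strategy. A common approach, sketched below under assumed settings, multiplies epsilon by a constant factor each episode down to a floor and feeds it into epsilon-greedy action selection; a trailing moving average is one way to smooth noisy per-episode rewards or step counts before plotting them. The names `epsilon_at`, `epsilon_greedy`, and `moving_average` and the defaults (start 1.0, floor 0.05, decay 0.995) are hypothetical, not the settings our team used.

```python
import random
from typing import Sequence


def epsilon_at(episode: int, start: float = 1.0, end: float = 0.05,
               decay: float = 0.995) -> float:
    """Multiplicative epsilon decay, clipped at a floor (hypothetical defaults)."""
    return max(end, start * decay ** episode)


def epsilon_greedy(q_values: Sequence[float], epsilon: float,
                   rng: random.Random) -> int:
    """Explore with probability epsilon, otherwise exploit the argmax Q-value."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # random action (explore)
    return max(range(len(q_values)), key=lambda a: q_values[a])  # greedy action


def moving_average(values: Sequence[float], window: int) -> list[float]:
    """Trailing mean over `window` episodes; returns one value per full window."""
    if window <= 0:
        raise ValueError("window must be positive")
    out: list[float] = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the value leaving the window
        if i >= window - 1:
            out.append(running / window)
    return out


rng = random.Random(0)
print(epsilon_at(0), epsilon_at(1000))  # 1.0 0.05
print(epsilon_greedy([0.1, 0.5, -0.2], epsilon=0.0, rng=rng))  # 1
# Illustrative reward values, not our results:
print(moving_average([-200.0, -150.0, -100.0, -50.0], window=2))  # [-175.0, -125.0, -75.0]
```

Plotting the smoothed curve next to epsilon makes the hand-off visible: reward typically starts to improve as epsilon approaches its floor and the agent relies more on its learned Q-values.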

Key Findings

The results made the impact of training duration unmistakable:

- In fast mode, the agent's average reward only improved to -210 over 200 episodes, and it still hit the maximum step cap during evaluation.
  - It hadn't yet learned a reliable strategy.
- In full mode, the agent's average reward jumped from -507 early in training to nearly +5 by the end, while its average steps per episode dropped from 178 to just 15.6.
  - The agent had learned to complete pickups and drop-offs with minimal wasted movement.
- Post-training evaluation confirmed this gap:
  - Full-mode agent: an average reward of -11.8 in just 31 steps
  - Fast-mode agent: a reward of -200 and a full 200-step timeout
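These numbers can be sanity-checked against the Taxi-v3 reward scheme as documented for Gymnasium: -1 per step, +20 for a successful drop-off, and -10 for an illegal pickup or drop-off, with episodes cut off at 200 steps by default. The hypothetical helper `taxi_episode_return` below is not part of our project code. It assumes that standard reward scheme and computes a single episode's return from those rules.

```python
def taxi_episode_return(steps: int, delivered: bool, illegal_actions: int = 0) -> int:
    """Return of one Taxi-v3 episode under the standard reward scheme (assumed).

    - the successful drop-off step, if any, earns +20
    - each illegal pickup/drop-off earns -10
    - every other step earns -1
    """
    delivery_steps = 1 if delivered else 0
    ordinary_steps = steps - delivery_steps - illegal_actions
    if ordinary_steps < 0:
        raise ValueError("steps must cover the delivery and illegal actions")
    return 20 * delivery_steps - 10 * illegal_actions - ordinary_steps


# Timed out after 200 steps without delivering: every step cost -1.
print(taxi_episode_return(200, delivered=False))  # -200
# A clean 16-step episode: 15 ordinary steps at -1 plus the +20 drop-off.
print(taxi_episode_return(16, delivered=True))  # 5
```

Under these rules, the fast-mode agent's -200 over a 200-step timeout is exactly what an agent that never completes a drop-off would score, and a clean delivery in about 16 steps scores about +5, in line with the end of the full-mode training curve. Individual episode returns are integers, so fractional figures such as -11.8 are averages across episodes.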

Reflection

This project pushed me to think about machine learning not just as code, but as an iterative experiment. Every hyperparameter we tuned told us something different about how the agent was learning, and watching the same model fail in fast mode but succeed in full mode taught me how much training duration and patience matter in reinforcement learning. Building the interactive dashboard also reinforced a skill I'd developed in my other data projects: translating raw training metrics into a visual story that someone without an RL background could still follow.

Working through this collaboratively, including reviewing and testing each other's AI-assisted code against lecture material, sharpened my ability to validate technical work critically rather than taking results at face value. It's a habit I want to carry into any data or AI-driven role I take on next.