When I built the AI-powered Regression Analyzer at Tietoevry, the goal was simple: stop engineers from spending 15–45 minutes manually triaging each test failure in our 5G Core regression cycles.

The result was a multi-agent pipeline that does it automatically. Here’s what I learned.

The Problem with Single-Agent Approaches

A single LLM call asking “why did this test fail?” works for toy examples. In production, it falls apart because:

  • Context windows fill up fast when you have hundreds of test cases
  • Sequential processing is too slow for regression cycles
  • A single agent can’t hold the full picture needed to cross-correlate failures

The answer is parallel specialised agents.

Parallel Agents > One Big Prompt

Instead of one agent trying to do everything, I designed a pipeline where:

  1. A coordinator ingests the failing test list and spawns agents
  2. Each analysis agent gets exactly one failure — its logs, config, and relevant spec context
  3. Agents run in parallel, each producing a structured triage report
  4. An aggregation agent cross-correlates findings and identifies patterns

This kept each agent’s context focused, made the pipeline horizontally scalable, and reduced total wall-clock time dramatically.
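The fan-out/fan-in shape above can be sketched in a few lines. This is a minimal illustration, not the production pipeline: `analyze_failure` is a stub standing in for the per-failure LLM agent, and the test data is invented.

```python
from concurrent.futures import ThreadPoolExecutor
from collections import Counter

def analyze_failure(failure):
    """Stand-in for one analysis agent. In the real pipeline this is an
    LLM call scoped to a single failure's logs, config, and spec context."""
    # Toy heuristic so the sketch runs without an LLM backend.
    category = "configuration_mismatch" if "config" in failure["log"] else "unknown"
    return {"test_id": failure["test_id"], "failure_category": category}

def run_pipeline(failures, max_workers=8):
    # Coordinator: fan out one agent per failure, run them in parallel.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        reports = list(pool.map(analyze_failure, failures))
    # Aggregation step: cross-correlate the structured reports.
    patterns = Counter(r["failure_category"] for r in reports)
    return reports, patterns

failures = [
    {"test_id": "TC_001", "log": "config value mismatch in UPF profile"},
    {"test_id": "TC_002", "log": "timeout waiting for PFCP response"},
]
reports, patterns = run_pipeline(failures)
```

Because each agent sees only its own failure, the worker pool can grow with the failure count without any single context window growing with it.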

Structured Output Is Non-Negotiable

In engineering automation, you can’t parse free-text agent output reliably. Every agent in the pipeline outputs structured JSON:

{
  "test_id": "TC_5GC_UPF_042",
  "failure_category": "configuration_mismatch",
  "root_cause": "...",
  "confidence": 0.87,
  "relevant_log_lines": [...]
}

This makes aggregation deterministic and lets the pipeline feed results into dashboards, tickets, or downstream systems without fragile string parsing.
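A small validation layer enforces that contract before anything downstream consumes a report. The sketch below checks the fields from the example above; the concrete values (root cause text, log line) are invented for illustration.

```python
import json

# Expected shape of one agent report (field names from the example above).
REQUIRED_FIELDS = {
    "test_id": str,
    "failure_category": str,
    "root_cause": str,
    "confidence": float,
    "relevant_log_lines": list,
}

def parse_agent_report(raw: str) -> dict:
    """Parse and validate one agent's JSON report. Malformed output fails
    loudly here instead of silently corrupting aggregation."""
    report = json.loads(raw)
    for field, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(report.get(field), expected_type):
            raise ValueError(f"bad or missing field: {field}")
    if not 0.0 <= report["confidence"] <= 1.0:
        raise ValueError("confidence out of range")
    return report

raw = json.dumps({
    "test_id": "TC_5GC_UPF_042",
    "failure_category": "configuration_mismatch",
    "root_cause": "UPF profile missing N4 interface address",  # illustrative
    "confidence": 0.87,
    "relevant_log_lines": ["ERR: n4_addr unset"],  # illustrative
})
report = parse_agent_report(raw)
```

In practice a schema library (e.g. Pydantic or JSON Schema) does this job, but the principle is the same: reject anything that doesn’t match the contract at the boundary.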

Trust But Verify

AI agents make mistakes. In a regression analysis context, a wrong root-cause hypothesis is annoying but not catastrophic — an engineer reviews and confirms. Design your human-in-the-loop accordingly:

  • Flag low-confidence findings for mandatory human review
  • Log all agent reasoning, not just conclusions
  • Track false positive/negative rates over time and tune prompts accordingly
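The first two points reduce to a routing rule: gate on confidence and always persist the reasoning trace. A minimal sketch, with an assumed threshold and invented report data:

```python
REVIEW_THRESHOLD = 0.7  # assumed cutoff; tune against observed FP/FN rates

def route_finding(report):
    """Route one triage report: low confidence goes to a mandatory human
    review queue; high confidence is auto-triaged. Either way the full
    reasoning trace is kept, not just the conclusion."""
    record = {
        "test_id": report["test_id"],
        "conclusion": report["failure_category"],
        "reasoning": report.get("reasoning", ""),  # log reasoning, not just conclusions
    }
    if report["confidence"] < REVIEW_THRESHOLD:
        return ("human_review", record)
    return ("auto_triage", record)

queue, record = route_finding({
    "test_id": "TC_007",            # illustrative
    "failure_category": "timing",
    "confidence": 0.55,
    "reasoning": "retry spike after node restart",
})
```

With confidence 0.55 this lands in the human review queue; the same record with confidence above the threshold would be auto-triaged.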

The Real Win

The biggest benefit wasn’t the time saving per se — it was consistency. Human engineers triage differently on Monday morning vs. Friday afternoon. The agents are consistent every time. That consistency makes it possible to track trends across regression cycles in ways that were impossible before.
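Consistent labels are what make trend tracking mechanical: once every failure carries the same category vocabulary, comparing cycles is just counting. A minimal sketch with invented cycle data:

```python
from collections import Counter

def cycle_trends(reports_by_cycle):
    """Count failure categories per regression cycle. Consistent agent
    labeling makes these counts directly comparable across cycles."""
    return {
        cycle: Counter(r["failure_category"] for r in reports)
        for cycle, reports in reports_by_cycle.items()
    }

trends = cycle_trends({
    "cycle_41": [{"failure_category": "configuration_mismatch"}] * 3,
    "cycle_42": [{"failure_category": "configuration_mismatch"}] * 1
                + [{"failure_category": "timing"}] * 2,
})
```

Here the counts show configuration mismatches falling and timing failures rising between the two cycles — exactly the kind of signal inconsistent human labeling would bury.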


Interested in AI agent architecture or applying this to your engineering workflow? Reach out at bartlomiej@paszkiewicz.tech.