
C01 · 2026
AI Requirements Agent
- Client: Defence prime contractor
- Sector: Defence
- Duration: 10 weeks
- Outcome: 3× requirements throughput
- Tech Stack: Llama 3 · Python · ISO 15288
The brief.
The problem
The client held a large backlog of legacy tender PDFs accumulated over more than a decade of programme work. Converting those documents into traceable, ISO 15288-style requirements was an entirely manual process — analysts read each document, extracted candidate statements, applied syntax rules, and manually populated a requirements register. With dozens of tenders in the backlog and a programme gate approaching, the existing approach could not scale. Requirements engineers were spending the majority of their time on extraction and formatting rather than analysis and validation.
The constraint
Data sovereignty was non-negotiable. The tender documents contained sensitive programme information subject to Australian defence handling requirements, and the client's security posture prohibited any data leaving the on-premises network. Cloud-based LLM APIs were entirely off the table. The solution had to run entirely on hardware already within the client's security perimeter, which meant smaller open-weight models and careful prompt engineering to compensate for reduced raw capability.
How we built it.
The engagement opened with a three-day discovery sprint. Allayze embedded with the client's requirements engineering team to understand how analysts read tender documents, how they judged whether a sentence constituted a requirement, and what the programme's specific ISO 15288 syntax rules looked like in practice. We captured twenty annotated examples — good extractions, missed statements, and false positives — that would become the foundation of our prompt strategy.
With those examples in hand, we selected Llama 3 8B as the inference model and deployed it via Ollama on the client's existing GPU server. A Python orchestration layer handled PDF ingestion, chunking, and context assembly. Each document was split into overlapping windows of roughly 600 tokens, then passed through a two-stage pipeline: a classifier stage that identified candidate requirement sentences, followed by a structuring stage that applied ISO 15288 syntax, assigned unique identifiers, and linked each requirement back to its source page and paragraph for full traceability.
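In outline, the orchestration layer might look like the sketch below: an overlapping-window splitter feeding a two-stage extract step. This is an illustrative sketch, not the production code; `chunk`, `extract`, the `Requirement` fields, and the window/overlap sizes are assumptions, and the model calls are injected as plain callables rather than tied to a specific Ollama client API.

```python
from dataclasses import dataclass
from typing import Callable

def chunk(tokens: list[str], window: int = 600, overlap: int = 100) -> list[list[str]]:
    """Split a token stream into overlapping windows (~600 tokens, as in the pipeline)."""
    step = window - overlap
    return [tokens[i:i + window] for i in range(0, max(len(tokens) - overlap, 1), step)]

@dataclass
class Requirement:
    req_id: str        # unique identifier assigned by the structuring stage
    text: str          # ISO 15288-style requirement statement
    page: int          # source page, kept for traceability back to the tender PDF
    confidence: float  # classifier score surfaced to reviewers alongside each row

def extract(windows: list[list[str]],
            classify: Callable,   # stage 1: window -> [(candidate_sentence, score), ...]
            structure: Callable,  # stage 2: (sentence, score) -> Requirement
            ) -> list[Requirement]:
    """Two-stage pipeline: detect candidate requirement sentences, then structure them."""
    requirements = []
    for window in windows:
        for sentence, score in classify(window):
            requirements.append(structure(sentence, score))
    return requirements
```

In practice `classify` and `structure` would each wrap a prompt to the local Llama 3 model; keeping them as injected callables makes the pipeline testable without a GPU in the loop.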

Human-in-the-loop review was designed in from day one. The system produced a candidate register in a structured spreadsheet format familiar to the analysts, with confidence scores and source citations alongside each generated requirement. Analysts could accept, edit, or reject each statement with a single keystroke. Rejected items fed back into a refinement loop, progressively improving the model's understanding of the client's specific style and standards. By week six, the rejection rate had dropped from 28% to under 9%.
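The review loop described above can be sketched as a small triage function: one keystroke per candidate, with rejected items collected separately so they can feed the refinement loop. Names and the keystroke scheme here are illustrative assumptions, and the decision input is injected as a callable so the loop can be exercised without a terminal.

```python
def review(candidates: list[str], decide) -> tuple[list[str], list[str]]:
    """Single-keystroke triage: 'a' accept, 'e' accept an edited version, 'r' reject.

    `decide` returns (keystroke, edited_text); rejected items are returned
    separately so they can feed back into prompt refinement.
    """
    accepted, rejected = [], []
    for candidate in candidates:
        key, edited = decide(candidate)
        if key == "a":
            accepted.append(candidate)
        elif key == "e":
            accepted.append(edited)
        else:
            rejected.append(candidate)
    return accepted, rejected

def rejection_rate(accepted: list[str], rejected: list[str]) -> float:
    """Fraction of reviewed candidates rejected -- the metric tracked week to week."""
    total = len(accepted) + len(rejected)
    return len(rejected) / total if total else 0.0
```

Tracking `rejection_rate` per batch is what makes the week-six improvement (28% down to under 9%) measurable rather than anecdotal.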
“The throughput gain was immediately obvious — but what surprised us most was the consistency. Every requirement came out in the same voice, which is something three engineers writing in parallel never managed.”
The final two weeks were dedicated to integration testing against the programme's existing requirements management tooling, hardening the error-handling for malformed PDFs and scanned documents with poor OCR quality, and producing operator documentation. The client's security team conducted an independent review of the deployment architecture and confirmed no data egress paths existed. The system was handed over to the requirements engineering team with two days of onsite training and a set of worked examples covering the full range of tender document types in the backlog.