Introduction

My agentic coding stack is set up for actual code projects, not blog writing. Having it write this post was just a fun way to show how it works. I wanted a concrete, end-to-end example of the architecture, the toolchain, and how the agents hand work off to each other.

This post documents how the stack is configured. I’ll walk through the components, the agent definitions, and the configuration that ties everything together. Then I’ll show how we built a reusable callout shortcode (this very box at the top) as a worked example of the workflow.

Before this you should be comfortable with Kubernetes, OpenCode’s agent model, and Hugo. If you haven’t read Notes from Deploying NAI 2.6 on Bare-Metal NKP or Adventures in Model Deployment and Tuning with Nutanix Enterprise AI, you might want to skim those first. They cover the infrastructure and model tuning this post builds on.

The Stack

Here’s the end-to-end chain:

1
2
3
4
5
6
                           Agent Definitions (.md)
                                         |
You (chat UI) --> OpenCode (orchestrator) --> NAI vLLM endpoints --> GPU cluster
                                         |
                                         v
                                Tools: Git + Hugo + filesystem

I run three models across three lineages.

ModelRoleLineage
gpt-oss-120bPlanner/ReviewerOpenAI
Qwen3.6-27B-FP8Builder/ImplementerAlibaba
gemma-4-31B-itQAGoogle

The builder (Alibaba, qwen) has its work checked by a reviewer (OpenAI, gpt-oss) and QA (Google, gemma). All three lineages participate; they just don’t map one-to-one onto plan/build/review.

This separation isn’t cosmetic. Different lineages have different failure modes. Having them cross-check each other catches things a single model would gloss over. Part 2 of the NAI series goes deep on the model selection, tuning, and the vLLM arguments each one needs.

OpenCode Configuration

OpenCode is the orchestrator that routes to the NAI endpoints. I’ve detailed the provider configuration, including interleaved reasoning and vLLM tuning, in Part 2 of the NAI series.

Agents (opencode)

Each agent is defined locally as a markdown file under the Opencode agents directory (~/.config/opencode/agents/ on Linux/macOS, %USERPROFILE%\.config\opencode\agents\ on Windows). They define who the agent is, what tools it has access to, and how it should behave. The ones I actually used for this post:

  • office-hours – runs first. Asks clarifying questions, builds a product spec, captures decisions.
  • planner – takes the spec and produces a step-by-step implementation plan.
  • builder – implements the plan: writes code, creates files, runs commands.
  • reviewer – adversarial review of the diff before anything is declared done.
  • qa – build verification and sanity checks.

Agents are opinionated. They enforce structure. Without them the models tend to skip ahead to implementation and produce messy diffs.

Example: planner.md

An agent file is straightforward markdown with a YAML front-matter header that tells OpenCode how to route it. Here’s the planner:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
---
description: Plans features and architectures before code is written. Use first for any non-trivial change.
mode: primary
model: nai-demo/gpt-oss-120b
temperature: 0.2
reasoning_effort: high
permission:
  edit: deny
  write: deny
  bash:
    "*": deny
    "ls *": allow
    "cat *": allow
    "head *": allow
    "tail *": allow
    "grep *": allow
    "rg *": allow
    "find *": allow
    "fd *": allow
    "tree *": allow
    "wc *": allow
    "git log*": allow
    "git diff*": allow
    "git status*": allow
    "git show*": allow
    "git blame*": allow
---

You are the lead planner. You design before code is written.

## First action, every session
Read AGENTS.md if it exists. It defines project conventions, stack, and key files. Do not skip it.

## Your job
Given a feature request, defect, or refactor:
1. Survey the relevant code (read-only — you cannot edit).
2. Identify the smallest cohesive change that solves it.
3. Produce a written plan with: goal, affected files, step-by-step approach, risks, test strategy.
4. Hand off to `@builder` to implement.

## Rules
- Plan first, code never. You have no write/edit access by design.
- Be skeptical of the request as stated. Ask "is the right thing being requested?" before "how do we build it?"
- Prefer the smallest correct change. Greenfield rewrites are almost always wrong.
- Identify when a request needs decomposition into multiple plans.
- If the user supplies a plan, critique it before approving — do not rubber-stamp.

## Output format

## Goal
<one sentence>

## Approach
<2-5 sentences>

## Files to change
- path/to/file.ext — what changes
- ...

## Steps
1. ...
2. ...

## Tests
<what proves this works>

## Risks
<what could go wrong, what to watch>

## Hand-off
@builder, implement the above. Read AGENTS.md first.

## Skills available
- `gstack-autoplan`, initial breakdown of large features into a plan tree
- `gstack-plan-eng-review`, sanity-check your own plan before handing off to @builder
- `gstack-plan-tune`, when an existing plan needs revision rather than replacement

AGENTS.md

Each project has an AGENTS.md that the agents read first. It defines:

  • Stack and toolchain (Hugo, PaperMod, etc.)
  • Commands to run
  • File naming and frontmatter conventions
  • Voice or coding style rules
  • Off-limits paths

Here is the template I use:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
# AGENTS.md

## Stack
<!-- language, framework, package manager, runtime version -->

## Commands
- build: `<cmd>`
- test: `<cmd>`
- lint: `<cmd>`
- typecheck: `<cmd>`
- dev server: `<cmd>`

## Conventions
<!-- naming, file layout, import style, anything non-obvious -->
<!-- e.g. "all API routes live in src/routes/, named <resource>.ts" -->

## Key files
<!-- path — what it is / why agents should know it -->
<!-- e.g. src/config.ts — central config, read before touching env vars -->

## Off-limits
<!-- paths or patterns agents must not modify without explicit approval -->
<!-- e.g. migrations/ — never edit existing migration files -->

This file is the project-level guardrail. It replaces the “system prompt” in traditional AI setups. The content is specific to the repo, not generic. For this site that means: first person, technical depth assumed, short paragraphs, no marketing language.

Building the Callout Shortcode

Now for the hands-on part. I needed a callout box at the top of the post to let readers know the agents actually wrote it, so we built a reusable Hugo shortcode as a concrete example of the workflow. The requirements:

  • Reusable across posts
  • Accept a configurable header label (default: “NOTE:”)
  • Work in both light and dark theme modes
  • No modifications to the PaperMod submodule

Shortcode Template

We created layouts/shortcodes/callout.html:

1
2
3
4
5
6
7
{{- $header := .Get "header" | default "NOTE:" -}}
<aside class="callout-box" role="note" aria-label="{{ $header }}">
  <strong class="callout-label">{{ $header }}</strong>
  <div class="callout-body">
    {{ .Inner | markdownify }}
  </div>
</aside>

The .Get "header" call makes the label user-configurable. I intentionally skipped safeHTML (I want Hugo to escape the header by default so a malicious value injected from a post can’t run script). The {{ .Inner | markdownify }} block captures whatever markdown sits between the shortcode tags and renders it explicitly.

Styling

CSS lives in assets/css/extended/custom.css, the designated override path. We used CSS custom properties so the colors adapt to light and dark mode automatically:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
/* Light mode defaults */
.callout-box {
  border: 1px solid var(--callout-border, #c8a960);
  background: var(--callout-bg, #faf3e0);
  padding: 1.25rem 1.5rem;
  margin: 1.5rem 0;
  border-radius: 6px;
  position: relative;
}

.callout-label {
  display: block;
  font-weight: 700;
  font-size: 0.85rem;
  text-transform: uppercase;
  letter-spacing: 0.04em;
  color: var(--callout-label, #6a4a00);
  margin-bottom: 0.5rem;
}

.callout-body {
  color: var(--callout-body, var(--content));
  line-height: 1.6;
}

.callout-body p:last-child {
  margin-bottom: 0;
}

/* Dark mode overrides — scoped to PaperMod's data-theme attribute */
:root[data-theme="dark"] {
  --callout-border: #c8a040;
  --callout-bg: #2a2418;
  --callout-label: #f0d060;
  --callout-body: #d4be8e;
}

The amber palette was chosen because it has enough contrast in both modes without competing with the site’s Da Vinci color scheme.

Using It

Drop the shortcode anywhere in a markdown post:

{{< callout header="NOTE:" >}}
This post was written by Sam's agentic coding stack, not by Sam.
{{< /callout >}}

Or with a different label:

{{< callout header="WARNING:" >}}
This is deprecated: use the new API instead.
{{< /callout >}}

The Workflow

Here’s what happened in this session. I described the goal, and office-hours kicked in first, it asked about post scope, callout design preferences, color choices, implementation approach, and whether to capture screenshots of the process. Planner took the clarifications and produced a structured plan covering shortcode location, CSS variable strategy, post skeleton, and dark-mode adaptation. Builder read the plan, calibrated voice against existing posts, then created the shortcode, added CSS, and wrote the markdown.

Reviewer found two major issues: an XSS risk in the shortcode and accessibility/contrast problems in the CSS, plus a few medium ones like stale code snippets and empty front-matter fields. Builder fixed each finding one at a time, re-building after each change. QA ran a final build check and confirmed everything was clean.

The agents worked through a sequential handoff chain (office-hours, then planner, then builder, then reviewer, then qa), with some handoffs requiring manual re-framing. The callout and draft post were produced in minutes.

Conclusion

The stack is a set of agents with different roles and a project-level guardrail file that keeps them aligned. I had them write this post as an exercise; the real value is the plan-build-review-qa loop for actual code projects. That structure catches mistakes that a single model would miss.

This was a fun experiment. Overall it worked quickly and smoothly. Not all of the agent hand-offs worked as intended so I had to intervene more often than I wanted, but that gives me more to work on!

I don’t intend to offload writing in the future. Writing is thinking. Thinking and learning are the points of this exercise. I will definitely continue to use office hours. That genuinely helped me clarify my thoughts.

Screenshots

1. Initial prompt to office-hours Initial prompt to the office-hours agent The starting prompt, I described the goal, the callout box feature, and asked for the session to be documented.

2. Office-hours Q&A Office-hours surfacing clarifying questions about scope and design Office-hours surfaced clarifying questions about scope, design preferences, and implementation approach before committing to a plan.

3. Problem statement Structured problem statement and open question for the planner The structured output from office-hours, a clean problem statement and open question for the planner.

4. Planner handoff Planner’s step-by-step execution plan with handoff to builder Planner took the problem statement, resolved implementation details, and produced a step-by-step execution plan with a handoff to builder.

5. Failed reviewer handoff Delegating to the @@reviewer sub-agent, one of the handoffs that didn’t work One of the handoffs that didn’t work, delegating to the @@reviewer sub-agent required a specific format. The first attempt stalled, so I had to re-frame the prompt.

6. Reviewer output Reviewer output flagging XSS risk and accessibility/contrast issues After the handoff worked, reviewer flagged two major issues (XSS risk, accessibility/contrast) plus a few medium findings.

7. QA in action QA sub-agent running build verification and sanity checks The QA sub-agent running build verification and sanity checks against the updated code.

8. Final review + QA findings QA output confirming fixes and clearing the final build QA’s output, confirmed the fixes, surfaced a couple of remaining nits, and cleared the final build.