LLM Field Notes as a vocabulary reference

In Short

Problem: articles about agents quickly become blurry when technical words are used as decoration.
What I read: LLM Field Notes, mostly as a vocabulary guardrail.
Result: useful for clarifying terms, not enough to prove a technical conclusion.
Lesson: a word like harness, eval, or memory should change something in the tool being described. Otherwise it mostly adds noise.

Source

This note comments on LLM Field Notes. The source is not mine: it belongs to its author. I use it as a reading point for Project Pezzos, not as a glossary to copy or translate wholesale.

The repository tomerjann/llm-field-notes presents a compact set of terms around LLM systems. That format is useful precisely because it keeps the words close to their operational meaning.

Why I Keep This Note

In Project Pezzos articles around Codex, skills, agent memory, evals, or MCP, the risk is not only being technically wrong. It is also using blurry words. Harness, memory, context, eval, and agentic loop can become labels that sound explanatory while hiding the details.

That is why I like short glossaries when they stay practical. They do not replace tests, architecture, or evidence. They do help avoid writing sentences where impressive words carry no engineering consequence.

For example, saying that a system has “memory” is not enough. What is stored? Where? For how long? Who can read it? Does it change the next action, or is it just context pasted back into a prompt?

The same applies to harness, eval, tool, or agent. Each word should force a concrete question about the implementation.

I do not want to make the source a final authority. I want to use it as a reading reference, then verify important terms with primary sources when those terms carry a technical conclusion.

What I Would Reuse

I would reuse the glossary as a small checkpoint before writing about agents:

does this term describe a real mechanism?
can I point to the file, command, prompt, or behavior behind it?
would removing the term change the explanation?
is the term helping the reader, or just making the note look more technical?

That is enough value for me. The glossary is not proof, but it can prevent sloppy language before a draft turns into a confident claim.

The SEO benchmark gives a simple example. Context should mean the files actually available during the run: the data README, CSV exports, repo instructions, article draft, and site pages used to understand recommendations. If a report talks about a query that is absent from the exports, that is no longer context; it is a hypothesis.

An eval is not just “does this look good”. It needs criteria, comparable data, and a score or judgment that can be repeated. Without that, I should probably write “manual test” instead of using the cleaner word.

The rule I keep is simple: a technical word should point to an observable difference in the tool being built. If it does not, the plain word is usually better.