Six hard-won lessons from testing AI systems daily
Jaskaran Singh, a developer who transitioned from building apps to testing AI, shares six practical lessons about how AI models actually work — covering tokens, context windows, and temperature settings.
Understanding token budgets, context window limits, and temperature settings helps AI/coding practitioners diagnose subtle model failures — like forgotten instructions or erratic outputs — before they cause real problems in production tools.
Jaskaran Singh transitioned from five years of consumer app development into AI testing, where his daily work involves probing models for failure modes that look fine on the surface. His Dev.to post frames six foundational AI concepts not as textbook definitions but as lessons learned through firsthand surprises and quiet failures.
The first two lessons center on tokens and context windows. Singh explains that AI models don't read words; they process fragments called tokens, and every token counts against a model's maximum budget. A word like "hamburger" may consume multiple tokens, and punctuation and spaces add more. When a long conversation exhausts that budget, the oldest content is silently dropped, which Singh discovered when his own conversations started producing off-target answers. He recommends OpenAI's Tokenizer Playground as a hands-on way to see token counts in real text.
The context window lesson came from a monitoring tool he built to track Canada's immigration website for unannounced program openings. After enough check cycles, the tool's original instructions were pushed out of the context window entirely, causing it to re-alert on old updates and miss new ones. The fix was to feed the model only what it needed per task rather than accumulating the full history.
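The overflow failure can be sketched in a few lines of Python. This is an illustration only, not Singh's tool: `estimate_tokens` is a crude regex stand-in for a real tokenizer, `fit_to_window` is a hypothetical helper, and the 30-token budget is an arbitrary number chosen to make the effect visible.

```python
import re


def estimate_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer: counts words and punctuation marks."""
    return len(re.findall(r"\w+|[^\w\s]", text))


def fit_to_window(messages: list[str], max_tokens: int) -> list[str]:
    """Keep the newest messages that fit the budget; older ones fall off silently."""
    kept: list[str] = []
    used = 0
    for message in reversed(messages):  # walk from newest to oldest
        cost = estimate_tokens(message)
        if used + cost > max_tokens:
            break  # everything older than this point is dropped
        kept.append(message)
        used += cost
    return list(reversed(kept))


# Hypothetical history: one instruction line followed by repeated check cycles.
history = ["INSTRUCTIONS: alert only on new program openings."]
history += [f"Check {i}: no change on the page." for i in range(1, 6)]

window = fit_to_window(history, max_tokens=30)
instructions_survive = any(m.startswith("INSTRUCTIONS") for m in window)
```

With this budget only the most recent checks remain in `window`, and `instructions_survive` is false: nothing raises an error, the instructions are simply no longer there.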
The third concept covered is temperature — the setting that governs how adventurous a model's word choices are. Singh tested the same prompt at low and high temperature settings: the low setting produced a clear, immediately usable answer, while the high setting produced something more interesting but less predictable. He notes that higher temperature suits creative or brainstorming tasks, while factual or precise tasks call for lower settings. The source text is truncated before the remaining three lessons are described.
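To make the effect of temperature concrete, here is a minimal sketch of the standard formulation, in which the model's raw scores (logits) are divided by the temperature before a softmax turns them into probabilities. The summary does not say how Singh ran his comparison, so the function name, the three logits, and the two temperature values are all made up for illustration.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert raw scores to probabilities after scaling by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens

cold = softmax_with_temperature(logits, 0.2)  # low temperature
hot = softmax_with_temperature(logits, 1.5)   # high temperature
```

At the low setting nearly all probability lands on the top-scoring candidate, which matches the predictable output Singh describes; at the high setting the alternatives keep a meaningful share, so sampled output varies more.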
Key facts
- Author Jaskaran Singh spent five years building consumer apps before transitioning to AI testing.
- AI models break text into tokens (for example, "hamburger" may be three tokens), and every token counts against a model's maximum budget.
- When a conversation fills the context window, the oldest content is silently erased with no warning.
- Singh built a tool to monitor Canada's immigration website for unannounced program openings; it malfunctioned when its original instructions were pushed out of the context window.
- The fix for context overflow was to give the AI only what it needs per task, not the full accumulated history.
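One way to read the last point is to rebuild the prompt from scratch on every check, carrying forward only a compact record of what has already been reported. The sketch below assumes that reading; `INSTRUCTIONS`, `build_prompt`, and the program names are hypothetical and do not come from Singh's post.

```python
# Hypothetical instruction text, pinned to the top of every prompt.
INSTRUCTIONS = "Report only program openings that are not already in the seen list."


def build_prompt(seen_titles: set[str], page_text: str) -> str:
    """Assemble a fresh prompt for one check cycle; no earlier transcript is included."""
    seen_block = "\n".join(f"- {t}" for t in sorted(seen_titles)) or "- (none)"
    return (
        f"{INSTRUCTIONS}\n\n"
        f"Already seen:\n{seen_block}\n\n"
        f"Current page:\n{page_text}"
    )


seen = {"Program A"}
prompt_1 = build_prompt(seen, "Program A\nProgram B")

seen.add("Program B")  # record the new item instead of keeping the whole exchange
prompt_2 = build_prompt(seen, "Program A\nProgram B")
```

Because the instructions are written into every prompt, they cannot be pushed out by older cycles, and the prompt grows with the number of known items rather than with the number of checks.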