Second Brain Chronicles - Digest #001

February 2026: Confidence Is Not Evidence

This is the first real issue of Second Brain Chronicles. I've been building an AI-augmented second brain system for over a year now, and I kept telling myself I'd start documenting it when the system was "ready." The system will never be ready. So here we are.

What you're reading is a monthly field report — not a tutorial, not a product launch. I'm documenting what happens when you wire AI tools deep into your personal knowledge system and then actually try to use it for real work. February was the month I stopped being surprised when things broke and started being surprised by how they broke.

Here are seven moments from the past four weeks that changed how I think about trust in AI-assisted workflows.


1. The Pricing Page Full of Lies

Feb 10 — While auditing the SketchScript product site (built with claude-art-skill) ahead of launch, I found that every feature on the pricing page beyond generation volume was fabricated. Email support, priority support, edit & refine, generation history, team branding, shared templates, admin dashboard — none of it existed. The AI had generated professional-looking pricing cards with plausible feature lists, and I'd approved them without checking whether any of it was real.

There was also a hallucinated "@doodler" byline baked into the hero illustration — Gemini had invented an attribution that looked like a real artist credit.

The fix was embarrassingly simple: strip everything fictional, differentiate on the one real feature (watermark vs. no watermark), and cut from four tiers to two.

The thing I keep coming back to: AI slop isn't just bad prose. "Delve" in a paragraph is easy to catch. A fabricated feature list that looks exactly like every other SaaS pricing page? That passes review because it pattern-matches "correct" without being true. The more professional something looks, the less likely you are to question whether it's real.


2. The LinkedIn Profile I Never Read

Feb 11 — Preparing for a job interview, I asked Claude to research the interviewer. Brave Search returned LinkedIn snippets mentioning "Engenai" and "VoxGenie AI" near his name. I built an entire interview strategy around this: "He's an AI builder himself — use that as common ground."

The interviewer had spent 16 years at one company in people-ops. The search terms came from posts he'd interacted with, not from his profile. LinkedIn search snippets aggregate profile content, posts, likes, and comments into a single result, with no attribution layer telling you which is which.

I'd written "His LinkedIn shows..." into three separate files — prep doc, CRM record, daily note — when all I'd actually read was a search snippet from which I'd inferred a career. I only caught it when I requested the actual LinkedIn PDF.

What this exposed: Search engine results about people aren't profiles. They're aggregations. "Plausible from search results" and "verified from source" are different things, and the confident framing ("His LinkedIn shows...") made the fiction harder to catch than if I'd written "I think maybe he..."


3. The Voice Profile Built from the Wrong Layer

Feb 15 — I'd spent weeks building a detailed voice profile from 35+ polished newsletter issues. Four writing modes, documented sentence patterns, preferred transitions. Then I compared it against a raw conversation transcript — an actual meeting recording.

Six core speech patterns were completely absent from the profile. The biggest one: the profile said "front-load the point" was a core pattern. The actual natural pattern is spiral-in — orbit the context, build the frame, then land the point. Every piece of AI-assisted writing had been structurally steering me away from how I actually think and talk.

The four "voice modes" turned out to be an artifact of analysing edited output by topic, not actual personality shifts. My own pushback said it best: "I don't know if I want another mode — it's all just me."

The transferable thing: Building a voice profile from polished output is like studying someone's LinkedIn headshot to understand what they look like. Raw speech — with the ums, the spiralling, the self-corrections — is where the actual voice lives. If you're using AI to write in your voice, check which layer of "you" it learned from.


4. The Open Marketplace That Ships Malware

Feb 9 — Daniel Miessler's Unsupervised Learning reported that OpenClaw's top-rated skill was actually macOS malware, with hundreds of malicious skills found on ClawHub. This maps directly to how my system operates — skills, MCP servers, and agents with deep filesystem access, running with elevated permissions.

I have guardrails: a creation-guard skill that checks for duplicates before building new tools, sandbox mode, explicit permission grants. But the honest assessment is that these are speed bumps, not walls. If a skill I installed had a malicious PostToolUse hook that exfiltrated vault content on every file read, I'm not sure I'd catch it. The trust model right now is "I vetted the source" — which is exactly what OpenClaw users thought they were doing.

The question I can't answer yet: How do you actually decide what to trust when your AI system has read access to your entire digital life? The "open marketplace" model failed. My "curated by one person" model works until I'm wrong about a source. There's no good answer here, just varying levels of careful.


5. The Slop Detector That Missed Its Own Slop

Feb 14 — I have a skill called AntiSlop that scans drafts for AI writing patterns. It passed a lead magnet draft at score 2 (clean). Reviewing it by eye, I immediately caught template headers that should have been flagged — "The Moment It Clicked", "The Part I Didn't Expect" — textbook AI-generated section titles.

The patterns were documented in the skill's own pattern library. They just scored too low to trigger during actual scanning.
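That failure mode is easy to reproduce: the pattern exists in the library, but its weight sits below the trigger threshold. A toy sketch in Node (the patterns, weights, and threshold here are all invented for illustration, not AntiSlop's actual internals):

```javascript
// Toy slop scanner: each matched pattern adds its weight to the score,
// and only totals at or above the trigger get flagged.
const patterns = [
  { regex: /\bdelve\b/i, weight: 3 },                // prose tell, well weighted
  { regex: /^The Moment It Clicked$/m, weight: 1 },  // structural tell, under-weighted
  { regex: /^The Part I Didn't Expect$/m, weight: 1 },
];
const TRIGGER = 3;

function scan(text) {
  const hits = patterns.filter(p => p.regex.test(text));
  const score = hits.reduce((sum, p) => sum + p.weight, 0);
  return { score, flagged: score >= TRIGGER, hits: hits.length };
}

// Both template headers match, but their combined weight stays below
// the trigger: the draft "passes" at score 2 despite documented patterns.
const draft = "The Moment It Clicked\n...\nThe Part I Didn't Expect\n...";
console.log(scan(draft)); // { score: 2, flagged: false, hits: 2 }
```

That's the same shape as the real failure: the library was complete, the scoring was not.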

So I built a pattern refresh protocol: Gemini CLI pulls new AI writing patterns from Wikipedia's "Signs of AI writing" page and updates the detection database. I ran it. The second scan caught 7 more issues the first scan had missed. Then I added those corrected patterns to the permanent library.

What I'm noticing: Detection tools have the same coverage gap problem as test suites. You find the gaps the same way — by shipping something and watching what breaks. The skill's pattern library was comprehensive for prose but had a blind spot for structural tells like section headers. AI writing evolves, so detection has to evolve with it, and "score 2" from a single pass shouldn't be where review ends.


6. When Writing Rules Down Isn't Enough

Feb 15 — Same day, two failures. First: I ran AntiSlop on a blog post I'd written, scored it 1 (clean), and called it done. A reader caught clinical headers, generic advice lists, no rough edges. Three rewrites later, the final version worked — but only because a human flagged what the tool missed.

Second: while writing a Kit.com callout link, I fabricated a URL. This was the third time in one session I'd violated an existing verification rule. The rule existed. I knew the rule. The rule did not prevent the mistake.

I did two things in response. First, I consolidated six separate verification rules into one principle (they were all symptoms of the same failure: skipping verification when confident). Second, I wrote a verify-claims.js hook that fires before every file write and flags URLs that weren't verified against the canonical links file.


// PreToolUse hook - fires before Write/Edit
// Checks for URLs not in ~/.claude/LINKS.md

The deeper thing here: There's a category of rule that simply does not work as text in a document. "Always verify URLs" is a statement of intent, not a mechanism. The moment you're confident, you skip the check — that's human nature, and it doesn't matter whether you're a person or an AI system. Mechanical enforcement (a hook that literally intercepts the write operation) is the only version that holds. Rules you have to remember to follow are suggestions. Rules that fire automatically are rules.
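For the curious, the core of such a hook is just set membership against the canonical links file. A minimal sketch of that check in Node (how the runner delivers the Write payload and interprets exit codes depends on your hook contract; the file contents below are invented for illustration):

```javascript
// Core check of a verify-claims-style hook: URLs in the outgoing write
// that don't appear in the canonical links file are "unverified".
function extractUrls(text) {
  return text.match(/https?:\/\/[^\s)\]"']+/g) || [];
}

function unverifiedUrls(draft, linksFileContent) {
  const allowed = new Set(extractUrls(linksFileContent));
  return extractUrls(draft).filter(u => !allowed.has(u));
}

// In the real hook, `draft` comes from the intercepted Write payload and
// the allow-list from ~/.claude/LINKS.md; both are inlined here.
const links = "- https://example.com/newsletter\n";
const draft = "Subscribe at https://example.com/newsletter or https://example.com/made-up-page";
console.log(unverifiedUrls(draft, links)); // [ 'https://example.com/made-up-page' ]
```

Everything else (reading the payload, blocking the write when the list is non-empty) is runner-specific plumbing. The point is that the check is mechanical: it fires whether or not anyone remembers the rule.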


7. The System That Ate Its Own Context Window

Feb 26 — I audited the skill list that gets injected into every system prompt. 110 skills. Each one adds a line to the context that Claude sees after every tool call. The context window was being consumed by the system's own infrastructure — not by the work I was trying to do.

Archived 18 dormant or duplicate skills. Consolidated 9 individual wisdom-extraction skills into one unified skill (837 lines across 9 files collapsed to 160 lines in 1 file, same functionality). Net reduction: 17 entries from the injection list.

The interesting part: I'd initially suspected MCP servers were the context hog. They weren't. The skill list — which I'd been adding to for months without ever pruning — was the primary cause of rapid context consumption. The system had accumulated its own bureaucracy.

What this means beyond my setup: Any AI-augmented workflow that grows organically will accumulate overhead. Tools reference other tools, agents depend on MCP servers, skills fork for side projects and never get cleaned up. If you're not periodically auditing the system itself, the infrastructure starts competing with the work for the same limited resource (context, attention, compute — pick your metaphor).


The Thread

Looking at these seven moments together, the pattern is obvious in retrospect: confidence is not evidence.

A pricing page that looks professional isn't verified. A search snippet that mentions a company name isn't a career history. A voice profile built from polished output isn't a voice. A skill marketplace with ratings isn't vetted. A slop score of 2 isn't clean. A rule written in a document isn't enforced. A system with 110 skills isn't necessarily more capable than one with 93.

Every one of these failures shared the same structure: something looked right, felt right, and passed the "does this seem reasonable?" test. The failures only surfaced when someone (usually me, sometimes a reader, once because I happened to request the actual LinkedIn PDF) checked against the source instead of the representation.

February's lesson, if I had to compress it into one rule: verify against the source, not the summary. And if you can't make yourself do that consistently, build a hook that does it for you.


The Numbers

Metric | Value | Why It Matters
Captures logged | 48 | Raw material for this newsletter — roughly 1.7 per day
Skills before audit | 110 | System had accumulated more tools than it could efficiently hold in context
Skills after audit | 93 | 17 fewer entries in the system prompt injection list
Wisdom skill consolidation | 837 lines to 160 | Same functionality, 81% less context overhead
Fabricated features found on pricing page | 8 | Every non-volume feature was fiction
Files corrected after search snippet error | 3 | The wrong information had already propagated
AntiSlop patterns missed on first pass | 7 | Detection tool's own coverage gap

Next

Can verification hooks scale? The URL verification hook works for links, but the same problem exists for statistics, dates, feature claims, and people's credentials. Each category needs its own source-of-truth file. How many canonical reference files before the cure is worse than the disease?

Voice profiles from speech data — I rebuilt one from a single conversation transcript. What happens with five transcripts? Ten? Does the profile converge on something stable, or does it keep shifting with each new sample?

LLO as a new audit framework — just discovered Large Language Model Optimization this week. The structured data gaps it revealed across three sites are concrete and fixable. But I haven't tested whether fixing them actually changes how LLMs reference those sites. That's the experiment for March.

The trust model problem has no solution yet. I wrote guardrails. One of them (creation-guard) caught a duplicate build that would have wasted a week. None of them would catch a malicious hook. What would?

Context window as a finite resource — after the skill consolidation, sessions run noticeably longer before hitting limits. But "noticeably" isn't a number. Need to actually measure session lengths before and after to see if the improvement is real or just confirmation bias.


Second Brain Chronicles is a free newsletter documenting the evolution of an AI-augmented second brain. If something here made you think differently about your own tools, that's the point.

Jim Christian

I test AI tools, build real workflows, and share what's worth your time. Newsletter, field guides, and courses — everything based on what I've shipped, not what I've read about.
