vt-c-knowledge-index
Manage the knowledge index — generate summaries, update indexes, check health. Supports incremental updates, full rebuild, and health status reporting.
Plugin: core-standards
Category: Other
Command: /vt-c-knowledge-index
/vt-c-knowledge-index: Knowledge Index Manager
Generate and maintain the hierarchical knowledge index defined by SPEC-070. Creates per-document summaries, category indexes, and a top-level index for fast AI navigation of the knowledge base.
When to Use
- After adding new documents to intake/knowledge/
- To check index health and find missing summaries
- To rebuild the entire index after schema changes
- To index any arbitrary folder of markdown documents
Invocation
/vt-c-knowledge-index # Incremental update (default)
/vt-c-knowledge-index --rebuild # Full rebuild from scratch
/vt-c-knowledge-index --status # Health check only
/vt-c-knowledge-index /path/to/folder # Index an arbitrary folder
/vt-c-knowledge-index /path/to/folder --rebuild # Full rebuild of arbitrary folder
Execution Steps
Step 1: Parse Arguments
Read the invocation arguments:
- Path: First non-flag argument. Default: intake/knowledge/
- Mode:
  - No flags → incremental (default)
  - --rebuild → full rebuild
  - --status → health check only
Display:
Knowledge Index Manager
═══════════════════════
Target: {path}
Mode: {incremental | rebuild | status}
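The parsing rules above can be sketched as a small POSIX shell function. This is only an illustration under stated assumptions: the name parse_index_args is hypothetical, and the choices to reject unknown flags and to let the last mode flag win are not specified by the skill.

```shell
# Sketch only: parse_index_args is a hypothetical helper, not part of the skill.
# Assumptions: unknown flags are rejected; if both mode flags appear, the last wins.
parse_index_args() (
  target="intake/knowledge/"   # default path from Step 1
  mode="incremental"           # default mode when no flag is given
  path_seen=false
  for arg in "$@"; do
    case "$arg" in
      --rebuild) mode="rebuild" ;;
      --status)  mode="status" ;;
      --*)
        echo "unknown flag: $arg" >&2
        exit 2
        ;;
      *)
        # The first non-flag argument is the path; later ones are ignored
        if [ "$path_seen" = false ]; then
          target="$arg"
          path_seen=true
        fi
        ;;
    esac
  done
  printf 'Target: %s\nMode: %s\n' "$target" "$mode"
)
```

Calling `parse_index_args /path/to/folder --rebuild` prints the target and mode lines in the same shape as the display above.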
Step 2: Discover Structure
Determine if target path uses categories or is flat:
- List subdirectories of the target path
- For each subdirectory, check if it contains .md files (exclude hidden dirs like .summaries)
- If at least one subdirectory contains .md files → category mode
- If no subdirectories contain .md files → flat mode (documents directly in target path)
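As a rough illustration of the check above, the following POSIX shell sketch reports which mode applies to a folder. The function name detect_index_mode is hypothetical, and the sketch relies on the shell's default globbing, which does not match hidden directories such as .summaries.

```shell
# Sketch only: detect_index_mode is a hypothetical helper name.
# Prints "category" if any visible subdirectory holds a .md file, else "flat".
detect_index_mode() (
  target="$1"
  for dir in "$target"/*/; do
    [ -d "$dir" ] || continue          # the glob matched nothing
    # "$dir" ends with a slash, so this expands to <subdir>/*.md
    for f in "$dir"*.md; do
      if [ -f "$f" ]; then
        echo "category"
        exit 0                         # leaves only this function's subshell
      fi
    done
  done
  echo "flat"
)
```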
Display:
Step 3: Discover Documents
For each category (or root path in flat mode):
1. List all .md files (exclude index.md)
2. Build a document inventory: {category}/{filename}.md
Display: Found {N} documents across {M} categories
Step 4: Determine Processing Scope
If --rebuild mode: All discovered documents need processing.
If --status mode: Skip to Step 4b (health report).
If incremental mode (default): For each document, check if processing is needed:
# Get source file modification time
# (stat -f %m is the BSD/macOS form; on GNU/Linux use stat -c %Y)
source_mtime=$(stat -f %m "{path}/{category}/{filename}.md")
# Get summary file modification time (0 if it does not exist)
summary_file="{path}/{category}/.summaries/{filename_without_ext}.json"
if [ -f "$summary_file" ]; then
  summary_mtime=$(stat -f %m "$summary_file")
else
  summary_mtime=0
fi
# Process if source is newer or summary doesn't exist
needs_processing=false
if [ "$source_mtime" -gt "$summary_mtime" ]; then
  needs_processing=true
fi
Display:
If no documents need processing (N == 0): Display "Index is current. No updates needed." and exit.
Step 4b: Health Report (--status mode)
For each category:
1. Count .md files (excluding index.md)
2. Count .summaries/*.json files
3. Identify documents missing summaries (source exists, no summary)
4. Identify orphaned summaries (summary exists, no source)
Read generated_at from index.json if it exists.
Display:
Knowledge Index Health
══════════════════════════════════════════
Category Docs Indexed Missing Orphaned
──────────────────────────────────────────
patterns 10 10 0 0
tools 6 6 0 0
workflows 9 9 0 0
──────────────────────────────────────────
Total 25 25 0 0
Last updated: 2026-03-17T12:07:09Z
Status: ✓ Index is healthy
If missing > 0 or orphaned > 0, list the affected filenames. Exit after displaying the report; --status makes no modifications.
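The per-category counts in the health report could be gathered along these lines. This is a sketch, not the skill's implementation: category_health is a hypothetical name, the four numbers are printed in the table's column order, and the affected filenames go to standard error so the counts stay easy to parse.

```shell
# Sketch only: category_health is a hypothetical helper name.
# Prints "docs indexed missing orphaned" for one category directory.
category_health() (
  dir="$1"
  docs=0; indexed=0; missing=0; orphaned=0
  for src in "$dir"/*.md; do
    [ -f "$src" ] || continue
    name=$(basename "$src" .md)
    if [ "$name" = "index" ]; then continue; fi   # index.md is excluded
    docs=$((docs + 1))
    if [ ! -f "$dir/.summaries/$name.json" ]; then
      missing=$((missing + 1))
      echo "missing: $name.md" >&2
    fi
  done
  for sum in "$dir"/.summaries/*.json; do
    [ -f "$sum" ] || continue
    indexed=$((indexed + 1))                      # counts .summaries/*.json files
    name=$(basename "$sum" .json)
    if [ ! -f "$dir/$name.md" ]; then
      orphaned=$((orphaned + 1))
      echo "orphaned: $name.json" >&2
    fi
  done
  echo "$docs $indexed $missing $orphaned"
)
```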
Step 5: Generate Summaries
For each document needing processing:
5a: Read Document Content
If the document exceeds 1000 lines (FR-5):
- Read only the first 200 lines and last 50 lines
- Note: word_count should still reflect the full document
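A minimal sketch of the large-document rule, assuming standard head, tail, and wc. The helper names read_for_summary and full_word_count are hypothetical.

```shell
# Sketch only: both helper names are hypothetical.
read_for_summary() (
  file="$1"
  lines=$(wc -l < "$file" | tr -d '[:space:]')
  if [ "$lines" -gt 1000 ]; then
    head -n 200 "$file"    # first 200 lines
    tail -n 50 "$file"     # last 50 lines
  else
    cat "$file"
  fi
)

# word_count always comes from the whole file, not the truncated excerpt
full_word_count() (
  wc -w < "$1" | tr -d '[:space:]'
)
```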
5b: Parse Frontmatter
Extract YAML frontmatter fields:
- title, author, date (or date_saved), tags, category, relevance
- summary, deep_evaluated, evaluation_findings, proposals_generated
- document_type (if present)
If no frontmatter found (EC-1 from SPEC-070): set frontmatter_present: false and extract title from first H1 heading.
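The title fallback can be illustrated with awk and grep. This sketch handles only simple single-line `key: value` fields and is not a YAML parser; list fields such as tags would need a proper one. The names frontmatter_field and document_title are hypothetical.

```shell
# Sketch only: handles single-line scalar fields, not full YAML.
frontmatter_field() (
  # $1 = file, $2 = field name; prints the value, or nothing when absent
  awk -v key="$2" '
    NR == 1 && $0 != "---" { exit }     # no frontmatter block at all
    NR > 1 && $0 == "---"  { exit }     # closing delimiter reached
    NR > 1 && index($0, key ":") == 1 {
      sub(/^[^:]*:[ \t]*/, "")          # drop "key:" and leading blanks
      gsub(/^"|"$/, "")                 # drop surrounding double quotes
      print
      exit
    }
  ' "$1"
)

document_title() (
  title=$(frontmatter_field "$1" title)
  if [ -z "$title" ]; then
    # EC-1 fallback: take the first H1 heading
    title=$(grep -m 1 '^# ' "$1" | sed 's/^# //')
  fi
  echo "$title"
)
```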
5c: Detect Document Type
Transcript detection heuristic:
- Read the first 100 lines of the document body (after frontmatter)
- Search for speaker attribution patterns matching: **[ + timestamp + ] + name + :**
- Pattern: lines containing text like **[12:34] Speaker Name:** or **[1:23:45] Speaker:**
- Count the number of matching lines
- If 3 or more matches → document_type: "transcript"
- If fewer than 3 matches → document_type: "article" (default)
If transcript detected, extract:
- participants: Collect unique speaker names from all **[MM:SS] Name:** patterns throughout the document
- meeting_date: Use frontmatter date field, or parse from filename date prefix (YYYY-MM-DD)
- projects_discussed: Scan for:
  - V-number references (e.g., V025, V004) → extract as project identifiers
  - H2/H3 headings that appear to be project names
  - Build array of {project, summary} objects
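The heuristic above might look like this with grep and sed. The regular expression is an assumption chosen to match the two example forms, **[12:34] Speaker Name:** and **[1:23:45] Speaker:**, and the function names are hypothetical.

```shell
# Sketch only: SPEAKER_RE is an assumed pattern for **[MM:SS] Name:** and
# **[H:MM:SS] Name:**. A bare "]" outside brackets is literal in an ERE.
SPEAKER_RE='\*\*\[([0-9]{1,2}:)?[0-9]{1,2}:[0-9]{2}] [^:*]+:\*\*'

detect_document_type() (
  # Reads the document body (frontmatter already removed) from stdin
  matches=$(head -n 100 | grep -cE "$SPEAKER_RE" || true)
  if [ "$matches" -ge 3 ]; then
    echo "transcript"
  else
    echo "article"
  fi
)

list_participants() (
  # Unique speaker names across the whole input, one per line
  grep -oE "$SPEAKER_RE" \
    | sed -E 's/^\*\*\[[0-9:]+] //; s/:\*\*$//' \
    | sort -u
)
```

The `|| true` keeps the count usable when grep finds no matches and exits non-zero.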
5d: Build Summary JSON
Construct the summary JSON per SPEC-070 schema:
{
"version": "1.0",
"generated_at": "{ISO-8601 timestamp}",
"source_file": "../{filename}",
"document_type": "{article|transcript}",
"title": "{from frontmatter or H1}",
"author": "{from frontmatter or empty}",
"date_saved": "{from frontmatter date or date_saved}",
"category": "{from path or frontmatter}",
"relevance": "{from frontmatter or 'reference'}",
"tags": ["{from frontmatter}"],
"summary": "{from frontmatter}",
"key_topics": ["{extracted from H2 headings, max 8}"],
"key_findings": [],
"entities_mentioned": [],
"word_count": {count},
"deep_evaluated": {from frontmatter or false},
"evaluation_findings": {from frontmatter or 0},
"proposals_generated": {from frontmatter or 0},
"frontmatter_present": {true|false}
}
If document_type is "transcript", add extension fields:
{
"meeting_date": "{YYYY-MM-DD}",
"participants": ["Name1", "Name2"],
"projects_discussed": [
{"project": "V025", "summary": "..."}
]
}
5e: Write Summary
- Create the .summaries/ directory if it doesn't exist
- Write the JSON to .summaries/{filename_without_ext}.json
Display: ✓ {filename} → .summaries/{filename_without_ext}.json
Step 6: Build Category Indexes
For each category that had documents processed (or all categories if --rebuild):
- Read all .summaries/*.json files in the category
- Build index.json. Category descriptions:
  - patterns → "Coding patterns, skill patterns, agent patterns, workflow patterns"
  - tools → "Tool configurations, CLI techniques, IDE integrations, MCP servers"
  - workflows → "Process improvements, methodology articles, development workflows"
  - Other → "{category} documents"
- Build index.md
Flat mode alternative: If flat mode (Step 2), skip category indexes entirely. Generate a single index.json and index.md at the target path root containing all documents directly (no categories wrapper):
{
"version": "1.0",
"generated_at": "{timestamp}",
"total_documents": {N},
"documents": [{all documents sorted by date}]
}
Step 7: Build Top-Level Index
Category mode only (skip in flat mode, since Step 6 already generated the single index):
- Read each {category}/index.json
- For each category, compute:
  - Document count
  - Top 3 tags by frequency
  - Most recent document filename
- Build index.json
- Build index.md
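The "top 3 tags by frequency" computation can be sketched with standard text tools. The sketch assumes the tags have already been extracted from the category's summaries, one per line. The name top_tags and the alphabetical tie-break are assumptions, not part of the skill.

```shell
# Sketch only: top_tags is a hypothetical helper name.
# stdin: one tag per line; stdout: up to 3 tags, most frequent first.
top_tags() (
  sort \
    | uniq -c \
    | sort -k1,1nr -k2 \
    | head -n 3 \
    | sed -E 's/^ *[0-9]+ //'   # strip the count that uniq -c prepends
)
```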
Step 8: Stage Generated Files (FR-14)
Stage all generated/modified files:
If any .gitkeep files were created in new .summaries/ directories, stage those too.
Step 9: Report
Display completion summary:
Knowledge Index Updated
═══════════════════════════════════════════
Summaries: {N} generated ({M} skipped — up to date)
Categories: {K} indexes rebuilt
Top-level: ✓ Updated ({total} documents)
Files staged for commit.
═══════════════════════════════════════════
Edge Cases
| ID | Edge Case | Handling |
|---|---|---|
| EC-1 | Source document has no frontmatter | Set frontmatter_present: false, extract title from H1 |
| EC-2 | Empty category directory | Generate empty index with "documents": [] |
| EC-3 | Summary generation fails for one document | Log warning, skip, continue with remaining |
| EC-4 | Non-markdown files in directory | Skip; only .md files get summaries |
| EC-5 | index.json exists but is invalid JSON | Overwrite with freshly generated index |
| EC-6 | Document moved between categories | Old summary becomes orphaned; --rebuild cleans it up. --status reports it |
| EC-7 | .summaries/ directory does not exist | Create it on first summary write (Step 5e) |
| EC-8 | Transcript with no identifiable projects | Set projects_discussed: []; still mark as document_type: "transcript" |
| EC-9 | Flat folder (no categories) | Single-level index only (Step 6 flat mode) |
Integration Points
| Skill | Relationship |
|---|---|
| /inbox-qualify | Step 5d calls this skill's logic after routing a document |
| /content-evaluate | Step 7c updates existing summaries with evaluation metadata |
| scripts/generate-knowledge-index.sh | Seed script from SPEC-070; --rebuild can delegate to it |