vt-c-knowledge-index

Manage the knowledge index — generate summaries, update indexes, check health. Supports incremental updates, full rebuild, and health status reporting.

Plugin: core-standards
Category: Other
Command: /vt-c-knowledge-index

/vt-c-knowledge-index — Knowledge Index Manager

Generate and maintain the hierarchical knowledge index defined by SPEC-070. Creates per-document summaries, category indexes, and a top-level index for fast AI navigation of the knowledge base.

When to Use

After adding new documents to intake/knowledge/
To check index health and find missing summaries
To rebuild the entire index after schema changes
To index any arbitrary folder of markdown documents

Invocation

/vt-c-knowledge-index                           # Incremental update (default)
/vt-c-knowledge-index --rebuild                  # Full rebuild from scratch
/vt-c-knowledge-index --status                   # Health check only
/vt-c-knowledge-index /path/to/folder            # Index an arbitrary folder
/vt-c-knowledge-index /path/to/folder --rebuild  # Full rebuild of arbitrary folder

Execution Steps

Step 1: Parse Arguments

Read the invocation arguments:

- Path: the first non-flag argument. Default: intake/knowledge/
- Mode:
  - No flags → incremental (default)
  - --rebuild → full rebuild
  - --status → health check only

Display:

Knowledge Index Manager
═══════════════════════
Target: {path}
Mode:   {incremental | rebuild | status}
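The argument rules above can be sketched in Python. This is a minimal illustration, not the skill's actual implementation: `parse_arguments`, `Invocation`, and the choice to reject unknown flags or a combined `--rebuild --status` are assumptions, since the spec does not say how conflicting flags are handled.

```python
from dataclasses import dataclass

DEFAULT_PATH = "intake/knowledge/"
KNOWN_FLAGS = {"--rebuild", "--status"}


@dataclass
class Invocation:
    path: str
    mode: str  # "incremental", "rebuild", or "status"


def parse_arguments(args: list[str]) -> Invocation:
    """Split /vt-c-knowledge-index arguments into a target path and a mode."""
    flags = {arg for arg in args if arg.startswith("--")}
    unknown = flags - KNOWN_FLAGS
    if unknown:
        raise ValueError(f"unknown flag(s): {sorted(unknown)}")
    if KNOWN_FLAGS <= flags:
        # Assumption: the two modes are mutually exclusive.
        raise ValueError("--rebuild and --status cannot be combined")
    positional = [arg for arg in args if not arg.startswith("--")]
    path = positional[0] if positional else DEFAULT_PATH  # first non-flag argument
    if "--rebuild" in flags:
        mode = "rebuild"
    elif "--status" in flags:
        mode = "status"
    else:
        mode = "incremental"
    return Invocation(path=path, mode=mode)
```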

Step 2: Discover Structure

Determine whether the target path uses categories or is flat:

List subdirectories of the target path
For each subdirectory, check if it contains .md files (exclude hidden dirs like .summaries)
If at least one subdirectory contains .md files → category mode
If no subdirectories contain .md files → flat mode (documents directly in target path)
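A sketch of the category/flat decision, assuming "contains .md files" means .md files directly inside the subdirectory rather than nested deeper. `discover_structure` is a hypothetical helper name.

```python
from pathlib import Path


def discover_structure(target: Path) -> tuple[str, list[str]]:
    """Return ("category", names) or ("flat", []) following Step 2."""
    categories = sorted(
        d.name
        for d in target.iterdir()
        if d.is_dir()
        and not d.name.startswith(".")  # skip hidden dirs such as .summaries
        and any(f.is_file() for f in d.glob("*.md"))
    )
    if categories:
        return "category", categories
    return "flat", []
```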

Display:

Structure: {category | flat}
Categories: {list of category names, or "flat (no categories)"}

Step 3: Discover Documents

For each category (or the root path in flat mode):

1. List all .md files (exclude index.md).
2. Build a document inventory: {category}/{filename}.md

Display: Found {N} documents across {M} categories
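The inventory step might look like this. `discover_documents` is hypothetical, and returning bare filenames in flat mode is an assumption about how the inventory is keyed when there is no category.

```python
from pathlib import Path


def discover_documents(target: Path, categories: list[str]) -> list[str]:
    """Build the Step 3 inventory: "{category}/{filename}.md", or bare names in flat mode."""
    # Flat mode has no categories, so scan the target path itself.
    roots = [(name, target / name) for name in categories] or [("", target)]
    inventory = []
    for category, folder in roots:
        for md in sorted(folder.glob("*.md")):
            if md.name == "index.md":
                continue  # generated index, not a source document
            inventory.append(f"{category}/{md.name}" if category else md.name)
    return inventory
```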

Step 4: Determine Processing Scope

If --rebuild mode: All discovered documents need processing.

If --status mode: Skip to Step 4b (health report).

If incremental mode (default): For each document, check if processing is needed:

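# Source file modification time in epoch seconds.
# `stat -f %m` is BSD/macOS syntax; on GNU/Linux use `stat -c %Y` instead.
# In flat mode, drop the {category}/ segment from both paths.
source_file="{path}/{category}/{filename}.md"
source_mtime=$(stat -f %m "$source_file")

# Summary file modification time (0 if the summary does not exist)
summary_file="{path}/{category}/.summaries/{filename_without_ext}.json"
if [ -f "$summary_file" ]; then
  summary_mtime=$(stat -f %m "$summary_file")
else
  summary_mtime=0
fi

# Process if the source is newer than its summary, or the summary is missing
if [ "$source_mtime" -gt "$summary_mtime" ]; then
  echo "needs processing: $source_file"
fi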

Display:

Documents needing update: {N} of {total}
  {list of filenames if N > 0}

If N == 0: Display "Index is current. No updates needed." and exit.

Step 4b: Health Report (`--status` mode)

For each category:

1. Count .md files (excluding index.md).
2. Count .summaries/*.json files.
3. Identify documents missing summaries (source exists, no summary).
4. Identify orphaned summaries (summary exists, no source).

Read generated_at from the top-level index.json, if it exists, for the "Last updated" line.

Display:

Knowledge Index Health
══════════════════════════════════════════
Category      Docs  Indexed  Missing  Orphaned
──────────────────────────────────────────
patterns       10      10       0        0
tools           6       6       0        0
workflows       9       9       0        0
──────────────────────────────────────────
Total          25      25       0        0

Last updated: 2026-03-17T12:07:09Z
Status: ✓ Index is healthy

If any documents are missing summaries, list their filenames; do the same for any orphaned summaries. Exit after displaying the report without modifying any files.
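A per-category health check can be expressed as set differences over file stems. This sketch assumes summaries are named after the source file's stem (as in Step 5e) and, following the counting rule in item 2, reports the raw number of summary files as "Indexed". `category_health` is a hypothetical name.

```python
from pathlib import Path


def category_health(folder: Path) -> dict:
    """Step 4b counts for one category folder (hypothetical helper)."""
    docs = {p.stem for p in folder.glob("*.md") if p.name != "index.md"}
    summary_dir = folder / ".summaries"
    summaries = {p.stem for p in summary_dir.glob("*.json")} if summary_dir.is_dir() else set()
    return {
        "docs": len(docs),
        "indexed": len(summaries),            # raw count of .summaries/*.json files
        "missing": sorted(docs - summaries),   # source exists, no summary
        "orphaned": sorted(summaries - docs),  # summary exists, no source
    }
```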

Step 5: Generate Summaries

For each document needing processing:

5a: Read Document Content

Read the document file.

If the document exceeds 1000 lines (FR-5):

- Read only the first 200 lines and the last 50 lines.
- word_count must still reflect the full document.
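A minimal sketch of the FR-5 rule. Counting words by whitespace split (including any frontmatter) is an assumption; the spec only says the count covers the full document. `read_document_content` is hypothetical.

```python
def read_document_content(text: str, max_lines: int = 1000,
                          head: int = 200, tail: int = 50) -> tuple[str, int]:
    """Return (content to summarize, word count of the whole document)."""
    lines = text.splitlines()
    word_count = len(text.split())  # counted before any truncation
    if len(lines) > max_lines:
        # FR-5: keep only the first `head` and last `tail` lines
        lines = lines[:head] + lines[-tail:]
    return "\n".join(lines), word_count
```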

5b: Parse Frontmatter

Extract these YAML frontmatter fields:

- title, author, date (or date_saved), tags, category, relevance
- summary, deep_evaluated, evaluation_findings, proposals_generated
- document_type (if present)

If no frontmatter is found (EC-1 from SPEC-070): set frontmatter_present: false and extract the title from the first H1 heading.
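Frontmatter splitting and the EC-1 fallback can be sketched with the standard library alone. This is deliberately simplified: it handles only flat `key: value` lines and inline `[a, b]` lists and returns every scalar as a string, so a real implementation would more likely use a YAML parser. `parse_frontmatter` is hypothetical.

```python
import re


def parse_frontmatter(text: str) -> tuple[dict, str, bool]:
    """Return (fields, body, frontmatter_present) for a markdown document."""
    match = re.match(r"^---\s*\n(.*?)\n---\s*(?:\n|$)", text, re.DOTALL)
    if not match:
        fields = {}
        h1 = re.search(r"^# (.+)$", text, re.MULTILINE)
        if h1:
            fields["title"] = h1.group(1).strip()  # EC-1: fall back to the first H1
        return fields, text, False
    fields = {}
    for line in match.group(1).splitlines():
        key, sep, value = line.partition(":")
        if not sep or not key.strip():
            continue  # not a flat `key: value` line
        value = value.strip().strip("\"'")
        if value.startswith("[") and value.endswith("]"):
            # inline list such as [ai, tools]
            value = [v.strip().strip("\"'") for v in value[1:-1].split(",") if v.strip()]
        fields[key.strip()] = value
    return fields, text[match.end():], True
```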

5c: Detect Document Type

Transcript detection heuristic:

Read the first 100 lines of the document body (after frontmatter)
Search for bold speaker-attribution markers of the form **[timestamp] Name:**
Examples: **[12:34] Speaker Name:** or **[1:23:45] Speaker:**
Count the number of matching lines
If 3 or more lines match → document_type: "transcript"
If fewer than 3 lines match → document_type: "article" (default)

If transcript detected, extract:

participants: collect the unique speaker names from every **[timestamp] Name:** marker (MM:SS or H:MM:SS) throughout the document
meeting_date: use the frontmatter date field, or parse the date prefix (YYYY-MM-DD) from the filename
projects_discussed: scan for:
  V-number references (e.g., V025, V004) → extract as project identifiers
  H2/H3 headings that appear to be project names
  Then build an array of {project, summary} objects
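The detection heuristic and the participant and V-number extraction map naturally onto regular expressions. The patterns below are assumptions: timestamps of the form MM:SS or H:MM:SS, and V-numbers with exactly three digits as in the examples. Turning headings into `{project, summary}` objects needs judgment and is left out. All function names are hypothetical.

```python
import re

# Matches **[12:34] Speaker Name:** and **[1:23:45] Speaker:**
SPEAKER = re.compile(r"\*\*\[(?:\d{1,2}:)?\d{1,2}:\d{2}\]\s*([^:*\]]+?):\*\*")
V_NUMBER = re.compile(r"\bV\d{3}\b")


def detect_document_type(body: str) -> str:
    """Transcript if 3 or more of the first 100 body lines carry a speaker marker."""
    first_lines = body.splitlines()[:100]
    hits = sum(1 for line in first_lines if SPEAKER.search(line))
    return "transcript" if hits >= 3 else "article"


def extract_participants(body: str) -> list[str]:
    """Unique speaker names across the whole document, in order of first appearance."""
    seen = []
    for name in SPEAKER.findall(body):
        name = name.strip()
        if name not in seen:
            seen.append(name)
    return seen


def extract_v_numbers(body: str) -> list[str]:
    """V-number project references such as V025, deduplicated in order."""
    return list(dict.fromkeys(V_NUMBER.findall(body)))
```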

5d: Build Summary JSON

Construct the summary JSON per SPEC-070 schema:

{
  "version": "1.0",
  "generated_at": "{ISO-8601 timestamp}",
  "source_file": "../{filename}",
  "document_type": "{article|transcript}",
  "title": "{from frontmatter or H1}",
  "author": "{from frontmatter or empty}",
  "date_saved": "{from frontmatter date or date_saved}",
  "category": "{from path or frontmatter}",
  "relevance": "{from frontmatter or 'reference'}",
  "tags": ["{from frontmatter}"],
  "summary": "{from frontmatter}",
  "key_topics": ["{extracted from H2 headings, max 8}"],
  "key_findings": [],
  "entities_mentioned": [],
  "word_count": {count},
  "deep_evaluated": {from frontmatter or false},
  "evaluation_findings": {from frontmatter or 0},
  "proposals_generated": {from frontmatter or 0},
  "frontmatter_present": {true|false}
}

If document_type is "transcript", add extension fields:

{
  "meeting_date": "{YYYY-MM-DD}",
  "participants": ["Name1", "Name2"],
  "projects_discussed": [
    {"project": "V025", "summary": "..."}
  ]
}
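Assembling the summary object from the pieces gathered in 5a to 5c could look like this. The sketch assumes `fields` already holds native YAML types (booleans, integers, lists), for example from a YAML parser, and that `generated_at` is UTC with a `Z` suffix; `key_findings` and `entities_mentioned` stay empty as in the schema template. `build_summary` and its parameters are hypothetical.

```python
import re
from datetime import datetime, timezone
from typing import Optional


def build_summary(filename: str, category: str, fields: dict, body: str,
                  word_count: int, frontmatter_present: bool, doc_type: str,
                  transcript: Optional[dict] = None) -> dict:
    """Assemble the Step 5d summary fields for one document."""
    h2 = re.findall(r"^## (.+)$", body, re.MULTILINE)
    tags = fields.get("tags", [])
    summary = {
        "version": "1.0",
        "generated_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "source_file": f"../{filename}",  # relative to the .summaries/ directory
        "document_type": doc_type,
        "title": fields.get("title", ""),
        "author": fields.get("author", ""),
        "date_saved": fields.get("date", fields.get("date_saved", "")),
        "category": category or fields.get("category", ""),  # path wins over frontmatter
        "relevance": fields.get("relevance", "reference"),
        "tags": tags if isinstance(tags, list) else [tags],
        "summary": fields.get("summary", ""),
        "key_topics": [h.strip() for h in h2[:8]],  # at most 8 H2 headings
        "key_findings": [],
        "entities_mentioned": [],
        "word_count": word_count,
        "deep_evaluated": fields.get("deep_evaluated", False),
        "evaluation_findings": fields.get("evaluation_findings", 0),
        "proposals_generated": fields.get("proposals_generated", 0),
        "frontmatter_present": frontmatter_present,
    }
    if doc_type == "transcript" and transcript:
        summary.update(transcript)  # meeting_date, participants, projects_discussed
    return summary
```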

5e: Write Summary

Create .summaries/ directory if it doesn't exist:
```
mkdir -p {path}/{category}/.summaries
```
Write the JSON to .summaries/{filename_without_ext}.json

Display: ✓ {filename} → .summaries/{filename_without_ext}.json
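Writing the file is a small step; this sketch uses `pathlib` so the `.summaries/` directory is created on first write (EC-7). The two-space indentation and trailing newline are formatting assumptions, and `write_summary` is hypothetical.

```python
import json
from pathlib import Path


def write_summary(doc_path: Path, summary: dict) -> Path:
    """Write .summaries/{stem}.json next to the source document."""
    summary_dir = doc_path.parent / ".summaries"
    summary_dir.mkdir(parents=True, exist_ok=True)  # EC-7: create on first write
    out = summary_dir / f"{doc_path.stem}.json"
    out.write_text(json.dumps(summary, indent=2, ensure_ascii=False) + "\n", encoding="utf-8")
    return out
```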

Step 6: Build Category Indexes

For each category that had documents processed (or all categories if --rebuild):

Read all .summaries/*.json files in the category

Build index.json:

{
  "version": "1.0",
  "generated_at": "{timestamp}",
  "category": "{name}",
  "description": "{category description}",
  "documents": [{sorted by date_saved descending}]
}

Category descriptions:

patterns → "Coding patterns, skill patterns, agent patterns, workflow patterns"
tools → "Tool configurations, CLI techniques, IDE integrations, MCP servers"
workflows → "Process improvements, methodology articles, development workflows"
Any other category → "{category} documents"

Build index.md:

# {Category Title}

> Auto-generated. Do not edit manually. Last updated: {timestamp}

| Date | Title | Tags | Evaluated |
|------|-------|------|-----------|
| ... | ... | ... | Yes/No |
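One way to produce both category files from the summaries on disk. Assumptions not fixed by the spec: each entry in `documents` is the full summary object, `{Category Title}` is the directory name title-cased, `date_saved` values are ISO dates (so string sorting is chronological), and unreadable summary files are skipped. `build_category_index` is hypothetical.

```python
import json
from pathlib import Path


def build_category_index(folder: Path, description: str, timestamp: str) -> tuple[dict, str]:
    """Build index.json data and index.md text for one category."""
    summary_dir = folder / ".summaries"
    paths = sorted(summary_dir.glob("*.json")) if summary_dir.is_dir() else []
    summaries = []
    for path in paths:
        try:
            summaries.append(json.loads(path.read_text(encoding="utf-8")))
        except json.JSONDecodeError:
            continue  # skip an unreadable summary rather than failing the whole index
    summaries.sort(key=lambda s: s.get("date_saved", ""), reverse=True)  # newest first
    index = {
        "version": "1.0",
        "generated_at": timestamp,
        "category": folder.name,
        "description": description,
        "documents": summaries,  # EC-2: stays [] for an empty category
    }
    rows = [
        f"| {s.get('date_saved', '')} | {s.get('title', '')} | "
        f"{', '.join(s.get('tags', []))} | {'Yes' if s.get('deep_evaluated') else 'No'} |"
        for s in summaries
    ]
    markdown = "\n".join([
        f"# {folder.name.title()}",
        "",
        f"> Auto-generated. Do not edit manually. Last updated: {timestamp}",
        "",
        "| Date | Title | Tags | Evaluated |",
        "|------|-------|------|-----------|",
        *rows,
    ]) + "\n"
    return index, markdown
```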

Flat mode alternative: If flat mode (Step 2), skip category indexes entirely. Generate a single index.json and index.md at the target path root containing all documents directly (no categories wrapper):

{
  "version": "1.0",
  "generated_at": "{timestamp}",
  "total_documents": {N},
  "documents": [{all documents sorted by date}]
}

Step 7: Build Top-Level Index

Category mode only (skip in flat mode — Step 6 already generated the single index):

Read each {category}/index.json
For each category, compute:
  Document count
  Top 3 tags by frequency
  Most recent document filename

Build the top-level index.json at the target path root:

{
  "version": "1.0",
  "generated_at": "{timestamp}",
  "total_documents": {sum},
  "categories": {
    "{name}": {"count": N, "description": "...", "top_tags": [...], "recent": [...]}
  }
}

Build the top-level index.md at the target path root:

# Knowledge Index

> Auto-generated. Do not edit manually. Last updated: {timestamp}

| Category | Documents | Top Tags |
|----------|-----------|----------|
| [patterns](patterns/index.md) | 10 | tag1, tag2, tag3 |
| ... |

**Total**: {N} documents
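The per-category figures for the top-level index can be derived from each category's `index.json`. This sketch assumes `documents` is already sorted newest first (as Step 6 specifies) and fills `recent` with only the single newest filename; how many entries `recent` should hold is not specified. Function names are hypothetical.

```python
from collections import Counter


def summarize_category(index: dict) -> dict:
    """Per-category entry for the top-level index.json."""
    docs = index.get("documents", [])
    tag_counts = Counter(tag for doc in docs for tag in doc.get("tags", []))
    # "../name.md" -> "name.md"; docs[:1] is the newest document, if any
    recent = [doc.get("source_file", "").rsplit("/", 1)[-1] for doc in docs[:1]]
    return {
        "count": len(docs),
        "description": index.get("description", ""),
        "top_tags": [tag for tag, _ in tag_counts.most_common(3)],
        "recent": recent,
    }


def build_top_level(category_indexes: dict, timestamp: str) -> dict:
    """Combine {category name: category index.json data} into the top-level index."""
    categories = {name: summarize_category(idx) for name, idx in sorted(category_indexes.items())}
    return {
        "version": "1.0",
        "generated_at": timestamp,
        "total_documents": sum(c["count"] for c in categories.values()),
        "categories": categories,
    }
```

Counter.most_common breaks ties by first occurrence, so equally frequent tags keep the order in which they first appear in the newest-first document list.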

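Step 8: Stage Generated Files (FR-14)

Stage all generated or modified index and summary files. Quote each pathspec and use git's :(glob) magic so that git, not the shell, expands **. Unquoted, a shell without globstar treats ** like *, which matches only one directory level and misses the top-level index files:

git add -- ':(glob){path}/**/index.json' ':(glob){path}/**/index.md' ':(glob){path}/**/.summaries/*.json'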

If any .gitkeep files were created in new .summaries/ directories, stage those too.

Step 9: Report

Display completion summary:

Knowledge Index Updated
═══════════════════════════════════════════
Summaries:  {N} generated ({M} skipped — up to date)
Categories: {K} indexes rebuilt
Top-level:  ✓ Updated ({total} documents)

Files staged for commit.
═══════════════════════════════════════════

Edge Cases

| ID | Edge Case | Handling |
|----|-----------|----------|
| EC-1 | Source document has no frontmatter | Set `frontmatter_present: false`, extract title from H1 |
| EC-2 | Empty category directory | Generate empty index with `"documents": []` |
| EC-3 | Summary generation fails for one document | Log warning, skip, continue with remaining |
| EC-4 | Non-markdown files in directory | Skip; only `.md` files get summaries |
| EC-5 | `index.json` exists but is invalid JSON | Overwrite with freshly generated index |
| EC-6 | Document moved between categories | Old summary becomes orphaned; `--rebuild` cleans it up. `--status` reports it |
| EC-7 | `.summaries/` directory does not exist | Create it on first summary write (Step 5e) |
| EC-8 | Transcript with no identifiable projects | Set `projects_discussed: []`; still mark as `document_type: "transcript"` |
| EC-9 | Flat folder (no categories) | Single-level index only (Step 6 flat mode) |

Integration Points

| Skill or script | Relationship |
|-----------------|--------------|
| `/inbox-qualify` | Its Step 5d calls this skill's logic after routing a document |
| `/content-evaluate` | Its Step 7c updates existing summaries with evaluation metadata |
| `scripts/generate-knowledge-index.sh` | Seed script from SPEC-070; `--rebuild` can delegate to it |
