vt-c-knowledge-index

Manage the knowledge index — generate summaries, update indexes, check health. Supports incremental updates, full rebuild, and health status reporting.

Plugin: core-standards
Category: Other
Command: /vt-c-knowledge-index


/vt-c-knowledge-index — Knowledge Index Manager

Generate and maintain the hierarchical knowledge index defined by SPEC-070. Creates per-document summaries, category indexes, and a top-level index for fast AI navigation of the knowledge base.

When to Use

  • After adding new documents to intake/knowledge/
  • To check index health and find missing summaries
  • To rebuild the entire index after schema changes
  • To index any arbitrary folder of markdown documents

Invocation

/vt-c-knowledge-index                           # Incremental update (default)
/vt-c-knowledge-index --rebuild                  # Full rebuild from scratch
/vt-c-knowledge-index --status                   # Health check only
/vt-c-knowledge-index /path/to/folder            # Index an arbitrary folder
/vt-c-knowledge-index /path/to/folder --rebuild  # Full rebuild of arbitrary folder

Execution Steps

Step 1: Parse Arguments

Read the invocation arguments:

  • Path: First non-flag argument. Default: intake/knowledge/
  • Mode:
      • No flags → incremental (default)
      • --rebuild → full rebuild
      • --status → health check only

Display:

Knowledge Index Manager
═══════════════════════
Target: {path}
Mode:   {incremental | rebuild | status}
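
As a rough illustration of the argument rules above, a minimal Python sketch follows. The names `Invocation` and `parse_invocation` are hypothetical, and letting --status win when both flags are given is an assumption; the rules above do not say how that combination is handled.

```python
from dataclasses import dataclass

DEFAULT_PATH = "intake/knowledge/"


@dataclass
class Invocation:
    path: str
    mode: str  # "incremental", "rebuild", or "status"


def parse_invocation(args: list[str]) -> Invocation:
    """First non-flag argument is the path; flags select the mode."""
    path = next((a for a in args if not a.startswith("--")), DEFAULT_PATH)
    if "--status" in args:
        mode = "status"  # assumption: the read-only mode takes precedence
    elif "--rebuild" in args:
        mode = "rebuild"
    else:
        mode = "incremental"
    return Invocation(path=path, mode=mode)
```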

Step 2: Discover Structure

Determine if target path uses categories or is flat:

  1. List subdirectories of the target path
  2. For each subdirectory, check if it contains .md files (exclude hidden dirs like .summaries)
  3. If at least one subdirectory contains .md files → category mode
  4. If no subdirectories contain .md files → flat mode (documents directly in target path)

Display:

Structure: {category | flat}
Categories: {list of category names, or "flat (no categories)"}
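
The category-versus-flat decision could be sketched in Python roughly as follows. The helper names `discover_categories` and `structure_mode` are hypothetical, and only direct children of each subdirectory are checked for .md files, which is one reading of the rule above.

```python
from pathlib import Path


def discover_categories(target: Path) -> list[str]:
    """Names of non-hidden subdirectories holding at least one .md file."""
    categories = []
    for sub in sorted(target.iterdir()):
        # Skip plain files and hidden directories such as .summaries
        if not sub.is_dir() or sub.name.startswith("."):
            continue
        if any(p.is_file() for p in sub.glob("*.md")):
            categories.append(sub.name)
    return categories


def structure_mode(target: Path) -> str:
    """'category' if any subdirectory contains .md files, else 'flat'."""
    return "category" if discover_categories(target) else "flat"
```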

Step 3: Discover Documents

For each category (or root path in flat mode):

  1. List all .md files (exclude index.md)
  2. Build a document inventory: {category}/{filename}.md

Display: Found {N} documents across {M} categories
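
A small Python sketch of the inventory step, under the assumption that flat mode is represented by an empty category name. `list_documents` and `build_inventory` are illustrative names, not part of the command.

```python
from pathlib import Path


def list_documents(folder: Path) -> list[Path]:
    """All .md files directly in `folder`, excluding the generated index.md."""
    return sorted(
        p for p in folder.glob("*.md") if p.is_file() and p.name != "index.md"
    )


def build_inventory(target: Path, categories: list[str]) -> dict[str, list[str]]:
    """Map each category ('' in flat mode) to its document filenames."""
    if not categories:
        return {"": [p.name for p in list_documents(target)]}
    return {c: [p.name for p in list_documents(target / c)] for c in categories}
```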

Step 4: Determine Processing Scope

If --rebuild mode: All discovered documents need processing.

If --status mode: Skip to Step 4b (health report).

If incremental mode (default): For each document, check if processing is needed:

# Get source file modification time
# (stat -f %m is the BSD/macOS form; GNU coreutils uses stat -c %Y instead)
source_mtime=$(stat -f %m "{path}/{category}/{filename}.md")

# Get summary file modification time (0 if not exists)
summary_file="{path}/{category}/.summaries/{filename_without_ext}.json"
if [ -f "$summary_file" ]; then
  summary_mtime=$(stat -f %m "$summary_file")
else
  summary_mtime=0
fi

# Process if source is newer or summary doesn't exist
if [ "$source_mtime" -gt "$summary_mtime" ]; then
  needs_processing=true
else
  needs_processing=false
fi

Display:

Documents needing update: {N} of {total}
  {list of filenames if N > 0}

If N == 0: Display "Index is current. No updates needed." and exit.
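
The same freshness check can be written portably in Python, which avoids platform differences in stat. This is a sketch only: `summary_path` and `needs_processing` are hypothetical names, and it compares floating-point mtimes where the shell version compares whole seconds.

```python
from pathlib import Path


def summary_path(doc: Path) -> Path:
    """Location of the summary JSON that belongs to a source document."""
    return doc.parent / ".summaries" / f"{doc.stem}.json"


def needs_processing(doc: Path) -> bool:
    """True if the summary is missing or older than its source document."""
    summary = summary_path(doc)
    if not summary.exists():
        return True
    return doc.stat().st_mtime > summary.stat().st_mtime
```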

Step 4b: Health Report (--status mode)

For each category:

  1. Count .md files (excluding index.md)
  2. Count .summaries/*.json files
  3. Identify documents missing summaries (source exists, no summary)
  4. Identify orphaned summaries (summary exists, no source)

Read generated_at from index.json if it exists.

Display:

Knowledge Index Health
══════════════════════════════════════════
Category      Docs  Indexed  Missing  Orphaned
──────────────────────────────────────────
patterns       10      10       0        0
tools           6       6       0        0
workflows       9       9       0        0
──────────────────────────────────────────
Total          25      25       0        0

Last updated: 2026-03-17T12:07:09Z
Status: ✓ Index is healthy

If missing > 0, list the filenames. If orphaned > 0, list the filenames. Exit after displaying report (no modifications).
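
One way to compute a single row of the health table is with set differences over file stems. The sketch below uses a hypothetical `category_health` helper and, following the list above, counts every .summaries/*.json file as indexed, orphans included.

```python
from pathlib import Path


def category_health(category_dir: Path) -> dict:
    """Counts plus missing/orphaned filenames for one category directory."""
    docs = {p.stem for p in category_dir.glob("*.md") if p.name != "index.md"}
    summaries_dir = category_dir / ".summaries"
    summaries = (
        {p.stem for p in summaries_dir.glob("*.json")}
        if summaries_dir.is_dir()
        else set()
    )
    return {
        "docs": len(docs),
        "indexed": len(summaries),
        # Source exists, no summary
        "missing": sorted(f"{s}.md" for s in docs - summaries),
        # Summary exists, no source
        "orphaned": sorted(f"{s}.json" for s in summaries - docs),
    }
```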

Step 5: Generate Summaries

For each document needing processing:

5a: Read Document Content

Read the document file.

If the document exceeds 1000 lines (FR-5):

  • Read only the first 200 lines and last 50 lines
  • Note: word_count should still reflect the full document
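
A minimal sketch of the truncation rule, assuming word_count means whitespace-separated tokens over the whole file (the exact counting rule is not given here). `read_for_summary` is a hypothetical name.

```python
def read_for_summary(text: str, max_lines: int = 1000, head: int = 200, tail: int = 50):
    """Return (excerpt, word_count); long documents keep only head and tail lines."""
    lines = text.splitlines()
    word_count = len(text.split())  # always counted over the full document
    if len(lines) > max_lines:
        lines = lines[:head] + lines[-tail:]
    return "\n".join(lines), word_count
```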

5b: Parse Frontmatter

Extract YAML frontmatter fields:

  • title, author, date (or date_saved), tags, category, relevance
  • summary, deep_evaluated, evaluation_findings, proposals_generated
  • document_type (if present)

If no frontmatter found (EC-1 from SPEC-070): set frontmatter_present: false and extract title from first H1 heading.
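
The frontmatter split and the H1 fallback might look like the following Python sketch. It is deliberately simplified: it handles only flat `key: value` lines and is not a YAML parser, so list-valued fields such as tags would need a real YAML library. `split_frontmatter` and `resolve_title` are hypothetical names.

```python
import re


def split_frontmatter(text: str):
    """Return (fields or None, body). Handles flat 'key: value' lines only."""
    match = re.match(r"\A---\s*\n(.*?)\n---\s*\n?", text, re.DOTALL)
    if not match:
        return None, text
    fields = {}
    for line in match.group(1).splitlines():
        key, sep, value = line.partition(":")
        # Ignore indented or list lines; a real YAML parser would handle them
        if sep and key.strip() and not line.startswith((" ", "\t", "-")):
            fields[key.strip()] = value.strip().strip("\"'")
    return fields, text[match.end():]


def resolve_title(text: str) -> dict:
    """Title from frontmatter, else from the first H1 heading (EC-1)."""
    fields, body = split_frontmatter(text)
    title = (fields or {}).get("title", "")
    if not title:
        h1 = re.search(r"^# +(.+?)\s*$", body, re.MULTILINE)
        title = h1.group(1) if h1 else ""
    return {"title": title, "frontmatter_present": fields is not None}
```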

5c: Detect Document Type

Transcript detection heuristic:

  1. Read the first 100 lines of the document body (after frontmatter)
  2. Search for speaker attribution patterns matching: **[ + timestamp + ] + name + :**
  3. Pattern: lines containing text like **[12:34] Speaker Name:** or **[1:23:45] Speaker:**
  4. Count the number of matching lines
  5. If 3 or more matches → document_type: "transcript"
  6. If fewer than 3 matches → document_type: "article" (default)

If transcript detected, extract:

  • participants: Collect unique speaker names from all **[MM:SS] Name:** patterns throughout the document
  • meeting_date: Use frontmatter date field, or parse from filename date prefix (YYYY-MM-DD)
  • projects_discussed: Scan for:
      • V-number references (e.g., V025, V004) → extract as project identifiers
      • H2/H3 headings that appear to be project names
      • Build array of {project, summary} objects
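
The detection heuristic can be approximated with two regular expressions, as in the sketch below. The exact patterns are assumptions inferred from the examples above (MM:SS or H:MM:SS timestamps, three-digit V-numbers), and the function names are hypothetical. Heading-based project detection is left out because it needs judgment rather than a pattern.

```python
import re

# Matches e.g. **[12:34] Speaker Name:** or **[1:23:45] Speaker:**
SPEAKER = re.compile(r"\*\*\[(?:\d{1,2}:)?\d{1,2}:\d{2}\]\s+([^:*]+):\*\*")
# Assumption: project identifiers are "V" plus three digits, as in V025
V_NUMBER = re.compile(r"\bV\d{3}\b")


def detect_document_type(body: str) -> str:
    """'transcript' if 3+ speaker lines appear in the first 100 body lines."""
    head = body.splitlines()[:100]
    matches = sum(1 for line in head if SPEAKER.search(line))
    return "transcript" if matches >= 3 else "article"


def extract_participants(body: str) -> list[str]:
    """Unique speaker names across the whole document, in first-seen order."""
    seen = []
    for name in SPEAKER.findall(body):
        name = name.strip()
        if name not in seen:
            seen.append(name)
    return seen


def extract_project_ids(body: str) -> list[str]:
    """Sorted unique V-number references."""
    return sorted(set(V_NUMBER.findall(body)))
```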

5d: Build Summary JSON

Construct the summary JSON per SPEC-070 schema:

{
  "version": "1.0",
  "generated_at": "{ISO-8601 timestamp}",
  "source_file": "../{filename}",
  "document_type": "{article|transcript}",
  "title": "{from frontmatter or H1}",
  "author": "{from frontmatter or empty}",
  "date_saved": "{from frontmatter date or date_saved}",
  "category": "{from path or frontmatter}",
  "relevance": "{from frontmatter or 'reference'}",
  "tags": ["{from frontmatter}"],
  "summary": "{from frontmatter}",
  "key_topics": ["{extracted from H2 headings, max 8}"],
  "key_findings": [],
  "entities_mentioned": [],
  "word_count": {count},
  "deep_evaluated": {from frontmatter or false},
  "evaluation_findings": {from frontmatter or 0},
  "proposals_generated": {from frontmatter or 0},
  "frontmatter_present": {true|false}
}

If document_type is "transcript", add extension fields:

{
  "meeting_date": "{YYYY-MM-DD}",
  "participants": ["Name1", "Name2"],
  "projects_discussed": [
    {"project": "V025", "summary": "..."}
  ]
}
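
Assembling the base summary object from parsed frontmatter could be sketched as follows. `build_summary` and `extract_key_topics` are hypothetical helpers. The sketch assumes `fields` is a dict of already-parsed frontmatter (or None), prefers date over date_saved and the path-derived category over the frontmatter one, and leaves the H1 title fallback and the transcript extension fields to the caller.

```python
import re
from datetime import datetime, timezone


def extract_key_topics(body: str, limit: int = 8) -> list[str]:
    """H2 heading text in document order, capped at `limit`."""
    return re.findall(r"^## +(.+?)\s*$", body, re.MULTILINE)[:limit]


def build_summary(fields, body, filename, category, word_count):
    """Build the base summary dict; `fields` is None when frontmatter is absent."""
    fm = fields or {}
    return {
        "version": "1.0",
        "generated_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "source_file": f"../{filename}",
        "document_type": fm.get("document_type", "article"),
        "title": fm.get("title", ""),
        "author": fm.get("author", ""),
        "date_saved": fm.get("date") or fm.get("date_saved", ""),
        "category": category or fm.get("category", ""),
        "relevance": fm.get("relevance", "reference"),
        "tags": fm.get("tags", []),
        "summary": fm.get("summary", ""),
        "key_topics": extract_key_topics(body),
        "key_findings": [],
        "entities_mentioned": [],
        "word_count": word_count,
        "deep_evaluated": fm.get("deep_evaluated", False),
        "evaluation_findings": fm.get("evaluation_findings", 0),
        "proposals_generated": fm.get("proposals_generated", 0),
        "frontmatter_present": fields is not None,
    }
```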

5e: Write Summary

  1. Create .summaries/ directory if it doesn't exist:
    mkdir -p {path}/{category}/.summaries
    
  2. Write the JSON to .summaries/{filename_without_ext}.json

Display: ✓ {filename} → .summaries/{filename_without_ext}.json

Step 6: Build Category Indexes

For each category that had documents processed (or all categories if --rebuild):

  1. Read all .summaries/*.json files in the category
  2. Build index.json:
    {
      "version": "1.0",
      "generated_at": "{timestamp}",
      "category": "{name}",
      "description": "{category description}",
      "documents": [{sorted by date_saved descending}]
    }
    
    Category descriptions:
      • patterns → "Coding patterns, skill patterns, agent patterns, workflow patterns"
      • tools → "Tool configurations, CLI techniques, IDE integrations, MCP servers"
      • workflows → "Process improvements, methodology articles, development workflows"
      • Other → "{category} documents"

  3. Build index.md:

    # {Category Title}
    
    > Auto-generated. Do not edit manually. Last updated: {timestamp}
    
    | Date | Title | Tags | Evaluated |
    |------|-------|------|-----------|
    | ... | ... | ... | Yes/No |
    

Flat mode alternative: If flat mode (Step 2), skip category indexes entirely. Generate a single index.json and index.md at the target path root containing all documents directly (no categories wrapper):

{
  "version": "1.0",
  "generated_at": "{timestamp}",
  "total_documents": {N},
  "documents": [{all documents sorted by date}]
}
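
A sketch of the category index build in Python, for illustration. It assumes each entry in documents is the full summary object, that ISO-style date_saved strings sort correctly as text, and that the Evaluated column reflects deep_evaluated; none of these is confirmed here. The function names are hypothetical.

```python
CATEGORY_DESCRIPTIONS = {
    "patterns": "Coding patterns, skill patterns, agent patterns, workflow patterns",
    "tools": "Tool configurations, CLI techniques, IDE integrations, MCP servers",
    "workflows": "Process improvements, methodology articles, development workflows",
}


def category_description(name: str) -> str:
    return CATEGORY_DESCRIPTIONS.get(name, f"{name} documents")


def build_category_index(name: str, summaries: list[dict], generated_at: str) -> dict:
    """index.json content with documents sorted by date_saved, newest first."""
    documents = sorted(summaries, key=lambda s: s.get("date_saved", ""), reverse=True)
    return {
        "version": "1.0",
        "generated_at": generated_at,
        "category": name,
        "description": category_description(name),
        "documents": documents,
    }


def render_index_md(index: dict) -> str:
    """index.md content: one table row per document."""
    lines = [
        f"# {index['category'].capitalize()}",
        "",
        f"> Auto-generated. Do not edit manually. Last updated: {index['generated_at']}",
        "",
        "| Date | Title | Tags | Evaluated |",
        "|------|-------|------|-----------|",
    ]
    for doc in index["documents"]:
        date = doc.get("date_saved", "")
        title = doc.get("title", "")
        tags = ", ".join(doc.get("tags", []))
        evaluated = "Yes" if doc.get("deep_evaluated") else "No"
        lines.append(f"| {date} | {title} | {tags} | {evaluated} |")
    return "\n".join(lines) + "\n"
```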

Step 7: Build Top-Level Index

Category mode only (skip in flat mode — Step 6 already generated the single index):

  1. Read each {category}/index.json
  2. For each category, compute:
      • Document count
      • Top 3 tags by frequency
      • Most recent document filename
  3. Build index.json:
    {
      "version": "1.0",
      "generated_at": "{timestamp}",
      "total_documents": {sum},
      "categories": {
        "{name}": {"count": N, "description": "...", "top_tags": [...], "recent": [...]}
      }
    }
    
  4. Build index.md:
    # Knowledge Index
    
    > Auto-generated. Do not edit manually. Last updated: {timestamp}
    
    | Category | Documents | Top Tags |
    |----------|-----------|----------|
    | [patterns](patterns/index.md) | 10 | tag1, tag2, tag3 |
    | ... |
    
    **Total**: {N} documents
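
The per-category aggregation can be sketched with `collections.Counter`. The sketch assumes each category index lists documents newest first, that each document carries tags and source_file, and that "recent" holds a one-element list with the newest filename; the shape of "recent" is not specified beyond the placeholder above. `category_entry` and `build_top_level_index` are hypothetical names.

```python
from collections import Counter


def category_entry(index: dict, top_n: int = 3) -> dict:
    """Summarize one category index for the top-level index."""
    docs = index.get("documents", [])
    tag_counts = Counter(tag for d in docs for tag in d.get("tags", []))
    return {
        "count": len(docs),
        "description": index.get("description", ""),
        "top_tags": [tag for tag, _ in tag_counts.most_common(top_n)],
        # Assumption: newest document comes first; keep only its bare filename
        "recent": [d.get("source_file", "").rsplit("/", 1)[-1] for d in docs[:1]],
    }


def build_top_level_index(indexes: dict, generated_at: str) -> dict:
    """Combine per-category indexes (name -> index dict) into the top-level index."""
    categories = {name: category_entry(idx) for name, idx in indexes.items()}
    return {
        "version": "1.0",
        "generated_at": generated_at,
        "total_documents": sum(c["count"] for c in categories.values()),
        "categories": categories,
    }
```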
    

Step 8: Stage Generated Files (FR-14)

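Stage all generated/modified files. The pathspecs are quoted and use git's glob magic so that ** also matches files directly under the target path, such as the top-level index:

git add ':(glob){path}/**/index.json' ':(glob){path}/**/index.md' ':(glob){path}/**/.summaries/*.json'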

If any .gitkeep files were created in new .summaries/ directories, stage those too.

Step 9: Report

Display completion summary:

Knowledge Index Updated
═══════════════════════════════════════════
Summaries:  {N} generated ({M} skipped — up to date)
Categories: {K} indexes rebuilt
Top-level:  ✓ Updated ({total} documents)

Files staged for commit.
═══════════════════════════════════════════

Edge Cases

| ID | Edge Case | Handling |
|----|-----------|----------|
| EC-1 | Source document has no frontmatter | Set frontmatter_present: false, extract title from H1 |
| EC-2 | Empty category directory | Generate empty index with "documents": [] |
| EC-3 | Summary generation fails for one document | Log warning, skip, continue with remaining |
| EC-4 | Non-markdown files in directory | Skip; only .md files get summaries |
| EC-5 | index.json exists but is invalid JSON | Overwrite with freshly generated index |
| EC-6 | Document moved between categories | Old summary becomes orphaned; --rebuild cleans it up. --status reports it |
| EC-7 | .summaries/ directory does not exist | Create it on first summary write (Step 5e) |
| EC-8 | Transcript with no identifiable projects | Set projects_discussed: []; still mark as document_type: "transcript" |
| EC-9 | Flat folder (no categories) | Single-level index only (Step 6 flat mode) |

Integration Points

| Skill | Relationship |
|-------|--------------|
| /inbox-qualify | Step 5d calls this skill's logic after routing a document |
| /content-evaluate | Step 7c updates existing summaries with evaluation metadata |
| scripts/generate-knowledge-index.sh | Seed script from SPEC-070; --rebuild can delegate to it |