vt-c-research-ingest

Scan knowledge folder and repos for new research content, analyze for toolkit relevance, and generate improvement proposals in Obsidian.

Plugin: core-standards
Category: Research
Command: /vt-c-research-ingest


Research Ingestion

Automatically scan your research folders for new content and generate toolkit improvement proposals.

Invocation

/vt-c-research-ingest              # Scan up to 10 files, skip PDFs (recommended)
/vt-c-research-ingest --quick      # Quick scan, summary only, no file reads
/vt-c-research-ingest --since 7d   # Only items from last 7 days
/vt-c-research-ingest --include-pdfs  # Include PDF transcripts (slower, more context)
/vt-c-research-ingest --batch 5    # Process only 5 files (for limited context)

Context Management:
- By default, each run processes at most 10 files to prevent "Prompt is too long" errors
- Run the command multiple times to work through large backlogs
- Use --batch 5 if you still hit context limits
- PDFs are skipped by default (use --include-pdfs to include them)
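A minimal sketch of how the flags above could map to run settings. The helper name `parse_ingest_flags` and its `key=value` output are hypothetical and not part of the skill; only the flags and the defaults (10 files per run, PDFs skipped) come from the text above.

```shell
# Hypothetical helper: turn the command flags into key=value settings.
# Defaults follow the documented behavior: 10 files per run, PDFs skipped.
parse_ingest_flags() (
  batch=10 include_pdfs=no quick=no since=
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --quick)        quick=yes ;;
      --include-pdfs) include_pdfs=yes ;;
      --batch|--since)
        [ "$#" -ge 2 ] || { echo "missing value for $1" >&2; exit 2; }
        if [ "$1" = --batch ]; then batch=$2; else since=$2; fi
        shift ;;
      *) echo "unknown flag: $1" >&2; exit 2 ;;
    esac
    shift
  done
  printf 'batch=%s\ninclude_pdfs=%s\nquick=%s\nsince=%s\n' \
    "$batch" "$include_pdfs" "$quick" "$since"
)
```

For example, `parse_ingest_flags --batch 5 --include-pdfs` prints `batch=5` and `include_pdfs=yes` along with the untouched defaults.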

Configuration

Folders to Monitor

knowledge_folder: "ONEDRIVE_ROOT/02-Projekte/V025-VisiTrans kann AI/01-Knowledge"
repos_folder: "REPOS_ROOT/01-claude code"

Output Locations

state_file: "~/.claude/research-ingestion/state.yaml"
proposals_folder: "TOOLKIT_ROOT/intake/pending/from-research"
daily_notes_folder: "~/Documents/Obsidian Vault/Daily notes"

Execution Instructions

Step 1: Initialize State

  1. Check if ~/.claude/research-ingestion/state.yaml exists
  2. If missing, create with empty state:
    last_run: null
    knowledge:
      last_scan: null
      processed_files: []
    repos:
      last_scan: null
      known_repos: []
    
  3. Load existing state if present
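The initialization steps above can be sketched in shell as follows. The function name `init_state` and the `STATE_FILE` override are assumptions made for illustration; the default path and the empty-state layout are taken from this document.

```shell
# Create the state file with an empty state if it does not exist yet.
# init_state and STATE_FILE are illustrative names, not part of the skill.
init_state() (
  state_file=${1:-${STATE_FILE:-$HOME/.claude/research-ingestion/state.yaml}}
  [ -f "$state_file" ] && exit 0          # existing state is left untouched
  mkdir -p "$(dirname "$state_file")"
  cat > "$state_file" <<'EOF'
last_run: null
knowledge:
  last_scan: null
  processed_files: []
repos:
  last_scan: null
  known_repos: []
EOF
)
```

One consequence worth keeping in mind: a state file created this way carries the current timestamp, so a `-newer "$STATE_FILE"` comparison on the very first run would match nothing. A first run may need to scan without that filter.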

Step 2: Scan Knowledge Folder (With Limits)

Scan the knowledge folder for new or modified files with batching to prevent context overflow:

# Find Markdown files modified since the last scan - LIMIT TO 10 FILES PER RUN
# (PDFs are skipped by default; also match -name "*.pdf" only when --include-pdfs is set)
find "$KNOWLEDGE_FOLDER" -type f -name "*.md" -newer "$STATE_FILE" | head -10

CRITICAL: Context Management
- Maximum 10 files per run to prevent the "Prompt is too long" error
- Skip PDFs in default mode
- Only process PDFs with the --include-pdfs flag
- If more than 10 files are found: process 10, save the remaining count, and notify the user to run again
- Markdown priority: process .md files before .pdf files

Target directories to prioritize (in order):
1. 04-Dokumente/02-claude code/ - Claude Code articles (highest priority)
2. 04-Dokumente/agents/ - Agent definitions
3. 04-Dokumente/bmad-core/ - BMAD framework components
4. 06-Notes/ - Daily notes (lower priority)
5. 02-Meetings_Transkripte_Notizen/ - Meeting transcripts (lowest priority; large PDFs)

File patterns to detect:
- YYYY-MM-DD - *.md - Articles with date prefix
- YYYYMMDD_*.md - Summaries and notes
- *.pdf - Meeting transcripts (skipped unless --include-pdfs)
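The batching rules in this step (Markdown before PDF, a cap per run, a count of what is left) could be sketched like this. The helper `select_batch` and its argument order are hypothetical; the directory priority order listed above is not modeled here, and the reference file passed to `-newer` is assumed to exist.

```shell
# List candidate files, Markdown first, capped at a batch size.
# Selected paths go to stdout; the leftover count goes to stderr.
# Arguments: folder, reference file for -newer, batch size, include PDFs (yes/no).
select_batch() (
  folder=$1 ref=$2 batch=${3:-10} include_pdfs=${4:-no}
  all=$(
    find "$folder" -type f -name '*.md' -newer "$ref" | sort
    if [ "$include_pdfs" = yes ]; then
      find "$folder" -type f -name '*.pdf' -newer "$ref" | sort
    fi
  )
  [ -n "$all" ] || exit 0                 # nothing new since the last scan
  total=$(( $(printf '%s\n' "$all" | wc -l) ))
  printf '%s\n' "$all" | head -n "$batch"
  if [ "$total" -gt "$batch" ]; then
    echo "$((total - batch)) files remaining; run again to continue" >&2
  fi
)
```

Because the Markdown results are emitted before any PDF results, the cap always consumes Markdown files first, which matches the "Markdown priority" rule.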

Step 2b: Scan Inbox Directory

Check intake/inbox/ for unprocessed markdown files that haven't been qualified yet:

  1. Glob for intake/inbox/*.md (excluding .gitkeep)
  2. For each file found, check if it's already in state.knowledge.processed_files (compare by path)
  3. Add new (unprocessed) files to the analysis queue alongside knowledge folder items
  4. Mark source as "inbox" in the scan results to distinguish from knowledge folder items

# Find unprocessed inbox items
ls intake/inbox/*.md 2>/dev/null | grep -v .gitkeep

Note: Items in intake/inbox/ are raw drops — they may lack frontmatter, dates, or structure. Apply the same lightweight reading strategy as Step 4 (first 50 + last 20 lines) during analysis.

Tip: For dedicated inbox triage with interactive routing, use /vt-c-inbox-qualify instead. This step provides a passive fallback that includes inbox items in the regular research scan.
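As a rough illustration of the path comparison in this step, the sketch below filters inbox files against the state file. It assumes processed entries are stored as `- path: "..."` lines, as in the state format this skill saves; the function name `unprocessed_inbox` and the tab-separated output are made up for the example. A plain text match is used here instead of a real YAML parser.

```shell
# Print inbox Markdown files whose path is not yet recorded in the state file.
# Assumes processed entries appear as lines like:   - path: "intake/inbox/x.md"
unprocessed_inbox() (
  inbox=$1 state_file=$2
  for f in "$inbox"/*.md; do
    [ -f "$f" ] || continue               # glob matched nothing
    if ! grep -qF -- "path: \"$f\"" "$state_file" 2>/dev/null; then
      printf 'inbox\t%s\n' "$f"           # tag the source as "inbox"
    fi
  done
)
```

The `*.md` glob never matches `.gitkeep`, so no extra exclusion is needed in this variant.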

Step 3: Scan Repos Folder

Check for new repositories or significant updates:

# List all git repos
for dir in "$REPOS_FOLDER"/*/; do
  if [ -d "$dir/.git" ]; then
    repo_name=$(basename "$dir")
    last_commit=$(git -C "$dir" log -1 --format="%H" 2>/dev/null)
    # Compare with stored state
  fi
done

Check for:
- New repositories (not in state.repos.known_repos)
- Repos with new commits since the last scan
- Changes to CLAUDE.md, AGENTS.md, or plugin.json
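The "compare with stored state" part of this step might look like the sketch below. It assumes each known repo is stored as a `name:` line immediately followed by a `last_commit:` line, as in the state format this skill saves; `repo_status` is a hypothetical helper, and changes to individual files such as CLAUDE.md are not checked here.

```shell
# Classify each git repo under a folder as "new", "updated", or "unchanged".
# Assumes state entries of the form:
#     - name: "repo-name"
#       last_commit: "commit-hash"
repo_status() (
  repos_folder=$1 state_file=$2
  [ -f "$state_file" ] || state_file=/dev/null
  for dir in "$repos_folder"/*/; do
    [ -d "$dir/.git" ] || continue
    name=$(basename "$dir")
    head_commit=$(git -C "$dir" log -1 --format=%H 2>/dev/null) || continue
    # The line right after the matching name entry holds the stored hash.
    stored=$(awk -v key="name: \"$name\"" '
      index($0, key) && (getline line) > 0 && line ~ /last_commit:/ {
        split(line, part, "\""); print part[2]; exit
      }' "$state_file")
    if [ -z "$stored" ]; then status=new
    elif [ "$stored" = "$head_commit" ]; then status=unchanged
    else status=updated
    fi
    printf '%s\t%s\n' "$status" "$name"
  done
)
```

Repositories without any commit are silently skipped, since there is no hash to compare.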

Step 4: Analyze New Content (Context-Aware)

CRITICAL: Prevent context overflow by reading strategically

For each new/modified item, analyze for toolkit relevance using lightweight reading:

Markdown Articles Analysis (Max 200 lines per file)

1. Read FIRST 50 LINES to get title, summary, and key topics
2. Read LAST 20 LINES to get conclusions/takeaways
3. Grep for keywords: "skill", "agent", "workflow", "pattern", "best practice"
4. Only read full file if keywords found AND file < 500 lines

Identify from partial read:
- Main topic/technique described (from title/summary)
- Keywords suggesting toolkit relevance
- Skip if clearly irrelevant (meeting notes, unrelated topics)

Compare to existing toolkit categories (don't list all 37 agents inline):
- Security-related? → Check security agents
- Workflow-related? → Check workflow skills
- Code review? → Check reviewer agents

Generate proposal if relevant.
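The lightweight reading strategy for Markdown files can be approximated in shell as below. The function name `light_read`, the `[...]` separator, and the case-insensitive keyword match are assumptions made for this sketch; the line counts and keywords come from the list above.

```shell
# Print a reduced view of a Markdown file: the first 50 and last 20 lines,
# or the whole file when a keyword matches and the file is under 500 lines.
light_read() (
  file=$1
  keywords='skill|agent|workflow|pattern|best practice'
  lines=$(( $(wc -l < "$file") ))
  if [ "$lines" -lt 500 ] && grep -qiE "$keywords" "$file"; then
    cat "$file"                           # relevant and small enough: full read
  elif [ "$lines" -le 70 ]; then
    cat "$file"                           # head and tail would overlap anyway
  else
    head -n 50 "$file"
    echo '[...]'
    tail -n 20 "$file"
  fi
)
```

A 600-line file that mentions "workflow" still gets only the 70-line excerpt, because the full read requires both the keyword hit and the size limit.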

Repository Analysis (Targeted reads only)

For new or updated repos:
1. FIRST: Check if .git exists and get repo name
2. Read ONLY README.md first 100 lines
3. Check if CLAUDE.md exists (don't read yet)
4. Check if skills/ or agents/ directory exists
5. Only deep-read if potential toolkit relevance found

Skip repos that are clearly:
- Archives (last commit more than 90 days ago)
- Unrelated projects (no CLAUDE.md, no agents/)
- Already processed (same commit hash in state)

Generate proposal if relevant.
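A hedged sketch of the three skip rules for repositories follows. The helper `should_skip_repo` is hypothetical, the stored commit hash is assumed to be passed in by the caller, and "unrelated" is read literally as "neither CLAUDE.md nor an agents/ directory".

```shell
# Return success (0) when a repo should be skipped under the rules above.
# Arguments: repo directory, commit hash stored in state (may be empty).
should_skip_repo() (
  dir=$1 stored=$2
  last_ts=$(git -C "$dir" log -1 --format=%ct 2>/dev/null) || exit 0  # no commits
  head_commit=$(git -C "$dir" log -1 --format=%H)
  now=$(date +%s)
  [ $(( (now - last_ts) / 86400 )) -gt 90 ] && exit 0            # archive
  [ "$stored" = "$head_commit" ] && exit 0                        # already processed
  [ ! -f "$dir/CLAUDE.md" ] && [ ! -d "$dir/agents" ] && exit 0   # unrelated
  exit 1
)
```

It can be used as a guard in a scan loop, for example `should_skip_repo "$dir" "$stored" && continue`.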

PDF Transcript Analysis (SKIP BY DEFAULT)

PDFs are skipped by default to prevent context overflow.

With --include-pdfs flag:
- Read only the first 5 pages
- Extract only: action items, decisions, tool mentions
- Skip detailed transcriptions

Check if any decisions affect toolkit workflows.

Step 5: Generate Proposals

For each relevant finding, create a proposal with structured implementation details:

## Proposal N: [Title] [PRIORITY]

**Source**: [file path or repo name]

**Finding**: [What was discovered]

**Type**: [new-skill | modify-skill | new-agent | modify-agent | new-command | documentation]

**Suggestion**:
- [Specific action 1]
- [Specific action 2]

**Affected Workflow**: [Development/Knowledge Work/Both]

**Implementation Details**:
```yaml
type: [new-skill | modify-skill | new-agent | modify-agent | new-command | documentation]
name: vt-c-research-ingest
location: [path/to/file.md]
source_reference: [path to source material]
dependencies:
  - modify: [path/to/dependency.md]
```

**Action**: [ ] Review [ ] Implement [ ] Skip [ ] Completed

**Type Assignment:**
- **new-skill**: Create new skill folder with SKILL.md
- **modify-skill**: Update existing skill
- **new-agent**: Create new agent definition
- **modify-agent**: Update existing agent
- **new-command**: Create new slash command
- **documentation**: Update README or docs

**Priority Assignment:**
- **HIGH**: Security, performance, or core workflow improvements
- **MEDIUM**: New skills/agents that fill gaps
- **LOW**: Nice-to-have enhancements, documentation

**Implementation Details Block:**
The YAML block enables `/vt-c-research-implement` to automatically execute approved proposals.

### Step 6: Write Proposals to Toolkit Intake

1. Create dated proposals file in the toolkit's intake directory:
   ```
   TOOLKIT_ROOT/intake/pending/from-research/YYYY-MM-DD-proposals.md
   ```

2. Write proposals in standard format:
   ```markdown
   # Research Proposals - YYYY-MM-DD

   ## Summary
   - **New items analyzed**: N
   - **Proposals generated**: M
   - **High priority**: X

   ---

   [Proposals...]
   ```

### Step 7: Update Daily Note

1. Find today's daily note:
   ```
   ~/Documents/Obsidian Vault/Daily notes/YYYY-MM-DD.md
   ```

2. **If daily note doesn't exist, create from template**:
   ```markdown
   ## Tasks for today
   ### VisiTrans
   - [ ]  task

   ### WP
   - [ ]  task

   ### eparo
   - [ ] task

   ### personal/family
   - [ ] task

   ## Alte tasks
   ```tasks
     not done
     path does not include 99_Templates/Vorlage_Notes
     path does not include YYYY-MM-DD
     group by path
   ```


   ## 🗓️ Meetings Today


   ```

3. **Carry forward incomplete todos from previous day**:
   - Find yesterday's daily note (YYYY-MM-DD minus 1 day)
   - If it exists, extract all uncompleted tasks from `## Tasks for today` section
   - Parse each subsection (VisiTrans, WP, eparo, personal/family)
   - Extract lines starting with `- [ ]` (incomplete tasks)
   - Prepend these to the corresponding subsections in today's note
   - Skip if subsection already has non-placeholder tasks
   - Example logic:
     ```
     If today's VisiTrans section only has "- [ ] task" placeholder:
       Replace with incomplete tasks from yesterday's VisiTrans section
     Else:
       Keep today's existing tasks (user already added tasks)
     ```

4. **Clean up Meetings section**:
   - Find the `## 🗓️ Meetings Today` section
   - Remove any error messages (lines containing `❌ Error:`)
   - Remove duplicate meeting entries (keep the first occurrence of each meeting time)
   - Ensure proper spacing: 2 blank lines after the section header if no meetings
   - Keep clean meeting entries in format: `**HH:MM-HH:MM** [[Meeting Title]] ([[Attendee1]], [[Attendee2]])`

5. **Preserve existing content and templates**:
   - Read the entire file content first
   - Do NOT modify or replace any existing template sections
   - Look for existing `### Research` section anywhere in the file

6. **Add/Update Research section**:
   - If `### Research` section exists, update it in place
   - If it doesn't exist, **append to the bottom of the file** after all existing content
   - Ensure there's a blank line before the Research section

7. Append or update with:
   ```markdown

   ### Research
   📚 **N new proposals** from research ingestion
   - Proposals written to: `V025-claude-toolkit/intake/pending/from-research/YYYY-MM-DD-proposals.md`
   - High priority: X
   - Sources: Y articles, Z repos
   - Process with: `/vt-c-toolkit-review` or `/vt-c-research-implement`
   ```

**Important**:
- This skill must work with Obsidian template systems (Templater, core templates)
- Never replace the file content - only append to the bottom or update the Research section in place
- When carrying forward todos, respect existing user-added tasks (don't overwrite them)
- The "Alte tasks" query block already shows all incomplete tasks from other days, so only carry forward from the immediate previous day into the "Tasks for today" section
- Clean up the Meetings section by removing error messages and duplicates before adding Research section
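The carry-forward rule from item 3 of this step needs the incomplete tasks of one subsection from the previous day's note. A minimal sketch of that extraction is shown below; `incomplete_tasks` is a hypothetical helper, and it assumes headings and task lines start at the beginning of the line, as they do in an actual note file.

```shell
# Print the incomplete tasks ("- [ ]" lines) of one "###" subsection
# inside the "## Tasks for today" section of a daily note.
# Placeholder lines that read just "- [ ] task" are ignored.
incomplete_tasks() (
  note=$1 subsection=$2
  awk -v want="### $subsection" '
    /^## /  { in_tasks = ($0 == "## Tasks for today"); in_sub = 0; next }
    /^### / { in_sub = (in_tasks && $0 == want); next }
    in_sub && /^- \[ \]/ {
      text = $0; sub(/^- \[ \][ \t]*/, "", text)
      if (text != "" && text != "task") print
    }
  ' "$note"
)
```

Calling `incomplete_tasks yesterday.md VisiTrans` prints the open VisiTrans tasks only; tasks under `## Alte tasks` or other sections are never picked up because the `##` heading resets the state.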

### Step 8: Update State

Save the new state:
```yaml
last_run: "YYYY-MM-DDTHH:MM:SS"
knowledge:
  last_scan: "YYYY-MM-DDTHH:MM:SS"
  processed_files:
    - path: "path/to/file.md"
      hash: "sha256hash"
      processed_at: "YYYY-MM-DDTHH:MM:SS"
repos:
  last_scan: "YYYY-MM-DDTHH:MM:SS"
  known_repos:
    - name: "repo-name"
      last_commit: "commit-hash"
      processed_at: "YYYY-MM-DDTHH:MM:SS"
```

Step 9: Display Summary

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Research Ingestion Complete
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Scanned:
• Knowledge folder: N new/modified files
• Repos folder: M repos checked, X with updates

Generated:
• Proposals: Y (High: A, Medium: B, Low: C)
• Written to: TOOLKIT_ROOT/intake/pending/from-research/YYYY-MM-DD-proposals.md

Daily note updated with Research section.

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Quick Mode (--quick)

When the --quick flag is passed:
- Skip detailed analysis
- Only count new items
- Show a summary without generating proposals
- Useful for checking whether there is new content to review

Since Mode (--since Nd)

When --since Nd is passed (for example, --since 7d):
- Only scan items modified in the last N days
- Useful for catching up after a break
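One way the `Nd` value could be turned into a file filter is via the `-mtime` test of `find`. This is a sketch under that assumption; the skill does not say how the filter is implemented, and `since_to_find_args` is an illustrative name. Only the day form is handled.

```shell
# Convert a --since value such as "7d" into a find(1) age filter.
# Only the "Nd" (days) form is handled; anything else is rejected.
since_to_find_args() (
  days=${1%d}                             # "7d" -> "7"
  case "$days" in
    ''|*[!0-9]*) echo "unsupported --since value: $1" >&2; exit 2 ;;
  esac
  [ "$days" != "$1" ] || { echo "unsupported --since value: $1" >&2; exit 2; }
  echo "-mtime -$days"
)
```

Usage, with the expansion left unquoted on purpose so it splits into two arguments: `find "$KNOWLEDGE_FOLDER" -type f -name '*.md' $(since_to_find_args 7d)`.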

Toolkit Comparison Reference

DO NOT load full agent/skill lists into context. Instead, use targeted lookups:

# Find agents by category
Glob: ~/.claude/plugins/company-claude-toolkit/**/agents/**/*.md

# Find skills by name pattern
Glob: ~/.claude/skills/**/SKILL.md

# Check if specific capability exists
Grep: pattern="security|authentication" path=~/.claude/skills/

Toolkit Categories (for quick mental reference only)

  • Orchestrators: 5 (conceptual, implementation, deployment, bugfix, incident)
  • Security: 2 agents
  • Review: 14 specialized reviewers
  • Workflow: Development (0-6), Knowledge Work (kw-*)
  • Research: research-ingest, research-implement, content-evaluate

Total: ~37 agents, ~50 skills. Use Glob/Grep to check specifics.

Next Step: Implement Approved Proposals

After reviewing proposals in Obsidian:
1. Check [x] Implement on approved proposals
2. Run /vt-c-research-implement to execute them
3. Proposals are marked [x] Completed when done

See /vt-c-research-implement for details.

For deeper gap analysis of knowledge articles against the toolkit inventory, run /vt-c-content-evaluate --scan-knowledge.

Gaps to Look For

  • Missing specialized reviewers (new languages, frameworks)
  • Missing workflow automations
  • Missing integrations (tools, services)
  • Pattern improvements from community

Troubleshooting

"Prompt is too long" Error

This error means the combined skill instructions + file contents exceeded Claude's context window.

Immediate fixes:
1. Run /vt-c-research-ingest --batch 5 to process fewer files
2. Run /vt-c-research-ingest --quick for a summary only
3. Run /vt-c-research-ingest --since 3d to limit the scan to recent files

If the error persists:
1. Check how many files are pending: find "$KNOWLEDGE_FOLDER" -type f -name "*.md" -newer ~/.claude/research-ingestion/state.yaml | wc -l
2. If more than 20 files are pending, run multiple batches
3. Avoid --include-pdfs until the backlog is cleared

Prevention:
- Run /vt-c-research-ingest daily or every few days
- Large backlogs (50+ files) require multiple runs
- PDFs consume 10x more context than markdown
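To estimate how many runs a backlog needs, the pending count can be divided by the batch size and rounded up. The helper `runs_needed` below is only an illustration of that arithmetic; the default of 10 matches the documented per-run limit.

```shell
# Number of runs needed to clear a backlog: ceil(pending / batch).
runs_needed() (
  pending=$1 batch=${2:-10}
  echo $(( (pending + batch - 1) / batch ))
)
```

With the default batch size, a backlog of 50 files takes 5 runs; with `--batch 5`, a backlog of 23 files also takes 5 runs.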

Open Brain Capture (Optional)

After processing research documents, if the capture_thought MCP tool is available, capture findings to Open Brain.

When to capture: After each processed research document that produces a proposal or evaluation.

How:
1. Check if the capture_thought tool is available. If not: skip silently.
2. If no proposals were generated: skip capture.
3. Call capture_thought with:

thought: "Research finding: {document title}. Source: {file path}. Assessment: {relevant/not relevant}. Proposals generated: {count}. Key insight: {1-2 sentence summary}."

4. On timeout or error: log a debug message and continue. Never fail the skill.

This step is always last — it never interrupts the research-ingest workflow.