# vt-c-research-ingest
Scan knowledge folder and repos for new research content, analyze for toolkit relevance, and generate improvement proposals in Obsidian.
Plugin: core-standards
Category: Research
Command: /vt-c-research-ingest
# Research Ingestion
Automatically scan your research folders for new content and generate toolkit improvement proposals.
## Invocation
/vt-c-research-ingest # Scan up to 10 files, skip PDFs (recommended)
/vt-c-research-ingest --quick # Quick scan, summary only, no file reads
/vt-c-research-ingest --since 7d # Only items from last 7 days
/vt-c-research-ingest --include-pdfs # Include PDF transcripts (slower, more context)
/vt-c-research-ingest --batch 5 # Process only 5 files (for limited context)
Context Management:
- By default, at most 10 files are processed per run to prevent "Prompt is too long" errors
- Run multiple times to process large backlogs
- Use --batch 5 if still hitting context limits
- PDFs are skipped by default (use --include-pdfs to include)
## Configuration
### Folders to Monitor
knowledge_folder: "ONEDRIVE_ROOT/02-Projekte/V025-VisiTrans kann AI/01-Knowledge"
repos_folder: "REPOS_ROOT/01-claude code"
### Output Locations
state_file: "~/.claude/research-ingestion/state.yaml"
proposals_folder: "TOOLKIT_ROOT/intake/pending/from-research"
daily_notes_folder: "~/Documents/Obsidian Vault/Daily notes"
## Execution Instructions
### Step 1: Initialize State
- Check if `~/.claude/research-ingestion/state.yaml` exists
- If missing, create it with an empty state
- Load existing state if present
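A minimal sketch of this step, assuming an "empty state" uses the same keys as the state saved in Step 8, with null timestamps and empty lists. The function name `init_state` is hypothetical and not part of the skill.

```bash
#!/usr/bin/env bash
# Hypothetical helper: create the state file only if it does not exist yet.
# Assumption: the empty state mirrors the Step 8 schema with no entries.
init_state() {
  local state_file="$1"
  if [ -f "$state_file" ]; then
    return 0  # existing state is left untouched; the caller loads it
  fi
  mkdir -p "$(dirname "$state_file")"
  cat > "$state_file" <<'EOF'
last_run: null
knowledge:
  last_scan: null
  processed_files: []
repos:
  last_scan: null
  known_repos: []
EOF
}
```

Usage: `init_state "$HOME/.claude/research-ingestion/state.yaml"`. A file created this way carries the current modification time, so the `find -newer "$STATE_FILE"` check in Step 2 would match nothing until files change after the first run. Whether that is intended is not stated here.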
### Step 2: Scan Knowledge Folder (With Limits)
Scan the knowledge folder for new or modified files with batching to prevent context overflow:
# Find files modified since last scan - LIMIT TO 10 FILES PER RUN
find "$KNOWLEDGE_FOLDER" -type f \( -name "*.md" -o -name "*.pdf" \) -newer "$STATE_FILE" | head -10
CRITICAL: Context Management
- Maximum 10 files per run to prevent "Prompt is too long" error
- Skip PDFs in default mode - Only process PDFs with --include-pdfs flag
- If more than 10 files found: Process 10, save remaining count, notify user to run again
- Markdown priority: Process .md files before .pdf files
Target directories to prioritize (in order):
1. 04-Dokumente/02-claude code/ - Claude Code articles (highest priority)
2. 04-Dokumente/agents/ - Agent definitions
3. 04-Dokumente/bmad-core/ - BMAD framework components
4. 06-Notes/ - Daily notes (lower priority)
5. 02-Meetings_Transkripte_Notizen/ - Meeting transcripts (lowest - large PDFs)
File patterns to detect:
- YYYY-MM-DD - *.md - Articles with date prefix
- YYYYMMDD_*.md - Summaries and notes
- *.pdf - Meeting transcripts (skipped unless --include-pdfs)
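The `find` one-liner above caps the result at 10 files but does not by itself put Markdown first or skip PDFs. The following is a sketch of how those rules might be combined. The function names and the alphabetical ordering are assumptions, and the directory priority list is not implemented here.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print up to BATCH new files, Markdown before PDFs.
# Args: folder, state file, batch size (default 10), include PDFs (yes/no).
scan_new_files() {
  local folder="$1" state_file="$2" batch="${3:-10}" include_pdfs="${4:-no}"
  {
    find "$folder" -type f -name '*.md' -newer "$state_file" | sort
    if [ "$include_pdfs" = "yes" ]; then
      find "$folder" -type f -name '*.pdf' -newer "$state_file" | sort
    fi
  } | head -n "$batch"
}

# Hypothetical helper: count all pending Markdown files, so the caller can
# report how many remain after the current batch.
count_pending_md() {
  find "$1" -type f -name '*.md' -newer "$2" | wc -l | tr -d ' '
}
```

For example, `scan_new_files "$KNOWLEDGE_FOLDER" "$STATE_FILE" 5 no` would correspond to `--batch 5` without `--include-pdfs`.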
### Step 2b: Scan Inbox Directory
Check intake/inbox/ for unprocessed markdown files that haven't been qualified yet:
- Glob for `intake/inbox/*.md` (excluding `.gitkeep`)
- For each file found, check if it's already in `state.knowledge.processed_files` (compare by path)
- Add new (unprocessed) files to the analysis queue alongside knowledge folder items
- Mark source as `"inbox"` in the scan results to distinguish from knowledge folder items
Note: Items in intake/inbox/ are raw drops — they may lack frontmatter, dates, or structure. Apply the same lightweight reading strategy as Step 4 (first 50 + last 20 lines) during analysis.
Tip: For dedicated inbox triage with interactive routing, use /vt-c-inbox-qualify instead. This step provides a passive fallback that includes inbox items in the regular research scan.
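The inbox check could be sketched as below, assuming processed entries appear in the state file as `path: "<path>"` lines (as in the Step 8 schema) and that a plain text match on that line is an acceptable stand-in for real YAML parsing. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print inbox Markdown files not yet recorded in the state file.
# Assumption: processed files are stored as `path: "<path>"` lines.
unprocessed_inbox() {
  local inbox_dir="$1" state_file="$2" file
  for file in "$inbox_dir"/*.md; do
    [ -f "$file" ] || continue        # the glob matched nothing
    if ! grep -Fq "path: \"$file\"" "$state_file" 2>/dev/null; then
      printf 'inbox\t%s\n' "$file"    # tag the source as "inbox"
    fi
  done
}
```

Because the glob only matches `*.md`, `.gitkeep` is excluded without an explicit filter.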
### Step 3: Scan Repos Folder
Check for new repositories or significant updates:
# List all git repos
for dir in "$REPOS_FOLDER"/*/; do
  if [ -d "$dir/.git" ]; then
    repo_name=$(basename "$dir")
    last_commit=$(git -C "$dir" log -1 --format="%H" 2>/dev/null)
    # Compare with stored state
  fi
done
Check for:
- New repositories (not in state.repos.known_repos)
- Repos with new commits since last scan
- Changes to CLAUDE.md, AGENTS.md, or plugin.json
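The "compare with stored state" step is left open in the loop above. One way it might look, assuming the state file follows the Step 8 layout (`- name:` followed by `last_commit:`) and repo names contain no spaces; `repo_status` is a hypothetical name:

```bash
#!/usr/bin/env bash
# Hypothetical helper: classify a repo as new, updated, or unchanged.
# Args: repo name, current HEAD hash, state file.
repo_status() {
  local name="$1" current_head="$2" state_file="$3" stored
  stored=$(awk -v repo="$name" '
    $1 == "-" && $2 == "name:" { gsub(/"/, "", $3); current = $3 }
    $1 == "last_commit:" && current == repo { gsub(/"/, "", $2); print $2; exit }
  ' "$state_file" 2>/dev/null)
  if [ -z "$stored" ]; then
    echo "new"
  elif [ "$stored" != "$current_head" ]; then
    echo "updated"
  else
    echo "unchanged"
  fi
}
```

Inside the loop this could be called as `repo_status "$repo_name" "$last_commit" "$STATE_FILE"`. A real YAML parser would be more robust than the line-based `awk` match used in this sketch.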
### Step 4: Analyze New Content (Context-Aware)
CRITICAL: Prevent context overflow by reading strategically
For each new/modified item, analyze for toolkit relevance using lightweight reading:
#### Markdown Articles Analysis (Max 200 lines per file)
1. Read FIRST 50 LINES to get title, summary, and key topics
2. Read LAST 20 LINES to get conclusions/takeaways
3. Grep for keywords: "skill", "agent", "workflow", "pattern", "best practice"
4. Only read full file if keywords found AND file < 500 lines
Identify from partial read:
- Main topic/technique described (from title/summary)
- Keywords suggesting toolkit relevance
- Skip if clearly irrelevant (meeting notes, unrelated topics)
Compare to existing toolkit categories (don't list all 37 agents inline):
- Security-related? → Check security agents
- Workflow-related? → Check workflow skills
- Code review? → Check reviewer agents
Generate proposal if relevant.
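The four reading steps for Markdown articles can be expressed as two small shell helpers. This is only a sketch: the function names are hypothetical, and matching keywords case-insensitively is an assumption.

```bash
#!/usr/bin/env bash
# Keywords from the list above; case-insensitive matching is an assumption.
KEYWORDS='skill|agent|workflow|pattern|best practice'

# Print the first 50 and last 20 lines (these overlap for files under 70 lines).
preview_file() {
  head -n 50 "$1"
  tail -n 20 "$1"
}

# Succeed only if the file mentions a keyword AND has fewer than 500 lines.
should_read_full() {
  local file="$1" line_count
  line_count=$(wc -l < "$file" | tr -d ' ')
  grep -Eqi "$KEYWORDS" "$file" && [ "$line_count" -lt 500 ]
}
```

A caller would run `preview_file` on every candidate and read the whole file only when `should_read_full` succeeds.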
#### Repository Analysis (Targeted reads only)
For new or updated repos:
1. FIRST: Check if .git exists and get repo name
2. Read ONLY README.md first 100 lines
3. Check if CLAUDE.md exists (don't read yet)
4. Check if skills/ or agents/ directory exists
5. Only deep-read if potential toolkit relevance found
Skip repos that are clearly:
- Archives (last commit more than 90 days ago)
- Unrelated projects (no CLAUDE.md, no agents/)
- Already processed (same commit hash in state)
Generate proposal if relevant.
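The 90-day archive rule could be checked with plain epoch arithmetic. A sketch follows; `is_archived` is a hypothetical name, and taking the commit time as Unix epoch seconds is an assumption about how the caller obtains it.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if a commit is more than 90 days old.
# Args: commit time and (optionally) current time, both in Unix epoch seconds.
is_archived() {
  local commit_epoch="$1" now_epoch="${2:-$(date +%s)}"
  local max_age=$((90 * 24 * 60 * 60))
  [ $((now_epoch - commit_epoch)) -gt "$max_age" ]
}
```

With Git, the commit time can be read via `git -C "$dir" log -1 --format=%ct`, so a call might look like `is_archived "$(git -C "$dir" log -1 --format=%ct)" && continue`.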
#### PDF Transcript Analysis (SKIP BY DEFAULT)
PDFs are skipped by default to prevent context overflow.
With --include-pdfs flag:
- Read only the first 5 pages
- Extract only: action items, decisions, tool mentions
- Skip detailed transcriptions
Check if any decisions affect toolkit workflows.
### Step 5: Generate Proposals
For each relevant finding, create a proposal with structured implementation details:
## Proposal N: [Title] [PRIORITY]
**Source**: [file path or repo name]
**Finding**: [What was discovered]
**Type**: [new-skill | modify-skill | new-agent | modify-agent | new-command | documentation]
**Suggestion**:
- [Specific action 1]
- [Specific action 2]
**Affected Workflow**: [Development/Knowledge Work/Both]
**Implementation Details**:
```yaml
type: [new-skill | modify-skill | new-agent | modify-agent | new-command | documentation]
name: [name of the skill, agent, or command]
location: [path/to/file.md]
source_reference: [path to source material]
dependencies:
  - modify: [path/to/dependency.md]
```
**Action**: [ ] Review [ ] Implement [ ] Skip [ ] Completed
**Type Assignment:**
- **new-skill**: Create new skill folder with SKILL.md
- **modify-skill**: Update existing skill
- **new-agent**: Create new agent definition
- **modify-agent**: Update existing agent
- **new-command**: Create new slash command
- **documentation**: Update README or docs
**Priority Assignment:**
- **HIGH**: Security, performance, or core workflow improvements
- **MEDIUM**: New skills/agents that fill gaps
- **LOW**: Nice-to-have enhancements, documentation
**Implementation Details Block:**
The YAML block enables `/vt-c-research-implement` to automatically execute approved proposals.
### Step 6: Write Proposals to Toolkit Intake
1. Create dated proposals file in the toolkit's intake directory:
```
TOOLKIT_ROOT/intake/pending/from-research/YYYY-MM-DD-proposals.md
```
2. Write proposals in standard format:
```markdown
# Research Proposals - YYYY-MM-DD
## Summary
- **New items analyzed**: N
- **Proposals generated**: M
- **High priority**: X
---
[Proposals...]
```
### Step 7: Update Daily Note
1. Find today's daily note:
```
~/Documents/Obsidian Vault/Daily notes/YYYY-MM-DD.md
```
2. **If daily note doesn't exist, create from template**:
````markdown
## Tasks for today
### VisiTrans
- [ ] task
### WP
- [ ] task
### eparo
- [ ] task
### personal/family
- [ ] task
## Alte tasks
```tasks
not done
path does not include 99_Templates/Vorlage_Notes
path does not include YYYY-MM-DD
group by path
```
## 🗓️ Meetings Today
````
3. **Carry forward incomplete todos from previous day**:
- Find yesterday's daily note (YYYY-MM-DD minus 1 day)
- If it exists, extract all uncompleted tasks from `## Tasks for today` section
- Parse each subsection (VisiTrans, WP, eparo, personal/family)
- Extract lines starting with `- [ ]` (incomplete tasks)
- Prepend these to the corresponding subsections in today's note
- Skip if subsection already has non-placeholder tasks
- Example logic:
```
If today's VisiTrans section only has "- [ ] task" placeholder:
Replace with incomplete tasks from yesterday's VisiTrans section
Else:
Keep today's existing tasks (user already added tasks)
```
4. **Clean up Meetings section**:
- Find the `## 🗓️ Meetings Today` section
- Remove any error messages (lines containing `❌ Error:`)
- Remove duplicate meeting entries (keep the first occurrence of each meeting time)
- Ensure proper spacing: 2 blank lines after the section header if no meetings
- Keep clean meeting entries in format: `**HH:MM-HH:MM** [[Meeting Title]] ([[Attendee1]], [[Attendee2]])`
5. **Preserve existing content and templates**:
- Read the entire file content first
- Do NOT modify or replace any existing template sections
- Look for existing `### Research` section anywhere in the file
6. **Add/Update Research section**:
- If `### Research` section exists, update it in place
- If it doesn't exist, **append to the bottom of the file** after all existing content
- Ensure there's a blank line before the Research section
7. Append or update with:
```markdown
### Research
📚 **N new proposals** from research ingestion
- Proposals written to: `V025-claude-toolkit/intake/pending/from-research/YYYY-MM-DD-proposals.md`
- High priority: X
- Sources: Y articles, Z repos
- Process with: `/vt-c-toolkit-review` or `/vt-c-research-implement`
```
**Important**:
- This skill must work with Obsidian template systems (Templater, core templates)
- Never replace the file content - only append to the bottom or update the Research section in place
- When carrying forward todos, respect existing user-added tasks (don't overwrite them)
- The "Alte tasks" query block already shows all incomplete tasks from other days, so only carry forward from the immediate previous day into the "Tasks for today" section
- Clean up the Meetings section by removing error messages and duplicates before adding Research section
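
The carry-forward rule in item 3 of this step could be sketched with `awk` as follows. The function name `incomplete_tasks` is hypothetical, and treating the literal line `- [ ] task` as the only placeholder is an assumption based on the template above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print incomplete tasks from one "### <section>" block
# of a daily note, ignoring the "- [ ] task" template placeholder.
incomplete_tasks() {
  local note="$1" section="$2"
  awk -v heading="### $section" '
    /^#+ / { in_section = ($0 == heading) }   # any heading starts or ends a block
    in_section && /^- \[ \] / && $0 != "- [ ] task" { print }
  ' "$note"
}
```

For example, `incomplete_tasks "$YESTERDAY_NOTE" "VisiTrans"` (with `YESTERDAY_NOTE` as a hypothetical variable) prints the open tasks to carry into today's VisiTrans subsection. An empty result means there is nothing to carry forward.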
### Step 8: Update State
Save the new state:
```yaml
last_run: "YYYY-MM-DDTHH:MM:SS"
knowledge:
  last_scan: "YYYY-MM-DDTHH:MM:SS"
  processed_files:
    - path: "path/to/file.md"
      hash: "sha256hash"
      processed_at: "YYYY-MM-DDTHH:MM:SS"
repos:
  last_scan: "YYYY-MM-DDTHH:MM:SS"
  known_repos:
    - name: "repo-name"
      last_commit: "commit-hash"
      processed_at: "YYYY-MM-DDTHH:MM:SS"
```
### Step 9: Display Summary
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Research Ingestion Complete
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Scanned:
• Knowledge folder: N new/modified files
• Repos folder: M repos checked, X with updates
Generated:
• Proposals: Y (High: A, Medium: B, Low: C)
• Written to: TOOLKIT_ROOT/intake/pending/from-research/YYYY-MM-DD-proposals.md
Daily note updated with Research section.
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
## Quick Mode (--quick)
When --quick flag is passed:
- Skip detailed analysis
- Only count new items
- Show summary without generating proposals
- Useful for checking if there's new content to review
## Since Mode (--since Nd)
When --since Nd is passed (for example, --since 7d):
- Only scan items modified in last N days
- Useful for catching up after a break
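A sketch of how the `Nd` value might be turned into a `find` filter, assuming only the `<number>d` form shown in the examples is supported. The helper name `since_to_days` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: turn a value such as "7d" into a day count.
# Returns non-zero for anything that is not "<digits>d".
since_to_days() {
  local value="$1" days
  case "$value" in
    *d) days="${value%d}" ;;
    *) return 1 ;;
  esac
  case "$days" in
    ''|*[!0-9]*) return 1 ;;
  esac
  echo "$days"
}
```

The result can feed `find`'s `-mtime` test, for example `find "$KNOWLEDGE_FOLDER" -type f -name '*.md' -mtime "-$(since_to_days 7d)"`, which matches files modified less than 7 days ago.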
## Toolkit Comparison Reference
DO NOT load full agent/skill lists into context. Instead, use targeted lookups:
# Find agents by category
Glob: ~/.claude/plugins/company-claude-toolkit/**/agents/**/*.md
# Find skills by name pattern
Glob: ~/.claude/skills/**/SKILL.md
# Check if specific capability exists
Grep: pattern="security|authentication" path=~/.claude/skills/
### Toolkit Categories (for quick mental reference only)
- Orchestrators: 5 (conceptual, implementation, deployment, bugfix, incident)
- Security: 2 agents
- Review: 14 specialized reviewers
- Workflow: Development (0-6), Knowledge Work (kw-*)
- Research: research-ingest, research-implement, content-evaluate
Total: ~37 agents, ~50 skills. Use Glob/Grep to check specifics.
## Next Step: Implement Approved Proposals
After reviewing proposals in Obsidian:
1. Check [x] Implement on approved proposals
2. Run /vt-c-research-implement to execute them
3. Proposals are marked [x] Completed when done
See /vt-c-research-implement for details.
For deeper gap analysis of knowledge articles against the toolkit inventory, run /vt-c-content-evaluate --scan-knowledge.
## Gaps to Look For
- Missing specialized reviewers (new languages, frameworks)
- Missing workflow automations
- Missing integrations (tools, services)
- Pattern improvements from community
## Troubleshooting
### "Prompt is too long" Error
This error means the combined skill instructions + file contents exceeded Claude's context window.
Immediate fixes:
1. Run /vt-c-research-ingest --batch 5 to process fewer files
2. Run /vt-c-research-ingest --quick for summary only
3. Run /vt-c-research-ingest --since 3d to limit to recent files
If error persists:
1. Check how many files are pending: find "$KNOWLEDGE_FOLDER" -type f -name "*.md" -newer ~/.claude/research-ingestion/state.yaml | wc -l
2. If > 20 files pending, run multiple batches
3. Avoid --include-pdfs until backlog is cleared
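To estimate how many runs a backlog needs, the pending count from item 1 can be divided by the batch size, rounding up. A small sketch, with `runs_needed` as a hypothetical helper name:

```bash
#!/usr/bin/env bash
# Hypothetical helper: number of runs needed to clear a backlog (ceiling division).
# Args: pending file count, batch size (default 10).
runs_needed() {
  local pending="$1" batch="${2:-10}"
  echo $(( (pending + batch - 1) / batch ))
}
```

For instance, 23 pending files at the default batch size of 10 would take three runs, and the same backlog with `--batch 5` would take five.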
Prevention:
- Run /vt-c-research-ingest daily or every few days
- Large backlogs (50+ files) require multiple runs
- PDFs can consume roughly 10x more context than Markdown
## Open Brain Capture (Optional)
After processing research documents, if the capture_thought MCP tool is available, capture findings to Open Brain.
When to capture: After each processed research document that produces a proposal or evaluation.
How:
1. Check if capture_thought tool is available. If not: skip silently.
2. If no proposals were generated: skip capture.
3. Call capture_thought with:
thought: "Research finding: {document title}. Source: {file path}. Assessment: {relevant/not relevant}. Proposals generated: {count}. Key insight: {1-2 sentence summary}."
This step is always last — it never interrupts the research-ingest workflow.