VibecoderMcSwaggins committed on
Commit
8d97867
·
1 Parent(s): dacd086

docs: Remove outdated architecture and implementation documents

- Delete obsolete files related to the dual-mode architecture plan, including situation analysis, architecture specification, implementation phases, immediate actions, follow-up review request, and senior agent review prompt.
- These documents are no longer relevant following the recent architectural decisions and updates to the project structure.

docs/decisions/architecture-2025-11/00_SITUATION_AND_PLAN.md DELETED
@@ -1,189 +0,0 @@
1
- # Situation Analysis: Pydantic-AI + Microsoft Agent Framework Integration
2
-
3
- **Date:** November 27, 2025
4
- **Status:** ACTIVE DECISION REQUIRED
5
- **Risk Level:** HIGH - DO NOT MERGE PR #41 UNTIL RESOLVED
6
-
7
- ---
8
-
9
- ## 1. The Problem
10
-
11
- We almost merged a refactor that would have **deleted** multi-agent orchestration capability from the codebase, mistakenly believing pydantic-ai and Microsoft Agent Framework were mutually exclusive.
12
-
13
- **They are not.** They are complementary:
14
- - **pydantic-ai** (Library): Ensures LLM outputs match Pydantic schemas
15
- - **Microsoft Agent Framework** (Framework): Orchestrates multi-agent workflows
16
-
17
- ---
18
-
19
- ## 2. Current Branch State
20
-
21
- | Branch | Location | Has Agent Framework? | Has Pydantic-AI Improvements? | Status |
22
- |--------|----------|---------------------|------------------------------|--------|
23
- | `origin/dev` | GitHub | YES | NO | **SAFE - Source of Truth** |
24
- | `huggingface-upstream/dev` | HF Spaces | YES | NO | **SAFE - Same as GitHub** |
25
- | `origin/main` | GitHub | YES | NO | **SAFE** |
26
- | `feat/pubmed-fulltext` | GitHub | NO (deleted) | YES | **DANGER - Has destructive refactor** |
27
- | `refactor/pydantic-unification` | Local | NO (deleted) | YES | **DANGER - Redundant, delete** |
28
- | Local `dev` | Local only | NO (deleted) | YES | **DANGER - NOT PUSHED (thankfully)** |
29
-
30
- ### Key Files at Risk
31
-
32
- **On `origin/dev` (PRESERVED):**
33
- ```text
34
- src/agents/
35
- ├── analysis_agent.py # StatisticalAnalyzer wrapper
36
- ├── hypothesis_agent.py # Hypothesis generation
37
- ├── judge_agent.py # JudgeHandler wrapper
38
- ├── magentic_agents.py # Multi-agent definitions
39
- ├── report_agent.py # Report synthesis
40
- ├── search_agent.py # SearchHandler wrapper
41
- ├── state.py # Thread-safe state management
42
- └── tools.py # @ai_function decorated tools
43
-
44
- src/orchestrator_magentic.py # Multi-agent orchestrator
45
- src/utils/llm_factory.py # Centralized LLM client factory
46
- ```
47
-
48
- **Deleted in refactor branch (would be lost if merged):**
49
- - All of the above
50
-
51
- ---
52
-
53
- ## 3. Target Architecture
54
-
55
- ```text
56
- ┌─────────────────────────────────────────────────────────────────┐
57
- │ Microsoft Agent Framework (Orchestration Layer) │
58
- │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
59
- │ │ SearchAgent │→ │ JudgeAgent │→ │ ReportAgent │ │
60
- │ │ (BaseAgent) │ │ (BaseAgent) │ │ (BaseAgent) │ │
61
- │ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ │
62
- │ │ │ │ │
63
- │ ▼ ▼ ▼ │
64
- │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
65
- │ │ pydantic-ai │ │ pydantic-ai │ │ pydantic-ai │ │
66
- │ │ Agent() │ │ Agent() │ │ Agent() │ │
67
- │ │ output_type= │ │ output_type= │ │ output_type= │ │
68
- │ │ SearchResult │ │ JudgeAssess │ │ Report │ │
69
- │ └──────────────┘ └──────────────┘ └──────────────┘ │
70
- └─────────────────────────────────────────────────────────────────┘
71
- ```
72
-
73
- **Why this architecture:**
74
- 1. **Agent Framework** handles: workflow coordination, state passing, middleware, observability
75
- 2. **pydantic-ai** handles: type-safe LLM calls within each agent
76
-
77
- ---
78
-
79
- ## 4. CRITICAL: Naming Confusion Clarification
80
-
81
- > **Senior Agent Review Finding:** The codebase uses "magentic" in file names (e.g., `orchestrator_magentic.py`, `magentic_agents.py`) but this is **NOT** the `magentic` PyPI package by Jack Collins. It's Microsoft Agent Framework (`agent-framework-core`).
82
-
83
- **The naming confusion:**
84
- - `magentic` (PyPI package): A different library for structured LLM outputs
85
- - "Magentic" (in our codebase): Our internal name for Microsoft Agent Framework integration
86
- - `agent-framework-core` (PyPI package): Microsoft's actual multi-agent orchestration framework
87
-
88
- **Recommended future action:** Rename `orchestrator_magentic.py` → `orchestrator_advanced.py` to eliminate confusion.
89
-
90
- ---
91
-
92
- ## 5. What the Refactor DID Get Right
93
-
94
- The refactor branch (`feat/pubmed-fulltext`) has some valuable improvements:
95
-
96
- 1. **`judges.py` unified `get_model()`** - Supports OpenAI, Anthropic, AND HuggingFace via pydantic-ai
97
- 2. **HuggingFace free tier support** - `HuggingFaceModel` integration
98
- 3. **Test fix** - Properly mocks `HuggingFaceModel` class
99
- 4. **Removed broken magentic optional dependency** from pyproject.toml (this was correct - the old `magentic` package is different from Microsoft Agent Framework)
100
-
101
- **What it got WRONG:**
102
- 1. Deleted `src/agents/` entirely instead of refactoring them
103
- 2. Deleted `src/orchestrator_magentic.py` instead of fixing it
104
- 3. Conflated "magentic" (old package) with "Microsoft Agent Framework" (current framework)
105
-
106
- ---
107
-
108
- ## 6. Options for Path Forward
109
-
110
- ### Option A: Abandon Refactor, Start Fresh
111
- - Close PR #41
112
- - Delete `feat/pubmed-fulltext` and `refactor/pydantic-unification` branches
113
- - Reset local `dev` to match `origin/dev`
114
- - Cherry-pick ONLY the good parts (judges.py improvements, HF support)
115
- - **Pros:** Clean, safe
116
- - **Cons:** Lose some work, need to redo carefully
117
-
118
- ### Option B: Cherry-Pick Good Parts to origin/dev
119
- - Do NOT merge PR #41
120
- - Create new branch from `origin/dev`
121
- - Cherry-pick specific commits/changes that improve pydantic-ai usage
122
- - Keep agent framework code intact
123
- - **Pros:** Preserves both, surgical
124
- - **Cons:** Requires careful file-by-file review
125
-
126
- ### Option C: Revert Deletions in Refactor Branch
127
- - On `feat/pubmed-fulltext`, restore deleted agent files from `origin/dev`
128
- - Keep the pydantic-ai improvements
129
- - Merge THAT to dev
130
- - **Pros:** Gets both
131
- - **Cons:** Complex git operations, risk of conflicts
132
-
133
- ---
134
-
135
- ## 7. Recommended Action: Option B (Cherry-Pick)
136
-
137
- **Step-by-step:**
138
-
139
- 1. **Close PR #41** (do not merge)
140
- 2. **Delete redundant branches:**
141
- - `refactor/pydantic-unification` (local)
142
- - Reset local `dev` to `origin/dev`
143
- 3. **Create new branch from origin/dev:**
144
- ```bash
145
- git checkout -b feat/pydantic-ai-improvements origin/dev
146
- ```
147
- 4. **Cherry-pick or manually port these improvements:**
148
- - `src/agent_factory/judges.py` - the unified `get_model()` function
149
- - `examples/free_tier_demo.py` - HuggingFace demo
150
- - Test improvements
151
- 5. **Do NOT delete any agent framework files**
152
- 6. **Create PR for review**
153
-
154
- ---
155
-
156
- ## 8. Files to Cherry-Pick (Safe Improvements)
157
-
158
- | File | What Changed | Safe to Port? |
159
- |------|-------------|---------------|
160
- | `src/agent_factory/judges.py` | Added `HuggingFaceModel` support in `get_model()` | YES |
161
- | `examples/free_tier_demo.py` | New demo for HF inference | YES |
162
- | `tests/unit/agent_factory/test_judges.py` | Fixed HF model mocking | YES |
163
- | `pyproject.toml` | Removed old `magentic` optional dep | MAYBE (review carefully) |
164
-
165
- ---
166
-
167
- ## 9. Questions to Answer Before Proceeding
168
-
169
- 1. **For the hackathon**: Do we need full multi-agent orchestration, or is single-agent sufficient?
170
- 2. **For DeepBoner mainline**: Is the plan to use Microsoft Agent Framework for orchestration?
171
- 3. **Timeline**: How much time do we have to get this right?
172
-
173
- ---
174
-
175
- ## 10. Immediate Actions (DO NOW)
176
-
177
- - [ ] **DO NOT merge PR #41**
178
- - [ ] Close PR #41 with comment explaining the situation
179
- - [ ] Do not push local `dev` branch anywhere
180
- - [ ] Confirm HuggingFace Spaces is untouched (it is - verified)
181
-
182
- ---
183
-
184
- ## 11. Decision Log
185
-
186
- | Date | Decision | Rationale |
187
- |------|----------|-----------|
188
- | 2025-11-27 | Pause refactor merge | Discovered agent framework and pydantic-ai are complementary, not exclusive |
189
- | TBD | ? | Awaiting decision on path forward |
 
docs/decisions/architecture-2025-11/01_ARCHITECTURE_SPEC.md DELETED
@@ -1,289 +0,0 @@
1
- # Architecture Specification: Dual-Mode Agent System
2
-
3
- **Date:** November 27, 2025
4
- **Status:** SPECIFICATION
5
- **Goal:** Graceful degradation from full multi-agent orchestration to simple single-agent mode
6
-
7
- ---
8
-
9
- ## 1. Core Concept: Two Operating Modes
10
-
11
- ```text
12
- ┌─────────────────────────────────────────────────────────────────────┐
13
- │ USER REQUEST │
14
- │ │ │
15
- │ ▼ │
16
- │ ┌─────────────────┐ │
17
- │ │ Mode Selection │ │
18
- │ │ (Auto-detect) │ │
19
- │ └────────┬────────┘ │
20
- │ │ │
21
- │ ┌───────────────┴───────────────┐ │
22
- │ │ │ │
23
- │ ▼ ▼ │
24
- │ ┌─────────────────┐ ┌─────────────────┐ │
25
- │ │ SIMPLE MODE │ │ ADVANCED MODE │ │
26
- │ │ (Free Tier) │ │ (Paid Tier) │ │
27
- │ │ │ │ │ │
28
- │ │ pydantic-ai │ │ MS Agent Fwk │ │
29
- │ │ single-agent │ │ + pydantic-ai │ │
30
- │ │ loop │ │ multi-agent │ │
31
- │ └─────────────────┘ └─────────────────┘ │
32
- │ │ │ │
33
- │ └───────────────┬───────────────┘ │
34
- │ ▼ │
35
- │ ┌─────────────────┐ │
36
- │ │ Research Report │ │
37
- │ │ with Citations │ │
38
- │ └─────────────────┘ │
39
- └─────────────────────────────────────────────────────────────────────┘
40
- ```
41
-
42
- ---
43
-
44
- ## 2. Mode Comparison
45
-
46
- | Aspect | Simple Mode | Advanced Mode |
47
- |--------|-------------|---------------|
48
- | **Trigger** | No API key OR `LLM_PROVIDER=huggingface` | OpenAI API key present (currently OpenAI only) |
49
- | **Framework** | pydantic-ai only | Microsoft Agent Framework + pydantic-ai |
50
- | **Architecture** | Single orchestrator loop | Multi-agent coordination |
51
- | **Agents** | One agent does Search→Judge→Report | SearchAgent, JudgeAgent, ReportAgent, AnalysisAgent |
52
- | **State Management** | Simple dict | Thread-safe `MagenticState` with context vars |
53
- | **Quality** | Good (functional) | Better (specialized agents, coordination) |
54
- | **Cost** | Free (HuggingFace Inference) | Paid (OpenAI/Anthropic) |
55
- | **Use Case** | Demos, hackathon, budget-constrained | Production, research quality |
56
-
57
- ---
58
-
59
- ## 3. Simple Mode Architecture (pydantic-ai Only)
60
-
61
- ```text
62
- ┌─────────────────────────────────────────────────────┐
63
- │ Orchestrator │
64
- │ │
65
- │ while not sufficient and iteration < max: │
66
- │ 1. SearchHandler.execute(query) │
67
- │ 2. JudgeHandler.assess(evidence) ◄── pydantic-ai Agent │
68
- │ 3. if sufficient: break │
69
- │ 4. query = judge.next_queries │
70
- │ │
71
- │ return ReportGenerator.generate(evidence) │
72
- └─────────────────────────────────────────────────────┘
73
- ```
74
-
75
- **Components:**
76
- - `src/orchestrator.py` - Simple loop orchestrator
77
- - `src/agent_factory/judges.py` - JudgeHandler with pydantic-ai
78
- - `src/tools/search_handler.py` - Scatter-gather search
79
- - `src/tools/pubmed.py`, `clinicaltrials.py`, `europepmc.py` - Search tools
80
-
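The loop in the diagram can be exercised end to end with stub handlers. This is a sketch under stated assumptions: `Assessment`, `StubSearch`, `StubJudge`, and `run_simple` are hypothetical stand-ins, not the classes in `src/orchestrator.py`, and the report step is left out.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class Assessment:
    """Stand-in for JudgeAssessment: a verdict plus follow-up queries."""
    sufficient: bool
    next_queries: list[str] = field(default_factory=list)


class StubSearch:
    async def execute(self, query: str) -> list[str]:
        return [f"evidence for {query!r}"]


class StubJudge:
    """Declares the evidence sufficient once three items are collected."""

    async def assess(self, question: str, evidence: list[str]) -> Assessment:
        done = len(evidence) >= 3
        follow_up = [] if done else [f"{question} (refined)"]
        return Assessment(sufficient=done, next_queries=follow_up)


async def run_simple(question: str, search, judge, max_iterations: int = 10) -> list[str]:
    evidence: list[str] = []
    queries = [question]
    for _ in range(max_iterations):
        for query in queries:  # 1. search
            evidence.extend(await search.execute(query))
        verdict = await judge.assess(question, evidence)  # 2. judge
        if verdict.sufficient:  # 3. stop when sufficient
            break
        queries = verdict.next_queries or [question]  # 4. refine the query
    return evidence  # a real orchestrator would pass this to the report step


if __name__ == "__main__":
    print(asyncio.run(run_simple("metformin repurposing", StubSearch(), StubJudge())))
```

With these stubs the loop stops on the third iteration; lowering `max_iterations` shows the bounded exit path.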
81
- ---
82
-
83
- ## 4. Advanced Mode Architecture (MS Agent Framework + pydantic-ai)
84
-
85
- ```text
86
- ┌─────────────────────────────────────────────────────────────────────┐
87
- │ Microsoft Agent Framework Orchestrator │
88
- │ │
89
- │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
90
- │ │ SearchAgent │───▶│ JudgeAgent │───▶│ ReportAgent │ │
91
- │ │ (BaseAgent) │ │ (BaseAgent) │ │ (BaseAgent) │ │
92
- │ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ │
93
- │ │ │ │ │
94
- │ ▼ ▼ ▼ │
95
- │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
96
- │ │ pydantic-ai │ │ pydantic-ai │ │ pydantic-ai │ │
97
- │ │ Agent() │ │ Agent() │ │ Agent() │ │
98
- │ │ output_type=│ │ output_type=│ │ output_type=│ │
99
- │ │ SearchResult│ │ JudgeAssess │ │ Report │ │
100
- │ └─────────────┘ └─────────────┘ └─────────────┘ │
101
- │ │
102
- │ Shared State: MagenticState (thread-safe via contextvars) │
103
- │ - evidence: list[Evidence] │
104
- │ - embedding_service: EmbeddingService │
105
- └─────────────────────────────────────────────────────────────────────┘
106
- ```
107
-
108
- **Components:**
109
- - `src/orchestrator_magentic.py` - Multi-agent orchestrator
110
- - `src/agents/search_agent.py` - SearchAgent (BaseAgent)
111
- - `src/agents/judge_agent.py` - JudgeAgent (BaseAgent)
112
- - `src/agents/report_agent.py` - ReportAgent (BaseAgent)
113
- - `src/agents/analysis_agent.py` - AnalysisAgent (BaseAgent)
114
- - `src/agents/state.py` - Thread-safe state management
115
- - `src/agents/tools.py` - @ai_function decorated tools
116
-
117
- ---
118
-
119
- ## 5. Mode Selection Logic
120
-
- ```python
- # src/orchestrator_factory.py (actual implementation)
-
- from typing import Any, Literal
-
- def create_orchestrator(
-     search_handler: SearchHandlerProtocol | None = None,
-     judge_handler: JudgeHandlerProtocol | None = None,
-     config: OrchestratorConfig | None = None,
-     mode: Literal["simple", "magentic", "advanced"] | None = None,
- ) -> Any:
-     """
-     Auto-select orchestrator based on available credentials.
-
-     Priority:
-     1. If mode explicitly set, use that
-     2. If OpenAI key available -> Advanced Mode (currently OpenAI only)
-     3. Otherwise -> Simple Mode (HuggingFace free tier)
-     """
-     effective_mode = _determine_mode(mode)
-
-     if effective_mode == "advanced":
-         orchestrator_cls = _get_magentic_orchestrator_class()
-         return orchestrator_cls(max_rounds=config.max_iterations if config else 10)
-
-     # Simple mode requires handlers
-     if search_handler is None or judge_handler is None:
-         raise ValueError("Simple mode requires search_handler and judge_handler")
-
-     return Orchestrator(
-         search_handler=search_handler,
-         judge_handler=judge_handler,
-         config=config,
-     )
- ```
154
-
155
- ---
156
-
157
- ## 6. Shared Components (Both Modes Use)
158
-
159
- These components work in both modes:
160
-
161
- | Component | Purpose |
162
- |-----------|---------|
163
- | `src/tools/pubmed.py` | PubMed search |
164
- | `src/tools/clinicaltrials.py` | ClinicalTrials.gov search |
165
- | `src/tools/europepmc.py` | Europe PMC search |
166
- | `src/tools/search_handler.py` | Scatter-gather orchestration |
167
- | `src/tools/rate_limiter.py` | Rate limiting |
168
- | `src/utils/models.py` | Evidence, Citation, JudgeAssessment |
169
- | `src/utils/config.py` | Settings |
170
- | `src/services/embeddings.py` | Vector search (optional) |
171
-
172
- ---
173
-
174
- ## 7. pydantic-ai Integration Points
175
-
176
- Both modes use pydantic-ai for structured LLM outputs:
177
-
- ```python
- # In JudgeHandler (both modes)
- # get_model, JudgeAssessment, Evidence, SYSTEM_PROMPT and format_prompt
- # are defined elsewhere in the project.
- from typing import Any
-
- from pydantic_ai import Agent
- from pydantic_ai.models.huggingface import HuggingFaceModel
- from pydantic_ai.models.openai import OpenAIModel
- from pydantic_ai.models.anthropic import AnthropicModel
-
- class JudgeHandler:
-     def __init__(self, model: Any = None):
-         self.model = model or get_model()  # Auto-selects based on config
-         self.agent = Agent(
-             model=self.model,
-             output_type=JudgeAssessment,  # Structured output!
-             system_prompt=SYSTEM_PROMPT,
-         )
-
-     async def assess(self, question: str, evidence: list[Evidence]) -> JudgeAssessment:
-         result = await self.agent.run(format_prompt(question, evidence))
-         return result.output  # Guaranteed to be JudgeAssessment
- ```
198
-
199
- ---
200
-
201
- ## 8. Microsoft Agent Framework Integration Points
202
-
203
- Advanced mode wraps pydantic-ai agents in BaseAgent:
204
-
- ```python
- # In JudgeAgent (advanced mode only)
- from agent_framework import BaseAgent, AgentRunResponse, ChatMessage, Role
-
- class JudgeAgent(BaseAgent):
-     def __init__(
-         self,
-         judge_handler: JudgeHandlerProtocol,
-         evidence_store: dict[str, list[Evidence]],
-     ):
-         super().__init__(
-             name="JudgeAgent",
-             description="Evaluates evidence quality",
-         )
-         self._handler = judge_handler  # Uses pydantic-ai internally
-         # Assumption: the shared evidence store is injected by the orchestrator.
-         self._evidence_store = evidence_store
-
-     async def run(self, messages, **kwargs) -> AgentRunResponse:
-         question = extract_question(messages)
-         evidence = self._evidence_store.get("current", [])
-
-         # Delegate to pydantic-ai powered handler
-         assessment = await self._handler.assess(question, evidence)
-
-         return AgentRunResponse(
-             messages=[ChatMessage(role=Role.ASSISTANT, text=format_response(assessment))],
-             additional_properties={"assessment": assessment.model_dump()},
-         )
- ```
229
-
230
- ---
231
-
232
- ## 9. Benefits of This Architecture
233
-
234
- 1. **Graceful Degradation**: Works without API keys (free tier)
235
- 2. **Progressive Enhancement**: Better with API keys (orchestration)
236
- 3. **Code Reuse**: pydantic-ai handlers shared between modes
237
- 4. **Hackathon Ready**: Demo works without requiring paid keys
238
- 5. **Production Ready**: Full orchestration available when needed
239
- 6. **Future Proof**: Can add more agents to advanced mode
240
- 7. **Testable**: Simple mode is easier to unit test
241
-
242
- ---
243
-
244
- ## 10. Known Risks and Mitigations
245
-
246
- > **From Senior Agent Review**
247
-
248
- ### 10.1 Bridge Complexity (MEDIUM)
249
-
250
- **Risk:** In Advanced Mode, agents (Agent Framework) wrap handlers (pydantic-ai). Both are async. Context variables (`MagenticState`) must propagate correctly through the pydantic-ai call stack.
251
-
252
- **Mitigation:**
253
- - pydantic-ai uses standard Python `contextvars`, which naturally propagate through `await` chains
254
- - Test context propagation explicitly in integration tests
255
- - If issues arise, pass state explicitly rather than via context vars
256
-
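The propagation claim can be tested in isolation before involving either framework. The sketch below uses only `asyncio` and `contextvars`; `_state`, `agent_run`, and `handler_assess` are hypothetical stand-ins for `MagenticState`, an agent, and a handler.

```python
import asyncio
from contextvars import ContextVar

# Hypothetical stand-in for the state that MagenticState would hold.
_state: ContextVar[dict] = ContextVar("state")


async def handler_assess() -> int:
    """Plays the role of a handler called deep inside an awaited call stack."""
    await asyncio.sleep(0)  # yield to the event loop, as a network call would
    return len(_state.get()["evidence"])


async def agent_run(evidence: list[str]) -> int:
    """Plays the role of an agent that sets state, then awaits its handler."""
    _state.set({"evidence": evidence})
    return await handler_assess()


async def main() -> list[int]:
    # gather() wraps each coroutine in its own Task, and each Task runs in a
    # copy of the current context, so the two runs do not see each other's state.
    return await asyncio.gather(agent_run(["a"]), agent_run(["a", "b", "c"]))


if __name__ == "__main__":
    print(asyncio.run(main()))  # [1, 3]
```

The same shape works as an integration test: set the state in the agent, read it in the handler, and run two agents concurrently to confirm isolation.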
257
- ### 10.2 Integration Drift (MEDIUM)
258
-
259
- **Risk:** Simple Mode and Advanced Mode might diverge in behavior over time (e.g., Simple Mode uses logic A, Advanced Mode uses logic B).
260
-
261
- **Mitigation:**
262
- - Both modes MUST call the exact same underlying Tools (`src/tools/*`) and Handlers (`src/agent_factory/*`)
263
- - Handlers are the single source of truth for business logic
264
- - Agents are thin wrappers that delegate to handlers
265
-
266
- ### 10.3 Testing Burden (LOW-MEDIUM)
267
-
268
- **Risk:** Maintaining two distinct orchestrators (`src/orchestrator.py` and `src/orchestrator_magentic.py`) doubles the integration testing surface area.
269
-
270
- **Mitigation:**
271
- - Unit test handlers independently (shared code)
272
- - Integration tests for each mode separately
273
- - End-to-end tests verify same output for same input (determinism permitting)
274
-
275
- ### 10.4 Dependency Conflicts (LOW)
276
-
277
- **Risk:** `agent-framework-core` might conflict with `pydantic-ai`'s dependencies (e.g., different pydantic versions).
278
-
279
- **Status:** Both use `pydantic>=2.x`. Should be compatible.
280
-
281
- ---
282
-
283
- ## 11. Naming Clarification
284
-
285
- > See `00_SITUATION_AND_PLAN.md` Section 4 for full details.
286
-
287
- **Important:** The codebase uses "magentic" in file names (`orchestrator_magentic.py`, `magentic_agents.py`) but this refers to our internal naming for Microsoft Agent Framework integration, **NOT** the `magentic` PyPI package.
288
-
289
- **Future action:** Rename to `orchestrator_advanced.py` to eliminate confusion.
 
docs/decisions/architecture-2025-11/02_IMPLEMENTATION_PHASES.md DELETED
@@ -1,112 +0,0 @@
1
- # Implementation Phases: Dual-Mode Agent System
2
-
3
- **Date:** November 27, 2025
4
- **Status:** IMPLEMENTATION PLAN (REVISED)
5
- **Strategy:** TDD (Test-Driven Development), SOLID Principles
6
- **Dependency Strategy:** PyPI (agent-framework-core)
7
-
8
- ---
9
-
10
- ## Phase 0: Environment Validation & Cleanup
11
-
12
- **Goal:** Ensure clean state and dependencies are correctly installed.
13
-
14
- ### Step 0.1: Verify PyPI Package
15
- The `agent-framework-core` package is published on PyPI by Microsoft. Verify installation:
16
-
17
- ```bash
18
- uv sync --all-extras
19
- python -c "from agent_framework import ChatAgent; print('OK')"
20
- ```
21
-
22
- ### Step 0.2: Branch State
23
- We are on `feat/dual-mode-architecture`. Ensure it is up to date with `origin/dev` before starting.
24
-
25
- **Note:** The `reference_repos/agent-framework` folder is kept for reference/documentation only.
26
- The production dependency uses the official PyPI release.
27
-
28
- ---
29
-
30
- ## Phase 1: Pydantic-AI Improvements (Simple Mode)
31
-
32
- **Goal:** Implement `HuggingFaceModel` support in `JudgeHandler` using strict TDD.
33
-
34
- ### Step 1.1: Test First (Red)
35
- Create `tests/unit/agent_factory/test_judges_factory.py`:
36
- - Test `get_model()` returns `HuggingFaceModel` when `LLM_PROVIDER=huggingface`.
37
- - Test `get_model()` respects `HF_TOKEN`.
38
- - Test fallback to OpenAI.
39
-
40
- ### Step 1.2: Implementation (Green)
41
- Update `src/utils/config.py`:
42
- - Add `huggingface_model` and `hf_token` fields.
43
-
44
- Update `src/agent_factory/judges.py`:
45
- - Implement `get_model` with the logic derived from the tests.
46
- - Use dependency injection for the model where possible.
47
-
48
- ### Step 1.3: Refactor
49
- Ensure `JudgeHandler` is loosely coupled from the specific model provider.
50
-
51
- ---
52
-
53
- ## Phase 2: Orchestrator Factory (The Switch)
54
-
55
- **Goal:** Implement the factory pattern to switch between Simple and Advanced modes.
56
-
57
- ### Step 2.1: Test First (Red)
58
- Create `tests/unit/test_orchestrator_factory.py`:
59
- - Test `create_orchestrator` returns `Orchestrator` (simple) when API keys are missing.
60
- - Test `create_orchestrator` returns `MagenticOrchestrator` (advanced) when OpenAI key exists.
61
- - Test explicit mode override.
62
-
63
- ### Step 2.2: Implementation (Green)
64
- Update `src/orchestrator_factory.py` to implement the selection logic.
65
-
66
- ---
67
-
68
- ## Phase 3: Agent Framework Integration (Advanced Mode)
69
-
70
- **Goal:** Integrate Microsoft Agent Framework from PyPI.
71
-
72
- ### Step 3.1: Dependency Management
73
- The `agent-framework-core` package is installed from PyPI:
74
- ```toml
75
- [project.optional-dependencies]
76
- magentic = [
77
- "agent-framework-core>=1.0.0b251120,<2.0.0", # Microsoft Agent Framework (PyPI)
78
- ]
79
- ```
80
- Install with: `uv sync --all-extras`
81
-
82
- ### Step 3.2: Verify Imports (Test First)
83
- Create `tests/unit/agents/test_agent_imports.py`:
84
- - Verify `from agent_framework import ChatAgent` works.
85
- - Verify instantiation of `ChatAgent` with a mock client.
86
-
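An import check can be made cheap enough to use as a skip condition for the advanced-mode tests. A minimal sketch; `has_module` is a hypothetical helper, and it is only meant for top-level module names.

```python
from importlib.util import find_spec


def has_module(name: str) -> bool:
    """True if a top-level module can be located, without importing it."""
    return find_spec(name) is not None


if __name__ == "__main__":
    # `agent_framework` is the import name used throughout this plan.
    print("agent_framework importable:", has_module("agent_framework"))
```

A test module could call `has_module("agent_framework")` to decide whether to skip, so Simple Mode tests still run when the optional extra is not installed.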
87
- ### Step 3.3: Update Agents
88
- Refactor `src/agents/*.py` to ensure they match the exact signature of the `ChatAgent` class shipped in the installed PyPI package.
89
- - **SOLID:** Ensure agents have single responsibilities.
90
- - **DRY:** Share tool definitions between Pydantic-AI simple mode and Agent Framework advanced mode.
91
-
92
- ---
93
-
94
- ## Phase 4: UI & End-to-End Verification
95
-
96
- **Goal:** Update Gradio to reflect the active mode.
97
-
98
- ### Step 4.1: UI Updates
99
- Update `src/app.py` to display "Simple Mode" vs "Advanced Mode".
100
-
101
- ### Step 4.2: End-to-End Test
102
- Run the full loop:
103
- 1. Simple Mode (No Keys) -> Search -> Judge (HF) -> Report.
104
- 2. Advanced Mode (OpenAI Key) -> SearchAgent -> JudgeAgent -> ReportAgent.
105
-
106
- ---
107
-
108
- ## Phase 5: Cleanup & Documentation
109
-
110
- - Remove unused code.
111
- - Update main README.md.
112
- - Final `make check`.
 
docs/decisions/architecture-2025-11/03_IMMEDIATE_ACTIONS.md DELETED
@@ -1,112 +0,0 @@
1
- # Immediate Actions Checklist
2
-
3
- **Date:** November 27, 2025
4
- **Priority:** Execute in order
5
-
6
- ---
7
-
8
- ## Before Starting Implementation
9
-
10
- ### 1. Close PR #41 (CRITICAL)
11
-
12
- ```bash
13
- gh pr close 41 --comment "Architecture decision changed. Cherry-picking improvements to preserve both pydantic-ai and Agent Framework capabilities."
14
- ```
15
-
16
- ### 2. Verify HuggingFace Spaces is Safe
17
-
18
- ```bash
19
- # Should show agent framework files exist
20
- git ls-tree --name-only huggingface-upstream/dev -- src/agents/
21
- git ls-tree --name-only huggingface-upstream/dev -- src/orchestrator_magentic.py
22
- ```
23
-
24
- Expected output: Files should exist (they do as of this writing).
25
-
26
- ### 3. Clean Local Environment
27
-
28
- ```bash
29
- # Switch to main first
30
- git checkout main
31
-
32
- # Delete problematic branches
33
- git branch -D refactor/pydantic-unification 2>/dev/null || true
34
- git branch -D feat/pubmed-fulltext 2>/dev/null || true
35
-
36
- # Reset local dev to origin/dev
37
- git branch -D dev 2>/dev/null || true
38
- git checkout -b dev origin/dev
39
-
40
- # Verify agent framework code exists
41
- ls src/agents/
42
- # Expected: __init__.py, analysis_agent.py, hypothesis_agent.py, judge_agent.py,
43
- # magentic_agents.py, report_agent.py, search_agent.py, state.py, tools.py
44
-
45
- ls src/orchestrator_magentic.py
46
- # Expected: file exists
47
- ```
48
-
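The manual `ls` checks can also be scripted so the result is unambiguous. A sketch assuming it is run from the repository root; `EXPECTED` and `missing_files` are hypothetical names, and the paths are the ones listed in this checklist.

```python
from pathlib import Path

# Paths named in this checklist.
EXPECTED = [
    "src/agents/__init__.py",
    "src/agents/analysis_agent.py",
    "src/agents/hypothesis_agent.py",
    "src/agents/judge_agent.py",
    "src/agents/magentic_agents.py",
    "src/agents/report_agent.py",
    "src/agents/search_agent.py",
    "src/agents/state.py",
    "src/agents/tools.py",
    "src/orchestrator_magentic.py",
]


def missing_files(repo_root: Path) -> list[str]:
    """Return the expected paths that are absent under repo_root."""
    return [rel for rel in EXPECTED if not (repo_root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_files(Path.cwd())
    print("all present" if not missing else f"missing: {missing}")
```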
49
- ### 4. Create Fresh Feature Branch
50
-
51
- ```bash
52
- git checkout -b feat/dual-mode-architecture origin/dev
53
- ```
54
-
55
- ---
56
-
57
- ## Decision Points
58
-
59
- Before proceeding, confirm:
60
-
61
- 1. **For hackathon**: Do we need advanced mode, or is simple mode sufficient?
62
- - Simple mode = faster to implement, works today
63
- - Advanced mode = better quality, more work
64
-
65
- 2. **Timeline**: How much time do we have?
66
- - If < 1 day: Focus on simple mode only
67
- - If > 1 day: Implement dual-mode
68
-
69
- 3. **Dependencies**: Is `agent-framework-core` available?
70
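- - Check: `pip index versions --pre agent-framework-core` (`--pre` is needed because the version referenced in this plan, `1.0.0b251120`, is a pre-release)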
71
- - If not on PyPI, may need to install from GitHub
72
-
73
- ---
74
-
75
- ## Quick Start (Simple Mode Only)
76
-
77
- If time is limited, implement only simple mode improvements:
78
-
79
- ```bash
80
- # On feat/dual-mode-architecture branch
81
-
82
- # 1. Update judges.py to add HuggingFace support
83
- # 2. Update config.py to add HF settings
84
- # 3. Create free_tier_demo.py
85
- # 4. Run make check
86
- # 5. Create PR to dev
87
- ```
88
-
89
- This gives you free-tier capability without touching agent framework code.
90
-
91
- ---
92
-
93
- ## Quick Start (Full Dual-Mode)
94
-
95
- If time permits, implement full dual-mode:
96
-
97
- Follow Phases 0-5 in `02_IMPLEMENTATION_PHASES.md`
98
-
99
- ---
100
-
101
- ## Emergency Rollback
102
-
103
- If anything goes wrong:
104
-
105
- ```bash
106
- # Reset to safe state
107
- git checkout main
108
- git branch -D feat/dual-mode-architecture
109
- git checkout -b feat/dual-mode-architecture origin/dev
110
- ```
111
-
112
- `origin/dev` is the safe fallback: it has the agent framework code intact.
 
docs/decisions/architecture-2025-11/04_FOLLOWUP_REVIEW_REQUEST.md DELETED
@@ -1,158 +0,0 @@
1
- # Follow-Up Review Request: Did We Implement Your Feedback?
2
-
3
- **Date:** November 27, 2025
4
- **Context:** You previously reviewed our dual-mode architecture plan and provided feedback. We have updated the documentation. Please verify we correctly implemented your recommendations.
5
-
6
- ---
7
-
8
- ## Your Original Feedback vs Our Changes
9
-
10
- ### 1. Naming Confusion Clarification
11
-
12
- **Your feedback:** "You are using Microsoft Agent Framework, but you've named your integration 'Magentic'. This caused the confusion."
13
-
14
- **Our change:** Added Section 4 in `00_SITUATION_AND_PLAN.md`:
15
- ```markdown
16
- ## 4. CRITICAL: Naming Confusion Clarification
17
-
18
- > **Senior Agent Review Finding:** The codebase uses "magentic" in file names
19
- > (e.g., `orchestrator_magentic.py`, `magentic_agents.py`) but this is **NOT**
20
- > the `magentic` PyPI package by Jack Collins. It's Microsoft Agent Framework.
21
-
22
- **The naming confusion:**
23
- - `magentic` (PyPI package): A different library for structured LLM outputs
24
- - "Magentic" (in our codebase): Our internal name for Microsoft Agent Framework integration
25
- - `agent-framework-core` (PyPI package): Microsoft's actual multi-agent orchestration framework
26
-
27
- **Recommended future action:** Rename `orchestrator_magentic.py` → `orchestrator_advanced.py`
28
- ```
29
-
30
- **Status:** ✅ IMPLEMENTED
31
-
32
- ---
33
-
34
- ### 2. Bridge Complexity Warning
35
-
36
- **Your feedback:** "You must ensure MagenticState (context vars) propagates correctly through the pydantic-ai call stack."
37
-
38
- **Our change:** Added Section 10.1 in `01_ARCHITECTURE_SPEC.md`:
39
- ```markdown
40
- ### 10.1 Bridge Complexity (MEDIUM)
41
-
42
- **Risk:** In Advanced Mode, agents (Agent Framework) wrap handlers (pydantic-ai).
43
- Both are async. Context variables (`MagenticState`) must propagate correctly.
44
-
45
- **Mitigation:**
46
- - pydantic-ai uses standard Python `contextvars`, which naturally propagate through `await` chains
47
- - Test context propagation explicitly in integration tests
48
- - If issues arise, pass state explicitly rather than via context vars
49
- ```
50
-
51
- **Status:** ✅ IMPLEMENTED
52
-
53
- ---
54
-
55
- ### 3. Integration Drift Warning
56
-
57
- **Your feedback:** "Simple Mode and Advanced Mode might diverge in behavior."
58
-
59
- **Our change:** Added Section 10.2 in `01_ARCHITECTURE_SPEC.md`:
60
- ```markdown
61
- ### 10.2 Integration Drift (MEDIUM)
62
-
63
- **Risk:** Simple Mode and Advanced Mode might diverge in behavior over time.
64
-
65
- **Mitigation:**
66
- - Both modes MUST call the exact same underlying Tools (`src/tools/*`) and Handlers (`src/agent_factory/*`)
67
- - Handlers are the single source of truth for business logic
68
- - Agents are thin wrappers that delegate to handlers
69
- ```
70
-
71
- **Status:** ✅ IMPLEMENTED
72
-
73
- ---
74
-
75
- ### 4. Testing Burden Warning
76
-
77
- **Your feedback:** "You now have two distinct orchestrators to maintain. This doubles your integration testing surface area."
78
-
79
- **Our change:** Added Section 10.3 in `01_ARCHITECTURE_SPEC.md`:
80
- ```markdown
81
- ### 10.3 Testing Burden (LOW-MEDIUM)
82
-
83
- **Risk:** Maintaining two distinct orchestrators doubles the integration testing surface area.
84
-
85
- **Mitigation:**
86
- - Unit test handlers independently (shared code)
87
- - Integration tests for each mode separately
88
- - End-to-end tests verify that both modes produce the same output for the same input
89
- ```
90
-
91
- **Status:** ✅ IMPLEMENTED
92
-
93
- ---
94
-
95
- ### 5. Rename Recommendation
96
-
97
- **Your feedback:** "Rename `src/orchestrator_magentic.py` to `src/orchestrator_advanced.py`"
98
-
99
- **Our change:** Added Step 3.4 in `02_IMPLEMENTATION_PHASES.md`:
100
- ```markdown
101
- ### Step 3.4: (OPTIONAL) Rename "Magentic" to "Advanced"
102
-
103
- > **Senior Agent Recommendation:** Rename files to eliminate confusion.
104
-
105
- git mv src/orchestrator_magentic.py src/orchestrator_advanced.py
106
- git mv src/agents/magentic_agents.py src/agents/advanced_agents.py
107
-
108
- **Note:** This is optional for the hackathon. Can be done in a follow-up PR.
109
- ```
110
-
111
- **Status:** ✅ DOCUMENTED (marked as optional for hackathon)
112
-
113
- ---
114
-
115
- ### 6. Standardize Wrapper Recommendation
116
-
117
- **Your feedback:** "Create a generic `PydanticAiAgentWrapper(BaseAgent)` class instead of manually wrapping each handler."
118
-
119
- **Our change:** NOT YET DOCUMENTED
120
-
121
- **Status:** ⚠️ NOT IMPLEMENTED - Should we add this?
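
For discussion, the idea of a generic wrapper can be sketched without touching either library. Everything below is an assumption made for illustration: `HandlerAgentWrapper`, its `run` method, and `judge_handler` are hypothetical names, and the class deliberately does not subclass the real `BaseAgent`, whose interface is not shown in these documents. The point is only that one wrapper can delegate to any async handler, so business logic stays in the handler.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Generic, TypeVar

In = TypeVar("In")
Out = TypeVar("Out")


@dataclass
class HandlerAgentWrapper(Generic[In, Out]):
    """Hypothetical thin wrapper: holds a name and delegates to a handler."""

    name: str
    handler: Callable[[In], Awaitable[Out]]

    async def run(self, payload: In) -> Out:
        # No business logic here; the handler is the single source of truth.
        return await self.handler(payload)


async def judge_handler(question: str) -> dict:
    # Hypothetical handler; a real one would call an LLM and validate output.
    return {"question": question, "sufficient": len(question) > 10}


judge_agent = HandlerAgentWrapper(name="judge", handler=judge_handler)
verdict = asyncio.run(judge_agent.run("Is the evidence sufficient?"))
print(verdict)
```

A real implementation would adapt `run` to whatever method the orchestration framework expects from an agent; that mapping is left open here.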
122
-
123
- ---
124
-
125
- ## Questions for Your Review
126
-
127
- 1. **Did we correctly implement your feedback?** Are there any misunderstandings in how we interpreted your recommendations?
128
-
129
- 2. **Is the "Standardize Wrapper" recommendation critical?** Should we add it to the implementation phases, or is it a nice-to-have for later?
130
-
131
- 3. **Dependency versioning:** You noted `agent-framework-core>=1.0.0b251120` might be ephemeral. Should we:
132
- - Pin to a specific version?
133
- - Use a version range?
134
- - Install from GitHub source?
135
-
136
- 4. **Anything else we missed?**
137
-
138
- ---
139
-
140
- ## Files to Re-Review
141
-
142
- 1. `00_SITUATION_AND_PLAN.md` - Added Section 4 (Naming Clarification)
143
- 2. `01_ARCHITECTURE_SPEC.md` - Added Sections 10-11 (Risks, Naming)
144
- 3. `02_IMPLEMENTATION_PHASES.md` - Added Step 3.4 (Optional Rename)
145
-
146
- ---
147
-
148
- ## Current Branch State
149
-
150
- We are now on `feat/dual-mode-architecture`, branched from `origin/dev`:
151
- - ✅ Agent framework code intact (`src/agents/`, `src/orchestrator_magentic.py`)
152
- - ✅ Documentation committed
153
- - ❌ PR #41 still open (need to close it)
154
- - ❌ Cherry-pick of pydantic-ai improvements not yet done
155
-
156
- ---
157
-
158
- Please confirm: **GO / NO-GO** to proceed with Phase 1 (cherry-picking pydantic-ai improvements)?
 
docs/decisions/architecture-2025-11/REVIEW_PROMPT_FOR_SENIOR_AGENT.md DELETED
@@ -1,113 +0,0 @@
1
- # Senior Agent Review Prompt
2
-
3
- Copy and paste everything below this line into a fresh Claude/AI session:
4
-
5
- ---
6
-
7
- ## Context
8
-
9
- I am a junior developer working on a HuggingFace hackathon project called DeepBoner. We made a significant architectural mistake and are now trying to course-correct. I need you to act as a **senior staff engineer** and critically review our proposed solution.
10
-
11
- ## The Situation
12
-
13
- We almost merged a refactor that would have **deleted** our multi-agent orchestration capability, mistakenly believing that `pydantic-ai` (a library for structured LLM outputs) and Microsoft's `agent-framework` (a framework for multi-agent orchestration) were mutually exclusive alternatives.
14
-
15
- **They are not.** They are complementary:
16
- - `pydantic-ai` ensures LLM responses match Pydantic schemas (type-safe outputs)
17
- - `agent-framework` orchestrates multiple agents working together (coordination layer)
18
-
19
- We now want to implement a **dual-mode architecture** where:
20
- - **Simple Mode (No API key):** Uses only pydantic-ai with HuggingFace free tier
21
- - **Advanced Mode (With API key):** Uses Microsoft Agent Framework for orchestration, with pydantic-ai inside each agent for structured outputs
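
The two modes above imply a selection rule keyed on whether an API key is present. A minimal sketch of that rule follows, under stated assumptions: `select_mode` is a hypothetical function, and `OPENAI_API_KEY` is only an example variable name, since the documents do not say which key the project checks.

```python
import os
from typing import Mapping, Optional


def select_mode(
    env: Optional[Mapping[str, str]] = None,
    key_name: str = "OPENAI_API_KEY",  # assumed name, for illustration only
) -> str:
    """Return "advanced" when a non-blank API key is set, else "simple"."""
    env = os.environ if env is None else env
    return "advanced" if env.get(key_name, "").strip() else "simple"


print(select_mode({}))
print(select_mode({"OPENAI_API_KEY": "sk-test"}))
```

Passing the environment as a parameter keeps the rule testable without mutating `os.environ`, and treating a blank value as missing avoids selecting Advanced Mode on an empty key.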
22
-
23
- ## Your Task
24
-
25
- Please perform a **deep, critical review** of:
26
-
27
- 1. **The architecture diagram** (image attached: `assets/magentic-pydantic.png`)
28
- 2. **Our documentation** (4 files listed below)
29
- 3. **The actual codebase** to verify our claims
30
-
31
- ## Specific Questions to Answer
32
-
33
- ### Architecture Validation
34
- 1. Is our understanding correct that pydantic-ai and agent-framework are complementary, not competing?
35
- 2. Does the dual-mode architecture diagram accurately represent how these should integrate?
36
- 3. Are there any architectural flaws or anti-patterns in our proposed design?
37
-
38
- ### Documentation Accuracy
39
- 4. Are the branch states we documented accurate? (Check `git log`, `git ls-tree`)
40
- 5. Is our understanding of what code exists where correct?
41
- 6. Are the implementation phases realistic and in the correct order?
42
- 7. Are there any missing steps or dependencies we overlooked?
43
-
44
- ### Codebase Reality Check
45
- 8. Does `origin/dev` actually have the agent framework code intact? Verify by checking:
46
- - `git ls-tree origin/dev -- src/agents/`
47
- - `git ls-tree origin/dev -- src/orchestrator_magentic.py`
48
- 9. What does the current `src/agents/` code actually import? Does it use `agent_framework` or `agent-framework-core`?
49
- 10. Is the `agent-framework-core` package actually available on PyPI, or do we need to install from source?
50
-
51
- ### Implementation Feasibility
52
- 11. Can the cherry-pick strategy we outlined actually work, or are there merge conflicts we're not seeing?
53
- 12. Is the mode auto-detection logic sound?
54
- 13. What are the risks we haven't identified?
55
-
56
- ### Critical Errors Check
57
- 14. Did we miss anything critical in our analysis?
58
- 15. Are there any factual errors in our documentation?
59
- 16. Would a Google/DeepMind senior engineer approve this plan, or would they flag issues?
60
-
61
- ## Files to Review
62
-
63
- Please read these files in order:
64
-
65
- 1. `/Users/ray/Desktop/CLARITY-DIGITAL-TWIN/DeepBoner-1/docs/brainstorming/magentic-pydantic/00_SITUATION_AND_PLAN.md`
66
- 2. `/Users/ray/Desktop/CLARITY-DIGITAL-TWIN/DeepBoner-1/docs/brainstorming/magentic-pydantic/01_ARCHITECTURE_SPEC.md`
67
- 3. `/Users/ray/Desktop/CLARITY-DIGITAL-TWIN/DeepBoner-1/docs/brainstorming/magentic-pydantic/02_IMPLEMENTATION_PHASES.md`
68
- 4. `/Users/ray/Desktop/CLARITY-DIGITAL-TWIN/DeepBoner-1/docs/brainstorming/magentic-pydantic/03_IMMEDIATE_ACTIONS.md`
69
-
70
- And the architecture diagram:
71
- 5. `/Users/ray/Desktop/CLARITY-DIGITAL-TWIN/DeepBoner-1/assets/magentic-pydantic.png`
72
-
73
- ## Reference Repositories to Consult
74
-
75
- We have local clones of the source-of-truth repositories:
76
-
77
- - **Original DeepBoner:** `/Users/ray/Desktop/CLARITY-DIGITAL-TWIN/DeepBoner-1/reference_repos/DeepBoner/`
78
- - **Microsoft Agent Framework:** `/Users/ray/Desktop/CLARITY-DIGITAL-TWIN/DeepBoner-1/reference_repos/agent-framework/`
79
- - **Microsoft AutoGen:** `/Users/ray/Desktop/CLARITY-DIGITAL-TWIN/DeepBoner-1/reference_repos/autogen-microsoft/`
80
-
81
- Please cross-reference our hackathon fork against these to verify architectural alignment.
82
-
83
- ## Codebase to Analyze
84
-
85
- Our hackathon fork is at:
86
- `/Users/ray/Desktop/CLARITY-DIGITAL-TWIN/DeepBoner-1/`
87
-
88
- Key files to examine:
89
- - `src/agents/` - Agent framework integration
90
- - `src/agent_factory/judges.py` - pydantic-ai integration
91
- - `src/orchestrator.py` - Simple mode orchestrator
92
- - `src/orchestrator_magentic.py` - Advanced mode orchestrator
93
- - `src/orchestrator_factory.py` - Mode selection
94
- - `pyproject.toml` - Dependencies
95
-
96
- ## Expected Output
97
-
98
- Please provide:
99
-
100
- 1. **Validation Summary:** Is our plan sound? (YES/NO with explanation)
101
- 2. **Errors Found:** List any factual errors in our documentation
102
- 3. **Missing Items:** What did we overlook?
103
- 4. **Risk Assessment:** What could go wrong?
104
- 5. **Recommended Changes:** Specific edits to our documentation or plan
105
- 6. **Go/No-Go Recommendation:** Should we proceed with this plan?
106
-
107
- ## Tone
108
-
109
- Be brutally honest. If our plan is flawed, say so directly. We would rather know now than after implementation. Don't soften criticism - we need accuracy.
110
-
111
- ---
112
-
113
- END OF PROMPT