# Fix Plan: Magentic Mode Report Generation
**Related Bug**: `P0_MAGENTIC_MODE_BROKEN.md`
**Approach**: Test-Driven Development (TDD)
**Estimated Scope**: 4 phases, ~2-3 hours
---
## Problem Summary
Magentic mode runs but fails to produce readable reports due to:
1. **Primary Bug**: `MagenticFinalResultEvent.message` returns a `ChatMessage` object, not plain text
2. **Secondary Bug**: Max rounds (3) reached before ReportAgent completes
3. **Tertiary Issue**: Stale "bioRxiv" references in prompts and UI text
---
## Fix Order (TDD)
### Phase 1: Write Failing Tests
**Task 1.1**: Create test for ChatMessage text extraction
```python
# tests/unit/test_orchestrator_magentic.py
def test_process_event_extracts_text_from_chat_message():
    """Final result event should extract text from ChatMessage object."""
    # Arrange: Mock ChatMessage with .content attribute
    # Act: Call _process_event with MagenticFinalResultEvent
    # Assert: Returned AgentEvent.message is a string, not object repr
```
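A possible fleshed-out version of this test, as a sketch: the import paths, the `MagenticOrchestrator(max_rounds=...)` constructor call, and the single-argument `_process_event()` signature are assumptions to verify against the actual code. Spec'd mocks are used so the `isinstance` check on the event still passes.
```python
from unittest.mock import Mock

from agent_framework import ChatMessage, MagenticFinalResultEvent  # assumed import path

from src.orchestrator_magentic import MagenticOrchestrator  # assumed import path


def test_process_event_extracts_text_from_chat_message():
    """Final result event should extract text from ChatMessage object."""
    # Arrange: spec'd mocks pass the isinstance() check in _process_event
    message = Mock(spec=ChatMessage)
    message.content = "## Research Report\nMetformin shows repurposing potential..."
    message.text = None
    event = Mock(spec=MagenticFinalResultEvent)
    event.message = message

    orchestrator = MagenticOrchestrator(max_rounds=10)  # assumed constructor

    # Act
    agent_event = orchestrator._process_event(event)

    # Assert: plain text, not an object repr
    assert isinstance(agent_event.message, str)
    assert "Research Report" in agent_event.message
    assert "ChatMessage" not in agent_event.message
```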
**Task 1.2**: Create test for max rounds configuration
```python
def test_orchestrator_uses_configured_max_rounds():
    """MagenticOrchestrator should use max_rounds from constructor."""
    # Arrange: Create orchestrator with max_rounds=10
    # Act: Build workflow
    # Assert: Workflow has max_round_count=10
```
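A minimal sketch of this test, assuming the constructor stores the value on `self._max_rounds` (the attribute `_build_workflow()` passes to `max_round_count`); the import path is an assumption:
```python
from src.orchestrator_magentic import MagenticOrchestrator  # assumed import path


def test_orchestrator_uses_configured_max_rounds():
    """MagenticOrchestrator should use max_rounds from constructor."""
    # Arrange / Act: use a non-default value so the default (10) can't mask a bug
    orchestrator = MagenticOrchestrator(max_rounds=25)

    # Assert: this is the value the workflow builder hands to max_round_count
    assert orchestrator._max_rounds == 25
```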
**Task 1.3**: Create test for bioRxiv reference removal
```python
def test_task_prompt_references_europe_pmc():
    """Task prompt should reference Europe PMC, not bioRxiv."""
    # Arrange: Create orchestrator
    # Act: Check task string in run()
    # Assert: Contains "Europe PMC", not "bioRxiv"
```
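One lightweight way to write this without executing the workflow is to inspect the source of `run()`, where the task string is built; this is a sketch and assumes the prompt literal lives directly in `run()`:
```python
import inspect

from src.orchestrator_magentic import MagenticOrchestrator  # assumed import path


def test_task_prompt_references_europe_pmc():
    """Task prompt should reference Europe PMC, not bioRxiv."""
    source = inspect.getsource(MagenticOrchestrator.run)

    assert "Europe PMC" in source
    assert "biorxiv" not in source.lower()
```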
---
### Phase 2: Fix ChatMessage Text Extraction
**File**: `src/orchestrator_magentic.py`
**Lines**: 192-199
**Current Code**:
```python
elif isinstance(event, MagenticFinalResultEvent):
    text = event.message.text if event.message else "No result"
```
**Fixed Code**:
```python
elif isinstance(event, MagenticFinalResultEvent):
    if event.message:
        # ChatMessage may have .content or .text depending on version
        if hasattr(event.message, 'content') and event.message.content:
            text = str(event.message.content)
        elif hasattr(event.message, 'text') and event.message.text:
            text = str(event.message.text)
        else:
            # Fallback: convert entire message to string
            text = str(event.message)
    else:
        text = "No result generated"
```
**Why**: The `agent_framework.ChatMessage` object structure may vary. We need defensive extraction.
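If the same extraction is needed elsewhere (e.g. for intermediate agent messages), the fallback chain could be pulled into a small helper; this is an optional refactor sketch, not part of the minimal fix:
```python
from typing import Any


def _extract_message_text(message: Any, default: str = "No result generated") -> str:
    """Defensively pull display text out of a ChatMessage-like object."""
    if message is None:
        return default
    # Prefer .content, then .text, then the string form of the whole object
    for attr in ("content", "text"):
        value = getattr(message, attr, None)
        if value:
            return str(value)
    return str(message)
```
With this helper, the `elif` branch reduces to `text = _extract_message_text(event.message)`.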
---
### Phase 3: Fix Max Rounds Configuration
**File**: `src/orchestrator_magentic.py`
**Lines**: 97-99
**Current Code**:
```python
.with_standard_manager(
    chat_client=manager_client,
    max_round_count=self._max_rounds,  # Already uses config
    max_stall_count=3,
    max_reset_count=2,
)
```
**Issue**: Default `max_rounds` in `__init__` is 10, but workflow may need more for complex queries.
**Fix**: Verify the value flows through correctly. Add logging.
```python
logger.info(
    "Building Magentic workflow",
    max_rounds=self._max_rounds,
    max_stall=3,
    max_reset=2,
)
```
**Also check**: `src/orchestrator_factory.py` passes config correctly:
```python
return MagenticOrchestrator(
    max_rounds=config.max_iterations if config else 10,
)
```
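A quick test for this could mirror test 1.2; the factory entry point name `create_orchestrator`, its `mode` parameter, and the `max_iterations` attribute on the config object are all assumptions:
```python
from types import SimpleNamespace

from src.orchestrator_factory import create_orchestrator  # assumed name and path


def test_factory_passes_max_iterations_to_magentic_orchestrator():
    # Arrange: a stand-in config object with the attribute the factory reads
    config = SimpleNamespace(max_iterations=25)

    # Act (assumed signature)
    orchestrator = create_orchestrator(mode="magentic", config=config)

    # Assert: the configured value reaches the orchestrator's max rounds setting
    assert orchestrator._max_rounds == 25
```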
---
### Phase 4: Fix Stale bioRxiv References
**Files to update**:
| File | Lines | Change |
|------|-------|--------|
| `src/orchestrator_magentic.py` | 131 | "bioRxiv" → "Europe PMC" |
| `src/agents/magentic_agents.py` | 32-33 | "bioRxiv" → "Europe PMC" |
| `src/app.py` | 202-203 | "bioRxiv" → "Europe PMC" |
**Search command to verify**:
```bash
grep -rn "bioRxiv\|biorxiv" src/
```
---
## Implementation Checklist
```
[ ] Phase 1: Write failing tests
    [ ] 1.1 Test ChatMessage text extraction
    [ ] 1.2 Test max rounds configuration
    [ ] 1.3 Test Europe PMC references
[ ] Phase 2: Fix ChatMessage extraction
    [ ] Update _process_event() in orchestrator_magentic.py
    [ ] Run test 1.1 - should pass
[ ] Phase 3: Fix max rounds
    [ ] Add logging to _build_workflow()
    [ ] Verify factory passes config correctly
    [ ] Run test 1.2 - should pass
[ ] Phase 4: Fix bioRxiv references
    [ ] Update orchestrator_magentic.py task prompt
    [ ] Update magentic_agents.py descriptions
    [ ] Update app.py UI text
    [ ] Run test 1.3 - should pass
    [ ] Run grep to verify no remaining refs
[ ] Final Verification
    [ ] make check passes
    [ ] All tests pass (108+)
    [ ] Manual test: run_magentic.py produces readable report
```
---
## Test Commands
```bash
# Run specific test file
uv run pytest tests/unit/test_orchestrator_magentic.py -v
# Run all tests
uv run pytest tests/unit/ -v
# Full check
make check
# Manual integration test
set -a && source .env && set +a
uv run python examples/orchestrator_demo/run_magentic.py "metformin alzheimer"
```
---
## Success Criteria
1. `run_magentic.py` outputs a readable research report (not `<ChatMessage object>`)
2. Report includes: Executive Summary, Key Findings, Drug Candidates, References
3. No "Max round count reached" error with default settings
4. No "bioRxiv" references anywhere in codebase
5. All 108+ tests pass
6. `make check` passes
---
## Files Modified
```
src/
├── orchestrator_magentic.py      # ChatMessage fix, logging
├── agents/magentic_agents.py     # bioRxiv → Europe PMC
└── app.py                        # bioRxiv → Europe PMC
tests/unit/
└── test_orchestrator_magentic.py # NEW: 3 tests
```
---
## Notes for AI Agent
When implementing this fix plan:
1. **DO NOT** create mock data or fake responses
2. **DO** write real tests that verify actual behavior
3. **DO** run `make check` after each phase
4. **DO** test with real OpenAI API key via `.env`
5. **DO** preserve existing functionality - simple mode must still work
6. **DO NOT** over-engineer - minimal changes to fix the specific bugs