# Fix Plan: Magentic Mode Report Generation

**Related Bug**: `P0_MAGENTIC_MODE_BROKEN.md`
**Approach**: Test-Driven Development (TDD)
**Estimated Scope**: 4 tasks, ~2-3 hours

---

## Problem Summary

Magentic mode runs but fails to produce readable reports, for three reasons:

1. **Primary Bug**: `MagenticFinalResultEvent.message` returns a `ChatMessage` object, not text
2. **Secondary Bug**: Max rounds (3) is reached before ReportAgent completes
3. **Tertiary Issues**: Stale "bioRxiv" references in prompts

---

## Fix Order (TDD)

### Phase 1: Write Failing Tests

**Task 1.1**: Create test for ChatMessage text extraction

```python
# tests/unit/test_orchestrator_magentic.py

def test_process_event_extracts_text_from_chat_message():
    """Final result event should extract text from ChatMessage object."""
    # Arrange: Mock ChatMessage with .content attribute
    # Act: Call _process_event with MagenticFinalResultEvent
    # Assert: Returned AgentEvent.message is a string, not object repr
```

**Task 1.2**: Create test for max rounds configuration

```python
def test_orchestrator_uses_configured_max_rounds():
    """MagenticOrchestrator should use max_rounds from constructor."""
    # Arrange: Create orchestrator with max_rounds=10
    # Act: Build workflow
    # Assert: Workflow has max_round_count=10
```

**Task 1.3**: Create test for bioRxiv reference removal

```python
def test_task_prompt_references_europe_pmc():
    """Task prompt should reference Europe PMC, not bioRxiv."""
    # Arrange: Create orchestrator
    # Act: Check task string in run()
    # Assert: Contains "Europe PMC", not "bioRxiv"
```

---

### Phase 2: Fix ChatMessage Text Extraction

**File**: `src/orchestrator_magentic.py`
**Lines**: 192-199

**Current Code**:
```python
elif isinstance(event, MagenticFinalResultEvent):
    text = event.message.text if event.message else "No result"
```

**Fixed Code**:
```python
elif isinstance(event, MagenticFinalResultEvent):
    if event.message:
        # ChatMessage may have .content or .text depending on version
        if hasattr(event.message, 'content') and event.message.content:
            text = str(event.message.content)
        elif hasattr(event.message, 'text') and event.message.text:
            text = str(event.message.text)
        else:
            # Fallback: convert entire message to string
            text = str(event.message)
    else:
        text = "No result generated"
```

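**Why**: The attribute that holds the text on `agent_framework.ChatMessage` may vary between versions, so the extraction tries `.content`, then `.text`, and only then falls back to `str()` on the whole message.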
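
The same logic can be factored into a small helper that is testable without the framework. This is a minimal sketch under stated assumptions, not the project's actual code: `extract_message_text` is a hypothetical name, and the `SimpleNamespace` objects are stand-ins for `agent_framework.ChatMessage`, whose real attributes should be checked against the installed version.

```python
from types import SimpleNamespace
from typing import Any


def extract_message_text(message: Any, default: str = "No result generated") -> str:
    """Return readable text from a message-like object (hypothetical helper)."""
    if not message:
        return default
    # Try the attribute names mentioned in the plan, in order of preference.
    for attr in ("content", "text"):
        value = getattr(message, attr, None)
        if value:
            return str(value)
    # Fallback: string form of the whole message.
    return str(message)


# Stand-in objects for illustration only, NOT the real ChatMessage class.
with_text = SimpleNamespace(text="Final report body")
with_content = SimpleNamespace(content="Report via content", text="")

print(extract_message_text(with_text))
print(extract_message_text(with_content))
print(extract_message_text(None))
```

Keeping the extraction in one function would let `_process_event` stay a one-liner for this branch, at the cost of one extra name to maintain.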

---

### Phase 3: Fix Max Rounds Configuration

**File**: `src/orchestrator_magentic.py`
**Lines**: 97-99

**Current Code**:
```python
.with_standard_manager(
    chat_client=manager_client,
    max_round_count=self._max_rounds,  # Already uses config
    max_stall_count=3,
    max_reset_count=2,
)
```

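**Issue**: The default `max_rounds` in `__init__` is 10, but the workflow may need more rounds for complex queries.

**Fix**: Verify that the value flows through correctly, and add logging. The keyword arguments in the call below assume a structlog-style logger that accepts arbitrary key-value pairs; the standard library `logging` methods would reject them.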

```python
logger.info(
    "Building Magentic workflow",
    max_rounds=self._max_rounds,
    max_stall=3,
    max_reset=2,
)
```

**Also check**: `src/orchestrator_factory.py` passes config correctly:
```python
return MagenticOrchestrator(
    max_rounds=config.max_iterations if config else 10,
)
```

---

### Phase 4: Fix Stale bioRxiv References

**Files to update**:

| File | Line | Change |
|------|------|--------|
| `src/orchestrator_magentic.py` | 131 | "bioRxiv" → "Europe PMC" |
| `src/agents/magentic_agents.py` | 32-33 | "bioRxiv" → "Europe PMC" |
| `src/app.py` | 202-203 | "bioRxiv" → "Europe PMC" |

**Search command to verify**:
```bash
grep -rn "bioRxiv\|biorxiv" src/
```
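
The same check can also run as a regression guard in the test suite. The following is a minimal sketch under stated assumptions: `find_stale_references` is a hypothetical helper, it scans only `.py` files under a directory, and it matches case-insensitively, which is slightly broader than the `grep` pattern.

```python
import re
from pathlib import Path

# Case-insensitive, so it also catches spellings such as "BioRxiv".
STALE_PATTERN = re.compile(r"biorxiv", re.IGNORECASE)


def find_stale_references(root: Path, pattern: re.Pattern = STALE_PATTERN) -> list:
    """Return (relative path, line number, stripped line) for each match under root."""
    hits = []
    for path in sorted(root.rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                hits.append((path.relative_to(root).as_posix(), lineno, line.strip()))
    return hits
```

A test could then assert `find_stale_references(Path("src")) == []`, so that a reintroduced reference fails the suite with the file and line number in the assertion message.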

---

## Implementation Checklist

```
[ ] Phase 1: Write failing tests
    [ ] 1.1 Test ChatMessage text extraction
    [ ] 1.2 Test max rounds configuration
    [ ] 1.3 Test Europe PMC references

[ ] Phase 2: Fix ChatMessage extraction
    [ ] Update _process_event() in orchestrator_magentic.py
    [ ] Run test 1.1 - should pass

[ ] Phase 3: Fix max rounds
    [ ] Add logging to _build_workflow()
    [ ] Verify factory passes config correctly
    [ ] Run test 1.2 - should pass

[ ] Phase 4: Fix bioRxiv references
    [ ] Update orchestrator_magentic.py task prompt
    [ ] Update magentic_agents.py descriptions
    [ ] Update app.py UI text
    [ ] Run test 1.3 - should pass
    [ ] Run grep to verify no remaining refs

[ ] Final Verification
    [ ] make check passes
    [ ] All tests pass (108+)
    [ ] Manual test: run_magentic.py produces readable report
```

---

## Test Commands

```bash
# Run specific test file
uv run pytest tests/unit/test_orchestrator_magentic.py -v

# Run all tests
uv run pytest tests/unit/ -v

# Full check
make check

# Manual integration test
set -a && source .env && set +a
uv run python examples/orchestrator_demo/run_magentic.py "metformin alzheimer"
```

---

## Success Criteria

1. `run_magentic.py` outputs a readable research report (not `<ChatMessage object>`)
2. Report includes: Executive Summary, Key Findings, Drug Candidates, References
3. No "Max round count reached" error with default settings
4. No "bioRxiv" references anywhere in codebase
5. All 108+ tests pass
6. `make check` passes

---

## Files Modified

```
src/
├── orchestrator_magentic.py   # ChatMessage fix, logging
├── agents/magentic_agents.py  # bioRxiv → Europe PMC
└── app.py                     # bioRxiv → Europe PMC

tests/unit/
└── test_orchestrator_magentic.py  # NEW: 3 tests
```

---

## Notes for AI Agent

When implementing this fix plan:

1. **DO NOT** create mock data or fake responses
2. **DO** write real tests that verify actual behavior
3. **DO** run `make check` after each phase
4. **DO** test with a real OpenAI API key loaded from `.env`
5. **DO** preserve existing functionality - simple mode must still work
6. **DO NOT** over-engineer - minimal changes to fix the specific bugs