Analysis: Why Accuracy Drops and How to Fix
Issue 1: Accuracy Drops at the End
Root Causes Found:
Evaluation uses NEW tasks each time (lines 171-175 in compare_strategies.py)
- `general_accuracy = student.evaluate([generator.generate_task(...) for ...])` creates new tasks every iteration → variance and inconsistency
- Should use a FIXED eval set instead (see the sketch below)
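A minimal sketch of that fix, assuming `generate_task` accepts a topic and an RNG and that `student.evaluate` takes a task list (both signatures are guesses based on the snippet above):

```python
import random

def build_fixed_eval_set(generator, topics, n_per_topic=20, seed=42):
    """Build the evaluation set ONCE, with a fixed seed, so every
    iteration and every strategy is scored on identical tasks."""
    rng = random.Random(seed)
    return [generator.generate_task(topic=topic, rng=rng)
            for topic in topics
            for _ in range(n_per_topic)]

# Before the training loop:
#     eval_tasks = build_fixed_eval_set(generator, topics)
# Inside the loop, replace the per-iteration regeneration with:
#     general_accuracy = student.evaluate(eval_tasks)
```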
Forgetting rate too aggressive for 500 iterations
- Forgetting rate: 0.05
- After 500 iterations (500 time units): retention = exp(-0.05 * 500) ≈ 0.0
- All skills forgotten by the end!
- Retention drops to near-zero after ~50-100 time units
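Quick check of the arithmetic (plain exponential decay with the rate quoted above):

```python
import math

forgetting_rate = 0.05
for t in (10, 50, 100, 500):
    print(f"t = {t:>3}: retention = {math.exp(-forgetting_rate * t):.4f}")
# t =  10: retention = 0.6065
# t =  50: retention = 0.0821
# t = 100: retention = 0.0067
# t = 500: retention = 0.0000
```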
Evaluation timing confusion
- Currently: Learn → Evaluate → Advance time
- Should be clearer about when evaluation happens relative to forgetting
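One way to make the ordering explicit; `select_task`, `train_on`, and `advance_time` are hypothetical names standing in for whatever the real loop calls:

```python
for iteration in range(num_iterations):
    task = teacher.select_task()             # teacher picks the next task
    student.train_on(task)                   # 1. Learn
    accuracy = student.evaluate(eval_tasks)  # 2. Evaluate on the fixed set,
                                             #    BEFORE forgetting applies
    student.advance_time(1.0)                # 3. Only then advance the clock
```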
Issue 2: Accuracy Calculation Method
Current Method:
- Uses `student.evaluate(eval_tasks)`, which:
  - Calls `answer()` for each task (stochastic, uses randomness)
  - Accounts for forgetting via `_get_effective_skill()`
  - Returns the fraction of correct answers
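Roughly what that amounts to (a guess at the internals; only the method names `evaluate`, `answer`, and `_get_effective_skill` come from the codebase):

```python
def evaluate(self, tasks):
    """Sampled accuracy: each answer() draws from a success probability
    derived from _get_effective_skill(), so two calls on the SAME tasks
    can return different numbers."""
    correct = sum(1 for task in tasks if self.answer(task))
    return correct / len(tasks)
```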
Problems:
- Stochastic variance: Random sampling introduces noise
- Eval tasks regenerated: Different tasks each time = inconsistent
- Small eval set: Only 10-15 tasks = high variance
Better Methods:
- Use FIXED eval set generated once at start
- Use expected accuracy instead of sampled accuracy (less variance; see the sketch after this list)
  - Expected acc = mean(prob_correct) over all tasks
- Larger eval set (50-100 tasks) for stability
- Separate eval timing: Evaluate BEFORE time advance
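A sketch of the expected-accuracy variant from item 2, assuming a `prob_correct()` helper can be derived from `_get_effective_skill()` (hypothetical; not in the codebase):

```python
def expected_accuracy(student, tasks):
    """Deterministic alternative to sampled evaluation: average the
    per-task success probability instead of sampling answers.
    Same mean as evaluate(), zero sampling variance."""
    return sum(student.prob_correct(task) for task in tasks) / len(tasks)
```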
Issue 3: Mock vs Real Components
Current Mock Components:
Mock Student:
- ✅ Captures learning and forgetting
- ✅ Per-topic skill tracking
- ✅ Realistic Ebbinghaus curve
- ⚠️ Simplified learning model (linear skill increase)
- ⚠️ Stochastic, but not as complex as real PPO
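A condensed sketch of the forgetting mechanic described above; everything except the name `_get_effective_skill` is a guess at the mock's internals:

```python
import math

class MockStudent:
    def __init__(self, forgetting_rate=0.05):
        self.skill = {}            # topic -> raw skill in [0, 1]
        self.last_practiced = {}   # topic -> time of last practice
        self.time = 0.0
        self.forgetting_rate = forgetting_rate

    def _get_effective_skill(self, topic):
        """Ebbinghaus-style decay: raw skill scaled by exp(-rate * dt),
        where dt is the time since the topic was last practiced."""
        dt = self.time - self.last_practiced.get(topic, self.time)
        return self.skill.get(topic, 0.0) * math.exp(-self.forgetting_rate * dt)
```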
Mock Task Generator:
- ✅ Simple template-based tasks
- ✅ Multiple topics and difficulties
- ⚠️ Fixed templates (not procedural)
- ⚠️ Limited diversity
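For contrast, the template-based approach boils down to something like this (illustrative only; not the project's actual templates):

```python
import random

TEMPLATES = {
    "addition": "What is {a} + {b}?",
    "comparison": "Which is larger: {a} or {b}?",
}

def generate_task(topic, difficulty=1, rng=random):
    """Fixed templates with random number fills: the numbers vary,
    but the task STRUCTURE never does -- hence the limited diversity."""
    hi = 10 * difficulty
    a, b = rng.randint(1, hi), rng.randint(1, hi)
    return {"topic": topic, "prompt": TEMPLATES[topic].format(a=a, b=b)}
```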
Real Components (in MentorFlow):
- Student: Full PPO agent with neural network
- Task Generator: Procedural generation with 15 task families
Will Real Components Be Better?
YES, likely:
- Real PPO student can learn more complex patterns
- Procedural task generator provides more diverse tasks
- Better generalization to unseen tasks
- More realistic learning curves
BUT:
- Real components are slower to train
- Harder to debug and verify
- The teacher's algorithm (UCB) should still work either way (see the sketch below)
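The reason the teacher should transfer: UCB only consumes per-topic reward statistics, not anything about the student's internals. A minimal UCB1-style selection rule (the standard algorithm, not the project's exact code):

```python
import math

def ucb_select(mean_reward, pull_count, total_pulls, c=1.4):
    """Pick the topic with the best mean reward + exploration bonus."""
    def score(topic):
        n = pull_count[topic]
        if n == 0:
            return float("inf")  # try every topic at least once
        return mean_reward[topic] + c * math.sqrt(math.log(total_pulls) / n)
    return max(mean_reward, key=score)
```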
Recommended Fixes
- Fix evaluation to use FIXED eval sets
- Reduce the forgetting rate, or reset time periodically (a derivation sketch follows this list)
- Use expected accuracy for more stable measurements
- Add evaluation BEFORE time advance option
- Document evaluation methodology clearly
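For fix 2, the rate can be derived rather than hand-tuned by solving retention = exp(-rate * horizon) for the rate; the target and horizon below are illustrative numbers, not project settings:

```python
import math

def forgetting_rate_for(target_retention, horizon):
    """Rate at which retention decays to target_retention after
    `horizon` unpracticed time units."""
    return -math.log(target_retention) / horizon

# Keep ~50% retention across the full 500-iteration run:
print(forgetting_rate_for(0.5, 500))  # ~0.00139, vs. the current 0.05
```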