Ric and Claude committed
Commit 47ec478 · 1 parent: a6b8ecc
Update README with accurate project status and paper submission
- Replaced aspirational roadmap with honest project status
- Added "Known Limitations" section noting training didn't achieve targets
- Updated citation to reflect paper under review
- Added note about paper materials in not_uploaded/ directory
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
README.md
CHANGED
@@ -232,19 +232,32 @@ task_type: PHI_DETECTION
 - **Hardware**: Single A100 40GB GPU
 - **Training Time**: ~8 hours
 
-##
+## Project Status
 
+**Current State**: Early research prototype with synthetic data generation pipeline completed. Initial LoRA training experiments showed limitations in the approach.
+
+### Completed
 - [x] Project structure and configuration
 - [x] Synthea integration for synthetic patient data
 - [x] PDF generation pipeline with PHI annotations
 - [x] PHI annotation and preprocessing tools
-- [x] LoRA adapter implementation
-- [
-
-
--
--
--
+- [x] Initial LoRA adapter implementation
+- [x] Basic training pipeline (results were suboptimal)
+
+### Known Limitations
+- Initial training approach did not achieve target performance
+- Vision token masking effectiveness needs further research
+- Alternative architectures may be required
+
+### Future Directions
+- Explore alternative masking strategies
+- Investigate different vision encoder fine-tuning approaches
+- Consider hybrid text-vision detection methods
+- Benchmark against traditional OCR + NER pipelines
+
+## Paper
+
+A paper describing this work has been submitted for peer review. The paper, experimental results, and additional materials are available in the `not_uploaded/` directory (not included in this public repository).
 
 ## Citation
 
@@ -254,7 +267,7 @@ If you use this work in your research, please cite:
 @article{justitia2025,
 title={Justitia: Selective Vision Token Masking for PHI-Compliant OCR},
 author={Your Name},
-journal={
+journal={Under Review},
 year={2025}
 }
 ```