Ric Claude committed on
Commit 47ec478 · 1 Parent(s): a6b8ecc

Update README with accurate project status and paper submission

- Replaced aspirational roadmap with honest project status
- Added "Known Limitations" section noting training didn't achieve targets
- Updated citation to reflect paper under review
- Added note about paper materials in not_uploaded/ directory

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

Files changed (1)
  1. README.md +22 -9
README.md CHANGED
@@ -232,19 +232,32 @@ task_type: PHI_DETECTION
  - **Hardware**: Single A100 40GB GPU
  - **Training Time**: ~8 hours
 
- ## Development Roadmap
+ ## Project Status
 
+ **Current State**: Early research prototype with synthetic data generation pipeline completed. Initial LoRA training experiments showed limitations in the approach.
+
+ ### Completed
  - [x] Project structure and configuration
  - [x] Synthea integration for synthetic patient data
  - [x] PDF generation pipeline with PHI annotations
  - [x] PHI annotation and preprocessing tools
- - [x] LoRA adapter implementation
- - [ ] Complete training pipeline with evaluation
- - [ ] Inference engine with masking strategies
- - [ ] Comprehensive evaluation framework
- - [ ] Benchmarking against baseline methods
- - [ ] Documentation and tutorials
- - [ ] Paper submission to arXiv/conference
+ - [x] Initial LoRA adapter implementation
+ - [x] Basic training pipeline (results were suboptimal)
+
+ ### Known Limitations
+ - Initial training approach did not achieve target performance
+ - Vision token masking effectiveness needs further research
+ - Alternative architectures may be required
+
+ ### Future Directions
+ - Explore alternative masking strategies
+ - Investigate different vision encoder fine-tuning approaches
+ - Consider hybrid text-vision detection methods
+ - Benchmark against traditional OCR + NER pipelines
+
+ ## Paper
+
+ A paper describing this work has been submitted for peer review. The paper, experimental results, and additional materials are available in the `not_uploaded/` directory (not included in this public repository).
 
  ## Citation
 
@@ -254,7 +267,7 @@ If you use this work in your research, please cite:
  @article{justitia2025,
  title={Justitia: Selective Vision Token Masking for PHI-Compliant OCR},
  author={Your Name},
- journal={arXiv preprint arXiv:2025.xxxxx},
+ journal={Under Review},
  year={2025}
  }
  ```