Contributing to The DETERMINATOR
Thank you for your interest in contributing to The DETERMINATOR! This guide will help you get started.
Table of Contents
- Git Workflow
- Getting Started
- Development Commands
- MCP Integration
- Common Pitfalls
- Key Principles
- Pull Request Process
Note: Additional sections (Code Style, Error Handling, Testing, Implementation Patterns, Code Quality, and Prompt Engineering) are available as separate pages in the documentation.
Note on Project Names: "The DETERMINATOR" is the product name, "DeepCritical" is the organization/project name, and "determinator" is the Python package name.
Repository Information
- GitHub Repository: `DeepCritical/GradioDemo` (source of truth, PRs, code review)
- HuggingFace Space: `DataQuests/DeepCritical` (deployment/demo)
- Package Name: `determinator` (Python package name in `pyproject.toml`)
Git Workflow
- `main`: Production-ready (GitHub)
- `dev`: Development integration (GitHub)
- Use feature branches: `yourname-dev`
- NEVER push directly to `main` or `dev` on HuggingFace
- GitHub is source of truth; HuggingFace is for deployment
Dual Repository Setup
This project uses a dual repository setup:
- GitHub (`DeepCritical/GradioDemo`): Source of truth for code, PRs, and code review
- HuggingFace (`DataQuests/DeepCritical`): Deployment target for the Gradio demo
Remote Configuration
When cloning, set up remotes as follows:
```bash
# Clone from GitHub
git clone https://github.com/DeepCritical/GradioDemo.git
cd GradioDemo

# Add HuggingFace remote (optional, for deployment)
git remote add huggingface-upstream https://huggingface.co/spaces/DataQuests/DeepCritical
```
Important: Never push directly to `main` or `dev` on HuggingFace. Always work through GitHub PRs. GitHub is the source of truth; HuggingFace is for deployment/demo only.
Getting Started
1. Fork the repository on GitHub: `DeepCritical/GradioDemo`
2. Clone your fork:
   ```bash
   git clone https://github.com/yourusername/GradioDemo.git
   cd GradioDemo
   ```
3. Install dependencies:
   ```bash
   uv sync --all-extras
   uv run pre-commit install
   ```
4. Create a feature branch: `git checkout -b yourname-feature-name`
5. Make your changes following the guidelines below
6. Run checks:
   ```bash
   uv run ruff check src tests
   uv run mypy src
   uv run pytest --cov=src --cov-report=term-missing tests/unit/ -v -m "not openai" -p no:logfire
   ```
7. Commit and push:
   ```bash
   git commit -m "Description of changes"
   git push origin yourname-feature-name
   ```
8. Create a pull request on GitHub
Package Manager
This project uses `uv` as the package manager. Prefix all commands with `uv run` so they execute in the project environment.
Installation
```bash
# Install uv if you haven't already (recommended: standalone installer)
# Unix/macOS/Linux:
curl -LsSf https://astral.sh/uv/install.sh | sh

# Windows (PowerShell):
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

# Alternative: pipx install uv
# Or: pip install uv

# Sync all dependencies including dev extras
uv sync --all-extras

# Install pre-commit hooks
uv run pre-commit install
```
Development Commands
```bash
# Installation
uv sync --all-extras       # Install all dependencies including dev
uv run pre-commit install  # Install pre-commit hooks

# Code Quality Checks (run all before committing)
uv run ruff check src tests   # Lint with ruff
uv run ruff format src tests  # Format with ruff
uv run mypy src               # Type checking
uv run pytest --cov=src --cov-report=term-missing tests/unit/ -v -m "not openai" -p no:logfire  # Tests with coverage

# Testing Commands
uv run pytest tests/unit/ -v -m "not openai" -p no:logfire  # Run unit tests (excludes OpenAI tests)
uv run pytest tests/ -v -m "huggingface" -p no:logfire      # Run HuggingFace tests
uv run pytest tests/ -v -p no:logfire                       # Run all tests
uv run pytest --cov=src --cov-report=term-missing tests/unit/ -v -m "not openai" -p no:logfire  # Tests with terminal coverage
uv run pytest --cov=src --cov-report=html -p no:logfire     # Generate HTML coverage report (opens htmlcov/index.html)

# Documentation Commands
uv run mkdocs build  # Build documentation
uv run mkdocs serve  # Serve documentation locally (http://127.0.0.1:8000)
```
Test Markers
The project uses pytest markers to categorize tests. See Testing Guidelines for details:
- `unit`: Unit tests (mocked, fast)
- `integration`: Integration tests (real APIs)
- `slow`: Slow tests
- `openai`: Tests requiring an OpenAI API key
- `huggingface`: Tests requiring a HuggingFace API key
- `embedding_provider`: Tests requiring API-based embedding providers
- `local_embeddings`: Tests using local embeddings

Note: The `-p no:logfire` flag disables the logfire plugin to avoid conflicts during testing.
Code Style & Conventions
Type Safety
- ALWAYS use type hints for all function parameters and return types
- Maintain `mypy --strict` compliance (no `Any` unless absolutely necessary)
- Use `TYPE_CHECKING` imports for circular dependencies, as in the sketch below
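A minimal sketch of the `TYPE_CHECKING` pattern, assuming an `Evidence` model in `src/utils/models.py` (the imported name is illustrative):

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Imported only during type checking, so there is no circular import at runtime
    from src.utils.models import Evidence


def count_evidence(items: "list[Evidence]") -> int:
    # The string annotation keeps the name unresolved until type-check time
    return len(items)
```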
Pydantic Models
- All data exchange uses Pydantic models (`src/utils/models.py`)
- Models are frozen (`model_config = {"frozen": True}`) for immutability
- Use `Field()` with descriptions for all model fields
- Validate with `ge=`, `le=`, `min_length=`, `max_length=` constraints
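A sketch of a model following these rules (the model and field names are hypothetical, not taken from `src/utils/models.py`):

```python
from pydantic import BaseModel, Field


class SearchQuery(BaseModel):
    model_config = {"frozen": True}  # instances are immutable after creation

    query: str = Field(min_length=1, description="Free-text search query")
    max_results: int = Field(default=10, ge=1, le=100, description="Cap on returned results")
```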
Async Patterns
- ALL I/O operations must be async (`async def`, `await`)
- Use `asyncio.gather()` for parallel operations (see the sketch after this list)
- CPU-bound work (embeddings, parsing) must use `run_in_executor()`:

```python
loop = asyncio.get_running_loop()
result = await loop.run_in_executor(None, cpu_bound_function, args)
```

- Never block the event loop with synchronous I/O
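A hedged sketch of the `asyncio.gather()` guideline, fanning search tools out in parallel; `return_exceptions=True` matches the "Search tool failed" logging pattern shown later, but the real orchestrator may differ:

```python
import asyncio


async def search_all(tools, query: str) -> list:
    # Each element is either that tool's list[Evidence] or the exception it raised
    return await asyncio.gather(
        *(tool.search(query, max_results=10) for tool in tools),
        return_exceptions=True,  # one failing tool should not sink the rest
    )
```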
Linting
- Ruff with 100-char line length
- Ignore rules documented in `pyproject.toml`:
  - `PLR0913`: Too many arguments (agents need many params)
  - `PLR0912`: Too many branches (complex orchestrator logic)
  - `PLR0911`: Too many return statements (complex agent logic)
  - `PLR2004`: Magic values (statistical constants)
  - `PLW0603`: Global statement (singleton pattern)
  - `PLC0415`: Lazy imports for optional dependencies
Pre-commit
- Pre-commit hooks run automatically on commit
- Must pass: lint + typecheck + test-cov
- Install hooks with: `uv run pre-commit install`
- Note: `uv sync --all-extras` installs the pre-commit package, but you must run `uv run pre-commit install` separately to set up the git hooks
Error Handling & Logging
Exception Hierarchy
Use the custom exception hierarchy (`src/utils/exceptions.py`):
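A hedged sketch of its shape; only `SearchError` and `RateLimitError` are named elsewhere in this guide, so the base class name here is an assumption:

```python
class DeterminatorError(Exception):
    """Assumed base class for project-specific errors."""


class SearchError(DeterminatorError):
    """Raised when a search tool fails."""


class RateLimitError(SearchError):
    """Raised when an external API rate limit is exceeded."""
```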
Error Handling Rules
- Always chain exceptions: `raise SearchError(...) from e`
- Log errors with context using `structlog`: `logger.error("Operation failed", error=str(e), context=value)`
- Never silently swallow exceptions
- Provide actionable error messages
Logging
- Use `structlog` for all logging (NOT `print` or `logging`)
- Import: `import structlog; logger = structlog.get_logger()`
- Log with structured data: `logger.info("event", key=value)`
- Use appropriate levels: DEBUG, INFO, WARNING, ERROR
Logging Examples
```python
logger.info("Starting search", query=query, tools=[t.name for t in tools])
logger.warning("Search tool failed", tool=tool.name, error=str(result))
logger.error("Assessment failed", error=str(e))
```
Error Chaining
Always preserve exception context:
```python
try:
    result = await api_call()
except httpx.HTTPError as e:
    raise SearchError(f"API call failed: {e}") from e
```
Testing Requirements
Test Structure
- Unit tests in `tests/unit/` (mocked, fast)
- Integration tests in `tests/integration/` (real APIs, marked `@pytest.mark.integration`)
- Use markers: `unit`, `integration`, `slow`
Mocking
- Use `respx` for httpx mocking
- Use `pytest-mock` for general mocking
- Mock LLM calls in unit tests (use `MockJudgeHandler`)
- Fixtures in `tests/conftest.py`: `mock_httpx_client`, `mock_llm_response`
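A hedged sketch of `respx`-based mocking (the URL and payload are illustrative, not the project's fixtures):

```python
import httpx
import respx


@respx.mock
async def test_search_http_mocked():
    # Any GET to this URL now returns the canned JSON instead of hitting the network
    respx.get("https://api.example.org/search").mock(
        return_value=httpx.Response(200, json={"results": []})
    )
    async with httpx.AsyncClient() as client:
        resp = await client.get("https://api.example.org/search")
    assert resp.json() == {"results": []}
```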
TDD Workflow
1. Write a failing test in `tests/unit/`
2. Implement in `src/`
3. Ensure the test passes
4. Run checks: `uv run ruff check src tests && uv run mypy src && uv run pytest --cov=src --cov-report=term-missing tests/unit/ -v -m "not openai" -p no:logfire`
Test Examples
```python
@pytest.mark.unit
async def test_pubmed_search(mock_httpx_client):
    tool = PubMedTool()
    results = await tool.search("metformin", max_results=5)
    assert len(results) > 0
    assert all(isinstance(r, Evidence) for r in results)


@pytest.mark.integration
async def test_real_pubmed_search():
    tool = PubMedTool()
    results = await tool.search("metformin", max_results=3)
    assert len(results) <= 3
```
Test Coverage
- Run `uv run pytest --cov=src --cov-report=term-missing tests/unit/ -v -m "not openai" -p no:logfire` for a terminal coverage report
- Run `uv run pytest --cov=src --cov-report=html -p no:logfire` for an HTML coverage report (opens `htmlcov/index.html`)
- Aim for >80% coverage on critical paths
- Exclude: `__init__.py`, `TYPE_CHECKING` blocks
Implementation Patterns
Search Tools
All tools implement the `SearchTool` protocol (`src/tools/base.py`):

- Must have a `name` property
- Must implement `async def search(query, max_results) -> list[Evidence]`
- Use the `@retry` decorator from tenacity for resilience
- Rate limiting: Implement `_rate_limit()` for APIs with limits (e.g., PubMed)
- Error handling: Raise `SearchError` or `RateLimitError` on failures
Example pattern:
```python
class MySearchTool:
    @property
    def name(self) -> str:
        return "mytool"

    @retry(stop=stop_after_attempt(3), wait=wait_exponential(...))
    async def search(self, query: str, max_results: int = 10) -> list[Evidence]:
        # Implementation
        return evidence_list
```
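A hedged sketch of the `_rate_limit()` helper mentioned above; the interval is illustrative, and the real implementation may track state differently:

```python
import asyncio
import time


class RateLimitMixin:
    """Serialize calls so they stay at least `min_interval` seconds apart."""

    def __init__(self, min_interval: float = 0.5) -> None:
        self._min_interval = min_interval
        self._last_call = 0.0
        self._lock = asyncio.Lock()

    async def _rate_limit(self) -> None:
        async with self._lock:
            elapsed = time.monotonic() - self._last_call
            if elapsed < self._min_interval:
                await asyncio.sleep(self._min_interval - elapsed)
            self._last_call = time.monotonic()
```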
Judge Handlers
- Implement `JudgeHandlerProtocol` (`async def assess(question, evidence) -> JudgeAssessment`)
- Use a pydantic-ai `Agent` with `output_type=JudgeAssessment`
- System prompts in `src/prompts/judge.py`
- Support fallback handlers: `MockJudgeHandler`, `HFInferenceJudgeHandler`
- Always return a valid `JudgeAssessment` (never raise exceptions)
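A hedged sketch of a pydantic-ai judge following this protocol; the model name and prompt text are illustrative, and result accessors vary across pydantic-ai versions:

```python
from pydantic_ai import Agent

from src.utils.models import Evidence, JudgeAssessment  # assumed import path

judge_agent = Agent(
    "openai:gpt-4o",  # illustrative model choice
    output_type=JudgeAssessment,
    system_prompt="Assess whether the evidence is sufficient to answer the question.",
)


class LLMJudgeHandler:
    async def assess(self, question: str, evidence: list[Evidence]) -> JudgeAssessment:
        result = await judge_agent.run(f"Question: {question}\n\nEvidence: {evidence}")
        return result.output
```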
Agent Factory Pattern
- Use factory functions for creating agents (`src/agent_factory/`)
- Lazy initialization for optional dependencies (e.g., embeddings, Modal)
- Check requirements before initialization, as in the sketch below
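A hedged sketch of a requirements check along these lines; the function name and the specific checks are assumptions, not the project's actual `agent_factory` code:

```python
import importlib.util
import os


def check_magentic_requirements() -> tuple[bool, str]:
    """Return (ok, reason) describing whether Magentic mode can be initialized."""
    if importlib.util.find_spec("magentic") is None:
        return False, "magentic is not installed"
    if not os.getenv("OPENAI_API_KEY"):
        return False, "OPENAI_API_KEY is not set"
    return True, "ok"
```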
State Management
- Magentic Mode: Use `ContextVar` for thread-safe state (`src/agents/state.py`), as sketched below
- Simple Mode: Pass state via function parameters
- Never use global mutable state (except singletons via `@lru_cache`)
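A minimal `ContextVar` sketch; the variable name is hypothetical, and the real module is `src/agents/state.py`:

```python
from contextvars import ContextVar

# Each async task sees its own value, so concurrent requests don't collide
current_question: ContextVar[str | None] = ContextVar("current_question", default=None)


def set_question(question: str) -> None:
    current_question.set(question)


def get_question() -> str | None:
    return current_question.get()
```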
Singleton Pattern
- Use `@lru_cache(maxsize=1)` for singletons (sketch below)
- Lazy initialization avoids requiring dependencies at import time
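A hedged sketch of the pattern; the embedding backend shown is an illustrative choice, not necessarily what the project uses:

```python
from functools import lru_cache


@lru_cache(maxsize=1)
def get_embedder():
    # Lazy import: the optional dependency is only required on first call,
    # and lru_cache guarantees the same instance is returned afterwards
    from sentence_transformers import SentenceTransformer

    return SentenceTransformer("all-MiniLM-L6-v2")
```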
Code Quality & Documentation
Docstrings
- Google-style docstrings for all public functions
- Include Args, Returns, Raises sections
- Use type hints in docstrings only if needed for clarity
Example:
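A hedged sketch in the required Google style, reusing the `search` signature from the Search Tools section (the wording is illustrative):

```python
from __future__ import annotations  # keeps the Evidence annotation lazy


async def search(self, query: str, max_results: int = 10) -> list[Evidence]:
    """Search the data source and return supporting evidence.

    Args:
        query: Free-text search query.
        max_results: Maximum number of results to return.

    Returns:
        A list of Evidence items, at most max_results long.

    Raises:
        SearchError: If the underlying API call fails.
    """
    ...
```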
Code Comments
- Explain WHY, not WHAT
- Document non-obvious patterns (e.g., why `requests`, not `httpx`, for ClinicalTrials)
- Mark critical sections: `# CRITICAL: ...`
- Document rate limiting rationale
- Explain async patterns when non-obvious
Prompt Engineering & Citation Validation
Judge Prompts
- System prompt in `src/prompts/judge.py`
- Format evidence with truncation (1500 chars per item)
- Handle the empty-evidence case separately
- Always request structured JSON output
- Use the `format_user_prompt()` and `format_empty_evidence_prompt()` helpers
Hypothesis Prompts
- Use diverse evidence selection (MMR algorithm)
- Sentence-aware truncation (`truncate_at_sentence()`)
- Format: Drug → Target → Pathway → Effect
- System prompt emphasizes mechanistic reasoning
- Use `format_hypothesis_prompt()` with embeddings for diversity
Report Prompts
- Include full citation details for validation
- Use diverse evidence selection (n=20)
- CRITICAL: Emphasize citation validation rules
- Format hypotheses with support/contradiction counts
- System prompt includes explicit JSON structure requirements
Citation Validation
- ALWAYS validate references before returning reports
- Use `validate_references()` from `src/utils/citation_validator.py`
- Remove hallucinated citations (URLs not in evidence)
- Log warnings for removed citations
- Never trust LLM-generated citations without validation
Citation Validation Rules
- Every reference URL must EXACTLY match a provided evidence URL
- Do NOT invent, fabricate, or hallucinate any references
- Do NOT modify paper titles, authors, dates, or URLs
- If unsure about a citation, OMIT it rather than guess
- Copy URLs exactly as provided - do not create similar-looking URLs
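A hedged sketch of validation under these rules; the real `validate_references()` may differ in signature and reference shape:

```python
import structlog

logger = structlog.get_logger()


def validate_references(references: list[dict], evidence_urls: set[str]) -> list[dict]:
    """Keep only references whose URL exactly matches a provided evidence URL."""
    valid = []
    for ref in references:
        if ref.get("url") in evidence_urls:
            valid.append(ref)
        else:
            # Hallucinated or mutated URL: drop it and leave a trace in the logs
            logger.warning("Removed hallucinated citation", url=ref.get("url"))
    return valid
```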
Evidence Selection
- Use `select_diverse_evidence()` for MMR-based selection (see the sketch after this list)
- Balance relevance vs diversity (lambda=0.7 default)
- Sentence-aware truncation preserves meaning
- Limit evidence per prompt to avoid context overflow
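A hedged sketch of MMR-style selection with lambda = 0.7; the project's `select_diverse_evidence()` works over embeddings and may differ in signature:

```python
def mmr_select(candidates, relevance, similarity, k: int, lam: float = 0.7) -> list:
    """Greedily pick k items, trading relevance against redundancy."""
    selected: list = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        def mmr_score(c):
            # Penalize similarity to anything already selected
            redundancy = max((similarity(c, s) for s in selected), default=0.0)
            return lam * relevance(c) - (1 - lam) * redundancy

        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected
```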
MCP Integration
MCP Tools
- Functions in `src/mcp_tools.py` for Claude Desktop
- Full type hints required
- Google-style docstrings with Args/Returns sections
- Formatted string returns (markdown)
Gradio MCP Server
- Enable with `mcp_server=True` in `demo.launch()`
- Endpoint: `/gradio_api/mcp/`
- Use `ssr_mode=False` to fix hydration issues in HF Spaces
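A minimal launch sketch combining both flags (the Blocks content is a placeholder):

```python
import gradio as gr

with gr.Blocks() as demo:
    gr.Markdown("The DETERMINATOR demo")

# mcp_server=True exposes the MCP endpoint; ssr_mode=False avoids
# hydration issues on HuggingFace Spaces, as noted above
demo.launch(mcp_server=True, ssr_mode=False)
```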
Common Pitfalls
- Blocking the event loop: Never use sync I/O in async functions
- Missing type hints: All functions must have complete type annotations
- Hallucinated citations: Always validate references
- Global mutable state: Use ContextVar or pass via parameters
- Import errors: Lazy-load optional dependencies (magentic, modal, embeddings)
- Rate limiting: Always implement for external APIs
- Error chaining: Always use `from e` when raising exceptions
Key Principles
- Type Safety First: All code must pass `mypy --strict`
- Async Everything: All I/O must be async
- Test-Driven: Write tests before implementation
- No Hallucinations: Validate all citations
- Graceful Degradation: Support free tier (HF Inference) when no API keys
- Lazy Loading: Don't require optional dependencies at import time
- Structured Logging: Use structlog, never print()
- Error Chaining: Always preserve exception context
Pull Request Process
- Ensure all checks pass: `uv run ruff check src tests && uv run mypy src && uv run pytest --cov=src --cov-report=term-missing tests/unit/ -v -m "not openai" -p no:logfire`
- Update documentation if needed
- Add tests for new features
- Update CHANGELOG if applicable
- Request review from maintainers
- Address review feedback
- Wait for approval before merging
Project Structure
- `src/`: Main source code
- `tests/`: Test files (`unit/` and `integration/`)
- `docs/`: Documentation source files (MkDocs)
- `examples/`: Example usage scripts
- `pyproject.toml`: Project configuration and dependencies
- `.pre-commit-config.yaml`: Pre-commit hook configuration
Questions?
- Open an issue on GitHub
- Check existing documentation
- Review code examples in the codebase
Thank you for contributing to The DETERMINATOR!