Spaces: awellis/bfh-studadmin-assist


awellis committed on Oct 7

Commit 5df4a2a

1 Parent(s): be61100

Implement modular RAG email assistant architecture

- Add modular src/ structure with all components
- Implement document processing (loader, chunker)
- Implement OpenSearch indexing with hybrid retrieval
- Implement PydanticAI agents (intent, composer, fact checker)
- Implement pipeline orchestrator
- Add Gradio UI with draft refinement
- Create document ingestion script
- Update dependencies and configuration
- Add comprehensive documentation (README, QUICKSTART, CLAUDE.md)

Files changed (27)

.env.example +30 -0
.gitignore +3 -0
CLAUDE.md +17 -9
QUICKSTART.md +116 -0
README.md +256 -16
app.py +12 -62
requirements.txt +29 -52
scripts/__init__.py +1 -0
scripts/ingest_documents.py +104 -0
src/__init__.py +3 -0
src/agents/__init__.py +7 -0
src/agents/composer_agent.py +155 -0
src/agents/fact_checker_agent.py +159 -0
src/agents/intent_agent.py +100 -0
src/config.py +141 -0
src/document_processing/__init__.py +6 -0
src/document_processing/chunker.py +78 -0
src/document_processing/loader.py +65 -0
src/indexing/__init__.py +6 -0
src/indexing/indexer.py +106 -0
src/indexing/opensearch_client.py +167 -0
src/pipeline/__init__.py +5 -0
src/pipeline/orchestrator.py +192 -0
src/retrieval/__init__.py +5 -0
src/retrieval/hybrid_retriever.py +201 -0
src/ui/__init__.py +5 -0
src/ui/gradio_app.py +285 -0

.env.example ADDED

	@@ -0,0 +1,30 @@

+# OpenAI Configuration
+OPENAI_API_KEY=your_openai_api_key_here
+LLM_MODEL=gpt-4o
+EMBEDDING_MODEL=text-embedding-3-small
+LLM_TEMPERATURE=0.7
+LLM_MAX_TOKENS=2000
+# OpenSearch Configuration
+OPENSEARCH_HOST=localhost
+OPENSEARCH_PORT=9200
+OPENSEARCH_USER=admin
+OPENSEARCH_PASSWORD=your_password_here
+OPENSEARCH_USE_SSL=true
+OPENSEARCH_VERIFY_CERTS=false
+INDEX_NAME=bfh_admin_docs
+# Document Processing Configuration
+DOCUMENTS_PATH=assets/markdown
+CHUNK_SIZE=300
+CHUNK_OVERLAP=50
+MIN_CHUNK_SIZE=100
+# Retrieval Configuration
+RETRIEVAL_TOP_K=5
+BM25_WEIGHT=0.5
+VECTOR_WEIGHT=0.5
+MIN_RELEVANCE_SCORE=0.3
+# Application Configuration
+DEBUG=false
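Everything in `.env.example` reaches the application as a string. `src/config.py` (added in this commit but not shown in this excerpt) turns these values into typed settings; the sketch below is a hypothetical, standard-library-only illustration of that step for the retrieval variables. The names `RetrievalSettings` and `load_retrieval_settings` are assumptions, and the real module may load the file differently (for example via python-dotenv, which is pinned in `requirements.txt`).

```python
import os
from dataclasses import dataclass
from typing import Mapping


def _parse_bool(value: str) -> bool:
    # .env.example writes booleans as lowercase "true"/"false"
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class RetrievalSettings:
    """Hypothetical typed view of the retrieval-related variables."""
    top_k: int
    bm25_weight: float
    vector_weight: float
    min_relevance_score: float
    debug: bool


def load_retrieval_settings(env: Mapping[str, str] = os.environ) -> RetrievalSettings:
    """Parse string environment values; defaults mirror .env.example."""
    return RetrievalSettings(
        top_k=int(env.get("RETRIEVAL_TOP_K", "5")),
        bm25_weight=float(env.get("BM25_WEIGHT", "0.5")),
        vector_weight=float(env.get("VECTOR_WEIGHT", "0.5")),
        min_relevance_score=float(env.get("MIN_RELEVANCE_SCORE", "0.3")),
        debug=_parse_bool(env.get("DEBUG", "false")),
    )
```

Passing an explicit mapping, e.g. `load_retrieval_settings({"RETRIEVAL_TOP_K": "8"})`, keeps the parsing testable without touching the process environment.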

.gitignore CHANGED

@@ -142,6 +142,9 @@ gradio_cached_examples/
 flagged/
 *.db
+# Baseline reference file
+rag_email_assistant_haystack_2_pydantic_ai_gradio_modular_2025_baseline.py
 # IDE
 .vscode/
 .idea/

CLAUDE.md CHANGED

@@ -6,14 +6,16 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
 This is a RAG (Retrieval-Augmented Generation) Email Assistant system designed for university administrative staff at BFH (Bern University of Applied Sciences). The system uses Haystack 2 for document processing and retrieval, PydanticAI for multi-agent orchestration, and Gradio for the user interface.
-**Current Status**: The project is in initial development. The baseline implementation exists as a single monolithic file that needs to be split into the proper modular structure.
+**Current Status**: The project has been fully implemented with a modular architecture. The baseline reference file is kept for reference but the production code is in the `src/` directory.
 ## Key Files
-- **[rag_email_assistant_haystack_2_pydantic_ai_gradio_modular_2025_baseline.py](rag_email_assistant_haystack_2_pydantic_ai_gradio_modular_2025_baseline.py)**: Complete baseline implementation containing all code for the system. This file has comments indicating how to split it into modules.
 - **[docs/RAG_Email_Assistant_Specifications_v1.0.md](docs/RAG_Email_Assistant_Specifications_v1.0.md)**: Comprehensive specification document defining architecture, components, and implementation details.
-- **[app.py](app.py)**: Legacy Hugging Face Spaces demo file (not part of the RAG assistant).
-- **assets/markdown/**: Directory containing administrative documents in markdown format (forms, information sheets) that serve as the knowledge base.
+- **[app.py](app.py)**: Main entry point for Hugging Face Spaces deployment.
+- **[src/](src/)**: Production implementation with modular architecture.
+- **[scripts/ingest_documents.py](scripts/ingest_documents.py)**: Script to load, chunk, and index documents.
+- **[assets/markdown/](assets/markdown/)**: Directory containing administrative documents in markdown format (forms, information sheets) that serve as the knowledge base.
+- **rag_email_assistant_haystack_2_pydantic_ai_gradio_modular_2025_baseline.py**: Reference baseline (gitignored).
 ## Architecture
@@ -28,9 +30,9 @@ The system follows a multi-agent RAG architecture with three main stages:
 3. **Gradio UI**: Interactive interface for composing and refining email responses.
-### Target Module Structure
-Per the specification, the monolithic baseline should be split into:
+### Module Structure
+The implemented modular structure:
 ```
 src/
 ├── config.py                    # Configuration management
@@ -62,11 +64,17 @@ src/
 ### Running the Application
 ```bash
-# Once modularized, the main entry point will be:
+# Main entry point (for Hugging Face Spaces and local):
+python app.py
+# Or run the UI module directly:
 python -m src.ui.gradio_app
-# For baseline (current single-file version):
-python rag_email_assistant_haystack_2_pydantic_ai_gradio_modular_2025_baseline.py
+```
+### Document Ingestion
+```bash
+# Index markdown documents before first run:
+python scripts/ingest_documents.py
 ```
 ### Environment Setup

QUICKSTART.md ADDED

	@@ -0,0 +1,116 @@

+# Quick Start Guide
+## Prerequisites
+1. **Python 3.10+** installed
+2. **OpenSearch instance** running with k-NN plugin enabled
+3. **OpenAI API key**
+## Setup (5 minutes)
+### 1. Install Dependencies
+```bash
+pip install -r requirements.txt
+```
+### 2. Configure Environment
+```bash
+# Copy the example environment file
+cp .env.example .env
+# Edit .env and add your credentials
+nano .env  # or use your preferred editor
+```
+**Required variables:**
+- `OPENAI_API_KEY` - Your OpenAI API key
+- `OPENSEARCH_HOST` - OpenSearch host (e.g., localhost)
+- `OPENSEARCH_PORT` - OpenSearch port (e.g., 9200)
+- `OPENSEARCH_USER` - OpenSearch username
+- `OPENSEARCH_PASSWORD` - OpenSearch password
+### 3. Index Documents
+```bash
+python scripts/ingest_documents.py
+```
+This will:
+- Load markdown documents from `assets/markdown/`
+- Chunk them semantically
+- Generate embeddings
+- Index in OpenSearch
+Expected output:
+```
+Successfully indexed X document chunks
+Total documents in index: X
+✅ Document ingestion completed successfully!
+```
+### 4. Run the Application
+```bash
+python app.py
+```
+The Gradio interface will launch at `http://localhost:7860`
+## Usage
+1. **Enter a student query** (e.g., "Wie kann ich mich exmatrikulieren?")
+2. **Click "Generate Email Draft"**
+3. **Review the results:**
+   - Intent analysis
+   - Email draft (subject + body)
+   - Fact check results
+   - Source documents used
+4. **Refine if needed** by providing feedback
+## Example Queries
+German:
+- "Wie kann ich mich exmatrikulieren?"
+- "Was kostet eine Namensänderung?"
+- "Ich möchte ein Modul zurückziehen. Was muss ich beachten?"
+- "Welche Fristen gibt es für die Beurlaubung?"
+English:
+- "How can I withdraw from the university?"
+- "What are the fees for changing my name?"
+- "I want to take a leave of absence. What do I need to know?"
+## Troubleshooting
+### Cannot connect to OpenSearch
+- Check that OpenSearch is running: `curl -X GET "localhost:9200"`
+- Verify credentials in `.env`
+- Check firewall settings
+### No documents indexed
+- Verify markdown files exist in `assets/markdown/`
+- Check OpenSearch index: `curl -X GET "localhost:9200/_cat/indices"`
+- Review ingestion script logs
+### OpenAI API errors
+- Verify API key in `.env`
+- Check API quota and billing
+- Ensure internet connectivity
+## Next Steps
+- Review [README.md](README.md) for full documentation
+- Check [docs/RAG_Email_Assistant_Specifications_v1.0.md](docs/RAG_Email_Assistant_Specifications_v1.0.md) for architecture details
+- See [CLAUDE.md](CLAUDE.md) for development guidance
+## Support
+For issues, please check:
+1. Environment variables are correctly set
+2. OpenSearch is accessible
+3. Documents are properly indexed
+4. API keys are valid
+Need help? Open an issue on GitHub.
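The troubleshooting step `curl -X GET "localhost:9200"` sends a plain-HTTP, unauthenticated request. With the values in `.env.example` (`OPENSEARCH_USE_SSL=true`, a username and password, `OPENSEARCH_VERIFY_CERTS=false`), a cluster running the security plugin will most likely reject that request; something closer to `curl -k -u admin:<password> https://localhost:9200` matches the configuration. The sketch below builds the equivalent request from the same environment variables using only Python's standard library. The helper names `build_ping_request` and `ping_opensearch` are hypothetical and are not part of this repository.

```python
import base64
import os
import ssl
import urllib.request
from typing import Mapping, Optional, Tuple


def build_ping_request(
    env: Mapping[str, str],
) -> Tuple[urllib.request.Request, Optional[ssl.SSLContext]]:
    """Build a GET / request matching the OPENSEARCH_* variables."""
    use_ssl = env.get("OPENSEARCH_USE_SSL", "true").lower() == "true"
    verify = env.get("OPENSEARCH_VERIFY_CERTS", "false").lower() == "true"
    scheme = "https" if use_ssl else "http"
    host = env.get("OPENSEARCH_HOST", "localhost")
    port = env.get("OPENSEARCH_PORT", "9200")
    request = urllib.request.Request(f"{scheme}://{host}:{port}/", method="GET")

    # HTTP basic auth, as with curl -u user:password
    user, password = env.get("OPENSEARCH_USER"), env.get("OPENSEARCH_PASSWORD")
    if user and password:
        token = base64.b64encode(f"{user}:{password}".encode()).decode()
        request.add_header("Authorization", f"Basic {token}")

    context = None
    if use_ssl:
        context = ssl.create_default_context()
        if not verify:
            # Equivalent of curl -k: accept self-signed development certificates
            context.check_hostname = False
            context.verify_mode = ssl.CERT_NONE
    return request, context


def ping_opensearch(env: Mapping[str, str] = os.environ, timeout: float = 5.0) -> bool:
    """Return True if the cluster answers GET / with HTTP 200."""
    request, context = build_ping_request(env)
    try:
        with urllib.request.urlopen(request, timeout=timeout, context=context) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError (e.g. 401) and TLS errors are all OSError subclasses
        return False
```

`ping_opensearch()` returns `False` on connection, TLS, or HTTP errors instead of raising, so it can serve as a quick pre-flight check before running `scripts/ingest_documents.py`.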

README.md CHANGED

@@ -1,16 +1,256 @@
----
-title: Bfh Studadmin Assist
-emoji: 💬
-colorFrom: yellow
-colorTo: purple
-sdk: gradio
-sdk_version: 5.49.0
-app_file: app.py
-pinned: false
-hf_oauth: true
-hf_oauth_scopes:
-- inference-api
-license: mit
----
-An example chatbot using [Gradio](https://gradio.app), [`huggingface_hub`](https://huggingface.co/docs/huggingface_hub/v0.22.2/en/index), and the [Hugging Face Inference API](https://huggingface.co/docs/api-inference/index).

+---
+title: BFH Student Administration Assistant
+emoji: 📧
+colorFrom: yellow
+colorTo: purple
+sdk: gradio
+sdk_version: 5.49.0
+app_file: app.py
+pinned: false
+license: mit
+---
+# 📧 BFH Student Administration Email Assistant
+AI-powered RAG (Retrieval-Augmented Generation) email assistant for university administrative staff at BFH (Bern University of Applied Sciences).
+## Overview
+This system helps administrative staff compose accurate, professional email responses to student inquiries using:
+- **Haystack 2**: Document processing and hybrid retrieval (BM25 + semantic search)
+- **OpenSearch**: Vector database with k-NN support
+- **PydanticAI**: Multi-agent orchestration with structured outputs
+- **Gradio**: Interactive web interface
+- **OpenAI GPT-4o**: Language model for intent extraction, composition, and fact-checking
+## Features
+### 🎯 Multi-Agent Architecture
+1. **Intent Extraction Agent**: Analyzes queries to extract structured intent (action type, topic, urgency, language)
+2. **Composer Agent**: Drafts professional email responses using retrieved context
+3. **Fact Checker Agent**: Validates drafts against source documents for accuracy
+### 🔍 Hybrid Retrieval
+- Combines BM25 (keyword-based) and dense vector search
+- Configurable scoring weights
+- Retrieves relevant administrative documents and forms
+### ✉️ Email Composition
+- Multilingual support (German, English, French)
+- Professional tone and formatting
+- Context-aware responses based on university policies
+- Draft refinement based on user feedback
+### ✅ Fact Checking
+- Automated verification against source documents
+- Accuracy scoring
+- Issue identification and suggestions
+- Chain-of-thought reasoning
+## Project Structure
+```
+bfh-studadmin-assist/
+├── src/
+│   ├── config.py                    # Configuration management
+│   ├── document_processing/
+│   │   ├── loader.py                # Markdown document loading
+│   │   └── chunker.py               # Semantic chunking
+│   ├── indexing/
+│   │   ├── opensearch_client.py     # OpenSearch client
+│   │   └── indexer.py               # Document indexing
+│   ├── retrieval/
+│   │   └── hybrid_retriever.py      # Hybrid BM25 + vector search
+│   ├── agents/
+│   │   ├── intent_agent.py          # Intent extraction
+│   │   ├── composer_agent.py        # Email composition
+│   │   └── fact_checker_agent.py   # Fact checking
+│   ├── pipeline/
+│   │   └── orchestrator.py          # Multi-agent orchestration
+│   └── ui/
+│       └── gradio_app.py            # Gradio interface
+├── scripts/
+│   └── ingest_documents.py          # Document ingestion script
+├── assets/
+│   └── markdown/                    # Administrative documents (German)
+├── docs/
+│   └── RAG_Email_Assistant_Specifications_v1.0.md
+├── app.py                           # Main entry point
+├── requirements.txt
+├── .env.example
+└── README.md
+```
+## Setup
+### Prerequisites
+- Python 3.10+
+- OpenSearch instance (with k-NN plugin enabled)
+- OpenAI API key
+### Installation
+1. Clone the repository:
+```bash
+git clone https://github.com/yourusername/bfh-studadmin-assist.git
+cd bfh-studadmin-assist
+```
+2. Install dependencies:
+```bash
+pip install -r requirements.txt
+```
+3. Configure environment variables:
+```bash
+cp .env.example .env
+# Edit .env with your configuration
+```
+Required environment variables:
+- `OPENAI_API_KEY`: Your OpenAI API key
+- `OPENSEARCH_HOST`: OpenSearch host
+- `OPENSEARCH_PORT`: OpenSearch port
+- `OPENSEARCH_USER`: OpenSearch username
+- `OPENSEARCH_PASSWORD`: OpenSearch password
+- `INDEX_NAME`: Name of the OpenSearch index
+### Document Ingestion
+Before running the application, index the administrative documents:
+```bash
+python scripts/ingest_documents.py
+```
+This will:
+1. Load markdown documents from `assets/markdown/`
+2. Chunk documents using semantic splitting
+3. Generate embeddings using OpenAI
+4. Index documents in OpenSearch with hybrid retrieval support
+### Running the Application
+**Local development:**
+```bash
+python app.py
+```
+**Production (Hugging Face Spaces):**
+The app is configured for automatic deployment to Hugging Face Spaces via `app.py`.
+## Usage
+1. **Enter a student query** in the text area
+2. **Click "Generate Email Draft"** to process the query
+3. Review the generated email and analysis:
+   - Intent analysis
+   - Email subject and body
+   - Fact check results
+   - Retrieved source documents
+4. **Refine the draft** by providing feedback and clicking "Refine Draft"
+## Configuration
+Key configuration options in `.env`:
+### LLM Configuration
+- `LLM_MODEL`: OpenAI model (default: gpt-4o)
+- `EMBEDDING_MODEL`: Embedding model (default: text-embedding-3-small)
+- `LLM_TEMPERATURE`: Temperature for generation (0-1)
+- `LLM_MAX_TOKENS`: Maximum tokens per response
+### Document Processing
+- `DOCUMENTS_PATH`: Path to markdown documents
+- `CHUNK_SIZE`: Target words per chunk
+- `CHUNK_OVERLAP`: Word overlap between chunks
+### Retrieval
+- `RETRIEVAL_TOP_K`: Number of documents to retrieve
+- `BM25_WEIGHT`: Weight for BM25 score (0-1)
+- `VECTOR_WEIGHT`: Weight for vector similarity (0-1)
+- `MIN_RELEVANCE_SCORE`: Minimum score threshold
+## Administrative Documents
+The system uses administrative documents from BFH including:
+- Exmatriculation forms and procedures
+- Leave of absence (Beurlaubung) information
+- Name change forms
+- Insurance information (AHV, health insurance)
+- Fee schedules
+- Course withdrawal procedures
+Documents are stored as markdown in `assets/markdown/`.
+## Development
+### Adding New Documents
+1. Add markdown files to `assets/markdown/`
+2. Run the ingestion script:
+   ```bash
+   python scripts/ingest_documents.py
+   ```
+### Testing
+Run the application locally and test with sample queries:
+- "Wie kann ich mich exmatrikulieren?"
+- "What are the fees for changing my name?"
+- "Ich möchte ein Modul zurückziehen."
+### Extending the System
+- **Add new agents**: Create new agent classes in `src/agents/`
+- **Customize prompts**: Edit system prompts in agent initialization
+- **Add new retrievers**: Implement in `src/retrieval/`
+- **Modify UI**: Edit `src/ui/gradio_app.py`
+## Technical Details
+### Haystack Pipeline
+The system uses Haystack 2 components:
+- `MarkdownToDocument`: Convert markdown files to documents
+- `DocumentSplitter`: Semantic chunking
+- `OpenAIDocumentEmbedder`: Generate embeddings
+- `OpenSearchDocumentStore`: Store and retrieve documents
+- `OpenSearchBM25Retriever`: Keyword-based retrieval
+- `OpenSearchEmbeddingRetriever`: Vector-based retrieval
+### PydanticAI Agents
+Agents use structured outputs with Pydantic models:
+- `IntentData`: Structured intent information
+- `EmailDraft`: Email with metadata
+- `FactCheckResult`: Verification results
+### OpenSearch Index Mapping
+The index uses:
+- Text fields with BM25 for keyword search
+- k-NN vector fields for semantic search
+- Metadata fields for filtering and display
+## License
+MIT License - See LICENSE file for details
+## Acknowledgments
+- Built for BFH (Bern University of Applied Sciences)
+- Uses Haystack by deepset
+- Powered by OpenAI GPT-4o
+- UI built with Gradio
+## Support
+For issues or questions, please open an issue on GitHub.
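The README describes hybrid retrieval with configurable `BM25_WEIGHT`, `VECTOR_WEIGHT`, and `MIN_RELEVANCE_SCORE`, but `src/retrieval/hybrid_retriever.py` is not shown in this excerpt, so how the scores are actually combined is not visible here. One common approach, sketched below as an assumption, is to min-max normalise each retriever's scores to [0, 1] (BM25 scores have no fixed upper bound, while vector similarities typically do), take the weighted sum, drop anything under the threshold, and keep the top `RETRIEVAL_TOP_K`. Haystack's `DocumentJoiner` offers related built-in join modes (weighted merge, reciprocal rank fusion) that the real module may use instead. Document IDs are plain strings here for illustration.

```python
from typing import Dict, List, Tuple


def _min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; a constant (or single-item) list maps to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def fuse_scores(
    bm25: Dict[str, float],
    vector: Dict[str, float],
    bm25_weight: float = 0.5,
    vector_weight: float = 0.5,
    min_score: float = 0.3,
    top_k: int = 5,
) -> List[Tuple[str, float]]:
    """Weighted-sum fusion of two score maps keyed by document ID."""
    bm25_n = _min_max(bm25)
    vec_n = _min_max(vector)
    # A document missing from one result list contributes 0 for that retriever
    fused = {
        doc_id: bm25_weight * bm25_n.get(doc_id, 0.0)
        + vector_weight * vec_n.get(doc_id, 0.0)
        for doc_id in set(bm25_n) | set(vec_n)
    }
    # Highest score first; ties broken by ID so the order is deterministic
    ranked = sorted(
        ((doc_id, score) for doc_id, score in fused.items() if score >= min_score),
        key=lambda item: (-item[1], item[0]),
    )
    return ranked[:top_k]
```

One consequence of this scheme: with equal weights, a chunk returned by only one retriever can score at most 0.5, so a threshold of 0.3 still admits strong single-retriever hits.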

app.py CHANGED

@@ -1,70 +1,20 @@
-import gradio as gr
-from huggingface_hub import InferenceClient
-def respond(
-    message,
-    history: list[dict[str, str]],
-    system_message,
-    max_tokens,
-    temperature,
-    top_p,
-    hf_token: gr.OAuthToken,
-):
-    """
-    For more information on `huggingface_hub` Inference API support, please check the docs: https://huggingface.co/docs/huggingface_hub/v0.22.2/en/guides/inference
-    """
-    client = InferenceClient(token=hf_token.token, model="openai/gpt-oss-20b")
-    messages = [{"role": "system", "content": system_message}]
-    messages.extend(history)
-    messages.append({"role": "user", "content": message})
-    response = ""
-    for message in client.chat_completion(
-        messages,
-        max_tokens=max_tokens,
-        stream=True,
-        temperature=temperature,
-        top_p=top_p,
-    ):
-        choices = message.choices
-        token = ""
-        if len(choices) and choices[0].delta.content:
-            token = choices[0].delta.content
-        response += token
-        yield response
-"""
-For information on how to customize the ChatInterface, peruse the gradio docs: https://www.gradio.app/docs/chatinterface
-"""
-chatbot = gr.ChatInterface(
-    respond,
-    type="messages",
-    additional_inputs=[
-        gr.Textbox(value="You are a friendly Chatbot.", label="System message"),
-        gr.Slider(minimum=1, maximum=2048, value=512, step=1, label="Max new tokens"),
-        gr.Slider(minimum=0.1, maximum=4.0, value=0.7, step=0.1, label="Temperature"),
-        gr.Slider(
-            minimum=0.1,
-            maximum=1.0,
-            value=0.95,
-            step=0.05,
-            label="Top-p (nucleus sampling)",
-        ),
-    ],
+"""Main application entry point for Hugging Face Spaces deployment."""
+import logging
+from src.ui.gradio_app import create_gradio_interface
+# Configure logging
+logging.basicConfig(
+    level=logging.INFO,
+    format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
 )
-with gr.Blocks() as demo:
-    with gr.Sidebar():
-        gr.LoginButton()
-    chatbot.render()
+logger = logging.getLogger(__name__)
+# Create and launch the Gradio interface
+logger.info("Starting BFH Student Administration Email Assistant...")
+demo = create_gradio_interface()
 if __name__ == "__main__":
     demo.launch()

requirements.txt CHANGED

@@ -1,60 +1,37 @@
-aiofiles==24.1.0
-annotated-types==0.7.0
-anyio==4.11.0
-audioop-lts==0.2.2
-Brotli==1.1.0
-certifi==2025.10.5
-charset-normalizer==3.4.3
-click==8.3.0
-distro==1.9.0
-fastapi==0.118.0
-ffmpy==0.6.1
-filelock==3.19.1
-fsspec==2025.9.0
 gradio==5.49.0
 gradio_client==1.13.3
-groovy==0.1.2
-h11==0.16.0
-hf-xet==1.1.10
-httpcore==1.0.9
 httpx==0.28.1
-huggingface-hub==0.35.3
-idna==3.10
-Jinja2==3.1.6
-jiter==0.11.0
-markdown-it-py==4.0.0
-MarkupSafe==3.0.3
-mdurl==0.1.2
-numpy==2.3.3
-openai==2.2.0
-orjson==3.11.3
-packaging==25.0
-pandas==2.3.3
-pillow==11.3.0
-pydantic==2.11.10
-pydantic_core==2.33.2
-pydub==0.25.1
-Pygments==2.19.2
-python-dateutil==2.9.0.post0
-python-dotenv==1.1.1
-python-multipart==0.0.20
-pytz==2025.2
-PyYAML==6.0.3
 requests==2.32.5
 rich==14.1.0
-ruff==0.13.3
-safehttpx==0.1.6
-semantic-version==2.10.0
-shellingham==1.5.4
-six==1.17.0
-sniffio==1.3.1
-starlette==0.48.0
-tomlkit==0.13.3
 tqdm==4.67.1
-typer==0.19.2
-typing-inspection==0.4.2
-typing_extensions==4.15.0
-tzdata==2025.2
-urllib3==2.5.0
 uvicorn==0.37.0
 websockets==15.0.1

+# Core dependencies
+python-dotenv==1.1.1
+# Haystack and integrations
+haystack-ai==2.8.0
+opensearch-haystack==1.1.0
+opensearch-py==2.8.0
+# PydanticAI for agents
+pydantic-ai==0.0.14
+pydantic==2.11.10
+pydantic_core==2.33.2
+# OpenAI
+openai==2.2.0
+# Gradio UI
 gradio==5.49.0
 gradio_client==1.13.3
+# HTTP and async
+aiofiles==24.1.0
 httpx==0.28.1
+httpcore==1.0.9
 requests==2.32.5
+certifi==2025.10.5
+# Utilities
 rich==14.1.0
 tqdm==4.67.1
+PyYAML==6.0.3
+# Supporting packages for Gradio
+fastapi==0.118.0
 uvicorn==0.37.0
 websockets==15.0.1
+huggingface-hub==0.35.3

scripts/__init__.py ADDED

	@@ -0,0 +1 @@


+"""Scripts for document ingestion and maintenance."""

scripts/ingest_documents.py ADDED

	@@ -0,0 +1,104 @@

+#!/usr/bin/env python3
+"""Script to ingest and index markdown documents."""
+import sys
+import logging
+from pathlib import Path
+# Add src to path
+sys.path.insert(0, str(Path(__file__).parent.parent))
+from src.config import get_config
+from src.document_processing.loader import MarkdownDocumentLoader
+from src.document_processing.chunker import SemanticChunker
+from src.indexing.opensearch_client import OpenSearchClient
+from src.indexing.indexer import DocumentIndexer
+def setup_logging():
+    """Configure logging."""
+    logging.basicConfig(
+        level=logging.INFO,
+        format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
+    )
+def main():
+    """Main ingestion workflow."""
+    setup_logging()
+    logger = logging.getLogger(__name__)
+    logger.info("Starting document ingestion process...")
+    # Load configuration
+    config = get_config()
+    logger.info(f"Using documents path: {config.document_processing.documents_path}")
+    logger.info(f"Target index: {config.opensearch.index_name}")
+    # Initialize OpenSearch client
+    logger.info("Connecting to OpenSearch...")
+    os_client = OpenSearchClient(config.opensearch)
+    if not os_client.ping():
+        logger.error("Failed to connect to OpenSearch. Please check your configuration.")
+        sys.exit(1)
+    logger.info("Successfully connected to OpenSearch")
+    # Create or recreate index
+    logger.info("Setting up index...")
+    if os_client.index_exists():
+        logger.warning(f"Index '{config.opensearch.index_name}' already exists")
+        response = input("Do you want to delete and recreate it? (yes/no): ")
+        if response.lower() in ["yes", "y"]:
+            logger.info("Deleting existing index...")
+            os_client.delete_index()
+            os_client.create_index(embedding_dim=1536)
+        else:
+            logger.info("Using existing index")
+    else:
+        os_client.create_index(embedding_dim=1536)
+    # Load documents
+    logger.info("Loading markdown documents...")
+    loader = MarkdownDocumentLoader(config.document_processing.documents_path)
+    documents = loader.load_documents()
+    if not documents:
+        logger.error("No documents loaded. Exiting.")
+        sys.exit(1)
+    logger.info(f"Loaded {len(documents)} documents")
+    # Chunk documents
+    logger.info("Chunking documents...")
+    chunker = SemanticChunker(
+        chunk_size=config.document_processing.chunk_size,
+        chunk_overlap=config.document_processing.chunk_overlap,
+        min_chunk_size=config.document_processing.min_chunk_size,
+    )
+    chunked_documents = chunker.chunk_documents(documents)
+    logger.info(f"Created {len(chunked_documents)} chunks")
+    # Index documents
+    logger.info("Indexing documents in OpenSearch...")
+    indexer = DocumentIndexer(
+        opensearch_config=config.opensearch,
+        llm_config=config.llm,
+    )
+    indexed_count = indexer.index_documents(chunked_documents)
+    logger.info(f"Successfully indexed {indexed_count} document chunks")
+    # Verify
+    final_count = indexer.get_document_count()
+    logger.info(f"Total documents in index: {final_count}")
+    logger.info("✅ Document ingestion completed successfully!")
+if __name__ == "__main__":
+    main()
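The ingestion script hands `chunk_size`, `chunk_overlap`, and `min_chunk_size` to `SemanticChunker`, and the README describes the first two as word counts. `src/document_processing/chunker.py` itself is not shown here, and a "semantic" chunker likely also respects headings or paragraph boundaries, so the following is only a hypothetical word-window sketch of what those three parameters control: windows of `chunk_size` words that advance by `chunk_size - chunk_overlap`, with a trailing window shorter than `min_chunk_size` folded into its predecessor. The function name `chunk_words` is an assumption.

```python
from typing import List, Tuple


def chunk_words(
    text: str,
    chunk_size: int = 300,
    chunk_overlap: int = 50,
    min_chunk_size: int = 100,
) -> List[str]:
    """Split text into overlapping word windows (hypothetical sketch)."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    words = text.split()
    if not words:
        return []

    # Window start positions advance by chunk_size - chunk_overlap words
    step = chunk_size - chunk_overlap
    starts: List[int] = []
    start = 0
    while True:
        starts.append(start)
        if start + chunk_size >= len(words):
            break
        start += step
    spans: List[Tuple[int, int]] = [(s, min(s + chunk_size, len(words))) for s in starts]

    # Fold a short trailing window into the previous chunk instead of
    # emitting a fragment; a document that fits in one window is kept whole
    if len(spans) > 1 and spans[-1][1] - spans[-1][0] < min_chunk_size:
        prev_start, _ = spans[-2]
        spans = spans[:-2] + [(prev_start, len(words))]

    return [" ".join(words[s:e]) for s, e in spans]
```

Documents shorter than `min_chunk_size` are kept as a single chunk in this sketch rather than dropped; the real chunker may handle them differently.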

src/__init__.py ADDED

	@@ -0,0 +1,3 @@


+"""BFH Student Administration RAG Email Assistant."""
+
+__version__ = "1.0.0"

src/agents/__init__.py ADDED

	@@ -0,0 +1,7 @@

+"""PydanticAI agents for intent extraction, composition, and fact checking."""
+from .intent_agent import IntentAgent
+from .composer_agent import ComposerAgent
+from .fact_checker_agent import FactCheckerAgent
+__all__ = ["IntentAgent", "ComposerAgent", "FactCheckerAgent"]

src/agents/composer_agent.py ADDED

	@@ -0,0 +1,155 @@

+"""Email composer agent using PydanticAI."""
+from typing import List
+from pydantic import BaseModel, Field
+from pydantic_ai import Agent
+from haystack import Document
+import logging
+from .intent_agent import IntentData
+logger = logging.getLogger(__name__)
+class EmailDraft(BaseModel):
+    """Structured email draft."""
+    subject: str = Field(description="Email subject line")
+    body: str = Field(description="Email body text")
+    tone: str = Field(
+        default="professional",
+        description="Tone of the email: 'formal', 'professional', 'friendly'",
+    )
+    sources_used: List[str] = Field(
+        default_factory=list,
+        description="List of source documents used in composing the email",
+    )
+    confidence: float = Field(
+        default=0.0,
+        description="Confidence score (0-1) in the accuracy of the response",
+    )
+class ComposerAgent:
+    """Agent for composing email responses."""
+    def __init__(self, api_key: str, model: str = "openai:gpt-4o"):
+        """
+        Initialize the email composer agent.
+        Args:
+            api_key: OpenAI API key
+            model: Model to use for composition
+        """
+        self.agent = Agent(
+            model,
+            result_type=EmailDraft,
+            system_prompt="""You are an expert email composer for BFH (Bern University of Applied Sciences) administrative staff.
+Your task is to compose professional, accurate, and helpful email responses to student inquiries based on:
+1. The user's query and extracted intent
+2. Retrieved relevant documents from the knowledge base
+3. University policies and procedures
+Guidelines for email composition:
+- Write in the same language as the query (German, English, or French)
+- Use a professional but friendly tone
+- Be clear, concise, and accurate
+- Reference specific forms, deadlines, or procedures when relevant
+- Include concrete next steps or actions for the student
+- Cite information from the retrieved documents
+- If information is incomplete, acknowledge what you can't answer
+- Use appropriate greeting and closing
+- Structure the email logically with paragraphs
+For German emails:
+- Use formal "Sie" form
+- Common greetings: "Guten Tag", "Sehr geehrte/r [Name]"
+- Common closings: "Freundliche Grüsse", "Mit freundlichen Grüssen"
+For English emails:
+- Use professional greeting: "Dear [Name]" or "Hello"
+- Common closings: "Best regards", "Kind regards"
+Track which source documents you used and estimate your confidence in the response accuracy.""",
+        )
+    async def compose_email(
+        self, query: str, intent: IntentData, context_docs: List[Document]
+    ) -> EmailDraft:
+        """
+        Compose an email response.
+        Args:
+            query: Original user query
+            intent: Extracted intent data
+            context_docs: Retrieved context documents
+        Returns:
+            Email draft
+        """
+        logger.info(f"Composing email for topic: {intent.topic}")
+        # Build context from documents
+        context_text = self._build_context(context_docs)
+        # Create prompt with all information
+        prompt = f"""Compose an email response for the following query.
+User Query: {query}
+Intent Analysis:
+- Action Type: {intent.action_type}
+- Topic: {intent.topic}
+- Language: {intent.language}
+- Urgency: {intent.urgency}
+- Key Entities: {', '.join(intent.key_entities) if intent.key_entities else 'None'}
+- Specific Questions: {', '.join(intent.specific_questions) if intent.specific_questions else 'None'}
+Retrieved Context from Knowledge Base:
+{context_text}
+Based on this information, compose a complete email response that addresses the user's query professionally and accurately."""
+        try:
+            result = await self.agent.run(prompt)
+            draft = result.data
+            logger.info(f"Composed email - Subject: {draft.subject}")
+            logger.debug(f"Confidence: {draft.confidence}")
+            return draft
+        except Exception as e:
+            logger.error(f"Error composing email: {e}")
+            # Return minimal draft on error
+            return EmailDraft(
+                subject="Ihre Anfrage / Your Inquiry",
+                body="Vielen Dank für Ihre Anfrage. Wir werden uns in Kürze bei Ihnen melden.\n\nThank you for your inquiry. We will get back to you shortly.",
+                tone="professional",
+                confidence=0.0,
+            )
+    def _build_context(self, documents: List[Document]) -> str:
+        """
+        Build context text from retrieved documents.
+        Args:
+            documents: List of retrieved documents
+        Returns:
+            Formatted context text
+        """
+        if not documents:
+            return "No relevant documents found in the knowledge base."
+        context_parts = []
+        for i, doc in enumerate(documents, 1):
+            source = doc.meta.get("source_file", "Unknown") if doc.meta else "Unknown"
+            score = doc.score or 0.0
+            context_parts.append(
+                f"--- Document {i} (Source: {source}, Relevance: {score:.2f}) ---\n{doc.content}\n"
+            )
+        return "\n".join(context_parts)
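`compose_email` reads six attributes from `IntentData`: `action_type`, `topic`, `language`, `urgency`, `key_entities`, and `specific_questions`. Since `intent_agent.py` is not shown in this excerpt, the field types below are assumptions inferred from that usage (the two list fields are joined with `', '.join(...)`). A hypothetical dataclass stand-in like `IntentStub` lets the prompt's "Intent Analysis" section be checked without calling OpenAI; `format_intent_section` (also hypothetical) mirrors the corresponding lines of the f-string in `compose_email`.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class IntentStub:
    """Hypothetical stand-in exposing the attributes ComposerAgent reads."""
    action_type: str
    topic: str
    language: str
    urgency: str
    key_entities: List[str] = field(default_factory=list)
    specific_questions: List[str] = field(default_factory=list)


def format_intent_section(intent: IntentStub) -> str:
    """Rebuild the 'Intent Analysis' block of the composer prompt."""
    # Empty lists render as 'None', as in compose_email
    entities = ", ".join(intent.key_entities) if intent.key_entities else "None"
    questions = ", ".join(intent.specific_questions) if intent.specific_questions else "None"
    return "\n".join([
        "Intent Analysis:",
        f"- Action Type: {intent.action_type}",
        f"- Topic: {intent.topic}",
        f"- Language: {intent.language}",
        f"- Urgency: {intent.urgency}",
        f"- Key Entities: {entities}",
        f"- Specific Questions: {questions}",
    ])
```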

src/agents/fact_checker_agent.py ADDED

	@@ -0,0 +1,159 @@

+"""Fact checker agent using PydanticAI for validating email responses."""
+from typing import List
+from pydantic import BaseModel, Field
+from pydantic_ai import Agent
+from haystack import Document
+import logging
+from .composer_agent import EmailDraft
+logger = logging.getLogger(__name__)
+class FactCheckResult(BaseModel):
+    """Result of fact checking an email draft."""
+    is_accurate: bool = Field(
+        description="Whether the email content is factually accurate"
+    )
+    accuracy_score: float = Field(
+        description="Overall accuracy score (0-1)",
+        ge=0.0,
+        le=1.0,
+    )
+    issues_found: List[str] = Field(
+        default_factory=list,
+        description="List of factual issues or inaccuracies found",
+    )
+    verification_steps: List[str] = Field(
+        default_factory=list,
+        description="Steps taken to verify the facts",
+    )
+    suggestions: List[str] = Field(
+        default_factory=list,
+        description="Suggestions for improving accuracy or completeness",
+    )
+    verified_claims: List[str] = Field(
+        default_factory=list,
+        description="Claims that were successfully verified against sources",
+    )
+class FactCheckerAgent:
+    """Agent for fact-checking email drafts against source documents."""
+    def __init__(self, api_key: str, model: str = "openai:gpt-4o"):
+        """
+        Initialize the fact checker agent.
+        Args:
+            api_key: OpenAI API key
+            model: Model to use for fact checking
+        """
+        self.agent = Agent(
+            model,
+            result_type=FactCheckResult,
+            system_prompt="""You are an expert fact-checker for university administrative communications.
+Your task is to verify the accuracy of email drafts against source documents from the knowledge base.
+Verification process:
+1. Extract all factual claims from the email (dates, procedures, requirements, fees, deadlines, etc.)
+2. Cross-reference each claim with the provided source documents
+3. Identify any unsupported, incorrect, or contradictory information
+4. Check for completeness - are important details missing?
+5. Verify that references to forms, processes, or policies are accurate
+6. Ensure numerical information (fees, dates, etc.) is correct
+Classification of issues:
+- CRITICAL: Factually incorrect information that could mislead students
+- WARNING: Information not found in sources (may be correct but unverified)
+- SUGGESTION: Missing information that would improve completeness
+Be thorough and precise. University administrative information must be accurate as it affects students' academic status and finances.
+Provide:
+- Overall accuracy assessment
+- Specific issues found with severity level
+- Verification steps you performed
+- Suggestions for improvement
+- List of verified claims""",
+        )
+    async def fact_check(
+        self, email_draft: EmailDraft, source_docs: List[Document]
+    ) -> FactCheckResult:
+        """
+        Fact-check an email draft against source documents.
+        Args:
+            email_draft: Email draft to check
+            source_docs: Source documents used for context
+        Returns:
+            Fact check result with accuracy assessment
+        """
+        logger.info("Fact-checking email draft...")
+        # Build source context
+        source_text = self._build_source_context(source_docs)
+        # Create fact-checking prompt
+        prompt = f"""Fact-check the following email draft against the provided source documents.
+EMAIL DRAFT:
+Subject: {email_draft.subject}
+Body:
+{email_draft.body}
+SOURCE DOCUMENTS:
+{source_text}
+Perform a thorough fact-check and identify any inaccuracies, unsupported claims, or missing important information."""
+        try:
+            result = await self.agent.run(prompt)
+            fact_check_result = result.data
+            logger.info(f"Fact check complete - Accurate: {fact_check_result.is_accurate}")
+            logger.info(f"Accuracy score: {fact_check_result.accuracy_score:.2f}")
+            if fact_check_result.issues_found:
+                logger.warning(f"Issues found: {len(fact_check_result.issues_found)}")
+                for issue in fact_check_result.issues_found:
+                    logger.warning(f"  - {issue}")
+            return fact_check_result
+        except Exception as e:
+            logger.error(f"Error during fact checking: {e}")
+            # Return conservative result on error
+            return FactCheckResult(
+                is_accurate=False,
+                accuracy_score=0.5,
+                issues_found=["Unable to complete fact check due to error"],
+                verification_steps=["Attempted automated fact checking"],
+            )
+    def _build_source_context(self, documents: List[Document]) -> str:
+        """
+        Build formatted source context from documents.
+        Args:
+            documents: List of source documents
+        Returns:
+            Formatted source text
+        """
+        if not documents:
+            return "No source documents provided."
+        context_parts = []
+        for i, doc in enumerate(documents, 1):
+            source = doc.meta.get("source_file", "Unknown") if doc.meta else "Unknown"
+            context_parts.append(f"--- Source {i}: {source} ---\n{doc.content}\n")
+        return "\n".join(context_parts)
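
The fact checker's system prompt asks the model to verify numbers (fees, dates) on its own. A cheap deterministic pre-check could flag numeric tokens in the draft that appear in none of the retrieved sources before the LLM call is made. The helper below is hypothetical and not part of this commit. Its regex assumes Swiss-style formats such as `31.03.2025` or `1'200`, and it does plain string matching, so `75` versus `75.00` would be reported as unsupported.

```python
import re
from typing import List

# Digits optionally joined by "." or "'" (e.g. 31.03.2025, 1'200, 75.50); assumed formats
_NUMBER = re.compile(r"\d+(?:[.']\d+)*")


def find_unsupported_numbers(email_body: str, source_texts: List[str]) -> List[str]:
    """Return numeric tokens in the draft that occur in none of the sources, in order."""
    known = set(_NUMBER.findall("\n".join(source_texts)))
    unsupported: List[str] = []
    for token in _NUMBER.findall(email_body):
        if token not in known and token not in unsupported:
            unsupported.append(token)
    return unsupported


body = "The fee is CHF 75 and the deadline is 31.03.2025."
sources = ["Exmatriculation fee: CHF 75.", "Deadline: 30.04.2025"]
print(find_unsupported_numbers(body, sources))  # ['31.03.2025']
```

The returned tokens could, for example, be added to `issues_found` as WARNING entries or included in the fact-checking prompt so the model looks at them first.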

src/agents/intent_agent.py ADDED

	@@ -0,0 +1,100 @@

+"""Intent extraction agent using PydanticAI."""
+from typing import List
+from pydantic import BaseModel, Field
+from pydantic_ai import Agent
+import logging
+logger = logging.getLogger(__name__)
+class IntentData(BaseModel):
+    """Structured intent data extracted from user query."""
+    action_type: str = Field(
+        description="Type of action: 'information_request', 'form_help', 'process_guidance', 'general_inquiry'"
+    )
+    topic: str = Field(
+        description="Main topic or subject of the query (e.g., 'exmatriculation', 'insurance', 'fees')"
+    )
+    key_entities: List[str] = Field(
+        default_factory=list,
+        description="Key entities mentioned (dates, forms, departments, etc.)",
+    )
+    language: str = Field(
+        default="de", description="Detected language of the query (de, en, fr)"
+    )
+    urgency: str = Field(
+        default="normal", description="Urgency level: 'high', 'normal', 'low'"
+    )
+    specific_questions: List[str] = Field(
+        default_factory=list,
+        description="Specific questions or sub-questions identified in the query",
+    )
+class IntentAgent:
+    """Agent for extracting structured intent from user queries."""
+    def __init__(self, api_key: str, model: str = "openai:gpt-4o"):
+        """
+        Initialize the intent extraction agent.
+        Args:
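+            api_key: OpenAI API key (currently unused: with an "openai:..." model string, PydanticAI reads OPENAI_API_KEY from the environment)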
+            model: Model to use for intent extraction
+        """
+        self.agent = Agent(
+            model,
+            result_type=IntentData,
+            system_prompt="""You are an expert at analyzing user queries for a university administrative email assistant.
+Your task is to extract structured intent information from user queries. Analyze:
+1. What type of action is being requested (information, form help, process guidance, etc.)
+2. The main topic or subject matter
+3. Key entities mentioned (specific forms, dates, departments, processes)
+4. The language of the query
+5. Urgency level based on context and keywords
+6. Specific questions that need to be answered
+Context: This is for BFH (Bern University of Applied Sciences) administrative staff helping students with:
+- Exmatriculation (leaving university)
+- Leave of absence (Beurlaubung)
+- Name changes
+- Insurance matters (AHV, health insurance)
+- Fees and payments
+- Course withdrawals and deadlines
+Provide accurate, structured intent extraction to help compose appropriate email responses.""",
+        )
+    async def extract_intent(self, query: str) -> IntentData:
+        """
+        Extract intent from user query.
+        Args:
+            query: User's query text
+        Returns:
+            Structured intent data
+        """
+        logger.info("Extracting intent from query...")
+        try:
+            result = await self.agent.run(query)
+            intent = result.data
+            logger.info(f"Extracted intent - Action: {intent.action_type}, Topic: {intent.topic}")
+            logger.debug(f"Full intent: {intent}")
+            return intent
+        except Exception as e:
+            logger.error(f"Error extracting intent: {e}")
+            # Return default intent on error
+            return IntentData(
+                action_type="general_inquiry",
+                topic="unknown",
+                language="de",
+                urgency="normal",
+            )
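
`IntentData` declares `action_type`, `language` and `urgency` as plain `str` and lists the allowed values only in the field descriptions, so a reply such as `"Form Help"` or `"German"` passes validation unchanged. Pydantic can enforce the vocabulary at validation time if the fields are annotated with `typing.Literal[...]`. The stdlib sketch below shows the alternative of normalizing after the fact; `NormalizedIntent` and `normalize_intent` are hypothetical names, and the fallback defaults mirror the ones used in `extract_intent`'s error path.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Literal, get_args

ActionType = Literal["information_request", "form_help", "process_guidance", "general_inquiry"]
Language = Literal["de", "en", "fr"]
Urgency = Literal["high", "normal", "low"]


def _coerce(value: Any, allowed: Any, default: str) -> str:
    """Lower-case and trim a raw value; fall back to the default if it is off-vocabulary."""
    cleaned = str(value).strip().lower()
    return cleaned if cleaned in get_args(allowed) else default


@dataclass
class NormalizedIntent:
    action_type: str
    topic: str
    language: str = "de"
    urgency: str = "normal"
    key_entities: List[str] = field(default_factory=list)


def normalize_intent(raw: Dict[str, Any]) -> NormalizedIntent:
    """Map a loosely formatted intent dict onto the fixed vocabularies above."""
    return NormalizedIntent(
        action_type=_coerce(raw.get("action_type", ""), ActionType, "general_inquiry"),
        topic=str(raw.get("topic", "")).strip() or "unknown",
        language=_coerce(raw.get("language", ""), Language, "de"),
        urgency=_coerce(raw.get("urgency", ""), Urgency, "normal"),
        key_entities=list(raw.get("key_entities", [])),
    )
```

Note that a value like `"German"` is silently mapped to the default rather than translated to `"de"`; rejecting it outright with `Literal` fields is the stricter option.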

src/config.py ADDED

	@@ -0,0 +1,141 @@

+"""Configuration management for the RAG Email Assistant."""
+import os
+from dataclasses import dataclass
+from typing import Optional
+from dotenv import load_dotenv
+# Load environment variables from .env file
+load_dotenv()
+@dataclass
+class OpenSearchConfig:
+    """OpenSearch connection configuration."""
+    host: str
+    port: int
+    user: str
+    password: str
+    index_name: str
+    use_ssl: bool = True
+    verify_certs: bool = False
+    @classmethod
+    def from_env(cls) -> "OpenSearchConfig":
+        """Create configuration from environment variables."""
+        return cls(
+            host=os.getenv("OPENSEARCH_HOST", "localhost"),
+            port=int(os.getenv("OPENSEARCH_PORT", "9200")),
+            user=os.getenv("OPENSEARCH_USER", "admin"),
+            password=os.getenv("OPENSEARCH_PASSWORD", ""),
+            index_name=os.getenv("INDEX_NAME", "bfh_admin_docs"),
+            use_ssl=os.getenv("OPENSEARCH_USE_SSL", "true").lower() == "true",
+            verify_certs=os.getenv("OPENSEARCH_VERIFY_CERTS", "false").lower() == "true",
+        )
+@dataclass
+class LLMConfig:
+    """LLM configuration."""
+    api_key: str
+    model_name: str = "gpt-4o"
+    embedding_model: str = "text-embedding-3-small"
+    temperature: float = 0.7
+    max_tokens: int = 2000
+    @classmethod
+    def from_env(cls) -> "LLMConfig":
+        """Create configuration from environment variables."""
+        api_key = os.getenv("OPENAI_API_KEY", "")
+        if not api_key:
+            raise ValueError("OPENAI_API_KEY environment variable is required")
+        return cls(
+            api_key=api_key,
+            model_name=os.getenv("LLM_MODEL", "gpt-4o"),
+            embedding_model=os.getenv("EMBEDDING_MODEL", "text-embedding-3-small"),
+            temperature=float(os.getenv("LLM_TEMPERATURE", "0.7")),
+            max_tokens=int(os.getenv("LLM_MAX_TOKENS", "2000")),
+        )
+@dataclass
+class DocumentProcessingConfig:
+    """Document processing configuration."""
+    documents_path: str = "assets/markdown"
+    chunk_size: int = 300  # Target words per chunk
+    chunk_overlap: int = 50  # Words overlap between chunks
+    min_chunk_size: int = 100  # Minimum words per chunk
+    @classmethod
+    def from_env(cls) -> "DocumentProcessingConfig":
+        """Create configuration from environment variables."""
+        return cls(
+            documents_path=os.getenv("DOCUMENTS_PATH", "assets/markdown"),
+            chunk_size=int(os.getenv("CHUNK_SIZE", "300")),
+            chunk_overlap=int(os.getenv("CHUNK_OVERLAP", "50")),
+            min_chunk_size=int(os.getenv("MIN_CHUNK_SIZE", "100")),
+        )
+@dataclass
+class RetrievalConfig:
+    """Retrieval configuration."""
+    top_k: int = 5  # Number of documents to retrieve
+    bm25_weight: float = 0.5  # Weight for BM25 score
+    vector_weight: float = 0.5  # Weight for vector similarity score
+    min_score: float = 0.3  # Minimum relevance score threshold
+    @classmethod
+    def from_env(cls) -> "RetrievalConfig":
+        """Create configuration from environment variables."""
+        return cls(
+            top_k=int(os.getenv("RETRIEVAL_TOP_K", "5")),
+            bm25_weight=float(os.getenv("BM25_WEIGHT", "0.5")),
+            vector_weight=float(os.getenv("VECTOR_WEIGHT", "0.5")),
+            min_score=float(os.getenv("MIN_RELEVANCE_SCORE", "0.3")),
+        )
+@dataclass
+class AppConfig:
+    """Main application configuration."""
+    opensearch: OpenSearchConfig
+    llm: LLMConfig
+    document_processing: DocumentProcessingConfig
+    retrieval: RetrievalConfig
+    debug: bool = False
+    @classmethod
+    def from_env(cls) -> "AppConfig":
+        """Create complete configuration from environment variables."""
+        return cls(
+            opensearch=OpenSearchConfig.from_env(),
+            llm=LLMConfig.from_env(),
+            document_processing=DocumentProcessingConfig.from_env(),
+            retrieval=RetrievalConfig.from_env(),
+            debug=os.getenv("DEBUG", "false").lower() == "true",
+        )
+# Global configuration instance
+_config: Optional[AppConfig] = None
+def get_config() -> AppConfig:
+    """Get or create the global configuration instance."""
+    global _config
+    if _config is None:
+        _config = AppConfig.from_env()
+    return _config
+def reset_config():
+    """Reset the global configuration instance (useful for testing)."""
+    global _config
+    _config = None
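
`RetrievalConfig` exposes `bm25_weight`, `vector_weight` and `min_score`, but how `HybridRetriever` combines them is not visible in this part of the diff. One common scheme, shown here as a hypothetical stdlib sketch, min-max normalizes each retriever's scores to [0, 1] per query, takes the weighted sum, drops results below `min_score`, and keeps `top_k`. Some normalization is needed because raw BM25 scores are unbounded while vector similarities live on a fixed scale.

```python
from typing import Dict, List, Tuple


def _min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one retriever's scores to [0, 1]; equal scores all become 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def fuse_scores(
    bm25: Dict[str, float],
    vector: Dict[str, float],
    bm25_weight: float = 0.5,
    vector_weight: float = 0.5,
    min_score: float = 0.3,
    top_k: int = 5,
) -> List[Tuple[str, float]]:
    """Weighted sum of normalized scores; a doc missing from one list scores 0 there."""
    bm25_n, vec_n = _min_max(bm25), _min_max(vector)
    fused = {
        doc_id: bm25_weight * bm25_n.get(doc_id, 0.0) + vector_weight * vec_n.get(doc_id, 0.0)
        for doc_id in set(bm25_n) | set(vec_n)
    }
    # Highest score first; ties broken by doc id for a deterministic order
    ranked = sorted(fused.items(), key=lambda item: (-item[1], item[0]))
    return [(doc_id, score) for doc_id, score in ranked if score >= min_score][:top_k]
```

With the defaults, a document that only one retriever found can reach at most 0.5, so `min_score=0.3` still lets strong single-source hits through.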

src/document_processing/__init__.py ADDED

	@@ -0,0 +1,6 @@

+"""Document processing components for loading and chunking documents."""
+from .loader import MarkdownDocumentLoader
+from .chunker import SemanticChunker
+__all__ = ["MarkdownDocumentLoader", "SemanticChunker"]

src/document_processing/chunker.py ADDED

	@@ -0,0 +1,78 @@

+"""Document chunking with word-based splitting."""
+from typing import List
+from haystack import Document
+from haystack.components.preprocessors import DocumentSplitter
+import logging
+logger = logging.getLogger(__name__)
+class SemanticChunker:
+    """Chunks documents into overlapping word windows."""
+    def __init__(
+        self,
+        chunk_size: int = 300,
+        chunk_overlap: int = 50,
+        min_chunk_size: int = 100,
+    ):
+        """
+        Initialize the chunker.
+        Args:
+            chunk_size: Target number of words per chunk
+            chunk_overlap: Number of words to overlap between chunks
+            min_chunk_size: Minimum number of words per chunk
+        """
+        self.chunk_size = chunk_size
+        self.chunk_overlap = chunk_overlap
+        self.min_chunk_size = min_chunk_size
+        # split_by="word" makes split_length, split_overlap and split_threshold
+        # count words, matching the word-based sizes in DocumentProcessingConfig
+        # (with split_by="sentence" they would count sentences instead)
+        self.splitter = DocumentSplitter(
+            split_by="word",
+            split_length=chunk_size,
+            split_overlap=chunk_overlap,
+            split_threshold=min_chunk_size,
+        )
+    def chunk_documents(self, documents: List[Document]) -> List[Document]:
+        """
+        Chunk documents into smaller pieces.
+        Args:
+            documents: List of documents to chunk
+        Returns:
+            List of chunked documents with metadata
+        """
+        if not documents:
+            logger.warning("No documents to chunk")
+            return []
+        logger.info(f"Chunking {len(documents)} documents")
+        # Split documents
+        result = self.splitter.run(documents=documents)
+        chunked_docs = result.get("documents", [])
+        # Add chunk metadata
+        for idx, doc in enumerate(chunked_docs):
+            if doc.meta is None:
+                doc.meta = {}
+            doc.meta["chunk_id"] = idx
+            doc.meta["chunk_size"] = len(doc.content.split())
+        logger.info(f"Created {len(chunked_docs)} chunks from {len(documents)} documents")
+        # Log statistics
+        chunk_sizes = [doc.meta.get("chunk_size", 0) for doc in chunked_docs]
+        if chunk_sizes:
+            avg_size = sum(chunk_sizes) / len(chunk_sizes)
+            logger.info(
+                f"Chunk statistics - Avg: {avg_size:.1f} words, "
+                f"Min: {min(chunk_sizes)}, Max: {max(chunk_sizes)}"
+            )
+        return chunked_docs
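
`DocumentProcessingConfig` and the chunker docstring give sizes in words, while Haystack's `DocumentSplitter` counts `split_length`, `split_overlap` and `split_threshold` in whatever unit `split_by` names (with `"sentence"`, a length of 300 means 300 sentences). The stdlib sketch below shows what 300/50/100 mean as word windows. `word_windows` is hypothetical and only approximates the splitter's threshold handling, which is documented as attaching an undersized last split to the previous one.

```python
from typing import List


def word_windows(
    text: str, chunk_size: int = 300, chunk_overlap: int = 50, min_chunk_size: int = 100
) -> List[str]:
    """Split text into windows of chunk_size words, each sharing chunk_overlap words."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - chunk_overlap
    chunks: List[List[str]] = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + chunk_size])
        if start + chunk_size >= len(words):
            break
    # Fold a short trailing window into its predecessor, skipping the overlapping words
    if len(chunks) > 1 and len(chunks[-1]) < min_chunk_size:
        tail = chunks.pop()
        chunks[-1].extend(tail[chunk_overlap:])
    return [" ".join(chunk) for chunk in chunks]
```

For a 700-word document the defaults give windows of 300, 300 and 200 words starting at words 0, 250 and 500; at 560 words the 60-word tail falls below 100 and is merged into a 310-word second chunk.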

src/document_processing/loader.py ADDED

	@@ -0,0 +1,65 @@

+"""Document loader for markdown files."""
+from pathlib import Path
+from typing import List
+from haystack import Document
+from haystack.components.converters import MarkdownToDocument
+import logging
+logger = logging.getLogger(__name__)
+class MarkdownDocumentLoader:
+    """Loads markdown documents from a directory."""
+    def __init__(self, documents_path: str):
+        """
+        Initialize the document loader.
+        Args:
+            documents_path: Path to directory containing markdown files
+        """
+        self.documents_path = Path(documents_path)
+        self.converter = MarkdownToDocument()
+    def load_documents(self) -> List[Document]:
+        """
+        Load all markdown documents from the configured directory.
+        Returns:
+            List of Haystack Document objects
+        """
+        if not self.documents_path.exists():
+            raise FileNotFoundError(f"Documents path does not exist: {self.documents_path}")
+        documents = []
+        markdown_files = list(self.documents_path.glob("*.md"))
+        if not markdown_files:
+            logger.warning(f"No markdown files found in {self.documents_path}")
+            return documents
+        logger.info(f"Loading {len(markdown_files)} markdown files from {self.documents_path}")
+        for md_file in markdown_files:
+            try:
+                # Convert markdown file to Haystack Document
+                result = self.converter.run(sources=[md_file])
+                file_documents = result.get("documents", [])
+                # Add metadata
+                for doc in file_documents:
+                    if doc.meta is None:
+                        doc.meta = {}
+                    doc.meta["source_file"] = md_file.name
+                    doc.meta["file_path"] = str(md_file)
+                documents.extend(file_documents)
+                logger.info(f"Loaded document: {md_file.name}")
+            except Exception as e:
+                logger.error(f"Error loading {md_file.name}: {e}")
+                continue
+        logger.info(f"Successfully loaded {len(documents)} documents")
+        return documents
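
The loader uses `glob("*.md")`, which only matches files at the top level of `DOCUMENTS_PATH`, and processes them in whatever order the filesystem returns. If the corpus is ever organized into subfolders, a recursive and sorted walk keeps ingestion complete and reproducible. The sketch below is hypothetical and returns plain dicts rather than Haystack `Document` objects so that it only needs the standard library.

```python
from pathlib import Path
from typing import Dict, List


def collect_markdown(root: str, recursive: bool = True) -> List[Dict[str, str]]:
    """Read every .md file under root in a stable (sorted) order."""
    base = Path(root)
    if not base.exists():
        raise FileNotFoundError(f"Documents path does not exist: {base}")
    pattern = "**/*.md" if recursive else "*.md"  # "**" also matches the top level
    records = []
    for md_file in sorted(base.glob(pattern)):
        if not md_file.is_file():  # skip directories whose names end in .md
            continue
        records.append(
            {
                "content": md_file.read_text(encoding="utf-8"),
                "source_file": md_file.name,
                "file_path": str(md_file),
            }
        )
    return records
```

With recursion, two files with the same name in different folders share a `source_file` value, so `file_path` is the safer key for citations.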

src/indexing/__init__.py ADDED

	@@ -0,0 +1,6 @@

+"""Indexing components for OpenSearch integration."""
+from .opensearch_client import OpenSearchClient
+from .indexer import DocumentIndexer
+__all__ = ["OpenSearchClient", "DocumentIndexer"]

src/indexing/indexer.py ADDED

	@@ -0,0 +1,106 @@

+"""Document indexer for storing documents in OpenSearch."""
+from typing import List
+from haystack import Document
+from haystack.components.embedders import OpenAIDocumentEmbedder
+from haystack.utils import Secret
+from haystack_integrations.document_stores.opensearch import OpenSearchDocumentStore
+import logging
+from ..config import OpenSearchConfig, LLMConfig
+logger = logging.getLogger(__name__)
+class DocumentIndexer:
+    """Indexes documents in OpenSearch with embeddings."""
+    def __init__(self, opensearch_config: OpenSearchConfig, llm_config: LLMConfig):
+        """
+        Initialize the document indexer.
+        Args:
+            opensearch_config: OpenSearch configuration
+            llm_config: LLM configuration for embeddings
+        """
+        self.opensearch_config = opensearch_config
+        self.llm_config = llm_config
+        # Initialize document store
+        self.document_store = OpenSearchDocumentStore(
+            hosts=f"{opensearch_config.host}:{opensearch_config.port}",
+            index=opensearch_config.index_name,
+            http_auth=(opensearch_config.user, opensearch_config.password),
+            use_ssl=opensearch_config.use_ssl,
+            verify_certs=opensearch_config.verify_certs,
+            embedding_dim=1536,  # text-embedding-3-small dimension
+        )
+        # Initialize embedder (Haystack expects the API key wrapped in a Secret, not a plain str)
+        self.embedder = OpenAIDocumentEmbedder(
+            api_key=Secret.from_token(llm_config.api_key),
+            model=llm_config.embedding_model,
+        )
+    def index_documents(self, documents: List[Document]) -> int:
+        """
+        Index documents with embeddings.
+        Args:
+            documents: List of documents to index
+        Returns:
+            Number of documents successfully indexed
+        """
+        if not documents:
+            logger.warning("No documents to index")
+            return 0
+        logger.info(f"Indexing {len(documents)} documents")
+        try:
+            # Generate embeddings for documents
+            logger.info("Generating embeddings...")
+            result = self.embedder.run(documents=documents)
+            embedded_docs = result.get("documents", [])
+            if not embedded_docs:
+                logger.error("Failed to generate embeddings")
+                return 0
+            logger.info(f"Generated embeddings for {len(embedded_docs)} documents")
+            # Write documents to OpenSearch
+            logger.info("Writing documents to OpenSearch...")
+            self.document_store.write_documents(embedded_docs)
+            doc_count = self.document_store.count_documents()
+            logger.info(f"Successfully indexed documents. Total documents in store: {doc_count}")
+            return len(embedded_docs)
+        except Exception as e:
+            logger.error(f"Error indexing documents: {e}")
+            raise
+    def clear_index(self):
+        """Clear all documents from the index."""
+        try:
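+            # delete_documents() takes a list of document IDs, so collect them first.
+            # filter_documents() may cap how many documents a single call returns;
+            # for very large indexes, dropping the index (OpenSearchClient.delete_index) is safer.
+            all_docs = self.document_store.filter_documents()
+            self.document_store.delete_documents([doc.id for doc in all_docs])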
+            logger.info("Cleared all documents from index")
+        except Exception as e:
+            logger.error(f"Error clearing index: {e}")
+            raise
+    def get_document_count(self) -> int:
+        """
+        Get number of documents in the index.
+        Returns:
+            Document count
+        """
+        try:
+            return self.document_store.count_documents()
+        except Exception as e:
+            logger.error(f"Error getting document count: {e}")
+            return 0
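
The indexer hard-codes `embedding_dim=1536`, which matches the default `text-embedding-3-small`, but `EMBEDDING_MODEL` is configurable and `text-embedding-3-large` returns 3072-dimensional vectors that would not fit the index. A hypothetical lookup like the one below could derive the dimension from the configured model and fail loudly for models it does not know. The table lists default output sizes only; the `text-embedding-3` models also accept a `dimensions` request parameter that shortens vectors, which this simplification ignores.

```python
from typing import Dict

# Default output sizes of OpenAI embedding models (without the optional "dimensions" parameter)
KNOWN_EMBEDDING_DIMS: Dict[str, int] = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "text-embedding-ada-002": 1536,
}


def embedding_dim_for(model: str) -> int:
    """Return the vector size for a known model, or raise for unknown ones."""
    try:
        return KNOWN_EMBEDDING_DIMS[model]
    except KeyError:
        raise ValueError(
            f"Unknown embedding model {model!r}; add its dimension to KNOWN_EMBEDDING_DIMS"
        ) from None
```

The same value would then be passed both as `embedding_dim` to the document store and to `OpenSearchClient.create_index`, so the two cannot drift apart.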

src/indexing/opensearch_client.py ADDED

	@@ -0,0 +1,167 @@

+"""OpenSearch client for document storage and retrieval."""
+from typing import Optional
+from opensearchpy import OpenSearch
+from opensearchpy.exceptions import RequestError
+import logging
+from ..config import OpenSearchConfig
+logger = logging.getLogger(__name__)
+class OpenSearchClient:
+    """Client for interacting with OpenSearch."""
+    def __init__(self, config: OpenSearchConfig):
+        """
+        Initialize OpenSearch client.
+        Args:
+            config: OpenSearch configuration
+        """
+        self.config = config
+        self.client = self._create_client()
+    def _create_client(self) -> OpenSearch:
+        """Create OpenSearch client connection."""
+        return OpenSearch(
+            hosts=[{"host": self.config.host, "port": self.config.port}],
+            http_auth=(self.config.user, self.config.password),
+            use_ssl=self.config.use_ssl,
+            verify_certs=self.config.verify_certs,
+            ssl_show_warn=False,
+        )
+    def ping(self) -> bool:
+        """
+        Check if OpenSearch is accessible.
+        Returns:
+            True if connection is successful
+        """
+        try:
+            return self.client.ping()
+        except Exception as e:
+            logger.error(f"Failed to ping OpenSearch: {e}")
+            return False
+    def create_index(self, index_name: Optional[str] = None, embedding_dim: int = 1536) -> bool:
+        """
+        Create an index with proper mapping for hybrid retrieval.
+        Args:
+            index_name: Name of index to create (uses config default if not provided)
+            embedding_dim: Dimension of embedding vectors
+        Returns:
+            True if index was created or already exists
+        """
+        index_name = index_name or self.config.index_name
+        # Define index mapping for hybrid retrieval
+        mapping = {
+            "settings": {
+                "index": {
+                    "number_of_shards": 2,
+                    "number_of_replicas": 1,
+                    "knn": True,  # Enable k-NN
+                }
+            },
+            "mappings": {
+                "properties": {
+                    "content": {
+                        "type": "text",
+                        "analyzer": "standard",
+                    },
+                    "embedding": {
+                        "type": "knn_vector",
+                        "dimension": embedding_dim,
+                        "method": {
+                            "name": "hnsw",
+                            "space_type": "cosinesimil",
+                            "engine": "nmslib",
+                        },
+                    },
+                    "meta": {
+                        "type": "object",
+                        "properties": {
+                            "source_file": {"type": "keyword"},
+                            "file_path": {"type": "keyword"},
+                            "chunk_id": {"type": "integer"},
+                            "chunk_size": {"type": "integer"},
+                        },
+                    },
+                }
+            },
+        }
+        try:
+            if self.client.indices.exists(index=index_name):
+                logger.info(f"Index '{index_name}' already exists")
+                return True
+            self.client.indices.create(index=index_name, body=mapping)
+            logger.info(f"Created index '{index_name}'")
+            return True
+        except RequestError as e:
+            logger.error(f"Failed to create index: {e}")
+            return False
+    def delete_index(self, index_name: Optional[str] = None) -> bool:
+        """
+        Delete an index.
+        Args:
+            index_name: Name of index to delete (uses config default if not provided)
+        Returns:
+            True if index was deleted
+        """
+        index_name = index_name or self.config.index_name
+        try:
+            if not self.client.indices.exists(index=index_name):
+                logger.warning(f"Index '{index_name}' does not exist")
+                return False
+            self.client.indices.delete(index=index_name)
+            logger.info(f"Deleted index '{index_name}'")
+            return True
+        except Exception as e:
+            logger.error(f"Failed to delete index: {e}")
+            return False
+    def index_exists(self, index_name: Optional[str] = None) -> bool:
+        """
+        Check if an index exists.
+        Args:
+            index_name: Name of index to check (uses config default if not provided)
+        Returns:
+            True if index exists
+        """
+        index_name = index_name or self.config.index_name
+        return self.client.indices.exists(index=index_name)
+    def get_document_count(self, index_name: Optional[str] = None) -> int:
+        """
+        Get number of documents in index.
+        Args:
+            index_name: Name of index (uses config default if not provided)
+        Returns:
+            Document count
+        """
+        index_name = index_name or self.config.index_name
+        try:
+            result = self.client.count(index=index_name)
+            return result["count"]
+        except Exception as e:
+            logger.error(f"Failed to get document count: {e}")
+            return 0
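
`OpenSearchClient.ping()` returns `False` instead of raising when the cluster is unreachable. Where OpenSearch is started alongside the app (for example in a local container), it may need several seconds before it accepts connections, so a single ping at startup can fail spuriously. The hypothetical helper below polls any zero-argument callable that returns a bool, such as `OpenSearchClient(config).ping`, with exponential backoff; `sleep` is injectable so the logic can be tested without waiting.

```python
import time
from typing import Callable


def wait_until_ready(
    ping: Callable[[], bool],
    attempts: int = 10,
    initial_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call ping until it returns True, doubling the delay between tries (capped)."""
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        if ping():
            return True
        if attempt < attempts:  # no pointless sleep after the final attempt
            sleep(delay)
            delay = min(delay * 2, max_delay)
    return False
```

With the defaults the helper gives up after roughly 40 seconds of waiting in total, which the ingestion script could report before attempting to create the index.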

src/pipeline/__init__.py ADDED

	@@ -0,0 +1,5 @@

+"""Pipeline orchestration for the multi-agent RAG system."""
+from .orchestrator import RAGOrchestrator
+__all__ = ["RAGOrchestrator"]

src/pipeline/orchestrator.py ADDED

	@@ -0,0 +1,192 @@

+"""RAG pipeline orchestrator coordinating all agents and retrieval."""
+from typing import Dict, Any, List
+from pydantic import BaseModel
+from haystack import Document
+import logging
+from ..config import AppConfig
+from ..agents.intent_agent import IntentAgent, IntentData
+from ..agents.composer_agent import ComposerAgent, EmailDraft
+from ..agents.fact_checker_agent import FactCheckerAgent, FactCheckResult
+from ..retrieval.hybrid_retriever import HybridRetriever
+from ..indexing.indexer import DocumentIndexer
+logger = logging.getLogger(__name__)
+class PipelineResult(BaseModel):
+    """Complete result from the RAG pipeline."""
+    query: str
+    intent: IntentData
+    retrieved_docs: List[Dict[str, Any]]
+    email_draft: EmailDraft
+    fact_check: FactCheckResult
+    processing_time: float = 0.0
+class RAGOrchestrator:
+    """Orchestrates the multi-agent RAG pipeline."""
+    def __init__(self, config: AppConfig, document_indexer: DocumentIndexer):
+        """
+        Initialize the RAG orchestrator.
+        Args:
+            config: Application configuration
+            document_indexer: Document indexer instance (contains document store)
+        """
+        self.config = config
+        # Initialize agents
+        self.intent_agent = IntentAgent(
+            api_key=config.llm.api_key,
+            model=f"openai:{config.llm.model_name}",
+        )
+        self.composer_agent = ComposerAgent(
+            api_key=config.llm.api_key,
+            model=f"openai:{config.llm.model_name}",
+        )
+        self.fact_checker_agent = FactCheckerAgent(
+            api_key=config.llm.api_key,
+            model=f"openai:{config.llm.model_name}",
+        )
+        # Initialize retriever
+        self.retriever = HybridRetriever(
+            document_store=document_indexer.document_store,
+            llm_config=config.llm,
+            retrieval_config=config.retrieval,
+        )
+    async def process_query(self, query: str) -> PipelineResult:
+        """
+        Process a user query through the complete RAG pipeline.
+        Args:
+            query: User's query text
+        Returns:
+            Complete pipeline result
+        """
+        import time
+        start_time = time.time()
+        logger.info(f"Processing query: {query[:100]}...")
+        try:
+            # Step 1: Extract intent
+            logger.info("Step 1: Extracting intent...")
+            intent = await self.intent_agent.extract_intent(query)
+            # Step 2: Retrieve relevant documents
+            logger.info("Step 2: Retrieving relevant documents...")
+            retrieved_docs = self.retriever.retrieve(query)
+            logger.info(f"Retrieved {len(retrieved_docs)} documents")
+            # Step 3: Compose email draft
+            logger.info("Step 3: Composing email draft...")
+            email_draft = await self.composer_agent.compose_email(
+                query=query,
+                intent=intent,
+                context_docs=retrieved_docs,
+            )
+            # Step 4: Fact-check the draft
+            logger.info("Step 4: Fact-checking email draft...")
+            fact_check = await self.fact_checker_agent.fact_check(
+                email_draft=email_draft,
+                source_docs=retrieved_docs,
+            )
+            processing_time = time.time() - start_time
+            # Build result
+            result = PipelineResult(
+                query=query,
+                intent=intent,
+                retrieved_docs=self._serialize_documents(retrieved_docs),
+                email_draft=email_draft,
+                fact_check=fact_check,
+                processing_time=processing_time,
+            )
+            logger.info(f"Pipeline completed in {processing_time:.2f}s")
+            return result
+        except Exception as e:
+            logger.error(f"Error in pipeline: {e}")
+            raise
+    def _serialize_documents(self, documents: List[Document]) -> List[Dict[str, Any]]:
+        """
+        Serialize Haystack documents to dictionaries.
+        Args:
+            documents: List of Haystack documents
+        Returns:
+            List of document dictionaries
+        """
+        serialized = []
+        for doc in documents:
+            serialized.append(
+                {
+                    "content": doc.content,
+                    "score": doc.score,
+                    "meta": doc.meta or {},
+                }
+            )
+        return serialized
+    async def refine_draft(
+        self,
+        original_query: str,
+        current_draft: str,
+        user_feedback: str,
+        retrieved_docs: List[Document],
+    ) -> EmailDraft:
+        """
+        Refine an email draft based on user feedback.
+        Args:
+            original_query: Original user query
+            current_draft: Current email draft text
+            user_feedback: User's feedback or refinement request
+            retrieved_docs: Previously retrieved documents
+        Returns:
+            Refined email draft
+        """
+        logger.info("Refining email draft based on user feedback...")
+        # Create refinement prompt
+        refinement_query = f"""Original Query: {original_query}
+Current Draft:
+{current_draft}
+User Feedback/Refinement Request:
+{user_feedback}
+Please revise the email draft according to the user's feedback while maintaining accuracy and professionalism."""
+        # Re-extract intent with refinement context
+        intent = await self.intent_agent.extract_intent(refinement_query)
+        # Compose refined draft
+        refined_draft = await self.composer_agent.compose_email(
+            query=refinement_query,
+            intent=intent,
+            context_docs=retrieved_docs,
+        )
+        logger.info("Email draft refined")
+        return refined_draft
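`_serialize_documents` flattens each Haystack `Document` into a `content`/`score`/`meta` dictionary, and the UI later rebuilds documents from those dictionaries before calling `refine_draft`. Below is a minimal sketch of that round trip in plain Python. It uses a hypothetical `SimpleDoc` dataclass as a stand-in for `haystack.Document`, covering only the three serialized fields, so it runs without Haystack installed. The sample content and score are made-up toy values.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class SimpleDoc:
    """Hypothetical stand-in for haystack.Document (content, score, meta only)."""
    content: Optional[str]
    score: Optional[float] = None
    meta: Dict[str, Any] = field(default_factory=dict)


def serialize(docs: List[SimpleDoc]) -> List[Dict[str, Any]]:
    # Mirrors _serialize_documents: copy the three fields into plain dicts.
    return [{"content": d.content, "score": d.score, "meta": d.meta or {}} for d in docs]


def deserialize(rows: List[Dict[str, Any]]) -> List[SimpleDoc]:
    # Read the score back explicitly; leaving it out would reset it to None.
    return [
        SimpleDoc(content=r["content"], score=r.get("score"), meta=r.get("meta") or {})
        for r in rows
    ]


docs = [SimpleDoc("Semester fees are due by 1 March.", 0.91, {"source_file": "fees.md"})]
restored = deserialize(serialize(docs))
print(restored[0] == docs[0])  # True
```

The score survives only because `deserialize` reads it back. A rebuild that passes just `content` and `meta` leaves it as `None`.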

src/retrieval/__init__.py ADDED

	@@ -0,0 +1,5 @@

+"""Retrieval components for hybrid search."""
+from .hybrid_retriever import HybridRetriever
+__all__ = ["HybridRetriever"]

src/retrieval/hybrid_retriever.py ADDED

	@@ -0,0 +1,201 @@

+"""Hybrid retriever combining BM25 and vector search."""
+from typing import List, Dict, Any
+from haystack import Document
+from haystack.components.embedders import OpenAITextEmbedder
+from haystack.utils import Secret
+from haystack_integrations.document_stores.opensearch import OpenSearchDocumentStore
+from haystack_integrations.components.retrievers.opensearch import (
+    OpenSearchBM25Retriever,
+    OpenSearchEmbeddingRetriever,
+)
+import logging
+from ..config import RetrievalConfig, LLMConfig
+logger = logging.getLogger(__name__)
+class HybridRetriever:
+    """Retrieves documents using hybrid BM25 + vector search."""
+    def __init__(
+        self,
+        document_store: OpenSearchDocumentStore,
+        llm_config: LLMConfig,
+        retrieval_config: RetrievalConfig,
+    ):
+        """
+        Initialize the hybrid retriever.
+        Args:
+            document_store: OpenSearch document store
+            llm_config: LLM configuration for embeddings
+            retrieval_config: Retrieval configuration
+        """
+        self.document_store = document_store
+        self.llm_config = llm_config
+        self.retrieval_config = retrieval_config
+        # Initialize BM25 retriever
+        self.bm25_retriever = OpenSearchBM25Retriever(
+            document_store=document_store,
+        )
+        # Initialize embedding retriever
+        self.embedding_retriever = OpenSearchEmbeddingRetriever(
+            document_store=document_store,
+        )
+        # Initialize text embedder for queries; Haystack's OpenAI components
+        # take the API key as a Secret rather than a plain string
+        self.text_embedder = OpenAITextEmbedder(
+            api_key=Secret.from_token(llm_config.api_key),
+            model=llm_config.embedding_model,
+        )
+    def retrieve(self, query: str) -> List[Document]:
+        """
+        Retrieve documents using hybrid search.
+        Args:
+            query: Search query
+        Returns:
+            List of relevant documents with scores
+        """
+        logger.info(f"Retrieving documents for query: {query[:100]}...")
+        try:
+            # Get BM25 results
+            logger.debug("Running BM25 retrieval...")
+            bm25_results = self.bm25_retriever.run(
+                query=query,
+                top_k=self.retrieval_config.top_k * 2,  # Get more to merge
+            )
+            bm25_docs = bm25_results.get("documents", [])
+            logger.debug(f"BM25 retrieved {len(bm25_docs)} documents")
+            # Generate query embedding
+            logger.debug("Generating query embedding...")
+            embedding_result = self.text_embedder.run(text=query)
+            query_embedding = embedding_result.get("embedding")
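+            if not query_embedding:
+                logger.warning("Failed to generate query embedding, using BM25 only")
+                # Raw BM25 scores are not normalized on this path, so min_score applies to them directly
+                return self._apply_score_threshold(bm25_docs)[: self.retrieval_config.top_k]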
+            # Get vector search results
+            logger.debug("Running vector retrieval...")
+            vector_results = self.embedding_retriever.run(
+                query_embedding=query_embedding,
+                top_k=self.retrieval_config.top_k * 2,
+            )
+            vector_docs = vector_results.get("documents", [])
+            logger.debug(f"Vector search retrieved {len(vector_docs)} documents")
+            # Merge and rank results
+            merged_docs = self._merge_results(bm25_docs, vector_docs)
+            # Apply score threshold and limit
+            final_docs = self._apply_score_threshold(merged_docs)
+            final_docs = final_docs[: self.retrieval_config.top_k]
+            logger.info(f"Retrieved {len(final_docs)} documents after hybrid ranking")
+            return final_docs
+        except Exception as e:
+            logger.error(f"Error during retrieval: {e}")
+            return []
+    def _merge_results(
+        self, bm25_docs: List[Document], vector_docs: List[Document]
+    ) -> List[Document]:
+        """
+        Merge BM25 and vector search results using weighted scoring.
+        Args:
+            bm25_docs: Documents from BM25 search
+            vector_docs: Documents from vector search
+        Returns:
+            Merged and ranked documents
+        """
+        # Create score maps
+        doc_scores: Dict[str, Dict[str, Any]] = {}
+        # Process BM25 results
+        for doc in bm25_docs:
+            doc_id = doc.id or doc.content[:50]
+            bm25_score = doc.score or 0.0
+            if doc_id not in doc_scores:
+                doc_scores[doc_id] = {
+                    "document": doc,
+                    "bm25_score": 0.0,
+                    "vector_score": 0.0,
+                }
+            doc_scores[doc_id]["bm25_score"] = bm25_score
+        # Process vector results
+        for doc in vector_docs:
+            doc_id = doc.id or doc.content[:50]
+            vector_score = doc.score or 0.0
+            if doc_id not in doc_scores:
+                doc_scores[doc_id] = {
+                    "document": doc,
+                    "bm25_score": 0.0,
+                    "vector_score": 0.0,
+                }
+            doc_scores[doc_id]["vector_score"] = vector_score
+        # Normalize and combine scores
+        bm25_scores = [info["bm25_score"] for info in doc_scores.values()]
+        vector_scores = [info["vector_score"] for info in doc_scores.values()]
+        max_bm25 = max(bm25_scores) if bm25_scores else 1.0
+        max_vector = max(vector_scores) if vector_scores else 1.0
+        merged_docs = []
+        for doc_id, info in doc_scores.items():
+            # Normalize scores
+            norm_bm25 = info["bm25_score"] / max_bm25 if max_bm25 > 0 else 0.0
+            norm_vector = info["vector_score"] / max_vector if max_vector > 0 else 0.0
+            # Combine with weights
+            combined_score = (
+                self.retrieval_config.bm25_weight * norm_bm25
+                + self.retrieval_config.vector_weight * norm_vector
+            )
+            doc = info["document"]
+            doc.score = combined_score
+            if doc.meta is None:
+                doc.meta = {}
+            doc.meta["bm25_score"] = info["bm25_score"]
+            doc.meta["vector_score"] = info["vector_score"]
+            doc.meta["combined_score"] = combined_score
+            merged_docs.append(doc)
+        # Sort by combined score
+        merged_docs.sort(key=lambda x: x.score or 0.0, reverse=True)
+        return merged_docs
+    def _apply_score_threshold(self, documents: List[Document]) -> List[Document]:
+        """
+        Filter documents by minimum score threshold.
+        Args:
+            documents: Documents to filter
+        Returns:
+            Filtered documents
+        """
+        return [
+            doc
+            for doc in documents
+            if doc.score is not None and doc.score >= self.retrieval_config.min_score
+        ]
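`_merge_results` performs a weighted linear fusion. It divides each retriever's raw scores by that retriever's maximum, then combines them as `bm25_weight * norm_bm25 + vector_weight * norm_vector`. A document missing from one result list contributes 0 for that retriever. The sketch below reproduces that logic together with the `min_score` threshold and the `top_k` cut from `retrieve`, using plain dataclasses so it can be checked without OpenSearch. `ScoredDoc`, `fuse`, and the default weights are hypothetical; in the module above the weights and limits come from `RetrievalConfig`. The sample scores are toy values.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ScoredDoc:
    """Hypothetical stand-in for a retrieved document: an id plus a raw score."""
    id: str
    score: float


def fuse(
    bm25: List[ScoredDoc],
    vector: List[ScoredDoc],
    bm25_weight: float = 0.5,
    vector_weight: float = 0.5,
    min_score: float = 0.0,
    top_k: int = 5,
) -> List[ScoredDoc]:
    # Collect raw scores per id; a document absent from one list scores 0 there.
    raw: Dict[str, Dict[str, float]] = {}
    for d in bm25:
        raw.setdefault(d.id, {"bm25": 0.0, "vector": 0.0})["bm25"] = d.score
    for d in vector:
        raw.setdefault(d.id, {"bm25": 0.0, "vector": 0.0})["vector"] = d.score

    # Max-normalize each score type to [0, 1], guarding against an all-zero column.
    max_bm25 = max((s["bm25"] for s in raw.values()), default=0.0)
    max_vec = max((s["vector"] for s in raw.values()), default=0.0)

    fused = []
    for doc_id, s in raw.items():
        norm_bm25 = s["bm25"] / max_bm25 if max_bm25 > 0 else 0.0
        norm_vec = s["vector"] / max_vec if max_vec > 0 else 0.0
        fused.append(ScoredDoc(doc_id, bm25_weight * norm_bm25 + vector_weight * norm_vec))

    # Threshold (a score of exactly 0.0 still counts as a score), sort best first, truncate.
    fused = [d for d in fused if d.score >= min_score]
    fused.sort(key=lambda d: d.score, reverse=True)
    return fused[:top_k]


bm25 = [ScoredDoc("a", 12.0), ScoredDoc("b", 6.0)]
vector = [ScoredDoc("b", 0.8), ScoredDoc("c", 0.4)]
print([(d.id, d.score) for d in fuse(bm25, vector)])
# [('b', 0.75), ('a', 0.5), ('c', 0.25)]
```

Document `b` wins because it is found by both retrievers, even though `a` has the top BM25 score. Max-normalization puts the two score scales on a common [0, 1] range, but it is relative to each result set. Reciprocal rank fusion is a common alternative when raw scores are not comparable across retrievers.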

src/ui/__init__.py ADDED

	@@ -0,0 +1,5 @@

+"""Gradio UI components."""
+from .gradio_app import create_gradio_interface
+__all__ = ["create_gradio_interface"]

src/ui/gradio_app.py ADDED

	@@ -0,0 +1,285 @@

+"""Gradio UI for the RAG Email Assistant."""
+import gradio as gr
+from typing import Tuple, List, Dict, Any
+import logging
+import asyncio
+from ..config import get_config, AppConfig
+from ..indexing.indexer import DocumentIndexer
+from ..pipeline.orchestrator import RAGOrchestrator, PipelineResult
+logger = logging.getLogger(__name__)
+class GradioEmailAssistant:
+    """Gradio interface for the email assistant."""
+    def __init__(self, config: AppConfig):
+        """
+        Initialize the Gradio assistant.
+        Args:
+            config: Application configuration
+        """
+        self.config = config
+        # Initialize indexer and orchestrator
+        self.indexer = DocumentIndexer(
+            opensearch_config=config.opensearch,
+            llm_config=config.llm,
+        )
+        self.orchestrator = RAGOrchestrator(
+            config=config,
+            document_indexer=self.indexer,
+        )
+        # Store last pipeline result for refinement
+        self.last_result: PipelineResult | None = None
+    async def process_query_async(
+        self, query: str
+    ) -> Tuple[str, str, str, str, str, List[Dict[str, Any]]]:
+        """
+        Process a user query asynchronously.
+        Args:
+            query: User query text
+        Returns:
+            Tuple of (subject, body, intent_info, fact_check_info, stats, sources)
+        """
+        try:
+            # Process through pipeline
+            result = await self.orchestrator.process_query(query)
+            self.last_result = result
+            # Extract components
+            subject = result.email_draft.subject
+            body = result.email_draft.body
+            # Format intent information
+            intent_info = f"""**Action Type:** {result.intent.action_type}
+**Topic:** {result.intent.topic}
+**Language:** {result.intent.language}
+**Urgency:** {result.intent.urgency}
+**Key Entities:** {', '.join(result.intent.key_entities) if result.intent.key_entities else 'None'}
+**Questions:** {', '.join(result.intent.specific_questions) if result.intent.specific_questions else 'None'}"""
+            # Format fact check information
+            accuracy_emoji = "✅" if result.fact_check.is_accurate else "⚠️"
+            fact_check_info = f"""**Status:** {accuracy_emoji} {'Accurate' if result.fact_check.is_accurate else 'Issues Found'}
+**Accuracy Score:** {result.fact_check.accuracy_score:.1%}
+**Verified Claims:**
+{self._format_list(result.fact_check.verified_claims)}
+**Issues Found:**
+{self._format_list(result.fact_check.issues_found)}
+**Suggestions:**
+{self._format_list(result.fact_check.suggestions)}"""
+            # Format statistics
+            stats = f"""**Processing Time:** {result.processing_time:.2f}s
+**Documents Retrieved:** {len(result.retrieved_docs)}
+**Confidence:** {result.email_draft.confidence:.1%}"""
+            # Format sources
+            sources = []
+            for i, doc in enumerate(result.retrieved_docs, 1):
+                sources.append(
+                    {
+                        "Number": i,
+                        "Source": doc["meta"].get("source_file", "Unknown"),
+                        "Score": f"{doc['score']:.3f}" if doc["score"] is not None else "n/a",
+                        "Preview": (doc["content"] or "")[:200] + "...",
+                    }
+                )
+            return subject, body, intent_info, fact_check_info, stats, sources
+        except Exception as e:
+            logger.error(f"Error processing query: {e}")
+            error_msg = f"Error: {str(e)}"
+            return (
+                "Error",
+                error_msg,
+                error_msg,
+                error_msg,
+                error_msg,
+                [],
+            )
+    def process_query_sync(
+        self, query: str
+    ) -> Tuple[str, str, str, str, str, List[Dict[str, Any]]]:
+        """Synchronous wrapper for async query processing."""
+        return asyncio.run(self.process_query_async(query))
+    async def refine_draft_async(
+        self, subject: str, body: str, feedback: str
+    ) -> Tuple[str, str]:
+        """
+        Refine the current draft based on user feedback.
+        Args:
+            subject: Current subject
+            body: Current body
+            feedback: User feedback
+        Returns:
+            Tuple of (new_subject, new_body)
+        """
+        if not self.last_result:
+            # Raising gr.Error shows a message in the UI and leaves the current draft untouched
+            raise gr.Error("No draft to refine. Please generate a draft first.")
+        try:
+            # Rebuild documents (including their retrieval scores) from the last result
+            from haystack import Document
+            retrieved_docs = [
+                Document(content=doc["content"], meta=doc["meta"], score=doc["score"])
+                for doc in self.last_result.retrieved_docs
+            ]
+            # Refine the draft
+            refined = await self.orchestrator.refine_draft(
+                original_query=self.last_result.query,
+                current_draft=body,
+                user_feedback=feedback,
+                retrieved_docs=retrieved_docs,
+            )
+            return refined.subject, refined.body
+        except Exception as e:
+            logger.error(f"Error refining draft: {e}")
+            raise gr.Error(f"Error refining draft: {e}")
+    def refine_draft_sync(self, subject: str, body: str, feedback: str) -> Tuple[str, str]:
+        """Synchronous wrapper for async draft refinement."""
+        return asyncio.run(self.refine_draft_async(subject, body, feedback))
+    def _format_list(self, items: List[str]) -> str:
+        """Format a list of items as markdown."""
+        if not items:
+            return "None"
+        return "\n".join([f"- {item}" for item in items])
+def create_gradio_interface() -> gr.Blocks:
+    """
+    Create and configure the Gradio interface.
+    Returns:
+        Gradio Blocks interface
+    """
+    # Load configuration
+    config = get_config()
+    # Initialize assistant
+    assistant = GradioEmailAssistant(config)
+    # Create interface
+    with gr.Blocks(
+        title="BFH Student Administration Email Assistant",
+        theme=gr.themes.Soft(),
+    ) as demo:
+        gr.Markdown(
+            """
+        # 📧 BFH Student Administration Email Assistant
+        AI-powered email assistant for university administrative staff using RAG (Retrieval-Augmented Generation).
+        **Features:**
+        - Intent extraction from student queries
+        - Hybrid retrieval (BM25 + semantic search)
+        - Multi-agent email composition
+        - Automated fact-checking
+        - Draft refinement based on feedback
+        """
+        )
+        with gr.Row():
+            with gr.Column(scale=1):
+                gr.Markdown("### 📝 Query Input")
+                query_input = gr.Textbox(
+                    label="Student Query",
+                    placeholder="Enter the student's question or email content here...",
+                    lines=5,
+                )
+                process_btn = gr.Button("Generate Email Draft", variant="primary")
+            with gr.Column(scale=1):
+                gr.Markdown("### 📊 Analysis")
+                intent_output = gr.Markdown(label="Intent Analysis")
+                stats_output = gr.Markdown(label="Statistics")
+        gr.Markdown("### ✉️ Email Draft")
+        with gr.Row():
+            with gr.Column(scale=2):
+                subject_output = gr.Textbox(label="Subject", lines=1)
+                body_output = gr.Textbox(label="Body", lines=15)
+            with gr.Column(scale=1):
+                fact_check_output = gr.Markdown(label="Fact Check Results")
+        gr.Markdown("### 🔄 Refine Draft")
+        with gr.Row():
+            feedback_input = gr.Textbox(
+                label="Feedback / Refinement Instructions",
+                placeholder="E.g., 'Make it more formal', 'Add information about deadlines', 'Translate to English'",
+                lines=3,
+            )
+            refine_btn = gr.Button("Refine Draft", variant="secondary")
+        gr.Markdown("### 📚 Retrieved Sources")
+        sources_output = gr.Dataframe(
+            headers=["Number", "Source", "Score", "Preview"],
+            label="Source Documents",
+        )
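+        # Event handlers (Gradio awaits coroutine functions on its own event loop,
+        # so no fresh asyncio.run() loop is created and closed on every click)
+        process_btn.click(
+            fn=assistant.process_query_async,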
+            inputs=[query_input],
+            outputs=[
+                subject_output,
+                body_output,
+                intent_output,
+                fact_check_output,
+                stats_output,
+                sources_output,
+            ],
+        )
+        refine_btn.click(
+            fn=assistant.refine_draft_async,
+            inputs=[subject_output, body_output, feedback_input],
+            outputs=[subject_output, body_output],
+        )
+        gr.Markdown(
+            """
+        ---
+        **Note:** This system uses AI to assist with email composition. Always review and verify the generated content before sending.
+        """
+        )
+    return demo
+if __name__ == "__main__":
+    # Configure logging
+    logging.basicConfig(
+        level=logging.INFO,
+        format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
+    )
+    # Create and launch interface
+    demo = create_gradio_interface()
+    demo.launch()
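`process_query_async` fills the "Retrieved Sources" table from the serialized documents. Below is a minimal sketch of that formatting step as a standalone function. It assumes the `{"content", "score", "meta"}` dictionaries produced by `_serialize_documents`, and `format_source_rows` is a hypothetical name. The function returns a list of lists, which `gr.Dataframe` accepts as row data alongside the `headers` set in the layout. It appends an ellipsis only when the preview is actually truncated and prints `n/a` for a missing score. The sample documents are toy values.

```python
from typing import Any, Dict, List


def format_source_rows(docs: List[Dict[str, Any]], preview_len: int = 200) -> List[List[Any]]:
    """Turn serialized documents into [Number, Source, Score, Preview] rows (hypothetical helper)."""
    rows = []
    for i, doc in enumerate(docs, 1):
        meta = doc.get("meta") or {}
        content = doc.get("content") or ""
        score = doc.get("score")
        # Only add an ellipsis when the preview was actually cut short.
        preview = content[:preview_len] + ("..." if len(content) > preview_len else "")
        rows.append(
            [
                i,
                meta.get("source_file", "Unknown"),
                f"{score:.3f}" if score is not None else "n/a",
                preview,
            ]
        )
    return rows


sample = [
    {"content": "Short text", "score": 0.75, "meta": {"source_file": "fees.md"}},
    {"content": "x" * 250, "score": None, "meta": {}},
]
for row in format_source_rows(sample):
    print(row[:3])
```

Keeping this as a pure function makes the table formatting testable without launching the Gradio app.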