anujjoshi3105 committed
Commit
5e03012
1 Parent(s): 7e8c521

feat: nvidia llm

.env.example CHANGED
@@ -6,6 +6,7 @@ ANTHROPIC_API_KEY=
6
  GOOGLE_API_KEY=
7
  GROQ_API_KEY=
8
  OPENROUTER_API_KEY=
 
9
  USE_AWS_BEDROCK=false
10
 
11
  #Vertex AI
 
6
  GOOGLE_API_KEY=
7
  GROQ_API_KEY=
8
  OPENROUTER_API_KEY=
9
+ NVIDIA_API_KEY=
10
  USE_AWS_BEDROCK=false
11
 
12
  #Vertex AI
README.md CHANGED
@@ -1,5 +1,5 @@
1
  ---
2
- title: AI Agent Service Toolkit
3
  emoji: 🧰
4
  colorFrom: blue
5
  colorTo: indigo
@@ -7,234 +7,157 @@ sdk: docker
7
  pinned: false
8
  ---
9
 
10
- # 🧰 AI Agent Service Toolkit
11
 
12
- [![build status](https://github.com/JoshuaC215/chatbot/actions/workflows/test.yml/badge.svg)](https://github.com/JoshuaC215/chatbot/actions/workflows/test.yml) [![codecov](https://codecov.io/github/JoshuaC215/chatbot/graph/badge.svg?token=5MTJSYWD05)](https://codecov.io/github/JoshuaC215/chatbot) [![Python Version](https://img.shields.io/python/required-version-toml?tomlFilePath=https%3A%2F%2Fraw.githubusercontent.com%2FJoshuaC215%2Fchatbot%2Frefs%2Fheads%2Fmain%2Fpyproject.toml)](https://github.com/JoshuaC215/chatbot/blob/main/pyproject.toml)
13
- [![GitHub License](https://img.shields.io/github/license/JoshuaC215/chatbot)](https://github.com/JoshuaC215/chatbot/blob/main/LICENSE) [![Streamlit App](https://static.streamlit.io/badges/streamlit_badge_black_red.svg)](https://chatbot.streamlit.app/)
14
 
15
- A full toolkit for running an AI agent service built with LangGraph, FastAPI and Streamlit.
16
 
17
- It includes a [LangGraph](https://langchain-ai.github.io/langgraph/) agent, a [FastAPI](https://fastapi.tiangolo.com/) service to serve it, a client to interact with the service, and a [Streamlit](https://streamlit.io/) app that uses the client to provide a chat interface. Data structures and settings are built with [Pydantic](https://github.com/pydantic/pydantic).
18
 
19
- This project offers a template for you to easily build and run your own agents using the LangGraph framework. It demonstrates a complete setup from agent definition to user interface, making it easier to get started with LangGraph-based projects by providing a full, robust toolkit.
20
 
21
- **[🎥 Watch a video walkthrough of the repo and app](https://www.youtube.com/watch?v=pdYVHw_YCNY)**
 
 
 
 
22
 
23
- ## Overview
24
 
25
- ### [Try the app!](https://chatbot.streamlit.app/)
26
-
27
- <a href="https://chatbot.streamlit.app/"><img src="media/app_screenshot.png" width="600"></a>
28
-
29
- ### Quickstart
30
-
31
- Run directly in python
32
-
33
- ```sh
34
- # At least one LLM API key is required
35
- echo 'OPENAI_API_KEY=your_openai_api_key' >> .env
36
-
37
- # uv is the recommended way to install chatbot, but "pip install ." also works
38
- # For uv installation options, see: https://docs.astral.sh/uv/getting-started/installation/
39
- curl -LsSf https://astral.sh/uv/0.7.19/install.sh | sh
40
-
41
- # Install dependencies. "uv sync" creates .venv automatically
42
- uv sync --frozen
43
- source .venv/bin/activate
44
- python src/run_service.py
45
-
46
- # In another shell
47
- source .venv/bin/activate
48
- streamlit run src/streamlit_app.py
49
  ```
50
-
51
- Run with docker
52
-
53
- ```sh
54
- echo 'OPENAI_API_KEY=your_openai_api_key' >> .env
55
- docker compose watch
56
  ```
57
 
58
- ### Architecture Diagram
59
-
60
- <img src="media/agent_architecture.png" width="600">
61
-
62
- ### Key Features
63
-
64
- 1. **LangGraph Agent and latest features**: A customizable agent built using the LangGraph framework. Implements the latest LangGraph v1.0 features including human in the loop with `interrupt()`, flow control with `Command`, long-term memory with `Store`, and `langgraph-supervisor`.
65
- 1. **FastAPI Service**: Serves the agent with both streaming and non-streaming endpoints.
66
- 1. **Advanced Streaming**: A novel approach to support both token-based and message-based streaming.
67
- 1. **Streamlit Interface**: Provides a user-friendly chat interface for interacting with the agent, including voice input and output.
68
- 1. **Multiple Agent Support**: Run multiple agents in the service and call by URL path. Available agents and models are described in `/info`
69
- 1. **Asynchronous Design**: Utilizes async/await for efficient handling of concurrent requests.
70
- 1. **Content Moderation**: Implements LlamaGuard for content moderation (requires Groq API key).
71
- 1. **RAG Agent**: A basic RAG agent implementation using ChromaDB - see [docs](docs/RAG_Assistant.md).
72
- 1. **Feedback Mechanism**: Includes a star-based feedback system integrated with LangSmith.
73
- 1. **Docker Support**: Includes Dockerfiles and a docker compose file for easy development and deployment.
74
- 1. **Testing**: Includes robust unit and integration tests for the full repo.
75
-
76
- ### Key Files
77
-
78
- The repository is structured as follows:
79
-
80
- - `src/agents/`: Defines several agents with different capabilities
81
- - `src/schema/`: Defines the protocol schema
82
- - `src/core/`: Core modules including LLM definition and settings
83
- - `src/service/service.py`: FastAPI service to serve the agents
84
- - `src/client/client.py`: Client to interact with the agent service
85
- - `src/streamlit_app.py`: Streamlit app providing a chat interface
86
- - `tests/`: Unit and integration tests
87
-
88
- ## Setup and Usage
89
-
90
- 1. Clone the repository:
91
-
92
- ```sh
93
- git clone https://github.com/JoshuaC215/chatbot.git
94
- cd chatbot
95
- ```
96
-
97
- 2. Set up environment variables:
98
- Create a `.env` file in the root directory. At least one LLM API key or configuration is required. See the [`.env.example` file](./.env.example) for a full list of available environment variables, including a variety of model provider API keys, header-based authentication, LangSmith tracing, testing and development modes, and OpenWeatherMap API key.
99
-
100
- 3. You can now run the agent service and the Streamlit app locally, either with Docker or just using Python. The Docker setup is recommended for simpler environment setup and immediate reloading of the services when you make changes to your code.
101
-
102
- ### Additional setup for specific AI providers
103
-
104
- - [Setting up Ollama](docs/Ollama.md)
105
- - [Setting up VertexAI](docs/VertexAI.md)
106
- - [Setting up RAG with ChromaDB](docs/RAG_Assistant.md)
107
-
108
- ### Building or customizing your own agent
109
 
110
- To customize the agent for your own use case:
111
 
112
- 1. Add your new agent to the `src/agents` directory. You can copy `research_assistant.py` or `chatbot.py` and modify it to change the agent's behavior and tools.
113
- 1. Import and add your new agent to the `agents` dictionary in `src/agents/agents.py`. Your agent can be called by `/<your_agent_name>/invoke` or `/<your_agent_name>/stream`.
114
- 1. Adjust the Streamlit interface in `src/streamlit_app.py` to match your agent's capabilities.
115
 
 
116
 
117
- ### Handling Private Credential files
 
 
 
 
118
 
119
- If your agents or chosen LLM require file-based credential files or certificates, the `privatecredentials/` has been provided for your development convenience. All contents, excluding the `.gitkeep` files, are ignored by git and docker's build process. See [Working with File-based Credentials](docs/File_Based_Credentials.md) for suggested use.
 
120
 
 
 
 
 
121
 
122
- ### Docker Setup
 
 
 
 
 
123
 
124
- This project includes a Docker setup for easy development and deployment. The `compose.yaml` file defines three services: `postgres`, `agent_service` and `streamlit_app`. The `Dockerfile` for each service is in their respective directories.
 
125
 
126
- For local development, we recommend using [docker compose watch](https://docs.docker.com/compose/file-watch/). This feature allows for a smoother development experience by automatically updating your containers when changes are detected in your source code.
 
 
127
 
128
- 1. Make sure you have Docker and Docker Compose (>= [v2.23.0](https://docs.docker.com/compose/release-notes/#2230)) installed on your system.
 
 
 
 
 
129
 
130
- 2. Create a `.env` file from the `.env.example`. At minimum, you need to provide an LLM API key (e.g., OPENAI_API_KEY).
131
- ```sh
132
- cp .env.example .env
133
- # Edit .env to add your API keys
134
- ```
135
 
136
- 3. Build and launch the services in watch mode:
137
 
138
- ```sh
139
- docker compose watch
140
- ```
141
-
142
- This will automatically:
143
- - Start a PostgreSQL database service that the agent service connects to
144
- - Start the agent service with FastAPI
145
- - Start the Streamlit app for the user interface
146
-
147
- 4. The services will now automatically update when you make changes to your code:
148
- - Changes in the relevant python files and directories will trigger updates for the relevant services.
149
- - NOTE: If you make changes to the `pyproject.toml` or `uv.lock` files, you will need to rebuild the services by running `docker compose up --build`.
150
-
151
- 5. Access the Streamlit app by navigating to `http://localhost:8501` in your web browser.
152
-
153
- 6. The agent service API will be available at `http://0.0.0.0:7860`. You can also use the OpenAPI docs at `http://0.0.0.0:7860/redoc`.
154
-
155
- 7. Use `docker compose down` to stop the services.
156
-
157
- This setup allows you to develop and test your changes in real-time without manually restarting the services.
158
-
159
- ### Building other apps on the AgentClient
160
-
161
- The repo includes a generic `src/client/client.AgentClient` that can be used to interact with the agent service. This client is designed to be flexible and can be used to build other apps on top of the agent. It supports both synchronous and asynchronous invocations, and streaming and non-streaming requests.
162
-
163
- See the `src/run_client.py` file for full examples of how to use the `AgentClient`. A quick example:
164
-
165
- ```python
166
- from client import AgentClient
167
- client = AgentClient()
168
-
169
- response = client.invoke("Tell me a brief joke?")
170
- response.pretty_print()
171
- # ================================== Ai Message ==================================
172
- #
173
- # A man walked into a library and asked the librarian, "Do you have any books on Pavlov's dogs and Schrödinger's cat?"
174
- # The librarian replied, "It rings a bell, but I'm not sure if it's here or not."
175
 
 
 
176
  ```
177
 
178
- ### Development with LangGraph Studio
 
179
 
180
- The agent supports [LangGraph Studio](https://langchain-ai.github.io/langgraph/concepts/langgraph_studio/), the IDE for developing agents in LangGraph.
181
 
182
- `langgraph-cli[inmem]` is installed with `uv sync`. You can simply add your `.env` file to the root directory as described above, and then launch LangGraph Studio with `langgraph dev`. Customize `langgraph.json` as needed. See the [local quickstart](https://langchain-ai.github.io/langgraph/cloud/how-tos/studio/quick_start/#local-development-server) to learn more.
 
 
 
183
 
184
- ### Local development without Docker
185
 
186
- You can also run the agent service and the Streamlit app locally without Docker, just using a Python virtual environment.
187
 
188
- 1. Create a virtual environment and install dependencies:
189
 
190
- ```sh
191
- uv sync --frozen
192
- source .venv/bin/activate
193
- ```
194
 
195
- 2. Run the FastAPI server:
 
 
 
196
 
197
- ```sh
198
- python src/run_service.py
199
- ```
200
 
201
- 3. In a separate terminal, run the Streamlit app:
202
-
203
- ```sh
204
- streamlit run src/streamlit_app.py
205
- ```
206
-
207
- 4. Open your browser and navigate to the URL provided by Streamlit (usually `http://localhost:8501`).
208
-
209
- ## Projects built with or inspired by chatbot
210
-
211
- The following are a few of the public projects that drew code or inspiration from this repo.
212
-
213
- - **[PolyRAG](https://github.com/QuentinFuxa/PolyRAG)** - Extends chatbot with RAG capabilities over both PostgreSQL databases and PDF documents.
214
- - **[alexrisch/agent-web-kit](https://github.com/alexrisch/agent-web-kit)** - A Next.JS frontend for chatbot
215
- - **[raushan-in/dapa](https://github.com/raushan-in/dapa)** - Digital Arrest Protection App (DAPA) enables users to report financial scams and frauds efficiently via a user-friendly platform.
216
-
217
- **Please create a pull request editing the README or open a discussion with any new ones to be added!** Would love to include more projects.
218
 
219
  ## Contributing
220
 
221
- Contributions are welcome! Please feel free to submit a Pull Request. Currently the tests need to be run using the local development without Docker setup. To run the tests for the agent service:
222
-
223
- 1. Ensure you're in the project root directory and have activated your virtual environment.
224
-
225
- 2. Install the development dependencies and pre-commit hooks:
226
-
227
- ```sh
228
- uv sync --frozen
229
- pre-commit install
230
- ```
231
-
232
- 3. Run the tests using pytest:
233
-
234
- ```sh
235
- pytest
236
- ```
237
 
238
  ## License
239
 
240
- This project is licensed under the MIT License - see the LICENSE file for details.
 
1
  ---
2
+ title: Portfolio Chatbot
3
  emoji: 🧰
4
  colorFrom: blue
5
  colorTo: indigo
 
7
  pinned: false
8
  ---
9
 
10
+ # Portfolio Chatbot Backend
11
 
12
+ A robust, production-grade AI agent service built with **LangGraph**, **FastAPI**, and **Python**. Designed to power an intelligent portfolio assistant, this backend orchestrates multiple specialized agents to answer questions about professional experience, analyze GitHub contributions, and track competitive programming statistics.
 
13
 
14
+ ## Features
15
 
16
+ * **Multi-Agent Architecture**: Orchestrates specialized agents for different domains.
17
+ * **Portfolio Agent**: An expert on Anuj Joshi's background, skills, projects, and work experience, powered by a curated knowledge base.
18
+ * **Open Source Agent**: Integrates with GitHub (via MCP) to analyze repositories, summarize contributions, and provide code insights.
19
+ * **Competitive Programming Agent**: Tracks real-time performance and statistics from platforms like LeetCode and Codeforces.
20
+ * **Advanced Memory System**:
21
+ * **Short-term Memory**: Manages conversation history using LangGraph checkpointers (Postgres, SQLite, or MongoDB).
22
+ * **Long-term Memory**: Persists cross-conversation knowledge using a durable store.
23
+ * **Model Agnostic**: Supports a wide range of LLM providers including **OpenAI**, **Anthropic**, **Google Gemini/Vertex AI**, **Groq**, **NVIDIA**, **DeepSeek**, **Azure OpenAI**, and **Ollama**.
24
+ * **Production Ready API**:
25
+ * RESTful endpoints built with **FastAPI**.
26
+ * Full streaming support (Server-Sent Events) for real-time responses.
27
+ * Comprehensive conversation history and thread management.
28
+ * Built-in feedback collection endpoints.
29
+ * **Observability & Tracing**: First-class integration with **LangSmith** and **LangFuse** for monitoring and debugging agent traces.
30
+ * **Dockerized**: Extensive Docker support for easy deployment and scaling.
31
 
32
+ ## Tech Stack
33
 
34
+ * **Language**: Python 3.11+
35
+ * **Framework**: FastAPI, Uvicorn
36
+ * **AI Orchestration**: LangChain, LangGraph
37
+ * **Database**: PostgreSQL (recommended for production), SQLite (dev), MongoDB
38
+ * **Package Manager**: uv (fast Python package installer)
39
 
40
+ ## Project Structure
41
42
  ```
43
+ backend/
44
+ ├── src/
45
+ │ ├── agents/ # Agent definitions and workflows
46
+ │ │ ├── agents.py # Agent registry and loading logic
47
+ │ │ ├── portfolio_agent.py
48
+ │ │ ├── open_source_agent.py
49
+ │ │ └── ...
50
+ │ ├── core/ # Core configurations and settings
51
+ │ ├── memory/ # Database and checkpoint initialization
52
+ │ ├── schema/ # Pydantic models and data schemas
53
+ │ ├── service/ # FastAPI application and routes
54
+ │ └── run_service.py # Application entry point
55
+ ├── .env.example # Environment variable template
56
+ ├── pyproject.toml # Dependencies and project metadata
57
+ ├── compose.yaml # Docker Compose configuration
58
+ └── Dockerfile # Docker build instructions
59
  ```
60
 
61
+ ## Getting Started
62
 
63
+ ### Prerequisites
64
 
65
+ * **Python 3.11+** or **Docker**
66
+ * **Git**
67
+ * API Keys for your preferred LLM provider (e.g., OpenAI, Anthropic, Groq).
68
 
69
+ ### Installation (Local)
70
 
71
+ 1. **Clone the repository:**
72
+ ```bash
73
+ git clone https://github.com/Anujjoshi3105/portfolio-chatbot-backend.git
74
+ cd portfolio-chatbot-backend
75
+ ```
76
 
77
+ 2. **Set up the environment:**
78
+ Create a virtual environment and install dependencies. We recommend using `uv` for speed, but `pip` works too.
79
 
80
+ ```bash
81
+ # Using uv (Recommended)
82
+ pip install uv
83
+ uv sync
84
 
85
+ # OR using standard pip
86
+ python -m venv .venv
87
+ source .venv/bin/activate # On Windows: .venv\Scripts\activate
88
+ pip install -e . # Install the project and its dependencies from pyproject.toml (editable mode)
89
+ # Or do a regular, non-editable install: pip install .
90
+ ```
91
 
92
+ 3. **Configure Environment Variables:**
93
+ Copy `.env.example` to `.env` and fill in your API keys.
94
 
95
+ ```bash
96
+ cp .env.example .env
97
+ ```
98
 
99
+ **Key Configuration Options:**
100
+ * `OPENAI_API_KEY`, `GROQ_API_KEY`, etc.: API keys for LLMs.
101
+ * `DEFAULT_MODEL`: The default model to use (e.g., `gpt-4o`, `llama-3.1-70b-versatile`).
102
+ * `DATABASE_TYPE`: `postgres` or `sqlite`.
103
+ * `GITHUB_PAT`: GitHub Personal Access Token (for Open Source Agent).
104
+ * `LANGSMITH_TRACING`: Set to `true` to enable LangSmith tracing.
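+
+ As a minimal illustration (placeholder values; set only the providers you actually use), a working `.env` might look like:
+
+ ```bash
+ # .env — illustrative values only
+ GROQ_API_KEY=gsk_xxxxxxxx             # any single LLM provider key is enough to start
+ NVIDIA_API_KEY=nvapi-xxxxxxxx         # enables NVIDIA-hosted models (also used for LlamaGuard moderation)
+ DEFAULT_MODEL=llama-3.1-70b-versatile
+ DATABASE_TYPE=sqlite                  # switch to postgres for production
+ GITHUB_PAT=ghp_xxxxxxxx               # only required for the Open Source Agent
+ LANGSMITH_TRACING=false
+ ```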
105
 
106
+ ### Running the Service
 
 
 
 
107
 
108
+ Start the backend server:
109
 
110
+ ```bash
111
+ # Run using the python script
112
+ python src/run_service.py
113
 
114
+ # OR using uvicorn directly
115
+ uvicorn service:app --host 0.0.0.0 --port 7860 --reload
116
  ```
117
 
118
+ The API will be available at `http://localhost:7860`.
119
+ Access the interactive API docs (Swagger UI) at `http://localhost:7860/docs`.
120
 
121
+ ## Docker Deployment
122
 
123
+ 1. **Build and Run with Docker Compose:**
124
+ ```bash
125
+ docker compose up --build
126
+ ```
127
 
128
+ This will start the backend service along with a PostgreSQL database (if configured in `compose.yaml`).
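+
+ Once it is up, the usual Compose commands apply (nothing project-specific is assumed here):
+
+ ```bash
+ docker compose logs -f   # follow logs from all services
+ docker compose down      # stop and remove the containers
+ ```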
129
 
130
+ ## API Endpoints
131
 
132
+ The service exposes several key endpoints for interacting with the agents:
133
 
134
+ ### 1. **Invoke Agent**
135
+ - **POST** `/invoke` or `/{agent_id}/invoke`
136
+ - Get a complete response from an agent.
137
+ - **Body:** `{ "message": "Tell me about your projects", "thread_id": "optional-uuid" }`
138
 
139
+ ### 2. **Stream Response**
140
+ - **POST** `/stream` or `/{agent_id}/stream`
141
+ - Stream the agent's reasoning and response token-by-token (SSE).
142
+ - **Body:** `{ "message": "...", "stream_tokens": true }`
143
 
144
+ ### 3. **Chat History**
145
+ - **POST** `/history`
146
+ - Retrieve past messages for a specific thread.
147
 
148
+ ### 4. **Service Info**
149
+ - **GET** `/info`
150
+ - Returns available agents, models, and configuration metadata.
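+
+ A rough sketch of driving these endpoints with `curl` (field names taken from the descriptions above; the authoritative request/response schemas live in `src/schema/`):
+
+ ```bash
+ # Non-streaming call to the default agent
+ curl -X POST http://localhost:7860/invoke \
+   -H "Content-Type: application/json" \
+   -d '{"message": "Tell me about your projects", "thread_id": "demo-thread-1"}'
+
+ # Streaming (SSE) call to a specific agent
+ curl -N -X POST http://localhost:7860/open-source-agent/stream \
+   -H "Content-Type: application/json" \
+   -d '{"message": "Show recent activity on GitHub", "stream_tokens": true}'
+
+ # List available agents, models, and configuration metadata
+ curl http://localhost:7860/info
+ ```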
151
 
152
  ## Contributing
153
 
154
+ Contributions are welcome! Please follow these steps:
155
+ 1. Fork the repository.
156
+ 2. Create a feature branch (`git checkout -b feature/amazing-feature`).
157
+ 3. Commit your changes (`git commit -m 'Add amazing feature'`).
158
+ 4. Push to the branch (`git push origin feature/amazing-feature`).
159
+ 5. Open a Pull Request.
160
 
161
  ## License
162
 
163
+ This project is licensed under the MIT License - see the `LICENSE` file for details.
pyproject.toml CHANGED
@@ -48,6 +48,7 @@ dependencies = [
48
  "langchain-mcp-adapters>=0.1.10",
49
  "ddgs>=9.9.1",
50
  "toons>=0.5.2",
 
51
  ]
52
 
53
  [dependency-groups]
 
48
  "langchain-mcp-adapters>=0.1.10",
49
  "ddgs>=9.9.1",
50
  "toons>=0.5.2",
51
+ "langchain-nvidia-ai-endpoints>=1.0.4",
52
  ]
53
 
54
  [dependency-groups]
src/agents/agents.py CHANGED
@@ -4,8 +4,8 @@ from langgraph.graph.state import CompiledStateGraph
4
  from langgraph.pregel import Pregel
5
 
6
  from agents.portfolio_agent import portfolio_agent
7
- from agents.github_mcp_agent import github_mcp_agent
8
- from agents.cpstat_agent import cpstat_agent
9
  from agents.lazy_agent import LazyLoadingAgent
10
  from schema import AgentInfo
11
 
@@ -37,9 +37,9 @@ agents: dict[str, Agent] = {
37
  "How can I contact Anuj?",
38
  ],
39
  ),
40
- "github-mcp-agent": Agent(
41
- description="An agent equipped with GitHub MCP tools to explore repositories, view code, and track development activity.",
42
- graph_like=github_mcp_agent,
43
  prompts=[
44
  "List anujjoshi3105's top repositories",
45
  "Show recent activity on anujjoshi3105's GitHub",
@@ -47,9 +47,9 @@ agents: dict[str, Agent] = {
47
  "Show me anujjoshi3105's contributions in the last month",
48
  ],
49
  ),
50
- "cpstat-agent": Agent(
51
- description="An agent specializing in Competitive Programming, capable of fetching real-time contest ratings and stats from various platforms.",
52
- graph_like=cpstat_agent,
53
  prompts=[
54
  "Show Anuj's LeetCode rating and stats",
55
  "What is Anuj's Codeforces rank?",
 
4
  from langgraph.pregel import Pregel
5
 
6
  from agents.portfolio_agent import portfolio_agent
7
+ from agents.open_source_agent import open_source_agent
8
+ from agents.competitive_programming_agent import competitive_programming_agent
9
  from agents.lazy_agent import LazyLoadingAgent
10
  from schema import AgentInfo
11
 
 
37
  "How can I contact Anuj?",
38
  ],
39
  ),
40
+ "open-source-agent": Agent(
41
+ description="An intelligent assistant that integrates with GitHub to showcase Anuj's open-source contributions. It can analyze repositories, summarize recent activity, and provide insights into code quality and development impact.",
42
+ graph_like=open_source_agent,
43
  prompts=[
44
  "List anujjoshi3105's top repositories",
45
  "Show recent activity on anujjoshi3105's GitHub",
 
47
  "Show me anujjoshi3105's contributions in the last month",
48
  ],
49
  ),
50
+ "competitive-programming-agent": Agent(
51
+ description="A dedicated competitive programming analyst that tracks Anuj's performance across major platforms like LeetCode and Codeforces. It provides real-time ratings, contest history, and detailed problem-solving statistics.",
52
+ graph_like=competitive_programming_agent,
53
  prompts=[
54
  "Show Anuj's LeetCode rating and stats",
55
  "What is Anuj's Codeforces rank?",
src/agents/{cpstat_agent.py → competitive_programming_agent.py} RENAMED
@@ -6,10 +6,9 @@ from langchain_mcp_adapters.client import MultiServerMCPClient
6
  from langchain_mcp_adapters.sessions import StreamableHttpConnection
7
  from langgraph.graph.state import CompiledStateGraph
8
 
9
- from agents.utils import filter_mcp_tools
10
  from agents.lazy_agent import LazyLoadingAgent
11
- from agents.middlewares import ConfigurableModelMiddleware, SummarizationMiddleware
12
- from agents.prompts.cpstat import SYSTEM_PROMPT
13
  from core import get_model, settings
14
 
15
  logger = logging.getLogger(__name__)
@@ -54,7 +53,7 @@ ALLOWED_TOOLS = {
54
  }
55
 
56
 
57
- class CPStatAgent(LazyLoadingAgent):
58
  """CP Stat Agent with async initialization for contest and rating info."""
59
 
60
  def __init__(self) -> None:
@@ -104,17 +103,12 @@ class CPStatAgent(LazyLoadingAgent):
104
  tools=self._mcp_tools,
105
  middleware=[
106
  ConfigurableModelMiddleware(),
107
- SummarizationMiddleware(
108
- max_tokens_before_summary=1000,
109
- messages_to_keep=4,
110
- use_llm=False
111
- ),
112
  ],
113
- name="cpstat-agent",
114
  system_prompt=SYSTEM_PROMPT,
115
  debug=True,
116
  )
117
 
118
 
119
  # Create the agent instance
120
- cpstat_agent = CPStatAgent()
 
6
  from langchain_mcp_adapters.sessions import StreamableHttpConnection
7
  from langgraph.graph.state import CompiledStateGraph
8
 
 
9
  from agents.lazy_agent import LazyLoadingAgent
10
+ from agents.middlewares import ConfigurableModelMiddleware
11
+ from agents.prompts.competitive_programming import SYSTEM_PROMPT
12
  from core import get_model, settings
13
 
14
  logger = logging.getLogger(__name__)
 
53
  }
54
 
55
 
56
+ class CompetitiveProgrammingAgent(LazyLoadingAgent):
57
  """CP Stat Agent with async initialization for contest and rating info."""
58
 
59
  def __init__(self) -> None:
 
103
  tools=self._mcp_tools,
104
  middleware=[
105
  ConfigurableModelMiddleware(),
 
 
 
 
 
106
  ],
107
+ name="competitive-programming-agent",
108
  system_prompt=SYSTEM_PROMPT,
109
  debug=True,
110
  )
111
 
112
 
113
  # Create the agent instance
114
+ competitive_programming_agent = CompetitiveProgrammingAgent()
src/agents/llama_guard.py CHANGED
@@ -1,3 +1,4 @@
 
1
  from enum import Enum
2
 
3
  from langchain_core.messages import AIMessage, AnyMessage, HumanMessage
@@ -28,7 +29,7 @@ unsafe_content_categories = {
28
  "S4": "Child Exploitation.",
29
  "S5": "Defamation.",
30
  "S6": "Specialized Advice.",
31
- "S7": "Privacy.",
32
  "S8": "Intellectual Property.",
33
  "S9": "Indiscriminate Weapons.",
34
  "S10": "Hate.",
@@ -77,11 +78,11 @@ def parse_llama_guard_output(output: str) -> LlamaGuardOutput:
77
 
78
  class LlamaGuard:
79
  def __init__(self) -> None:
80
- if settings.GROQ_API_KEY is None:
81
- print("GROQ_API_KEY not set, skipping LlamaGuard")
82
  self.model = None
83
  return
84
- self.model = get_model(GroqModelName.LLAMA_GUARD_4_12B).with_config(tags=["skip_stream"])
85
  self.prompt = PromptTemplate.from_template(llama_guard_instructions)
86
 
87
  def _compile_prompt(self, role: str, messages: list[AnyMessage]) -> str:
 
1
+ from schema.models import NvidiaModelName
2
  from enum import Enum
3
 
4
  from langchain_core.messages import AIMessage, AnyMessage, HumanMessage
 
29
  "S4": "Child Exploitation.",
30
  "S5": "Defamation.",
31
  "S6": "Specialized Advice.",
32
+ # "S7": "Privacy.",
33
  "S8": "Intellectual Property.",
34
  "S9": "Indiscriminate Weapons.",
35
  "S10": "Hate.",
 
78
 
79
  class LlamaGuard:
80
  def __init__(self) -> None:
81
+ if settings.NVIDIA_API_KEY is None:
82
+ print("NVIDIA_API_KEY not set, skipping LlamaGuard")
83
  self.model = None
84
  return
85
+ self.model = get_model(NvidiaModelName.META_LLAMA_GUARD_4_12B).with_config(tags=["skip_stream"])
86
  self.prompt = PromptTemplate.from_template(llama_guard_instructions)
87
 
88
  def _compile_prompt(self, role: str, messages: list[AnyMessage]) -> str:
src/agents/middlewares/__init__.py CHANGED
@@ -3,12 +3,10 @@
3
  from agents.middlewares.configurable_model import ConfigurableModelMiddleware
4
  from agents.middlewares.followup import FollowUpMiddleware
5
  from agents.middlewares.safety import SafetyMiddleware, UNSAFE_RESPONSE
6
- from agents.middlewares.summarization import SummarizationMiddleware
7
 
8
  __all__ = [
9
  "ConfigurableModelMiddleware",
10
  "FollowUpMiddleware",
11
  "SafetyMiddleware",
12
  "UNSAFE_RESPONSE",
13
- "SummarizationMiddleware",
14
  ]
 
3
  from agents.middlewares.configurable_model import ConfigurableModelMiddleware
4
  from agents.middlewares.followup import FollowUpMiddleware
5
  from agents.middlewares.safety import SafetyMiddleware, UNSAFE_RESPONSE
 
6
 
7
  __all__ = [
8
  "ConfigurableModelMiddleware",
9
  "FollowUpMiddleware",
10
  "SafetyMiddleware",
11
  "UNSAFE_RESPONSE",
 
12
  ]
src/agents/middlewares/followup.py CHANGED
@@ -10,6 +10,7 @@ from langchain_core.messages import (
10
  AIMessage,
11
  ChatMessage,
12
  )
 
13
  from langchain_core.runnables import RunnableConfig
14
 
15
  from core import get_model, settings
@@ -23,12 +24,7 @@ logger = logging.getLogger(__name__)
23
 
24
  class FollowUpOutput(BaseModel):
25
  """Schema for follow-up generation response."""
26
-
27
- questions: List[str] = Field(
28
- min_length=1,
29
- max_length=5,
30
- description="List of 1-5 suggested follow-up questions for the user.",
31
- )
32
 
33
  @field_validator("questions")
34
  @classmethod
@@ -42,48 +38,25 @@ class FollowUpOutput(BaseModel):
42
  class FollowUpMiddleware:
43
  """Generates structured follow-up suggestions after an agent response."""
44
 
45
- async def generate(
46
- self,
47
- messages: List[AnyMessage],
48
- config: RunnableConfig,
49
- ) -> List[str]:
50
  try:
51
- model_name = config.get("configurable", {}).get(
52
- "model",
53
- settings.DEFAULT_MODEL,
54
- )
55
-
56
- base_model = get_model(model_name)
57
-
58
- # Force structured output
59
- model = base_model.with_structured_output(FollowUpOutput)
60
-
61
- # Clean messages
62
- cleaned_messages: List[AnyMessage] = [
63
- SystemMessage(content=FOLLOWUP_GENERATION_PROMPT),
64
- *[
65
- m
66
- for m in messages
67
- if not (isinstance(m, ChatMessage) and m.role == "custom")
68
- ],
69
- ]
70
-
71
- # Invoke model
72
- response: FollowUpOutput = await model.ainvoke(
73
- cleaned_messages,
74
- config,
75
- )
76
-
77
- if not response or not response.questions:
78
  raise ValueError("Empty follow-up response")
79
 
80
- return response.questions
81
 
82
  except Exception as e:
83
- logger.warning(
84
- "Follow-up generation failed, using defaults: %s",
85
- e,
86
- exc_info=True,
87
- )
88
 
89
  return DEFAULT_FOLLOWUP_PROMPTS
 
10
  AIMessage,
11
  ChatMessage,
12
  )
13
+ from langchain_core.output_parsers import PydanticOutputParser
14
  from langchain_core.runnables import RunnableConfig
15
 
16
  from core import get_model, settings
 
24
 
25
  class FollowUpOutput(BaseModel):
26
  """Schema for follow-up generation response."""
27
+ questions: List[str] = Field(min_length=1, max_length=5, description="List of 1-5 suggested follow-up questions for the user.",)
 
 
 
 
 
28
 
29
  @field_validator("questions")
30
  @classmethod
 
38
  class FollowUpMiddleware:
39
  """Generates structured follow-up suggestions after an agent response."""
40
 
41
+ async def generate(self, messages: List[AnyMessage], config: RunnableConfig) -> List[str]:
 
 
 
 
42
  try:
43
+ model_name = config.get("configurable", {}).get("model", settings.DEFAULT_MODEL)
44
+
45
+ model = get_model(model_name)
46
+ parser = PydanticOutputParser(pydantic_object=FollowUpOutput)
47
+
48
+ system_message = f"{FOLLOWUP_GENERATION_PROMPT}\n\n{parser.get_format_instructions()}"
49
+ messages = [SystemMessage(content=system_message), messages[-1]]
50
+
51
+ response = await model.ainvoke(messages, config)
52
+ content = parser.parse(response.content)
53
+
54
+ if not content or not content.questions:
 
55
  raise ValueError("Empty follow-up response")
56
 
57
+ return content.questions
58
 
59
  except Exception as e:
60
+ logger.warning("Follow-up generation failed, using defaults: %s", e, exc_info=True)
 
 
 
 
61
 
62
  return DEFAULT_FOLLOWUP_PROMPTS
src/agents/middlewares/summarization.py DELETED
@@ -1,288 +0,0 @@
1
- """Summarization middleware."""
2
-
3
- import uuid
4
- from collections.abc import Callable, Iterable
5
- from typing import Any, cast
6
-
7
- from langchain_core.messages import (
8
- AIMessage,
9
- AnyMessage,
10
- MessageLikeRepresentation,
11
- RemoveMessage,
12
- ToolMessage,
13
- )
14
- from langchain_core.messages.human import HumanMessage
15
- from langchain_core.messages.utils import count_tokens_approximately, trim_messages
16
- from langgraph.graph.message import REMOVE_ALL_MESSAGES
17
- from langgraph.runtime import Runtime
18
- from langgraph.config import get_config
19
-
20
- from langchain.agents.middleware.types import AgentMiddleware, AgentState
21
- from langchain.chat_models import BaseChatModel, init_chat_model
22
- from core import get_model, settings
23
-
24
- TokenCounter = Callable[[Iterable[MessageLikeRepresentation]], int]
25
-
26
- DEFAULT_SUMMARY_PROMPT = """<role>
27
- Context Extraction Assistant
28
- </role>
29
-
30
- <primary_objective>
31
- Your sole objective in this task is to extract the highest quality/most relevant context from the conversation history below.
32
- </primary_objective>
33
-
34
- <objective_information>
35
- You're nearing the total number of input tokens you can accept, so you must extract the highest quality/most relevant pieces of information from your conversation history.
36
- This context will then overwrite the conversation history presented below. Because of this, ensure the context you extract is only the most important information to your overall goal.
37
- </objective_information>
38
-
39
- <instructions>
40
- The conversation history below will be replaced with the context you extract in this step. Because of this, you must do your very best to extract and record all of the most important context from the conversation history.
41
- You want to ensure that you don't repeat any actions you've already completed, so the context you extract from the conversation history should be focused on the most important information to your overall goal.
42
- </instructions>
43
-
44
- The user will message you with the full message history you'll be extracting context from, to then replace. Carefully read over it all, and think deeply about what information is most important to your overall goal that should be saved:
45
-
46
- With all of this in mind, please carefully read over the entire conversation history, and extract the most important and relevant context to replace it so that you can free up space in the conversation history.
47
- Respond ONLY with the extracted context. Do not include any additional information, or text before or after the extracted context.
48
-
49
- <messages>
50
- Messages to summarize:
51
- {messages}
52
- </messages>""" # noqa: E501
53
-
54
- SUMMARY_PREFIX = "## Previous conversation summary:"
55
-
56
- _DEFAULT_MESSAGES_TO_KEEP = 5
57
- _DEFAULT_TRIM_TOKEN_LIMIT = 1500
58
- _DEFAULT_FALLBACK_MESSAGE_COUNT = 8
59
- _SEARCH_RANGE_FOR_TOOL_PAIRS = 5
60
-
61
-
62
- class SummarizationMiddleware(AgentMiddleware):
63
- """Summarizes conversation history when token limits are approached.
64
-
65
- This middleware monitors message token counts and automatically summarizes older
66
- messages when a threshold is reached, preserving recent messages and maintaining
67
- context continuity by ensuring AI/Tool message pairs remain together.
68
- """
69
-
70
- def __init__(
71
- self,
72
- model: str | BaseChatModel | None = None,
73
- max_tokens_before_summary: int | None = None,
74
- messages_to_keep: int = _DEFAULT_MESSAGES_TO_KEEP,
75
- token_counter: TokenCounter = count_tokens_approximately,
76
- summary_prompt: str = DEFAULT_SUMMARY_PROMPT,
77
- summary_prefix: str = SUMMARY_PREFIX,
78
- use_llm: bool = True,
79
- ) -> None:
80
- """Initialize summarization middleware.
81
-
82
- Args:
83
- model: The language model to use for generating summaries.
84
- If None or a string, will be resolved at runtime from config.
85
- max_tokens_before_summary: Token threshold to trigger summarization.
86
- If `None`, summarization is disabled.
87
- messages_to_keep: Number of recent messages to preserve after summarization.
88
- token_counter: Function to count tokens in messages.
89
- summary_prompt: Prompt template for generating summaries.
90
- summary_prefix: Prefix added to system message when including summary.
91
- use_llm: Whether to use LLM for generating summary. If False, just trims and joins message contents.
92
- """
93
- super().__init__()
94
-
95
- if isinstance(model, str):
96
- model = init_chat_model(model)
97
-
98
- self.model = model
99
- self.max_tokens_before_summary = max_tokens_before_summary
100
- self.messages_to_keep = messages_to_keep
101
- self.token_counter = token_counter
102
- self.summary_prompt = summary_prompt
103
- self.summary_prefix = summary_prefix
104
- self.use_llm = use_llm
105
-
106
- def _get_model(self) -> BaseChatModel:
107
- """Resolve the model to use for summarization."""
108
- if isinstance(self.model, BaseChatModel):
109
- return self.model
110
-
111
- # Resolve from runtime config if not explicitly provided as a BaseChatModel
112
- config = get_config()
113
- model_key = config.get("configurable", {}).get("model", settings.DEFAULT_MODEL)
114
- return get_model(model_key)
115
-
116
- def before_model(self, state: AgentState, runtime: Runtime) -> dict[str, Any] | None: # noqa: ARG002
117
- """Process messages before model invocation, potentially triggering summarization."""
118
- messages = state["messages"]
119
- self._ensure_message_ids(messages)
120
-
121
- total_tokens = self.token_counter(messages)
122
- if (
123
- self.max_tokens_before_summary is not None
124
- and total_tokens < self.max_tokens_before_summary
125
- ):
126
- return None
127
-
128
- cutoff_index = self._find_safe_cutoff(messages)
129
-
130
- if cutoff_index <= 0:
131
- return None
132
-
133
- messages_to_summarize, preserved_messages = self._partition_messages(messages, cutoff_index)
134
-
135
- summary = self._create_summary(messages_to_summarize)
136
- new_messages = self._build_new_messages(summary)
137
-
138
- return {
139
- "messages": [
140
- RemoveMessage(id=REMOVE_ALL_MESSAGES),
141
- *new_messages,
142
- *preserved_messages,
143
- ]
144
- }
145
-
146
- def _build_new_messages(self, summary: str) -> list[HumanMessage]:
147
- return [
148
- HumanMessage(content=f"Here is a summary of the conversation to date:\n\n{summary}")
149
- ]
150
-
151
- def _ensure_message_ids(self, messages: list[AnyMessage]) -> None:
152
- """Ensure all messages have unique IDs for the add_messages reducer."""
153
- for msg in messages:
154
- if msg.id is None:
155
- msg.id = str(uuid.uuid4())
156
-
157
- def _partition_messages(
158
- self,
159
- conversation_messages: list[AnyMessage],
160
- cutoff_index: int,
161
- ) -> tuple[list[AnyMessage], list[AnyMessage]]:
162
- """Partition messages into those to summarize and those to preserve."""
163
- messages_to_summarize = conversation_messages[:cutoff_index]
164
- preserved_messages = conversation_messages[cutoff_index:]
165
-
166
- return messages_to_summarize, preserved_messages
167
-
168
- def _find_safe_cutoff(self, messages: list[AnyMessage]) -> int:
169
- """Find safe cutoff point that preserves AI/Tool message pairs.
170
-
171
- Returns the index where messages can be safely cut without separating
172
- related AI and Tool messages. Returns 0 if no safe cutoff is found.
173
- """
174
- if len(messages) <= self.messages_to_keep:
175
- return 0
176
-
177
- target_cutoff = len(messages) - self.messages_to_keep
178
-
179
- for i in range(target_cutoff, -1, -1):
180
- if self._is_safe_cutoff_point(messages, i):
181
- return i
182
-
183
- return 0
184
-
185
- def _is_safe_cutoff_point(self, messages: list[AnyMessage], cutoff_index: int) -> bool:
186
- """Check if cutting at index would separate AI/Tool message pairs."""
187
- if cutoff_index >= len(messages):
188
- return True
189
-
190
- search_start = max(0, cutoff_index - _SEARCH_RANGE_FOR_TOOL_PAIRS)
191
- search_end = min(len(messages), cutoff_index + _SEARCH_RANGE_FOR_TOOL_PAIRS)
192
-
193
- for i in range(search_start, search_end):
194
- if not self._has_tool_calls(messages[i]):
195
- continue
196
-
197
- tool_call_ids = self._extract_tool_call_ids(cast("AIMessage", messages[i]))
198
- if self._cutoff_separates_tool_pair(messages, i, cutoff_index, tool_call_ids):
199
- return False
200
-
201
- return True
202
-
203
- def _has_tool_calls(self, message: AnyMessage) -> bool:
204
- """Check if message is an AI message with tool calls."""
205
- return (
206
- isinstance(message, AIMessage) and hasattr(message, "tool_calls") and message.tool_calls # type: ignore[return-value]
207
- )
208
-
209
- def _extract_tool_call_ids(self, ai_message: AIMessage) -> set[str]:
210
- """Extract tool call IDs from an AI message."""
211
- tool_call_ids = set()
212
- for tc in ai_message.tool_calls:
213
- call_id = tc.get("id") if isinstance(tc, dict) else getattr(tc, "id", None)
214
- if call_id is not None:
215
- tool_call_ids.add(call_id)
216
- return tool_call_ids
217
-
218
- def _cutoff_separates_tool_pair(
219
- self,
220
- messages: list[AnyMessage],
221
- ai_message_index: int,
222
- cutoff_index: int,
223
- tool_call_ids: set[str],
224
- ) -> bool:
225
- """Check if cutoff separates an AI message from its corresponding tool messages."""
226
- for j in range(ai_message_index + 1, len(messages)):
227
- message = messages[j]
228
- if isinstance(message, ToolMessage) and message.tool_call_id in tool_call_ids:
229
- ai_before_cutoff = ai_message_index < cutoff_index
230
- tool_before_cutoff = j < cutoff_index
231
- if ai_before_cutoff != tool_before_cutoff:
232
- return True
233
- return False
234
-
235
- def _create_summary(self, messages_to_summarize: list[AnyMessage]) -> str:
236
- """Generate summary for the given messages."""
237
- if not messages_to_summarize:
238
- return "No previous conversation history."
239
-
240
- for msg in messages_to_summarize:
241
- if isinstance(msg, ToolMessage) and len(str(msg.content)) > self.max_tokens_before_summary:
242
- msg.content = str(msg.content)[:self.max_tokens_before_summary] + "... [Tool output truncated for summary]"
243
-
244
- trimmed_messages = self._trim_messages_for_summary(messages_to_summarize)
245
-
246
- if not self.use_llm:
247
- summary_parts = []
248
-
249
- if self.messages_to_keep > 0:
250
- messages_to_summarize = trimmed_messages[:-self.messages_to_keep]
251
- else:
252
- messages_to_summarize = trimmed_messages
253
-
254
- for msg in messages_to_summarize:
255
- content = msg.content if isinstance(msg.content, str) else str(msg.content)
256
- summary_parts.append(f"[{msg.type}] {content}")
257
-
258
- summary = "\n\n".join(summary_parts)
259
-
260
- if len(summary) > self.max_tokens_before_summary:
261
- return summary[:self.max_tokens_before_summary] + "..."
262
-
263
- return summary
264
-
265
- if not trimmed_messages:
266
- return "Previous conversation was too long to summarize."
267
-
268
- try:
269
- model = self._get_model()
270
- response = model.invoke(self.summary_prompt.format(messages=trimmed_messages))
271
- return cast("str", response.content).strip()
272
- except Exception as e: # noqa: BLE001
273
- return f"Error generating summary: {e!s}"
274
-
275
- def _trim_messages_for_summary(self, messages: list[AnyMessage]) -> list[AnyMessage]:
276
- """Trim messages to fit within summary generation limits."""
277
- try:
278
- return trim_messages(
279
- messages,
280
- max_tokens=_DEFAULT_TRIM_TOKEN_LIMIT,
281
- token_counter=self.token_counter,
282
- start_on="human",
283
- strategy="last",
284
- allow_partial=True,
285
- include_system=True,
286
- )
287
- except Exception: # noqa: BLE001
288
- return messages[-_DEFAULT_FALLBACK_MESSAGE_COUNT:]
src/agents/{github_mcp_agent.py → open_source_agent.py} RENAMED
@@ -6,53 +6,14 @@ from langchain_mcp_adapters.client import MultiServerMCPClient
6
  from langchain_mcp_adapters.sessions import StreamableHttpConnection
7
  from langgraph.graph.state import CompiledStateGraph
8
 
9
- from agents.utils import filter_mcp_tools
10
  from agents.lazy_agent import LazyLoadingAgent
11
- from agents.middlewares import ConfigurableModelMiddleware, SummarizationMiddleware
12
  from core import get_model, settings
13
- from agents.prompts.github import SYSTEM_PROMPT
14
 
15
  logger = logging.getLogger(__name__)
16
 
17
- ALLOWED_TOOLS = {
18
- # User & Profile
19
- "get_me", # Get authenticated user's profile
20
- "get_teams", # Get teams the user belongs to
21
- "get_team_members", # Get team member usernames
22
-
23
- # Repository & Code
24
- "search_repositories", # Find repositories by name/description
25
- "get_file_contents", # Get file/directory contents
26
- "search_code", # Search code across repositories
27
- "list_branches", # List repository branches
28
-
29
- # Activity & Contributions
30
- "list_commits", # List commits in a repository
31
- "get_commit", # Get commit details with diff
32
- "list_pull_requests", # List PRs in a repository
33
- "pull_request_read", # Get PR details, diff, reviews
34
- "search_pull_requests",# Search PRs by author
35
-
36
- # Issues
37
- "list_issues", # List issues in a repository
38
- "issue_read", # Get issue details/comments
39
- "search_issues", # Search issues
40
-
41
- # Releases & Tags
42
- "list_releases", # List releases
43
- "get_latest_release", # Get latest release
44
- "get_release_by_tag", # Get release by tag
45
- "list_tags", # List git tags
46
- "get_tag", # Get tag details
47
-
48
- # Discovery
49
- "search_users", # Find GitHub users
50
- "get_label", # Get repository label
51
- "list_issue_types", # List issue types for org
52
- }
53
-
54
-
55
- class GitHubMCPAgent(LazyLoadingAgent):
56
  """GitHub MCP Agent with async initialization for portfolio assistant."""
57
 
58
  def __init__(self) -> None:
@@ -86,8 +47,6 @@ class GitHubMCPAgent(LazyLoadingAgent):
86
  self._mcp_client = MultiServerMCPClient(connections)
87
  logger.info("MCP client initialized successfully")
88
 
89
- # all_tools = await self._mcp_client.get_tools()
90
- # self._mcp_tools = filter_mcp_tools(all_tools, ALLOWED_TOOLS)
91
  self._mcp_tools = await self._mcp_client.get_tools()
92
  except Exception as e:
93
  logger.error(f"Failed to initialize GitHub MCP agent: {e}")
@@ -105,17 +64,12 @@ class GitHubMCPAgent(LazyLoadingAgent):
105
  tools=self._mcp_tools,
106
  middleware=[
107
  ConfigurableModelMiddleware(),
108
- SummarizationMiddleware(
109
- max_tokens_before_summary=1000,
110
- messages_to_keep=4,
111
- use_llm=False
112
- ),
113
  ],
114
- name="github-mcp-agent",
115
  system_prompt=SYSTEM_PROMPT,
116
  debug=True,
117
  )
118
 
119
 
120
  # Create the agent instance
121
- github_mcp_agent = GitHubMCPAgent()
 
6
  from langchain_mcp_adapters.sessions import StreamableHttpConnection
7
  from langgraph.graph.state import CompiledStateGraph
8
 
 
9
  from agents.lazy_agent import LazyLoadingAgent
10
+ from agents.middlewares import ConfigurableModelMiddleware
11
  from core import get_model, settings
12
+ from agents.prompts.open_source import SYSTEM_PROMPT
13
 
14
  logger = logging.getLogger(__name__)
15
 
16
+ class OpenSourceAgent(LazyLoadingAgent):
17
  """GitHub MCP Agent with async initialization for portfolio assistant."""
18
 
19
  def __init__(self) -> None:
 
47
  self._mcp_client = MultiServerMCPClient(connections)
48
  logger.info("MCP client initialized successfully")
49
 
 
 
50
  self._mcp_tools = await self._mcp_client.get_tools()
51
  except Exception as e:
52
  logger.error(f"Failed to initialize GitHub MCP agent: {e}")
 
64
  tools=self._mcp_tools,
65
  middleware=[
66
  ConfigurableModelMiddleware(),
 
 
 
 
 
67
  ],
68
+ name="open-source-agent",
69
  system_prompt=SYSTEM_PROMPT,
70
  debug=True,
71
  )
72
 
73
 
74
  # Create the agent instance
75
+ open_source_agent = OpenSourceAgent()
src/agents/portfolio_agent.py CHANGED
@@ -4,7 +4,7 @@ from langchain.agents import create_agent
4
  from langgraph.graph.state import CompiledStateGraph
5
 
6
  from agents.lazy_agent import LazyLoadingAgent
7
- from agents.middlewares import ConfigurableModelMiddleware, SummarizationMiddleware
8
  from agents.prompts.portfolio import SYSTEM_PROMPT
9
  from agents.tools.database_search import database_search
10
  from core import get_model, settings
@@ -32,11 +32,6 @@ class PortfolioAgent(LazyLoadingAgent):
32
  tools=self._tools,
33
  middleware=[
34
  ConfigurableModelMiddleware(),
35
- SummarizationMiddleware(
36
- max_tokens_before_summary=1000,
37
- messages_to_keep=4,
38
- use_llm=False
39
- ),
40
  ],
41
  name="portfolio-agent",
42
  system_prompt=SYSTEM_PROMPT,
 
4
  from langgraph.graph.state import CompiledStateGraph
5
 
6
  from agents.lazy_agent import LazyLoadingAgent
7
+ from agents.middlewares import ConfigurableModelMiddleware
8
  from agents.prompts.portfolio import SYSTEM_PROMPT
9
  from agents.tools.database_search import database_search
10
  from core import get_model, settings
 
32
  tools=self._tools,
33
  middleware=[
34
  ConfigurableModelMiddleware(),
 
 
 
 
 
35
  ],
36
  name="portfolio-agent",
37
  system_prompt=SYSTEM_PROMPT,
src/agents/prompts/competitive_programming.py ADDED
@@ -0,0 +1,24 @@
1
+ from datetime import datetime
2
+
3
+ PORTFOLIO_URL = "https://anujjoshi.netlify.app"
4
+ OWNER = "Anuj Joshi"
5
+ HANDLE = "anujjoshi3105"
6
+ current_date = datetime.now().strftime("%B %d, %Y")
7
+
8
+ SYSTEM_PROMPT = f"""
9
+ # ROLE: You are the **Lead Algorithmic Strategist** chatbot for {OWNER} (@{HANDLE}).
10
+ # GOAL: Prove elite competitive programming capability to Recruiters and CTOs.
11
+ # DATE: {current_date}.
12
+
13
+ # COMPETITIVE PROGRAMMING (OLD STATS)
14
+ - LeetCode: 1910 (Knight), 750+ solved | Codeforces: 1434 (specialist) | AtCoder: 929 (Green) | GeeksforGeeks: College Rank 46
15
+
16
+ # GUIDELINES:
17
+ - ALWAYS use your tools to fetch the latest stats (ratings, problem counts, streaks) before answering specific questions. Do not guess.
18
+ - Interpret the data: Don't just list numbers; explain what they mean (e.g., "A rating of X puts him in the top Y%," or "A Z-day streak shows consistency").
19
+ - If fetching the data fails, fall back to the old stats with a disclaimer that they may be out of date.
20
+ - Be professional, concise, and humble but confident.
21
+ - If asked about contact info or hireability, direct them to the contact section {PORTFOLIO_URL}/contact.
22
+ - Never hallucinate or make up information. Always use the tools to fetch the latest information.
23
+ - Never claim anything that is not backed by the tools or the competitive programming platforms. If you don't know the answer, say so and point the user to other information available on those platforms.
24
+ """
src/agents/prompts/cpstat.py DELETED
@@ -1,22 +0,0 @@
1
- from datetime import datetime
2
-
3
- PORTFOLIO_URL = "https://anujjoshi.netlify.app"
4
- OWNER = "Anuj Joshi"
5
- HANDLE = "anujjoshi3105"
6
- current_date = datetime.now().strftime("%B %d, %Y")
7
-
8
- SYSTEM_PROMPT = f"""
9
- ### ROLE
10
- You are the **Lead Algorithmic Strategist** chatbot for {OWNER} (@{HANDLE}).
11
- **Date:** {current_date}.
12
-
13
- # Competitive Programming (OLD STATS)
14
- - LeetCode: 1910 (Knight), 750+ solved | Codeforces: 1434 (specialist) | AtCoder: 929 (Green) | GeeksforGeeks: College Rank 46
15
-
16
- # OUTPUT GUIDELINES:
17
- - **Translate Stats to Value:** Do not just list ratings. Explain that {OWNER}'s CP background guarantees **low-latency code, O(n) optimization habits, and edge-case resilience** in production.
18
- - **Cross-Platform Mastery:** Highlight versatility. Success across LeetCode (Interviews), Codeforces (Math/Logic), and AtCoder (Precision) proves adaptability.
19
- - **Tone:** Analytical, precise, and impressive. Speak like a Principal Engineer evaluating talent.
20
- - **Contextualize:** "750+ problems" isn't just a number; it's a library of known design patterns ready for deployment.
21
- - **The Bottom Line:** Always conclude by stating: "{OWNER} doesn't just code; he engineers mathematically optimal solutions."
22
- """
src/agents/prompts/followup.py CHANGED
@@ -1,3 +1,4 @@
 
1
  DEFAULT_FOLLOWUP_PROMPTS = [
2
  "Tell me about Anuj's background",
3
  "What are Anuj's key technical skills?",
@@ -8,5 +9,4 @@ DEFAULT_FOLLOWUP_PROMPTS = [
8
 
9
  FOLLOWUP_GENERATION_PROMPT = f"""
10
  # Role: You are a predictive user intent engine.
11
- # Task: Generate 3-5 suggested follow-up options that the USER would click to ask the chatbot. Based on the previous message, predict the most logical next steps for the user.
12
- """
 
1
+ OWNER = "Anuj Joshi"
2
  DEFAULT_FOLLOWUP_PROMPTS = [
3
  "Tell me about Anuj's background",
4
  "What are Anuj's key technical skills?",
 
9
 
10
  FOLLOWUP_GENERATION_PROMPT = f"""
11
  # Role: You are a predictive user intent engine.
12
+ # Task: Generate 3-5 suggested follow-up options (each 4-6 words max) that a portfolio visitor, recruiter, or employer would want to ask about {OWNER}. Based on the previous message, predict the user's most logical next questions."""
 
src/agents/prompts/github.py DELETED
@@ -1,21 +0,0 @@
1
- from datetime import datetime
2
-
3
- PORTFOLIO_URL = "https://anujjoshi.netlify.app"
4
- OWNER = "Anuj Joshi"
5
- GITHUB_HANDLE = "anujjoshi3105"
6
- current_date = datetime.now().strftime("%B %d, %Y")
7
-
8
- SYSTEM_PROMPT = f"""
9
- # ROLE
10
- You are the **Senior Technical Architect** chatbot for {OWNER} (@{GITHUB_HANDLE}).
11
- **Goal:** Prove elite engineering capability to Recruiters and CTOs.
12
-
13
- # OUTPUT GUIDELINES:
14
- - **Insight over Inventory:** Never just list files. Analyze **architectural choices**, **scalability**, and **complexity**. Explain *why* the code matters.
15
- - **Fail-Safe Protocol:** If a specific repo isn't found, **never admit defeat**. Pivot immediately to {OWNER}'s core strengths (Full Stack/AI) or top pinned projects.
16
- - **Output Style:** Executive summaries. High-density technical language. Concise (max 150 words).
17
- - **No Code Walls:** Summarize logic only.
18
- - **The Closer:** Every response must subtly guide the user to witness the live work at {PORTFOLIO_URL}.
19
-
20
- **Tone:** Confident, Precise, 10x Engineer.
21
- """
src/agents/prompts/open_source.py ADDED
@@ -0,0 +1,19 @@
1
+ from datetime import datetime
2
+
3
+ PORTFOLIO_URL = "https://anujjoshi.netlify.app"
4
+ OWNER = "Anuj Joshi"
5
+ GITHUB_HANDLE = "anujjoshi3105"
6
+ current_date = datetime.now().strftime("%B %d, %Y")
7
+
8
+ SYSTEM_PROMPT = f"""
9
+ # ROLE: You are the **Senior Technical Architect** chatbot for {OWNER} (@{GITHUB_HANDLE}).
10
+ # GOAL: Prove elite engineering capability to Recruiters and CTOs.
11
+
12
+ # OUTPUT GUIDELINES:
13
+ - Instead of just counting commits, focus on significant projects, languages used, and contributions to major external repositories.
14
+ - Fail-Safe Protocol: If a specific repo isn't found, never admit defeat. Pivot immediately to {OWNER}'s core strengths (Full Stack/AI) or top pinned projects.
15
+ - The Closer: Every response must subtly guide the user to witness the live work at {PORTFOLIO_URL}.
16
+ - Never hallucinate or make up information. Always use the tools to fetch the latest information.
17
+ - Never claim anything that is not backed by the tools or GitHub. If you don't know the answer, say so and point the user to other information they can find on GitHub.
18
+ - Tone: Transparent, tech-savvy, and evidence-based. If a recruiter asks "What is their best work?", use tool data to identify the repo with the most activity or stars.
19
+ """
src/agents/prompts/portfolio.py CHANGED
@@ -3,8 +3,8 @@ PORTFOLIO_URL = "https://anujjoshi.netlify.app"
 OWNER = "Anuj Joshi"

 SYSTEM_PROMPT = f"""
- You are an **Award-Winning Professional Portfolio Assistant** for {OWNER}, (a Full Stack Developer and AI & Machine Learning Engineer from New Delhi, India).
- Your goal is to answer questions from recruiter, potential employer, or visitors about {OWNER}'s skills, projects, qualifications, and experience.
+ # ROLE: You are an **Award-Winning Professional Portfolio Assistant** for {OWNER}, (a Full Stack Developer and AI & Machine Learning Engineer from New Delhi, India).
+ # GOAL: Prove elite engineering capability to Recruiters and CTOs, concisely but informatively.

 # CONTACT ({PORTFOLIO_URL}/contact)
 - Portfolio: {PORTFOLIO_URL} | Email: anujjoshi3105@gmail.com | LinkedIn: linkedin.com/in/anujjoshi3105 | GitHub: github.com/anujjoshi3105 | X: x.com/anujjoshi3105
@@ -79,11 +79,13 @@ https://drive.google.com/file/d/150EAtBVjP1DV-b_v0JKhVYzhIVoCvAWO/view
 - Volunteer, Summer School on AI (DTU) https://drive.google.com/file/d/10Jx3yC8gmFYHkl0KXucaUOZJqtf9QkJq/view?usp=drive_link: Supported hands-on sessions on deep learning, transformers, and generative AI.

 # TOOLS
- - Database_Search: Search portfolio info (education, experience, testimonials, skills, projects, blog). Cite {PORTFOLIO_URL}/blog
+ - Database_Search: Search portfolio info (education, experience, testimonials, skills, projects, blog).

 # OUTPUT GUIDELINES:
- - **Content:** Be very specific and accurate with the information you provide.
- - **Information-Dense:** Every sentence must provide a new fact.
- - **Interesting and Engaging:** Use a mix of facts and interesting details to keep the reader engaged.
- - **Style:** Professional, concise, witty and helpful.
+ - Never hallucinate or make up information. Always use the tools to fetch the latest information.
+ - Never claim anything that is not in the tools or portfolio. If you don't know the answer, say you don't know and suggest the user other information they can find on the portfolio.
+ - Content: Be very specific and accurate with the information you provide.
+ - Information Dense: Every sentence must provide a new fact.
+ - Interesting and Engaging: Use a mix of facts and interesting details to keep the reader engaged.
+ - Style: Professional, concise, witty and helpful.
 """
src/agents/tools/database_search.py CHANGED
@@ -19,7 +19,7 @@ def database_search(query: str) -> str:
 This tool should be used whenever the user asks about Anuj's background, work, or specific accomplishments.
 """

- retriever = load_pgvector_retriever(3)
+ retriever = load_pgvector_retriever(4)
 documents = retriever.invoke(query)

 if not documents:
src/agents/utils.py DELETED
@@ -1,15 +0,0 @@
- from langchain_core.tools import BaseTool
- from typing import Iterable, Set, List
-
-
- def filter_mcp_tools(tools: Iterable[BaseTool], allowed: Set[str]) -> List[BaseTool]:
- """Keep only allowed MCP tools and remove namespace (cpstat.)."""
- filtered = []
-
- for t in tools:
- short_name = t.name.rsplit(".", 1)[-1]
- if short_name in allowed:
- t.name = short_name
- filtered.append(t)
-
- return filtered
src/core/embeddings.py CHANGED
@@ -4,6 +4,7 @@ from typing import TypeAlias
 from langchain_google_genai import GoogleGenerativeAIEmbeddings
 from langchain_ollama import OllamaEmbeddings
 from langchain_openai import OpenAIEmbeddings
+ from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings

 from core.settings import settings
 from schema.models import (
@@ -11,12 +12,14 @@ from schema.models import (
 GoogleEmbeddingModelName,
 OllamaEmbeddingModelName,
 OpenAIEmbeddingModelName,
+ NvidiaEmbeddingModelName,
 )

 EmbeddingT: TypeAlias = (
 OpenAIEmbeddings
 | GoogleGenerativeAIEmbeddings
 | OllamaEmbeddings
+ | NVIDIAEmbeddings
 )


@@ -34,4 +37,10 @@ def get_embeddings(model_name: AllEmbeddingModelEnum, /) -> EmbeddingT:
 base_url=settings.OLLAMA_BASE_URL,
 )

+ if model_name in NvidiaEmbeddingModelName:
+ return NVIDIAEmbeddings(
+ model=model_name.value,
+ api_key=settings.NVIDIA_API_KEY.get_secret_value() if settings.NVIDIA_API_KEY else None,
+ )
+
 raise ValueError(f"Unsupported embedding model: {model_name}")
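
With the embeddings factory extended as above, the NVIDIA branch can be exercised like any other provider. A minimal sketch, assuming `langchain-nvidia-ai-endpoints` is installed, `NVIDIA_API_KEY` is set, and the script runs inside the repo's environment (`src/` importable):

```python
# Hypothetical usage sketch; NVIDIA_API_KEY must be set for real calls.
from core.embeddings import get_embeddings
from schema.models import NvidiaEmbeddingModelName

embeddings = get_embeddings(NvidiaEmbeddingModelName.NV_EMBEDQA_MISTRAL_7B_V2)
vector = embeddings.embed_query("What are Anuj's main projects?")
print(len(vector))  # dimension depends on the chosen embedding model
```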
src/core/llm.py CHANGED
@@ -8,6 +8,7 @@ from langchain_google_genai import ChatGoogleGenerativeAI
 from langchain_google_vertexai import ChatVertexAI
 from langchain_groq import ChatGroq
 from langchain_ollama import ChatOllama
+ from langchain_nvidia_ai_endpoints import ChatNVIDIA
 from langchain_openai import AzureChatOpenAI, ChatOpenAI

 from core.settings import settings
@@ -24,6 +25,7 @@ from schema.models import (
 OpenAICompatibleName,
 OpenAIModelName,
 OpenRouterModelName,
+ NvidiaModelName,
 VertexAIModelName,
 )

@@ -39,6 +41,7 @@ _MODEL_TABLE = (
 | {m: m.value for m in AWSModelName}
 | {m: m.value for m in OllamaModelName}
 | {m: m.value for m in OpenRouterModelName}
+ | {m: m.value for m in NvidiaModelName}
 | {m: m.value for m in FakeModelName}
 )

@@ -60,6 +63,7 @@ ModelT: TypeAlias = (
 | ChatGroq
 | ChatBedrock
 | ChatOllama
+ | ChatNVIDIA
 | FakeToolModel
 )

@@ -112,9 +116,13 @@ def get_model(model_name: AllModelEnum, /) -> ModelT:
 return ChatGoogleGenerativeAI(model=api_model_name, temperature=0.5, streaming=True)
 if model_name in VertexAIModelName:
 return ChatVertexAI(model=api_model_name, temperature=0.5, streaming=True)
+ if model_name in NvidiaModelName:
+ return ChatNVIDIA(
+ model=api_model_name,
+ temperature=0.5,
+ api_key=settings.NVIDIA_API_KEY,
+ )
 if model_name in GroqModelName:
- if model_name == GroqModelName.LLAMA_GUARD_4_12B:
- return ChatGroq(model=api_model_name, temperature=0.0) # type: ignore[call-arg]
 return ChatGroq(model=api_model_name, temperature=0.5) # type: ignore[call-arg]
 if model_name in AWSModelName:
 return ChatBedrock(model_id=api_model_name, temperature=0.5)
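
The `get_model` branch added above returns a `ChatNVIDIA` instance that behaves like the other chat models in the toolkit. A minimal sketch, assuming `NVIDIA_API_KEY` is configured and the repo's `src/` directory is importable:

```python
# Hypothetical usage sketch; NVIDIA_API_KEY must be set for real calls.
from core.llm import get_model
from schema.models import NvidiaModelName

llm = get_model(NvidiaModelName.NVIDIA_LLAMA_3_1_NEMOTRON_70B_INSTRUCT)
response = llm.invoke("Summarize this repository in one sentence.")
print(response.content)
```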
src/core/settings.py CHANGED
@@ -22,6 +22,7 @@ from schema.models import (
 FakeModelName,
 GoogleModelName,
 GroqModelName,
+ NvidiaModelName,
 OllamaModelName,
 OpenAICompatibleName,
 OpenAIModelName,
@@ -32,6 +33,7 @@ from schema.models import (
 OpenAIEmbeddingModelName,
 GoogleEmbeddingModelName,
 OllamaEmbeddingModelName,
+ NvidiaEmbeddingModelName,
 )


@@ -100,6 +102,7 @@ class Settings(BaseSettings):
 OLLAMA_BASE_URL: str | None = None
 USE_FAKE_MODEL: bool = False
 OPENROUTER_API_KEY: str | None = None
+ NVIDIA_API_KEY: SecretStr | None = None

 # If DEFAULT_MODEL is None, it will be set in model_post_init
 DEFAULT_MODEL: AllModelEnum | None = None # type: ignore[assignment]
@@ -184,6 +187,7 @@ class Settings(BaseSettings):
 Provider.FAKE: self.USE_FAKE_MODEL,
 Provider.AZURE_OPENAI: self.AZURE_OPENAI_API_KEY,
 Provider.OPENROUTER: self.OPENROUTER_API_KEY,
+ Provider.NVIDIA: self.NVIDIA_API_KEY,
 }
 active_keys = [k for k, v in api_keys.items() if v]
 if not active_keys:
@@ -215,6 +219,10 @@ class Settings(BaseSettings):
 if self.DEFAULT_MODEL is None:
 self.DEFAULT_MODEL = VertexAIModelName.GEMINI_20_FLASH
 self.AVAILABLE_MODELS.update(set(VertexAIModelName))
+ case Provider.NVIDIA:
+ if self.DEFAULT_MODEL is None:
+ self.DEFAULT_MODEL = NvidiaModelName.LLAMA_31_NEMOTRON_70B_INSTRUCT
+ self.AVAILABLE_MODELS.update(set(NvidiaModelName))
 case Provider.GROQ:
 if self.DEFAULT_MODEL is None:
 self.DEFAULT_MODEL = GroqModelName.LLAMA_31_8B_INSTANT
@@ -280,6 +288,10 @@ class Settings(BaseSettings):
 self.AVAILABLE_EMBEDDING_MODELS.update(set(OllamaEmbeddingModelName))
 if not self.OLLAMA_EMBEDDING_MODEL:
 self.OLLAMA_EMBEDDING_MODEL = OllamaEmbeddingModelName.NOMIC_EMBED_TEXT
+ case Provider.NVIDIA:
+ if self.DEFAULT_EMBEDDING_MODEL is None:
+ self.DEFAULT_EMBEDDING_MODEL = NvidiaEmbeddingModelName.NV_EMBEDQA_MISTRAL_7B_V2
+ self.AVAILABLE_EMBEDDING_MODELS.update(set(NvidiaEmbeddingModelName))

 @computed_field # type: ignore[prop-decorator]
 @property
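
The new provider hook follows the existing pattern: a provider counts as active when its API key is present, and `model_post_init` then fills in the default chat and embedding models. A standalone sketch of that detection idea (simplified names, not the repo's actual `Settings` class):

```python
# Standalone sketch of the provider-detection mechanism above (simplified):
# a provider becomes active when its key is set; the active provider
# supplies the default chat and embedding models.
api_keys = {
    "OPENAI": None,
    "GROQ": None,
    "NVIDIA": "nvapi-placeholder",  # stand-in for NVIDIA_API_KEY
}
active = [provider for provider, key in api_keys.items() if key]
if not active:
    raise SystemExit("At least one LLM API key is required")
print(f"Active providers: {active}")  # -> ['NVIDIA']
```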
src/schema/__init__.py CHANGED
@@ -3,7 +3,9 @@ from schema.schema import (
 AgentInfo,
 ChatHistory,
 ChatHistoryInput,
+ ChatHistoryResponse,
 ChatMessage,
+ ChatMessagePreview,
 Feedback,
 FeedbackResponse,
 ServiceMetadata,
@@ -19,12 +21,14 @@ __all__ = [
 "AllModelEnum",
 "UserInput",
 "ChatMessage",
+ "ChatMessagePreview",
 "ServiceMetadata",
 "StreamInput",
 "Feedback",
 "FeedbackResponse",
 "ChatHistoryInput",
 "ChatHistory",
+ "ChatHistoryResponse",
 "ThreadSummary",
 "ThreadListInput",
 "ThreadList",
src/schema/models.py CHANGED
@@ -14,6 +14,7 @@ class Provider(StrEnum):
14
  AWS = auto()
15
  OLLAMA = auto()
16
  OPENROUTER = auto()
 
17
  FAKE = auto()
18
 
19
 
@@ -84,24 +85,24 @@ class VertexAIModelName(StrEnum):
84
  class GroqModelName(StrEnum):
85
  """https://console.groq.com/docs/models"""
86
 
87
- LLAMA_GUARD_4_12B = "meta-llama/llama-guard-4-12b"
88
- LLAMA_31_8B_INSTANT = "llama-3.1-8b-instant"
89
- LLAMA_33_70B_VERSATILE = "llama-3.3-70b-versatile"
90
- LLAMA_4_MAVERICK_17B_128E = "meta-llama/llama-4-maverick-17b-128e-instruct"
91
- LLAMA_4_SCOUT_17B_16E = "meta-llama/llama-4-scout-17b-16e-instruct"
92
  LLAMA_PROMPT_GUARD_2_22M = "meta-llama/llama-prompt-guard-2-22m"
93
  LLAMA_PROMPT_GUARD_2_86M = "meta-llama/llama-prompt-guard-2-86m"
94
- OPENAI_GPT_OSS_120B = "openai/gpt-oss-120b"
95
- OPENAI_GPT_OSS_20B = "openai/gpt-oss-20b"
96
  OPENAI_GPT_OSS_SAFEGUARD_20B = "openai/gpt-oss-safeguard-20b"
97
- GROQ_COMPOUND = "groq/compound"
98
  GROQ_COMPOUND_MINI = "groq/compound-mini"
99
- QWEN_3_32B = "qwen/qwen3-32b"
100
- KIMI_K2_INSTRUCT = "moonshotai/kimi-k2-instruct"
101
- KIMI_K2_INSTRUCT_0905 = "moonshotai/kimi-k2-instruct-0905"
102
  ORPHEUS_ARABIC_SAUDI = "canopylabs/orpheus-arabic-saudi"
103
  ORPHEUS_V1_ENGLISH = "canopylabs/orpheus-v1-english"
104
- WHISPER_LARGE_V3 = "whisper-large-v3"
105
  WHISPER_LARGE_V3_TURBO = "whisper-large-v3-turbo"
106
  ALLAM_2_7B = "allam-2-7b"
107
 
@@ -132,6 +133,192 @@ class OpenRouterModelName(StrEnum):
132
  GEMINI_25_FLASH = "google/gemini-2.5-flash"
133
 
134
 
 
 
 
 
135
  class OpenAICompatibleName(StrEnum):
136
  """https://platform.openai.com/docs/guides/text-generation"""
137
 
@@ -156,11 +343,20 @@ AllModelEnum: TypeAlias = (
156
  | AWSModelName
157
  | OllamaModelName
158
  | OpenRouterModelName
 
159
  | FakeModelName
160
  )
161
 
 
 
 
 
 
 
 
162
  AllEmbeddingModelEnum: TypeAlias = (
163
  OpenAIEmbeddingModelName
164
  | GoogleEmbeddingModelName
165
  | OllamaEmbeddingModelName
 
166
  )
 
14
  AWS = auto()
15
  OLLAMA = auto()
16
  OPENROUTER = auto()
17
+ NVIDIA = auto()
18
  FAKE = auto()
19
 
20
 
 
85
  class GroqModelName(StrEnum):
86
  """https://console.groq.com/docs/models"""
87
 
88
+ # LLAMA_GUARD_4_12B = "meta-llama/llama-guard-4-12b"
89
+ # LLAMA_31_8B_INSTANT = "llama-3.1-8b-instant"
90
+ # LLAMA_33_70B_VERSATILE = "llama-3.3-70b-versatile"
91
+ # LLAMA_4_MAVERICK_17B_128E = "meta-llama/llama-4-maverick-17b-128e-instruct"
92
+ # LLAMA_4_SCOUT_17B_16E = "meta-llama/llama-4-scout-17b-16e-instruct"
93
  LLAMA_PROMPT_GUARD_2_22M = "meta-llama/llama-prompt-guard-2-22m"
94
  LLAMA_PROMPT_GUARD_2_86M = "meta-llama/llama-prompt-guard-2-86m"
95
+ # OPENAI_GPT_OSS_120B = "openai/gpt-oss-120b"
96
+ # OPENAI_GPT_OSS_20B = "openai/gpt-oss-20b"
97
  OPENAI_GPT_OSS_SAFEGUARD_20B = "openai/gpt-oss-safeguard-20b"
98
+ # GROQ_COMPOUND = "groq/compound"
99
  GROQ_COMPOUND_MINI = "groq/compound-mini"
100
+ # QWEN_3_32B = "qwen/qwen3-32b"
101
+ # KIMI_K2_INSTRUCT = "moonshotai/kimi-k2-instruct"
102
+ # KIMI_K2_INSTRUCT_0905 = "moonshotai/kimi-k2-instruct-0905"
103
  ORPHEUS_ARABIC_SAUDI = "canopylabs/orpheus-arabic-saudi"
104
  ORPHEUS_V1_ENGLISH = "canopylabs/orpheus-v1-english"
105
+ # WHISPER_LARGE_V3 = "whisper-large-v3"
106
  WHISPER_LARGE_V3_TURBO = "whisper-large-v3-turbo"
107
  ALLAM_2_7B = "allam-2-7b"
108
 
 
133
  GEMINI_25_FLASH = "google/gemini-2.5-flash"
134
 
135
 
136
+ class NvidiaModelName(StrEnum):
137
+ """https://build.nvidia.com/explore/discover"""
138
+
139
+ # ABACUSAI_DRACARYS_LLAMA_3_1_70B_INSTRUCT = "abacusai/dracarys-llama-3.1-70b-instruct"
140
+ # ADEPT_FUYU_8B = "adept/fuyu-8b"
141
+ # AI21LABS_JAMBA_1_5_LARGE_INSTRUCT = "ai21labs/jamba-1.5-large-instruct"
142
+ # AI21LABS_JAMBA_1_5_MINI_INSTRUCT = "ai21labs/jamba-1.5-mini-instruct"
143
+ # AISINGAPORE_SEA_LION_7B_INSTRUCT = "aisingapore/sea-lion-7b-instruct"
144
+ # BAAI_GET_M3 = "baai/get-m3"
145
+ # BAICHUAN_INC_BAICHUAN2_13B_CHAT = "baichuan-inc/baichuan2-13b-chat"
146
+ # BIGCODE_STARCODER2_15B = "bigcode/starcoder2-15b"
147
+ # BIGCODE_STARCODER2_7B = "bigcode/starcoder2-7b"
148
+ # BYTEDANCE_SEED_OSS_36B_INSTRUCT = "bytedance/seed-oss-36b-instruct"
149
+ # DATABRICKS_DBRX_INSTRUCT = "databricks/dbrx-instruct"
150
+ # DEEPSEEK_AI_DEEPSEEK_CODER_6_7B_INSTRUCT = "deepseek-ai/deepseek-coder-6.7b-instruct"
151
+ # DEEPSEEK_AI_DEEPSEEK_R1_DISTILL_LLAMA_8B = "deepseek-ai/deepseek-r1-distill-llama-8b"
152
+ # DEEPSEEK_AI_DEEPSEEK_R1_DISTILL_QWEN_14B = "deepseek-ai/deepseek-r1-distill-qwen-14b"
153
+ # DEEPSEEK_AI_DEEPSEEK_R1_DISTILL_QWEN_32B = "deepseek-ai/deepseek-r1-distill-qwen-32b"
154
+ # DEEPSEEK_AI_DEEPSEEK_R1_DISTILL_QWEN_7B = "deepseek-ai/deepseek-r1-distill-qwen-7b"
155
+ # DEEPSEEK_AI_DEEPSEEK_V3_1 = "deepseek-ai/deepseek-v3.1"
156
+ DEEPSEEK_AI_DEEPSEEK_V3_1_TERMINUS = "deepseek-ai/deepseek-v3.1-terminus"
157
+ DEEPSEEK_AI_DEEPSEEK_V3_2 = "deepseek-ai/deepseek-v3.2"
158
+ GOOGLE_CODEGEMMA_1_1_7B = "google/codegemma-1.1-7b"
159
+ GOOGLE_CODEGEMMA_7B = "google/codegemma-7b"
160
+ GOOGLE_DEPLOT = "google/deplot"
161
+ GOOGLE_GEMMA_2B = "google/gemma-2b"
162
+ GOOGLE_GEMMA_2_27B_IT = "google/gemma-2-27b-it"
163
+ GOOGLE_GEMMA_2_2B_IT = "google/gemma-2-2b-it"
164
+ GOOGLE_GEMMA_2_9B_IT = "google/gemma-2-9b-it"
165
+ GOOGLE_GEMMA_3N_E2B_IT = "google/gemma-3n-e2b-it"
166
+ GOOGLE_GEMMA_3N_E4B_IT = "google/gemma-3n-e4b-it"
167
+ GOOGLE_GEMMA_3_12B_IT = "google/gemma-3-12b-it"
168
+ GOOGLE_GEMMA_3_1B_IT = "google/gemma-3-1b-it"
169
+ GOOGLE_GEMMA_3_27B_IT = "google/gemma-3-27b-it"
170
+ GOOGLE_GEMMA_3_4B_IT = "google/gemma-3-4b-it"
171
+ GOOGLE_GEMMA_7B = "google/gemma-7b"
172
+ GOOGLE_PALIGEMMA = "google/paligemma"
173
+ GOOGLE_RECURRENTGEMMA_2B = "google/recurrentgemma-2b"
174
+ GOOGLE_SHIELDGEMMA_9B = "google/shieldgemma-9b"
175
+ GOTOCOMPANY_GEMMA_2_9B_CPT_SAHABATAI_INSTRUCT = "gotocompany/gemma-2-9b-cpt-sahabatai-instruct"
176
+ IBM_GRANITE_34B_CODE_INSTRUCT = "ibm/granite-34b-code-instruct"
177
+ IBM_GRANITE_3_0_3B_A800M_INSTRUCT = "ibm/granite-3.0-3b-a800m-instruct"
178
+ IBM_GRANITE_3_0_8B_INSTRUCT = "ibm/granite-3.0-8b-instruct"
179
+ IBM_GRANITE_3_3_8B_INSTRUCT = "ibm/granite-3.3-8b-instruct"
180
+ IBM_GRANITE_8B_CODE_INSTRUCT = "ibm/granite-8b-code-instruct"
181
+ IBM_GRANITE_GUARDIAN_3_0_8B = "ibm/granite-guardian-3.0-8b"
182
+ IGENIUS_COLOSSEUM_355B_INSTRUCT_16K = "igenius/colosseum_355b_instruct_16k"
183
+ IGENIUS_ITALIA_10B_INSTRUCT_16K = "igenius/italia_10b_instruct_16k"
184
+ INSTITUTE_OF_SCIENCE_TOKYO_LLAMA_3_1_SWALLOW_70B_INSTRUCT_V0_1 = "institute-of-science-tokyo/llama-3.1-swallow-70b-instruct-v0.1"
185
+ INSTITUTE_OF_SCIENCE_TOKYO_LLAMA_3_1_SWALLOW_8B_INSTRUCT_V0_1 = "institute-of-science-tokyo/llama-3.1-swallow-8b-instruct-v0.1"
186
+ MARIN_MARIN_8B_INSTRUCT = "marin/marin-8b-instruct"
187
+ MEDIATEK_BREEZE_7B_INSTRUCT = "mediatek/breeze-7b-instruct"
188
+ META_CODELLAMA_70B = "meta/codellama-70b"
189
+ META_LLAMA2_70B = "meta/llama2-70b"
190
+ META_LLAMA3_70B_INSTRUCT = "meta/llama3-70b-instruct"
191
+ META_LLAMA3_8B_INSTRUCT = "meta/llama3-8b-instruct"
192
+ META_LLAMA_3_1_405B_INSTRUCT = "meta/llama-3.1-405b-instruct"
193
+ META_LLAMA_3_1_70B_INSTRUCT = "meta/llama-3.1-70b-instruct"
194
+ META_LLAMA_3_1_8B_INSTRUCT = "meta/llama-3.1-8b-instruct"
195
+ META_LLAMA_3_2_11B_VISION_INSTRUCT = "meta/llama-3.2-11b-vision-instruct"
196
+ META_LLAMA_3_2_1B_INSTRUCT = "meta/llama-3.2-1b-instruct"
197
+ META_LLAMA_3_2_3B_INSTRUCT = "meta/llama-3.2-3b-instruct"
198
+ META_LLAMA_3_2_90B_VISION_INSTRUCT = "meta/llama-3.2-90b-vision-instruct"
199
+ META_LLAMA_3_3_70B_INSTRUCT = "meta/llama-3.3-70b-instruct"
200
+ META_LLAMA_4_MAVERICK_17B_128E_INSTRUCT = "meta/llama-4-maverick-17b-128e-instruct"
201
+ META_LLAMA_4_SCOUT_17B_16E_INSTRUCT = "meta/llama-4-scout-17b-16e-instruct"
202
+ META_LLAMA_GUARD_4_12B = "meta/llama-guard-4-12b"
203
+ MICROSOFT_KOSMOS_2 = "microsoft/kosmos-2"
204
+ MICROSOFT_PHI_3_5_MINI_INSTRUCT = "microsoft/phi-3.5-mini-instruct"
205
+ MICROSOFT_PHI_3_5_MOE_INSTRUCT = "microsoft/phi-3.5-moe-instruct"
206
+ MICROSOFT_PHI_3_5_VISION_INSTRUCT = "microsoft/phi-3.5-vision-instruct"
207
+ MICROSOFT_PHI_3_MEDIUM_128K_INSTRUCT = "microsoft/phi-3-medium-128k-instruct"
208
+ MICROSOFT_PHI_3_MEDIUM_4K_INSTRUCT = "microsoft/phi-3-medium-4k-instruct"
209
+ MICROSOFT_PHI_3_MINI_128K_INSTRUCT = "microsoft/phi-3-mini-128k-instruct"
210
+ MICROSOFT_PHI_3_MINI_4K_INSTRUCT = "microsoft/phi-3-mini-4k-instruct"
211
+ MICROSOFT_PHI_3_SMALL_128K_INSTRUCT = "microsoft/phi-3-small-128k-instruct"
212
+ MICROSOFT_PHI_3_SMALL_8K_INSTRUCT = "microsoft/phi-3-small-8k-instruct"
213
+ MICROSOFT_PHI_3_VISION_128K_INSTRUCT = "microsoft/phi-3-vision-128k-instruct"
214
+ MICROSOFT_PHI_4_MINI_FLASH_REASONING = "microsoft/phi-4-mini-flash-reasoning"
215
+ MICROSOFT_PHI_4_MINI_INSTRUCT = "microsoft/phi-4-mini-instruct"
216
+ MICROSOFT_PHI_4_MULTIMODAL_INSTRUCT = "microsoft/phi-4-multimodal-instruct"
217
+ MINIMAXAI_MINIMAX_M2 = "minimaxai/minimax-m2"
218
+ MINIMAXAI_MINIMAX_M2_1 = "minimaxai/minimax-m2.1"
219
+ MISTRALAI_CODESTRAL_22B_INSTRUCT_V0_1 = "mistralai/codestral-22b-instruct-v0.1"
220
+ MISTRALAI_DEVSTRAL_2_123B_INSTRUCT_2512 = "mistralai/devstral-2-123b-instruct-2512"
221
+ MISTRALAI_MAGISTRAL_SMALL_2506 = "mistralai/magistral-small-2506"
222
+ MISTRALAI_MAMBA_CODESTRAL_7B_V0_1 = "mistralai/mamba-codestral-7b-v0.1"
223
+ MISTRALAI_MATHSTRAL_7B_V0_1 = "mistralai/mathstral-7b-v0.1"
224
+ MISTRALAI_MINISTRAL_14B_INSTRUCT_2512 = "mistralai/ministral-14b-instruct-2512"
225
+ MISTRALAI_MISTRAL_7B_INSTRUCT_V0_2 = "mistralai/mistral-7b-instruct-v0.2"
226
+ MISTRALAI_MISTRAL_7B_INSTRUCT_V0_3 = "mistralai/mistral-7b-instruct-v0.3"
227
+ MISTRALAI_MISTRAL_LARGE = "mistralai/mistral-large"
228
+ MISTRALAI_MISTRAL_LARGE_2_INSTRUCT = "mistralai/mistral-large-2-instruct"
229
+ MISTRALAI_MISTRAL_LARGE_3_675B_INSTRUCT_2512 = "mistralai/mistral-large-3-675b-instruct-2512"
230
+ MISTRALAI_MISTRAL_MEDIUM_3_INSTRUCT = "mistralai/mistral-medium-3-instruct"
231
+ MISTRALAI_MISTRAL_NEMOTRON = "mistralai/mistral-nemotron"
232
+ MISTRALAI_MISTRAL_SMALL_24B_INSTRUCT = "mistralai/mistral-small-24b-instruct"
233
+ MISTRALAI_MISTRAL_SMALL_3_1_24B_INSTRUCT_2503 = "mistralai/mistral-small-3.1-24b-instruct-2503"
234
+ MISTRALAI_MIXTRAL_8X22B_INSTRUCT_V0_1 = "mistralai/mixtral-8x22b-instruct-v0.1"
235
+ MISTRALAI_MIXTRAL_8X22B_V0_1 = "mistralai/mixtral-8x22b-v0.1"
236
+ MISTRALAI_MIXTRAL_8X7B_INSTRUCT_V0_1 = "mistralai/mixtral-8x7b-instruct-v0.1"
237
+ # MODEL_01_AI_YI_LARGE = "01-ai/yi-large"
238
+ MOONSHOTAI_KIMI_K2_5 = "moonshotai/kimi-k2.5"
239
+ MOONSHOTAI_KIMI_K2_INSTRUCT = "moonshotai/kimi-k2-instruct"
240
+ MOONSHOTAI_KIMI_K2_INSTRUCT_0905 = "moonshotai/kimi-k2-instruct-0905"
241
+ MOONSHOTAI_KIMI_K2_THINKING = "moonshotai/kimi-k2-thinking"
242
+ NVIDIA_COSMOS_REASON2_8B = "nvidia/cosmos-reason2-8b"
243
+ NVIDIA_EMBED_QA_4 = "nvidia/embed-qa-4"
244
+ NVIDIA_LLAMA3_CHATQA_1_5_70B = "nvidia/llama3-chatqa-1.5-70b"
245
+ NVIDIA_LLAMA3_CHATQA_1_5_8B = "nvidia/llama3-chatqa-1.5-8b"
246
+ NVIDIA_LLAMA_3_1_NEMOGUARD_8B_CONTENT_SAFETY = "nvidia/llama-3.1-nemoguard-8b-content-safety"
247
+ NVIDIA_LLAMA_3_1_NEMOGUARD_8B_TOPIC_CONTROL = "nvidia/llama-3.1-nemoguard-8b-topic-control"
248
+ NVIDIA_LLAMA_3_1_NEMOTRON_51B_INSTRUCT = "nvidia/llama-3.1-nemotron-51b-instruct"
249
+ NVIDIA_LLAMA_3_1_NEMOTRON_70B_INSTRUCT = "nvidia/llama-3.1-nemotron-70b-instruct"
250
+ NVIDIA_LLAMA_3_1_NEMOTRON_70B_REWARD = "nvidia/llama-3.1-nemotron-70b-reward"
251
+ NVIDIA_LLAMA_3_1_NEMOTRON_NANO_4B_V1_1 = "nvidia/llama-3.1-nemotron-nano-4b-v1.1"
252
+ NVIDIA_LLAMA_3_1_NEMOTRON_NANO_8B_V1 = "nvidia/llama-3.1-nemotron-nano-8b-v1"
253
+ NVIDIA_LLAMA_3_1_NEMOTRON_NANO_VL_8B_V1 = "nvidia/llama-3.1-nemotron-nano-vl-8b-v1"
254
+ NVIDIA_LLAMA_3_1_NEMOTRON_SAFETY_GUARD_8B_V3 = "nvidia/llama-3.1-nemotron-safety-guard-8b-v3"
255
+ NVIDIA_LLAMA_3_1_NEMOTRON_ULTRA_253B_V1 = "nvidia/llama-3.1-nemotron-ultra-253b-v1"
256
+ NVIDIA_LLAMA_3_2_NEMORETRIEVER_1B_VLM_EMBED_V1 = "nvidia/llama-3.2-nemoretriever-1b-vlm-embed-v1"
257
+ NVIDIA_LLAMA_3_2_NEMORETRIEVER_300M_EMBED_V1 = "nvidia/llama-3.2-nemoretriever-300m-embed-v1"
258
+ NVIDIA_LLAMA_3_2_NEMORETRIEVER_300M_EMBED_V2 = "nvidia/llama-3.2-nemoretriever-300m-embed-v2"
259
+ NVIDIA_LLAMA_3_2_NV_EMBEDQA_1B_V1 = "nvidia/llama-3.2-nv-embedqa-1b-v1"
260
+ NVIDIA_LLAMA_3_2_NV_EMBEDQA_1B_V2 = "nvidia/llama-3.2-nv-embedqa-1b-v2"
261
+ NVIDIA_LLAMA_3_3_NEMOTRON_SUPER_49B_V1 = "nvidia/llama-3.3-nemotron-super-49b-v1"
262
+ NVIDIA_LLAMA_3_3_NEMOTRON_SUPER_49B_V1_5 = "nvidia/llama-3.3-nemotron-super-49b-v1.5"
263
+ NVIDIA_LLAMA_NEMOTRON_EMBED_VL_1B_V2 = "nvidia/llama-nemotron-embed-vl-1b-v2"
264
+ NVIDIA_MISTRAL_NEMO_MINITRON_8B_8K_INSTRUCT = "nvidia/mistral-nemo-minitron-8b-8k-instruct"
265
+ NVIDIA_MISTRAL_NEMO_MINITRON_8B_BASE = "nvidia/mistral-nemo-minitron-8b-base"
266
+ NVIDIA_NEMORETRIEVER_PARSE = "nvidia/nemoretriever-parse"
267
+ NVIDIA_NEMOTRON_3_NANO_30B_A3B = "nvidia/nemotron-3-nano-30b-a3b"
268
+ NVIDIA_NEMOTRON_4_340B_INSTRUCT = "nvidia/nemotron-4-340b-instruct"
269
+ NVIDIA_NEMOTRON_4_340B_REWARD = "nvidia/nemotron-4-340b-reward"
270
+ NVIDIA_NEMOTRON_4_MINI_HINDI_4B_INSTRUCT = "nvidia/nemotron-4-mini-hindi-4b-instruct"
271
+ NVIDIA_NEMOTRON_CONTENT_SAFETY_REASONING_4B = "nvidia/nemotron-content-safety-reasoning-4b"
272
+ NVIDIA_NEMOTRON_MINI_4B_INSTRUCT = "nvidia/nemotron-mini-4b-instruct"
273
+ NVIDIA_NEMOTRON_NANO_12B_V2_VL = "nvidia/nemotron-nano-12b-v2-vl"
274
+ NVIDIA_NEMOTRON_NANO_3_30B_A3B = "nvidia/nemotron-nano-3-30b-a3b"
275
+ NVIDIA_NEMOTRON_PARSE = "nvidia/nemotron-parse"
276
+ NVIDIA_NEVA_22B = "nvidia/neva-22b"
277
+ NVIDIA_NVCLIP = "nvidia/nvclip"
278
+ NVIDIA_NVIDIA_NEMOTRON_NANO_9B_V2 = "nvidia/nvidia-nemotron-nano-9b-v2"
279
+ NVIDIA_NV_EMBEDCODE_7B_V1 = "nvidia/nv-embedcode-7b-v1"
280
+ NVIDIA_NV_EMBEDQA_E5_V5 = "nvidia/nv-embedqa-e5-v5"
281
+ NVIDIA_NV_EMBEDQA_MISTRAL_7B_V2 = "nvidia/nv-embedqa-mistral-7b-v2"
282
+ NVIDIA_NV_EMBED_V1 = "nvidia/nv-embed-v1"
283
+ NVIDIA_RIVA_TRANSLATE_4B_INSTRUCT = "nvidia/riva-translate-4b-instruct"
284
+ NVIDIA_RIVA_TRANSLATE_4B_INSTRUCT_V1_1 = "nvidia/riva-translate-4b-instruct-v1.1"
285
+ NVIDIA_STREAMPETR = "nvidia/streampetr"
286
+ NVIDIA_USDCODE_LLAMA_3_1_70B_INSTRUCT = "nvidia/usdcode-llama-3.1-70b-instruct"
287
+ NVIDIA_VILA = "nvidia/vila"
288
+ NV_MISTRALAI_MISTRAL_NEMO_12B_INSTRUCT = "nv-mistralai/mistral-nemo-12b-instruct"
289
+ OPENAI_GPT_OSS_120B = "openai/gpt-oss-120b"
290
+ OPENAI_GPT_OSS_20B = "openai/gpt-oss-20b"
291
+ OPENGPT_X_TEUKEN_7B_INSTRUCT_COMMERCIAL_V0_4 = "opengpt-x/teuken-7b-instruct-commercial-v0.4"
292
+ QWEN_QWEN2_5_7B_INSTRUCT = "qwen/qwen2.5-7b-instruct"
293
+ QWEN_QWEN2_5_CODER_32B_INSTRUCT = "qwen/qwen2.5-coder-32b-instruct"
294
+ QWEN_QWEN2_5_CODER_7B_INSTRUCT = "qwen/qwen2.5-coder-7b-instruct"
295
+ QWEN_QWEN2_7B_INSTRUCT = "qwen/qwen2-7b-instruct"
296
+ QWEN_QWEN3_235B_A22B = "qwen/qwen3-235b-a22b"
297
+ QWEN_QWEN3_CODER_480B_A35B_INSTRUCT = "qwen/qwen3-coder-480b-a35b-instruct"
298
+ QWEN_QWEN3_NEXT_80B_A3B_INSTRUCT = "qwen/qwen3-next-80b-a3b-instruct"
299
+ QWEN_QWEN3_NEXT_80B_A3B_THINKING = "qwen/qwen3-next-80b-a3b-thinking"
300
+ QWEN_QWQ_32B = "qwen/qwq-32b"
301
+ RAKUTEN_RAKUTENAI_7B_CHAT = "rakuten/rakutenai-7b-chat"
302
+ RAKUTEN_RAKUTENAI_7B_INSTRUCT = "rakuten/rakutenai-7b-instruct"
303
+ SARVAMAI_SARVAM_M = "sarvamai/sarvam-m"
304
+ SNOWFLAKE_ARCTIC_EMBED_L = "snowflake/arctic-embed-l"
305
+ SPEAKLEASH_BIELIK_11B_V2_3_INSTRUCT = "speakleash/bielik-11b-v2.3-instruct"
306
+ SPEAKLEASH_BIELIK_11B_V2_6_INSTRUCT = "speakleash/bielik-11b-v2.6-instruct"
307
+ STEPFUN_AI_STEP_3_5_FLASH = "stepfun-ai/step-3.5-flash"
308
+ STOCKMARK_STOCKMARK_2_100B_INSTRUCT = "stockmark/stockmark-2-100b-instruct"
309
+ THUDM_CHATGLM3_6B = "thudm/chatglm3-6b"
310
+ TIIUAE_FALCON3_7B_INSTRUCT = "tiiuae/falcon3-7b-instruct"
311
+ TOKYOTECH_LLM_LLAMA_3_SWALLOW_70B_INSTRUCT_V0_1 = "tokyotech-llm/llama-3-swallow-70b-instruct-v0.1"
312
+ UPSTAGE_SOLAR_10_7B_INSTRUCT = "upstage/solar-10.7b-instruct"
313
+ UTTER_PROJECT_EUROLLM_9B_INSTRUCT = "utter-project/eurollm-9b-instruct"
314
+ WRITER_PALMYRA_CREATIVE_122B = "writer/palmyra-creative-122b"
315
+ WRITER_PALMYRA_FIN_70B_32K = "writer/palmyra-fin-70b-32k"
316
+ WRITER_PALMYRA_MED_70B = "writer/palmyra-med-70b"
317
+ WRITER_PALMYRA_MED_70B_32K = "writer/palmyra-med-70b-32k"
318
+ YENTINGLIN_LLAMA_3_TAIWAN_70B_INSTRUCT = "yentinglin/llama-3-taiwan-70b-instruct"
319
+ ZYPHRA_ZAMBA2_7B_INSTRUCT = "zyphra/zamba2-7b-instruct"
320
+ Z_AI_GLM4_7 = "z-ai/glm4.7"
321
+
322
  class OpenAICompatibleName(StrEnum):
323
  """https://platform.openai.com/docs/guides/text-generation"""
324
 
 
343
  | AWSModelName
344
  | OllamaModelName
345
  | OpenRouterModelName
346
+ | NvidiaModelName
347
  | FakeModelName
348
  )
349
 
350
+ class NvidiaEmbeddingModelName(StrEnum):
351
+ """https://build.nvidia.com/explore/discover"""
352
+
353
+ NV_EMBEDQA_MISTRAL_7B_V2 = "nvidia/nv-embedqa-mistral-7b-v2"
354
+ NV_EMBEDQA_E5_V5 = "nvidia/nv-embedqa-e5-v5"
355
+
356
+
357
  AllEmbeddingModelEnum: TypeAlias = (
358
  OpenAIEmbeddingModelName
359
  | GoogleEmbeddingModelName
360
  | OllamaEmbeddingModelName
361
+ | NvidiaEmbeddingModelName
362
  )
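
The enum values above are the raw model ids expected by the NVIDIA API, while the member names are what the rest of the toolkit passes around. A small sketch, assuming the repo's `src/` directory is importable:

```python
# Hypothetical sketch: member names are used internally,
# values are the NVIDIA API model ids.
from schema.models import NvidiaEmbeddingModelName, NvidiaModelName

print(NvidiaModelName.META_LLAMA_3_3_70B_INSTRUCT.value)  # "meta/llama-3.3-70b-instruct"
print(NvidiaEmbeddingModelName.NV_EMBEDQA_E5_V5.value)    # "nvidia/nv-embedqa-e5-v5"
assert NvidiaModelName.META_LLAMA_3_3_70B_INSTRUCT in NvidiaModelName
```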
src/schema/schema.py CHANGED
@@ -112,6 +112,10 @@ class ChatMessage(BaseModel):
 default=None,
 examples=["call_Jja7J89XsjrOLA5r!MEOW!SL"],
 )
+ name: str | None = Field(
+ description="Tool name for tool messages (type='tool'). Enables UI to show which tool produced the result.",
+ default=None,
+ )
 run_id: str | None = Field(
 description="Run ID of the message.",
 default=None,
@@ -166,6 +170,21 @@ class FeedbackResponse(BaseModel):
 status: Literal["success"] = "success"


+ class ChatMessagePreview(BaseModel):
+ """Minimal message for preview/list views (type, content snippet, id)."""
+
+ type: Literal["human", "ai", "tool", "custom"] = Field(
+ description="Role of the message.",
+ )
+ content: str = Field(
+ description="Content of the message (may be truncated for preview).",
+ )
+ id: str | None = Field(
+ default=None,
+ description="Stable id for cursor/linking (e.g. index).",
+ )
+
+
 class ChatHistoryInput(BaseModel):
 """Input for retrieving chat history."""

@@ -177,12 +196,45 @@ class ChatHistoryInput(BaseModel):
 description="Thread ID to persist and continue a multi-turn conversation.",
 examples=["847c6285-8fc9-4560-a83f-4e6285809254"],
 )
+ limit: int = Field(
+ default=50,
+ ge=1,
+ le=200,
+ description="Max number of messages to return per page.",
+ )
+ cursor: str | None = Field(
+ default=None,
+ description="Opaque cursor for pagination (older messages).",
+ )
+ view: Literal["full", "preview"] = Field(
+ default="full",
+ description="full = all fields; preview = type, content (truncated), id only.",
+ )


 class ChatHistory(BaseModel):
+ """Legacy response: messages only (no cursors)."""
+
 messages: list[ChatMessage]


+ class ChatHistoryResponse(BaseModel):
+ """Paginated chat history with cursors."""
+
+ messages: list[ChatMessage] | list[ChatMessagePreview] = Field(
+ default_factory=list,
+ description="Messages in this page (full or preview by view).",
+ )
+ next_cursor: str | None = Field(
+ default=None,
+ description="Cursor for next page (older messages).",
+ )
+ prev_cursor: str | None = Field(
+ default=None,
+ description="Cursor for previous page (newer messages).",
+ )
+
+
 class ThreadSummary(BaseModel):
 """Summary of a conversation thread for listing."""

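
The new input fields and response models above drive the paginated history flow. A minimal sketch of building a request payload, assuming `user_id` and `thread_id` are the only required fields:

```python
# Hypothetical sketch: a preview page of at most 20 messages, newest window first.
from schema import ChatHistoryInput

payload = ChatHistoryInput(
    user_id="demo-user",
    thread_id="847c6285-8fc9-4560-a83f-4e6285809254",
    limit=20,
    cursor=None,      # first page: the latest messages
    view="preview",   # type, truncated content, and id only
)
print(payload.model_dump())
```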
src/service/agent_service.py CHANGED
@@ -4,12 +4,17 @@ import json
4
  import logging
5
  import re
6
  from collections.abc import AsyncGenerator
7
- from datetime import datetime, timezone
8
  from typing import Any
9
  from uuid import UUID
10
 
11
  from fastapi import HTTPException
12
- from langchain_core.messages import AIMessage, AIMessageChunk, AnyMessage, HumanMessage, ToolMessage
 
 
 
 
 
 
13
  from langchain_core.runnables import RunnableConfig
14
  from langfuse.langchain import CallbackHandler
15
  from langgraph.types import Command, Interrupt
@@ -21,8 +26,6 @@ from agents.middlewares import FollowUpMiddleware, SafetyMiddleware, UNSAFE_RESP
21
  from agents.llama_guard import SafetyAssessment
22
  from core import settings
23
  from schema import (
24
- ChatHistory,
25
- ChatHistoryInput,
26
  ChatMessage,
27
  Feedback,
28
  FeedbackResponse,
@@ -31,6 +34,7 @@ from schema import (
31
  ThreadSummary,
32
  UserInput,
33
  )
 
34
  from service.utils import (
35
  convert_message_content_to_string,
36
  langchain_to_chat_message,
@@ -121,6 +125,9 @@ async def invoke_agent(user_input: UserInput, agent_id: str = DEFAULT_AGENT) ->
121
  agent: AgentGraph = get_agent(agent_id)
122
  kwargs, run_id = await _handle_input(user_input, agent, agent_id)
123
 
 
 
 
124
  # ── Input safety guard ──────────────────────────────────────────
125
  input_check = await safety.check_input([HumanMessage(content=user_input.message)])
126
  if input_check.safety_assessment == SafetyAssessment.UNSAFE:
@@ -173,6 +180,9 @@ async def message_generator(
173
  agent: AgentGraph = get_agent(agent_id)
174
  kwargs, run_id = await _handle_input(user_input, agent, agent_id)
175
 
 
 
 
176
  # ── Input safety guard ──────────────────────────────────────────────
177
  input_check = await safety.check_input([HumanMessage(content=user_input.message)])
178
  if input_check.safety_assessment == SafetyAssessment.UNSAFE:
@@ -304,48 +314,6 @@ def _checkpoint_thread_id(user_id: str, thread_id: str) -> str:
304
  return f"{user_id}:{thread_id}"
305
 
306
 
307
- async def get_history(input: ChatHistoryInput) -> ChatHistory:
308
- if not (input.user_id or "").strip():
309
- raise HTTPException(
310
- status_code=422,
311
- detail="user_id is required and must be non-empty",
312
- )
313
- # TODO: Hard-coding DEFAULT_AGENT here is wonky
314
- agent: AgentGraph = get_agent(DEFAULT_AGENT)
315
- checkpoint_thread_id = _checkpoint_thread_id(input.user_id, input.thread_id)
316
- try:
317
- state_snapshot = await agent.aget_state(
318
- config=RunnableConfig(
319
- configurable={
320
- "thread_id": checkpoint_thread_id,
321
- "user_id": input.user_id,
322
- },
323
- metadata={
324
- "thread_id": input.thread_id,
325
- "user_id": input.user_id,
326
- "agent_id": DEFAULT_AGENT,
327
- },
328
- )
329
- )
330
- messages: list[AnyMessage] = state_snapshot.values["messages"]
331
- chat_messages: list[ChatMessage] = [langchain_to_chat_message(m) for m in messages]
332
- return ChatHistory(messages=chat_messages)
333
- except Exception as e:
334
- logger.error(f"An exception occurred: {e}")
335
- raise HTTPException(status_code=500, detail="Unexpected error")
336
-
337
-
338
- def _iso_ts(dt: datetime | str | None) -> str | None:
339
- """Format datetime as ISO 8601 or return None."""
340
- if dt is None:
341
- return None
342
- if isinstance(dt, str):
343
- return dt
344
- if dt.tzinfo is None:
345
- dt = dt.replace(tzinfo=timezone.utc)
346
- return dt.isoformat()
347
-
348
-
349
  async def list_threads(
350
  user_id: str,
351
  *,
@@ -354,9 +322,8 @@ async def list_threads(
354
  search: str | None = None,
355
  ) -> ThreadList:
356
  """
357
- List thread IDs for a user by querying the checkpointer storage with prefix user_id:.
358
- Returns logical thread_ids (without the user prefix), with updated_at when available.
359
- Supports pagination (offset/limit) and optional search filter.
360
  """
361
  if not (user_id or "").strip():
362
  return ThreadList(threads=[], total=0)
@@ -365,198 +332,68 @@ async def list_threads(
365
  if checkpointer is None:
366
  return ThreadList(threads=[], total=0)
367
  prefix = f"{user_id}:"
368
- # List of (logical_id, updated_at_iso | None)
369
- rows: list[tuple[str, str | None]] = []
370
  try:
371
- # MongoDB: aggregation for thread_id + max timestamp
372
  if hasattr(checkpointer, "checkpoint_collection"):
373
- coll = checkpointer.checkpoint_collection
374
- try:
375
- pipeline = [
376
- {"$match": {"thread_id": {"$regex": f"^{re.escape(user_id)}:"}}},
377
- {
378
- "$group": {
379
- "_id": "$thread_id",
380
- "ts": {"$max": {"$ifNull": ["$ts", "$updated_at", "$created_at"]}},
381
- }
382
- },
383
- ]
384
- async for doc in coll.aggregate(pipeline):
385
- tid = doc.get("_id")
386
- if not isinstance(tid, str) or not tid.startswith(prefix):
387
- continue
388
- logical = tid[len(prefix) :]
389
- ts = doc.get("ts")
390
- rows.append((logical, _iso_ts(ts) if ts else None))
391
- except Exception as mongo_err:
392
- logger.debug(
393
- "MongoDB aggregation failed, listing by thread_id only: %s",
394
- mongo_err,
395
- )
396
- raw_ids = await coll.distinct(
397
- "thread_id",
398
- {"thread_id": {"$regex": f"^{re.escape(user_id)}:"}},
399
- )
400
- for tid in raw_ids:
401
- if isinstance(tid, str) and tid.startswith(prefix):
402
- rows.append((tid[len(prefix) :], None))
403
- # Postgres: GROUP BY thread_id, MAX(ts)
404
  elif hasattr(checkpointer, "pool") or hasattr(checkpointer, "conn"):
405
- # AsyncPostgresSaver often uses .pool, but can have .conn
406
  pool = getattr(checkpointer, "pool", getattr(checkpointer, "conn", None))
407
- # If it's a pool, we need an async connection
408
- if hasattr(pool, "connection"):
409
- conn_ctx = pool.connection()
410
- else:
411
- conn_ctx = pool # Assume it's already a connection or manages its own
412
-
413
  async with conn_ctx as conn:
414
  async with conn.cursor() as cur:
415
- try:
416
- # Try various common timestamp column names
417
- await cur.execute(
418
- """
419
- SELECT thread_id, MAX(COALESCE(ts, updated_at, created_at)) AS ts
420
- FROM checkpoints
421
- WHERE thread_id LIKE %s
422
- GROUP BY thread_id
423
- """,
424
- (prefix + "%",),
425
- )
426
- except Exception as pg_err:
427
- logger.debug(
428
- "Postgres MAX(ts) failed, trying metadata: %s",
429
- pg_err,
430
- )
431
- try:
432
- await cur.execute(
433
- """
434
- SELECT thread_id, MAX(created_at) AS ts
435
- FROM checkpoint_metadata
436
- WHERE thread_id LIKE %s
437
- GROUP BY thread_id
438
- """,
439
- (prefix + "%",),
440
- )
441
- except Exception:
442
- await cur.execute(
443
- "SELECT DISTINCT thread_id FROM checkpoints WHERE thread_id LIKE %s",
444
- (prefix + "%",),
445
- )
446
- for row in await cur.fetchall():
447
- raw = (
448
- row.get("thread_id")
449
- if isinstance(row, dict)
450
- else (row[0] if row else None)
451
- )
452
- if isinstance(raw, str) and raw.startswith(prefix):
453
- rows.append((raw[len(prefix) :], None))
454
- else:
455
- for row in await cur.fetchall():
456
- raw = row.get("thread_id") if isinstance(row, dict) else row[0]
457
- ts_val = row.get("ts") if isinstance(row, dict) else row[1]
458
- if isinstance(raw, str) and raw.startswith(prefix):
459
- rows.append((raw[len(prefix) :], _iso_ts(ts_val)))
460
- else:
461
- for row in await cur.fetchall():
462
- raw = (
463
- row.get("thread_id")
464
- if isinstance(row, dict)
465
- else (row[0] if row else None)
466
- )
467
- ts_val = (
468
- row.get("ts")
469
- if isinstance(row, dict)
470
- else (
471
- row[1]
472
- if isinstance(row, (list, tuple)) and len(row) > 1
473
- else None
474
- )
475
- )
476
- if isinstance(raw, str) and raw.startswith(prefix):
477
- rows.append((raw[len(prefix) :], _iso_ts(ts_val)))
478
- # SQLite: GROUP BY thread_id, MAX(ts)
479
  elif hasattr(checkpointer, "conn"):
480
- conn = checkpointer.conn
481
- try:
482
- cursor = await conn.execute(
483
- """
484
- SELECT thread_id, MAX(COALESCE(ts, updated_at, created_at)) AS ts
485
- FROM checkpoints
486
- WHERE thread_id LIKE ?
487
- GROUP BY thread_id
488
- """,
489
- (prefix + "%",),
490
- )
491
- for row in await cursor.fetchall():
492
- raw = row[0] if isinstance(row, (list, tuple)) else row
493
- ts_val = row[1] if isinstance(row, (list, tuple)) and len(row) > 1 else None
494
- if isinstance(raw, str) and raw.startswith(prefix):
495
- rows.append((raw[len(prefix) :], _iso_ts(ts_val)))
496
- except Exception as sqlite_err:
497
- logger.debug("SQLite MAX(ts) failed, listing by thread_id only: %s", sqlite_err)
498
- cursor = await conn.execute(
499
- "SELECT DISTINCT thread_id FROM checkpoints WHERE thread_id LIKE ?",
500
- (prefix + "%",),
501
- )
502
- for row in await cursor.fetchall():
503
- raw = row[0] if isinstance(row, (list, tuple)) else row
504
- if isinstance(raw, str) and raw.startswith(prefix):
505
- rows.append((raw[len(prefix) :], None))
506
  else:
507
- logger.warning("Unknown checkpointer type; cannot list threads by prefix")
508
  except Exception as e:
509
- logger.error(f"Error listing threads for user: {e}")
510
  raise HTTPException(status_code=500, detail="Failed to list threads") from e
511
 
512
- # Sort by updated_at desc (None last), then by thread_id
513
- def _sort_key(item: tuple[str, str | None]) -> tuple[bool, str, str]:
514
- logical, ts = item
515
- # desc sort: flip characters in timestamp if it exists
516
- return (ts is None, (ts or "")[::-1], logical)
517
-
518
- rows.sort(key=_sort_key)
519
 
520
- # Filter by search
521
  search_clean = (search or "").strip().lower()
522
  if search_clean:
523
- rows = [(logical, ts) for logical, ts in rows if search_clean in logical.lower()]
524
 
525
- total = len(rows)
526
- rows = rows[offset : offset + limit]
527
 
528
- # Fetch additional details (preview, precise timestamp) for the requested page in parallel
529
- async def get_thread_summary(logical_id: str, existing_ts: str | None) -> ThreadSummary:
530
  try:
531
  config = RunnableConfig(
532
- configurable={
533
- "thread_id": f"{user_id}:{logical_id}",
534
- "user_id": user_id,
535
- }
536
  )
537
  state = await agent.aget_state(config)
538
- preview = None
539
- ts = existing_ts
540
-
541
- if state.values and "messages" in state.values:
542
- msgs = state.values["messages"]
543
- if msgs:
544
- # Get the last non-custom message for better preview
545
- last_msg = msgs[-1]
546
- preview = convert_message_content_to_string(last_msg.content)
547
- if preview and len(preview) > 120:
548
- preview = preview[:117] + "..."
549
-
550
- # If we don't have a timestamp, try to get it from state metadata
551
- if not ts and state.metadata:
552
- m_ts = state.metadata.get("ts") or state.metadata.get("created_at")
553
- if m_ts:
554
- ts = _iso_ts(m_ts)
555
-
556
- return ThreadSummary(thread_id=logical_id, updated_at=ts, preview=preview)
557
  except Exception as e:
558
- logger.warning(f"Failed to fetch state for thread {logical_id}: {e}")
559
- return ThreadSummary(thread_id=logical_id, updated_at=existing_ts, preview=None)
560
 
561
- summaries = await asyncio.gather(*[get_thread_summary(logical, ts) for logical, ts in rows])
562
- return ThreadList(threads=list(summaries), total=total)
 
4
  import logging
5
  import re
6
  from collections.abc import AsyncGenerator
 
7
  from typing import Any
8
  from uuid import UUID
9
 
10
  from fastapi import HTTPException
11
+ from langchain_core.messages import (
12
+ AIMessage,
13
+ AIMessageChunk,
14
+ AnyMessage,
15
+ HumanMessage,
16
+ ToolMessage,
17
+ )
18
  from langchain_core.runnables import RunnableConfig
19
  from langfuse.langchain import CallbackHandler
20
  from langgraph.types import Command, Interrupt
 
26
  from agents.llama_guard import SafetyAssessment
27
  from core import settings
28
  from schema import (
 
 
29
  ChatMessage,
30
  Feedback,
31
  FeedbackResponse,
 
34
  ThreadSummary,
35
  UserInput,
36
  )
37
+ from service import history_service
38
  from service.utils import (
39
  convert_message_content_to_string,
40
  langchain_to_chat_message,
 
125
  agent: AgentGraph = get_agent(agent_id)
126
  kwargs, run_id = await _handle_input(user_input, agent, agent_id)
127
 
128
+ if user_input.user_id and user_input.thread_id:
129
+ history_service.invalidate_history(user_input.user_id, user_input.thread_id)
130
+
131
  # ── Input safety guard ──────────────────────────────────────────
132
  input_check = await safety.check_input([HumanMessage(content=user_input.message)])
133
  if input_check.safety_assessment == SafetyAssessment.UNSAFE:
 
180
  agent: AgentGraph = get_agent(agent_id)
181
  kwargs, run_id = await _handle_input(user_input, agent, agent_id)
182
 
183
+ if user_input.user_id and user_input.thread_id:
184
+ history_service.invalidate_history(user_input.user_id, user_input.thread_id)
185
+
186
  # ── Input safety guard ──────────────────────────────────────────────
187
  input_check = await safety.check_input([HumanMessage(content=user_input.message)])
188
  if input_check.safety_assessment == SafetyAssessment.UNSAFE:
 
314
  return f"{user_id}:{thread_id}"
315
 
316
 
 
 
 
317
  async def list_threads(
318
  user_id: str,
319
  *,
 
322
  search: str | None = None,
323
  ) -> ThreadList:
324
  """
325
+ List thread IDs for a user. Returns logical thread_ids (without user prefix) with preview.
326
+ Supports pagination and optional search filter.
 
327
  """
328
  if not (user_id or "").strip():
329
  return ThreadList(threads=[], total=0)
 
332
  if checkpointer is None:
333
  return ThreadList(threads=[], total=0)
334
  prefix = f"{user_id}:"
335
+ logical_ids: list[str] = []
 
336
  try:
 
337
  if hasattr(checkpointer, "checkpoint_collection"):
338
+ raw_ids = await checkpointer.checkpoint_collection.distinct(
339
+ "thread_id",
340
+ {"thread_id": {"$regex": f"^{re.escape(user_id)}:"}},
341
+ )
342
+ for tid in raw_ids:
343
+ if isinstance(tid, str) and tid.startswith(prefix):
344
+ logical_ids.append(tid[len(prefix) :])
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
345
  elif hasattr(checkpointer, "pool") or hasattr(checkpointer, "conn"):
 
346
  pool = getattr(checkpointer, "pool", getattr(checkpointer, "conn", None))
347
+ conn_ctx = pool.connection() if hasattr(pool, "connection") else pool
 
 
 
 
 
348
  async with conn_ctx as conn:
349
  async with conn.cursor() as cur:
350
+ await cur.execute(
351
+ "SELECT DISTINCT thread_id FROM checkpoints WHERE thread_id LIKE %s",
352
+ (prefix + "%",),
353
+ )
354
+ for row in await cur.fetchall():
355
+ raw = row.get("thread_id") if isinstance(row, dict) else (row[0] if row else None)
356
+ if isinstance(raw, str) and raw.startswith(prefix):
357
+ logical_ids.append(raw[len(prefix) :])
 
 
 
 
 
358
  elif hasattr(checkpointer, "conn"):
359
+ cursor = await checkpointer.conn.execute(
360
+ "SELECT DISTINCT thread_id FROM checkpoints WHERE thread_id LIKE ?",
361
+ (prefix + "%",),
362
+ )
363
+ for row in await cursor.fetchall():
364
+ raw = row[0] if isinstance(row, (list, tuple)) else row
365
+ if isinstance(raw, str) and raw.startswith(prefix):
366
+ logical_ids.append(raw[len(prefix) :])
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
367
  else:
368
+ logger.warning("Unknown checkpointer type; cannot list threads")
369
  except Exception as e:
370
+ logger.error("Error listing threads for user: %s", e)
371
  raise HTTPException(status_code=500, detail="Failed to list threads") from e
372
 
373
+ logical_ids.sort(reverse=True)
 
 
 
 
 
 
374
 
 
375
  search_clean = (search or "").strip().lower()
376
  if search_clean:
377
+ logical_ids = [tid for tid in logical_ids if search_clean in tid.lower()]
378
 
379
+ total = len(logical_ids)
380
+ page = logical_ids[offset : offset + limit]
381
 
382
+ async def get_preview(logical_id: str) -> ThreadSummary:
383
+ preview = None
384
  try:
385
  config = RunnableConfig(
386
+ configurable={"thread_id": f"{user_id}:{logical_id}", "user_id": user_id},
 
 
 
387
  )
388
  state = await agent.aget_state(config)
389
+ if state.values and "messages" in state.values and state.values["messages"]:
390
+ last_msg = state.values["messages"][-1]
391
+ preview = convert_message_content_to_string(last_msg.content)
392
+ if preview and len(preview) > 120:
393
+ preview = preview[:117] + "..."
 
 
 
 
 
 
 
 
 
 
 
 
 
 
394
  except Exception as e:
395
+ logger.debug("Preview for thread %s: %s", logical_id, e)
396
+ return ThreadSummary(thread_id=logical_id, updated_at=None, preview=preview)
397
 
398
+ threads = await asyncio.gather(*[get_preview(tid) for tid in page])
399
+ return ThreadList(threads=list(threads), total=total)
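
The `invalidate_history` calls added above ensure a thread's cached history is dropped whenever a new turn is written, so the next read goes back to the checkpointer. A minimal sketch of the call, assuming the repo's `src/` directory is importable:

```python
# Hypothetical sketch: drop the cached history for a thread after writing a new turn.
from service import history_service

history_service.invalidate_history("demo-user", "thread-1")  # no-op if nothing is cached
```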
src/service/dependencies.py CHANGED
@@ -1,10 +1,29 @@
- from typing import Annotated
+ import logging
+ from typing import Annotated, Any

- from fastapi import Depends, HTTPException, status
+ from fastapi import Depends, HTTPException, Request, status
 from fastapi.security import HTTPAuthorizationCredentials, HTTPBearer

 from core import settings

+ logger = logging.getLogger(__name__)
+
+
+ def get_checkpointer(request: Request) -> Any:
+ """Provide checkpointer from app state, or fall back to default agent's checkpointer."""
+ checkpointer = getattr(request.app.state, "checkpointer", None)
+ if checkpointer is not None:
+ return checkpointer
+ try:
+ from agents import DEFAULT_AGENT, get_agent
+ agent = get_agent(DEFAULT_AGENT)
+ checkpointer = getattr(agent, "checkpointer", None)
+ if checkpointer is not None:
+ return checkpointer
+ except Exception as e:
+ logger.debug("Fallback checkpointer from agent failed: %s", e)
+ return None
+

 def verify_bearer(
 http_auth: Annotated[
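
`get_checkpointer` first looks for `request.app.state.checkpointer`, so the service (or a test) can expose one during startup. A hypothetical wiring sketch; `MemorySaver` is only a stand-in for whatever saver the service actually uses:

```python
# Hypothetical wiring sketch (not the repo's actual startup code):
# expose a checkpointer on app.state so get_checkpointer() finds it
# without having to reach into the default agent.
from contextlib import asynccontextmanager

from fastapi import FastAPI
from langgraph.checkpoint.memory import MemorySaver  # stand-in checkpointer


@asynccontextmanager
async def lifespan(app: FastAPI):
    # In the real service this would be the Postgres/Mongo/SQLite saver.
    app.state.checkpointer = MemorySaver()
    yield


app = FastAPI(lifespan=lifespan)
```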
src/service/history_service.py ADDED
@@ -0,0 +1,164 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """
2
+ Chat history service: load messages for a thread from the checkpointer only.
3
+ Single responsibility; no agent dependency. Supports cursor pagination,
4
+ selective view (preview/full), and TTL cache.
5
+ """
6
+
7
+ import logging
8
+ import threading
9
+ import time
10
+ from typing import Any
11
+
12
+ from fastapi import HTTPException
13
+ from langchain_core.messages import AnyMessage, messages_from_dict
14
+ from langchain_core.runnables import RunnableConfig
15
+
16
+ from schema import (
17
+ ChatHistoryInput,
18
+ ChatHistoryResponse,
19
+ ChatMessage,
20
+ ChatMessagePreview,
21
+ )
22
+ from service.utils import langchain_to_chat_message, message_to_preview
23
+
24
+ logger = logging.getLogger(__name__)
25
+
26
+ # TTL in seconds for cached message lists
27
+ HISTORY_CACHE_TTL = 60
28
+
29
+ # In-memory cache: (user_id, thread_id) -> (list[AnyMessage], expiry_ts)
30
+ _history_cache: dict[tuple[str, str], tuple[list[AnyMessage], float]] = {}
31
+ _cache_lock = threading.Lock()
32
+
33
+
34
+ def _checkpoint_thread_id(user_id: str, thread_id: str) -> str:
35
+ """Composite key so checkpoints are user-scoped."""
36
+ return f"{user_id}:{thread_id}"
37
+
38
+
39
+ def _raw_messages_from_checkpoint(tuple_result: Any) -> list[AnyMessage]:
40
+ """
41
+ Extract and deserialize messages from a checkpoint tuple.
42
+ Single path: handles both message objects and serialized dicts.
43
+ Checkpoint is dict-like; use .get() for channel_values.
44
+ """
45
+ if not tuple_result or not tuple_result.checkpoint:
46
+ return []
47
+ checkpoint = tuple_result.checkpoint
48
+ # Checkpoint is a dict subclass in LangGraph
49
+ channel_values = checkpoint.get("channel_values", {}) if isinstance(checkpoint, dict) else getattr(checkpoint, "channel_values", {})
50
+ if not channel_values:
51
+ return []
52
+ raw = channel_values.get("messages", []) if isinstance(channel_values, dict) else getattr(channel_values, "get", lambda k, default=None: default)("messages") or []
53
+ if not raw:
54
+ return []
55
+ raw = list(raw)
56
+ if not raw:
57
+ return []
58
+ if isinstance(raw[0], dict):
59
+ return list(messages_from_dict(raw))
60
+ return list(raw)
61
+
62
+
63
+ def _get_cached_messages(user_id: str, thread_id: str) -> list[AnyMessage] | None:
64
+ """Return cached message list if present and not expired."""
65
+ key = (user_id.strip(), thread_id.strip())
66
+ with _cache_lock:
67
+ entry = _history_cache.get(key)
68
+ if not entry:
69
+ return None
70
+ messages, expiry = entry
71
+ if time.monotonic() > expiry:
72
+ del _history_cache[key]
73
+ return None
74
+ return messages
75
+
76
+
77
+ def _set_cached_messages(user_id: str, thread_id: str, messages: list[AnyMessage]) -> None:
78
+ """Store message list in cache with TTL."""
79
+ key = (user_id.strip(), thread_id.strip())
80
+ expiry = time.monotonic() + HISTORY_CACHE_TTL
81
+ with _cache_lock:
82
+ _history_cache[key] = (messages, expiry)
83
+
84
+
85
+ def invalidate_history(user_id: str, thread_id: str) -> None:
86
+ """Invalidate cache for this thread (call after writing to the thread)."""
87
+ key = (user_id or "").strip(), (thread_id or "").strip()
88
+ with _cache_lock:
89
+ _history_cache.pop(key, None)
90
+
91
+
92
+ async def get_history(checkpointer: Any, input: ChatHistoryInput) -> ChatHistoryResponse:
93
+ """
94
+ Load chat history for (user_id, thread_id) with optional pagination and view.
95
+ Depends only on checkpointer; no agent.
96
+ """
97
+ user_id = (input.user_id or "").strip()
98
+ thread_id = (input.thread_id or "").strip()
99
+ if not user_id:
100
+ raise HTTPException(
101
+ status_code=422,
102
+ detail="user_id is required and must be non-empty",
103
+ )
104
+
105
+ if checkpointer is None:
106
+ logger.warning("History: no checkpointer available (app.state or agent)")
107
+ return ChatHistoryResponse(messages=[], next_cursor=None, prev_cursor=None)
108
+
109
+ # Try cache first
110
+ messages = _get_cached_messages(user_id, thread_id)
111
+ if messages is None:
112
+ checkpoint_thread_id = _checkpoint_thread_id(user_id, thread_id)
113
+ config = RunnableConfig(
114
+ configurable={"thread_id": checkpoint_thread_id, "user_id": user_id},
115
+ )
116
+ try:
117
+ tuple_result = await checkpointer.aget_tuple(config)
118
+ messages = _raw_messages_from_checkpoint(tuple_result)
119
+ if not messages:
120
+ logger.debug(
121
+ "History: no messages for thread_id=%s (checkpoint missing or empty)",
122
+ checkpoint_thread_id,
123
+ )
124
+ _set_cached_messages(user_id, thread_id, messages)
125
+ except Exception as e:
126
+ logger.error("Chat history error: %s", e)
127
+ raise HTTPException(
128
+ status_code=500,
129
+ detail="Failed to load chat history",
130
+ ) from e
131
+
132
+ total = len(messages)
133
+ if total == 0:
134
+ return ChatHistoryResponse(messages=[], next_cursor=None, prev_cursor=None)
135
+
136
+ # Cursor pagination: cursor = exclusive end index of the window (older messages)
137
+ # First request: no cursor -> return latest [total-limit : total], next_cursor = total - limit
138
+ if input.cursor is None or input.cursor == "":
139
+ end_index = total
140
+ else:
141
+ try:
142
+ end_index = int(input.cursor)
143
+ except ValueError:
144
+ end_index = total
145
+ end_index = min(end_index, total)
146
+ start_index = max(0, end_index - input.limit)
147
+ window = messages[start_index:end_index]
148
+
149
+ next_cursor = str(start_index) if start_index > 0 else None
150
+ prev_cursor = str(end_index) if end_index < total else None
151
+
152
+ if input.view == "preview":
153
+ out_messages: list[ChatMessage] | list[ChatMessagePreview] = [
154
+ message_to_preview(m, start_index + i) for i, m in enumerate(window)
155
+ ]
156
+ else:
157
+ out_messages = [langchain_to_chat_message(m) for m in window]
158
+
159
+ # Return chronological (oldest first); UI scrolls to bottom so latest is visible
160
+ return ChatHistoryResponse(
161
+ messages=out_messages,
162
+ next_cursor=next_cursor,
163
+ prev_cursor=prev_cursor,
164
+ )
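
For reference, the cursor scheme in `get_history` treats the cursor as the exclusive end index of the window: `next_cursor` walks toward older messages, `prev_cursor` toward newer ones. Below is a small self-contained sketch of that arithmetic with hypothetical message counts (not part of the commit; handling of malformed cursors is omitted):

```python
# Hypothetical walkthrough of the cursor arithmetic used in get_history above.
# Assumes a thread with 120 stored messages and the default limit of 50.

def page(total: int, limit: int, cursor: str | None) -> tuple[slice, str | None, str | None]:
    """Mirror the windowing logic: the cursor is the exclusive end index of the window."""
    end = total if not cursor else min(int(cursor), total)
    start = max(0, end - limit)
    next_cursor = str(start) if start > 0 else None   # older messages remain
    prev_cursor = str(end) if end < total else None   # newer messages remain
    return slice(start, end), next_cursor, prev_cursor

print(page(120, 50, None))   # (slice(70, 120), '70', None)  -> latest page
print(page(120, 50, "70"))   # (slice(20, 70), '20', '70')   -> older page
print(page(120, 50, "20"))   # (slice(0, 20), None, '20')    -> oldest page
```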
src/service/router.py CHANGED
@@ -1,13 +1,13 @@
1
- from typing import Any
2
 
3
- from fastapi import APIRouter, Depends, status
4
  from fastapi.responses import StreamingResponse
5
 
6
  from agents import DEFAULT_AGENT, get_all_agent_info
7
  from core import settings
8
  from schema import (
9
- ChatHistory,
10
  ChatHistoryInput,
 
11
  ChatMessage,
12
  Feedback,
13
  FeedbackResponse,
@@ -17,8 +17,8 @@ from schema import (
17
  ThreadListInput,
18
  UserInput,
19
  )
20
- from service import agent_service
21
- from service.dependencies import verify_bearer
22
 
23
 
24
  router = APIRouter(dependencies=[Depends(verify_bearer)])
@@ -106,13 +106,39 @@ async def feedback(feedback: Feedback) -> FeedbackResponse:
106
  return await agent_service.submit_feedback(feedback)
107
 
108
 
109
- @router.post("/history")
110
- async def history(input: ChatHistoryInput) -> ChatHistory:
111
  """
112
- Get chat history for a thread. Requires user_id and thread_id.
113
- Returns only messages for the given user's thread.
114
  """
115
- return await agent_service.get_history(input)
116
 
117
 
118
  @router.post("/history/threads")
 
1
+ from typing import Any, Literal
2
 
3
+ from fastapi import APIRouter, Depends, Query, status
4
  from fastapi.responses import StreamingResponse
5
 
6
  from agents import DEFAULT_AGENT, get_all_agent_info
7
  from core import settings
8
  from schema import (
 
9
  ChatHistoryInput,
10
+ ChatHistoryResponse,
11
  ChatMessage,
12
  Feedback,
13
  FeedbackResponse,
 
17
  ThreadListInput,
18
  UserInput,
19
  )
20
+ from service import agent_service, history_service
21
+ from service.dependencies import get_checkpointer, verify_bearer
22
 
23
 
24
  router = APIRouter(dependencies=[Depends(verify_bearer)])
 
106
  return await agent_service.submit_feedback(feedback)
107
 
108
 
109
+ @router.get("/history", response_model=ChatHistoryResponse)
110
+ async def history_get(
111
+ user_id: str = Query(..., description="User ID to scope history."),
112
+ thread_id: str = Query(..., description="Thread ID for the conversation."),
113
+ limit: int = Query(50, ge=1, le=200, description="Max messages per page."),
114
+ cursor: str | None = Query(None, description="Pagination cursor (older messages)."),
115
+ view: Literal["full", "preview"] = Query("full", description="full or preview (minimal fields)."),
116
+ checkpointer: Any = Depends(get_checkpointer),
117
+ ) -> ChatHistoryResponse:
118
  """
119
+ Get chat history for a thread (GET). Prefer this for read-only loading.
120
+ Returns messages with optional next_cursor and prev_cursor for pagination.
121
  """
122
+ input = ChatHistoryInput(
123
+ user_id=user_id,
124
+ thread_id=thread_id,
125
+ limit=limit,
126
+ cursor=cursor,
127
+ view=view,
128
+ )
129
+ return await history_service.get_history(checkpointer, input)
130
+
131
+
132
+ @router.post("/history", response_model=ChatHistoryResponse)
133
+ async def history_post(
134
+ input: ChatHistoryInput,
135
+ checkpointer: Any = Depends(get_checkpointer),
136
+ ) -> ChatHistoryResponse:
137
+ """
138
+ Get chat history for a thread (POST). Same as GET; body instead of query.
139
+ Returns messages with optional next_cursor and prev_cursor for pagination.
140
+ """
141
+ return await history_service.get_history(checkpointer, input)
142
 
143
 
144
  @router.post("/history/threads")
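
With the new GET route, a read-only history load is a plain query-string request. Here is a minimal client sketch, assuming the service is reachable at http://localhost:8080 and that bearer auth is disabled (both assumptions; neither appears in this diff):

```python
# Minimal read-only client sketch for GET /history (not part of the commit).
# Assumes the service runs at http://localhost:8080; if AUTH_SECRET is configured,
# add an "Authorization: Bearer <secret>" header to the request.
import asyncio

import httpx


async def load_history(user_id: str, thread_id: str, cursor: str | None = None) -> dict:
    params: dict[str, str | int] = {
        "user_id": user_id,
        "thread_id": thread_id,
        "limit": 50,
        "view": "preview",
    }
    if cursor:
        params["cursor"] = cursor
    async with httpx.AsyncClient(base_url="http://localhost:8080") as client:
        resp = await client.get("/history", params=params)
        resp.raise_for_status()
        return resp.json()  # {"messages": [...], "next_cursor": ..., "prev_cursor": ...}


if __name__ == "__main__":
    first_page = asyncio.run(load_history("user-123", "thread-abc"))
    print(len(first_page["messages"]), first_page["next_cursor"])
```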
src/service/service.py CHANGED
@@ -39,6 +39,9 @@ async def lifespan(app: FastAPI) -> AsyncGenerator[None, None]:
39
  if hasattr(store, "setup"): # ignore: union-attr
40
  await store.setup()
41
 
 
 
 
42
  # Configure agents with both memory components and async loading
43
  agents = get_all_agent_info()
44
  for a in agents:
 
39
  if hasattr(store, "setup"): # ignore: union-attr
40
  await store.setup()
41
 
42
+ # Expose checkpointer for history service (no agent dependency)
43
+ app.state.checkpointer = saver
44
+
45
  # Configure agents with both memory components and async loading
46
  agents = get_all_agent_info()
47
  for a in agents:
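
The router resolves the saver through `get_checkpointer` from `service.dependencies`, which is not shown in this diff. Since the lifespan now stores the saver on `app.state.checkpointer`, that dependency presumably just reads it back from the incoming request; a hypothetical sketch of that shape (the real implementation may differ):

```python
# Hypothetical shape of the get_checkpointer dependency used by the /history routes.
# The actual implementation lives in service/dependencies.py and is not part of this hunk.
from typing import Any

from fastapi import Request


def get_checkpointer(request: Request) -> Any:
    """Return the checkpointer stored on app.state during lifespan, or None if unset."""
    return getattr(request.app.state, "checkpointer", None)
```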
src/service/utils.py CHANGED
@@ -1,4 +1,5 @@
1
  import json
 
2
 
3
  import toons
4
  from langchain_core.messages import (
@@ -11,7 +12,7 @@ from langchain_core.messages import (
11
  ChatMessage as LangchainChatMessage,
12
  )
13
 
14
- from schema import ChatMessage
15
 
16
 
17
  def convert_tool_response_to_toon(content: str) -> str:
@@ -81,6 +82,33 @@ def langchain_to_chat_message(message: BaseMessage) -> ChatMessage:
81
  raise ValueError(f"Unsupported message type: {message.__class__.__name__}")
82
 
83
84
  def remove_tool_calls(content: str | list[str | dict]) -> str | list[str | dict]:
85
  """Remove tool calls from content."""
86
  if isinstance(content, str):
 
1
  import json
2
+ from typing import Literal
3
 
4
  import toons
5
  from langchain_core.messages import (
 
12
  ChatMessage as LangchainChatMessage,
13
  )
14
 
15
+ from schema import ChatMessage, ChatMessagePreview
16
 
17
 
18
  def convert_tool_response_to_toon(content: str) -> str:
 
82
  raise ValueError(f"Unsupported message type: {message.__class__.__name__}")
83
 
84
 
85
+ def message_to_preview(
86
+ message: BaseMessage,
87
+ index: int,
88
+ content_max: int = 200,
89
+ ) -> ChatMessagePreview:
90
+ """Build a minimal preview DTO from a LangChain message."""
91
+ content = convert_message_content_to_string(message.content)
92
+ if len(content) > content_max:
93
+ content = content[: content_max].rstrip() + "…"
94
+ msg_type: Literal["human", "ai", "tool", "custom"]
95
+ if isinstance(message, HumanMessage):
96
+ msg_type = "human"
97
+ elif isinstance(message, AIMessage):
98
+ msg_type = "ai"
99
+ elif isinstance(message, ToolMessage):
100
+ msg_type = "tool"
101
+ elif isinstance(message, LangchainChatMessage) and getattr(message, "role", None) == "custom":
102
+ msg_type = "custom"
103
+ else:
104
+ msg_type = "human"
105
+ return ChatMessagePreview(
106
+ type=msg_type,
107
+ content=content,
108
+ id=str(index),
109
+ )
110
+
111
+
112
  def remove_tool_calls(content: str | list[str | dict]) -> str | list[str | dict]:
113
  """Remove tool calls from content."""
114
  if isinstance(content, str):
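
A quick illustration of `message_to_preview` on a small window (the messages and the `service.utils` import path are assumptions, not part of the commit): long content is cut at 200 characters and suffixed with an ellipsis, and the message class is mapped to a coarse `type` label.

```python
# Illustration only; inputs and import path are assumed for the example.
from langchain_core.messages import AIMessage, HumanMessage

from service.utils import message_to_preview

window = [
    HumanMessage(content="What's the weather in Paris?"),
    AIMessage(content="It is currently sunny, around 21 degrees. " * 10),  # > 200 chars
]
previews = [message_to_preview(m, index=i) for i, m in enumerate(window)]

assert previews[0].type == "human" and previews[1].type == "ai"
assert len(previews[1].content) <= 201 and previews[1].content.endswith("…")
```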
uv.lock CHANGED
@@ -7,115 +7,6 @@ resolution-markers = [
7
  "python_full_version < '3.12'",
8
  ]
9
 
10
- [[package]]
11
- name = "chatbot"
12
- version = "0.1.0"
13
- source = { virtual = "." }
14
- dependencies = [
15
- { name = "ddgs" },
16
- { name = "duckduckgo-search" },
17
- { name = "fastapi" },
18
- { name = "grpcio" },
19
- { name = "httpx" },
20
- { name = "jiter" },
21
- { name = "langchain" },
22
- { name = "langchain-anthropic" },
23
- { name = "langchain-aws" },
24
- { name = "langchain-community" },
25
- { name = "langchain-core" },
26
- { name = "langchain-google-genai" },
27
- { name = "langchain-google-vertexai" },
28
- { name = "langchain-groq" },
29
- { name = "langchain-mcp-adapters" },
30
- { name = "langchain-ollama" },
31
- { name = "langchain-openai" },
32
- { name = "langchain-postgres" },
33
- { name = "langfuse" },
34
- { name = "langgraph" },
35
- { name = "langgraph-checkpoint-mongodb" },
36
- { name = "langgraph-checkpoint-postgres" },
37
- { name = "langgraph-checkpoint-sqlite" },
38
- { name = "langsmith" },
39
- { name = "psycopg", extra = ["binary", "pool"] },
40
- { name = "pydantic" },
41
- { name = "pydantic-settings" },
42
- { name = "python-dotenv" },
43
- { name = "setuptools" },
44
- { name = "tiktoken" },
45
- { name = "toons" },
46
- { name = "uvicorn" },
47
- ]
48
-
49
- [package.dev-dependencies]
50
- client = [
51
- { name = "httpx" },
52
- { name = "pydantic" },
53
- { name = "python-dotenv" },
54
- ]
55
- dev = [
56
- { name = "langgraph-cli", extra = ["inmem"] },
57
- { name = "mypy" },
58
- { name = "pre-commit" },
59
- { name = "pytest" },
60
- { name = "pytest-asyncio" },
61
- { name = "pytest-cov" },
62
- { name = "pytest-env" },
63
- { name = "ruff" },
64
- ]
65
-
66
- [package.metadata]
67
- requires-dist = [
68
- { name = "ddgs", specifier = ">=9.9.1" },
69
- { name = "duckduckgo-search", specifier = ">=7.3.0" },
70
- { name = "fastapi", specifier = "~=0.115.5" },
71
- { name = "grpcio", specifier = ">=1.68.0" },
72
- { name = "httpx", specifier = "~=0.28.0" },
73
- { name = "jiter", specifier = "~=0.8.2" },
74
- { name = "langchain", specifier = "~=1.0.5" },
75
- { name = "langchain-anthropic", specifier = "~=1.0.0" },
76
- { name = "langchain-aws", specifier = "~=1.0.0" },
77
- { name = "langchain-community", specifier = "~=0.4.1" },
78
- { name = "langchain-core", specifier = "~=1.0.0" },
79
- { name = "langchain-google-genai", specifier = "~=3.0.0" },
80
- { name = "langchain-google-vertexai", specifier = ">=3.0.3" },
81
- { name = "langchain-groq", specifier = "~=1.0.1" },
82
- { name = "langchain-mcp-adapters", specifier = ">=0.1.10" },
83
- { name = "langchain-ollama", specifier = "~=1.0.0" },
84
- { name = "langchain-openai", specifier = "~=1.0.2" },
85
- { name = "langchain-postgres", specifier = "~=0.0.9" },
86
- { name = "langfuse", specifier = ">=2.65.0" },
87
- { name = "langgraph", specifier = "~=1.0.0" },
88
- { name = "langgraph-checkpoint-mongodb", specifier = "~=0.1.3" },
89
- { name = "langgraph-checkpoint-postgres", specifier = "~=2.0.13" },
90
- { name = "langgraph-checkpoint-sqlite", specifier = "~=2.0.1" },
91
- { name = "langsmith", specifier = "~=0.4.0" },
92
- { name = "psycopg", extras = ["binary", "pool"], specifier = "~=3.2.4" },
93
- { name = "pydantic", specifier = "~=2.10.1" },
94
- { name = "pydantic-settings", specifier = "~=2.12.0" },
95
- { name = "python-dotenv", specifier = "~=1.0.1" },
96
- { name = "setuptools", specifier = "~=75.6.0" },
97
- { name = "tiktoken", specifier = ">=0.8.0" },
98
- { name = "toons", specifier = ">=0.5.2" },
99
- { name = "uvicorn", specifier = "~=0.32.1" },
100
- ]
101
-
102
- [package.metadata.requires-dev]
103
- client = [
104
- { name = "httpx", specifier = "~=0.28.0" },
105
- { name = "pydantic", specifier = "~=2.10.1" },
106
- { name = "python-dotenv", specifier = "~=1.0.1" },
107
- ]
108
- dev = [
109
- { name = "langgraph-cli", extras = ["inmem"] },
110
- { name = "mypy" },
111
- { name = "pre-commit" },
112
- { name = "pytest" },
113
- { name = "pytest-asyncio" },
114
- { name = "pytest-cov" },
115
- { name = "pytest-env" },
116
- { name = "ruff" },
117
- ]
118
-
119
  [[package]]
120
  name = "aiohappyeyeballs"
121
  version = "2.6.1"
@@ -576,6 +467,117 @@ wheels = [
576
  { url = "https://files.pythonhosted.org/packages/0a/4c/925909008ed5a988ccbb72dcc897407e5d6d3bd72410d69e051fc0c14647/charset_normalizer-3.4.4-py3-none-any.whl", hash = "sha256:7a32c560861a02ff789ad905a2fe94e3f840803362c84fecf1851cb4cf3dc37f", size = 53402, upload-time = "2025-10-14T04:42:31.76Z" },
577
  ]
578
 
579
  [[package]]
580
  name = "click"
581
  version = "8.3.0"
@@ -1741,6 +1743,20 @@ wheels = [
1741
  { url = "https://files.pythonhosted.org/packages/30/88/e0b957b2d86defbfeb8181860c2bff3379ac16e918a155aed815a18190ed/langchain_mongodb-0.7.1-py3-none-any.whl", hash = "sha256:dda81023e499025b8c911103ab756d2e9cc40f953727fbbf72165bb85e684e16", size = 60724, upload-time = "2025-10-13T14:03:00.192Z" },
1742
  ]
1743
 
1744
  [[package]]
1745
  name = "langchain-ollama"
1746
  version = "1.0.0"
 
7
  "python_full_version < '3.12'",
8
  ]
9
 
10
  [[package]]
11
  name = "aiohappyeyeballs"
12
  version = "2.6.1"
 
467
  { url = "https://files.pythonhosted.org/packages/0a/4c/925909008ed5a988ccbb72dcc897407e5d6d3bd72410d69e051fc0c14647/charset_normalizer-3.4.4-py3-none-any.whl", hash = "sha256:7a32c560861a02ff789ad905a2fe94e3f840803362c84fecf1851cb4cf3dc37f", size = 53402, upload-time = "2025-10-14T04:42:31.76Z" },
468
  ]
469
 
470
+ [[package]]
471
+ name = "chatbot"
472
+ version = "0.1.0"
473
+ source = { virtual = "." }
474
+ dependencies = [
475
+ { name = "ddgs" },
476
+ { name = "duckduckgo-search" },
477
+ { name = "fastapi" },
478
+ { name = "grpcio" },
479
+ { name = "httpx" },
480
+ { name = "jiter" },
481
+ { name = "langchain" },
482
+ { name = "langchain-anthropic" },
483
+ { name = "langchain-aws" },
484
+ { name = "langchain-community" },
485
+ { name = "langchain-core" },
486
+ { name = "langchain-google-genai" },
487
+ { name = "langchain-google-vertexai" },
488
+ { name = "langchain-groq" },
489
+ { name = "langchain-mcp-adapters" },
490
+ { name = "langchain-nvidia-ai-endpoints" },
491
+ { name = "langchain-ollama" },
492
+ { name = "langchain-openai" },
493
+ { name = "langchain-postgres" },
494
+ { name = "langfuse" },
495
+ { name = "langgraph" },
496
+ { name = "langgraph-checkpoint-mongodb" },
497
+ { name = "langgraph-checkpoint-postgres" },
498
+ { name = "langgraph-checkpoint-sqlite" },
499
+ { name = "langsmith" },
500
+ { name = "psycopg", extra = ["binary", "pool"] },
501
+ { name = "pydantic" },
502
+ { name = "pydantic-settings" },
503
+ { name = "python-dotenv" },
504
+ { name = "setuptools" },
505
+ { name = "tiktoken" },
506
+ { name = "toons" },
507
+ { name = "uvicorn" },
508
+ ]
509
+
510
+ [package.dev-dependencies]
511
+ client = [
512
+ { name = "httpx" },
513
+ { name = "pydantic" },
514
+ { name = "python-dotenv" },
515
+ ]
516
+ dev = [
517
+ { name = "langgraph-cli", extra = ["inmem"] },
518
+ { name = "mypy" },
519
+ { name = "pre-commit" },
520
+ { name = "pytest" },
521
+ { name = "pytest-asyncio" },
522
+ { name = "pytest-cov" },
523
+ { name = "pytest-env" },
524
+ { name = "ruff" },
525
+ ]
526
+
527
+ [package.metadata]
528
+ requires-dist = [
529
+ { name = "ddgs", specifier = ">=9.9.1" },
530
+ { name = "duckduckgo-search", specifier = ">=7.3.0" },
531
+ { name = "fastapi", specifier = "~=0.115.5" },
532
+ { name = "grpcio", specifier = ">=1.68.0" },
533
+ { name = "httpx", specifier = "~=0.28.0" },
534
+ { name = "jiter", specifier = "~=0.8.2" },
535
+ { name = "langchain", specifier = "~=1.0.5" },
536
+ { name = "langchain-anthropic", specifier = "~=1.0.0" },
537
+ { name = "langchain-aws", specifier = "~=1.0.0" },
538
+ { name = "langchain-community", specifier = "~=0.4.1" },
539
+ { name = "langchain-core", specifier = "~=1.0.0" },
540
+ { name = "langchain-google-genai", specifier = "~=3.0.0" },
541
+ { name = "langchain-google-vertexai", specifier = ">=3.0.3" },
542
+ { name = "langchain-groq", specifier = "~=1.0.1" },
543
+ { name = "langchain-mcp-adapters", specifier = ">=0.1.10" },
544
+ { name = "langchain-nvidia-ai-endpoints", specifier = ">=1.0.4" },
545
+ { name = "langchain-ollama", specifier = "~=1.0.0" },
546
+ { name = "langchain-openai", specifier = "~=1.0.2" },
547
+ { name = "langchain-postgres", specifier = "~=0.0.9" },
548
+ { name = "langfuse", specifier = ">=2.65.0" },
549
+ { name = "langgraph", specifier = "~=1.0.0" },
550
+ { name = "langgraph-checkpoint-mongodb", specifier = "~=0.1.3" },
551
+ { name = "langgraph-checkpoint-postgres", specifier = "~=2.0.13" },
552
+ { name = "langgraph-checkpoint-sqlite", specifier = "~=2.0.1" },
553
+ { name = "langsmith", specifier = "~=0.4.0" },
554
+ { name = "psycopg", extras = ["binary", "pool"], specifier = "~=3.2.4" },
555
+ { name = "pydantic", specifier = "~=2.10.1" },
556
+ { name = "pydantic-settings", specifier = "~=2.12.0" },
557
+ { name = "python-dotenv", specifier = "~=1.0.1" },
558
+ { name = "setuptools", specifier = "~=75.6.0" },
559
+ { name = "tiktoken", specifier = ">=0.8.0" },
560
+ { name = "toons", specifier = ">=0.5.2" },
561
+ { name = "uvicorn", specifier = "~=0.32.1" },
562
+ ]
563
+
564
+ [package.metadata.requires-dev]
565
+ client = [
566
+ { name = "httpx", specifier = "~=0.28.0" },
567
+ { name = "pydantic", specifier = "~=2.10.1" },
568
+ { name = "python-dotenv", specifier = "~=1.0.1" },
569
+ ]
570
+ dev = [
571
+ { name = "langgraph-cli", extras = ["inmem"] },
572
+ { name = "mypy" },
573
+ { name = "pre-commit" },
574
+ { name = "pytest" },
575
+ { name = "pytest-asyncio" },
576
+ { name = "pytest-cov" },
577
+ { name = "pytest-env" },
578
+ { name = "ruff" },
579
+ ]
580
+
581
  [[package]]
582
  name = "click"
583
  version = "8.3.0"
 
1743
  { url = "https://files.pythonhosted.org/packages/30/88/e0b957b2d86defbfeb8181860c2bff3379ac16e918a155aed815a18190ed/langchain_mongodb-0.7.1-py3-none-any.whl", hash = "sha256:dda81023e499025b8c911103ab756d2e9cc40f953727fbbf72165bb85e684e16", size = 60724, upload-time = "2025-10-13T14:03:00.192Z" },
1744
  ]
1745
 
1746
+ [[package]]
1747
+ name = "langchain-nvidia-ai-endpoints"
1748
+ version = "1.0.4"
1749
+ source = { registry = "https://pypi.org/simple" }
1750
+ dependencies = [
1751
+ { name = "aiohttp" },
1752
+ { name = "filetype" },
1753
+ { name = "langchain-core" },
1754
+ ]
1755
+ sdist = { url = "https://files.pythonhosted.org/packages/8d/2e/0b3e6ec5df7426e3ab19c8dfedd0b4a9e97461a6a536e02f6429618664ec/langchain_nvidia_ai_endpoints-1.0.4.tar.gz", hash = "sha256:831decd67e94f104bc2fecc596ef2953ea30e7adc1c3b99bd35861e018dd1fb2", size = 46600, upload-time = "2026-02-13T17:17:56.135Z" }
1756
+ wheels = [
1757
+ { url = "https://files.pythonhosted.org/packages/c8/3e/a711094b31777ac4a7993507b8a3e0a45307cbab94425b5eba012a49c0cd/langchain_nvidia_ai_endpoints-1.0.4-py3-none-any.whl", hash = "sha256:49018362fca9c951488dffcf3e1372365778946e2a3b87ff7d769589e7b3c497", size = 50173, upload-time = "2026-02-13T17:17:54.759Z" },
1758
+ ]
1759
+
1760
  [[package]]
1761
  name = "langchain-ollama"
1762
  version = "1.0.0"