anujjoshi3105 committed
Commit
5e03012
1 Parent(s): 7e8c521

feat: nvidia llm

.env.example CHANGED
@@ -6,6 +6,7 @@ ANTHROPIC_API_KEY=
6
  GOOGLE_API_KEY=
7
  GROQ_API_KEY=
8
  OPENROUTER_API_KEY=
 
9
  USE_AWS_BEDROCK=false
10
 
11
  #Vertex AI
 
6
  GOOGLE_API_KEY=
7
  GROQ_API_KEY=
8
  OPENROUTER_API_KEY=
9
+ NVIDIA_API_KEY=
10
  USE_AWS_BEDROCK=false
11
 
12
  #Vertex AI
README.md CHANGED
@@ -1,5 +1,5 @@
1
  ---
2
- title: AI Agent Service Toolkit
3
  emoji: 🧰
4
  colorFrom: blue
5
  colorTo: indigo
@@ -7,234 +7,157 @@ sdk: docker
7
  pinned: false
8
  ---
9
 
10
- # 🧰 AI Agent Service Toolkit
11
 
12
- [![build status](https://github.com/JoshuaC215/chatbot/actions/workflows/test.yml/badge.svg)](https://github.com/JoshuaC215/chatbot/actions/workflows/test.yml) [![codecov](https://codecov.io/github/JoshuaC215/chatbot/graph/badge.svg?token=5MTJSYWD05)](https://codecov.io/github/JoshuaC215/chatbot) [![Python Version](https://img.shields.io/python/required-version-toml?tomlFilePath=https%3A%2F%2Fraw.githubusercontent.com%2FJoshuaC215%2Fchatbot%2Frefs%2Fheads%2Fmain%2Fpyproject.toml)](https://github.com/JoshuaC215/chatbot/blob/main/pyproject.toml)
13
- [![GitHub License](https://img.shields.io/github/license/JoshuaC215/chatbot)](https://github.com/JoshuaC215/chatbot/blob/main/LICENSE) [![Streamlit App](https://static.streamlit.io/badges/streamlit_badge_black_red.svg)](https://chatbot.streamlit.app/)
14
 
15
- A full toolkit for running an AI agent service built with LangGraph, FastAPI and Streamlit.
16
 
17
- It includes a [LangGraph](https://langchain-ai.github.io/langgraph/) agent, a [FastAPI](https://fastapi.tiangolo.com/) service to serve it, a client to interact with the service, and a [Streamlit](https://streamlit.io/) app that uses the client to provide a chat interface. Data structures and settings are built with [Pydantic](https://github.com/pydantic/pydantic).
18
 
19
- This project offers a template for you to easily build and run your own agents using the LangGraph framework. It demonstrates a complete setup from agent definition to user interface, making it easier to get started with LangGraph-based projects by providing a full, robust toolkit.
20
 
21
- **[🎥 Watch a video walkthrough of the repo and app](https://www.youtube.com/watch?v=pdYVHw_YCNY)**
 
 
 
 
22
 
23
- ## Overview
24
 
25
- ### [Try the app!](https://chatbot.streamlit.app/)
26
-
27
- <a href="https://chatbot.streamlit.app/"><img src="media/app_screenshot.png" width="600"></a>
28
-
29
- ### Quickstart
30
-
31
- Run directly in python
32
-
33
- ```sh
34
- # At least one LLM API key is required
35
- echo 'OPENAI_API_KEY=your_openai_api_key' >> .env
36
-
37
- # uv is the recommended way to install chatbot, but "pip install ." also works
38
- # For uv installation options, see: https://docs.astral.sh/uv/getting-started/installation/
39
- curl -LsSf https://astral.sh/uv/0.7.19/install.sh | sh
40
-
41
- # Install dependencies. "uv sync" creates .venv automatically
42
- uv sync --frozen
43
- source .venv/bin/activate
44
- python src/run_service.py
45
-
46
- # In another shell
47
- source .venv/bin/activate
48
- streamlit run src/streamlit_app.py
49
  ```
50
-
51
- Run with docker
52
-
53
- ```sh
54
- echo 'OPENAI_API_KEY=your_openai_api_key' >> .env
55
- docker compose watch
56
  ```
57
 
58
- ### Architecture Diagram
59
-
60
- <img src="media/agent_architecture.png" width="600">
61
-
62
- ### Key Features
63
-
64
- 1. **LangGraph Agent and latest features**: A customizable agent built using the LangGraph framework. Implements the latest LangGraph v1.0 features including human in the loop with `interrupt()`, flow control with `Command`, long-term memory with `Store`, and `langgraph-supervisor`.
65
- 1. **FastAPI Service**: Serves the agent with both streaming and non-streaming endpoints.
66
- 1. **Advanced Streaming**: A novel approach to support both token-based and message-based streaming.
67
- 1. **Streamlit Interface**: Provides a user-friendly chat interface for interacting with the agent, including voice input and output.
68
- 1. **Multiple Agent Support**: Run multiple agents in the service and call by URL path. Available agents and models are described in `/info`
69
- 1. **Asynchronous Design**: Utilizes async/await for efficient handling of concurrent requests.
70
- 1. **Content Moderation**: Implements LlamaGuard for content moderation (requires Groq API key).
71
- 1. **RAG Agent**: A basic RAG agent implementation using ChromaDB - see [docs](docs/RAG_Assistant.md).
72
- 1. **Feedback Mechanism**: Includes a star-based feedback system integrated with LangSmith.
73
- 1. **Docker Support**: Includes Dockerfiles and a docker compose file for easy development and deployment.
74
- 1. **Testing**: Includes robust unit and integration tests for the full repo.
75
-
76
- ### Key Files
77
-
78
- The repository is structured as follows:
79
-
80
- - `src/agents/`: Defines several agents with different capabilities
81
- - `src/schema/`: Defines the protocol schema
82
- - `src/core/`: Core modules including LLM definition and settings
83
- - `src/service/service.py`: FastAPI service to serve the agents
84
- - `src/client/client.py`: Client to interact with the agent service
85
- - `src/streamlit_app.py`: Streamlit app providing a chat interface
86
- - `tests/`: Unit and integration tests
87
-
88
- ## Setup and Usage
89
-
90
- 1. Clone the repository:
91
-
92
- ```sh
93
- git clone https://github.com/JoshuaC215/chatbot.git
94
- cd chatbot
95
- ```
96
-
97
- 2. Set up environment variables:
98
- Create a `.env` file in the root directory. At least one LLM API key or configuration is required. See the [`.env.example` file](./.env.example) for a full list of available environment variables, including a variety of model provider API keys, header-based authentication, LangSmith tracing, testing and development modes, and OpenWeatherMap API key.
99
-
100
- 3. You can now run the agent service and the Streamlit app locally, either with Docker or just using Python. The Docker setup is recommended for simpler environment setup and immediate reloading of the services when you make changes to your code.
101
-
102
- ### Additional setup for specific AI providers
103
-
104
- - [Setting up Ollama](docs/Ollama.md)
105
- - [Setting up VertexAI](docs/VertexAI.md)
106
- - [Setting up RAG with ChromaDB](docs/RAG_Assistant.md)
107
-
108
- ### Building or customizing your own agent
109
 
110
- To customize the agent for your own use case:
111
 
112
- 1. Add your new agent to the `src/agents` directory. You can copy `research_assistant.py` or `chatbot.py` and modify it to change the agent's behavior and tools.
113
- 1. Import and add your new agent to the `agents` dictionary in `src/agents/agents.py`. Your agent can be called by `/<your_agent_name>/invoke` or `/<your_agent_name>/stream`.
114
- 1. Adjust the Streamlit interface in `src/streamlit_app.py` to match your agent's capabilities.
115
 
 
116
 
117
- ### Handling Private Credential files
 
 
 
 
118
 
119
- If your agents or chosen LLM require file-based credential files or certificates, the `privatecredentials/` has been provided for your development convenience. All contents, excluding the `.gitkeep` files, are ignored by git and docker's build process. See [Working with File-based Credentials](docs/File_Based_Credentials.md) for suggested use.
 
120
 
 
 
 
 
121
 
122
- ### Docker Setup
 
 
 
 
 
123
 
124
- This project includes a Docker setup for easy development and deployment. The `compose.yaml` file defines three services: `postgres`, `agent_service` and `streamlit_app`. The `Dockerfile` for each service is in their respective directories.
 
125
 
126
- For local development, we recommend using [docker compose watch](https://docs.docker.com/compose/file-watch/). This feature allows for a smoother development experience by automatically updating your containers when changes are detected in your source code.
 
 
127
 
128
- 1. Make sure you have Docker and Docker Compose (>= [v2.23.0](https://docs.docker.com/compose/release-notes/#2230)) installed on your system.
 
 
 
 
 
129
 
130
- 2. Create a `.env` file from the `.env.example`. At minimum, you need to provide an LLM API key (e.g., OPENAI_API_KEY).
131
- ```sh
132
- cp .env.example .env
133
- # Edit .env to add your API keys
134
- ```
135
 
136
- 3. Build and launch the services in watch mode:
137
 
138
- ```sh
139
- docker compose watch
140
- ```
141
-
142
- This will automatically:
143
- - Start a PostgreSQL database service that the agent service connects to
144
- - Start the agent service with FastAPI
145
- - Start the Streamlit app for the user interface
146
-
147
- 4. The services will now automatically update when you make changes to your code:
148
- - Changes in the relevant python files and directories will trigger updates for the relevant services.
149
- - NOTE: If you make changes to the `pyproject.toml` or `uv.lock` files, you will need to rebuild the services by running `docker compose up --build`.
150
-
151
- 5. Access the Streamlit app by navigating to `http://localhost:8501` in your web browser.
152
-
153
- 6. The agent service API will be available at `http://0.0.0.0:7860`. You can also use the OpenAPI docs at `http://0.0.0.0:7860/redoc`.
154
-
155
- 7. Use `docker compose down` to stop the services.
156
-
157
- This setup allows you to develop and test your changes in real-time without manually restarting the services.
158
-
159
- ### Building other apps on the AgentClient
160
-
161
- The repo includes a generic `src/client/client.AgentClient` that can be used to interact with the agent service. This client is designed to be flexible and can be used to build other apps on top of the agent. It supports both synchronous and asynchronous invocations, and streaming and non-streaming requests.
162
-
163
- See the `src/run_client.py` file for full examples of how to use the `AgentClient`. A quick example:
164
-
165
- ```python
166
- from client import AgentClient
167
- client = AgentClient()
168
-
169
- response = client.invoke("Tell me a brief joke?")
170
- response.pretty_print()
171
- # ================================== Ai Message ==================================
172
- #
173
- # A man walked into a library and asked the librarian, "Do you have any books on Pavlov's dogs and Schrödinger's cat?"
174
- # The librarian replied, "It rings a bell, but I'm not sure if it's here or not."
175
 
 
 
176
  ```
177
 
178
- ### Development with LangGraph Studio
 
179
 
180
- The agent supports [LangGraph Studio](https://langchain-ai.github.io/langgraph/concepts/langgraph_studio/), the IDE for developing agents in LangGraph.
181
 
182
- `langgraph-cli[inmem]` is installed with `uv sync`. You can simply add your `.env` file to the root directory as described above, and then launch LangGraph Studio with `langgraph dev`. Customize `langgraph.json` as needed. See the [local quickstart](https://langchain-ai.github.io/langgraph/cloud/how-tos/studio/quick_start/#local-development-server) to learn more.
 
 
 
183
 
184
- ### Local development without Docker
185
 
186
- You can also run the agent service and the Streamlit app locally without Docker, just using a Python virtual environment.
187
 
188
- 1. Create a virtual environment and install dependencies:
189
 
190
- ```sh
191
- uv sync --frozen
192
- source .venv/bin/activate
193
- ```
194
 
195
- 2. Run the FastAPI server:
 
 
 
196
 
197
- ```sh
198
- python src/run_service.py
199
- ```
200
 
201
- 3. In a separate terminal, run the Streamlit app:
202
-
203
- ```sh
204
- streamlit run src/streamlit_app.py
205
- ```
206
-
207
- 4. Open your browser and navigate to the URL provided by Streamlit (usually `http://localhost:8501`).
208
-
209
- ## Projects built with or inspired by chatbot
210
-
211
- The following are a few of the public projects that drew code or inspiration from this repo.
212
-
213
- - **[PolyRAG](https://github.com/QuentinFuxa/PolyRAG)** - Extends chatbot with RAG capabilities over both PostgreSQL databases and PDF documents.
214
- - **[alexrisch/agent-web-kit](https://github.com/alexrisch/agent-web-kit)** - A Next.JS frontend for chatbot
215
- - **[raushan-in/dapa](https://github.com/raushan-in/dapa)** - Digital Arrest Protection App (DAPA) enables users to report financial scams and frauds efficiently via a user-friendly platform.
216
-
217
- **Please create a pull request editing the README or open a discussion with any new ones to be added!** Would love to include more projects.
218
 
219
  ## Contributing
220
 
221
- Contributions are welcome! Please feel free to submit a Pull Request. Currently the tests need to be run using the local development without Docker setup. To run the tests for the agent service:
222
-
223
- 1. Ensure you're in the project root directory and have activated your virtual environment.
224
-
225
- 2. Install the development dependencies and pre-commit hooks:
226
-
227
- ```sh
228
- uv sync --frozen
229
- pre-commit install
230
- ```
231
-
232
- 3. Run the tests using pytest:
233
-
234
- ```sh
235
- pytest
236
- ```
237
 
238
  ## License
239
 
240
- This project is licensed under the MIT License - see the LICENSE file for details.
 
1
  ---
2
+ title: Portfolio Chatbot
3
  emoji: 🧰
4
  colorFrom: blue
5
  colorTo: indigo
 
7
  pinned: false
8
  ---
9
 
10
+ # Portfolio Chatbot Backend
11
 
12
+ A robust, production-grade AI agent service built with **LangGraph**, **FastAPI**, and **Python**. Designed to power an intelligent portfolio assistant, this backend orchestrates multiple specialized agents to answer questions about professional experience, analyze GitHub contributions, and track competitive programming statistics.
 
13
 
14
+ ## Features
15
 
16
+ * **Multi-Agent Architecture**: Orchestrates specialized agents for different domains.
17
+ * **Portfolio Agent**: An expert on Anuj Joshi's background, skills, projects, and work experience, powered by a curated knowledge base.
18
+ * **Open Source Agent**: Integrates with GitHub (via MCP) to analyze repositories, summarize contributions, and provide code insights.
19
+ * **Competitive Programming Agent**: Tracks real-time performance and statistics from platforms like LeetCode and Codeforces.
20
+ * **Advanced Memory System**:
21
+ * **Short-term Memory**: Manages conversation history using LangGraph checkpointers (Postgres, SQLite, or MongoDB).
22
+ * **Long-term Memory**: Persists cross-conversation knowledge using a durable store.
23
+ * **Model Agnostic**: Supports a wide range of LLM providers including **OpenAI**, **Anthropic**, **Google Gemini/Vertex AI**, **Groq**, **NVIDIA**, **DeepSeek**, **Azure OpenAI**, and **Ollama**.
24
+ * **Production Ready API**:
25
+ * RESTful endpoints built with **FastAPI**.
26
+ * Full streaming support (Server-Sent Events) for real-time responses.
27
+ * Comprehensive conversation history and thread management.
28
+ * Built-in feedback collection endpoints.
29
+ * **Observability & Tracing**: First-class integration with **LangSmith** and **LangFuse** for monitoring and debugging agent traces.
30
+ * **Dockerized**: Extensive Docker support for easy deployment and scaling.
31
 
32
+ ## Tech Stack
33
 
34
+ * **Language**: Python 3.11+
35
+ * **Framework**: FastAPI, Uvicorn
36
+ * **AI Orchestration**: LangChain, LangGraph
37
+ * **Database**: PostgreSQL (recommended for production), SQLite (dev), MongoDB
38
+ * **Package Manager**: uv (fast Python package installer)
39
 
40
+ ## Project Structure
41
42
  ```
43
+ backend/
44
+ ├── src/
45
+ │ ├── agents/ # Agent definitions and workflows
46
+ │ │ ├── agents.py # Agent registry and loading logic
47
+ │ │ ├── portfolio_agent.py
48
+ │ │ ├── open_source_agent.py
49
+ │ │ └── ...
50
+ │ ├── core/ # Core configurations and settings
51
+ │ ├── memory/ # Database and checkpoint initialization
52
+ │ ├── schema/ # Pydantic models and data schemas
53
+ │ ├── service/ # FastAPI application and routes
54
+ │ └── run_service.py # Application entry point
55
+ ├── .env.example # Environment variable template
56
+ ├── pyproject.toml # Dependencies and project metadata
57
+ ├── compose.yaml # Docker Compose configuration
58
+ └── Dockerfile # Docker build instructions
59
  ```
60
 
61
+ ## Getting Started
62
 
63
+ ### Prerequisites
64
 
65
+ * **Python 3.11+** or **Docker**
66
+ * **Git**
67
+ * API Keys for your preferred LLM provider (e.g., OpenAI, Anthropic, Groq).
68
 
69
+ ### Installation (Local)
70
 
71
+ 1. **Clone the repository:**
72
+ ```bash
73
+ git clone https://github.com/Anujjoshi3105/portfolio-chatbot-backend.git
74
+ cd portfolio-chatbot-backend
75
+ ```
76
 
77
+ 2. **Set up the environment:**
78
+ Create a virtual environment and install dependencies. We recommend using `uv` for speed, but `pip` works too.
79
 
80
+ ```bash
81
+ # Using uv (Recommended)
82
+ pip install uv
83
+ uv sync
84
 
85
+ # OR using standard pip
86
+ python -m venv .venv
87
+ source .venv/bin/activate # On Windows: .venv\Scripts\activate
88
+ pip install -e . # Install the project and its dependencies from pyproject.toml (editable mode)
89
+ # Or do a regular, non-editable install: pip install .
90
+ ```
91
 
92
+ 3. **Configure Environment Variables:**
93
+ Copy `.env.example` to `.env` and fill in your API keys.
94
 
95
+ ```bash
96
+ cp .env.example .env
97
+ ```
98
 
99
+ **Key Configuration Options:**
100
+ * `OPENAI_API_KEY`, `GROQ_API_KEY`, etc.: API keys for LLMs.
101
+ * `DEFAULT_MODEL`: The default model to use (e.g., `gpt-4o`, `llama-3.1-70b-versatile`).
102
+ * `DATABASE_TYPE`: `postgres` or `sqlite`.
103
+ * `GITHUB_PAT`: GitHub Personal Access Token (for Open Source Agent).
104
+ * `LANGSMITH_TRACING`: Set to `true` to enable LangSmith tracing.
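+
+ As a minimal illustration (placeholder values; set only the providers you actually use), a working `.env` might look like:
+
+ ```bash
+ # .env — illustrative values only
+ GROQ_API_KEY=gsk_xxxxxxxx             # any single LLM provider key is enough to start
+ NVIDIA_API_KEY=nvapi-xxxxxxxx         # enables NVIDIA-hosted models (also used for LlamaGuard moderation)
+ DEFAULT_MODEL=llama-3.1-70b-versatile
+ DATABASE_TYPE=sqlite                  # switch to postgres for production
+ GITHUB_PAT=ghp_xxxxxxxx               # only required for the Open Source Agent
+ LANGSMITH_TRACING=false
+ ```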
105
 
106
+ ### Running the Service
 
 
 
 
107
 
108
+ Start the backend server:
109
 
110
+ ```bash
111
+ # Run using the python script
112
+ python src/run_service.py
113
 
114
+ # OR using uvicorn directly
115
+ uvicorn service:app --host 0.0.0.0 --port 7860 --reload
116
  ```
117
 
118
+ The API will be available at `http://localhost:7860`.
119
+ Access the interactive API docs (Swagger UI) at `http://localhost:7860/docs`.
120
 
121
+ ## Docker Deployment
122
 
123
+ 1. **Build and Run with Docker Compose:**
124
+ ```bash
125
+ docker compose up --build
126
+ ```
127
 
128
+ This will start the backend service along with a PostgreSQL database (if configured in `compose.yaml`).
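+
+ Once it is up, the usual Compose commands apply (nothing project-specific is assumed here):
+
+ ```bash
+ docker compose logs -f   # follow logs from all services
+ docker compose down      # stop and remove the containers
+ ```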
129
 
130
+ ## API Endpoints
131
 
132
+ The service exposes several key endpoints for interacting with the agents:
133
 
134
+ ### 1. **Invoke Agent**
135
+ - **POST** `/invoke` or `/{agent_id}/invoke`
136
+ - Get a complete response from an agent.
137
+ - **Body:** `{ "message": "Tell me about your projects", "thread_id": "optional-uuid" }`
138
 
139
+ ### 2. **Stream Response**
140
+ - **POST** `/stream` or `/{agent_id}/stream`
141
+ - Stream the agent's reasoning and response token-by-token (SSE).
142
+ - **Body:** `{ "message": "...", "stream_tokens": true }`
143
 
144
+ ### 3. **Chat History**
145
+ - **POST** `/history`
146
+ - Retrieve past messages for a specific thread.
147
 
148
+ ### 4. **Service Info**
149
+ - **GET** `/info`
150
+ - Returns available agents, models, and configuration metadata.
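+
+ A rough sketch of driving these endpoints with `curl` (field names taken from the descriptions above; the authoritative request/response schemas live in `src/schema/`):
+
+ ```bash
+ # Non-streaming call to the default agent
+ curl -X POST http://localhost:7860/invoke \
+   -H "Content-Type: application/json" \
+   -d '{"message": "Tell me about your projects", "thread_id": "demo-thread-1"}'
+
+ # Streaming (SSE) call to a specific agent
+ curl -N -X POST http://localhost:7860/open-source-agent/stream \
+   -H "Content-Type: application/json" \
+   -d '{"message": "Show recent activity on GitHub", "stream_tokens": true}'
+
+ # List available agents, models, and configuration metadata
+ curl http://localhost:7860/info
+ ```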
151
 
152
  ## Contributing
153
 
154
+ Contributions are welcome! Please follow these steps:
155
+ 1. Fork the repository.
156
+ 2. Create a feature branch (`git checkout -b feature/amazing-feature`).
157
+ 3. Commit your changes (`git commit -m 'Add amazing feature'`).
158
+ 4. Push to the branch (`git push origin feature/amazing-feature`).
159
+ 5. Open a Pull Request.
160
 
161
  ## License
162
 
163
+ This project is licensed under the MIT License - see the `LICENSE` file for details.
pyproject.toml CHANGED
@@ -48,6 +48,7 @@ dependencies = [
48
  "langchain-mcp-adapters>=0.1.10",
49
  "ddgs>=9.9.1",
50
  "toons>=0.5.2",
 
51
  ]
52
 
53
  [dependency-groups]
 
48
  "langchain-mcp-adapters>=0.1.10",
49
  "ddgs>=9.9.1",
50
  "toons>=0.5.2",
51
+ "langchain-nvidia-ai-endpoints>=1.0.4",
52
  ]
53
 
54
  [dependency-groups]
src/agents/agents.py CHANGED
@@ -4,8 +4,8 @@ from langgraph.graph.state import CompiledStateGraph
4
  from langgraph.pregel import Pregel
5
 
6
  from agents.portfolio_agent import portfolio_agent
7
- from agents.github_mcp_agent import github_mcp_agent
8
- from agents.cpstat_agent import cpstat_agent
9
  from agents.lazy_agent import LazyLoadingAgent
10
  from schema import AgentInfo
11
 
@@ -37,9 +37,9 @@ agents: dict[str, Agent] = {
37
  "How can I contact Anuj?",
38
  ],
39
  ),
40
- "github-mcp-agent": Agent(
41
- description="An agent equipped with GitHub MCP tools to explore repositories, view code, and track development activity.",
42
- graph_like=github_mcp_agent,
43
  prompts=[
44
  "List anujjoshi3105's top repositories",
45
  "Show recent activity on anujjoshi3105's GitHub",
@@ -47,9 +47,9 @@ agents: dict[str, Agent] = {
47
  "Show me anujjoshi3105's contributions in the last month",
48
  ],
49
  ),
50
- "cpstat-agent": Agent(
51
- description="An agent specializing in Competitive Programming, capable of fetching real-time contest ratings and stats from various platforms.",
52
- graph_like=cpstat_agent,
53
  prompts=[
54
  "Show Anuj's LeetCode rating and stats",
55
  "What is Anuj's Codeforces rank?",
 
4
  from langgraph.pregel import Pregel
5
 
6
  from agents.portfolio_agent import portfolio_agent
7
+ from agents.open_source_agent import open_source_agent
8
+ from agents.competitive_programming_agent import competitive_programming_agent
9
  from agents.lazy_agent import LazyLoadingAgent
10
  from schema import AgentInfo
11
 
 
37
  "How can I contact Anuj?",
38
  ],
39
  ),
40
+ "open-source-agent": Agent(
41
+ description="An intelligent assistant that integrates with GitHub to showcase Anuj's open-source contributions. It can analyze repositories, summarize recent activity, and provide insights into code quality and development impact.",
42
+ graph_like=open_source_agent,
43
  prompts=[
44
  "List anujjoshi3105's top repositories",
45
  "Show recent activity on anujjoshi3105's GitHub",
 
47
  "Show me anujjoshi3105's contributions in the last month",
48
  ],
49
  ),
50
+ "competitive-programming-agent": Agent(
51
+ description="A dedicated competitive programming analyst that tracks Anuj's performance across major platforms like LeetCode and Codeforces. It provides real-time ratings, contest history, and detailed problem-solving statistics.",
52
+ graph_like=competitive_programming_agent,
53
  prompts=[
54
  "Show Anuj's LeetCode rating and stats",
55
  "What is Anuj's Codeforces rank?",
src/agents/{cpstat_agent.py → competitive_programming_agent.py} RENAMED
@@ -6,10 +6,9 @@ from langchain_mcp_adapters.client import MultiServerMCPClient
6
  from langchain_mcp_adapters.sessions import StreamableHttpConnection
7
  from langgraph.graph.state import CompiledStateGraph
8
 
9
- from agents.utils import filter_mcp_tools
10
  from agents.lazy_agent import LazyLoadingAgent
11
- from agents.middlewares import ConfigurableModelMiddleware, SummarizationMiddleware
12
- from agents.prompts.cpstat import SYSTEM_PROMPT
13
  from core import get_model, settings
14
 
15
  logger = logging.getLogger(__name__)
@@ -54,7 +53,7 @@ ALLOWED_TOOLS = {
54
  }
55
 
56
 
57
- class CPStatAgent(LazyLoadingAgent):
58
  """CP Stat Agent with async initialization for contest and rating info."""
59
 
60
  def __init__(self) -> None:
@@ -104,17 +103,12 @@ class CPStatAgent(LazyLoadingAgent):
104
  tools=self._mcp_tools,
105
  middleware=[
106
  ConfigurableModelMiddleware(),
107
- SummarizationMiddleware(
108
- max_tokens_before_summary=1000,
109
- messages_to_keep=4,
110
- use_llm=False
111
- ),
112
  ],
113
- name="cpstat-agent",
114
  system_prompt=SYSTEM_PROMPT,
115
  debug=True,
116
  )
117
 
118
 
119
  # Create the agent instance
120
- cpstat_agent = CPStatAgent()
 
6
  from langchain_mcp_adapters.sessions import StreamableHttpConnection
7
  from langgraph.graph.state import CompiledStateGraph
8
 
 
9
  from agents.lazy_agent import LazyLoadingAgent
10
+ from agents.middlewares import ConfigurableModelMiddleware
11
+ from agents.prompts.competitive_programming import SYSTEM_PROMPT
12
  from core import get_model, settings
13
 
14
  logger = logging.getLogger(__name__)
 
53
  }
54
 
55
 
56
+ class CompetitiveProgrammingAgent(LazyLoadingAgent):
57
  """CP Stat Agent with async initialization for contest and rating info."""
58
 
59
  def __init__(self) -> None:
 
103
  tools=self._mcp_tools,
104
  middleware=[
105
  ConfigurableModelMiddleware(),
 
 
 
 
 
106
  ],
107
+ name="competitive-programming-agent",
108
  system_prompt=SYSTEM_PROMPT,
109
  debug=True,
110
  )
111
 
112
 
113
  # Create the agent instance
114
+ competitive_programming_agent = CompetitiveProgrammingAgent()
src/agents/llama_guard.py CHANGED
@@ -1,3 +1,4 @@
 
1
  from enum import Enum
2
 
3
  from langchain_core.messages import AIMessage, AnyMessage, HumanMessage
@@ -28,7 +29,7 @@ unsafe_content_categories = {
28
  "S4": "Child Exploitation.",
29
  "S5": "Defamation.",
30
  "S6": "Specialized Advice.",
31
- "S7": "Privacy.",
32
  "S8": "Intellectual Property.",
33
  "S9": "Indiscriminate Weapons.",
34
  "S10": "Hate.",
@@ -77,11 +78,11 @@ def parse_llama_guard_output(output: str) -> LlamaGuardOutput:
77
 
78
  class LlamaGuard:
79
  def __init__(self) -> None:
80
- if settings.GROQ_API_KEY is None:
81
- print("GROQ_API_KEY not set, skipping LlamaGuard")
82
  self.model = None
83
  return
84
- self.model = get_model(GroqModelName.LLAMA_GUARD_4_12B).with_config(tags=["skip_stream"])
85
  self.prompt = PromptTemplate.from_template(llama_guard_instructions)
86
 
87
  def _compile_prompt(self, role: str, messages: list[AnyMessage]) -> str:
 
1
+ from schema.models import NvidiaModelName
2
  from enum import Enum
3
 
4
  from langchain_core.messages import AIMessage, AnyMessage, HumanMessage
 
29
  "S4": "Child Exploitation.",
30
  "S5": "Defamation.",
31
  "S6": "Specialized Advice.",
32
+ # "S7": "Privacy.",
33
  "S8": "Intellectual Property.",
34
  "S9": "Indiscriminate Weapons.",
35
  "S10": "Hate.",
 
78
 
79
  class LlamaGuard:
80
  def __init__(self) -> None:
81
+ if settings.NVIDIA_API_KEY is None:
82
+ print("NVIDIA_API_KEY not set, skipping LlamaGuard")
83
  self.model = None
84
  return
85
+ self.model = get_model(NvidiaModelName.META_LLAMA_GUARD_4_12B).with_config(tags=["skip_stream"])
86
  self.prompt = PromptTemplate.from_template(llama_guard_instructions)
87
 
88
  def _compile_prompt(self, role: str, messages: list[AnyMessage]) -> str:
src/agents/middlewares/__init__.py CHANGED
@@ -3,12 +3,10 @@
3
  from agents.middlewares.configurable_model import ConfigurableModelMiddleware
4
  from agents.middlewares.followup import FollowUpMiddleware
5
  from agents.middlewares.safety import SafetyMiddleware, UNSAFE_RESPONSE
6
- from agents.middlewares.summarization import SummarizationMiddleware
7
 
8
  __all__ = [
9
  "ConfigurableModelMiddleware",
10
  "FollowUpMiddleware",
11
  "SafetyMiddleware",
12
  "UNSAFE_RESPONSE",
13
- "SummarizationMiddleware",
14
  ]
 
3
  from agents.middlewares.configurable_model import ConfigurableModelMiddleware
4
  from agents.middlewares.followup import FollowUpMiddleware
5
  from agents.middlewares.safety import SafetyMiddleware, UNSAFE_RESPONSE
 
6
 
7
  __all__ = [
8
  "ConfigurableModelMiddleware",
9
  "FollowUpMiddleware",
10
  "SafetyMiddleware",
11
  "UNSAFE_RESPONSE",
 
12
  ]
src/agents/middlewares/followup.py CHANGED
@@ -10,6 +10,7 @@ from langchain_core.messages import (
10
  AIMessage,
11
  ChatMessage,
12
  )
 
13
  from langchain_core.runnables import RunnableConfig
14
 
15
  from core import get_model, settings
@@ -23,12 +24,7 @@ logger = logging.getLogger(__name__)
23
 
24
  class FollowUpOutput(BaseModel):
25
  """Schema for follow-up generation response."""
26
-
27
- questions: List[str] = Field(
28
- min_length=1,
29
- max_length=5,
30
- description="List of 1-5 suggested follow-up questions for the user.",
31
- )
32
 
33
  @field_validator("questions")
34
  @classmethod
@@ -42,48 +38,25 @@ class FollowUpOutput(BaseModel):
42
  class FollowUpMiddleware:
43
  """Generates structured follow-up suggestions after an agent response."""
44
 
45
- async def generate(
46
- self,
47
- messages: List[AnyMessage],
48
- config: RunnableConfig,
49
- ) -> List[str]:
50
  try:
51
- model_name = config.get("configurable", {}).get(
52
- "model",
53
- settings.DEFAULT_MODEL,
54
- )
55
-
56
- base_model = get_model(model_name)
57
-
58
- # Force structured output
59
- model = base_model.with_structured_output(FollowUpOutput)
60
-
61
- # Clean messages
62
- cleaned_messages: List[AnyMessage] = [
63
- SystemMessage(content=FOLLOWUP_GENERATION_PROMPT),
64
- *[
65
- m
66
- for m in messages
67
- if not (isinstance(m, ChatMessage) and m.role == "custom")
68
- ],
69
- ]
70
-
71
- # Invoke model
72
- response: FollowUpOutput = await model.ainvoke(
73
- cleaned_messages,
74
- config,
75
- )
76
-
77
- if not response or not response.questions:
78
  raise ValueError("Empty follow-up response")
79
 
80
- return response.questions
81
 
82
  except Exception as e:
83
- logger.warning(
84
- "Follow-up generation failed, using defaults: %s",
85
- e,
86
- exc_info=True,
87
- )
88
 
89
  return DEFAULT_FOLLOWUP_PROMPTS
 
10
  AIMessage,
11
  ChatMessage,
12
  )
13
+ from langchain_core.output_parsers import PydanticOutputParser
14
  from langchain_core.runnables import RunnableConfig
15
 
16
  from core import get_model, settings
 
24
 
25
  class FollowUpOutput(BaseModel):
26
  """Schema for follow-up generation response."""
27
+ questions: List[str] = Field(min_length=1, max_length=5, description="List of 1-5 suggested follow-up questions for the user.",)
 
 
 
 
 
28
 
29
  @field_validator("questions")
30
  @classmethod
 
38
  class FollowUpMiddleware:
39
  """Generates structured follow-up suggestions after an agent response."""
40
 
41
+ async def generate(self, messages: List[AnyMessage], config: RunnableConfig) -> List[str]:
 
 
 
 
42
  try:
43
+ model_name = config.get("configurable", {}).get("model", settings.DEFAULT_MODEL)
44
+
45
+ model = get_model(model_name)
46
+ parser = PydanticOutputParser(pydantic_object=FollowUpOutput)
47
+
48
+ system_message = f"{FOLLOWUP_GENERATION_PROMPT}\n\n{parser.get_format_instructions()}"
49
+ messages = [SystemMessage(content=system_message), messages[-1]]
50
+
51
+ response = await model.ainvoke(messages, config)
52
+ content = parser.parse(response.content)
53
+
54
+ if not content or not content.questions:
 
55
  raise ValueError("Empty follow-up response")
56
 
57
+ return content.questions
58
 
59
  except Exception as e:
60
+ logger.warning("Follow-up generation failed, using defaults: %s", e, exc_info=True)
 
 
 
 
61
 
62
  return DEFAULT_FOLLOWUP_PROMPTS
src/agents/middlewares/summarization.py DELETED
@@ -1,288 +0,0 @@
1
- """Summarization middleware."""
2
-
3
- import uuid
4
- from collections.abc import Callable, Iterable
5
- from typing import Any, cast
6
-
7
- from langchain_core.messages import (
8
- AIMessage,
9
- AnyMessage,
10
- MessageLikeRepresentation,
11
- RemoveMessage,
12
- ToolMessage,
13
- )
14
- from langchain_core.messages.human import HumanMessage
15
- from langchain_core.messages.utils import count_tokens_approximately, trim_messages
16
- from langgraph.graph.message import REMOVE_ALL_MESSAGES
17
- from langgraph.runtime import Runtime
18
- from langgraph.config import get_config
19
-
20
- from langchain.agents.middleware.types import AgentMiddleware, AgentState
21
- from langchain.chat_models import BaseChatModel, init_chat_model
22
- from core import get_model, settings
23
-
24
- TokenCounter = Callable[[Iterable[MessageLikeRepresentation]], int]
25
-
26
- DEFAULT_SUMMARY_PROMPT = """<role>
27
- Context Extraction Assistant
28
- </role>
29
-
30
- <primary_objective>
31
- Your sole objective in this task is to extract the highest quality/most relevant context from the conversation history below.
32
- </primary_objective>
33
-
34
- <objective_information>
35
- You're nearing the total number of input tokens you can accept, so you must extract the highest quality/most relevant pieces of information from your conversation history.
36
- This context will then overwrite the conversation history presented below. Because of this, ensure the context you extract is only the most important information to your overall goal.
37
- </objective_information>
38
-
39
- <instructions>
40
- The conversation history below will be replaced with the context you extract in this step. Because of this, you must do your very best to extract and record all of the most important context from the conversation history.
41
- You want to ensure that you don't repeat any actions you've already completed, so the context you extract from the conversation history should be focused on the most important information to your overall goal.
42
- </instructions>
43
-
44
- The user will message you with the full message history you'll be extracting context from, to then replace. Carefully read over it all, and think deeply about what information is most important to your overall goal that should be saved:
45
-
46
- With all of this in mind, please carefully read over the entire conversation history, and extract the most important and relevant context to replace it so that you can free up space in the conversation history.
47
- Respond ONLY with the extracted context. Do not include any additional information, or text before or after the extracted context.
48
-
49
- <messages>
50
- Messages to summarize:
51
- {messages}
52
- </messages>""" # noqa: E501
53
-
54
- SUMMARY_PREFIX = "## Previous conversation summary:"
55
-
56
- _DEFAULT_MESSAGES_TO_KEEP = 5
57
- _DEFAULT_TRIM_TOKEN_LIMIT = 1500
58
- _DEFAULT_FALLBACK_MESSAGE_COUNT = 8
59
- _SEARCH_RANGE_FOR_TOOL_PAIRS = 5
60
-
61
-
62
- class SummarizationMiddleware(AgentMiddleware):
63
- """Summarizes conversation history when token limits are approached.
64
-
65
- This middleware monitors message token counts and automatically summarizes older
66
- messages when a threshold is reached, preserving recent messages and maintaining
67
- context continuity by ensuring AI/Tool message pairs remain together.
68
- """
69
-
70
- def __init__(
71
- self,
72
- model: str | BaseChatModel | None = None,
73
- max_tokens_before_summary: int | None = None,
74
- messages_to_keep: int = _DEFAULT_MESSAGES_TO_KEEP,
75
- token_counter: TokenCounter = count_tokens_approximately,
76
- summary_prompt: str = DEFAULT_SUMMARY_PROMPT,
77
- summary_prefix: str = SUMMARY_PREFIX,
78
- use_llm: bool = True,
79
- ) -> None:
80
- """Initialize summarization middleware.
81
-
82
- Args:
83
- model: The language model to use for generating summaries.
84
- If None or a string, will be resolved at runtime from config.
85
- max_tokens_before_summary: Token threshold to trigger summarization.
86
- If `None`, summarization is disabled.
87
- messages_to_keep: Number of recent messages to preserve after summarization.
88
- token_counter: Function to count tokens in messages.
89
- summary_prompt: Prompt template for generating summaries.
90
- summary_prefix: Prefix added to system message when including summary.
91
- use_llm: Whether to use LLM for generating summary. If False, just trims and joins message contents.
92
- """
93
- super().__init__()
94
-
95
- if isinstance(model, str):
96
- model = init_chat_model(model)
97
-
98
- self.model = model
99
- self.max_tokens_before_summary = max_tokens_before_summary
100
- self.messages_to_keep = messages_to_keep
101
- self.token_counter = token_counter
102
- self.summary_prompt = summary_prompt
103
- self.summary_prefix = summary_prefix
104
- self.use_llm = use_llm
105
-
106
- def _get_model(self) -> BaseChatModel:
107
- """Resolve the model to use for summarization."""
108
- if isinstance(self.model, BaseChatModel):
109
- return self.model
110
-
111
- # Resolve from runtime config if not explicitly provided as a BaseChatModel
112
- config = get_config()
113
- model_key = config.get("configurable", {}).get("model", settings.DEFAULT_MODEL)
114
- return get_model(model_key)
115
-
116
- def before_model(self, state: AgentState, runtime: Runtime) -> dict[str, Any] | None: # noqa: ARG002
117
- """Process messages before model invocation, potentially triggering summarization."""
118
- messages = state["messages"]
119
- self._ensure_message_ids(messages)
120
-
121
- total_tokens = self.token_counter(messages)
122
- if (
123
- self.max_tokens_before_summary is not None
124
- and total_tokens < self.max_tokens_before_summary
125
- ):
126
- return None
127
-
128
- cutoff_index = self._find_safe_cutoff(messages)
129
-
130
- if cutoff_index <= 0:
131
- return None
132
-
133
- messages_to_summarize, preserved_messages = self._partition_messages(messages, cutoff_index)
134
-
135
- summary = self._create_summary(messages_to_summarize)
136
- new_messages = self._build_new_messages(summary)
137
-
138
- return {
139
- "messages": [
140
- RemoveMessage(id=REMOVE_ALL_MESSAGES),
141
- *new_messages,
142
- *preserved_messages,
143
- ]
144
- }
145
-
146
- def _build_new_messages(self, summary: str) -> list[HumanMessage]:
147
- return [
148
- HumanMessage(content=f"Here is a summary of the conversation to date:\n\n{summary}")
149
- ]
150
-
151
- def _ensure_message_ids(self, messages: list[AnyMessage]) -> None:
152
- """Ensure all messages have unique IDs for the add_messages reducer."""
153
- for msg in messages:
154
- if msg.id is None:
155
- msg.id = str(uuid.uuid4())
156
-
157
- def _partition_messages(
158
- self,
159
- conversation_messages: list[AnyMessage],
160
- cutoff_index: int,
161
- ) -> tuple[list[AnyMessage], list[AnyMessage]]:
162
- """Partition messages into those to summarize and those to preserve."""
163
- messages_to_summarize = conversation_messages[:cutoff_index]
164
- preserved_messages = conversation_messages[cutoff_index:]
165
-
166
- return messages_to_summarize, preserved_messages
167
-
168
- def _find_safe_cutoff(self, messages: list[AnyMessage]) -> int:
169
- """Find safe cutoff point that preserves AI/Tool message pairs.
170
-
171
- Returns the index where messages can be safely cut without separating
172
- related AI and Tool messages. Returns 0 if no safe cutoff is found.
173
- """
174
- if len(messages) <= self.messages_to_keep:
175
- return 0
176
-
177
- target_cutoff = len(messages) - self.messages_to_keep
178
-
179
- for i in range(target_cutoff, -1, -1):
180
- if self._is_safe_cutoff_point(messages, i):
181
- return i
182
-
183
- return 0
184
-
185
- def _is_safe_cutoff_point(self, messages: list[AnyMessage], cutoff_index: int) -> bool:
186
- """Check if cutting at index would separate AI/Tool message pairs."""
187
- if cutoff_index >= len(messages):
188
- return True
189
-
190
- search_start = max(0, cutoff_index - _SEARCH_RANGE_FOR_TOOL_PAIRS)
191
- search_end = min(len(messages), cutoff_index + _SEARCH_RANGE_FOR_TOOL_PAIRS)
192
-
193
- for i in range(search_start, search_end):
194
- if not self._has_tool_calls(messages[i]):
195
- continue
196
-
197
- tool_call_ids = self._extract_tool_call_ids(cast("AIMessage", messages[i]))
198
- if self._cutoff_separates_tool_pair(messages, i, cutoff_index, tool_call_ids):
199
- return False
200
-
201
- return True
202
-
203
- def _has_tool_calls(self, message: AnyMessage) -> bool:
204
- """Check if message is an AI message with tool calls."""
205
- return (
206
- isinstance(message, AIMessage) and hasattr(message, "tool_calls") and message.tool_calls # type: ignore[return-value]
207
- )
208
-
209
- def _extract_tool_call_ids(self, ai_message: AIMessage) -> set[str]:
210
- """Extract tool call IDs from an AI message."""
211
- tool_call_ids = set()
212
- for tc in ai_message.tool_calls:
213
- call_id = tc.get("id") if isinstance(tc, dict) else getattr(tc, "id", None)
214
- if call_id is not None:
215
- tool_call_ids.add(call_id)
216
- return tool_call_ids
217
-
218
- def _cutoff_separates_tool_pair(
219
- self,
220
- messages: list[AnyMessage],
221
- ai_message_index: int,
222
- cutoff_index: int,
223
- tool_call_ids: set[str],
224
- ) -> bool:
225
- """Check if cutoff separates an AI message from its corresponding tool messages."""
226
- for j in range(ai_message_index + 1, len(messages)):
227
- message = messages[j]
228
- if isinstance(message, ToolMessage) and message.tool_call_id in tool_call_ids:
229
- ai_before_cutoff = ai_message_index < cutoff_index
230
- tool_before_cutoff = j < cutoff_index
231
- if ai_before_cutoff != tool_before_cutoff:
232
- return True
233
- return False
234
-
235
- def _create_summary(self, messages_to_summarize: list[AnyMessage]) -> str:
236
- """Generate summary for the given messages."""
237
- if not messages_to_summarize:
238
- return "No previous conversation history."
239
-
240
- for msg in messages_to_summarize:
241
- if isinstance(msg, ToolMessage) and len(str(msg.content)) > self.max_tokens_before_summary:
242
- msg.content = str(msg.content)[:self.max_tokens_before_summary] + "... [Tool output truncated for summary]"
243
-
244
- trimmed_messages = self._trim_messages_for_summary(messages_to_summarize)
245
-
246
- if not self.use_llm:
247
- summary_parts = []
248
-
249
- if self.messages_to_keep > 0:
250
- messages_to_summarize = trimmed_messages[:-self.messages_to_keep]
251
- else:
252
- messages_to_summarize = trimmed_messages
253
-
254
- for msg in messages_to_summarize:
255
- content = msg.content if isinstance(msg.content, str) else str(msg.content)
256
- summary_parts.append(f"[{msg.type}] {content}")
257
-
258
- summary = "\n\n".join(summary_parts)
259
-
260
- if len(summary) > self.max_tokens_before_summary:
261
- return summary[:self.max_tokens_before_summary] + "..."
262
-
263
- return summary
264
-
265
- if not trimmed_messages:
266
- return "Previous conversation was too long to summarize."
267
-
268
- try:
269
- model = self._get_model()
270
- response = model.invoke(self.summary_prompt.format(messages=trimmed_messages))
271
- return cast("str", response.content).strip()
272
- except Exception as e: # noqa: BLE001
273
- return f"Error generating summary: {e!s}"
274
-
275
- def _trim_messages_for_summary(self, messages: list[AnyMessage]) -> list[AnyMessage]:
276
- """Trim messages to fit within summary generation limits."""
277
- try:
278
- return trim_messages(
279
- messages,
280
- max_tokens=_DEFAULT_TRIM_TOKEN_LIMIT,
281
- token_counter=self.token_counter,
282
- start_on="human",
283
- strategy="last",
284
- allow_partial=True,
285
- include_system=True,
286
- )
287
- except Exception: # noqa: BLE001
288
- return messages[-_DEFAULT_FALLBACK_MESSAGE_COUNT:]
src/agents/{github_mcp_agent.py → open_source_agent.py} RENAMED
@@ -6,53 +6,14 @@ from langchain_mcp_adapters.client import MultiServerMCPClient
6
  from langchain_mcp_adapters.sessions import StreamableHttpConnection
7
  from langgraph.graph.state import CompiledStateGraph
8
 
9
- from agents.utils import filter_mcp_tools
10
  from agents.lazy_agent import LazyLoadingAgent
11
- from agents.middlewares import ConfigurableModelMiddleware, SummarizationMiddleware
12
  from core import get_model, settings
13
- from agents.prompts.github import SYSTEM_PROMPT
14
 
15
  logger = logging.getLogger(__name__)
16
 
17
- ALLOWED_TOOLS = {
18
- # User & Profile
19
- "get_me", # Get authenticated user's profile
20
- "get_teams", # Get teams the user belongs to
21
- "get_team_members", # Get team member usernames
22
-
23
- # Repository & Code
24
- "search_repositories", # Find repositories by name/description
25
- "get_file_contents", # Get file/directory contents
26
- "search_code", # Search code across repositories
27
- "list_branches", # List repository branches
28
-
29
- # Activity & Contributions
30
- "list_commits", # List commits in a repository
31
- "get_commit", # Get commit details with diff
32
- "list_pull_requests", # List PRs in a repository
33
- "pull_request_read", # Get PR details, diff, reviews
34
- "search_pull_requests",# Search PRs by author
35
-
36
- # Issues
37
- "list_issues", # List issues in a repository
38
- "issue_read", # Get issue details/comments
39
- "search_issues", # Search issues
40
-
41
- # Releases & Tags
42
- "list_releases", # List releases
43
- "get_latest_release", # Get latest release
44
- "get_release_by_tag", # Get release by tag
45
- "list_tags", # List git tags
46
- "get_tag", # Get tag details
47
-
48
- # Discovery
49
- "search_users", # Find GitHub users
50
- "get_label", # Get repository label
51
- "list_issue_types", # List issue types for org
52
- }
53
-
54
-
55
- class GitHubMCPAgent(LazyLoadingAgent):
56
  """GitHub MCP Agent with async initialization for portfolio assistant."""
57
 
58
  def __init__(self) -> None:
@@ -86,8 +47,6 @@ class GitHubMCPAgent(LazyLoadingAgent):
86
  self._mcp_client = MultiServerMCPClient(connections)
87
  logger.info("MCP client initialized successfully")
88
 
89
- # all_tools = await self._mcp_client.get_tools()
90
- # self._mcp_tools = filter_mcp_tools(all_tools, ALLOWED_TOOLS)
91
  self._mcp_tools = await self._mcp_client.get_tools()
92
  except Exception as e:
93
  logger.error(f"Failed to initialize GitHub MCP agent: {e}")
@@ -105,17 +64,12 @@ class GitHubMCPAgent(LazyLoadingAgent):
105
  tools=self._mcp_tools,
106
  middleware=[
107
  ConfigurableModelMiddleware(),
108
- SummarizationMiddleware(
109
- max_tokens_before_summary=1000,
110
- messages_to_keep=4,
111
- use_llm=False
112
- ),
113
  ],
114
- name="github-mcp-agent",
115
  system_prompt=SYSTEM_PROMPT,
116
  debug=True,
117
  )
118
 
119
 
120
  # Create the agent instance
121
- github_mcp_agent = GitHubMCPAgent()
 
6
  from langchain_mcp_adapters.sessions import StreamableHttpConnection
7
  from langgraph.graph.state import CompiledStateGraph
8
 
 
9
  from agents.lazy_agent import LazyLoadingAgent
10
+ from agents.middlewares import ConfigurableModelMiddleware
11
  from core import get_model, settings
12
+ from agents.prompts.open_source import SYSTEM_PROMPT
13
 
14
  logger = logging.getLogger(__name__)
15
 
16
+ class OpenSourceAgent(LazyLoadingAgent):
17
  """GitHub MCP Agent with async initialization for portfolio assistant."""
18
 
19
  def __init__(self) -> None:
 
47
  self._mcp_client = MultiServerMCPClient(connections)
48
  logger.info("MCP client initialized successfully")
49
 
 
 
50
  self._mcp_tools = await self._mcp_client.get_tools()
51
  except Exception as e:
52
  logger.error(f"Failed to initialize GitHub MCP agent: {e}")
 
64
  tools=self._mcp_tools,
65
  middleware=[
66
  ConfigurableModelMiddleware(),
 
 
 
 
 
67
  ],
68
+ name="open-source-agent",
69
  system_prompt=SYSTEM_PROMPT,
70
  debug=True,
71
  )
72
 
73
 
74
  # Create the agent instance
75
+ open_source_agent = OpenSourceAgent()
src/agents/portfolio_agent.py CHANGED
@@ -4,7 +4,7 @@ from langchain.agents import create_agent
4
  from langgraph.graph.state import CompiledStateGraph
5
 
6
  from agents.lazy_agent import LazyLoadingAgent
7
- from agents.middlewares import ConfigurableModelMiddleware, SummarizationMiddleware
8
  from agents.prompts.portfolio import SYSTEM_PROMPT
9
  from agents.tools.database_search import database_search
10
  from core import get_model, settings
@@ -32,11 +32,6 @@ class PortfolioAgent(LazyLoadingAgent):
32
  tools=self._tools,
33
  middleware=[
34
  ConfigurableModelMiddleware(),
35
- SummarizationMiddleware(
36
- max_tokens_before_summary=1000,
37
- messages_to_keep=4,
38
- use_llm=False
39
- ),
40
  ],
41
  name="portfolio-agent",
42
  system_prompt=SYSTEM_PROMPT,
 
4
  from langgraph.graph.state import CompiledStateGraph
5
 
6
  from agents.lazy_agent import LazyLoadingAgent
7
+ from agents.middlewares import ConfigurableModelMiddleware
8
  from agents.prompts.portfolio import SYSTEM_PROMPT
9
  from agents.tools.database_search import database_search
10
  from core import get_model, settings
 
32
  tools=self._tools,
33
  middleware=[
34
  ConfigurableModelMiddleware(),
 
 
 
 
 
35
  ],
36
  name="portfolio-agent",
37
  system_prompt=SYSTEM_PROMPT,
src/agents/prompts/competitive_programming.py ADDED
@@ -0,0 +1,24 @@
1
+ from datetime import datetime
2
+
3
+ PORTFOLIO_URL = "https://anujjoshi.netlify.app"
4
+ OWNER = "Anuj Joshi"
5
+ HANDLE = "anujjoshi3105"
6
+ current_date = datetime.now().strftime("%B %d, %Y")
7
+
8
+ SYSTEM_PROMPT = f"""
9
+ # ROLE: You are the **Lead Algorithmic Strategist** chatbot for {OWNER} (@{HANDLE}).
10
+ # GOAL: Prove elite competitive programming capability to Recruiters and CTOs.
11
+ # DATE: {current_date}.
12
+
13
+ # COMPETITIVE PROGRAMMING (OLD STATS)
14
+ - LeetCode: 1910 (Knight), 750+ solved | Codeforces: 1434 (specialist) | AtCoder: 929 (Green) | GeeksforGeeks: College Rank 46
15
+
16
+ # GUIDELINES:
17
+ - ALWAYS use your tools to fetch the latest stats (ratings, problem counts, streaks) before answering specific questions. Do not guess.
18
+ - Interpret the data: Don't just list numbers; explain what they mean (e.g., "A rating of X puts him in the top Y%," or "A Z-day streak shows consistency").
19
+ - If fetching the data fails, fall back to the old stats with a disclaimer that they may be out of date.
20
+ - Be professional, concise, and humble but confident.
21
+ - If asked about contact info or hireability, direct them to the contact section {PORTFOLIO_URL}/contact.
22
+ - Never hallucinate or make up information. Always use the tools to fetch the latest information.
23
+ - Never claim anything that is not backed by the tools or the competitive programming platforms. If you don't know the answer, say so and point the user to other information available on those platforms.
24
+ """
src/agents/prompts/cpstat.py DELETED
@@ -1,22 +0,0 @@
1
- from datetime import datetime
2
-
3
- PORTFOLIO_URL = "https://anujjoshi.netlify.app"
4
- OWNER = "Anuj Joshi"
5
- HANDLE = "anujjoshi3105"
6
- current_date = datetime.now().strftime("%B %d, %Y")
7
-
8
- SYSTEM_PROMPT = f"""
9
- ### ROLE
10
- You are the **Lead Algorithmic Strategist** chatbot for {OWNER} (@{HANDLE}).
11
- **Date:** {current_date}.
12
-
13
- # Competitive Programming (OLD STATS)
14
- - LeetCode: 1910 (Knight), 750+ solved | Codeforces: 1434 (specialist) | AtCoder: 929 (Green) | GeeksforGeeks: College Rank 46
15
-
16
- # OUTPUT GUIDELINES:
17
- - **Translate Stats to Value:** Do not just list ratings. Explain that {OWNER}'s CP background guarantees **low-latency code, O(n) optimization habits, and edge-case resilience** in production.
18
- - **Cross-Platform Mastery:** Highlight versatility. Success across LeetCode (Interviews), Codeforces (Math/Logic), and AtCoder (Precision) proves adaptability.
19
- - **Tone:** Analytical, precise, and impressive. Speak like a Principal Engineer evaluating talent.
20
- - **Contextualize:** "750+ problems" isn't just a number; it's a library of known design patterns ready for deployment.
21
- - **The Bottom Line:** Always conclude by stating: "{OWNER} doesn't just code; he engineers mathematically optimal solutions."
22
- """
src/agents/prompts/followup.py CHANGED
@@ -1,3 +1,4 @@
 
1
  DEFAULT_FOLLOWUP_PROMPTS = [
2
  "Tell me about Anuj's background",
3
  "What are Anuj's key technical skills?",
@@ -8,5 +9,4 @@ DEFAULT_FOLLOWUP_PROMPTS = [
8
 
9
  FOLLOWUP_GENERATION_PROMPT = f"""
10
  # Role: You are a predictive user intent engine.
11
- # Task: Generate 3-5 suggested follow-up options that the USER would click to ask the chatbot. Based on the previous message, predict the most logical next steps for the user.
12
- """
 
1
+ OWNER = "Anuj Joshi"
2
  DEFAULT_FOLLOWUP_PROMPTS = [
3
  "Tell me about Anuj's background",
4
  "What are Anuj's key technical skills?",
 
9
 
10
  FOLLOWUP_GENERATION_PROMPT = f"""
11
  # Role: You are a predictive user intent engine.
12
+ # Task: Generate 3-5 suggested follow-up options (each 4-6 words max) that a portfolio visitor, recruiter, or employer would want to ask about {OWNER}. Based on the previous message, predict the user's most logical next questions."""
 
src/agents/prompts/github.py DELETED
@@ -1,21 +0,0 @@
1
- from datetime import datetime
2
-
3
- PORTFOLIO_URL = "https://anujjoshi.netlify.app"
4
- OWNER = "Anuj Joshi"
5
- GITHUB_HANDLE = "anujjoshi3105"
6
- current_date = datetime.now().strftime("%B %d, %Y")
7
-
8
- SYSTEM_PROMPT = f"""
9
- # ROLE
10
- You are the **Senior Technical Architect** chatbot for {OWNER} (@{GITHUB_HANDLE}).
11
- **Goal:** Prove elite engineering capability to Recruiters and CTOs.
12
-
13
- # OUTPUT GUIDELINES:
14
- - **Insight over Inventory:** Never just list files. Analyze **architectural choices**, **scalability**, and **complexity**. Explain *why* the code matters.
15
- - **Fail-Safe Protocol:** If a specific repo isn't found, **never admit defeat**. Pivot immediately to {OWNER}'s core strengths (Full Stack/AI) or top pinned projects.
16
- - **Output Style:** Executive summaries. High-density technical language. Concise (max 150 words).
17
- - **No Code Walls:** Summarize logic only.
18
- - **The Closer:** Every response must subtly guide the user to witness the live work at {PORTFOLIO_URL}.
19
-
20
- **Tone:** Confident, Precise, 10x Engineer.
21
- """
src/agents/prompts/open_source.py ADDED
@@ -0,0 +1,19 @@
1
+ from datetime import datetime
2
+
3
+ PORTFOLIO_URL = "https://anujjoshi.netlify.app"
4
+ OWNER = "Anuj Joshi"
5
+ GITHUB_HANDLE = "anujjoshi3105"
6
+ current_date = datetime.now().strftime("%B %d, %Y")
7
+
8
+ SYSTEM_PROMPT = f"""
9
+ # ROLE: You are the **Senior Technical Architect** chatbot for {OWNER} (@{GITHUB_HANDLE}).
10
+ # GOAL: Prove elite engineering capability to Recruiters and CTOs.
11
+
12
+ # OUTPUT GUIDELINES:
13
+ - Instead of just counting commits, focus on significant projects, languages used, and contributions to major external repositories.
14
+ - Fail-Safe Protocol: If a specific repo isn't found, never admit defeat. Pivot immediately to {OWNER}'s core strengths (Full Stack/AI) or top pinned projects.
15
+ - The Closer: Every response must subtly guide the user to witness the live work at {PORTFOLIO_URL}.
16
+ - Never hallucinate or make up information. Always use the tools to fetch the latest information.
17
+ - Never claim anything that is not backed by the tools or GitHub. If you don't know the answer, say so and point the user to other information they can find on GitHub.
18
+ - Tone: Transparent, tech-savvy, and evidence-based. If a recruiter asks "What is their best work?", use tool data to identify the repo with the most activity or stars.
19
+ """
src/agents/prompts/portfolio.py CHANGED
@@ -3,8 +3,8 @@ PORTFOLIO_URL = "https://anujjoshi.netlify.app"
 OWNER = "Anuj Joshi"

 SYSTEM_PROMPT = f"""
- You are an **Award-Winning Professional Portfolio Assistant** for {OWNER}, (a Full Stack Developer and AI & Machine Learning Engineer from New Delhi, India).
- Your goal is to answer questions from recruiter, potential employer, or visitors about {OWNER}'s skills, projects, qualifications, and experience.
+ # ROLE: You are an **Award-Winning Professional Portfolio Assistant** for {OWNER}, (a Full Stack Developer and AI & Machine Learning Engineer from New Delhi, India).
+ # GOAL: Prove elite engineering capability to Recruiters and CTOs, concisely but informatively.

 # CONTACT ({PORTFOLIO_URL}/contact)
 - Portfolio: {PORTFOLIO_URL} | Email: anujjoshi3105@gmail.com | LinkedIn: linkedin.com/in/anujjoshi3105 | GitHub: github.com/anujjoshi3105 | X: x.com/anujjoshi3105
@@ -79,11 +79,13 @@ https://drive.google.com/file/d/150EAtBVjP1DV-b_v0JKhVYzhIVoCvAWO/view
 - Volunteer, Summer School on AI (DTU) https://drive.google.com/file/d/10Jx3yC8gmFYHkl0KXucaUOZJqtf9QkJq/view?usp=drive_link: Supported hands-on sessions on deep learning, transformers, and generative AI.

 # TOOLS
- - Database_Search: Search portfolio info (education, experience, testimonials, skills, projects, blog). Cite {PORTFOLIO_URL}/blog
+ - Database_Search: Search portfolio info (education, experience, testimonials, skills, projects, blog).

 # OUTPUT GUIDELINES:
- - **Content:** Be very specific and accurate with the information you provide.
- - **Information-Dense:** Every sentence must provide a new fact.
- - **Interesting and Engaging:** Use a mix of facts and interesting details to keep the reader engaged.
- - **Style:** Professional, concise, witty and helpful.
+ - Never hallucinate or make up information. Always use the tools to fetch the latest information.
+ - Never claim anything that is not in the tools or portfolio. If you don't know the answer, say you don't know and suggest the user other information they can find on the portfolio.
+ - Content: Be very specific and accurate with the information you provide.
+ - Information Dense: Every sentence must provide a new fact.
+ - Interesting and Engaging: Use a mix of facts and interesting details to keep the reader engaged.
+ - Style: Professional, concise, witty and helpful.
 """
src/agents/tools/database_search.py CHANGED
@@ -19,7 +19,7 @@ def database_search(query: str) -> str:
 This tool should be used whenever the user asks about Anuj's background, work, or specific accomplishments.
 """

- retriever = load_pgvector_retriever(3)
+ retriever = load_pgvector_retriever(4)
 documents = retriever.invoke(query)

 if not documents:
src/agents/utils.py DELETED
@@ -1,15 +0,0 @@
- from langchain_core.tools import BaseTool
- from typing import Iterable, Set, List
-
-
- def filter_mcp_tools(tools: Iterable[BaseTool], allowed: Set[str]) -> List[BaseTool]:
- """Keep only allowed MCP tools and remove namespace (cpstat.)."""
- filtered = []
-
- for t in tools:
- short_name = t.name.rsplit(".", 1)[-1]
- if short_name in allowed:
- t.name = short_name
- filtered.append(t)
-
- return filtered
src/core/embeddings.py CHANGED
@@ -4,6 +4,7 @@ from typing import TypeAlias
 from langchain_google_genai import GoogleGenerativeAIEmbeddings
 from langchain_ollama import OllamaEmbeddings
 from langchain_openai import OpenAIEmbeddings
+ from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings

 from core.settings import settings
 from schema.models import (
@@ -11,12 +12,14 @@ from schema.models import (
 GoogleEmbeddingModelName,
 OllamaEmbeddingModelName,
 OpenAIEmbeddingModelName,
+ NvidiaEmbeddingModelName,
 )

 EmbeddingT: TypeAlias = (
 OpenAIEmbeddings
 | GoogleGenerativeAIEmbeddings
 | OllamaEmbeddings
+ | NVIDIAEmbeddings
 )


@@ -34,4 +37,10 @@ def get_embeddings(model_name: AllEmbeddingModelEnum, /) -> EmbeddingT:
 base_url=settings.OLLAMA_BASE_URL,
 )

+ if model_name in NvidiaEmbeddingModelName:
+ return NVIDIAEmbeddings(
+ model=model_name.value,
+ api_key=settings.NVIDIA_API_KEY.get_secret_value() if settings.NVIDIA_API_KEY else None,
+ )
+
 raise ValueError(f"Unsupported embedding model: {model_name}")
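
With the embeddings factory extended as above, the NVIDIA branch can be exercised like any other provider. A minimal sketch, assuming `langchain-nvidia-ai-endpoints` is installed, `NVIDIA_API_KEY` is set, and the script runs inside the repo's environment (`src/` importable):

```python
# Hypothetical usage sketch; NVIDIA_API_KEY must be set for real calls.
from core.embeddings import get_embeddings
from schema.models import NvidiaEmbeddingModelName

embeddings = get_embeddings(NvidiaEmbeddingModelName.NV_EMBEDQA_MISTRAL_7B_V2)
vector = embeddings.embed_query("What are Anuj's main projects?")
print(len(vector))  # dimension depends on the chosen embedding model
```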
src/core/llm.py CHANGED
@@ -8,6 +8,7 @@ from langchain_google_genai import ChatGoogleGenerativeAI
 from langchain_google_vertexai import ChatVertexAI
 from langchain_groq import ChatGroq
 from langchain_ollama import ChatOllama
+ from langchain_nvidia_ai_endpoints import ChatNVIDIA
 from langchain_openai import AzureChatOpenAI, ChatOpenAI

 from core.settings import settings
@@ -24,6 +25,7 @@ from schema.models import (
 OpenAICompatibleName,
 OpenAIModelName,
 OpenRouterModelName,
+ NvidiaModelName,
 VertexAIModelName,
 )

@@ -39,6 +41,7 @@ _MODEL_TABLE = (
 | {m: m.value for m in AWSModelName}
 | {m: m.value for m in OllamaModelName}
 | {m: m.value for m in OpenRouterModelName}
+ | {m: m.value for m in NvidiaModelName}
 | {m: m.value for m in FakeModelName}
 )

@@ -60,6 +63,7 @@ ModelT: TypeAlias = (
 | ChatGroq
 | ChatBedrock
 | ChatOllama
+ | ChatNVIDIA
 | FakeToolModel
 )

@@ -112,9 +116,13 @@ def get_model(model_name: AllModelEnum, /) -> ModelT:
 return ChatGoogleGenerativeAI(model=api_model_name, temperature=0.5, streaming=True)
 if model_name in VertexAIModelName:
 return ChatVertexAI(model=api_model_name, temperature=0.5, streaming=True)
+ if model_name in NvidiaModelName:
+ return ChatNVIDIA(
+ model=api_model_name,
+ temperature=0.5,
+ api_key=settings.NVIDIA_API_KEY,
+ )
 if model_name in GroqModelName:
- if model_name == GroqModelName.LLAMA_GUARD_4_12B:
- return ChatGroq(model=api_model_name, temperature=0.0) # type: ignore[call-arg]
 return ChatGroq(model=api_model_name, temperature=0.5) # type: ignore[call-arg]
 if model_name in AWSModelName:
 return ChatBedrock(model_id=api_model_name, temperature=0.5)
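
The `get_model` branch added above returns a `ChatNVIDIA` instance that behaves like the other chat models in the toolkit. A minimal sketch, assuming `NVIDIA_API_KEY` is configured and the repo's `src/` directory is importable:

```python
# Hypothetical usage sketch; NVIDIA_API_KEY must be set for real calls.
from core.llm import get_model
from schema.models import NvidiaModelName

llm = get_model(NvidiaModelName.NVIDIA_LLAMA_3_1_NEMOTRON_70B_INSTRUCT)
response = llm.invoke("Summarize this repository in one sentence.")
print(response.content)
```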
src/core/settings.py CHANGED
@@ -22,6 +22,7 @@ from schema.models import (
 FakeModelName,
 GoogleModelName,
 GroqModelName,
+ NvidiaModelName,
 OllamaModelName,
 OpenAICompatibleName,
 OpenAIModelName,
@@ -32,6 +33,7 @@ from schema.models import (
 OpenAIEmbeddingModelName,
 GoogleEmbeddingModelName,
 OllamaEmbeddingModelName,
+ NvidiaEmbeddingModelName,
 )


@@ -100,6 +102,7 @@ class Settings(BaseSettings):
 OLLAMA_BASE_URL: str | None = None
 USE_FAKE_MODEL: bool = False
 OPENROUTER_API_KEY: str | None = None
+ NVIDIA_API_KEY: SecretStr | None = None

 # If DEFAULT_MODEL is None, it will be set in model_post_init
 DEFAULT_MODEL: AllModelEnum | None = None # type: ignore[assignment]
@@ -184,6 +187,7 @@ class Settings(BaseSettings):
 Provider.FAKE: self.USE_FAKE_MODEL,
 Provider.AZURE_OPENAI: self.AZURE_OPENAI_API_KEY,
 Provider.OPENROUTER: self.OPENROUTER_API_KEY,
+ Provider.NVIDIA: self.NVIDIA_API_KEY,
 }
 active_keys = [k for k, v in api_keys.items() if v]
 if not active_keys:
@@ -215,6 +219,10 @@ class Settings(BaseSettings):
 if self.DEFAULT_MODEL is None:
 self.DEFAULT_MODEL = VertexAIModelName.GEMINI_20_FLASH
 self.AVAILABLE_MODELS.update(set(VertexAIModelName))
+ case Provider.NVIDIA:
+ if self.DEFAULT_MODEL is None:
+ self.DEFAULT_MODEL = NvidiaModelName.LLAMA_31_NEMOTRON_70B_INSTRUCT
+ self.AVAILABLE_MODELS.update(set(NvidiaModelName))
 case Provider.GROQ:
 if self.DEFAULT_MODEL is None:
 self.DEFAULT_MODEL = GroqModelName.LLAMA_31_8B_INSTANT
@@ -280,6 +288,10 @@ class Settings(BaseSettings):
 self.AVAILABLE_EMBEDDING_MODELS.update(set(OllamaEmbeddingModelName))
 if not self.OLLAMA_EMBEDDING_MODEL:
 self.OLLAMA_EMBEDDING_MODEL = OllamaEmbeddingModelName.NOMIC_EMBED_TEXT
+ case Provider.NVIDIA:
+ if self.DEFAULT_EMBEDDING_MODEL is None:
+ self.DEFAULT_EMBEDDING_MODEL = NvidiaEmbeddingModelName.NV_EMBEDQA_MISTRAL_7B_V2
+ self.AVAILABLE_EMBEDDING_MODELS.update(set(NvidiaEmbeddingModelName))

 @computed_field # type: ignore[prop-decorator]
 @property
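
The new provider hook follows the existing pattern: a provider counts as active when its API key is present, and `model_post_init` then fills in the default chat and embedding models. A standalone sketch of that detection idea (simplified names, not the repo's actual `Settings` class):

```python
# Standalone sketch of the provider-detection mechanism above (simplified):
# a provider becomes active when its key is set; the active provider
# supplies the default chat and embedding models.
api_keys = {
    "OPENAI": None,
    "GROQ": None,
    "NVIDIA": "nvapi-placeholder",  # stand-in for NVIDIA_API_KEY
}
active = [provider for provider, key in api_keys.items() if key]
if not active:
    raise SystemExit("At least one LLM API key is required")
print(f"Active providers: {active}")  # -> ['NVIDIA']
```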
src/schema/__init__.py CHANGED
@@ -3,7 +3,9 @@ from schema.schema import (
 AgentInfo,
 ChatHistory,
 ChatHistoryInput,
+ ChatHistoryResponse,
 ChatMessage,
+ ChatMessagePreview,
 Feedback,
 FeedbackResponse,
 ServiceMetadata,
@@ -19,12 +21,14 @@ __all__ = [
 "AllModelEnum",
 "UserInput",
 "ChatMessage",
+ "ChatMessagePreview",
 "ServiceMetadata",
 "StreamInput",
 "Feedback",
 "FeedbackResponse",
 "ChatHistoryInput",
 "ChatHistory",
+ "ChatHistoryResponse",
 "ThreadSummary",
 "ThreadListInput",
 "ThreadList",
src/schema/models.py CHANGED
@@ -14,6 +14,7 @@ class Provider(StrEnum):
14
  AWS = auto()
15
  OLLAMA = auto()
16
  OPENROUTER = auto()
 
17
  FAKE = auto()
18
 
19
 
@@ -84,24 +85,24 @@ class VertexAIModelName(StrEnum):
84
  class GroqModelName(StrEnum):
85
  """https://console.groq.com/docs/models"""
86
 
87
- LLAMA_GUARD_4_12B = "meta-llama/llama-guard-4-12b"
88
- LLAMA_31_8B_INSTANT = "llama-3.1-8b-instant"
89
- LLAMA_33_70B_VERSATILE = "llama-3.3-70b-versatile"
90
- LLAMA_4_MAVERICK_17B_128E = "meta-llama/llama-4-maverick-17b-128e-instruct"
91
- LLAMA_4_SCOUT_17B_16E = "meta-llama/llama-4-scout-17b-16e-instruct"
92
  LLAMA_PROMPT_GUARD_2_22M = "meta-llama/llama-prompt-guard-2-22m"
93
  LLAMA_PROMPT_GUARD_2_86M = "meta-llama/llama-prompt-guard-2-86m"
94
- OPENAI_GPT_OSS_120B = "openai/gpt-oss-120b"
95
- OPENAI_GPT_OSS_20B = "openai/gpt-oss-20b"
96
  OPENAI_GPT_OSS_SAFEGUARD_20B = "openai/gpt-oss-safeguard-20b"
97
- GROQ_COMPOUND = "groq/compound"
98
  GROQ_COMPOUND_MINI = "groq/compound-mini"
99
- QWEN_3_32B = "qwen/qwen3-32b"
100
- KIMI_K2_INSTRUCT = "moonshotai/kimi-k2-instruct"
101
- KIMI_K2_INSTRUCT_0905 = "moonshotai/kimi-k2-instruct-0905"
102
  ORPHEUS_ARABIC_SAUDI = "canopylabs/orpheus-arabic-saudi"
103
  ORPHEUS_V1_ENGLISH = "canopylabs/orpheus-v1-english"
104
- WHISPER_LARGE_V3 = "whisper-large-v3"
105
  WHISPER_LARGE_V3_TURBO = "whisper-large-v3-turbo"
106
  ALLAM_2_7B = "allam-2-7b"
107
 
@@ -132,6 +133,192 @@ class OpenRouterModelName(StrEnum):
132
  GEMINI_25_FLASH = "google/gemini-2.5-flash"
133
 
134
 
 
 
 
 
135
  class OpenAICompatibleName(StrEnum):
136
  """https://platform.openai.com/docs/guides/text-generation"""
137
 
@@ -156,11 +343,20 @@ AllModelEnum: TypeAlias = (
156
  | AWSModelName
157
  | OllamaModelName
158
  | OpenRouterModelName
 
159
  | FakeModelName
160
  )
161
 
 
 
 
 
 
 
 
162
  AllEmbeddingModelEnum: TypeAlias = (
163
  OpenAIEmbeddingModelName
164
  | GoogleEmbeddingModelName
165
  | OllamaEmbeddingModelName
 
166
  )
 
14
  AWS = auto()
15
  OLLAMA = auto()
16
  OPENROUTER = auto()
17
+ NVIDIA = auto()
18
  FAKE = auto()
19
 
20
 
 
85
  class GroqModelName(StrEnum):
86
  """https://console.groq.com/docs/models"""
87
 
88
+ # LLAMA_GUARD_4_12B = "meta-llama/llama-guard-4-12b"
89
+ # LLAMA_31_8B_INSTANT = "llama-3.1-8b-instant"
90
+ # LLAMA_33_70B_VERSATILE = "llama-3.3-70b-versatile"
91
+ # LLAMA_4_MAVERICK_17B_128E = "meta-llama/llama-4-maverick-17b-128e-instruct"
92
+ # LLAMA_4_SCOUT_17B_16E = "meta-llama/llama-4-scout-17b-16e-instruct"
93
  LLAMA_PROMPT_GUARD_2_22M = "meta-llama/llama-prompt-guard-2-22m"
94
  LLAMA_PROMPT_GUARD_2_86M = "meta-llama/llama-prompt-guard-2-86m"
95
+ # OPENAI_GPT_OSS_120B = "openai/gpt-oss-120b"
96
+ # OPENAI_GPT_OSS_20B = "openai/gpt-oss-20b"
97
  OPENAI_GPT_OSS_SAFEGUARD_20B = "openai/gpt-oss-safeguard-20b"
98
+ # GROQ_COMPOUND = "groq/compound"
99
  GROQ_COMPOUND_MINI = "groq/compound-mini"
100
+ # QWEN_3_32B = "qwen/qwen3-32b"
101
+ # KIMI_K2_INSTRUCT = "moonshotai/kimi-k2-instruct"
102
+ # KIMI_K2_INSTRUCT_0905 = "moonshotai/kimi-k2-instruct-0905"
103
  ORPHEUS_ARABIC_SAUDI = "canopylabs/orpheus-arabic-saudi"
104
  ORPHEUS_V1_ENGLISH = "canopylabs/orpheus-v1-english"
105
+ # WHISPER_LARGE_V3 = "whisper-large-v3"
106
  WHISPER_LARGE_V3_TURBO = "whisper-large-v3-turbo"
107
  ALLAM_2_7B = "allam-2-7b"
108
 
 
133
  GEMINI_25_FLASH = "google/gemini-2.5-flash"
134
 
135
 
136
+ class NvidiaModelName(StrEnum):
137
+ """https://build.nvidia.com/explore/discover"""
138
+
139
+ # ABACUSAI_DRACARYS_LLAMA_3_1_70B_INSTRUCT = "abacusai/dracarys-llama-3.1-70b-instruct"
140
+ # ADEPT_FUYU_8B = "adept/fuyu-8b"
141
+ # AI21LABS_JAMBA_1_5_LARGE_INSTRUCT = "ai21labs/jamba-1.5-large-instruct"
142
+ # AI21LABS_JAMBA_1_5_MINI_INSTRUCT = "ai21labs/jamba-1.5-mini-instruct"
143
+ # AISINGAPORE_SEA_LION_7B_INSTRUCT = "aisingapore/sea-lion-7b-instruct"
144
+ # BAAI_GET_M3 = "baai/get-m3"
145
+ # BAICHUAN_INC_BAICHUAN2_13B_CHAT = "baichuan-inc/baichuan2-13b-chat"
146
+ # BIGCODE_STARCODER2_15B = "bigcode/starcoder2-15b"
147
+ # BIGCODE_STARCODER2_7B = "bigcode/starcoder2-7b"
148
+ # BYTEDANCE_SEED_OSS_36B_INSTRUCT = "bytedance/seed-oss-36b-instruct"
149
+ # DATABRICKS_DBRX_INSTRUCT = "databricks/dbrx-instruct"
150
+ # DEEPSEEK_AI_DEEPSEEK_CODER_6_7B_INSTRUCT = "deepseek-ai/deepseek-coder-6.7b-instruct"
151
+ # DEEPSEEK_AI_DEEPSEEK_R1_DISTILL_LLAMA_8B = "deepseek-ai/deepseek-r1-distill-llama-8b"
152
+ # DEEPSEEK_AI_DEEPSEEK_R1_DISTILL_QWEN_14B = "deepseek-ai/deepseek-r1-distill-qwen-14b"
153
+ # DEEPSEEK_AI_DEEPSEEK_R1_DISTILL_QWEN_32B = "deepseek-ai/deepseek-r1-distill-qwen-32b"
154
+ # DEEPSEEK_AI_DEEPSEEK_R1_DISTILL_QWEN_7B = "deepseek-ai/deepseek-r1-distill-qwen-7b"
155
+ # DEEPSEEK_AI_DEEPSEEK_V3_1 = "deepseek-ai/deepseek-v3.1"
156
+ DEEPSEEK_AI_DEEPSEEK_V3_1_TERMINUS = "deepseek-ai/deepseek-v3.1-terminus"
157
+ DEEPSEEK_AI_DEEPSEEK_V3_2 = "deepseek-ai/deepseek-v3.2"
158
+ GOOGLE_CODEGEMMA_1_1_7B = "google/codegemma-1.1-7b"
159
+ GOOGLE_CODEGEMMA_7B = "google/codegemma-7b"
160
+ GOOGLE_DEPLOT = "google/deplot"
161
+ GOOGLE_GEMMA_2B = "google/gemma-2b"
162
+ GOOGLE_GEMMA_2_27B_IT = "google/gemma-2-27b-it"
163
+ GOOGLE_GEMMA_2_2B_IT = "google/gemma-2-2b-it"
164
+ GOOGLE_GEMMA_2_9B_IT = "google/gemma-2-9b-it"
165
+ GOOGLE_GEMMA_3N_E2B_IT = "google/gemma-3n-e2b-it"
166
+ GOOGLE_GEMMA_3N_E4B_IT = "google/gemma-3n-e4b-it"
167
+ GOOGLE_GEMMA_3_12B_IT = "google/gemma-3-12b-it"
168
+ GOOGLE_GEMMA_3_1B_IT = "google/gemma-3-1b-it"
169
+ GOOGLE_GEMMA_3_27B_IT = "google/gemma-3-27b-it"
170
+ GOOGLE_GEMMA_3_4B_IT = "google/gemma-3-4b-it"
171
+ GOOGLE_GEMMA_7B = "google/gemma-7b"
172
+ GOOGLE_PALIGEMMA = "google/paligemma"
173
+ GOOGLE_RECURRENTGEMMA_2B = "google/recurrentgemma-2b"
174
+ GOOGLE_SHIELDGEMMA_9B = "google/shieldgemma-9b"
175
+ GOTOCOMPANY_GEMMA_2_9B_CPT_SAHABATAI_INSTRUCT = "gotocompany/gemma-2-9b-cpt-sahabatai-instruct"
176
+ IBM_GRANITE_34B_CODE_INSTRUCT = "ibm/granite-34b-code-instruct"
177
+ IBM_GRANITE_3_0_3B_A800M_INSTRUCT = "ibm/granite-3.0-3b-a800m-instruct"
178
+ IBM_GRANITE_3_0_8B_INSTRUCT = "ibm/granite-3.0-8b-instruct"
179
+ IBM_GRANITE_3_3_8B_INSTRUCT = "ibm/granite-3.3-8b-instruct"
180
+ IBM_GRANITE_8B_CODE_INSTRUCT = "ibm/granite-8b-code-instruct"
181
+ IBM_GRANITE_GUARDIAN_3_0_8B = "ibm/granite-guardian-3.0-8b"
182
+ IGENIUS_COLOSSEUM_355B_INSTRUCT_16K = "igenius/colosseum_355b_instruct_16k"
183
+ IGENIUS_ITALIA_10B_INSTRUCT_16K = "igenius/italia_10b_instruct_16k"
184
+ INSTITUTE_OF_SCIENCE_TOKYO_LLAMA_3_1_SWALLOW_70B_INSTRUCT_V0_1 = "institute-of-science-tokyo/llama-3.1-swallow-70b-instruct-v0.1"
185
+ INSTITUTE_OF_SCIENCE_TOKYO_LLAMA_3_1_SWALLOW_8B_INSTRUCT_V0_1 = "institute-of-science-tokyo/llama-3.1-swallow-8b-instruct-v0.1"
186
+ MARIN_MARIN_8B_INSTRUCT = "marin/marin-8b-instruct"
187
+ MEDIATEK_BREEZE_7B_INSTRUCT = "mediatek/breeze-7b-instruct"
188
+ META_CODELLAMA_70B = "meta/codellama-70b"
189
+ META_LLAMA2_70B = "meta/llama2-70b"
190
+ META_LLAMA3_70B_INSTRUCT = "meta/llama3-70b-instruct"
191
+ META_LLAMA3_8B_INSTRUCT = "meta/llama3-8b-instruct"
192
+ META_LLAMA_3_1_405B_INSTRUCT = "meta/llama-3.1-405b-instruct"
193
+ META_LLAMA_3_1_70B_INSTRUCT = "meta/llama-3.1-70b-instruct"
194
+ META_LLAMA_3_1_8B_INSTRUCT = "meta/llama-3.1-8b-instruct"
195
+ META_LLAMA_3_2_11B_VISION_INSTRUCT = "meta/llama-3.2-11b-vision-instruct"
196
+ META_LLAMA_3_2_1B_INSTRUCT = "meta/llama-3.2-1b-instruct"
197
+ META_LLAMA_3_2_3B_INSTRUCT = "meta/llama-3.2-3b-instruct"
198
+ META_LLAMA_3_2_90B_VISION_INSTRUCT = "meta/llama-3.2-90b-vision-instruct"
199
+ META_LLAMA_3_3_70B_INSTRUCT = "meta/llama-3.3-70b-instruct"
200
+ META_LLAMA_4_MAVERICK_17B_128E_INSTRUCT = "meta/llama-4-maverick-17b-128e-instruct"
201
+ META_LLAMA_4_SCOUT_17B_16E_INSTRUCT = "meta/llama-4-scout-17b-16e-instruct"
202
+ META_LLAMA_GUARD_4_12B = "meta/llama-guard-4-12b"
203
+ MICROSOFT_KOSMOS_2 = "microsoft/kosmos-2"
204
+ MICROSOFT_PHI_3_5_MINI_INSTRUCT = "microsoft/phi-3.5-mini-instruct"
205
+ MICROSOFT_PHI_3_5_MOE_INSTRUCT = "microsoft/phi-3.5-moe-instruct"
206
+ MICROSOFT_PHI_3_5_VISION_INSTRUCT = "microsoft/phi-3.5-vision-instruct"
207
+ MICROSOFT_PHI_3_MEDIUM_128K_INSTRUCT = "microsoft/phi-3-medium-128k-instruct"
208
+ MICROSOFT_PHI_3_MEDIUM_4K_INSTRUCT = "microsoft/phi-3-medium-4k-instruct"
209
+ MICROSOFT_PHI_3_MINI_128K_INSTRUCT = "microsoft/phi-3-mini-128k-instruct"
210
+ MICROSOFT_PHI_3_MINI_4K_INSTRUCT = "microsoft/phi-3-mini-4k-instruct"
211
+ MICROSOFT_PHI_3_SMALL_128K_INSTRUCT = "microsoft/phi-3-small-128k-instruct"
212
+ MICROSOFT_PHI_3_SMALL_8K_INSTRUCT = "microsoft/phi-3-small-8k-instruct"
213
+ MICROSOFT_PHI_3_VISION_128K_INSTRUCT = "microsoft/phi-3-vision-128k-instruct"
214
+ MICROSOFT_PHI_4_MINI_FLASH_REASONING = "microsoft/phi-4-mini-flash-reasoning"
215
+ MICROSOFT_PHI_4_MINI_INSTRUCT = "microsoft/phi-4-mini-instruct"
216
+ MICROSOFT_PHI_4_MULTIMODAL_INSTRUCT = "microsoft/phi-4-multimodal-instruct"
217
+ MINIMAXAI_MINIMAX_M2 = "minimaxai/minimax-m2"
218
+ MINIMAXAI_MINIMAX_M2_1 = "minimaxai/minimax-m2.1"
219
+ MISTRALAI_CODESTRAL_22B_INSTRUCT_V0_1 = "mistralai/codestral-22b-instruct-v0.1"
220
+ MISTRALAI_DEVSTRAL_2_123B_INSTRUCT_2512 = "mistralai/devstral-2-123b-instruct-2512"
221
+ MISTRALAI_MAGISTRAL_SMALL_2506 = "mistralai/magistral-small-2506"
222
+ MISTRALAI_MAMBA_CODESTRAL_7B_V0_1 = "mistralai/mamba-codestral-7b-v0.1"
223
+ MISTRALAI_MATHSTRAL_7B_V0_1 = "mistralai/mathstral-7b-v0.1"
224
+ MISTRALAI_MINISTRAL_14B_INSTRUCT_2512 = "mistralai/ministral-14b-instruct-2512"
225
+ MISTRALAI_MISTRAL_7B_INSTRUCT_V0_2 = "mistralai/mistral-7b-instruct-v0.2"
226
+ MISTRALAI_MISTRAL_7B_INSTRUCT_V0_3 = "mistralai/mistral-7b-instruct-v0.3"
227
+ MISTRALAI_MISTRAL_LARGE = "mistralai/mistral-large"
228
+ MISTRALAI_MISTRAL_LARGE_2_INSTRUCT = "mistralai/mistral-large-2-instruct"
229
+ MISTRALAI_MISTRAL_LARGE_3_675B_INSTRUCT_2512 = "mistralai/mistral-large-3-675b-instruct-2512"
230
+ MISTRALAI_MISTRAL_MEDIUM_3_INSTRUCT = "mistralai/mistral-medium-3-instruct"
231
+ MISTRALAI_MISTRAL_NEMOTRON = "mistralai/mistral-nemotron"
232
+ MISTRALAI_MISTRAL_SMALL_24B_INSTRUCT = "mistralai/mistral-small-24b-instruct"
233
+ MISTRALAI_MISTRAL_SMALL_3_1_24B_INSTRUCT_2503 = "mistralai/mistral-small-3.1-24b-instruct-2503"
234
+ MISTRALAI_MIXTRAL_8X22B_INSTRUCT_V0_1 = "mistralai/mixtral-8x22b-instruct-v0.1"
235
+ MISTRALAI_MIXTRAL_8X22B_V0_1 = "mistralai/mixtral-8x22b-v0.1"
236
+ MISTRALAI_MIXTRAL_8X7B_INSTRUCT_V0_1 = "mistralai/mixtral-8x7b-instruct-v0.1"
237
+ # MODEL_01_AI_YI_LARGE = "01-ai/yi-large"
238
+ MOONSHOTAI_KIMI_K2_5 = "moonshotai/kimi-k2.5"
239
+ MOONSHOTAI_KIMI_K2_INSTRUCT = "moonshotai/kimi-k2-instruct"
240
+ MOONSHOTAI_KIMI_K2_INSTRUCT_0905 = "moonshotai/kimi-k2-instruct-0905"
241
+ MOONSHOTAI_KIMI_K2_THINKING = "moonshotai/kimi-k2-thinking"
242
+ NVIDIA_COSMOS_REASON2_8B = "nvidia/cosmos-reason2-8b"
243
+ NVIDIA_EMBED_QA_4 = "nvidia/embed-qa-4"
244
+ NVIDIA_LLAMA3_CHATQA_1_5_70B = "nvidia/llama3-chatqa-1.5-70b"
245
+ NVIDIA_LLAMA3_CHATQA_1_5_8B = "nvidia/llama3-chatqa-1.5-8b"
246
+ NVIDIA_LLAMA_3_1_NEMOGUARD_8B_CONTENT_SAFETY = "nvidia/llama-3.1-nemoguard-8b-content-safety"
247
+ NVIDIA_LLAMA_3_1_NEMOGUARD_8B_TOPIC_CONTROL = "nvidia/llama-3.1-nemoguard-8b-topic-control"
248
+ NVIDIA_LLAMA_3_1_NEMOTRON_51B_INSTRUCT = "nvidia/llama-3.1-nemotron-51b-instruct"
249
+ NVIDIA_LLAMA_3_1_NEMOTRON_70B_INSTRUCT = "nvidia/llama-3.1-nemotron-70b-instruct"
250
+ NVIDIA_LLAMA_3_1_NEMOTRON_70B_REWARD = "nvidia/llama-3.1-nemotron-70b-reward"
251
+ NVIDIA_LLAMA_3_1_NEMOTRON_NANO_4B_V1_1 = "nvidia/llama-3.1-nemotron-nano-4b-v1.1"
252
+ NVIDIA_LLAMA_3_1_NEMOTRON_NANO_8B_V1 = "nvidia/llama-3.1-nemotron-nano-8b-v1"
253
+ NVIDIA_LLAMA_3_1_NEMOTRON_NANO_VL_8B_V1 = "nvidia/llama-3.1-nemotron-nano-vl-8b-v1"
254
+ NVIDIA_LLAMA_3_1_NEMOTRON_SAFETY_GUARD_8B_V3 = "nvidia/llama-3.1-nemotron-safety-guard-8b-v3"
255
+ NVIDIA_LLAMA_3_1_NEMOTRON_ULTRA_253B_V1 = "nvidia/llama-3.1-nemotron-ultra-253b-v1"
256
+ NVIDIA_LLAMA_3_2_NEMORETRIEVER_1B_VLM_EMBED_V1 = "nvidia/llama-3.2-nemoretriever-1b-vlm-embed-v1"
257
+ NVIDIA_LLAMA_3_2_NEMORETRIEVER_300M_EMBED_V1 = "nvidia/llama-3.2-nemoretriever-300m-embed-v1"
258
+ NVIDIA_LLAMA_3_2_NEMORETRIEVER_300M_EMBED_V2 = "nvidia/llama-3.2-nemoretriever-300m-embed-v2"
259
+ NVIDIA_LLAMA_3_2_NV_EMBEDQA_1B_V1 = "nvidia/llama-3.2-nv-embedqa-1b-v1"
260
+ NVIDIA_LLAMA_3_2_NV_EMBEDQA_1B_V2 = "nvidia/llama-3.2-nv-embedqa-1b-v2"
261
+ NVIDIA_LLAMA_3_3_NEMOTRON_SUPER_49B_V1 = "nvidia/llama-3.3-nemotron-super-49b-v1"
262
+ NVIDIA_LLAMA_3_3_NEMOTRON_SUPER_49B_V1_5 = "nvidia/llama-3.3-nemotron-super-49b-v1.5"
263
+ NVIDIA_LLAMA_NEMOTRON_EMBED_VL_1B_V2 = "nvidia/llama-nemotron-embed-vl-1b-v2"
264
+ NVIDIA_MISTRAL_NEMO_MINITRON_8B_8K_INSTRUCT = "nvidia/mistral-nemo-minitron-8b-8k-instruct"
265
+ NVIDIA_MISTRAL_NEMO_MINITRON_8B_BASE = "nvidia/mistral-nemo-minitron-8b-base"
266
+ NVIDIA_NEMORETRIEVER_PARSE = "nvidia/nemoretriever-parse"
267
+ NVIDIA_NEMOTRON_3_NANO_30B_A3B = "nvidia/nemotron-3-nano-30b-a3b"
268
+ NVIDIA_NEMOTRON_4_340B_INSTRUCT = "nvidia/nemotron-4-340b-instruct"
269
+ NVIDIA_NEMOTRON_4_340B_REWARD = "nvidia/nemotron-4-340b-reward"
270
+ NVIDIA_NEMOTRON_4_MINI_HINDI_4B_INSTRUCT = "nvidia/nemotron-4-mini-hindi-4b-instruct"
271
+ NVIDIA_NEMOTRON_CONTENT_SAFETY_REASONING_4B = "nvidia/nemotron-content-safety-reasoning-4b"
272
+ NVIDIA_NEMOTRON_MINI_4B_INSTRUCT = "nvidia/nemotron-mini-4b-instruct"
273
+ NVIDIA_NEMOTRON_NANO_12B_V2_VL = "nvidia/nemotron-nano-12b-v2-vl"
274
+ NVIDIA_NEMOTRON_NANO_3_30B_A3B = "nvidia/nemotron-nano-3-30b-a3b"
275
+ NVIDIA_NEMOTRON_PARSE = "nvidia/nemotron-parse"
276
+ NVIDIA_NEVA_22B = "nvidia/neva-22b"
277
+ NVIDIA_NVCLIP = "nvidia/nvclip"
278
+ NVIDIA_NVIDIA_NEMOTRON_NANO_9B_V2 = "nvidia/nvidia-nemotron-nano-9b-v2"
279
+ NVIDIA_NV_EMBEDCODE_7B_V1 = "nvidia/nv-embedcode-7b-v1"
280
+ NVIDIA_NV_EMBEDQA_E5_V5 = "nvidia/nv-embedqa-e5-v5"
281
+ NVIDIA_NV_EMBEDQA_MISTRAL_7B_V2 = "nvidia/nv-embedqa-mistral-7b-v2"
282
+ NVIDIA_NV_EMBED_V1 = "nvidia/nv-embed-v1"
283
+ NVIDIA_RIVA_TRANSLATE_4B_INSTRUCT = "nvidia/riva-translate-4b-instruct"
284
+ NVIDIA_RIVA_TRANSLATE_4B_INSTRUCT_V1_1 = "nvidia/riva-translate-4b-instruct-v1.1"
285
+ NVIDIA_STREAMPETR = "nvidia/streampetr"
286
+ NVIDIA_USDCODE_LLAMA_3_1_70B_INSTRUCT = "nvidia/usdcode-llama-3.1-70b-instruct"
287
+ NVIDIA_VILA = "nvidia/vila"
288
+ NV_MISTRALAI_MISTRAL_NEMO_12B_INSTRUCT = "nv-mistralai/mistral-nemo-12b-instruct"
289
+ OPENAI_GPT_OSS_120B = "openai/gpt-oss-120b"
290
+ OPENAI_GPT_OSS_20B = "openai/gpt-oss-20b"
291
+ OPENGPT_X_TEUKEN_7B_INSTRUCT_COMMERCIAL_V0_4 = "opengpt-x/teuken-7b-instruct-commercial-v0.4"
292
+ QWEN_QWEN2_5_7B_INSTRUCT = "qwen/qwen2.5-7b-instruct"
293
+ QWEN_QWEN2_5_CODER_32B_INSTRUCT = "qwen/qwen2.5-coder-32b-instruct"
294
+ QWEN_QWEN2_5_CODER_7B_INSTRUCT = "qwen/qwen2.5-coder-7b-instruct"
295
+ QWEN_QWEN2_7B_INSTRUCT = "qwen/qwen2-7b-instruct"
296
+ QWEN_QWEN3_235B_A22B = "qwen/qwen3-235b-a22b"
297
+ QWEN_QWEN3_CODER_480B_A35B_INSTRUCT = "qwen/qwen3-coder-480b-a35b-instruct"
298
+ QWEN_QWEN3_NEXT_80B_A3B_INSTRUCT = "qwen/qwen3-next-80b-a3b-instruct"
299
+ QWEN_QWEN3_NEXT_80B_A3B_THINKING = "qwen/qwen3-next-80b-a3b-thinking"
300
+ QWEN_QWQ_32B = "qwen/qwq-32b"
301
+ RAKUTEN_RAKUTENAI_7B_CHAT = "rakuten/rakutenai-7b-chat"
302
+ RAKUTEN_RAKUTENAI_7B_INSTRUCT = "rakuten/rakutenai-7b-instruct"
303
+ SARVAMAI_SARVAM_M = "sarvamai/sarvam-m"
304
+ SNOWFLAKE_ARCTIC_EMBED_L = "snowflake/arctic-embed-l"
305
+ SPEAKLEASH_BIELIK_11B_V2_3_INSTRUCT = "speakleash/bielik-11b-v2.3-instruct"
306
+ SPEAKLEASH_BIELIK_11B_V2_6_INSTRUCT = "speakleash/bielik-11b-v2.6-instruct"
307
+ STEPFUN_AI_STEP_3_5_FLASH = "stepfun-ai/step-3.5-flash"
308
+ STOCKMARK_STOCKMARK_2_100B_INSTRUCT = "stockmark/stockmark-2-100b-instruct"
309
+ THUDM_CHATGLM3_6B = "thudm/chatglm3-6b"
310
+ TIIUAE_FALCON3_7B_INSTRUCT = "tiiuae/falcon3-7b-instruct"
311
+ TOKYOTECH_LLM_LLAMA_3_SWALLOW_70B_INSTRUCT_V0_1 = "tokyotech-llm/llama-3-swallow-70b-instruct-v0.1"
312
+ UPSTAGE_SOLAR_10_7B_INSTRUCT = "upstage/solar-10.7b-instruct"
313
+ UTTER_PROJECT_EUROLLM_9B_INSTRUCT = "utter-project/eurollm-9b-instruct"
314
+ WRITER_PALMYRA_CREATIVE_122B = "writer/palmyra-creative-122b"
315
+ WRITER_PALMYRA_FIN_70B_32K = "writer/palmyra-fin-70b-32k"
316
+ WRITER_PALMYRA_MED_70B = "writer/palmyra-med-70b"
317
+ WRITER_PALMYRA_MED_70B_32K = "writer/palmyra-med-70b-32k"
318
+ YENTINGLIN_LLAMA_3_TAIWAN_70B_INSTRUCT = "yentinglin/llama-3-taiwan-70b-instruct"
319
+ ZYPHRA_ZAMBA2_7B_INSTRUCT = "zyphra/zamba2-7b-instruct"
320
+ Z_AI_GLM4_7 = "z-ai/glm4.7"
321
+
322
  class OpenAICompatibleName(StrEnum):
323
  """https://platform.openai.com/docs/guides/text-generation"""
324
 
 
343
  | AWSModelName
344
  | OllamaModelName
345
  | OpenRouterModelName
346
+ | NvidiaModelName
347
  | FakeModelName
348
  )
349
 
350
+ class NvidiaEmbeddingModelName(StrEnum):
351
+ """https://build.nvidia.com/explore/discover"""
352
+
353
+ NV_EMBEDQA_MISTRAL_7B_V2 = "nvidia/nv-embedqa-mistral-7b-v2"
354
+ NV_EMBEDQA_E5_V5 = "nvidia/nv-embedqa-e5-v5"
355
+
356
+
357
  AllEmbeddingModelEnum: TypeAlias = (
358
  OpenAIEmbeddingModelName
359
  | GoogleEmbeddingModelName
360
  | OllamaEmbeddingModelName
361
+ | NvidiaEmbeddingModelName
362
  )
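
The enum values above are the raw model ids expected by the NVIDIA API, while the member names are what the rest of the toolkit passes around. A small sketch, assuming the repo's `src/` directory is importable:

```python
# Hypothetical sketch: member names are used internally,
# values are the NVIDIA API model ids.
from schema.models import NvidiaEmbeddingModelName, NvidiaModelName

print(NvidiaModelName.META_LLAMA_3_3_70B_INSTRUCT.value)  # "meta/llama-3.3-70b-instruct"
print(NvidiaEmbeddingModelName.NV_EMBEDQA_E5_V5.value)    # "nvidia/nv-embedqa-e5-v5"
assert NvidiaModelName.META_LLAMA_3_3_70B_INSTRUCT in NvidiaModelName
```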
src/schema/schema.py CHANGED
@@ -112,6 +112,10 @@ class ChatMessage(BaseModel):
 default=None,
 examples=["call_Jja7J89XsjrOLA5r!MEOW!SL"],
 )
+ name: str | None = Field(
+ description="Tool name for tool messages (type='tool'). Enables UI to show which tool produced the result.",
+ default=None,
+ )
 run_id: str | None = Field(
 description="Run ID of the message.",
 default=None,
@@ -166,6 +170,21 @@ class FeedbackResponse(BaseModel):
 status: Literal["success"] = "success"


+ class ChatMessagePreview(BaseModel):
+ """Minimal message for preview/list views (type, content snippet, id)."""
+
+ type: Literal["human", "ai", "tool", "custom"] = Field(
+ description="Role of the message.",
+ )
+ content: str = Field(
+ description="Content of the message (may be truncated for preview).",
+ )
+ id: str | None = Field(
+ default=None,
+ description="Stable id for cursor/linking (e.g. index).",
+ )
+
+
 class ChatHistoryInput(BaseModel):
 """Input for retrieving chat history."""

@@ -177,12 +196,45 @@ class ChatHistoryInput(BaseModel):
 description="Thread ID to persist and continue a multi-turn conversation.",
 examples=["847c6285-8fc9-4560-a83f-4e6285809254"],
 )
+ limit: int = Field(
+ default=50,
+ ge=1,
+ le=200,
+ description="Max number of messages to return per page.",
+ )
+ cursor: str | None = Field(
+ default=None,
+ description="Opaque cursor for pagination (older messages).",
+ )
+ view: Literal["full", "preview"] = Field(
+ default="full",
+ description="full = all fields; preview = type, content (truncated), id only.",
+ )


 class ChatHistory(BaseModel):
+ """Legacy response: messages only (no cursors)."""
+
 messages: list[ChatMessage]


+ class ChatHistoryResponse(BaseModel):
+ """Paginated chat history with cursors."""
+
+ messages: list[ChatMessage] | list[ChatMessagePreview] = Field(
+ default_factory=list,
+ description="Messages in this page (full or preview by view).",
+ )
+ next_cursor: str | None = Field(
+ default=None,
+ description="Cursor for next page (older messages).",
+ )
+ prev_cursor: str | None = Field(
+ default=None,
+ description="Cursor for previous page (newer messages).",
+ )
+
+
 class ThreadSummary(BaseModel):
 """Summary of a conversation thread for listing."""

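
The new input fields and response models above drive the paginated history flow. A minimal sketch of building a request payload, assuming `user_id` and `thread_id` are the only required fields:

```python
# Hypothetical sketch: a preview page of at most 20 messages, newest window first.
from schema import ChatHistoryInput

payload = ChatHistoryInput(
    user_id="demo-user",
    thread_id="847c6285-8fc9-4560-a83f-4e6285809254",
    limit=20,
    cursor=None,      # first page: the latest messages
    view="preview",   # type, truncated content, and id only
)
print(payload.model_dump())
```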
src/service/agent_service.py CHANGED
@@ -4,12 +4,17 @@ import json
4
  import logging
5
  import re
6
  from collections.abc import AsyncGenerator
7
- from datetime import datetime, timezone
8
  from typing import Any
9
  from uuid import UUID
10
 
11
  from fastapi import HTTPException
12
- from langchain_core.messages import AIMessage, AIMessageChunk, AnyMessage, HumanMessage, ToolMessage
 
 
 
 
 
 
13
  from langchain_core.runnables import RunnableConfig
14
  from langfuse.langchain import CallbackHandler
15
  from langgraph.types import Command, Interrupt
@@ -21,8 +26,6 @@ from agents.middlewares import FollowUpMiddleware, SafetyMiddleware, UNSAFE_RESP
21
  from agents.llama_guard import SafetyAssessment
22
  from core import settings
23
  from schema import (
24
- ChatHistory,
25
- ChatHistoryInput,
26
  ChatMessage,
27
  Feedback,
28
  FeedbackResponse,
@@ -31,6 +34,7 @@ from schema import (
31
  ThreadSummary,
32
  UserInput,
33
  )
 
34
  from service.utils import (
35
  convert_message_content_to_string,
36
  langchain_to_chat_message,
@@ -121,6 +125,9 @@ async def invoke_agent(user_input: UserInput, agent_id: str = DEFAULT_AGENT) ->
121
  agent: AgentGraph = get_agent(agent_id)
122
  kwargs, run_id = await _handle_input(user_input, agent, agent_id)
123
 
 
 
 
124
  # ── Input safety guard ──────────────────────────────────────────
125
  input_check = await safety.check_input([HumanMessage(content=user_input.message)])
126
  if input_check.safety_assessment == SafetyAssessment.UNSAFE:
@@ -173,6 +180,9 @@ async def message_generator(
173
  agent: AgentGraph = get_agent(agent_id)
174
  kwargs, run_id = await _handle_input(user_input, agent, agent_id)
175
 
 
 
 
176
  # ── Input safety guard ──────────────────────────────────────────────
177
  input_check = await safety.check_input([HumanMessage(content=user_input.message)])
178
  if input_check.safety_assessment == SafetyAssessment.UNSAFE:
@@ -304,48 +314,6 @@ def _checkpoint_thread_id(user_id: str, thread_id: str) -> str:
304
  return f"{user_id}:{thread_id}"
305
 
306
 
307
- async def get_history(input: ChatHistoryInput) -> ChatHistory:
308
- if not (input.user_id or "").strip():
309
- raise HTTPException(
310
- status_code=422,
311
- detail="user_id is required and must be non-empty",
312
- )
313
- # TODO: Hard-coding DEFAULT_AGENT here is wonky
314
- agent: AgentGraph = get_agent(DEFAULT_AGENT)
315
- checkpoint_thread_id = _checkpoint_thread_id(input.user_id, input.thread_id)
316
- try:
317
- state_snapshot = await agent.aget_state(
318
- config=RunnableConfig(
319
- configurable={
320
- "thread_id": checkpoint_thread_id,
321
- "user_id": input.user_id,
322
- },
323
- metadata={
324
- "thread_id": input.thread_id,
325
- "user_id": input.user_id,
326
- "agent_id": DEFAULT_AGENT,
327
- },
328
- )
329
- )
330
- messages: list[AnyMessage] = state_snapshot.values["messages"]
331
- chat_messages: list[ChatMessage] = [langchain_to_chat_message(m) for m in messages]
332
- return ChatHistory(messages=chat_messages)
333
- except Exception as e:
334
- logger.error(f"An exception occurred: {e}")
335
- raise HTTPException(status_code=500, detail="Unexpected error")
336
-
337
-
338
- def _iso_ts(dt: datetime | str | None) -> str | None:
339
- """Format datetime as ISO 8601 or return None."""
340
- if dt is None:
341
- return None
342
- if isinstance(dt, str):
343
- return dt
344
- if dt.tzinfo is None:
345
- dt = dt.replace(tzinfo=timezone.utc)
346
- return dt.isoformat()
347
-
348
-
349
  async def list_threads(
350
  user_id: str,
351
  *,
@@ -354,9 +322,8 @@ async def list_threads(
354
  search: str | None = None,
355
  ) -> ThreadList:
356
  """
357
- List thread IDs for a user by querying the checkpointer storage with prefix user_id:.
358
- Returns logical thread_ids (without the user prefix), with updated_at when available.
359
- Supports pagination (offset/limit) and optional search filter.
360
  """
361
  if not (user_id or "").strip():
362
  return ThreadList(threads=[], total=0)
@@ -365,198 +332,68 @@ async def list_threads(
365
  if checkpointer is None:
366
  return ThreadList(threads=[], total=0)
367
  prefix = f"{user_id}:"
368
- # List of (logical_id, updated_at_iso | None)
369
- rows: list[tuple[str, str | None]] = []
370
  try:
371
- # MongoDB: aggregation for thread_id + max timestamp
372
  if hasattr(checkpointer, "checkpoint_collection"):
373
- coll = checkpointer.checkpoint_collection
374
- try:
375
- pipeline = [
376
- {"$match": {"thread_id": {"$regex": f"^{re.escape(user_id)}:"}}},
377
- {
378
- "$group": {
379
- "_id": "$thread_id",
380
- "ts": {"$max": {"$ifNull": ["$ts", "$updated_at", "$created_at"]}},
381
- }
382
- },
383
- ]
384
- async for doc in coll.aggregate(pipeline):
385
- tid = doc.get("_id")
386
- if not isinstance(tid, str) or not tid.startswith(prefix):
387
- continue
388
- logical = tid[len(prefix) :]
389
- ts = doc.get("ts")
390
- rows.append((logical, _iso_ts(ts) if ts else None))
391
- except Exception as mongo_err:
392
- logger.debug(
393
- "MongoDB aggregation failed, listing by thread_id only: %s",
394
- mongo_err,
395
- )
396
- raw_ids = await coll.distinct(
397
- "thread_id",
398
- {"thread_id": {"$regex": f"^{re.escape(user_id)}:"}},
399
- )
400
- for tid in raw_ids:
401
- if isinstance(tid, str) and tid.startswith(prefix):
402
- rows.append((tid[len(prefix) :], None))
403
- # Postgres: GROUP BY thread_id, MAX(ts)
404
  elif hasattr(checkpointer, "pool") or hasattr(checkpointer, "conn"):
405
- # AsyncPostgresSaver often uses .pool, but can have .conn
406
  pool = getattr(checkpointer, "pool", getattr(checkpointer, "conn", None))
407
- # If it's a pool, we need an async connection
408
- if hasattr(pool, "connection"):
409
- conn_ctx = pool.connection()
410
- else:
411
- conn_ctx = pool # Assume it's already a connection or manages its own
412
-
413
  async with conn_ctx as conn:
414
  async with conn.cursor() as cur:
415
- try:
416
- # Try various common timestamp column names
417
- await cur.execute(
418
- """
419
- SELECT thread_id, MAX(COALESCE(ts, updated_at, created_at)) AS ts
420
- FROM checkpoints
421
- WHERE thread_id LIKE %s
422
- GROUP BY thread_id
423
- """,
424
- (prefix + "%",),
425
- )
426
- except Exception as pg_err:
427
- logger.debug(
428
- "Postgres MAX(ts) failed, trying metadata: %s",
429
- pg_err,
430
- )
431
- try:
432
- await cur.execute(
433
- """
434
- SELECT thread_id, MAX(created_at) AS ts
435
- FROM checkpoint_metadata
436
- WHERE thread_id LIKE %s
437
- GROUP BY thread_id
438
- """,
439
- (prefix + "%",),
440
- )
441
- except Exception:
442
- await cur.execute(
443
- "SELECT DISTINCT thread_id FROM checkpoints WHERE thread_id LIKE %s",
444
- (prefix + "%",),
445
- )
446
- for row in await cur.fetchall():
447
- raw = (
448
- row.get("thread_id")
449
- if isinstance(row, dict)
450
- else (row[0] if row else None)
451
- )
452
- if isinstance(raw, str) and raw.startswith(prefix):
453
- rows.append((raw[len(prefix) :], None))
454
- else:
455
- for row in await cur.fetchall():
456
- raw = row.get("thread_id") if isinstance(row, dict) else row[0]
457
- ts_val = row.get("ts") if isinstance(row, dict) else row[1]
458
- if isinstance(raw, str) and raw.startswith(prefix):
459
- rows.append((raw[len(prefix) :], _iso_ts(ts_val)))
460
- else:
461
- for row in await cur.fetchall():
462
- raw = (
463
- row.get("thread_id")
464
- if isinstance(row, dict)
465
- else (row[0] if row else None)
466
- )
467
- ts_val = (
468
- row.get("ts")
469
- if isinstance(row, dict)
470
- else (
471
- row[1]
472
- if isinstance(row, (list, tuple)) and len(row) > 1
473
- else None
474
- )
475
- )
476
- if isinstance(raw, str) and raw.startswith(prefix):
477
- rows.append((raw[len(prefix) :], _iso_ts(ts_val)))
478
- # SQLite: GROUP BY thread_id, MAX(ts)
479
  elif hasattr(checkpointer, "conn"):
480
- conn = checkpointer.conn
481
- try:
482
- cursor = await conn.execute(
483
- """
484
- SELECT thread_id, MAX(COALESCE(ts, updated_at, created_at)) AS ts
485
- FROM checkpoints
486
- WHERE thread_id LIKE ?
487
- GROUP BY thread_id
488
- """,
489
- (prefix + "%",),
490
- )
491
- for row in await cursor.fetchall():
492
- raw = row[0] if isinstance(row, (list, tuple)) else row
493
- ts_val = row[1] if isinstance(row, (list, tuple)) and len(row) > 1 else None
494
- if isinstance(raw, str) and raw.startswith(prefix):
495
- rows.append((raw[len(prefix) :], _iso_ts(ts_val)))
496
- except Exception as sqlite_err:
497
- logger.debug("SQLite MAX(ts) failed, listing by thread_id only: %s", sqlite_err)
498
- cursor = await conn.execute(
499
- "SELECT DISTINCT thread_id FROM checkpoints WHERE thread_id LIKE ?",
500
- (prefix + "%",),
501
- )
502
- for row in await cursor.fetchall():
503
- raw = row[0] if isinstance(row, (list, tuple)) else row
504
- if isinstance(raw, str) and raw.startswith(prefix):
505
- rows.append((raw[len(prefix) :], None))
506
  else:
507
- logger.warning("Unknown checkpointer type; cannot list threads by prefix")
508
  except Exception as e:
509
- logger.error(f"Error listing threads for user: {e}")
510
  raise HTTPException(status_code=500, detail="Failed to list threads") from e
511
 
512
- # Sort by updated_at desc (None last), then by thread_id
513
- def _sort_key(item: tuple[str, str | None]) -> tuple[bool, str, str]:
514
- logical, ts = item
515
- # desc sort: flip characters in timestamp if it exists
516
- return (ts is None, (ts or "")[::-1], logical)
517
-
518
- rows.sort(key=_sort_key)
519
 
520
- # Filter by search
521
  search_clean = (search or "").strip().lower()
522
  if search_clean:
523
- rows = [(logical, ts) for logical, ts in rows if search_clean in logical.lower()]
524
 
525
- total = len(rows)
526
- rows = rows[offset : offset + limit]
527
 
528
- # Fetch additional details (preview, precise timestamp) for the requested page in parallel
529
- async def get_thread_summary(logical_id: str, existing_ts: str | None) -> ThreadSummary:
530
  try:
531
  config = RunnableConfig(
532
- configurable={
533
- "thread_id": f"{user_id}:{logical_id}",
534
- "user_id": user_id,
535
- }
536
  )
537
  state = await agent.aget_state(config)
538
- preview = None
539
- ts = existing_ts
540
-
541
- if state.values and "messages" in state.values:
542
- msgs = state.values["messages"]
543
- if msgs:
544
- # Get the last non-custom message for better preview
545
- last_msg = msgs[-1]
546
- preview = convert_message_content_to_string(last_msg.content)
547
- if preview and len(preview) > 120:
548
- preview = preview[:117] + "..."
549
-
550
- # If we don't have a timestamp, try to get it from state metadata
551
- if not ts and state.metadata:
552
- m_ts = state.metadata.get("ts") or state.metadata.get("created_at")
553
- if m_ts:
554
- ts = _iso_ts(m_ts)
555
-
556
- return ThreadSummary(thread_id=logical_id, updated_at=ts, preview=preview)
557
  except Exception as e:
558
- logger.warning(f"Failed to fetch state for thread {logical_id}: {e}")
559
- return ThreadSummary(thread_id=logical_id, updated_at=existing_ts, preview=None)
560
 
561
- summaries = await asyncio.gather(*[get_thread_summary(logical, ts) for logical, ts in rows])
562
- return ThreadList(threads=list(summaries), total=total)
 
4
  import logging
5
  import re
6
  from collections.abc import AsyncGenerator
 
7
  from typing import Any
8
  from uuid import UUID
9
 
10
  from fastapi import HTTPException
11
+ from langchain_core.messages import (
12
+ AIMessage,
13
+ AIMessageChunk,
14
+ AnyMessage,
15
+ HumanMessage,
16
+ ToolMessage,
17
+ )
18
  from langchain_core.runnables import RunnableConfig
19
  from langfuse.langchain import CallbackHandler
20
  from langgraph.types import Command, Interrupt
 
26
  from agents.llama_guard import SafetyAssessment
27
  from core import settings
28
  from schema import (
 
 
29
  ChatMessage,
30
  Feedback,
31
  FeedbackResponse,
 
34
  ThreadSummary,
35
  UserInput,
36
  )
37
+ from service import history_service
38
  from service.utils import (
39
  convert_message_content_to_string,
40
  langchain_to_chat_message,
 
125
  agent: AgentGraph = get_agent(agent_id)
126
  kwargs, run_id = await _handle_input(user_input, agent, agent_id)
127
 
128
+ if user_input.user_id and user_input.thread_id:
129
+ history_service.invalidate_history(user_input.user_id, user_input.thread_id)
130
+
131
  # ── Input safety guard ──────────────────────────────────────────
132
  input_check = await safety.check_input([HumanMessage(content=user_input.message)])
133
  if input_check.safety_assessment == SafetyAssessment.UNSAFE:
 
180
  agent: AgentGraph = get_agent(agent_id)
181
  kwargs, run_id = await _handle_input(user_input, agent, agent_id)
182
 
183
+ if user_input.user_id and user_input.thread_id:
184
+ history_service.invalidate_history(user_input.user_id, user_input.thread_id)
185
+
186
  # ── Input safety guard ──────────────────────────────────────────────
187
  input_check = await safety.check_input([HumanMessage(content=user_input.message)])
188
  if input_check.safety_assessment == SafetyAssessment.UNSAFE:
 
314
  return f"{user_id}:{thread_id}"
315
 
316
 
 
 
 
317
  async def list_threads(
318
  user_id: str,
319
  *,
 
322
  search: str | None = None,
323
  ) -> ThreadList:
324
  """
325
+ List thread IDs for a user. Returns logical thread_ids (without user prefix) with preview.
326
+ Supports pagination and optional search filter.
 
327
  """
328
  if not (user_id or "").strip():
329
  return ThreadList(threads=[], total=0)
 
332
  if checkpointer is None:
333
  return ThreadList(threads=[], total=0)
334
  prefix = f"{user_id}:"
335
+ logical_ids: list[str] = []
 
336
  try:
 
337
  if hasattr(checkpointer, "checkpoint_collection"):
338
+ raw_ids = await checkpointer.checkpoint_collection.distinct(
339
+ "thread_id",
340
+ {"thread_id": {"$regex": f"^{re.escape(user_id)}:"}},
341
+ )
342
+ for tid in raw_ids:
343
+ if isinstance(tid, str) and tid.startswith(prefix):
344
+ logical_ids.append(tid[len(prefix) :])
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
345
  elif hasattr(checkpointer, "pool") or hasattr(checkpointer, "conn"):
 
346
  pool = getattr(checkpointer, "pool", getattr(checkpointer, "conn", None))
347
+ conn_ctx = pool.connection() if hasattr(pool, "connection") else pool
 
 
 
 
 
348
  async with conn_ctx as conn:
349
  async with conn.cursor() as cur:
350
+ await cur.execute(
351
+ "SELECT DISTINCT thread_id FROM checkpoints WHERE thread_id LIKE %s",
352
+ (prefix + "%",),
353
+ )
354
+ for row in await cur.fetchall():
355
+ raw = row.get("thread_id") if isinstance(row, dict) else (row[0] if row else None)
356
+ if isinstance(raw, str) and raw.startswith(prefix):
357
+ logical_ids.append(raw[len(prefix) :])
 
 
 
 
 
358
  elif hasattr(checkpointer, "conn"):
359
+ cursor = await checkpointer.conn.execute(
360
+ "SELECT DISTINCT thread_id FROM checkpoints WHERE thread_id LIKE ?",
361
+ (prefix + "%",),
362
+ )
363
+ for row in await cursor.fetchall():
364
+ raw = row[0] if isinstance(row, (list, tuple)) else row
365
+ if isinstance(raw, str) and raw.startswith(prefix):
366
+ logical_ids.append(raw[len(prefix) :])
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
367
  else:
368
+ logger.warning("Unknown checkpointer type; cannot list threads")
369
  except Exception as e:
370
+ logger.error("Error listing threads for user: %s", e)
371
  raise HTTPException(status_code=500, detail="Failed to list threads") from e
372
 
373
+ logical_ids.sort(reverse=True)
 
 
 
 
 
 
374
 
 
375
  search_clean = (search or "").strip().lower()
376
  if search_clean:
377
+ logical_ids = [tid for tid in logical_ids if search_clean in tid.lower()]
378
 
379
+ total = len(logical_ids)
380
+ page = logical_ids[offset : offset + limit]
381
 
382
+ async def get_preview(logical_id: str) -> ThreadSummary:
383
+ preview = None
384
  try:
385
  config = RunnableConfig(
386
+ configurable={"thread_id": f"{user_id}:{logical_id}", "user_id": user_id},
 
 
 
387
  )
388
  state = await agent.aget_state(config)
389
+ if state.values and "messages" in state.values and state.values["messages"]:
390
+ last_msg = state.values["messages"][-1]
391
+ preview = convert_message_content_to_string(last_msg.content)
392
+ if preview and len(preview) > 120:
393
+ preview = preview[:117] + "..."
 
 
 
 
 
 
 
 
 
 
 
 
 
 
394
  except Exception as e:
395
+ logger.debug("Preview for thread %s: %s", logical_id, e)
396
+ return ThreadSummary(thread_id=logical_id, updated_at=None, preview=preview)
397
 
398
+ threads = await asyncio.gather(*[get_preview(tid) for tid in page])
399
+ return ThreadList(threads=list(threads), total=total)
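
The `invalidate_history` calls added above ensure a thread's cached history is dropped whenever a new turn is written, so the next read goes back to the checkpointer. A minimal sketch of the call, assuming the repo's `src/` directory is importable:

```python
# Hypothetical sketch: drop the cached history for a thread after writing a new turn.
from service import history_service

history_service.invalidate_history("demo-user", "thread-1")  # no-op if nothing is cached
```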
src/service/dependencies.py CHANGED
@@ -1,10 +1,29 @@
- from typing import Annotated
+ import logging
+ from typing import Annotated, Any

- from fastapi import Depends, HTTPException, status
+ from fastapi import Depends, HTTPException, Request, status
 from fastapi.security import HTTPAuthorizationCredentials, HTTPBearer

 from core import settings

+ logger = logging.getLogger(__name__)
+
+
+ def get_checkpointer(request: Request) -> Any:
+ """Provide checkpointer from app state, or fall back to default agent's checkpointer."""
+ checkpointer = getattr(request.app.state, "checkpointer", None)
+ if checkpointer is not None:
+ return checkpointer
+ try:
+ from agents import DEFAULT_AGENT, get_agent
+ agent = get_agent(DEFAULT_AGENT)
+ checkpointer = getattr(agent, "checkpointer", None)
+ if checkpointer is not None:
+ return checkpointer
+ except Exception as e:
+ logger.debug("Fallback checkpointer from agent failed: %s", e)
+ return None
+

 def verify_bearer(
 http_auth: Annotated[
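
`get_checkpointer` first looks for `request.app.state.checkpointer`, so the service (or a test) can expose one during startup. A hypothetical wiring sketch; `MemorySaver` is only a stand-in for whatever saver the service actually uses:

```python
# Hypothetical wiring sketch (not the repo's actual startup code):
# expose a checkpointer on app.state so get_checkpointer() finds it
# without having to reach into the default agent.
from contextlib import asynccontextmanager

from fastapi import FastAPI
from langgraph.checkpoint.memory import MemorySaver  # stand-in checkpointer


@asynccontextmanager
async def lifespan(app: FastAPI):
    # In the real service this would be the Postgres/Mongo/SQLite saver.
    app.state.checkpointer = MemorySaver()
    yield


app = FastAPI(lifespan=lifespan)
```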
src/service/history_service.py ADDED
@@ -0,0 +1,164 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """
2
+ Chat history service: load messages for a thread from the checkpointer only.
3
+ Single responsibility; no agent dependency. Supports cursor pagination,
4
+ selective view (preview/full), and TTL cache.
5
+ """
6
+
7
+ import logging
8
+ import threading
9
+ import time
10
+ from typing import Any
11
+
12
+ from fastapi import HTTPException
13
+ from langchain_core.messages import AnyMessage, messages_from_dict
14
+ from langchain_core.runnables import RunnableConfig
15
+
16
+ from schema import (
17
+ ChatHistoryInput,
18
+ ChatHistoryResponse,
19
+ ChatMessage,
20
+ ChatMessagePreview,
21
+ )
22
+ from service.utils import langchain_to_chat_message, message_to_preview
23
+
24
+ logger = logging.getLogger(__name__)
25
+
26
+ # TTL in seconds for cached message lists
27
+ HISTORY_CACHE_TTL = 60
28
+
29
+ # In-memory cache: (user_id, thread_id) -> (list[AnyMessage], expiry_ts)
30
+ _history_cache: dict[tuple[str, str], tuple[list[AnyMessage], float]] = {}
31
+ _cache_lock = threading.Lock()
32
+
33
+
34
+ def _checkpoint_thread_id(user_id: str, thread_id: str) -> str:
35
+ """Composite key so checkpoints are user-scoped."""
36
+ return f"{user_id}:{thread_id}"
37
+
38
+
39
+ def _raw_messages_from_checkpoint(tuple_result: Any) -> list[AnyMessage]:
40
+ """
41
+ Extract and deserialize messages from a checkpoint tuple.
42
+ Single path: handles both message objects and serialized dicts.
43
+ Checkpoint is dict-like; use .get() for channel_values.
44
+ """
45
+ if not tuple_result or not tuple_result.checkpoint:
46
+ return []
47
+ checkpoint = tuple_result.checkpoint
48
+ # Checkpoint is a dict subclass in LangGraph
49
+ channel_values = checkpoint.get("channel_values", {}) if isinstance(checkpoint, dict) else getattr(checkpoint, "channel_values", {})
50
+ if not channel_values:
51
+ return []
52
+ raw = channel_values.get("messages", []) if isinstance(channel_values, dict) else getattr(channel_values, "get", lambda k, default=None: default)("messages") or []
53
+ if not raw:
54
+ return []
55
+ raw = list(raw)
56
+ if not raw:
57
+ return []
58
+ if isinstance(raw[0], dict):
59
+ return list(messages_from_dict(raw))
60
+ return list(raw)
61
+
62
+
63
+ def _get_cached_messages(user_id: str, thread_id: str) -> list[AnyMessage] | None:
64
+ """Return cached message list if present and not expired."""
65
+ key = (user_id.strip(), thread_id.strip())
66
+ with _cache_lock:
67
+ entry = _history_cache.get(key)
68
+ if not entry:
69
+ return None
70
+ messages, expiry = entry
71
+ if time.monotonic() > expiry:
72
+ del _history_cache[key]
73
+ return None
74
+ return messages
75
+
76
+
77
+ def _set_cached_messages(user_id: str, thread_id: str, messages: list[AnyMessage]) -> None:
78
+ """Store message list in cache with TTL."""
79
+ key = (user_id.strip(), thread_id.strip())
80
+ expiry = time.monotonic() + HISTORY_CACHE_TTL
81
+ with _cache_lock:
82
+ _history_cache[key] = (messages, expiry)
83
+
84
+
85
+ def invalidate_history(user_id: str, thread_id: str) -> None:
86
+ """Invalidate cache for this thread (call after writing to the thread)."""
87
+ key = (user_id or "").strip(), (thread_id or "").strip()
88
+ with _cache_lock:
89
+ _history_cache.pop(key, None)
90
+
91
+
92
+ async def get_history(checkpointer: Any, input: ChatHistoryInput) -> ChatHistoryResponse:
93
+ """
94
+ Load chat history for (user_id, thread_id) with optional pagination and view.
95
+ Depends only on checkpointer; no agent.
96
+ """
97
+ user_id = (input.user_id or "").strip()
98
+ thread_id = (input.thread_id or "").strip()
99
+ if not user_id:
100
+ raise HTTPException(
101
+ status_code=422,
102
+ detail="user_id is required and must be non-empty",
103
+ )
104
+
105
+ if checkpointer is None:
106
+ logger.warning("History: no checkpointer available (app.state or agent)")
107
+ return ChatHistoryResponse(messages=[], next_cursor=None, prev_cursor=None)
108
+
109
+ # Try cache first
110
+ messages = _get_cached_messages(user_id, thread_id)
111
+ if messages is None:
112
+ checkpoint_thread_id = _checkpoint_thread_id(user_id, thread_id)
113
+ config = RunnableConfig(
114
+ configurable={"thread_id": checkpoint_thread_id, "user_id": user_id},
115
+ )
116
+ try:
117
+ tuple_result = await checkpointer.aget_tuple(config)
118
+ messages = _raw_messages_from_checkpoint(tuple_result)
119
+ if not messages:
120
+ logger.debug(
121
+ "History: no messages for thread_id=%s (checkpoint missing or empty)",
122
+ checkpoint_thread_id,
123
+ )
124
+ _set_cached_messages(user_id, thread_id, messages)
125
+ except Exception as e:
126
+ logger.error("Chat history error: %s", e)
127
+ raise HTTPException(
128
+ status_code=500,
129
+ detail="Failed to load chat history",
130
+ ) from e
131
+
132
+ total = len(messages)
133
+ if total == 0:
134
+ return ChatHistoryResponse(messages=[], next_cursor=None, prev_cursor=None)
135
+
136
+ # Cursor pagination: cursor = exclusive end index of the window (older messages)
137
+ # First request: no cursor -> return latest [total-limit : total], next_cursor = total - limit
138
+ if input.cursor is None or input.cursor == "":
139
+ end_index = total
140
+ else:
141
+ try:
142
+ end_index = int(input.cursor)
143
+ except ValueError:
144
+ end_index = total
145
+ end_index = min(end_index, total)
146
+ start_index = max(0, end_index - input.limit)
147
+ window = messages[start_index:end_index]
148
+
149
+ next_cursor = str(start_index) if start_index > 0 else None
150
+ prev_cursor = str(end_index) if end_index < total else None
151
+
152
+ if input.view == "preview":
153
+ out_messages: list[ChatMessage] | list[ChatMessagePreview] = [
154
+ message_to_preview(m, start_index + i) for i, m in enumerate(window)
155
+ ]
156
+ else:
157
+ out_messages = [langchain_to_chat_message(m) for m in window]
158
+
159
+ # Return chronological (oldest first); UI scrolls to bottom so latest is visible
160
+ return ChatHistoryResponse(
161
+ messages=out_messages,
162
+ next_cursor=next_cursor,
163
+ prev_cursor=prev_cursor,
164
+ )
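
For reference, the cursor scheme in `get_history` treats the cursor as the exclusive end index of the window: `next_cursor` walks toward older messages, `prev_cursor` toward newer ones. Below is a small self-contained sketch of that arithmetic with hypothetical message counts (not part of the commit; handling of malformed cursors is omitted):

```python
# Hypothetical walkthrough of the cursor arithmetic used in get_history above.
# Assumes a thread with 120 stored messages and the default limit of 50.

def page(total: int, limit: int, cursor: str | None) -> tuple[slice, str | None, str | None]:
    """Mirror the windowing logic: the cursor is the exclusive end index of the window."""
    end = total if not cursor else min(int(cursor), total)
    start = max(0, end - limit)
    next_cursor = str(start) if start > 0 else None   # older messages remain
    prev_cursor = str(end) if end < total else None   # newer messages remain
    return slice(start, end), next_cursor, prev_cursor

print(page(120, 50, None))   # (slice(70, 120), '70', None)  -> latest page
print(page(120, 50, "70"))   # (slice(20, 70), '20', '70')   -> older page
print(page(120, 50, "20"))   # (slice(0, 20), None, '20')    -> oldest page
```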
src/service/router.py CHANGED
@@ -1,13 +1,13 @@
1
- from typing import Any
2
 
3
- from fastapi import APIRouter, Depends, status
4
  from fastapi.responses import StreamingResponse
5
 
6
  from agents import DEFAULT_AGENT, get_all_agent_info
7
  from core import settings
8
  from schema import (
9
- ChatHistory,
10
  ChatHistoryInput,
 
11
  ChatMessage,
12
  Feedback,
13
  FeedbackResponse,
@@ -17,8 +17,8 @@ from schema import (
17
  ThreadListInput,
18
  UserInput,
19
  )
20
- from service import agent_service
21
- from service.dependencies import verify_bearer
22
 
23
 
24
  router = APIRouter(dependencies=[Depends(verify_bearer)])
@@ -106,13 +106,39 @@ async def feedback(feedback: Feedback) -> FeedbackResponse:
106
  return await agent_service.submit_feedback(feedback)
107
 
108
 
109
- @router.post("/history")
110
- async def history(input: ChatHistoryInput) -> ChatHistory:
111
  """
112
- Get chat history for a thread. Requires user_id and thread_id.
113
- Returns only messages for the given user's thread.
114
  """
115
- return await agent_service.get_history(input)
116
 
117
 
118
  @router.post("/history/threads")
 
1
+ from typing import Any, Literal
2
 
3
+ from fastapi import APIRouter, Depends, Query, status
4
  from fastapi.responses import StreamingResponse
5
 
6
  from agents import DEFAULT_AGENT, get_all_agent_info
7
  from core import settings
8
  from schema import (
 
9
  ChatHistoryInput,
10
+ ChatHistoryResponse,
11
  ChatMessage,
12
  Feedback,
13
  FeedbackResponse,
 
17
  ThreadListInput,
18
  UserInput,
19
  )
20
+ from service import agent_service, history_service
21
+ from service.dependencies import get_checkpointer, verify_bearer
22
 
23
 
24
  router = APIRouter(dependencies=[Depends(verify_bearer)])
 
106
  return await agent_service.submit_feedback(feedback)
107
 
108
 
109
+ @router.get("/history", response_model=ChatHistoryResponse)
110
+ async def history_get(
111
+ user_id: str = Query(..., description="User ID to scope history."),
112
+ thread_id: str = Query(..., description="Thread ID for the conversation."),
113
+ limit: int = Query(50, ge=1, le=200, description="Max messages per page."),
114
+ cursor: str | None = Query(None, description="Pagination cursor (older messages)."),
115
+ view: Literal["full", "preview"] = Query("full", description="full or preview (minimal fields)."),
116
+ checkpointer: Any = Depends(get_checkpointer),
117
+ ) -> ChatHistoryResponse:
118
  """
119
+ Get chat history for a thread (GET). Prefer this for read-only loading.
120
+ Returns messages with optional next_cursor and prev_cursor for pagination.
121
  """
122
+ input = ChatHistoryInput(
123
+ user_id=user_id,
124
+ thread_id=thread_id,
125
+ limit=limit,
126
+ cursor=cursor,
127
+ view=view,
128
+ )
129
+ return await history_service.get_history(checkpointer, input)
130
+
131
+
132
+ @router.post("/history", response_model=ChatHistoryResponse)
133
+ async def history_post(
134
+ input: ChatHistoryInput,
135
+ checkpointer: Any = Depends(get_checkpointer),
136
+ ) -> ChatHistoryResponse:
137
+ """
138
+ Get chat history for a thread (POST). Same as GET; body instead of query.
139
+ Returns messages with optional next_cursor and prev_cursor for pagination.
140
+ """
141
+ return await history_service.get_history(checkpointer, input)
142
 
143
 
144
  @router.post("/history/threads")
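
With the new GET route, a read-only history load is a plain query-string request. Here is a minimal client sketch, assuming the service is reachable at http://localhost:8080 and that bearer auth is disabled (both assumptions; neither appears in this diff):

```python
# Minimal read-only client sketch for GET /history (not part of the commit).
# Assumes the service runs at http://localhost:8080; if AUTH_SECRET is configured,
# add an "Authorization: Bearer <secret>" header to the request.
import asyncio

import httpx


async def load_history(user_id: str, thread_id: str, cursor: str | None = None) -> dict:
    params: dict[str, str | int] = {
        "user_id": user_id,
        "thread_id": thread_id,
        "limit": 50,
        "view": "preview",
    }
    if cursor:
        params["cursor"] = cursor
    async with httpx.AsyncClient(base_url="http://localhost:8080") as client:
        resp = await client.get("/history", params=params)
        resp.raise_for_status()
        return resp.json()  # {"messages": [...], "next_cursor": ..., "prev_cursor": ...}


if __name__ == "__main__":
    first_page = asyncio.run(load_history("user-123", "thread-abc"))
    print(len(first_page["messages"]), first_page["next_cursor"])
```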
src/service/service.py CHANGED
@@ -39,6 +39,9 @@ async def lifespan(app: FastAPI) -> AsyncGenerator[None, None]:
39
  if hasattr(store, "setup"): # ignore: union-attr
40
  await store.setup()
41
 
 
 
 
42
  # Configure agents with both memory components and async loading
43
  agents = get_all_agent_info()
44
  for a in agents:
 
39
  if hasattr(store, "setup"): # ignore: union-attr
40
  await store.setup()
41
 
42
+ # Expose checkpointer for history service (no agent dependency)
43
+ app.state.checkpointer = saver
44
+
45
  # Configure agents with both memory components and async loading
46
  agents = get_all_agent_info()
47
  for a in agents:
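
The router resolves the saver through `get_checkpointer` from `service.dependencies`, which is not shown in this diff. Since the lifespan now stores the saver on `app.state.checkpointer`, that dependency presumably just reads it back from the incoming request; a hypothetical sketch of that shape (the real implementation may differ):

```python
# Hypothetical shape of the get_checkpointer dependency used by the /history routes.
# The actual implementation lives in service/dependencies.py and is not part of this hunk.
from typing import Any

from fastapi import Request


def get_checkpointer(request: Request) -> Any:
    """Return the checkpointer stored on app.state during lifespan, or None if unset."""
    return getattr(request.app.state, "checkpointer", None)
```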
src/service/utils.py CHANGED
@@ -1,4 +1,5 @@
1
  import json
 
2
 
3
  import toons
4
  from langchain_core.messages import (
@@ -11,7 +12,7 @@ from langchain_core.messages import (
11
  ChatMessage as LangchainChatMessage,
12
  )
13
 
14
- from schema import ChatMessage
15
 
16
 
17
  def convert_tool_response_to_toon(content: str) -> str:
@@ -81,6 +82,33 @@ def langchain_to_chat_message(message: BaseMessage) -> ChatMessage:
81
  raise ValueError(f"Unsupported message type: {message.__class__.__name__}")
82
 
83
84
  def remove_tool_calls(content: str | list[str | dict]) -> str | list[str | dict]:
85
  """Remove tool calls from content."""
86
  if isinstance(content, str):
 
1
  import json
2
+ from typing import Literal
3
 
4
  import toons
5
  from langchain_core.messages import (
 
12
  ChatMessage as LangchainChatMessage,
13
  )
14
 
15
+ from schema import ChatMessage, ChatMessagePreview
16
 
17
 
18
  def convert_tool_response_to_toon(content: str) -> str:
 
82
  raise ValueError(f"Unsupported message type: {message.__class__.__name__}")
83
 
84
 
85
+ def message_to_preview(
86
+ message: BaseMessage,
87
+ index: int,
88
+ content_max: int = 200,
89
+ ) -> ChatMessagePreview:
90
+ """Build a minimal preview DTO from a LangChain message."""
91
+ content = convert_message_content_to_string(message.content)
92
+ if len(content) > content_max:
93
+ content = content[: content_max].rstrip() + "…"
94
+ msg_type: Literal["human", "ai", "tool", "custom"]
95
+ if isinstance(message, HumanMessage):
96
+ msg_type = "human"
97
+ elif isinstance(message, AIMessage):
98
+ msg_type = "ai"
99
+ elif isinstance(message, ToolMessage):
100
+ msg_type = "tool"
101
+ elif isinstance(message, LangchainChatMessage) and getattr(message, "role", None) == "custom":
102
+ msg_type = "custom"
103
+ else:
104
+ msg_type = "human"
105
+ return ChatMessagePreview(
106
+ type=msg_type,
107
+ content=content,
108
+ id=str(index),
109
+ )
110
+
111
+
112
  def remove_tool_calls(content: str | list[str | dict]) -> str | list[str | dict]:
113
  """Remove tool calls from content."""
114
  if isinstance(content, str):
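
A quick illustration of `message_to_preview` on a small window (the messages and the `service.utils` import path are assumptions, not part of the commit): long content is cut at 200 characters and suffixed with an ellipsis, and the message class is mapped to a coarse `type` label.

```python
# Illustration only; inputs and import path are assumed for the example.
from langchain_core.messages import AIMessage, HumanMessage

from service.utils import message_to_preview

window = [
    HumanMessage(content="What's the weather in Paris?"),
    AIMessage(content="It is currently sunny, around 21 degrees. " * 10),  # > 200 chars
]
previews = [message_to_preview(m, index=i) for i, m in enumerate(window)]

assert previews[0].type == "human" and previews[1].type == "ai"
assert len(previews[1].content) <= 201 and previews[1].content.endswith("…")
```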
uv.lock CHANGED
@@ -7,115 +7,6 @@ resolution-markers = [
7
  "python_full_version < '3.12'",
8
  ]
9
 
10
- [[package]]
11
- name = "chatbot"
12
- version = "0.1.0"
13
- source = { virtual = "." }
14
- dependencies = [
15
- { name = "ddgs" },
16
- { name = "duckduckgo-search" },
17
- { name = "fastapi" },
18
- { name = "grpcio" },
19
- { name = "httpx" },
20
- { name = "jiter" },
21
- { name = "langchain" },
22
- { name = "langchain-anthropic" },
23
- { name = "langchain-aws" },
24
- { name = "langchain-community" },
25
- { name = "langchain-core" },
26
- { name = "langchain-google-genai" },
27
- { name = "langchain-google-vertexai" },
28
- { name = "langchain-groq" },
29
- { name = "langchain-mcp-adapters" },
30
- { name = "langchain-ollama" },
31
- { name = "langchain-openai" },
32
- { name = "langchain-postgres" },
33
- { name = "langfuse" },
34
- { name = "langgraph" },
35
- { name = "langgraph-checkpoint-mongodb" },
36
- { name = "langgraph-checkpoint-postgres" },
37
- { name = "langgraph-checkpoint-sqlite" },
38
- { name = "langsmith" },
39
- { name = "psycopg", extra = ["binary", "pool"] },
40
- { name = "pydantic" },
41
- { name = "pydantic-settings" },
42
- { name = "python-dotenv" },
43
- { name = "setuptools" },
44
- { name = "tiktoken" },
45
- { name = "toons" },
46
- { name = "uvicorn" },
47
- ]
48
-
49
- [package.dev-dependencies]
50
- client = [
51
- { name = "httpx" },
52
- { name = "pydantic" },
53
- { name = "python-dotenv" },
54
- ]
55
- dev = [
56
- { name = "langgraph-cli", extra = ["inmem"] },
57
- { name = "mypy" },
58
- { name = "pre-commit" },
59
- { name = "pytest" },
60
- { name = "pytest-asyncio" },
61
- { name = "pytest-cov" },
62
- { name = "pytest-env" },
63
- { name = "ruff" },
64
- ]
65
-
66
- [package.metadata]
67
- requires-dist = [
68
- { name = "ddgs", specifier = ">=9.9.1" },
69
- { name = "duckduckgo-search", specifier = ">=7.3.0" },
70
- { name = "fastapi", specifier = "~=0.115.5" },
71
- { name = "grpcio", specifier = ">=1.68.0" },
72
- { name = "httpx", specifier = "~=0.28.0" },
73
- { name = "jiter", specifier = "~=0.8.2" },
74
- { name = "langchain", specifier = "~=1.0.5" },
75
- { name = "langchain-anthropic", specifier = "~=1.0.0" },
76
- { name = "langchain-aws", specifier = "~=1.0.0" },
77
- { name = "langchain-community", specifier = "~=0.4.1" },
78
- { name = "langchain-core", specifier = "~=1.0.0" },
79
- { name = "langchain-google-genai", specifier = "~=3.0.0" },
80
- { name = "langchain-google-vertexai", specifier = ">=3.0.3" },
81
- { name = "langchain-groq", specifier = "~=1.0.1" },
82
- { name = "langchain-mcp-adapters", specifier = ">=0.1.10" },
83
- { name = "langchain-ollama", specifier = "~=1.0.0" },
84
- { name = "langchain-openai", specifier = "~=1.0.2" },
85
- { name = "langchain-postgres", specifier = "~=0.0.9" },
86
- { name = "langfuse", specifier = ">=2.65.0" },
87
- { name = "langgraph", specifier = "~=1.0.0" },
88
- { name = "langgraph-checkpoint-mongodb", specifier = "~=0.1.3" },
89
- { name = "langgraph-checkpoint-postgres", specifier = "~=2.0.13" },
90
- { name = "langgraph-checkpoint-sqlite", specifier = "~=2.0.1" },
91
- { name = "langsmith", specifier = "~=0.4.0" },
92
- { name = "psycopg", extras = ["binary", "pool"], specifier = "~=3.2.4" },
93
- { name = "pydantic", specifier = "~=2.10.1" },
94
- { name = "pydantic-settings", specifier = "~=2.12.0" },
95
- { name = "python-dotenv", specifier = "~=1.0.1" },
96
- { name = "setuptools", specifier = "~=75.6.0" },
97
- { name = "tiktoken", specifier = ">=0.8.0" },
98
- { name = "toons", specifier = ">=0.5.2" },
99
- { name = "uvicorn", specifier = "~=0.32.1" },
100
- ]
101
-
102
- [package.metadata.requires-dev]
103
- client = [
104
- { name = "httpx", specifier = "~=0.28.0" },
105
- { name = "pydantic", specifier = "~=2.10.1" },
106
- { name = "python-dotenv", specifier = "~=1.0.1" },
107
- ]
108
- dev = [
109
- { name = "langgraph-cli", extras = ["inmem"] },
110
- { name = "mypy" },
111
- { name = "pre-commit" },
112
- { name = "pytest" },
113
- { name = "pytest-asyncio" },
114
- { name = "pytest-cov" },
115
- { name = "pytest-env" },
116
- { name = "ruff" },
117
- ]
118
-
119
  [[package]]
120
  name = "aiohappyeyeballs"
121
  version = "2.6.1"
@@ -576,6 +467,117 @@ wheels = [
576
  { url = "https://files.pythonhosted.org/packages/0a/4c/925909008ed5a988ccbb72dcc897407e5d6d3bd72410d69e051fc0c14647/charset_normalizer-3.4.4-py3-none-any.whl", hash = "sha256:7a32c560861a02ff789ad905a2fe94e3f840803362c84fecf1851cb4cf3dc37f", size = 53402, upload-time = "2025-10-14T04:42:31.76Z" },
577
  ]
578
 
579
  [[package]]
580
  name = "click"
581
  version = "8.3.0"
@@ -1741,6 +1743,20 @@ wheels = [
1741
  { url = "https://files.pythonhosted.org/packages/30/88/e0b957b2d86defbfeb8181860c2bff3379ac16e918a155aed815a18190ed/langchain_mongodb-0.7.1-py3-none-any.whl", hash = "sha256:dda81023e499025b8c911103ab756d2e9cc40f953727fbbf72165bb85e684e16", size = 60724, upload-time = "2025-10-13T14:03:00.192Z" },
1742
  ]
1743
 
1744
  [[package]]
1745
  name = "langchain-ollama"
1746
  version = "1.0.0"
 
7
  "python_full_version < '3.12'",
8
  ]
9
 
10
  [[package]]
11
  name = "aiohappyeyeballs"
12
  version = "2.6.1"
 
467
  { url = "https://files.pythonhosted.org/packages/0a/4c/925909008ed5a988ccbb72dcc897407e5d6d3bd72410d69e051fc0c14647/charset_normalizer-3.4.4-py3-none-any.whl", hash = "sha256:7a32c560861a02ff789ad905a2fe94e3f840803362c84fecf1851cb4cf3dc37f", size = 53402, upload-time = "2025-10-14T04:42:31.76Z" },
468
  ]
469
 
470
+ [[package]]
471
+ name = "chatbot"
472
+ version = "0.1.0"
473
+ source = { virtual = "." }
474
+ dependencies = [
475
+ { name = "ddgs" },
476
+ { name = "duckduckgo-search" },
477
+ { name = "fastapi" },
478
+ { name = "grpcio" },
479
+ { name = "httpx" },
480
+ { name = "jiter" },
481
+ { name = "langchain" },
482
+ { name = "langchain-anthropic" },
483
+ { name = "langchain-aws" },
484
+ { name = "langchain-community" },
485
+ { name = "langchain-core" },
486
+ { name = "langchain-google-genai" },
487
+ { name = "langchain-google-vertexai" },
488
+ { name = "langchain-groq" },
489
+ { name = "langchain-mcp-adapters" },
490
+ { name = "langchain-nvidia-ai-endpoints" },
491
+ { name = "langchain-ollama" },
492
+ { name = "langchain-openai" },
493
+ { name = "langchain-postgres" },
494
+ { name = "langfuse" },
495
+ { name = "langgraph" },
496
+ { name = "langgraph-checkpoint-mongodb" },
497
+ { name = "langgraph-checkpoint-postgres" },
498
+ { name = "langgraph-checkpoint-sqlite" },
499
+ { name = "langsmith" },
500
+ { name = "psycopg", extra = ["binary", "pool"] },
501
+ { name = "pydantic" },
502
+ { name = "pydantic-settings" },
503
+ { name = "python-dotenv" },
504
+ { name = "setuptools" },
505
+ { name = "tiktoken" },
506
+ { name = "toons" },
507
+ { name = "uvicorn" },
508
+ ]
509
+
510
+ [package.dev-dependencies]
511
+ client = [
512
+ { name = "httpx" },
513
+ { name = "pydantic" },
514
+ { name = "python-dotenv" },
515
+ ]
516
+ dev = [
517
+ { name = "langgraph-cli", extra = ["inmem"] },
518
+ { name = "mypy" },
519
+ { name = "pre-commit" },
520
+ { name = "pytest" },
521
+ { name = "pytest-asyncio" },
522
+ { name = "pytest-cov" },
523
+ { name = "pytest-env" },
524
+ { name = "ruff" },
525
+ ]
526
+
527
+ [package.metadata]
528
+ requires-dist = [
529
+ { name = "ddgs", specifier = ">=9.9.1" },
530
+ { name = "duckduckgo-search", specifier = ">=7.3.0" },
531
+ { name = "fastapi", specifier = "~=0.115.5" },
532
+ { name = "grpcio", specifier = ">=1.68.0" },
533
+ { name = "httpx", specifier = "~=0.28.0" },
534
+ { name = "jiter", specifier = "~=0.8.2" },
535
+ { name = "langchain", specifier = "~=1.0.5" },
536
+ { name = "langchain-anthropic", specifier = "~=1.0.0" },
537
+ { name = "langchain-aws", specifier = "~=1.0.0" },
538
+ { name = "langchain-community", specifier = "~=0.4.1" },
539
+ { name = "langchain-core", specifier = "~=1.0.0" },
540
+ { name = "langchain-google-genai", specifier = "~=3.0.0" },
541
+ { name = "langchain-google-vertexai", specifier = ">=3.0.3" },
542
+ { name = "langchain-groq", specifier = "~=1.0.1" },
543
+ { name = "langchain-mcp-adapters", specifier = ">=0.1.10" },
544
+ { name = "langchain-nvidia-ai-endpoints", specifier = ">=1.0.4" },
545
+ { name = "langchain-ollama", specifier = "~=1.0.0" },
546
+ { name = "langchain-openai", specifier = "~=1.0.2" },
547
+ { name = "langchain-postgres", specifier = "~=0.0.9" },
548
+ { name = "langfuse", specifier = ">=2.65.0" },
549
+ { name = "langgraph", specifier = "~=1.0.0" },
550
+ { name = "langgraph-checkpoint-mongodb", specifier = "~=0.1.3" },
551
+ { name = "langgraph-checkpoint-postgres", specifier = "~=2.0.13" },
552
+ { name = "langgraph-checkpoint-sqlite", specifier = "~=2.0.1" },
553
+ { name = "langsmith", specifier = "~=0.4.0" },
554
+ { name = "psycopg", extras = ["binary", "pool"], specifier = "~=3.2.4" },
555
+ { name = "pydantic", specifier = "~=2.10.1" },
556
+ { name = "pydantic-settings", specifier = "~=2.12.0" },
557
+ { name = "python-dotenv", specifier = "~=1.0.1" },
558
+ { name = "setuptools", specifier = "~=75.6.0" },
559
+ { name = "tiktoken", specifier = ">=0.8.0" },
560
+ { name = "toons", specifier = ">=0.5.2" },
561
+ { name = "uvicorn", specifier = "~=0.32.1" },
562
+ ]
563
+
564
+ [package.metadata.requires-dev]
565
+ client = [
566
+ { name = "httpx", specifier = "~=0.28.0" },
567
+ { name = "pydantic", specifier = "~=2.10.1" },
568
+ { name = "python-dotenv", specifier = "~=1.0.1" },
569
+ ]
570
+ dev = [
571
+ { name = "langgraph-cli", extras = ["inmem"] },
572
+ { name = "mypy" },
573
+ { name = "pre-commit" },
574
+ { name = "pytest" },
575
+ { name = "pytest-asyncio" },
576
+ { name = "pytest-cov" },
577
+ { name = "pytest-env" },
578
+ { name = "ruff" },
579
+ ]
580
+
581
  [[package]]
582
  name = "click"
583
  version = "8.3.0"
 
1743
  { url = "https://files.pythonhosted.org/packages/30/88/e0b957b2d86defbfeb8181860c2bff3379ac16e918a155aed815a18190ed/langchain_mongodb-0.7.1-py3-none-any.whl", hash = "sha256:dda81023e499025b8c911103ab756d2e9cc40f953727fbbf72165bb85e684e16", size = 60724, upload-time = "2025-10-13T14:03:00.192Z" },
1744
  ]
1745
 
1746
+ [[package]]
1747
+ name = "langchain-nvidia-ai-endpoints"
1748
+ version = "1.0.4"
1749
+ source = { registry = "https://pypi.org/simple" }
1750
+ dependencies = [
1751
+ { name = "aiohttp" },
1752
+ { name = "filetype" },
1753
+ { name = "langchain-core" },
1754
+ ]
1755
+ sdist = { url = "https://files.pythonhosted.org/packages/8d/2e/0b3e6ec5df7426e3ab19c8dfedd0b4a9e97461a6a536e02f6429618664ec/langchain_nvidia_ai_endpoints-1.0.4.tar.gz", hash = "sha256:831decd67e94f104bc2fecc596ef2953ea30e7adc1c3b99bd35861e018dd1fb2", size = 46600, upload-time = "2026-02-13T17:17:56.135Z" }
1756
+ wheels = [
1757
+ { url = "https://files.pythonhosted.org/packages/c8/3e/a711094b31777ac4a7993507b8a3e0a45307cbab94425b5eba012a49c0cd/langchain_nvidia_ai_endpoints-1.0.4-py3-none-any.whl", hash = "sha256:49018362fca9c951488dffcf3e1372365778946e2a3b87ff7d769589e7b3c497", size = 50173, upload-time = "2026-02-13T17:17:54.759Z" },
1758
+ ]
1759
+
1760
  [[package]]
1761
  name = "langchain-ollama"
1762
  version = "1.0.0"