Claude
committed on
fix(P1): Replace stale 72B model fallbacks with 7B to avoid Novita 500 errors
Browse files
Bug: Three files had hardcoded fallbacks to Qwen2.5-72B-Instruct which
routes to Novita provider (unreliable, 500 errors). The config.py was
updated to use 7B, but these fallbacks were stale.
Files fixed:
- src/clients/huggingface.py:59 - HuggingFaceChatClient init
- src/agent_factory/judges.py:85 - get_model() HuggingFace branch
- src/orchestrators/langgraph_orchestrator.py:43 - LangGraph init (deprecated)
All fallbacks now use Qwen/Qwen2.5-7B-Instruct to stay on HuggingFace
native serverless infrastructure (models < 30B params).
See: CLAUDE.md "CRITICAL: HuggingFace Free Tier Architecture" section
All 310 unit tests pass.
src/agent_factory/judges.py
CHANGED
|
@@ -82,7 +82,8 @@ def get_model() -> Any:
|
|
| 82 |
|
| 83 |
# Priority 3: HuggingFace (requires HF_TOKEN)
|
| 84 |
if settings.has_huggingface_key:
|
| 85 |
- model_name = settings.huggingface_model or "Qwen/Qwen2.5-72B-Instruct"
|
|
|
|
| 86 |
hf_provider = HuggingFaceProvider(api_key=settings.hf_token)
|
| 87 |
return HuggingFaceModel(model_name, provider=hf_provider)
|
| 88 |
|
|
|
|
| 82 |
|
| 83 |
# Priority 3: HuggingFace (requires HF_TOKEN)
|
| 84 |
if settings.has_huggingface_key:
|
| 85 |
+
# FIX: Use 7B model to stay on HuggingFace native infrastructure (avoid Novita 500s)
|
| 86 |
+
model_name = settings.huggingface_model or "Qwen/Qwen2.5-7B-Instruct"
|
| 87 |
hf_provider = HuggingFaceProvider(api_key=settings.hf_token)
|
| 88 |
return HuggingFaceModel(model_name, provider=hf_provider)
|
| 89 |
|
src/clients/huggingface.py
CHANGED
|
@@ -51,12 +51,13 @@ class HuggingFaceChatClient(BaseChatClient): # type: ignore[misc]
|
|
| 51 |
"""Initialize the HuggingFace chat client.
|
| 52 |
|
| 53 |
Args:
|
| 54 |
-
model_id: The HuggingFace model ID (default: configured value or Qwen2.5-72B).
|
| 55 |
api_key: HF_TOKEN (optional, defaults to env var).
|
| 56 |
**kwargs: Additional arguments passed to BaseChatClient.
|
| 57 |
"""
|
| 58 |
super().__init__(**kwargs)
|
| 59 |
- self.model_id = model_id or settings.huggingface_model or "Qwen/Qwen2.5-72B-Instruct"
|
|
|
|
| 60 |
self.api_key = api_key or settings.hf_token
|
| 61 |
|
| 62 |
# Initialize the HF Inference Client
|
|
|
|
| 51 |
"""Initialize the HuggingFace chat client.
|
| 52 |
|
| 53 |
Args:
|
| 54 |
+
model_id: The HuggingFace model ID (default: configured value or Qwen2.5-7B).
|
| 55 |
api_key: HF_TOKEN (optional, defaults to env var).
|
| 56 |
**kwargs: Additional arguments passed to BaseChatClient.
|
| 57 |
"""
|
| 58 |
super().__init__(**kwargs)
|
| 59 |
+
# FIX: Use 7B model to stay on HuggingFace native infrastructure (avoid Novita 500s)
|
| 60 |
+
self.model_id = model_id or settings.huggingface_model or "Qwen/Qwen2.5-7B-Instruct"
|
| 61 |
self.api_key = api_key or settings.hf_token
|
| 62 |
|
| 63 |
# Initialize the HF Inference Client
|
src/orchestrators/langgraph_orchestrator.py
CHANGED
|
@@ -38,8 +38,9 @@ class LangGraphOrchestrator(OrchestratorProtocol):
|
|
| 38 |
|
| 39 |
# Initialize the LLM (Qwen 2.5 via HF Inference)
|
| 40 |
# We use the serverless API by default
|
| 41 |
-
#
|
| 42 |
- repo_id = settings.huggingface_model or "Qwen/Qwen2.5-72B-Instruct"
|
|
|
|
| 43 |
|
| 44 |
# Ensure we have an API key
|
| 45 |
api_key = settings.hf_token
|
|
|
|
| 38 |
|
| 39 |
# Initialize the LLM (Qwen 2.5 via HF Inference)
|
| 40 |
# We use the serverless API by default
|
| 41 |
+
# FIX: Use 7B model to stay on HuggingFace native infrastructure
|
| 42 |
+
# Large models (70B+) route to Novita/Hyperbolic providers (500/401 errors)
|
| 43 |
+
repo_id = settings.huggingface_model or "Qwen/Qwen2.5-7B-Instruct"
|
| 44 |
|
| 45 |
# Ensure we have an API key
|
| 46 |
api_key = settings.hf_token
|