VibecoderMcSwaggins committed
Commit 6dcd3d9 · 1 Parent(s): bf5812d

fix(auth): Robust HF_TOKEN loading and debug logging for Partner API auth failure

docs/bugs/P1_HUGGINGFACE_ROUTER_401_HYPERBOLIC.md CHANGED
@@ -1,162 +1,24 @@
-# P1 Bug: HuggingFace Router 401 Unauthorized (Hyperbolic Provider)
-
-**Severity**: P1 (High) - Free Tier completely broken
-**Status**: Open
-**Discovered**: 2025-12-01
-**Reporter**: Production user via HuggingFace Spaces
-
-## Symptom
-
+## Update 2025-12-01 21:45 PST
+
+**Attempted Fix 1**: Switched model from `meta-llama/Llama-3.1-70B-Instruct` (Hyperbolic) to `Qwen/Qwen2.5-72B-Instruct` (routed to **Novita**).
+
+**Result**: Failed with the same 401 error on Novita.
 ```
-401 Client Error: Unauthorized for url:
-https://router.huggingface.co/hyperbolic/v1/chat/completions
+401 Client Error: Unauthorized for url: https://router.huggingface.co/novita/v3/openai/chat/completions
 Invalid username or password.
 ```
 
-## Root Cause Analysis
-
-### What Changed (NOT our code)
-
-HuggingFace has migrated their Inference API infrastructure:
-
-1. **Old endpoint** (deprecated): `https://api-inference.huggingface.co`
-2. **New endpoint**: `https://router.huggingface.co/{provider}/v1/chat/completions`
-
-The new "router" system routes requests to **partner providers** based on the model:
-- `meta-llama/Llama-3.1-70B-Instruct` → **Hyperbolic** (partner)
-- Other models → various providers
-
-**Critical Issue**: Hyperbolic requires authentication even for models that were previously "free tier" on HuggingFace's native infrastructure.
-
-### Call Stack Trace
-
-```
-User Query (HuggingFace Spaces)
-
-src/app.py:research_agent()
-
-src/orchestrators/advanced.py:AdvancedOrchestrator.run()
-
-src/clients/factory.py:get_chat_client() [line 69-76]
-→ No OpenAI key → Falls back to HuggingFace
-
-src/clients/huggingface.py:HuggingFaceChatClient.__init__() [line 52-56]
-→ InferenceClient(model="meta-llama/Llama-3.1-70B-Instruct", token=None)
-
-huggingface_hub.InferenceClient.chat_completion()
-→ Routes to: https://router.huggingface.co/hyperbolic/v1/chat/completions
-→ 401 Unauthorized (Hyperbolic rejects unauthenticated requests)
-```
-
-### Evidence
-
-- **huggingface_hub version**: 0.36.0 (latest)
-- **pyproject.toml constraint**: `>=0.24.0`
-- **HuggingFace Forum Reference**: [API endpoint migration thread](https://discuss.huggingface.co/t/error-https-api-inference-huggingface-co-is-no-longer-supported-please-use-https-router-huggingface-co-hf-inference-instead/169870)
-
-## Impact
-
-| Component | Impact |
-|-----------|--------|
-| Free Tier (no API key) | **COMPLETELY BROKEN** |
-| HuggingFace Spaces demo | **BROKEN** |
-| Users without OpenAI key | **Cannot use app** |
-| Paid tier (OpenAI key) | Unaffected |
-
-## Proposed Solutions
-
-### Option 1: Switch to Smaller Free Model (Quick Fix)
-
-Change default model from `meta-llama/Llama-3.1-70B-Instruct` to a model that's still hosted on HuggingFace's native infrastructure:
-
-```python
-# src/utils/config.py
-huggingface_model: str | None = Field(
-    default="mistralai/Mistral-7B-Instruct-v0.3",  # Still on HF native
-    description="HuggingFace model name"
-)
-```
-
-**Candidates** (need testing):
-- `mistralai/Mistral-7B-Instruct-v0.3`
-- `HuggingFaceH4/zephyr-7b-beta`
-- `microsoft/Phi-3-mini-4k-instruct`
-- `google/gemma-2-9b-it`
-
-**Pros**: Quick fix, no auth required
-**Cons**: Lower quality output than Llama 3.1 70B
-
-### Option 2: Require HF_TOKEN for Free Tier
-
-Document that `HF_TOKEN` is now **required** (not optional) for Free Tier:
-
-```python
-# src/clients/factory.py
-if not settings.hf_token:
-    raise ConfigurationError(
-        "HF_TOKEN is now required for HuggingFace free tier. "
-        "Get yours at https://huggingface.co/settings/tokens"
-    )
-```
-
-**Pros**: Keeps Llama 3.1 70B quality
-**Cons**: Friction for users, not truly "free" anymore
-
-### Option 3: Server-Side HF_TOKEN on Spaces
-
-Set `HF_TOKEN` as a secret in HuggingFace Spaces settings:
-1. Go to Space Settings → Repository Secrets
-2. Add `HF_TOKEN` with a valid token
-3. Users get free tier without needing their own token
-
-**Pros**: Best UX, transparent to users
-**Cons**: Token usage counted against our account
-
-### Option 4: Hybrid Fallback Chain
-
-Try multiple models in order until one works:
-
-```python
-FALLBACK_MODELS = [
-    "meta-llama/Llama-3.1-70B-Instruct",   # Best quality (needs token)
-    "mistralai/Mistral-7B-Instruct-v0.3",  # Good quality (free)
-    "microsoft/Phi-3-mini-4k-instruct",    # Lightweight (free)
-]
-```
-
-**Pros**: Graceful degradation
-**Cons**: Complexity, inconsistent output quality
-
-## Recommended Fix
-
-**Short-term (P1)**: Option 3 - Add `HF_TOKEN` to HuggingFace Spaces secrets
-
-**Long-term**: Option 4 - Implement fallback chain with clear user feedback about which model is active
-
-## Testing
-
-```bash
-# Test without token (should fail currently)
-unset HF_TOKEN
-uv run python -c "
-from huggingface_hub import InferenceClient
-client = InferenceClient(model='meta-llama/Llama-3.1-70B-Instruct')
-response = client.chat_completion(messages=[{'role': 'user', 'content': 'Hi'}])
-print(response)
-"
-
-# Test with token (should work)
-export HF_TOKEN=hf_xxxxx
-uv run python -c "
-from huggingface_hub import InferenceClient
-client = InferenceClient(model='meta-llama/Llama-3.1-70B-Instruct', token='$HF_TOKEN')
-response = client.chat_completion(messages=[{'role': 'user', 'content': 'Hi'}])
-print(response)
-"
-```
-
-## References
-
-- [HuggingFace API Migration Thread](https://discuss.huggingface.co/t/error-https-api-inference-huggingface-co-is-no-longer-supported-please-use-https-router-huggingface-co-hf-inference-instead/169870)
-- [GitHub Issue: 401 Unauthorized](https://github.com/huggingface/transformers/issues/38289)
-- [HuggingFace Inference Endpoints Docs](https://huggingface.co/docs/huggingface_hub/guides/inference)
+**New Findings**:
+1. **All Large Models are Partners**: Both Llama-70B and Qwen-72B are routed to partner providers (Hyperbolic, Novita).
+2. **Partners Require Auth**: Partner providers strictly require authentication; anonymous access is blocked.
+3. **Token Propagation Failure**: Even with `HF_TOKEN` set in Spaces secrets, the `huggingface_hub` library might not be picking it up via Pydantic settings if `alias` resolution is flaky in the environment.
+4. **Possible Token Permission Issue**: The user's token might lack permissions for Partner Inference endpoints.
+
+**Corrective Actions**:
+1. **Robust Config Loading**: Modified `src/utils/config.py` to use `default_factory=lambda: os.environ.get("HF_TOKEN")` to guarantee the environment variable is read.
+2. **Debug Logging**: Added explicit logging in `src/clients/huggingface.py` to confirm whether a token is being used (masked).
+3. **Retain Qwen**: Keeping `Qwen/Qwen2.5-72B-Instruct` as it is a capable model; once auth is fixed, it should work.
+
+**Next Steps**:
+- Deploy these changes to debug the token loading.
+- If the token is loaded but requests still fail, the user must generate a new `HF_TOKEN` with **"Make calls to inference endpoints"** permissions.
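A quick way to confirm the diagnosis outside the app, sketched along the lines of the removed Testing section (assumes only `huggingface_hub` is installed; the model name follows the update above; run once with `HF_TOKEN` unset to reproduce the 401, once with it exported):

```python
# Verification sketch (not part of this commit).
import os

from huggingface_hub import InferenceClient

token = os.environ.get("HF_TOKEN")  # None reproduces the anonymous 401
client = InferenceClient(model="Qwen/Qwen2.5-72B-Instruct", token=token, timeout=120)
response = client.chat_completion(messages=[{"role": "user", "content": "Hi"}])
print(response.choices[0].message.content)
```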
src/clients/huggingface.py CHANGED
@@ -45,14 +45,27 @@ class HuggingFaceChatClient(BaseChatClient): # type: ignore[misc]
         self.model_id = model_id or settings.huggingface_model or "Qwen/Qwen2.5-72B-Instruct"
         self.api_key = api_key or settings.hf_token
 
-        # Initialize the HF Inference Client
-        # timeout=60 to prevent premature timeouts on long reasonings
-        self._client = InferenceClient(
-            model=self.model_id,
-            token=self.api_key,
-            timeout=60,
-        )
-        logger.info("Initialized HuggingFaceChatClient", model=self.model_id)
+        # Debug logging for auth issues
+        if self.api_key:
+            masked_key = (
+                f"{self.api_key[:4]}...{self.api_key[-4:]}" if len(self.api_key) > 8 else "***"
+            )
+            logger.info(f"HuggingFaceChatClient using explicit API token: {masked_key}")
+        else:
+            logger.warning(
+                "HuggingFaceChatClient initialized WITHOUT explicit API token "
+                "(relying on cached token or anonymous access)"
+            )
+
+        try:
+            self._client = InferenceClient(
+                model=self.model_id,
+                token=self.api_key,
+                timeout=kwargs.get("timeout", 120),  # Default to 120s
+            )
+        except Exception as e:
+            logger.error(f"Failed to initialize HuggingFace client: {e}")
+            raise
 
     def _convert_messages(self, messages: MutableSequence[ChatMessage]) -> list[dict[str, Any]]:
         """Convert framework messages to HuggingFace format."""
src/utils/config.py CHANGED
@@ -1,6 +1,7 @@
 """Application configuration using Pydantic Settings."""
 
 import logging
+import os
 from typing import Literal
 
 import structlog
@@ -42,7 +43,9 @@ class Settings(BaseSettings):
         default="Qwen/Qwen2.5-72B-Instruct", description="HuggingFace model name"
     )
     hf_token: str | None = Field(
-        default=None, alias="HF_TOKEN", description="HuggingFace API token"
+        default_factory=lambda: os.environ.get("HF_TOKEN"),
+        alias="HF_TOKEN",
+        description="HuggingFace API token",
     )
 
     # Embedding Configuration
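The `default_factory` change means the token is read straight from the process environment even when alias-based lookup misbehaves. A minimal sketch of the resulting behavior, assuming pydantic v2 with the `pydantic-settings` package:

```python
import os

from pydantic import Field
from pydantic_settings import BaseSettings


class Settings(BaseSettings):
    # alias="HF_TOKEN" lets pydantic-settings resolve the env var normally;
    # default_factory re-reads os.environ as a fallback if resolution fails.
    hf_token: str | None = Field(
        default_factory=lambda: os.environ.get("HF_TOKEN"),
        alias="HF_TOKEN",
        description="HuggingFace API token",
    )


os.environ["HF_TOKEN"] = "hf_example"
print(Settings().hf_token)  # hf_example
```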