It is optimized for high-throughput CPU inference using Hugging Face's [Text Embeddings Inference (TEI)](https://github.com/huggingface/text-embeddings-inference).
## Usage with Text Embeddings Inference (TEI)
This model is pre-configured for TEI. You can run it directly using Docker.

**Note:** pass `--auto-truncate` so that inputs longer than the model's 32k-token context window are truncated rather than rejected with an error.
### Option A: Docker CLI
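Only the first line of the `docker run` command survives in this diff. A complete invocation along these lines should work with TEI's CPU image (the image tag, mount path, and `--model-id` value are assumptions — adjust them for your setup):

```shell
# Serve the model from the current directory with TEI's CPU image.
# Image tag and paths are assumptions; substitute your own.
docker run --rm -p 8080:80 \
  -v "$PWD:/data/model" \
  ghcr.io/huggingface/text-embeddings-inference:cpu-latest \
  --model-id /data/model --auto-truncate
```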
### Option B: Docker Compose
Use this configuration to integrate the model into your stack:
```yaml
services:
  embedding-service:
    # ... (image, command, and volumes elided in this diff)
    ports:
      - "8080:80"
```
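The diff elides the middle of the Compose file; a complete service definition along these lines should work (the image tag, mount path, and `--model-id` value are assumptions — adjust them for your setup):

```yaml
services:
  embedding-service:
    image: ghcr.io/huggingface/text-embeddings-inference:cpu-latest
    command: --model-id /data/model --auto-truncate
    volumes:
      - ./:/data/model
    ports:
      - "8080:80"
```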
### API Request Example
Once the container is running, you can generate embeddings via the HTTP API:
```bash
curl 127.0.0.1:8080/embed \
  -X POST \
  -d '{"inputs":"Deep learning is a subset of machine learning."}' \
  -H 'Content-Type: application/json'
```
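The same request can be issued from Python's standard library; a small sketch, reusing the endpoint and input text from the curl example (the server is assumed to be running):

```python
import json
import urllib.request

# Same request body as the curl example.
body = json.dumps({"inputs": "Deep learning is a subset of machine learning."}).encode()

req = urllib.request.Request(
    "http://127.0.0.1:8080/embed",
    data=body,
    headers={"Content-Type": "application/json"},
)

# Requires the TEI container to be running; the response is a JSON list
# with one embedding vector per input:
# with urllib.request.urlopen(req) as resp:
#     embeddings = json.loads(resp.read())
#     print(len(embeddings[0]))  # embedding dimensionality
```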
## Usage with Python (Optimum)
You can also run this model locally using the `optimum` library with ONNX Runtime.
**Installation:**
```bash
pip install optimum[onnxruntime] transformers
```
**Inference Code:**
```python
import torch
from optimum.onnxruntime import ORTModelForFeatureExtraction
from transformers import AutoTokenizer

# ... (model/tokenizer loading and the input sentences are elided in this diff)

inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Mean pooling: the attention mask excludes padding tokens from the average
attention_mask = inputs['attention_mask']
token_embeddings = outputs.last_hidden_state
input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
embeddings = torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)

print(f"Embeddings shape: {embeddings.shape}")
```
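The masked average above can be sanity-checked without torch. This pure-Python sketch mirrors the same arithmetic on a tiny hypothetical example (one sequence of 3 token positions, embedding dimension 2, last position padded):

```python
# Tiny worked example of mean pooling with an attention mask.
token_embeddings = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]  # last row is padding
attention_mask = [1, 1, 0]

# Sum only the unmasked token vectors, then divide by the number of real tokens.
dim = len(token_embeddings[0])
summed = [sum(vec[d] * m for vec, m in zip(token_embeddings, attention_mask))
          for d in range(dim)]
count = max(sum(attention_mask), 1e-9)  # clamp, like torch.clamp(..., min=1e-9)
embedding = [s / count for s in summed]

print(embedding)  # → [2.0, 3.0]; the padding row contributes nothing
```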
## Performance Comparison
By converting to ONNX and quantizing to INT8, this model achieves significantly lower latency and reduced memory footprint compared to the original PyTorch model, with minimal impact on embedding quality.
- **Memory Usage:** Reduced by approximately 50%.
- **Inference Speed:** Up to 3x-5x faster on modern CPUs (depending on batch size and sequence length).
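These numbers vary by machine, so it is worth measuring on your own hardware. A minimal timing harness might look like this (the `encode_*` functions in the usage comment are hypothetical wrappers around the two models):

```python
import time

def benchmark(fn, warmup=3, iters=20):
    """Return mean wall-clock seconds per call of fn()."""
    for _ in range(warmup):   # warm caches / lazy initialization first
        fn()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - start) / iters

# Hypothetical usage, assuming encode_pytorch / encode_onnx_int8 each embed
# the same batch of sentences with the respective model:
#   speedup = benchmark(encode_pytorch) / benchmark(encode_onnx_int8)
#   print(f"INT8 ONNX is {speedup:.1f}x faster")
```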