Instructions to use Salesforce/codegen-350M-multi with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use Salesforce/codegen-350M-multi with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Salesforce/codegen-350M-multi")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Salesforce/codegen-350M-multi")
model = AutoModelForCausalLM.from_pretrained("Salesforce/codegen-350M-multi")
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use Salesforce/codegen-350M-multi with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "Salesforce/codegen-350M-multi"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Salesforce/codegen-350M-multi",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
Use Docker
docker model run hf.co/Salesforce/codegen-350M-multi
- SGLang
How to use Salesforce/codegen-350M-multi with SGLang:
Install from pip and serve model
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Salesforce/codegen-350M-multi" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Salesforce/codegen-350M-multi",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
Use Docker images
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Salesforce/codegen-350M-multi" \
        --host 0.0.0.0 \
        --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Salesforce/codegen-350M-multi",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
- Docker Model Runner
How to use Salesforce/codegen-350M-multi with Docker Model Runner:
docker model run hf.co/Salesforce/codegen-350M-multi
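The curl calls above can also be issued from Python. The following is a minimal sketch using only the standard library, assuming a vLLM server is already listening on `http://localhost:8000` (for the SGLang commands, the port would be 30000). The helper names `build_completion_request` and `complete` are hypothetical, and the response parsing assumes the usual OpenAI-style completion shape with the text under `choices[0].text`.

```python
import json
import urllib.request

# Assumption: a server started with `vllm serve` is listening at this address.
BASE_URL = "http://localhost:8000"
MODEL_ID = "Salesforce/codegen-350M-multi"


def build_completion_request(prompt, max_tokens=64, temperature=0.5, base_url=BASE_URL):
    """Build (but do not send) a POST request for the /v1/completions route."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the first generated completion text."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.load(response)
    # Assumed OpenAI-style response layout: the text sits under choices[0].text.
    return body["choices"][0]["text"]


# Example call (needs a running server):
#   print(complete("def fibonacci(n):"))
```

Separating request construction from sending keeps the payload inspectable without a running server, which is convenient when checking that the model name and sampling parameters match what the server expects.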
Create app.py
#9
by sky-meilin - opened
app.py
ADDED
@@ -0,0 +1,49 @@
+# Install required packages
+%pip install azure-ai-ml azure-identity --upgrade --quiet
+
+import os
+import time
+from azure.ai.ml import MLClient
+from azure.ai.ml.entities import ManagedOnlineEndpoint, ManagedOnlineDeployment
+from azure.identity import DefaultAzureCredential
+
+# Set environment variables (replace with your values)
+# Follow setup steps at: https://huggingface.co/docs/microsoft-azure/guides/configure-azure-ml-microsoft-foundry
+os.environ["SUBSCRIPTION_ID"] = "<YOUR_SUBSCRIPTION_ID>"
+os.environ["RESOURCE_GROUP"] = "<YOUR_RESOURCE_GROUP>"
+os.environ["WORKSPACE_NAME"] = "<YOUR_WORKSPACE_NAME>"
+
+# Generate unique names for endpoint and deployment
+timestamp = str(int(time.time()))
+os.environ["ENDPOINT_NAME"] = f"hf-ep-{timestamp}"
+os.environ["DEPLOYMENT_NAME"] = f"hf-deploy-{timestamp}"
+
+# Create Azure ML Client for Microsoft Foundry (classic)
+client = MLClient(
+    credential=DefaultAzureCredential(),
+    subscription_id=os.getenv("SUBSCRIPTION_ID"),
+    resource_group_name=os.getenv("RESOURCE_GROUP"),
+    workspace_name=os.getenv("WORKSPACE_NAME"),
+)
+
+# Build model URI for Azure registry
+model_uri = f"azureml://registries/HuggingFace/models/salesforce-codegen-350m-multi/labels/latest"
+
+# Create endpoint and deployment
+endpoint = ManagedOnlineEndpoint(name=os.getenv("ENDPOINT_NAME"))
+
+deployment = ManagedOnlineDeployment(
+    name=os.getenv("DEPLOYMENT_NAME"),
+    endpoint_name=os.getenv("ENDPOINT_NAME"),
+    model=model_uri,
+    # Check https://huggingface.co/docs/microsoft-azure/foundry/hardware to see the available instances
+    instance_type="Standard_NC40ads_H100_v5",
+    instance_count=1,
+)
+
+# Deploy endpoint and deployment (this may take 10-15 minutes)
+client.begin_create_or_update(endpoint).wait()
+client.online_deployments.begin_create_or_update(deployment).wait()
+
+print(f"Endpoint '{os.getenv('ENDPOINT_NAME')}' deployed successfully!")
+print("You can now send requests to your endpoint via Microsoft Foundry or Azure Machine Learning.")
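The script ships with `<YOUR_...>` placeholders for the subscription, resource group, and workspace, and nothing stops it from being run before they are replaced. A possible guard, sketched here with the standard library only, is to validate the three values before building the client. The helpers `require_env` and `load_workspace_settings` are hypothetical and not part of this PR or of the Azure SDK.

```python
import os


def require_env(name, environ=os.environ):
    """Return the value of `name`, rejecting missing or placeholder values."""
    value = environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set")
    # The script's template values look like "<YOUR_SUBSCRIPTION_ID>".
    if value.startswith("<") and value.endswith(">"):
        raise RuntimeError(f"{name} still holds the placeholder {value}")
    return value


def load_workspace_settings(environ=os.environ):
    """Collect the three workspace settings that the script passes to MLClient."""
    return {
        "subscription_id": require_env("SUBSCRIPTION_ID", environ),
        "resource_group_name": require_env("RESOURCE_GROUP", environ),
        "workspace_name": require_env("WORKSPACE_NAME", environ),
    }
```

The dictionary keys mirror the keyword arguments the script already passes, so the client could be created as `MLClient(credential=DefaultAzureCredential(), **load_workspace_settings())`, failing early with a readable message instead of partway through a deployment.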