Instructions for using Salesforce/codegen-350M-multi with libraries and local apps.

Libraries

How to use Salesforce/codegen-350M-multi with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Salesforce/codegen-350M-multi")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Salesforce/codegen-350M-multi")
model = AutoModelForCausalLM.from_pretrained("Salesforce/codegen-350M-multi")
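
The snippets above only load the model. As a minimal sketch of producing a completion, the helper below wraps loading and greedy decoding in one function. The name `complete_code`, the default `max_new_tokens`, and the choice of greedy decoding are illustrative assumptions, not part of the model card.

```python
MODEL_ID = "Salesforce/codegen-350M-multi"


def complete_code(prompt: str, max_new_tokens: int = 64) -> str:
    """Return the prompt followed by a greedy completion (hypothetical helper)."""
    # Imported lazily so that defining the helper does not load the library.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    # Tokenize the prompt into PyTorch tensors.
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=False,  # greedy decoding, so repeated runs give the same text
        pad_token_id=tokenizer.eos_token_id,  # reuse EOS as the padding id
    )
    # The decoded sequence contains the prompt plus the generated continuation.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `complete_code("def fibonacci(n):")` downloads the weights on first use and returns the prompt with the model's continuation appended. Loading inside the function keeps the sketch short; a long-running program would load the tokenizer and model once and reuse them.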

Local Apps

vLLM

How to use Salesforce/codegen-350M-multi with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Salesforce/codegen-350M-multi"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Salesforce/codegen-350M-multi",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
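
The same request can be sent from Python. The sketch below mirrors the curl call using only the standard library; the helper names `build_completion_request` and `complete` are hypothetical, and the response handling assumes the server returns the usual OpenAI-style completions body with a `choices` list.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # address used by the curl example above
MODEL_ID = "Salesforce/codegen-350M-multi"


def build_completion_request(prompt, max_tokens=512, temperature=0.5, base_url=BASE_URL):
    """Build a POST request for the /v1/completions route (hypothetical helper)."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the text of the first choice."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # Assumes an OpenAI-style response: {"choices": [{"text": ...}, ...]}
    return body["choices"][0]["text"]
```

With the server running, `complete("def add(a, b):", max_tokens=64)` returns the generated continuation. Passing `base_url="http://localhost:30000"` points the same helper at a server listening on another port.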

Use Docker

docker model run hf.co/Salesforce/codegen-350M-multi

SGLang

How to use Salesforce/codegen-350M-multi with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Salesforce/codegen-350M-multi" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Salesforce/codegen-350M-multi",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
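
The server needs some time to start before it accepts requests, so a script that launches it and then calls it immediately may fail. A minimal sketch of a readiness check, assuming the host and port from the launch command above; `wait_for_port` is a hypothetical helper, and an open port only shows that something is listening, so treat it as a coarse signal.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_s: float = 120.0, interval_s: float = 1.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # A successful connect means a process is listening on the port.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            # Connection refused or timed out: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)
```

For example, `wait_for_port("localhost", 30000)` can guard the first request in an automated script.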

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Salesforce/codegen-350M-multi" \
        --host 0.0.0.0 \
        --port 30000

Docker Model Runner

How to use Salesforce/codegen-350M-multi with Docker Model Runner:

```
docker model run hf.co/Salesforce/codegen-350M-multi
```

Create app.py

by sky-meilin - opened Feb 2

base: refs/heads/main ← from: refs/pr/9

app.py +49 -0

	@@ -0,0 +1,49 @@

+# Install required packages first (run in a shell; in a notebook, use %pip instead):
+#   pip install azure-ai-ml azure-identity --upgrade --quiet
+import os
+import time
+from azure.ai.ml import MLClient
+from azure.ai.ml.entities import ManagedOnlineEndpoint, ManagedOnlineDeployment
+from azure.identity import DefaultAzureCredential
+# Set environment variables (replace with your values)
+# Follow setup steps at: https://huggingface.co/docs/microsoft-azure/guides/configure-azure-ml-microsoft-foundry
+os.environ["SUBSCRIPTION_ID"] = "<YOUR_SUBSCRIPTION_ID>"
+os.environ["RESOURCE_GROUP"] = "<YOUR_RESOURCE_GROUP>"
+os.environ["WORKSPACE_NAME"] = "<YOUR_WORKSPACE_NAME>"
+# Generate unique names for endpoint and deployment
+timestamp = str(int(time.time()))
+os.environ["ENDPOINT_NAME"] = f"hf-ep-{timestamp}"
+os.environ["DEPLOYMENT_NAME"] = f"hf-deploy-{timestamp}"
+# Create Azure ML Client for Microsoft Foundry (classic)
+client = MLClient(
+    credential=DefaultAzureCredential(),
+    subscription_id=os.getenv("SUBSCRIPTION_ID"),
+    resource_group_name=os.getenv("RESOURCE_GROUP"),
+    workspace_name=os.getenv("WORKSPACE_NAME"),
+)
+# Build model URI for Azure registry
+model_uri = "azureml://registries/HuggingFace/models/salesforce-codegen-350m-multi/labels/latest"
+# Create endpoint and deployment
+endpoint = ManagedOnlineEndpoint(name=os.getenv("ENDPOINT_NAME"))
+deployment = ManagedOnlineDeployment(
+    name=os.getenv("DEPLOYMENT_NAME"),
+    endpoint_name=os.getenv("ENDPOINT_NAME"),
+    model=model_uri,
+    # Check https://huggingface.co/docs/microsoft-azure/foundry/hardware to see the available instances
+    instance_type="Standard_NC40ads_H100_v5",
+    instance_count=1,
+)
+# Deploy endpoint and deployment (this may take 10-15 minutes)
+client.begin_create_or_update(endpoint).wait()
+client.online_deployments.begin_create_or_update(deployment).wait()
+print(f"Endpoint '{os.getenv('ENDPOINT_NAME')}' deployed successfully!")
+print("You can now send requests to your endpoint via Microsoft Foundry or Azure Machine Learning.")
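
The script writes placeholder values such as `<YOUR_SUBSCRIPTION_ID>` into the environment and expects them to be replaced before it runs. A minimal sketch of a pre-flight check that could run before the client is created; `missing_settings` and `REQUIRED_SETTINGS` are hypothetical names, and treating any `<...>` value as an unfilled placeholder is an assumption based on the placeholders in the diff.

```python
import os

# Names the deployment script reads via os.getenv.
REQUIRED_SETTINGS = ("SUBSCRIPTION_ID", "RESOURCE_GROUP", "WORKSPACE_NAME")


def missing_settings(env=os.environ):
    """Return the required names that are unset, empty, or still <...> placeholders."""
    missing = []
    for name in REQUIRED_SETTINGS:
        value = env.get(name, "").strip()
        # Treat values like "<YOUR_RESOURCE_GROUP>" as not yet filled in.
        if not value or (value.startswith("<") and value.endswith(">")):
            missing.append(name)
    return missing
```

A caller might stop early with `if missing_settings(): raise SystemExit(...)`, which fails fast instead of waiting for the Azure client to reject the placeholder values.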