Update vllm info, add time series demo

#11
by jack-zxy - opened
Files changed (2)
  1. README.md +118 -4
  2. deployment_guide.md +2 -20
README.md CHANGED
@@ -61,7 +61,7 @@ temperature = 0.8
61
  ### Serving
62
 
63
  > [!IMPORTANT]
64
- > Running a trillion-parameter model using the native Hugging Face forward method is challenging. We strongly recommend using an LLM inference engine (such as LMDeploy, vLLM, or sglang) to host Intern-S1-Pro and accessing the model via API.
65
 
66
  Intern-S1-Pro can be deployed using any of the following LLM inference frameworks:
67
 
@@ -71,8 +71,6 @@ Intern-S1-Pro can be deployed using any of the following LLM inference framework
71
 
72
  Detailed deployment examples for these frameworks are available in the [Model Deployment Guide](./deployment_guide.md).
73
 
74
- > Deployment support for the time-series module is under optimization and will be released soon.
75
-
76
 
77
  ## Advanced Usage
78
 
@@ -249,7 +247,7 @@ text = tokenizer.apply_chat_template(
249
  )
250
  ```
251
 
252
- With serving Intern-S1-Pro models, you can dynamically control the thinking mode by adjusting the `enable_thinking` parameter in your requests.
253
 
254
  ```python
255
  from openai import OpenAI
@@ -288,6 +286,122 @@ response = client.chat.completions.create(
288
  print(json.dumps(response.model_dump(), indent=2, ensure_ascii=False))
289
  ```
290
 
291
  ## Citation
292
 
293
  If you find this work useful, feel free to give us a cite.
 
61
  ### Serving
62
 
63
  > [!IMPORTANT]
64
+ > Running a trillion-parameter model using the native Hugging Face forward method is challenging. We strongly recommend using an LLM inference engine (such as LMDeploy, vLLM, or SGLang) to host Intern-S1-Pro and accessing the model via API.
65
 
66
  Intern-S1-Pro can be deployed using any of the following LLM inference frameworks:
67
 
 
71
 
72
  Detailed deployment examples for these frameworks are available in the [Model Deployment Guide](./deployment_guide.md).
73
 
 
 
74
 
75
  ## Advanced Usage
76
 
 
247
  )
248
  ```
249
 
250
+ When serving Intern-S1-Pro models, you can dynamically control the thinking mode by adjusting the `enable_thinking` parameter in your requests.
251
 
252
  ```python
253
  from openai import OpenAI
 
286
  print(json.dumps(response.model_dump(), indent=2, ensure_ascii=False))
287
  ```
288
 
289
+ ### Time Series Demo
290
+
291
+ Time-series inference is currently supported only in LMDeploy. To get started, download and deploy Intern-S1-Pro with LMDeploy (>=v0.12.1) by following the [Model Deployment Guide](./deployment_guide.md).
292
+ Below is an example of detecting earthquake events in a time-series signal file. Additional data types and functionalities are also supported.
293
+
294
+ ```python
295
+ from openai import OpenAI
296
+ from lmdeploy.vl.time_series_utils import encode_time_series_base64
297
+
298
+ openai_api_key = "EMPTY"
299
+ openai_api_base = "http://0.0.0.0:8000/v1"
300
+ client = OpenAI(
301
+     api_key=openai_api_key,
302
+     base_url=openai_api_base,
303
+ )
304
+ model_name = client.models.list().data[0].id
305
+
306
+
307
+ def send_base64(file_path: str, sampling_rate: int = 100):
308
+ """base64-encoded time-series data."""
309
+
310
+ # encode_time_series_base64 accepts local file paths and http urls,
311
+ # encoding time-series data (.npy, .csv, .wav, .mp3, .flac, etc.) into base64 strings.
312
+ base64_ts = encode_time_series_base64(file_path)
313
+
314
+ messages = [
315
+ {
316
+ "role": "user",
317
+ "content": [
318
+ {
319
+ "type": "text",
320
+ "text": "Please determine whether an Earthquake event has occurred in the provided time-series data. If so, please specify the starting time point indices of the P-wave and S-wave in the event."
321
+ },
322
+ {
323
+ "type": "time_series_url",
324
+ "time_series_url": {
325
+ "url": f"data:time_series/npy;base64,{base64_ts}",
326
+ "sampling_rate": sampling_rate
327
+ },
328
+ },
329
+ ],
330
+ }
331
+ ]
332
+
333
+ return client.chat.completions.create(
334
+ model=model_name,
335
+ messages=messages,
336
+ temperature=0,
337
+ max_tokens=200,
338
+ )
339
+
340
+
341
+ def send_http_url(url: str, sampling_rate: int = 100):
342
+ """http(s) url pointing to the time-series data."""
343
+ messages = [
344
+ {
345
+ "role": "user",
346
+ "content": [
347
+ {
348
+ "type": "text",
349
+ "text": "Please determine whether an Earthquake event has occurred in the provided time-series data. If so, please specify the starting time point indices of the P-wave and S-wave in the event."
350
+ },
351
+ {
352
+ "type": "time_series_url",
353
+ "time_series_url": {
354
+ "url": url,
355
+ "sampling_rate": sampling_rate
356
+ },
357
+ },
358
+ ],
359
+ }
360
+ ]
361
+
362
+ return client.chat.completions.create(
363
+ model=model_name,
364
+ messages=messages,
365
+ temperature=0,
366
+ max_tokens=200,
367
+ )
368
+
369
+
370
+ def send_file_url(file_path: str, sampling_rate: int = 100):
371
+ """file url pointing to the time-series data."""
372
+ messages = [
373
+ {
374
+ "role": "user",
375
+ "content": [
376
+ {
377
+ "type": "text",
378
+ "text": "Please determine whether an Earthquake event has occurred in the provided time-series data. If so, please specify the starting time point indices of the P-wave and S-wave in the event."
379
+ },
380
+ {
381
+ "type": "time_series_url",
382
+ "time_series_url": {
383
+ "url": f"file://{file_path}",
384
+ "sampling_rate": sampling_rate
385
+ },
386
+ },
387
+ ],
388
+ }
389
+ ]
390
+
391
+ return client.chat.completions.create(
392
+ model=model_name,
393
+ messages=messages,
394
+ temperature=0,
395
+ max_tokens=200,
396
+ )
397
+
398
+ response = send_base64("./0092638_seism.npy")
399
+ # response = send_http_url("https://huggingface.co/internlm/Intern-S1-Pro/raw/main/0092638_seism.npy")
400
+ # response = send_file_url("./0092638_seism.npy")
401
+
402
+ print(response.choices[0].message)
403
+ ```
404
+
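
The demo assumes an OpenAI-compatible endpoint is already listening on port 8000. A minimal sketch of launching one with LMDeploy is shown below; the tensor-parallel degree is a placeholder, and the recommended flags for Intern-S1-Pro are in the [Model Deployment Guide](./deployment_guide.md).

```bash
# Sketch only: start LMDeploy's OpenAI-compatible server for the demo above.
# Adjust --tp to the number of available GPUs (see the deployment guide).
lmdeploy serve api_server internlm/Intern-S1-Pro \
    --tp 8 \
    --server-port 8000
```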
405
  ## Citation
406
 
407
  If you find this work useful, feel free to give us a cite.
deployment_guide.md CHANGED
@@ -9,7 +9,7 @@ The Intern-S1-Pro release is a 1T parameter model stored in FP8 format. Deployme
9
 
10
  ## LMDeploy
11
 
12
- Required version `lmdeploy>=0.12.0`
13
 
14
  - Tensor Parallelism
15
 
@@ -59,25 +59,7 @@ lmdeploy serve api_server \
59
 
60
  ## vLLM
61
 
62
- - Tensor Parallelism + Expert Parallelism
63
-
64
- ```bash
65
- # start ray on node 0 and node 1
66
-
67
- # node 0
68
- export VLLM_ENGINE_READY_TIMEOUT_S=10000
69
- vllm serve internlm/Intern-S1-Pro \
70
- --tensor-parallel-size 16 \
71
- --enable-expert-parallel \
72
- --distributed-executor-backend ray \
73
- --max-model-len 65536 \
74
- --trust-remote-code \
75
- --reasoning-parser deepseek_r1 \
76
- --enable-auto-tool-choice \
77
- --tool-call-parser hermes
78
- ```
79
-
80
- - Data Parallelism + Expert Parallelism
81
 
82
  ```bash
83
  # node 0
 
9
 
10
  ## LMDeploy
11
 
12
+ Required version `lmdeploy>=0.12.1`
13
 
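
The pinned version can be installed or upgraded with pip; a minimal sketch:

```bash
# Sketch only: install or upgrade LMDeploy to the required version.
pip install -U "lmdeploy>=0.12.1"
```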
14
  - Tensor Parallelism
15
 
 
59
 
60
  ## vLLM
61
 
62
+ You can deploy using the vLLM nightly Docker image `vllm/vllm-openai:nightly`. Refer to [using-docker](https://docs.vllm.ai/en/latest/deployment/docker/?h=docker) for details.
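
A sketch of the basic Docker invocation is shown below, assuming the host's GPUs and Hugging Face cache are mounted into the container; the serve flags are illustrative, and the multi-node launch that follows remains the recommended way to host a model of this size.

```bash
# Sketch only: the container entrypoint starts the OpenAI-compatible server,
# so serve flags are passed after the image name.
docker run --gpus all --ipc=host -p 8000:8000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    vllm/vllm-openai:nightly \
    --model internlm/Intern-S1-Pro \
    --trust-remote-code \
    --max-model-len 65536
```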
63
 
64
  ```bash
65
  # node 0