Update README.md
README.md
@@ -104,7 +104,7 @@ Deployment and inference with Nemotron-4-340B-Instruct can be done in three steps
 
 Create a Python script to interact with the deployed model.
 Create a Bash script to start the inference server
-Schedule a Slurm job to distribute the model across
+Schedule a Slurm job to distribute the model across 2 nodes and associate them with the inference server.
 
 1. Define the Python script ``call_server.py``
 
@@ -154,7 +154,7 @@ if response.endswith("<extra_id_1>"):
 print(response)
 ```
 
-2. Given this Python script, create a Bash script which spins up the inference server within the [NeMo container](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo) (docker pull nvcr.io/nvidia/nemo:24.01.framework) and calls the Python script ``call_server.py``. The Bash script ``nemo_inference.sh`` is as follows,
+2. Given this Python script, create a Bash script which spins up the inference server within the [NeMo container](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo) (```docker pull nvcr.io/nvidia/nemo:24.01.framework```) and calls the Python script ``call_server.py``. The Bash script ``nemo_inference.sh`` is as follows,
 
 ```
 NEMO_FILE=$1
@@ -204,7 +204,7 @@ depends_on () {
 ```
 
 
-3. Launch ``nemo_inference.sh`` with a Slurm script defined like below, which starts a
+3. Launch ``nemo_inference.sh`` with a Slurm script defined like below, which starts a 2-node job for model inference.
 
 ```
 #!/bin/bash
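Step 3's Slurm script is likewise cut off after its ``#!/bin/bash`` line. A hypothetical 2-node batch script consistent with the README's description might look like the following config sketch; the account, partition, time limit, per-node task count, and mount paths are placeholders, and ``srun --container-image`` assumes a pyxis/enroot-enabled cluster:

```shell
#!/bin/bash
#SBATCH -A <ACCOUNT>           # placeholder: your Slurm account
#SBATCH -p <PARTITION>         # placeholder: your partition
#SBATCH --nodes=2              # the 2-node deployment the README describes
#SBATCH --gpus-per-node=8      # assumption: 8 GPUs per node
#SBATCH --ntasks-per-node=1    # assumption: nemo_inference.sh launches per-GPU workers itself
#SBATCH -t 04:00:00            # placeholder time limit

# Run nemo_inference.sh (step 2) inside the NeMo container on each node.
# The image tag matches the one quoted in the README; mount paths and the
# checkpoint location are illustrative.
srun --container-image=nvcr.io/nvidia/nemo:24.01.framework \
     --container-mounts=/path/to/checkpoints:/checkpoints \
     bash /checkpoints/nemo_inference.sh /checkpoints/<NEMO_FILE>
```

The first positional argument matches the ``NEMO_FILE=$1`` line visible in the hunk context of ``nemo_inference.sh``.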
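The body of step 1's ``call_server.py`` falls outside the hunk context above; only the ``if response.endswith("<extra_id_1>")`` check and the final ``print(response)`` are visible. A minimal sketch of the request-building and response-trimming logic those fragments imply is below. The ``<extra_id_*>`` turn markers follow the Nemotron-4-340B-Instruct prompt template, but the specific sampling parameters, field names, and default values here are assumptions about the NeMo text-generation server's JSON interface, not part of this diff:

```python
import json

# Single-turn chat prompt in the Nemotron-4-340B-Instruct template.
# The "<extra_id_0>System" / "<extra_id_1>User" / "<extra_id_1>Assistant"
# markers delimit turns; the empty system section is an illustrative choice.
PROMPT_TEMPLATE = (
    "<extra_id_0>System\n\n"
    "<extra_id_1>User\n{prompt}\n"
    "<extra_id_1>Assistant\n"
)


def build_request(prompt: str, tokens_to_generate: int = 256) -> str:
    """Build the JSON body for the text-generation server.

    Field names and defaults are assumptions about the server's
    generation API, hedged accordingly.
    """
    data = {
        "sentences": [PROMPT_TEMPLATE.format(prompt=prompt)],
        "tokens_to_generate": tokens_to_generate,
        "temperature": 1.0,
        "top_k": 1,
        "top_p": 0.9,
        "add_BOS": False,
        # Stop generation at the next turn marker.
        "end_strings": ["<extra_id_1>"],
    }
    return json.dumps(data)


def trim_response(response: str) -> str:
    """Strip a trailing turn marker, mirroring the diff's visible
    ``if response.endswith("<extra_id_1>")`` check."""
    if response.endswith("<extra_id_1>"):
        response = response[: -len("<extra_id_1>")]
    return response.strip()
```

Against a running server, the payload would typically be sent with something like ``requests.put(url, data=build_request(...), headers={"Content-Type": "application/json"})``; the exact URL, port, and HTTP verb depend on how the server in step 2 is launched and are likewise assumptions.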