Update README.md
README.md
@@ -104,7 +104,7 @@ Deployment and inference with Nemotron-4-340B-Instruct can be done in three steps
 
 Create a Python script to interact with the deployed model.
 Create a Bash script to start the inference server
-Schedule a Slurm job to distribute the model across
+Schedule a Slurm job to distribute the model across 2 nodes and associate them with the inference server.
 
 1. Define the Python script ``call_server.py``
 
@@ -154,7 +154,7 @@ if response.endswith("<extra_id_1>"):
 print(response)
 ```
 
-2. Given this Python script, create a Bash script which spins up the inference server within the [NeMo container](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo) (docker pull nvcr.io/nvidia/nemo:24.01.framework) and calls the Python script ``call_server.py``. The Bash script ``nemo_inference.sh`` is as follows,
+2. Given this Python script, create a Bash script which spins up the inference server within the [NeMo container](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo) (```docker pull nvcr.io/nvidia/nemo:24.01.framework```) and calls the Python script ``call_server.py``. The Bash script ``nemo_inference.sh`` is as follows,
 
 ```
 NEMO_FILE=$1
@@ -204,7 +204,7 @@ depends_on () {
 ```
 
 
-3. Launch ``nemo_inference.sh`` with a Slurm script defined like below, which starts a
+3. Launch ``nemo_inference.sh`` with a Slurm script defined like below, which starts a 2-node job for model inference.
 
 ```
 #!/bin/bash
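Step 3's Slurm script is likewise cut off after its ``#!/bin/bash`` line. A hypothetical 2-node batch script consistent with the README's description might look like the following config sketch; the account, partition, time limit, per-node task count, and mount paths are placeholders, and ``srun --container-image`` assumes a pyxis/enroot-enabled cluster:

```shell
#!/bin/bash
#SBATCH -A <ACCOUNT>           # placeholder: your Slurm account
#SBATCH -p <PARTITION>         # placeholder: your partition
#SBATCH --nodes=2              # the 2-node deployment the README describes
#SBATCH --gpus-per-node=8      # assumption: 8 GPUs per node
#SBATCH --ntasks-per-node=1    # assumption: nemo_inference.sh launches per-GPU workers itself
#SBATCH -t 04:00:00            # placeholder time limit

# Run nemo_inference.sh (step 2) inside the NeMo container on each node.
# The image tag matches the one quoted in the README; mount paths and the
# checkpoint location are illustrative.
srun --container-image=nvcr.io/nvidia/nemo:24.01.framework \
     --container-mounts=/path/to/checkpoints:/checkpoints \
     bash /checkpoints/nemo_inference.sh /checkpoints/<NEMO_FILE>
```

The first positional argument matches the ``NEMO_FILE=$1`` line visible in the hunk context of ``nemo_inference.sh``.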
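The body of step 1's ``call_server.py`` falls outside the hunk context above; only the ``if response.endswith("<extra_id_1>")`` check and the final ``print(response)`` are visible. A minimal sketch of the request-building and response-trimming logic those fragments imply is below. The ``<extra_id_*>`` turn markers follow the Nemotron-4-340B-Instruct prompt template, but the specific sampling parameters, field names, and default values here are assumptions about the NeMo text-generation server's JSON interface, not part of this diff:

```python
import json

# Single-turn chat prompt in the Nemotron-4-340B-Instruct template.
# The "<extra_id_0>System" / "<extra_id_1>User" / "<extra_id_1>Assistant"
# markers delimit turns; the empty system section is an illustrative choice.
PROMPT_TEMPLATE = (
    "<extra_id_0>System\n\n"
    "<extra_id_1>User\n{prompt}\n"
    "<extra_id_1>Assistant\n"
)


def build_request(prompt: str, tokens_to_generate: int = 256) -> str:
    """Build the JSON body for the text-generation server.

    Field names and defaults are assumptions about the server's
    generation API, hedged accordingly.
    """
    data = {
        "sentences": [PROMPT_TEMPLATE.format(prompt=prompt)],
        "tokens_to_generate": tokens_to_generate,
        "temperature": 1.0,
        "top_k": 1,
        "top_p": 0.9,
        "add_BOS": False,
        # Stop generation at the next turn marker.
        "end_strings": ["<extra_id_1>"],
    }
    return json.dumps(data)


def trim_response(response: str) -> str:
    """Strip a trailing turn marker, mirroring the diff's visible
    ``if response.endswith("<extra_id_1>")`` check."""
    if response.endswith("<extra_id_1>"):
        response = response[: -len("<extra_id_1>")]
    return response.strip()
```

Against a running server, the payload would typically be sent with something like ``requests.put(url, data=build_request(...), headers={"Content-Type": "application/json"})``; the exact URL, port, and HTTP verb depend on how the server in step 2 is launched and are likewise assumptions.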