Update README.md

README.md CHANGED

@@ -156,7 +156,7 @@ print(response)
 
 2. Given this Python script, create a Bash script which spins up the inference server within the [NeMo container](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo) (```docker pull nvcr.io/nvidia/nemo:24.01.framework```) and calls the Python script ``call_server.py``. The Bash script ``nemo_inference.sh`` is as follows:
 
-```
+```bash
 NEMO_FILE=$1
 WEB_PORT=1424
 
@@ -174,7 +174,6 @@ depends_on () {
 }
 
 
-
 /usr/bin/python3 /opt/NeMo/examples/nlp/language_modeling/megatron_gpt_eval.py \
         gpt_model_file=$NEMO_FILE \
         pipeline_model_parallel_split_rank=0 \
@@ -210,7 +209,7 @@ depends_on () {
 #!/bin/bash
 #SBATCH -A SLURM-ACCOUNT
 #SBATCH -p SLURM-PARTITION
-#SBATCH -N 2
+#SBATCH -N 2
 #SBATCH -J generation
 #SBATCH --ntasks-per-node=8
 #SBATCH --gpus-per-node=8
@@ -220,8 +219,9 @@ RESULTS=<PATH_TO_YOUR_SCRIPTS_FOLDER>
 OUTFILE="${RESULTS}/slurm-%j-%n.out"
 ERRFILE="${RESULTS}/error-%j-%n.out"
 MODEL=<PATH_TO>/Nemotron-4-340B-Instruct
-
+CONTAINER="nvcr.io/nvidia/nemo:24.01.framework"
 MOUNTS="--container-mounts=<PATH_TO_YOUR_SCRIPTS_FOLDER>:/scripts,MODEL:/model"
+
 read -r -d '' cmd <<EOF
 bash /scripts/nemo_inference.sh /model
 EOF
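For orientation on the `CONTAINER` variable this commit introduces: the hunks end before the launch command, but a variable like this is typically consumed by the script's `srun` line via the pyxis `--container-image` flag (note that `MOUNTS` already carries `--container-mounts`). A minimal sketch of such a launch, assuming a standard Slurm + pyxis setup; the README's actual invocation lies outside the hunks shown here:

```bash
# Hedged sketch, not the README's verbatim launch line: runs the heredoc
# command ${cmd} inside the NeMo container on the allocated nodes,
# reusing the OUTFILE/ERRFILE/CONTAINER/MOUNTS variables defined above.
srun -o "$OUTFILE" -e "$ERRFILE" \
     --container-image="$CONTAINER" \
     $MOUNTS \
     bash -c "${cmd}"
```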
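Once `nemo_inference.sh` has the server listening on `WEB_PORT` (1424), `call_server.py` talks to it over HTTP. For a quick smoke test without the Python client, something along these lines should work, assuming the NeMo text-generation REST interface exposed by `megatron_gpt_eval.py` (a PUT to `/generate` with a JSON body; the field names follow the NeMo examples and may vary across container versions):

```bash
# Assumed endpoint and request fields; adjust if your NeMo version differs.
curl -X PUT http://localhost:1424/generate \
  -H "Content-Type: application/json" \
  -d '{"sentences": ["Write a haiku about GPUs."], "tokens_to_generate": 64, "temperature": 1.0, "top_k": 1}'
```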