Update README.md

README.md CHANGED

@@ -156,7 +156,7 @@ print(response)
 
 2. Given this Python script, create a Bash script which spins up the inference server within the [NeMo container](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo) (```docker pull nvcr.io/nvidia/nemo:24.01.framework```) and calls the Python script ``call_server.py``. The Bash script ``nemo_inference.sh`` is as follows:
 
-```
+```bash
 NEMO_FILE=$1
 WEB_PORT=1424
 
@@ -174,7 +174,6 @@ depends_on () {
 }
 
 
-
 /usr/bin/python3 /opt/NeMo/examples/nlp/language_modeling/megatron_gpt_eval.py \
         gpt_model_file=$NEMO_FILE \
         pipeline_model_parallel_split_rank=0 \
@@ -210,7 +209,7 @@ depends_on () {
 #!/bin/bash
 #SBATCH -A SLURM-ACCOUNT
 #SBATCH -p SLURM-PARTITION
-#SBATCH -N 2
+#SBATCH -N 2
 #SBATCH -J generation
 #SBATCH --ntasks-per-node=8
 #SBATCH --gpus-per-node=8
@@ -220,8 +219,9 @@ RESULTS=<PATH_TO_YOUR_SCRIPTS_FOLDER>
 OUTFILE="${RESULTS}/slurm-%j-%n.out"
 ERRFILE="${RESULTS}/error-%j-%n.out"
 MODEL=<PATH_TO>/Nemotron-4-340B-Instruct
-
+CONTAINER="nvcr.io/nvidia/nemo:24.01.framework"
 MOUNTS="--container-mounts=<PATH_TO_YOUR_SCRIPTS_FOLDER>:/scripts,MODEL:/model"
+
 read -r -d '' cmd <<EOF
 bash /scripts/nemo_inference.sh /model
 EOF
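For orientation on the `CONTAINER` variable this commit introduces: the hunks end before the launch command, but a variable like this is typically consumed by the script's `srun` line via the pyxis `--container-image` flag (note that `MOUNTS` already carries `--container-mounts`). A minimal sketch of such a launch, assuming a standard Slurm + pyxis setup; the README's actual invocation lies outside the hunks shown here:

```bash
# Hedged sketch, not the README's verbatim launch line: runs the heredoc
# command ${cmd} inside the NeMo container on the allocated nodes,
# reusing the OUTFILE/ERRFILE/CONTAINER/MOUNTS variables defined above.
srun -o "$OUTFILE" -e "$ERRFILE" \
     --container-image="$CONTAINER" \
     $MOUNTS \
     bash -c "${cmd}"
```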
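Once `nemo_inference.sh` has the server listening on `WEB_PORT` (1424), `call_server.py` talks to it over HTTP. For a quick smoke test without the Python client, something along these lines should work, assuming the NeMo text-generation REST interface exposed by `megatron_gpt_eval.py` (a PUT to `/generate` with a JSON body; the field names follow the NeMo examples and may vary across container versions):

```bash
# Assumed endpoint and request fields; adjust if your NeMo version differs.
curl -X PUT http://localhost:1424/generate \
  -H "Content-Type: application/json" \
  -d '{"sentences": ["Write a haiku about GPUs."], "tokens_to_generate": 64, "temperature": 1.0, "top_k": 1}'
```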