Update README.md
README.md CHANGED
````diff
@@ -63,6 +63,8 @@ The base model, Nemotron-4-340B, was trained with a global batch-size of 2304, a
 
 1. We will spin up an inference server and then call the inference server in a python script. Let’s first define the python script ``call_server.py``
 
+```python
+
 
 headers = {"Content-Type": "application/json"}
 
@@ -100,7 +102,8 @@ prompt = PROMPT_TEMPLATE.format(prompt=question)
 print(prompt)
 
 response = get_generation(prompt, greedy=True, add_BOS=False, token_to_gen=1024, min_tokens=1, temp=1.0, top_p=1.0, top_k=0, repetition=1.0, batch=False)
-print(response)
+print(response)```
+
 
 2. Given this python script, we will create a bash script, which spins up the inference server within the [NeMo container](https://github.com/NVIDIA/NeMo/blob/main/Dockerfile) and calls the python script ``call_server.py``. The bash script ``nemo_inference.sh`` is as follows,
 
````
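For context, the ``call_server.py`` script that this commit wraps in a ``python`` fence is only partially visible in the hunks above. The sketch below is a minimal, hypothetical reconstruction of such a client, written against the ``get_generation`` call shown in the diff; the endpoint path, port, request fields, response shape, ``PROMPT_TEMPLATE``, and example question are illustrative assumptions, not taken from the README.

```python
# Hypothetical sketch of call_server.py -- the endpoint, port, field names, and
# prompt template are assumptions for illustration; see the actual README for
# the real script.
import json
import requests

headers = {"Content-Type": "application/json"}

# Placeholder; the README defines its own PROMPT_TEMPLATE for the model.
PROMPT_TEMPLATE = "{prompt}"


def get_generation(prompt, greedy, add_BOS, token_to_gen, min_tokens,
                   temp, top_p, top_k, repetition, batch=False):
    """Send one prompt (or a batch) to the assumed /generate endpoint and
    return the generated text."""
    data = {
        "sentences": prompt if batch else [prompt],
        "tokens_to_generate": int(token_to_gen),
        "min_tokens_to_generate": int(min_tokens),
        "temperature": temp,
        "top_k": top_k,
        "top_p": top_p,
        "greedy": greedy,
        "add_BOS": add_BOS,
        "repetition_penalty": repetition,
    }
    # Assumed server address; the port must match the one the inference
    # server (started by nemo_inference.sh) listens on.
    resp = requests.put("http://localhost:1424/generate",
                        data=json.dumps(data), headers=headers)
    # Assumed response shape: {"sentences": [...]}.
    sentences = resp.json()["sentences"]
    return sentences if batch else sentences[0]


if __name__ == "__main__":
    question = "Write a poem on NVIDIA."  # illustrative example question
    prompt = PROMPT_TEMPLATE.format(prompt=question)
    print(prompt)
    response = get_generation(prompt, greedy=True, add_BOS=False, token_to_gen=1024,
                              min_tokens=1, temp=1.0, top_p=1.0, top_k=0,
                              repetition=1.0, batch=False)
    print(response)
```

Under these assumptions, running ``python call_server.py`` once the server is up prints the formatted prompt followed by the model's completion; the commit itself only changes how this snippet renders, by fencing it as a ``python`` code block in the README.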