Transformers

You are viewing main version, which requires installation from source. If you'd like regular pip install, checkout the latest stable version (v5.3.0).

Join the Hugging Face community

and get access to the augmented documentation experience

Collaborate on models, datasets and Spaces

Faster examples with accelerated inference

Switch between documentation themes

to get started

This model was released on 2024-07-15 and added to Hugging Face Transformers on 2024-01-17.

Qwen2

Qwen2 is a family of large language models (pretrained, instruction-tuned and mixture-of-experts) available in sizes from 0.5B to 72B parameters. The models are built on the Transformer architecture featuring enhancements like group query attention (GQA), rotary positional embeddings (RoPE), a mix of sliding window and full attention, and dual chunk attention with YARN for training stability. Qwen2 models support multiple languages and context lengths up to 131,072 tokens.

You can find all the official Qwen2 checkpoints under the Qwen2 collection.

Click on the Qwen2 models in the right sidebar for more examples of how to apply Qwen2 to different language tasks.

The example below demonstrates how to generate text with Pipeline, AutoModel, and from the command line using the instruction-tuned models.

Pipeline

AutoModel

transformers CLI

Quantization reduces the memory burden of large models by representing the weights in a lower precision. Refer to the Quantization overview for more available quantization backends.

The example below uses bitsandbytes to quantize the weights to 4-bits.

# pip install -U flash-attn --no-build-isolation
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-7B")
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2-7B",
    dtype=torch.bfloat16,
    device_map="auto",
    quantization_config=quantization_config,
    attn_implementation="flash_attention_2"
)

inputs = tokenizer("The Qwen2 model family is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Notes

Ensure your Transformers library version is up-to-date. Qwen2 requires Transformers>=4.37.0 for full support.

Transformers

Qwen2

Notes

Qwen2Config

class transformers.Qwen2Config

Qwen2Tokenizer

class transformers.Qwen2Tokenizer

save_vocabulary

Qwen2TokenizerFast

class transformers.Qwen2Tokenizer

Qwen2RMSNorm

class transformers.Qwen2RMSNorm

forward

Qwen2Model

class transformers.Qwen2Model

forward

Qwen2ForCausalLM

class transformers.Qwen2ForCausalLM

forward

Qwen2ForSequenceClassification

class transformers.Qwen2ForSequenceClassification

forward

Qwen2ForTokenClassification

class transformers.Qwen2ForTokenClassification

forward

Qwen2ForQuestionAnswering

class transformers.Qwen2ForQuestionAnswering

forward