Update README
- README.md +2 -0
- inference/model.py +0 -2
README.md
CHANGED

```diff
@@ -79,7 +79,9 @@ This experimental release represents our ongoing research into more efficient tr
 | SWE-bench Multilingual | 57.8 | 57.9 |
 | Terminal-bench | 36.7 | 37.7 |
 
+## Update
 
+- 2025.11.17: **We have identified that previous versions of the inference demo code contained an implementation discrepancy in Rotary Position Embedding (RoPE) within the indexer module, potentially leading to degraded model performance.** Specifically, the input tensor to RoPE in the indexer module requires a non-interleaved layout, whereas RoPE in the MLA module expects an interleaved layout. This issue has now been resolved. Please refer to the updated version of the inference demo code and take note of this implementation detail.
 
 ## How to Run Locally
 
```
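For context on the note above: "interleaved" RoPE rotates adjacent channel pairs (x0, x1), (x2, x3), ..., while "non-interleaved" RoPE pairs each channel i in the first half with channel i + d/2. The sketch below is illustrative, not the repository's implementation (function names and shapes are assumptions); it shows why mixing up the two layouts degrades quality silently: the wrong variant still produces the right shapes and raises no error, it just rotates the wrong channel pairs.

```python
# Minimal sketch of the two RoPE pairing conventions (illustrative only).
# `cos` and `sin` have shape (..., d/2) and broadcast over x's leading dims.
import torch

def rope_interleaved(x: torch.Tensor, cos: torch.Tensor, sin: torch.Tensor) -> torch.Tensor:
    """Rotate adjacent channel pairs: (x0, x1), (x2, x3), ..."""
    x_even, x_odd = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x_even * cos - x_odd * sin
    out[..., 1::2] = x_even * sin + x_odd * cos
    return out

def rope_non_interleaved(x: torch.Tensor, cos: torch.Tensor, sin: torch.Tensor) -> torch.Tensor:
    """Rotate half-split channel pairs: (x_i, x_{i + d/2}) for i < d/2."""
    half = x.shape[-1] // 2
    x_lo, x_hi = x[..., :half], x[..., half:]
    return torch.cat((x_lo * cos - x_hi * sin,
                      x_lo * sin + x_hi * cos), dim=-1)

# The two layouts differ only by a fixed channel permutation, so a tensor
# prepared for one convention can be re-laid-out for the other:
d, half = 8, 4
x = torch.randn(1, d)
angles = torch.randn(half)
cos, sin = angles.cos(), angles.sin()
x_as_interleaved = torch.stack((x[..., :half], x[..., half:]), dim=-1).flatten(-2)
assert torch.allclose(rope_non_interleaved(x, cos, sin)[..., :half],
                      rope_interleaved(x_as_interleaved, cos, sin)[..., 0::2])
```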
inference/model.py
CHANGED

```diff
@@ -281,7 +281,6 @@ class RMSNorm(nn.Module):
         super().__init__()
         self.dim = dim
         self.eps = eps
-        # rmsnorm in the checkpoint is stored in bf16, while the parameter here is stored in fp32 for convenient.
         self.weight = nn.Parameter(torch.ones(dim, dtype=torch.float32))
 
     def forward(self, x: torch.Tensor, residual: Optional[torch.Tensor] = None):
@@ -315,7 +314,6 @@ class LayerNorm(nn.Module):
         super().__init__()
         self.dim = dim
         self.eps = eps
-        # layernorm in the checkpoint is stored in bf16, while the parameters here are stored in fp32 for convenient.
         self.weight = nn.Parameter(torch.ones(dim, dtype=torch.float32))
         self.bias = nn.Parameter(torch.zeros(dim, dtype=torch.float32))
 
```
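The deleted comments in both hunks documented a detail that still holds in the code: the checkpoint stores the norm weights in bf16, while the module constructs them in fp32. As a hedged illustration (the fields below mirror the RMSNorm hunk, but the `eps` default, the forward pass, and the loading snippet are my own sketch, not the repository's code), PyTorch's `load_state_dict` copies checkpoint values into the existing parameter tensor and casts to its dtype, so the parameter remains fp32 after loading bf16 weights:

```python
# Sketch (assumed details): bf16 checkpoint values land in an fp32 parameter,
# because load_state_dict copies into the existing tensor, casting on copy.
import torch
from torch import nn

class RMSNorm(nn.Module):
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.dim = dim
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim, dtype=torch.float32))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Accumulate the mean of squares in fp32, then cast back to x's dtype.
        h = x.float()
        h = h * torch.rsqrt(h.pow(2).mean(-1, keepdim=True) + self.eps)
        return (h * self.weight).type_as(x)

norm = RMSNorm(4)
state = {"weight": torch.full((4,), 0.5, dtype=torch.bfloat16)}  # as stored on disk
norm.load_state_dict(state)
print(norm.weight.dtype)  # torch.float32 -- the bf16 values were upcast on copy
```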