---
language: en
library_name: transformers
pipeline_tag: text-classification
license: mit
tags:
- sentiment-analysis
- distilbert
- sequence-classification
- academic-peer-review
- openreview
datasets:
- nhop/OpenReview
base_model:
- distilbert/distilbert-base-uncased
---

# Academic Sentiment Classifier (DistilBERT)

A DistilBERT-based sequence-classification model that predicts the sentiment polarity of academic peer-review text (binary: negative vs. positive). It supports research on the sentiment of scholarly reviews and AI-generated critique, enabling large-scale, reproducible measurement of academic-style content.

## Model details

- Architecture: DistilBERT for sequence classification (2 labels)
- Max input length used during training: 512 tokens
- Labels:
  - LABEL_0 -> negative
  - LABEL_1 -> positive
- Format: `safetensors`

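The label convention above can be decoded from raw logits with no library helpers. A minimal sketch — the logit values below are made up for illustration, not real model outputs:

```python
import math

# Label convention used by this model (LABEL_0 -> negative, LABEL_1 -> positive)
ID2NAME = {0: "negative", 1: "positive"}

def decode(logits):
    """Softmax over the two class logits, then argmax to a label name."""
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    idx = max(range(len(probs)), key=probs.__getitem__)
    return ID2NAME[idx], probs[idx]

# Hypothetical logits for a clearly positive review snippet
label, score = decode([-1.2, 2.3])
print(label, round(score, 3))
```

This is exactly what the `pipeline` examples below do internally before returning `{'label': ..., 'score': ...}`.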
## Intended uses & limitations

Intended uses:

- Analyze sentiment of peer-review snippets, full reviews, or similar scholarly discourse.

Limitations:

- Binary polarity only (no neutral class); confidence scores should be interpreted with care.
- Domain-specific: optimized for academic review-style English text; it may underperform on general-domain data.
- Not a replacement for human judgement or editorial decision-making.

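Because there is no neutral class, one pragmatic workaround is to treat low-confidence predictions as "uncertain" rather than forcing a polarity. A minimal sketch — the 0.75 threshold is an arbitrary assumption for illustration, not something the model defines:

```python
def polarity_or_uncertain(label: str, score: float, threshold: float = 0.75) -> str:
    """Fall back to 'uncertain' when the winning class probability is low."""
    return label if score >= threshold else "uncertain"

# Hypothetical (label, score) pairs as returned by the pipeline examples below
print(polarity_or_uncertain("positive", 0.97))  # confident -> keep the label
print(polarity_or_uncertain("negative", 0.55))  # near the decision boundary
```

Tune the threshold on held-out data from your own domain rather than reusing 0.75 blindly.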
Ethical considerations and bias:

- Scholarly reviews can contain technical jargon, hedging, and nuanced tone; polarity is an imperfect proxy for quality or fairness.
- Potential biases may reflect those present in the underlying corpus.

## Training data

The model was fine-tuned on a corpus of academic peer-review text curated from OpenReview. The task is binary sentiment classification over review text spans.

Note: if you plan to use or extend the underlying data, please review the OpenReview terms of use and any relevant dataset licenses.

## Training procedure (high level)

- Base model: DistilBERT (`transformers`)
- Objective: single-label binary classification
- Tokenization: standard DistilBERT tokenizer, truncation to 512 tokens
- Optimizer/scheduler: standard `Trainer` defaults (AdamW with a linear learning-rate schedule)

Exact hyperparameters may vary across runs.

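For reference, a setup along these lines might look as follows. This is a sketch only: the hyperparameter values are illustrative assumptions, and `train_ds` / `eval_ds` stand in for tokenized datasets prepared separately — this is not the exact configuration used for this checkpoint.

```python
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    Trainer,
    TrainingArguments,
)

tok = AutoTokenizer.from_pretrained("distilbert/distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert/distilbert-base-uncased", num_labels=2
)

args = TrainingArguments(
    output_dir="out",
    learning_rate=5e-5,              # Trainer default for AdamW; assumed here
    lr_scheduler_type="linear",      # linear warmup/decay (Trainer default)
    per_device_train_batch_size=16,  # assumed
    num_train_epochs=3,              # assumed
)

# train_ds / eval_ds: datasets tokenized with `tok` (truncation=True, max_length=512)
# yielding 'input_ids', 'attention_mask', and integer 'labels'
trainer = Trainer(model=model, args=args, train_dataset=train_ds, eval_dataset=eval_ds)
trainer.train()
```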
## How to use

Basic pipeline usage:

```python
from transformers import pipeline

clf = pipeline(
    task="text-classification",
    model="EvilScript/academic-sentiment-classifier",
    tokenizer="EvilScript/academic-sentiment-classifier",
)

text = "The paper is clearly written and provides strong empirical support for the claims."
print(clf(text))
# Example output: [{'label': 'LABEL_1', 'score': 0.97}]  # LABEL_1 -> positive
```

|
If you prefer human-friendly labels, you can map them:

```python
from transformers import pipeline

id2name = {"LABEL_0": "negative", "LABEL_1": "positive"}
clf = pipeline("text-classification", model="EvilScript/academic-sentiment-classifier")
res = clf("This section lacks clarity and the experiments are inconclusive.")[0]
res["label"] = id2name.get(res["label"], res["label"])  # map to a human-friendly label
print(res)
```

|
Batch inference:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
tok = AutoTokenizer.from_pretrained("EvilScript/academic-sentiment-classifier")
model = AutoModelForSequenceClassification.from_pretrained("EvilScript/academic-sentiment-classifier")
model.to(device).eval()

texts = [
    "I recommend acceptance; the methodology is solid and results are convincing.",
    "Major concerns remain; the evaluation is incomplete and unclear.",
]

inputs = tok(texts, padding=True, truncation=True, max_length=512, return_tensors="pt").to(device)
with torch.no_grad():
    logits = model(**inputs).logits
probs = torch.softmax(logits, dim=-1)
pred_ids = probs.argmax(dim=-1)

# Map to friendly labels
id2name = {0: "negative", 1: "positive"}
preds = [id2name[i.item()] for i in pred_ids]
print(list(zip(texts, preds)))
```

|
## Evaluation

No evaluation metrics are reported here. If you compute metrics on public datasets or benchmarks, consider sharing them via a pull request to this model card.

## License

The model weights and this model card are released under the MIT license. Review and comply with any third-party data licenses if reusing the training data.

## Citation

If you use this model, please cite the project:

```bibtex
@misc{federico_torrielli_2025,
  author    = {Federico Torrielli and Stefano Locci},
  title     = {academic-sentiment-classifier},
  year      = {2025},
  url       = {https://huggingface.co/EvilScript/academic-sentiment-classifier},
  doi       = {10.57967/hf/6535},
  publisher = {Hugging Face}
}
```