+language:
+  - pt
+  - en
+license: cc-by-nc-nd-4.0
+colorTo: red
+sdk: docker
+app_port: 8501
+tags:
+  - streamlit
+  - text-segmentation
+  - topic-segmentation
+  - bert
+  - next-sentence-prediction
+  - document-segmentation
+  - meeting-minutes
+library_name: transformers
+base_model:
+  - neuralmind/bert-base-portuguese-cased
+NSP-CouncilSeg: Linear Text Segmentation for Municipal Meeting Minutes
+Model Description
+NSP-CouncilSeg is a BERT model fine-tuned for linear text segmentation of municipal council meeting minutes. It uses Next Sentence Prediction (NSP) to detect topic boundaries in long documents, which makes it a good fit for administrative and governmental meeting minutes.
+Try out the model: Hugging Face Space Demo
+Key Features
+    🎯 Specialized for Meeting Minutes: Fine-tuned on Portuguese municipal council meeting minutes
+    🌍 Bilingual Capability: Works with both Portuguese and English text
+    ⚡ Fast Inference: Efficient BERT-base architecture for real-time segmentation
+    📊 High Accuracy: Achieves a BED F-measure of 0.79 on the CouncilSeg test set
+    🔄 Sentence-Level Segmentation: Identifies topic boundaries at sentence granularity
+Model Details
+    Base Model: google-bert/bert-base-uncased
+    Architecture: BERT with Next Sentence Prediction head
+    Parameters: 110M
+    Max Sequence Length: 512 tokens
+    Fine-tuning Dataset: CouncilSeg (Portuguese Municipal Meeting Minutes)
+    Fine-tuning Method: Focal Loss with boundary-aware weighting
+    Training Framework: PyTorch + Transformers
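The card names focal loss with boundary-aware weighting but gives no hyperparameters. As a rough illustration only, the sketch below computes the standard binary focal loss for a single sentence pair in plain Python. The function name `focal_loss`, the focusing parameter `gamma=2.0`, and the class weight `boundary_weight=0.75` are assumptions made for this example, not values taken from the actual training setup, and "boundary-aware weighting" is interpreted here simply as a larger weight on the boundary class.

```python
import math


def focal_loss(p_boundary, label, gamma=2.0, boundary_weight=0.75):
    """Binary focal loss for one sentence pair.

    p_boundary: predicted probability of label 1 ("not_next", topic boundary).
    label: 1 for a true boundary, 0 for same topic.
    gamma, boundary_weight: assumed values, not from the model card.
    """
    # Probability the model assigned to the correct class.
    p_t = p_boundary if label == 1 else 1.0 - p_boundary
    # Class weight: boundaries are rare, so they get the larger share.
    alpha_t = boundary_weight if label == 1 else 1.0 - boundary_weight
    # Clamp to avoid log(0).
    p_t = min(max(p_t, 1e-12), 1.0)
    # (1 - p_t) ** gamma shrinks the loss of already well-classified pairs.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# A confidently correct boundary contributes far less than a missed one.
easy = focal_loss(p_boundary=0.9, label=1)
hard = focal_loss(p_boundary=0.1, label=1)
print(f"easy example: {easy:.4f}, hard example: {hard:.4f}")
```

With `gamma=0` and `boundary_weight=0.5` the expression reduces to half the ordinary cross-entropy, which is a convenient sanity check.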
+How It Works
+The model predicts whether two consecutive sentences belong to the same topic (label 0: "is_next") or represent a topic transition (label 1: "not_next"). By applying this classifier sequentially across all sentence pairs in a document, it identifies topic boundaries.
+Sentence A: "By the President, minutes no. 28 of 20.12.2023 were present at the meeting."
+Sentence B: "After considering and analyzing the matter, the Municipal Executive unanimously decided to approve minute no. 28 of 12.20.2023."
+→ Prediction: Same Topic (confidence: 76%)
+Sentence A: "After considering and analyzing the matter, the Municipal Executive unanimously decided to approve minute no. 28 of 12.20.2023."
+Sentence B: "There were no various processes and requests to submit."
+→ Prediction: Topic Boundary (confidence: 82%)
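The sequential procedure described above can be sketched as a small loop over consecutive sentence pairs. This is a minimal sketch, not the Space's actual code: `segment_document` and the `boundary_prob` callable are hypothetical names, and the `fake_scorer` stub stands in for the model so the example runs without downloading weights. In practice `boundary_prob` would wrap the classifier and return the "not_next" probability.

```python
from typing import Callable, List


def segment_document(
    sentences: List[str],
    boundary_prob: Callable[[str, str], float],
    threshold: float = 0.5,
) -> List[List[str]]:
    """Group sentences into topic segments.

    boundary_prob(a, b) is assumed to return the probability that
    sentence b starts a new topic after sentence a (label 1, "not_next").
    """
    if not sentences:
        return []
    segments = [[sentences[0]]]
    for prev, curr in zip(sentences, sentences[1:]):
        if boundary_prob(prev, curr) > threshold:
            segments.append([curr])    # topic transition: open a new segment
        else:
            segments[-1].append(curr)  # same topic: extend the current segment
    return segments


# Stub scorer for illustration only; it mimics the two predictions shown above.
def fake_scorer(a: str, b: str) -> float:
    return 0.82 if b.startswith("There were no") else 0.24


doc = [
    "By the President, minutes no. 28 of 20.12.2023 were present at the meeting.",
    "After considering and analyzing the matter, the Municipal Executive "
    "unanimously decided to approve minute no. 28 of 12.20.2023.",
    "There were no various processes and requests to submit.",
]
for i, segment in enumerate(segment_document(doc, fake_scorer), start=1):
    print(f"Segment {i}: {len(segment)} sentence(s)")
```

The 0.5 threshold is only a default; raising it trades missed boundaries for fewer false splits.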
+Usage
+Quick Start with Transformers
+from transformers import AutoTokenizer, AutoModelForNextSentencePrediction
+import torch
+# Load model and tokenizer
+tokenizer = AutoTokenizer.from_pretrained("anonymous15135/nsp-councilseg")
+model = AutoModelForNextSentencePrediction.from_pretrained("anonymous15135/nsp-councilseg")
+# Prepare input
+sentence_a = "By the President, minutes no. 28 of 20.12.2023 were present at the meeting."
+sentence_b = "After considering and analyzing the matter, the Municipal Executive unanimously decided to approve minute no. 28 of 12.20.2023."
+# Tokenize the pair, truncating to the model's 512-token limit
+inputs = tokenizer(sentence_a, sentence_b, return_tensors="pt", truncation=True, max_length=512)
+# Predict
+with torch.no_grad():
+    outputs = model(**inputs)
+    logits = outputs.logits
+    probs = torch.softmax(logits, dim=1)
+# Interpret results: index 0 = is_next (same topic), index 1 = not_next (topic boundary)
+is_next_prob = probs[0][0].item()
+not_next_prob = probs[0][1].item()
+print(f"Is Next (same topic): {is_next_prob:.3f}")
+print(f"Not Next (topic boundary): {not_next_prob:.3f}")
+if not_next_prob > 0.5:
+    print("🔴 Topic boundary detected!")
+else:
+    print("🟢 Same topic continues")
+Evaluation Results
+CouncilSeg Test Set
+Metric 	Score
+BED F-measure (higher is better) 	0.79
+Boundary Similarity (higher is better) 	0.59
+Pk (lower is better) 	0.08
+WindowDiff (lower is better) 	0.10
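Pk and WindowDiff both slide a fixed-size window over the document and count the windows where the predicted segmentation disagrees with the reference, so they are error rates. The sketch below shows one common way to compute them, assuming each segmentation is given as a list of 0/1 flags, one per gap between consecutive sentences (1 = boundary). The function names and the window-size heuristic (half the mean reference segment length) are choices made for this illustration; the scores reported above may have been produced with a different tool or convention.

```python
def _window_size(ref):
    """Half the mean reference segment length, at least 1 (assumed heuristic)."""
    n_units = len(ref) + 1
    n_segments = sum(ref) + 1
    return max(1, round(n_units / n_segments / 2))


def _check(ref, hyp, k):
    if len(ref) != len(hyp):
        raise ValueError("ref and hyp must cover the same number of gaps")
    k = _window_size(ref) if k is None else k
    n_windows = len(ref) + 1 - k
    if n_windows <= 0:
        raise ValueError("document is too short for this window size")
    return k, n_windows


def pk(ref, hyp, k=None):
    """Fraction of windows where ref and hyp disagree on whether the
    two ends of the window fall in the same segment."""
    k, n_windows = _check(ref, hyp, k)
    errors = 0
    for i in range(n_windows):
        ref_same = sum(ref[i:i + k]) == 0
        hyp_same = sum(hyp[i:i + k]) == 0
        errors += ref_same != hyp_same
    return errors / n_windows


def window_diff(ref, hyp, k=None):
    """Fraction of windows where ref and hyp contain a different
    number of boundaries."""
    k, n_windows = _check(ref, hyp, k)
    errors = 0
    for i in range(n_windows):
        errors += sum(ref[i:i + k]) != sum(hyp[i:i + k])
    return errors / n_windows


# Nine sentences in three equal segments; the hypothesis places the
# first boundary one sentence too late.
reference = [0, 0, 1, 0, 0, 1, 0, 0]
hypothesis = [0, 0, 0, 1, 0, 1, 0, 0]
print(f"Pk = {pk(reference, hypothesis):.3f}")
print(f"WindowDiff = {window_diff(reference, hypothesis):.3f}")
```

WindowDiff is at least as strict as Pk: it also penalizes windows where both segmentations contain a boundary but in different numbers.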
+Limitations
+    Domain Specificity: Best performance on administrative/governmental meeting minutes
+    Language: Optimized for Portuguese; English performance may vary
+    Document Length: Designed for documents with 10-50 segments
+    Context Window: Limited to 512 tokens per sentence pair
+    Ambiguous Boundaries: May struggle with subtle topic transitions
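The 512-token limit matters mostly for unusually long sentences. One way to handle an over-long pair, sketched below in plain Python, is to trim the longer side first while keeping the end of sentence A and the start of sentence B, since those are the parts that sit next to the candidate boundary. This keep-the-adjacent-text policy and the helper name `truncate_pair` are assumptions for illustration, not the behavior of the released pipeline. The budget reserves three positions for the special tokens BERT adds to a pair (`[CLS]`, `[SEP]`, `[SEP]`).

```python
def truncate_pair(tokens_a, tokens_b, max_length=512, num_special=3):
    """Trim two token lists so the encoded pair fits in max_length.

    Drops tokens from the start of A and from the end of B, always
    trimming whichever side is currently longer.
    """
    budget = max_length - num_special
    if budget < 0:
        raise ValueError("max_length is smaller than the special-token count")
    a, b = list(tokens_a), list(tokens_b)
    while len(a) + len(b) > budget:
        if len(a) >= len(b):
            a.pop(0)  # drop the earliest token of sentence A
        else:
            b.pop()   # drop the latest token of sentence B
    return a, b


# Token ids are faked with integers here; a real tokenizer would supply them.
long_a = list(range(400))
long_b = list(range(300))
short_a, short_b = truncate_pair(long_a, long_b)
print(len(short_a), len(short_b), len(short_a) + len(short_b) + 3)
```

By comparison, the tokenizer's built-in `truncation=True` trims from the end of the longer sequence, which is simpler but can discard the end of sentence A.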
+Model Card Contact
+For questions or feedback, please open an issue in the model repository.
+License
+This model is released under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) license.