Add library_name and pipeline_tag metadata

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +46 -20
README.md CHANGED
@@ -1,5 +1,6 @@
  ---
- license: mit
+ base_model:
+ - CodeGoat24/UnifiedReward-7b
  datasets:
  - CodeGoat24/HPD
  - CodeGoat24/OIP
@@ -10,8 +11,9 @@ datasets:
  - CodeGoat24/Text-2-Video-Human-Preferences
  - CodeGoat24/OpenAI-4o_t2i_human_preference
  - CodeGoat24/ImageGen_Reward_Cold_Start
- base_model:
- - CodeGoat24/UnifiedReward-7b
+ license: mit
+ library_name: transformers
+ pipeline_tag: image-text-to-text
  ---

  ## Model Summary
@@ -65,27 +67,31 @@ Query = 'What does this image present?'
  R1 = 'The image is a black and white sketch of a line that appears to be in the shape of a cross. The line is a simple and straightforward representation of the cross shape, with two straight lines intersecting at a point.'
  R2 = 'This is a handwritten number seven.'

- question = ("<image>\nGiven a question and a reference image, please analyze in detail the two provided answers (Answer 1 and Answer 2). " \
- "Evaluate them based on the following three core dimensions:\n" \
- "1. Semantic accuracy: How well the answer reflects the visual content of the image\n" \
- "2. Correctness: Whether the answer is logically and factually correct\n" \
- "3. Clarity: Whether the answer is clearly and fluently expressed\n" \
- "You may also consider additional dimensions if you find them relevant (e.g., reasoning ability, attention to detail, multimodal grounding, etc.). " \
- "For each dimension, provide a score from 1 to 10 for both answers, and briefly explain your reasoning. " \
- "Then, compute the total score for each answer by explicitly adding the scores for all dimensions and showing the full calculation. " \
- "Enclose your full reasoning within <think> and </think> tags. " \
- "Then, in the <answer> tag, output exactly one of the following: 'Answer 1 is better' or 'Answer 2 is better'. No other text is allowed in the <answer> section.\n\n" \
- "Example format:\n" \
- "<think>\n" \
- "1. Semantic accuracy: Answer 1 (9/10) - ...; Answer 2 (7/10) - ...\n" \
- "2. Correctness: Answer 1 (8/10) - ...; Answer 2 (7/10) - ...\n" \
- "3. Clarity: Answer 1 (9/10) - ...; Answer 2 (8/10) - ...\n" \
- "[Additional dimensions if any]: Answer 1 (6/10) - ...; Answer 2 (7/10) - ...\n" \
- "Total score:\nAnswer 1: 9+8+9+6=32\nAnswer 2: 7+7+8+7=29\n" \
- "</think>\n" \
- "<answer>Answer 1 is better</answer>\n\n" \
- "**Note: In the example above, scores and the final answer are placeholders meant only to demonstrate the format. Your actual evaluation should be based on the quality of two given answers.**\n\n"
- f"Your task is provided as follows:\nQuestion: [{Query}]\nAnswer 1: [{R1}]\nAnswer 2: [{R2}]")
+ question = f"""<image>
+ Given a question and a reference image, please analyze in detail the two provided answers (Answer 1 and Answer 2). Evaluate them based on the following three core dimensions:
+ 1. Semantic accuracy: How well the answer reflects the visual content of the image
+ 2. Correctness: Whether the answer is logically and factually correct
+ 3. Clarity: Whether the answer is clearly and fluently expressed
+ You may also consider additional dimensions if you find them relevant (e.g., reasoning ability, attention to detail, multimodal grounding, etc.). For each dimension, provide a score from 1 to 10 for both answers, and briefly explain your reasoning. Then, compute the total score for each answer by explicitly adding the scores for all dimensions and showing the full calculation. Enclose your full reasoning within <think> and </think> tags. Then, in the <answer> tag, output exactly one of the following: 'Answer 1 is better' or 'Answer 2 is better'. No other text is allowed in the <answer> section.
+
+ Example format:
+ <think>
+ 1. Semantic accuracy: Answer 1 (9/10) - ...; Answer 2 (7/10) - ...
+ 2. Correctness: Answer 1 (8/10) - ...; Answer 2 (7/10) - ...
+ 3. Clarity: Answer 1 (9/10) - ...; Answer 2 (8/10) - ...
+ [Additional dimensions if any]: Answer 1 (6/10) - ...; Answer 2 (7/10) - ...
+ Total score:
+ Answer 1: 9+8+9+6=32
+ Answer 2: 7+7+8+7=29
+ </think>
+ <answer>Answer 1 is better</answer>
+
+ **Note: In the example above, scores and the final answer are placeholders meant only to demonstrate the format. Your actual evaluation should be based on the quality of two given answers.**
+
+ Your task is provided as follows:
+ Question: [{Query}]
+ Answer 1: [{R1}]
+ Answer 2: [{R2}]"""

  conv = copy.deepcopy(conv_templates[conv_template])
  conv.append_message(conv.roles[0], question)
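
For reference, the two added fields drive the Hub's indexing and auto-generated usage snippet: `library_name: transformers` tells the Hub which library's loading code to surface on the model page, and `pipeline_tag: image-text-to-text` files the model under that task. A minimal sketch of the usage this implies, assuming a recent transformers release that ships the `image-text-to-text` pipeline and that the checkpoint loads through the auto classes; the repo id below is the `base_model` from the card, used as a stand-in for this repository's own id:

```python
# Minimal sketch (assumption, not from this PR): loading a model tagged
# library_name: transformers + pipeline_tag: image-text-to-text.
from transformers import pipeline

pipe = pipeline(
    "image-text-to-text",                 # matches the added pipeline_tag
    model="CodeGoat24/UnifiedReward-7b",  # card's base_model, stand-in for this repo's id
)

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "url": "https://example.com/sample.png"},  # placeholder image URL
        {"type": "text", "text": "What does this image present?"},
    ],
}]

out = pipe(text=messages, max_new_tokens=128)
print(out[0]["generated_text"])
```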
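Relatedly, since the prompt in the diff pins the response format to `<think>…</think>` followed by exactly one `<answer>…</answer>` verdict, post-processing the model's reply is a two-regex job. An illustrative sketch; the `parse_judgement` helper and the sample reply below are hypothetical, not part of the repo:

```python
# Illustrative helper (not from the repo): pull the reasoning and the
# final verdict out of a reply that follows the format the prompt enforces.
import re

def parse_judgement(response: str):
    """Return (reasoning, verdict) from '<think>...</think><answer>...</answer>'."""
    think = re.search(r"<think>(.*?)</think>", response, re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", response, re.DOTALL)
    reasoning = think.group(1).strip() if think else ""
    verdict = answer.group(1).strip() if answer else None
    # The prompt allows exactly two verdict strings; anything else is a
    # format break, and the comparison should be re-run or discarded.
    if verdict not in ("Answer 1 is better", "Answer 2 is better"):
        verdict = None
    return reasoning, verdict

reply = "<think>\nTotal score:\nAnswer 1: 32\nAnswer 2: 29\n</think>\n<answer>Answer 1 is better</answer>"
print(parse_judgement(reply)[1])  # -> Answer 1 is better
```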