# MovieCORE: COgnitive REasoning in Movies

*A Video Question Answering Dataset for Probing Deeper Cognitive Understanding of Movie Content*
## Overview
MovieCORE is a video question answering (VQA) dataset designed to probe deeper cognitive understanding of movie content. Unlike traditional VQA datasets that focus on surface-level visual understanding, MovieCORE challenges models to demonstrate sophisticated reasoning about narrative structure, character development, thematic elements, and complex temporal relationships within cinematic content.
## Data Preparation
The MovieCORE dataset builds upon video content from MovieChat. To get started:
### Video Data
Download the video files from MovieChat's HuggingFace repositories:
- Training Data: MovieChat-1K Train
- Test Data: MovieChat-1K Test
### Annotations
Access our annotations on HuggingFace:
- MovieCORE Annotations: 🤗 HuggingFace Dataset
Extract and organize the data according to your model's requirements, then use our annotations for evaluation.
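If you prefer to script the download, here is a minimal sketch using `huggingface_hub`; the repo IDs are placeholders for the repositories linked above, and the local directory names are only illustrative:

```python
from huggingface_hub import snapshot_download

# Repo IDs below are placeholders -- substitute the MovieChat-1K and
# MovieCORE repositories linked above. Local directory names are
# illustrative; organize the files however your model expects.
snapshot_download(repo_id="<moviechat-1k-train>", repo_type="dataset",
                  local_dir="data/videos/train")
snapshot_download(repo_id="<moviechat-1k-test>", repo_type="dataset",
                  local_dir="data/videos/test")
snapshot_download(repo_id="<moviecore-annotations>", repo_type="dataset",
                  local_dir="data/annotations")
```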
## Quick Start
### Installation
```bash
git clone https://github.com/joslefaure/MovieCORE.git
cd MovieCORE
```
## Baselines
- We provide a script for running HERMES (ICCV'25) on MovieCORE; please check out the linked project.
## Evaluation Dimensions
MovieCORE employs a comprehensive multi-dimensional evaluation framework to assess model performance across different aspects of cognitive understanding:
| Dimension | Description |
|---|---|
| Accuracy | Measures semantic similarity between predicted and ground-truth answers |
| Comprehensiveness | Assesses coverage of all key aspects mentioned in the ground truth |
| Depth | Evaluates the level of reasoning and insight demonstrated in predictions |
| Evidence | Checks the quality and relevance of supporting evidence provided |
| Coherence | Measures logical flow, organization, and clarity of responses |
Each dimension provides unique insights into different cognitive capabilities required for deep video understanding.
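For intuition, a per-question result could be reduced to a single number by averaging the five dimension scores. The sketch below is an assumption for illustration only; the official `evaluate_moviecore.py` may aggregate or weight dimensions differently:

```python
from statistics import mean

# The five MovieCORE rubric dimensions from the table above.
DIMENSIONS = ("accuracy", "comprehensiveness", "depth", "evidence", "coherence")

def overall_score(scores: dict) -> float:
    """Collapse per-dimension scores into one number via a plain mean.

    NOTE: the plain mean is an assumption for illustration; the official
    script may report or weight dimensions differently.
    """
    return mean(scores[d] for d in DIMENSIONS)

# Example: one question scored per dimension.
print(overall_score({"accuracy": 4.0, "comprehensiveness": 3.5,
                     "depth": 3.0, "evidence": 4.5, "coherence": 4.0}))  # 3.8
```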
## Usage
### Evaluation Script
Evaluate your model's performance on MovieCORE using our evaluation script:

```bash
export OPENAI_API_KEY='your_openai_api_key'
python evaluate_moviecore.py --pred_path path/to/your/predictions.json
```
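If you would rather drive the evaluation from Python (for example, inside a larger pipeline), the same command can be wrapped as below; the API key and prediction path are placeholders:

```python
import os
import subprocess
import sys

# Same invocation as the shell commands above, driven from Python.
env = {**os.environ, "OPENAI_API_KEY": "your_openai_api_key"}  # placeholder key
subprocess.run(
    [sys.executable, "evaluate_moviecore.py",
     "--pred_path", "path/to/your/predictions.json"],  # placeholder path
    env=env,
    check=True,  # raise CalledProcessError if the script exits non-zero
)
```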
## Input Format
Your predictions should follow this JSON structure:

```json
{
    "video_1.mp4": [
        {
            "question": "How does the video depict the unique adaptations of the species in the Sahara Desert, and what roles do these species play in their ecosystem?",
            "answer": "The ground truth answer.",
            "pred": "Your model's prediction.",
            "classification": "the question classification"
        },
        {
            "question": "The second question for video 1?",
            "answer": "The ground truth answer.",
            "pred": "Your model's prediction.",
            "classification": "the question classification"
        }
    ],
    "video_2.mp4": [
        {
            "question": "The only question for video 2?",
            "answer": "The ground truth answer.",
            "pred": "Your model's prediction.",
            "classification": "the question classification"
        }
    ]
}
```
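One way to assemble such a file is sketched below. Here `run_model` and the annotation fields (`"video"`, `"question"`, `"answer"`, `"classification"`) are assumptions: adapt them to your own inference code and the actual annotation schema:

```python
import json
from collections import defaultdict

def build_predictions(annotations, run_model, out_path="predictions.json"):
    """Write model outputs in the structure expected by evaluate_moviecore.py.

    `annotations` is assumed to be an iterable of dicts with "video",
    "question", "answer", and "classification" keys (adapt to the real
    MovieCORE schema); `run_model(video, question)` is your inference fn.
    """
    predictions = defaultdict(list)
    for ann in annotations:
        predictions[ann["video"]].append({
            "question": ann["question"],
            "answer": ann["answer"],  # ground truth, copied through
            "pred": run_model(ann["video"], ann["question"]),
            "classification": ann["classification"],
        })
    with open(out_path, "w") as f:
        json.dump(predictions, f, indent=4)
```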
## Output
The evaluation script provides:
- Overall scores across all dimensions
- Classification-specific performance metrics
- Detailed breakdowns for comprehensive analysis
## Citation
If you use MovieCORE in your research, please cite our paper:
```bibtex
@misc{faure2025moviecorecognitivereasoningmovies,
      title={MovieCORE: COgnitive REasoning in Movies},
      author={Gueter Josmy Faure and Min-Hung Chen and Jia-Fong Yeh and Ying Cheng and Hung-Ting Su and Yung-Hao Tang and Shang-Hong Lai and Winston H. Hsu},
      year={2025},
      eprint={2508.19026},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2508.19026},
}
```
## Contributing
We welcome contributions to MovieCORE! Please feel free to:
- Report issues or bugs
- Suggest improvements or new features
- Submit baseline implementations
- Provide feedback on the evaluation framework
## License
This dataset is provided under the MIT License. See LICENSE for more details.
🎬 *Advancing Video Understanding Through Cognitive Evaluation* 🎬

📖 Paper | 🤗 Dataset | 💻 Code