
MovieCORE: COgnitive REasoning in Movies

A Video Question Answering Dataset for Probing Deeper Cognitive Understanding of Movie Content


MovieCORE Dataset Teaser

πŸ“– Overview

MovieCORE is a video question answering (VQA) dataset designed to probe deeper cognitive understanding of movie content. Unlike traditional VQA datasets that focus on surface-level visual understanding, MovieCORE challenges models to reason about narrative structure, character development, thematic elements, and complex temporal relationships in cinematic content.

πŸ—‚οΈ Data Preparation

The MovieCORE dataset builds upon video content from MovieChat. To get started:

Video Data

Download the video files from MovieChat's HuggingFace repositories.

Annotations

Our annotations are available on HuggingFace.

Extract and organize the data according to your model's requirements, then use our annotations for evaluation.
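As a starting point, the snippet below sketches one way to fetch the data with the huggingface_hub library. The repository IDs are placeholders, not the real ones; substitute the MovieChat video repositories and the MovieCORE annotation repository referenced above before running.

# Hypothetical download sketch (pip install huggingface_hub).
# The repo IDs below are placeholders; replace them with the actual
# MovieChat video repositories and the MovieCORE annotation repository.
from huggingface_hub import snapshot_download

# MovieChat videos (placeholder repo ID)
snapshot_download(repo_id="<moviechat-video-repo>", repo_type="dataset",
                  local_dir="data/videos")

# MovieCORE annotations (placeholder repo ID)
snapshot_download(repo_id="<moviecore-annotations-repo>", repo_type="dataset",
                  local_dir="data/annotations")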

πŸš€ Quick Start

Installation

git clone https://github.com/joslefaure/MovieCORE.git
cd MovieCORE

🎯 Baselines

  • We provide a script for running HERMES (ICCV'25) on MovieCORE; please see the linked project.

πŸ“Š Evaluation Dimensions

MovieCORE employs a comprehensive multi-dimensional evaluation framework to assess model performance across different aspects of cognitive understanding:

| Dimension | Description |
| --- | --- |
| 🎯 Accuracy | Measures semantic similarity between predicted and ground truth answers |
| 📋 Comprehensiveness | Assesses coverage of all key aspects mentioned in the ground truth |
| 🧠 Depth | Evaluates level of reasoning and insight demonstrated in predictions |
| 🔍 Evidence | Checks quality and relevance of supporting evidence provided |
| 🔗 Coherence | Measures logical flow, organization, and clarity of responses |

Each dimension provides unique insights into different cognitive capabilities required for deep video understanding.
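For a concrete sense of how such a rubric can drive an LLM-based judge, here is a minimal sketch that assembles a judging prompt from the five dimensions above. The prompt wording and 0-5 scale are illustrative assumptions, not the exact prompt used by evaluate_moviecore.py.

# Illustrative only: a rubric prompt built from the five MovieCORE
# dimensions. The real evaluation prompt in evaluate_moviecore.py may differ.
DIMENSIONS = {
    "Accuracy": "semantic similarity between the prediction and the ground truth",
    "Comprehensiveness": "coverage of all key aspects mentioned in the ground truth",
    "Depth": "level of reasoning and insight demonstrated in the prediction",
    "Evidence": "quality and relevance of the supporting evidence provided",
    "Coherence": "logical flow, organization, and clarity of the response",
}

def build_rubric_prompt(question: str, answer: str, pred: str) -> str:
    """Format a judging prompt that asks for a score per dimension."""
    rubric = "\n".join(f"- {name}: {desc}" for name, desc in DIMENSIONS.items())
    return (
        f"Question: {question}\nGround truth: {answer}\nPrediction: {pred}\n\n"
        f"Score the prediction from 0 to 5 on each dimension:\n{rubric}"
    )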

πŸ’» Usage

Evaluation Script

Evaluate your model's performance on MovieCORE using our evaluation script:

export OPENAI_API_KEY='your_openai_api_key'
python evaluate_moviecore.py --pred_path path/to/your/predictions.json

πŸ“ Input Format

Your predictions should follow this JSON structure:

{
    "video_1.mp4": [
        {
            "question": "How does the video depict the unique adaptations of the species in the Sahara Desert, and what roles do these species play in their ecosystem?",
            "answer": "The ground truth answer.",
            "pred": "Your model's prediction.",
            "classification": "the question classification"
        },
        {
            "question": "The second question for video 1?",
            "answer": "The ground truth answer.",
            "pred": "Your model's prediction.",
            "classification": "the question classification"
        }
    ],
    "video_2.mp4": [
        {
            "question": "The only question for video 2",
            "answer": "The ground truth answer.",
            "pred": "Your model's prediction.",
            "classification": "the question classification"
        }
    ]
}
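
If you generate this file programmatically, a small helper like the one below keeps the structure consistent. This is a hypothetical convenience wrapper; only the field names come from the schema above.

# Hypothetical helper for writing predictions in the format expected by
# evaluate_moviecore.py. Field names follow the schema shown above.
import json
from collections import defaultdict

predictions = defaultdict(list)

def add_prediction(video, question, answer, pred, classification):
    """Append one QA entry under its video key."""
    predictions[video].append({
        "question": question,
        "answer": answer,
        "pred": pred,
        "classification": classification,
    })

# ... populate predictions from your model's outputs, then:
with open("predictions.json", "w") as f:
    json.dump(predictions, f, indent=4)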

πŸ“ˆ Output

The evaluation script provides:

  • Overall scores across all dimensions
  • Classification-specific performance metrics
  • Detailed breakdowns for comprehensive analysis
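
As an illustration of the classification-specific breakdown, the sketch below averages per-dimension scores grouped by question classification. It assumes each evaluated entry carries a "scores" dict such as {"Accuracy": 4, "Depth": 3, ...}; that format is an assumption for the example, not the script's actual output schema.

# Illustrative aggregation: mean score per dimension, grouped by question
# classification. Assumes each entry has "classification" and "scores" keys.
from collections import defaultdict

def breakdown_by_classification(entries):
    """entries: iterable of dicts with 'classification' and 'scores' keys."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for e in entries:
        counts[e["classification"]] += 1
        for dim, score in e["scores"].items():
            sums[e["classification"]][dim] += score
    return {
        cls: {dim: total / counts[cls] for dim, total in dims.items()}
        for cls, dims in sums.items()
    }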

πŸ“š Citation

If you use MovieCORE in your research, please cite our paper:

@misc{faure2025moviecorecognitivereasoningmovies,
      title={MovieCORE: COgnitive REasoning in Movies}, 
      author={Gueter Josmy Faure and Min-Hung Chen and Jia-Fong Yeh and Ying Cheng and Hung-Ting Su and Yung-Hao Tang and Shang-Hong Lai and Winston H. Hsu},
      year={2025},
      eprint={2508.19026},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2508.19026}, 
}

🀝 Contributing

We welcome contributions to MovieCORE! Please feel free to:

  • Report issues or bugs
  • Suggest improvements or new features
  • Submit baseline implementations
  • Provide feedback on the evaluation framework

πŸ“„ License

This dataset is provided under the MIT License. See LICENSE for more details.


🎬 Advancing Video Understanding Through Cognitive Evaluation 🎬

📖 Paper | 🤗 Dataset | 💻 Code
