Data4Good Case Challenge
📖 Background
Artificial Intelligence (AI) is rapidly transforming education by providing students with instant access to information and adaptive learning tools. Still, it also introduces significant risks, such as the spread of misinformation and fabricated content. Research indicates that large language models (LLMs) often confidently generate factually incorrect or “hallucinated” responses, which can mislead learners and erode trust in digital learning platforms.
The 4th Annual Data4Good Competition challenges participants to develop innovative analytics solutions to detect and improve factuality in AI-generated educational content, ensuring that AI advances knowledge rather than confusion.
💾 The data
The data provided is a question/answer dataset; the task is to determine whether each answer is factual, not factual (a contradiction), or irrelevant to the question.
- Question: The question that was asked (the prompt)
- Context: Relevant contextual support for the question
- Answer: The answer provided by an AI
- Type: A categorical variable with three possible levels – Factual, Contradiction, Irrelevant:
  - Factual: the answer is correct
  - Contradiction: the answer is incorrect
  - Irrelevant: the answer has nothing to do with the question
There are 21,021 examples in the training dataset (data/train.json) for you to experiment with.
The test dataset (data/test.json) contains 2,000 examples, each of which you must classify as one of the three classes. In addition to classification performance, we are looking for as detailed a methodology as possible: document your step-by-step approach in your notebooks, discuss what worked well and what did not, and share your suggestions or ideas on whether a general approach to these types of problems might exist.
Previewing the Training Data
Let's load and preview the train.json dataset to understand its structure and contents.
import pandas as pd
import json
# Load the train.json file
data_path = "data/train.json"
with open(data_path, 'r', encoding='utf-8') as f:
data = json.load(f)
# Convert to DataFrame
train_df = pd.DataFrame(data)
# Show the first 50 rows
train_df.head(50)
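Before modelling, it is worth checking how balanced the three classes are, since a skewed label distribution would shape both the modelling and evaluation choices. A minimal sketch, assuming the label column matches the Type field described above:
# Inspect the class balance of the label column.
# NOTE: the column name is an assumption – use whichever key the preview above
# actually shows (e.g. "Type" or lower-case "type").
label_col = "Type" if "Type" in train_df.columns else "type"
print(train_df[label_col].value_counts(normalize=True))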
💪 Competition challenge
Create a report here in DataLab that covers the following:
- Your EDA and machine learning process using the `data/train.json` file.
- Complete the `data/test.json` file by predicting the `type` of answer for each question (2,000 total). The `data/test.json` file also has an `ID` column, which uniquely identifies each row (a baseline modelling sketch follows this list).
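Before reaching for heavier models, a simple bag-of-words baseline over the concatenated question, context, and answer fields gives a reference score to beat. The sketch below is only an illustration, not a prescribed method: it assumes lower-case column names question, context, answer, and type (verify them against the actual file) and uses a TF-IDF + logistic regression pipeline.
import json
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Load the training data. Column names below are assumptions (they may be
# capitalised, e.g. "Question"/"Type") – verify against the preview first.
with open("data/train.json", encoding="utf-8") as f:
    train_df = pd.DataFrame(json.load(f))

# Concatenate the text fields into a single input string per example.
text = (
    train_df["question"].fillna("") + " [CTX] "
    + train_df["context"].fillna("") + " [ANS] "
    + train_df["answer"].fillna("")
)
labels = train_df["type"]

# Hold out a stratified validation split to estimate per-class performance.
X_train, X_val, y_train, y_val = train_test_split(
    text, labels, test_size=0.2, random_state=42, stratify=labels
)

# TF-IDF features + logistic regression as a quick, interpretable baseline.
baseline = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=2, max_features=200_000),
    LogisticRegression(max_iter=1000),
)
baseline.fit(X_train, y_train)
print(classification_report(y_val, baseline.predict(X_val)))
A bag-of-words baseline mainly calibrates expectations; models that encode the question–context–answer triple jointly (for example fine-tuned transformer encoders framed as natural-language inference) are the natural next step for this kind of factuality task.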
✅ Submission Instructions:
- First, submit your DataLab workbook using the button in the top right corner.
- Then, submit your predictions as a .json file via this form.
- The structure of the file should not be altered, and should include the IDs and predicted answer `type` (a minimal export sketch follows this list).
- Only one team member needs to submit. List the academic emails of all the team members in the submission form.
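A minimal export sketch, assuming data/test.json has an ID column plus the same text fields and reusing the fitted baseline pipeline from the earlier sketch (adapt the names to the real file structure):
import json
import pandas as pd

# Load the test set and fill in the predicted type without altering the structure.
with open("data/test.json", encoding="utf-8") as f:
    test_df = pd.DataFrame(json.load(f))

test_text = (
    test_df["question"].fillna("") + " [CTX] "
    + test_df["context"].fillna("") + " [ANS] "
    + test_df["answer"].fillna("")
)
test_df["type"] = baseline.predict(test_text)

# Write the completed records, keeping the ID column and all original fields.
test_df.to_json("predictions.json", orient="records", indent=2)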
🧑‍⚖️ Judging criteria
Your submission will be scored using a custom weighted confusion matrix to account for cost-based priorities. Each class receives an equal weighting in your overall prediction performance: the Factual, Contradiction, and Irrelevant classes each count for 33.3% of the score.
Your classification performance on the test set will be ranked against all other teams in the competition. A total of 6,000 points (34.60% of the total points available in the Data4Good challenge) can be earned by successfully submitting to the competition.
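The exact cost weights of the scoring matrix are not given here, so the closest local proxy is a per-class confusion matrix together with a macro-averaged metric, which, like the official score, weights every class equally. A sketch on the held-out validation split from the baseline above (the label spellings are an assumption):
from sklearn.metrics import confusion_matrix, recall_score

# Per-class confusion matrix on the validation split.
val_pred = baseline.predict(X_val)
label_order = ["Factual", "Contradiction", "Irrelevant"]  # assumed label spelling
print(confusion_matrix(y_val, val_pred, labels=label_order))

# Macro recall treats each class equally, mirroring the 33.3% per-class weighting.
print("Macro recall:", recall_score(y_val, val_pred, average="macro"))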
✅ Checklist before publishing into the competition
- Rename your workspace to make it descriptive of your work. N.B. you should leave the notebook name as notebook.ipynb.
- Remove redundant cells like the judging criteria, so the workbook is focused on your story.
- Make sure the workbook reads well and explains how you found your insights.
- Try to include an executive summary of your recommendations at the beginning.
- Check that all the cells run without error.