Skip to main content

Microsoft's Phi-4: Step-by-Step Tutorial With Demo Project

Learn how to build a homework checker using Microsoft's Phi-4 model, which validates solutions, provides detailed corrections, and suggests elegant alternatives.
Jan 7, 2025  · 12 min read

Microsoft recently introduced Phi-4, the latest addition to the Phi family of small language models. Because it excels in mathematics, I decided to use Phi-4 to build to build a homework checker integrated with Gradio.

In this tutorial, I’ll guide you step by step through building a functional web app capable of validating solutions, correcting errors, and providing alternative approaches—just like a virtual teaching assistant!

What Is Microsoft’s Phi-4 Model?

Phi-4 excels in complex reasoning tasks, particularly in mathematics, while maintaining proficiency in conventional language processing. Its key features are:

  • Advanced reasoning capabilities: Phi-4 is trained on high-quality synthetic datasets and uses innovative post-training techniques to outperform larger models in math-related reasoning tasks.
  • Efficiency and accessibility: With 14 billion parameters, Phi-4 offers high-quality results without extensive computational resources, making it accessible for a wider range of applications.
  • Availability: Phi-4 is currently accessible through Azure AI Foundry under a Microsoft Research License Agreement (MSRLA) and Hugging Face.

Phi-4 has demonstrated great performance on mathematical reasoning tasks, even surpassing larger models like Gemini Pro 1.5 in math competition problems. This makes it a good choice for applications that require advanced mathematical problem-solving capabilities.

Phi4 performance on math competition problems

Source: Microsoft

Phi-4 Homework Checker: Implementation Overview

The app that we’re going to build with Phi-4 is an AI-powered homework checker. Here’s the workflow that the user will go through:

  1. The user submits their finished homework (both the exercise instructions and the user’s solution).
  2. If the solution is incorrect, the model will explain the correct solution with detailed steps, like a teacher.
  3. If the solution is correct, the model will confirm the solution or suggest a cleaner, more efficient alternative if the answer is messy.

phi-4 demo app workflow

To provide a web interface where users can interact with the homework checker, we’ll use Gradio.

Step 1: Prerequisites

Before we begin, ensure you have the following installed:

  • Python 3.8+
  • PyTorch: For running deep learning models.
  • HuggingFace Transformers library: For loading the Phi-4 model from HuggingFace.
  • Gradio: To create a user-friendly web interface.

Install these dependencies by running:

!pip install torch transformers gradio -q

Now, we have all the dependencies installed. Next, we set up the Phi-4 model.

Step 2: Setting Up the Model

We load the Phi-4 model from HuggingFace’s Transformers library. Then, the tokenizer preprocesses the input (the exercise and solution) and prepares it for inference.

# Imports
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
import gradio as gr

# Load the Phi-4 model and tokenizer
model_name = "NyxKrage/Microsoft_Phi-4"
model=AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Set tokenizer padding token if not set
if tokenizer.pad_token_id is None:
    tokenizer.pad_token_id = tokenizer.eos_token_id

The above code snippet sets up the Phi-4 model and tokenizer and integrates them with PyTorch for computations. Let’s break down the code above in more detail:

  • The AutoModelForCausalLM and AutoTokenizer classes are imported to work with the model and tokenization.
  • The model is loaded from the Hugging Face repository using the from_pretrained() method and configured to use FP16 precision for optimized memory usage and computation speed.
  • The device_map="auto" parameter ensures the model is automatically mapped to the available hardware.
  • The tokenizer is also loaded, which processes input text into tokens suitable for the model.
  • A check ensures the tokenizer has a defined padding token; if not, it assigns the eos_token_id (end-of-sequence token) as the padding token.

Step 3: Designing Core Features 

Once the model is set up, then we define three key functions for the app:

  1. Solution validation: The model evaluates the user's solution and provides corrections if incorrect.
  2. Alternative suggestions: It suggests cleaner solutions if the user's solution is messy.
  3. Clear feedback: The model also structures the output with clear sections.

The following function, check_homework(), constructs a prompt containing the exercise, the user's solution, and specific instructions for the model to confirm the correctness, identify issues, or provide step-by-step guidance if the solution is incorrect.

# Function to validate the solution and provide feedback
def check_homework(exercise, solution):
    prompt = f"""
    Exercise: {exercise}
    Solution: {solution}
Task: Validate the solution to the math problem, provided by the user. If the user's solution is correct, confirm else provide an alternative if the solution is messy. If it is incorrect, provide the correct solution with step-by-step reasoning.
    """
    # Tokenize and generate response
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    print(f"Tokenized input length: {len(inputs['input_ids'][0])}")
    outputs = model.generate(**inputs, max_new_tokens=1024)
    print(f"Generated output length: {len(outputs[0])}")
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    # response = response.replace(prompt, "").strip()
    prompt_len = len(prompt)
    response = response[prompt_len:].strip()
    print(f"Raw Response: {response}")
    return response

The check_homework() function tokenizes the prompt using the model's tokenizer. It prepares it for processing by converting the input into PyTorch tensors, which are mapped to the device where the model is running.

It then generates a response from the model with a limit of max_new_tokens=1024 to control the output length. This token length can be varied as required.

Finally, the processed response, which provides feedback or a corrected solution, is returned. 

Step 4: Creating a User-Friendly Interface with Gradio

Gradio simplifies the deployment of the homework checker by allowing users to input their exercises and solutions interactively. The following code snippet creates a user-friendly Gradio web interface for the check_homework() function. The Gradio interface takes the user’s inputs (the exercise and the solution) and passes them to the model for validation. 

# Define the function that integrates with the Gradio app
def homework_checker_ui(exercise, solution):
    return check_homework(exercise, solution)

# Create the Gradio interface using the new syntax
interface = gr.Interface(
    fn=homework_checker_ui,
    inputs=[
        gr.Textbox(lines=2, label="Exercise (e.g., Solve for x in 2x + 3 = 7)"),
        gr.Textbox(lines=1, label="Your Solution (e.g., x = 1)")
    ],
    outputs=gr.Textbox(label="Feedback"),
    title="AI Homework Checker",
    description="Validate your homework solutions, get corrections, and receive cleaner alternatives.",
)

# Launch the app
interface.launch(debug=True)

We created two input fields using gr.Textbox: one for the math problem (exercise) and another for the user’s solution. The output is displayed in a single gr.Textbox labeled "Feedback". The interface.launch() command launches the Gradio app in a browser, and the debug=True flag enables detailed logs to help troubleshoot errors. 

Step 5: Testing and Validating

It’s time to test our AI Homework Checker app. Here are some tests I ran:

  1. Simple math problem: I tried solving basic probability problems, and the app returned a well-structured and clear solution to the problem.

Testing Phi4 on a simple math problem.

  1. Complex derivative problem: Solving derivatives can be challenging for some models. Here, I tried finding the first derivative of a natural log function with the homework checker and it produced correct step by step reasoning for the solution.

Testing Phi4 on a Complex derivative math problem.

Conclusion

In this tutorial, we created an AI-powered homework checker using the Phi-4 model. This app validates solutions, provides detailed corrections, and suggests elegant alternatives, making it an ideal virtual teacher for students.

Ready to extend the app? Experiment with more complex problems, or integrate it into educational platforms for broader use!


Aashi Dutt's photo
Author
Aashi Dutt
LinkedIn
Twitter

I am a Google Developers Expert in ML(Gen AI), a Kaggle 3x Expert, and a Women Techmakers Ambassador with 3+ years of experience in tech. I co-founded a health-tech startup in 2020 and am pursuing a master's in computer science at Georgia Tech, specializing in machine learning.

Topics

Learn AI with these courses!

track

Developing AI Applications

23hrs hr
Learn to create AI-powered applications with the latest AI developer tools, including the OpenAI API, Hugging Face, and LangChain.
See DetailsRight Arrow
Start Course
See MoreRight Arrow
Related
An AI juggles tasks

blog

5 Projects You Can Build with Generative AI Models (with examples)

Learn how to use Generative AI models to create an image editor, ChatGPT-like chatbot on low resources, loan approval classifier app, automate PDF Interactions, and GPT-powered voice assistant.
Abid Ali Awan's photo

Abid Ali Awan

10 min

tutorial

Phi-3 Tutorial: Hands-On With Microsoft’s Smallest AI Model

Learn about Microsoft’s Phi-3 language model, including its architecture, features, applications, installation, setup, integration, optimization, and fine-tuning.
Zoumana Keita 's photo

Zoumana Keita

12 min

tutorial

Microsoft's TinyTroupe: A Guide With Examples

Learn how to use Microsoft’s TinyTroupe to simulate interactions between AI personas with distinct characteristics for different purposes.
Hesam Sheikh Hassani's photo

Hesam Sheikh Hassani

8 min

tutorial

Qwen 2.5 Coder: A Guide With Examples

Learn about the Qwen2.5-Coder series by building an AI code review assistant using Qwen 2.5-Coder-32B-Instruct and Gradio.
Aashi Dutt's photo

Aashi Dutt

8 min

tutorial

Gemini 2.0 Flash: Step-by-Step Tutorial With Demo Project

Learn how to use Google's Gemini 2.0 Flash model to develop a visual assistant capable of reading on-screen content and answering questions about it using Python.
François Aubry's photo

François Aubry

12 min

tutorial

A Deep Dive into the Phi-2 Model

Understanding the Phi-2 model and learning how to access and fine-tune it using the role-play dataset.
Abid Ali Awan's photo

Abid Ali Awan

12 min

See MoreSee More