
Qwen 2.5 Coder: A Guide With Examples

Learn about the Qwen2.5-Coder series by building an AI code review assistant using Qwen 2.5-Coder-32B-Instruct and Gradio.
Dec 12, 2024  · 8 min read

The Alibaba Qwen research team recently released the Qwen2.5-Coder series. The series comprises 12 models under two main categories: base models and instruct models.

The Qwen2.5-Coder series offers parameter variants ranging from 0.5B to 32B, giving developers the flexibility to experiment on both edge devices and heavy-load GPUs.

In this hands-on guide, we’ll use the Qwen2.5-Coder-32B-Instruct model to build a code review assistant integrated with Gradio. By the end of this tutorial, you’ll have a functional web app capable of analyzing code, suggesting improvements, and checking coding standards, much like a virtual code reviewer!


What Is the Qwen2.5-Coder Series?

The Qwen2.5-Coder series (formerly known as CodeQwen1.5), developed by Alibaba’s Qwen research team, is dedicated to advancing Open CodeLLMs. The series includes models like Qwen2.5-Coder-32B-Instruct, which has become the state-of-the-art open-source code model, rivaling the coding capabilities of proprietary models like GPT-4o and Gemini.

These models are presented as being:

  • Powerful: These models are capable of advanced code generation, repair, and reasoning.
  • Diverse: They support 92 programming languages, including Python, Java, C++, Ruby, and Rust.
  • Practical: Qwen 2.5 models are designed for real-world applications, from code assistance to artifact generation, with a long-context understanding of up to 128K tokens.

Comparison of Qwen 2.5 Coder 32B instruct model with existing SOTA models on various benchmarks.

Source: Qwen’s official blog

Qwen2.5-Coder-32B-Instruct outperforms its counterparts across multiple benchmarks, including HumanEval, MBPP, and Spider, showcasing exceptional coding and problem-solving capabilities. It excels in diverse tasks like SQL query generation, code evaluation, and real-world programming challenges, surpassing models like GPT-4o and Claude 3.5 Sonnet.

To learn more about these models' capabilities and comparisons, read the Qwen2.5-Coder Series announcement article.

Qwen 2.5 Code Implementation: Overview

In this section, we will dive into the code implementation of our code review assistant using Gradio.

We will start by setting up the necessary prerequisites, followed by initializing the model with optimized configurations.

Next, we will define the core functionalities using the model's instruct capabilities, and finally integrate these components into a user-friendly Gradio interface that makes it easy to interact with the assistant.

Step 1: Prerequisites

Before diving into the implementation, let’s ensure that we have the following tools and libraries installed:

  • Python 3.8+
  • PyTorch: For efficient deep learning model handling.
  • Hugging Face Transformers library: For loading and using pre-trained LLMs.
  • bitsandbytes: For applying a quantization configuration for optimized performance.
  • Gradio: To create a user-friendly web interface.

Run the following commands to install the necessary dependencies:

!pip install torch transformers gradio bitsandbytes -q

Once the above dependencies are installed, run the following import commands:

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch
import gradio as gr

Note: Running the Qwen2.5-Coder-32B-Instruct model on Google Colab requires an A100 GPU with high RAM. If resources are limited, consider using lower parameter models like 0.5B, 3B, 7B, or 14B.
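
The smaller variants follow the same repository naming pattern on Hugging Face, so switching only means changing the model name passed to the loading code in the next step. For example:

# Lighter variant for limited GPUs; the loading code in Step 2 stays the same
model_name = "Qwen/Qwen2.5-Coder-7B-Instruct"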

Step 2: Model and Tokenizer Initialization

The Hugging Face Transformers library simplifies loading large models. We start by initializing the Qwen2.5-Coder-32B-Instruct model from HuggingFace with quantization for better performance:

# Define quantization configuration for optimized performance
bnb_config = BitsAndBytesConfig(load_in_8bit=True)
# Load the model and tokenizer
model_name = "Qwen/Qwen2.5-Coder-32B-Instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
    quantization_config=bnb_config
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
if tokenizer.pad_token_id is None:
    tokenizer.pad_token_id = tokenizer.eos_token_id

The above code snippet allows us to load the Qwen2.5-Coder-32B-Instruct model with 8-bit quantization, reducing memory usage for efficient execution on devices with limited resources.

It uses BitsAndBytesConfig to enable quantization, torch.float16 for optimized computation on supported GPUs, and device_map="auto" for automatic hardware allocation. The tokenizer from HuggingFace’s transformers library ensures easy text processing, assigning a pad_token_id if not already set. This configuration makes it easy to run large models efficiently on consumer-grade hardware.
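
As a quick optional check, you can confirm how much memory the quantized model occupies and where its layers were placed. The get_memory_footprint() method and hf_device_map attribute used below are standard features of Transformers models loaded with a device_map:

# Optional sanity check: memory usage and device placement of the loaded model
print(f"Memory footprint: {model.get_memory_footprint() / 1e9:.2f} GB")
print(model.hf_device_map)  # maps each module to the GPU/CPU it was assigned to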

Step 3: Response Generation

Once we have loaded the model along with the tokenizer, we set up a function to generate a response from the model using the apply_chat_template method. 

def generate_response(messages):
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )
    model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
    generated_ids = model.generate(
        **model_inputs,
        max_new_tokens=200,
        temperature=0.7,
        repetition_penalty=1.2
    )
    generated_ids = [
        output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
    ]
    response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()
    return response

The above function serves as a utility to interact with the Qwen2.5-Coder-32B-Instruct model for generating context-aware responses. It takes a list of chat-like messages (with roles such as “system” and “user”) as input and performs the following steps:

  1. Chat template application: It formats the messages using the apply_chat_template function, preparing them for the model’s chat-based understanding.
  2. Tokenization and input preparation: The formatted text is tokenized and converted into model-compatible tensors using Hugging Face’s tokenizer.
  3. Response generation: The model generates a response based on the input using specific parameters:
     • max_new_tokens=200: limits the response length to 200 new tokens.
     • temperature=0.7: controls randomness in the response, balancing creativity and consistency.
     • repetition_penalty=1.2: discourages repetitive phrases in the output.
  4. Post-processing: This step removes the input prompt from the generated output by slicing and decoding the response into a human-readable string using the batch_decode function.

The function abstracts the complexity of preparing, generating, and decoding responses, making it reusable for various tasks such as debugging, code analysis, and generating actionable feedback.
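
Before wiring this into the app, you can give generate_response a quick smoke test with a minimal message list (the prompt below is just an illustration):

# Quick smoke test for the response generation utility
test_messages = [
    {"role": "system", "content": "You are Qwen, a helpful coding assistant."},
    {"role": "user", "content": "Write a one-line Python function that reverses a string."},
]
print(generate_response(test_messages))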

Step 4: The App’s Core Features Setup

Next, we set up the code for our application's three core features: the issue detector, the code quality expert, and the coding standards expert. Since we are using an instruct model, we can pass roles such as “system” and “user” to the model within the prompt to generate a response.

1. Issue detector

This assistant identifies syntax errors, logical bugs, and potential vulnerabilities in the provided code. The specific instructions for the model are:

def analyze_code_issues(code):
    messages = [
        {"role": "system", "content": "You are Qwen, a helpful coding assistant. Your job is to analyze and debug code."},
        {"role": "user", "content": f"""Review this code:
    {code}
    List all syntax errors, logical bugs, and potential runtime issues. Format as:
    - Error/Bug: [description]
    - Impact: [potential consequences]
    - Fix: [suggested solution]"""}
        ]
    return generate_response(messages)

The above code builds a list of messages, each with a role and content, along with a system prompt instructing the model to act as a helpful coding assistant. The messages are then passed to the generate_response function to return a suitable response.
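
For instance, you can call the function directly on a small, deliberately flawed snippet (the input below is just a hypothetical example):

# Example call with a deliberately flawed snippet (hypothetical input)
buggy_code = """
def divide(a, b):
    return a / b  # no handling for b == 0
"""
print(analyze_code_issues(buggy_code))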

2. Code quality expert

This assistant helps improve the readability, efficiency, and maintainability of the code by generating suggestions on optimizations and code quality.

def suggest_code_improvements(code):
    messages = [
        {"role": "system", "content": "You are Qwen, an expert in code optimization. Help improve code quality."},
        {"role": "user", "content": f"""Review this code:
    {code}
    Provide specific optimization suggestions:
    1. Performance improvements
    2. Better variable names
    3. Simpler logic
    4. Memory efficiency
    Include code examples where relevant."""}
    ]
    return generate_response(messages)

3. Coding standards expert

This assistant checks if the provided code follows ideal coding standards and provides suggestions accordingly.

def check_coding_standards(code):
    messages = [
        {"role": "system", "content": "You are Qwen, a coding standards expert. Your task is to evaluate code adherence to best practices."},
        {"role": "user", "content": f"""Evaluate this code:
    {code}
    Check against these standards:
    1. PEP 8 compliance
    2. Function/variable naming
    3. Documentation completeness
    4. Error handling
    5. Code organization
    List violations with specific examples and corrections."""}
    ]
    return generate_response(messages)

The above function enables the model to act as an expert in coding standards, evaluating the provided code for adherence to guidelines such as PEP 8 compliance, proper function naming, robust error handling, and more.

Step 5: Creating a User-Friendly Interface With Gradio

Gradio simplifies the deployment of the assistant, allowing users to input code and view results interactively. The model’s responses from the previous sections are seamlessly integrated into a Gradio interface.

def review_code(code):
    issues = analyze_code_issues(code)
    improvements = suggest_code_improvements(code)
    standards = check_coding_standards(code)
    return issues, improvements, standards
interface = gr.Interface(
    fn=review_code,
    inputs="textbox",
    outputs=["text", "text", "text"],
    title="AI Code Review Assistant",
    description="Analyze code for issues, suggest improvements, and check adherence to coding standards."
)
interface.launch(debug=True)

The review_code function integrates three components: analyze_code_issues, suggest_code_improvements, and check_coding_standards. These functions identify syntax errors, bugs, optimizations, and adherence to coding standards. The Gradio interface takes user input through a textbox and outputs the results as three text fields, providing an easy-to-use web interface for developers to analyze and improve their code. Once we run this final cell, we get a running Gradio app interface like the one shown below.

The AI code review assistant interface built using Gradio
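
If you want to share the app beyond your local machine, Gradio's standard launch parameters cover that; the options below are part of Gradio's regular API rather than anything specific to this app:

# Optional launch variations using standard Gradio parameters
interface.launch(share=True)            # creates a temporary public URL
# interface.launch(server_port=7861)    # or pin the app to a specific local port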

Step 6: Testing and Validating the Code Review Assistant

If you prefer not to launch the Gradio app, you can test the assistant directly by running the following code. Here’s an example of how the assistant analyzes a buggy Python function:

code = """
def add_numbers(a, b):
return a + b
"""
issues, suggestions, standards = review_code(code)
print("Issues:", issues)
print("Suggestions:", suggestions)
print("Adherence to coding standards:", standards)
The output looks something like this (truncated):

Issues:
Here's the review of your provided Python code:
```python
.
.
- **Fix:** Implement input validation or use exception handling to manage unexpected data types gracefully.
Suggestions:
Certainly! Let's review the provided function and suggest optimizations based on your criteria:
1. **Proper Indentation**
.
.
2. **Better Variable Names**
In this simple case, `a` and `b` could be considered generic enough as they represent arbitrary operands. However, more descriptive names can help when functions become complex.
Adherence to coding standards:
Certainly! Let's go through the provided Python function `add_numbers` according to the specified standards.
.
.   
2. **Function/Variable Naming**
- The names `add_numbers`, `a`, and `b` follow good conventions for simplicity and clarity in their context (function name clearly indicates addition; variable names indicate they're operands). However, if you want more descriptive variable names (especially useful in larger functions), consider using something like `operand_one` or `num1`.

This Gradio app can also be run locally with the provided code, using a Qwen2.5-Coder model served through Ollama. Just replace the Hugging Face model calls with the local Ollama model, and your app will be up and running; a sketch of that swap follows below.
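
Here is a minimal sketch of what an Ollama-backed generate_response could look like. It assumes you have installed the ollama Python package (pip install ollama) and pulled a local model with ollama pull qwen2.5-coder:32b; the exact model tag and generation options are assumptions to adjust for your setup:

# Minimal sketch: replace the Hugging Face-backed generate_response with a local Ollama call
# Assumes: pip install ollama, and `ollama pull qwen2.5-coder:32b` has been run locally
import ollama

def generate_response(messages):
    # Ollama accepts the same list of {"role": ..., "content": ...} messages used above
    response = ollama.chat(
        model="qwen2.5-coder:32b",  # adjust to the variant you pulled
        messages=messages,
        options={"temperature": 0.7, "num_predict": 200, "repeat_penalty": 1.2},
    )
    return response["message"]["content"].strip()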

Conclusion

In this article, we learned how to use Qwen2.5-Coder-32B-Instruct in combination with Gradio to build an AI code review assistant. This tool analyzes code for syntax errors, suggests optimizations, and enforces coding standards, streamlining the software development process.

Now that you know how to set up Qwen 2.5, I encourage you to build your own project!


Author
Aashi Dutt

I am a Google Developers Expert in ML(Gen AI), a Kaggle 3x Expert, and a Women Techmakers Ambassador with 3+ years of experience in tech. I co-founded a health-tech startup in 2020 and am pursuing a master's in computer science at Georgia Tech, specializing in machine learning.
