Exploring Microsoft’s phi-4 Model and Its GGUF Format with Llama.cpp
In the rapidly evolving field of generative AI, Microsoft Research has introduced phi-4, a 14-billion-parameter Transformer designed for efficient reasoning and logic tasks. Built on a diverse dataset and a rigorous alignment process, phi-4 pushes the boundaries of open models while maintaining robust safety and instruction-following capabilities. This article covers the model’s highlights, benchmark performance, and usage through the GGUF format with Llama.cpp.
phi-4 Model Overview
Key Specifications:
- Developers: Microsoft Research
- Architecture: Dense decoder-only Transformer with 14 billion parameters.
- Context Length: 16K tokens.
- Training Data: 9.8 trillion tokens from synthetic datasets, filtered public domain websites, academic books, and Q&A datasets.
- Training Time: 21 days on 1920 H100–80G GPUs.
- Release Date: December 12, 2024.
- License: MIT
The model is designed to cater to memory/compute-constrained environments and latency-bound scenarios, excelling in reasoning and logic tasks.
Benchmark Performance
phi-4 demonstrates strong capabilities across reasoning, code generation, and mathematical tasks, outpacing many contemporaries of similar size. On widely reported benchmarks such as MMLU, GPQA, MATH, and HumanEval, it scores competitively with considerably larger models; the full results are published in the phi-4 technical report and model card.
Using phi-4 in GGUF Format with Llama.cpp
One of the standout features of phi-4 is its availability in the GGUF format, llama.cpp’s single-file model format with built-in quantization support (the q4 file used below is a 4-bit quantization), which enables efficient deployment in environments with limited resources. Below, we’ll walk through running phi-4 with Llama.cpp via the llama-cpp-python bindings.
Setup and Execution
Install the llama-cpp-python library:
!pip install llama-cpp-python
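By default this installs a CPU-only build. If you have a CUDA-capable GPU, llama-cpp-python can instead be compiled with GPU support by passing a CMake flag at install time (a sketch based on the library’s documented build options; older releases used -DLLAMA_CUBLAS=on instead):

!CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python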
Download the phi-4 GGUF model:
!wget https://huggingface.co/microsoft/phi-4-gguf/resolve/main/phi-4-q4.gguf
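If you prefer not to use wget, the huggingface_hub client can download the same file with caching and resume support (a minimal sketch; the repo_id and filename are taken from the URL above):

from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="microsoft/phi-4-gguf",
    filename="phi-4-q4.gguf",
)
print(model_path)  # Local path to the cached GGUF file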
Initialize the model in Python:
from llama_cpp import Llama
llm = Llama(
    model_path="/content/phi-4-q4.gguf",
    # Uncomment options as needed:
    # n_gpu_layers=-1,  # Offload all layers to the GPU
    # seed=1337,        # Fix the seed for reproducible sampling
    # n_ctx=2048,       # Context window size (default is 512)
)
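Since phi-4 natively supports a 16K-token context, n_ctx can be raised as high as 16384 if memory allows. Once loaded, the same object also works for plain text completion, not just chat (a minimal sketch using llama-cpp-python’s completion call):

output = llm(
    "Explain the GGUF format in one sentence.",
    max_tokens=64,
)
print(output["choices"][0]["text"])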
Run a sample chat query:
output = llm.create_chat_completion(
    messages=[
        {'role': 'system', 'content': 'You are a helpful assistant who gives answers without including any noise in the output'},
        {'role': 'user', 'content': 'Name the galaxy which is closest to us'}
    ]
)

# The response follows the OpenAI chat-completion schema,
# so the text lives under choices[0]["message"]["content"]
print(output["choices"][0]["message"]["content"])
Sample Output
When queried about the galaxy closest to us, phi-4 in GGUF format generated:
The galaxy closest to us is the Canis Major Dwarf Galaxy. It is a satellite galaxy of the Milky Way and is located approximately 25,000 light-years from Earth and about 42,000 light-years from the Galactic Center.
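For interactive applications, the same chat call can stream tokens as they are generated instead of waiting for the full response (a sketch assuming the llm object from above; streamed chunks follow the OpenAI-style delta format):

stream = llm.create_chat_completion(
    messages=[{'role': 'user', 'content': 'Name the galaxy which is closest to us'}],
    stream=True,
)
for chunk in stream:
    delta = chunk["choices"][0]["delta"]
    if "content" in delta:  # First/last chunks may carry only role or finish metadata
        print(delta["content"], end="", flush=True)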
Conclusion
Microsoft’s phi-4 represents a significant step forward in open-source generative AI, combining high-quality training data with advanced alignment techniques. Its versatility, coupled with the lightweight GGUF format, makes it an excellent choice for developers aiming to integrate powerful AI features into constrained environments. By leveraging tools like Llama.cpp, users can harness the model’s capabilities efficiently and effectively.
#AI #ArtificialIntelligence #GenerativeAI #MicrosoftResearch #phi4Model #OpenSourceAI #TransformerModels #MachineLearning #DeepLearning #LlamaCpp #GGUF #AIInnovation #CodeGeneration #ReasoningAndLogic #AIBenchmarks #GPTAlternatives #TechTrends #DeveloperTools #AICommunity #MLResearch