Gemma 3 27B on Novita AI: Really a Single-GPU Model?


Key Highlights

Multimodal, Multilingual, Long Context: Gemma 3 27B processes text and images, supports 140+ languages, and handles up to 128K tokens for long inputs.

Elo Score: Achieved an Elo score of 1339, ranking among the top 10 models globally.

Single GPU Compatibility: Delivers top performance on a single NVIDIA H100 GPU, outperforming competitors requiring multiple GPUs.

Novita AI Access: Novita AI offers a cost-effective API and a free playground to explore Gemma 3 27B’s capabilities—try it now!

Released on March 12, 2025, Gemma 3 27B is a significant addition to Google’s open large language model lineup. As the largest model in the Gemma 3 family, it aims to balance performance with accessibility. This article offers a practical and technical overview of Gemma 3 27B: its architecture, capabilities, benchmark and hands-on test results, hardware requirements, and access options.

What is Gemma 3 27B?

Notable Features

  • Advanced Multilingual Support: With its new tokenizer, Gemma 3 is highly effective across 140+ languages.
  • Multimodal Input: The ability to process both images and text makes it a versatile tool for a range of applications.
  • Extended Context Window: The 128K token capacity allows for handling extensive and detailed inputs.
  • Open Source and Community-Friendly: Being open-source, the model encourages community experimentation and broad adoption.

Release Date, Model Size, Open Source

  • March 12, 2025
  • 27 billion parameters
    This size positions Gemma 3 as a substantial model capable of handling complex and diverse tasks effectively.
  • Open Model: Released as an open-source model by Google.

Supported Languages

  • Supports over 140 languages
    • Features a new tokenizer designed for better multilingual support, making it highly versatile in global applications.

Model Architecture

  • Technology: Built using the same research and technology that power Google’s Gemini 2.0 models.
  • Training:
    • Trained on 14 trillion tokens using Google TPUs.
    • Leveraged the JAX Framework for efficient and scalable training.
    • Techniques Used:
      • Distillation
      • Reinforcement Learning (including RLHF, RLMF, RLEF)
      • Model Merging
    • These techniques enhance the model’s performance in critical areas such as math, coding, and instruction following.
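Of the techniques above, distillation is the easiest to picture: the student model is trained to match a temperature-softened teacher distribution rather than only hard labels. Below is a toy sketch of the KL-divergence loss term over a single token position; the logits and temperature are illustrative values, not Gemma 3’s actual training setup.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    z = logits / temperature
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()

teacher_logits = np.array([2.0, 1.0, 0.1])  # toy teacher output for one token
student_logits = np.array([1.5, 1.2, 0.3])  # toy student output

T = 2.0                          # temperature softens both distributions
p = softmax(teacher_logits, T)   # teacher targets
q = softmax(student_logits, T)   # student predictions

# KL(p || q): the distillation loss term the student minimizes
kl = float(np.sum(p * (np.log(p) - np.log(q))))
print(round(kl, 4))
```

Driving this quantity toward zero pulls the student’s full output distribution toward the teacher’s, which transfers more signal per token than matching only the argmax label.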

Multimodal Capability

  • Multimodal: Yes
    • Processes both images and text as input and generates text output.
  • Vision Encoder: Based on SigLIP.

Context Window

  • 128K tokens
    • Allows the model to process and understand large amounts of information for sophisticated tasks.
    • Pretraining Details:
      • Initially pretrained on 32K-token sequences and later scaled to 128K.
      • Achieved through adjustments to positional embeddings using RoPE (Rotary Positional Embeddings).
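The rescaling idea can be sketched numerically. In RoPE, each pair of hidden dimensions rotates at a fixed frequency determined by a base constant; raising that base (a common long-context recipe, with the constants below chosen for illustration rather than taken from Gemma 3’s published configuration) stretches the wavelengths so that positions far apart in a 128K window remain distinguishable:

```python
import numpy as np

def rope_angles(positions, dim, base=10_000.0):
    """Rotation angle for each (position, dimension-pair) in RoPE."""
    inv_freq = 1.0 / (base ** (np.arange(0, dim, 2) / dim))
    return np.outer(positions, inv_freq)  # shape (len(positions), dim // 2)

pos = np.arange(0, 128_000, 16_000)             # sample positions across 128K
short = rope_angles(pos, 64, base=10_000.0)     # original base
long_ = rope_angles(pos, 64, base=1_000_000.0)  # raised base

# With the larger base, angles grow more slowly with position, so the
# 128K range consumes less of each dimension's rotation cycle.
print(long_[-1, 1] < short[-1, 1])  # -> True
```

This is only one of several RoPE-adjustment strategies (base rescaling, positional interpolation); the Gemma 3 technical report describes the exact combination used.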

Quantization Precision

  • Default training precision: bfloat16
    • Performance Note: Models perform best using bfloat16, and quality may degrade with other precisions.
  • Quantization Options:
    • The community experiments with quantization formats such as Q8 GGUF, EXL2, and IQ4_XS to reduce VRAM usage.
Precision Level      Weights Only (GB)   Weights + KV Cache (GB)
bf16 (raw)           54.0                72.7
INT4                 14.1                32.8
INT4 (blocks=32)     15.3                34.0
SFP8                 27.4                46.1
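The weights-only figures follow almost directly from the parameter count: at N bytes per parameter, 27 billion parameters need roughly 27 × N decimal gigabytes. The arithmetic below is a sanity check only; real deployments add KV cache, activations, and runtime overhead, and quantized formats carry extra scale metadata (which is why the table shows 14.1 GB for INT4 rather than the bare 13.5 GB):

```python
PARAMS = 27e9  # 27 billion parameters

# Storage size per parameter for each precision.
bytes_per_param = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}

for name, nbytes in bytes_per_param.items():
    gb = PARAMS * nbytes / 1e9  # decimal gigabytes
    print(f"{name}: {gb:.1f} GB")  # bf16 -> 54.0 GB, matching the table
```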

Gemma 3 27B Benchmark

Chat Performance

Gemma 3 27B has demonstrated outstanding capabilities in recent evaluations, achieving an Elo score of 1339 on the LMSys Chatbot Arena. This places it among the top 10 models, alongside leading closed-source models such as o1-preview, and highlights its strength in human preference evaluations. Notably, Gemma 3 27B achieves this score while running on a single NVIDIA H100 GPU, whereas some competitors reportedly require up to 32 GPUs for comparable performance.

[Figure: Chatbot Arena Elo scores, from Hugging Face]

Performance in Specific Benchmarks

The instruction-tuned version, Gemma 3 27B IT, achieved competitive results across a range of evaluations, often rivaling closed Gemini models:

Benchmark          Score   Description
MMLU-Pro           67.5    Strong multi-task language understanding.
LiveCodeBench      29.7    Moderate success in live coding challenges.
Bird-SQL           54.4    Competitive SQL query generation and understanding.
GPQA Diamond       42.4    Solid performance on graduate-level, “Google-proof” question answering.
MATH               69.0    Strong complex mathematical problem-solving.
FACTS Grounding    74.9    Excellent factual grounding in knowledge-based tasks.
MMMU               64.9    Strong performance in multimodal understanding tasks.
SimpleQA           10.0    Weak on simple fact-based question answering, leaving room for improvement.

Gemma 3 27B Hardware Requirements

Gemma 3 27B is described by Google as the “most capable model you can run on a single GPU.”

[Figure: Elo score comparison, from Google]
Setup              VRAM Requirement                        Notes
Cloud deployment   About 80GB VRAM (single or multi-GPU)   A100 or H100 GPUs are recommended for optimal performance; alternatively, three 24GB RTX 4090s.
Apple Silicon      Gemma 3 4B supported via mlx-vlm        Gemma 3 4B ships with day-zero support in mlx-vlm, an open-source library for running vision-language models on Apple Silicon devices, including Macs and iPhones.

Gemma 3 27B Test

Code

Prompt: Create a JavaScript simulation of a rotating 3D sphere made up of alphabets. The closest letters should be in a brighter color, while the ones farthest away should appear in gray color.

Output: The model did not fully adhere to the prompt; instead of a sphere, it generated a spinning ring of letters. Additionally, the first two previews failed due to insufficient brightness.

[Figure: Gemma 3 27B code test output]

Reasoning

Prompt: You start with 14 apples. Emma takes 3 but gives back 2. You drop 7 and pick up 4. Leo takes 4 and gives 5. You take 1 apple from Emma and trade it with Leo for 3 apples, then give those 3 to Emma, who hands you an apple and an orange. Zara takes your apple and gives you a pear. You trade the pear with Leo for an apple. Later, Zara trades an apple for an orange and swaps it with you for another apple. How many pears do you have? Answer me just what is asked.

Output: Within seconds, it worked through every exchange and correctly reported the pear count.
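The bookkeeping is easy to verify with a short ledger. (The final swap in the prompt is ambiguously worded, but it only moves apples and oranges, so the pear count is unaffected either way.)

```python
# Step-by-step ledger of the fruit puzzle from the prompt above.
apples, oranges, pears = 14, 0, 0
apples -= 3; apples += 2    # Emma takes 3, gives back 2
apples -= 7; apples += 4    # you drop 7, pick up 4
apples -= 4; apples += 5    # Leo takes 4, gives 5
apples += 1                 # you take 1 apple from Emma
apples -= 1; apples += 3    # trade it with Leo for 3 apples
apples -= 3                 # give those 3 to Emma
apples += 1; oranges += 1   # Emma hands you an apple and an orange
apples -= 1; pears += 1     # Zara takes your apple, gives you a pear
pears -= 1; apples += 1     # trade the pear with Leo for an apple
apples -= 1; oranges += 1   # swap an apple with Zara for an orange
print(pears)  # -> 0
```

The only pear ever received is traded away to Leo, so the answer is 0.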

Image Analysis

Prompt: Tell me how many times the elo score increased from gemma 2 27b to gemma 3 27b?

[Figure: Elo score comparison, from Google]

Output: It accurately identified the numbers, but instead of calculating the multiple as the prompt instructed, it only computed the difference.

Based on the image, here’s the calculation:

  • Gemma 2 27B Elo score: 1220
  • Gemma 3 27B Elo score: 1338

Increase: 1338 – 1220 = 118

The Elo score increased by 118 points from Gemma 2 27B to Gemma 3 27B.
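For reference, the ratio the prompt actually asked about (“how many times”) can be computed from the same two scores read off the chart:

```python
gemma2_elo, gemma3_elo = 1220, 1338  # scores read from the chart

print(gemma3_elo - gemma2_elo)            # absolute increase: 118
print(round(gemma3_elo / gemma2_elo, 2))  # multiple: ~1.1x
```

So the model's difference of 118 points corresponds to roughly a 1.1x increase, the figure the prompt was after.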

How to Access Gemma 3 27B?

Novita AI is an AI cloud platform that offers developers an easy way to deploy AI models using a simple API, while also providing an affordable and reliable GPU cloud for building and scaling.

Step 1: Log In and Access the Model Library

Log in to your account and click on the Model Library button.


Step 2: Start Your Free Trial

Begin your free trial to explore the capabilities of the selected model.


Step 3: Get Your API Key

To authenticate with the API, you will need an API key. Open the “Settings“ page and copy the API key as indicated in the image.


Step 4: Install the API

Install the API client library using the package manager for your programming language.


After installation, import the necessary libraries into your development environment and initialize the client with your API key to start interacting with the Novita AI LLM API. Below is an example of using the chat completions API in Python.

from openai import OpenAI
  
client = OpenAI(
    base_url="https://api.novita.ai/v3/openai",
    api_key="<YOUR Novita AI API Key>",
)

model = "google/gemma-3-27b-it"
stream = True  # set to False for a single non-streaming response
max_tokens = 2048
system_content = """Be a helpful assistant"""
temperature = 1
top_p = 1
min_p = 0
top_k = 50
presence_penalty = 0
frequency_penalty = 0
repetition_penalty = 1
response_format = { "type": "text" }

chat_completion_res = client.chat.completions.create(
    model=model,
    messages=[
        {
            "role": "system",
            "content": system_content,
        },
        {
            "role": "user",
            "content": "Hi there!",
        }
    ],
    stream=stream,
    max_tokens=max_tokens,
    temperature=temperature,
    top_p=top_p,
    presence_penalty=presence_penalty,
    frequency_penalty=frequency_penalty,
    response_format=response_format,
    extra_body={
      # sampling options outside the standard OpenAI schema
      "top_k": top_k,
      "repetition_penalty": repetition_penalty,
      "min_p": min_p
    }
  )

if stream:
    for chunk in chat_completion_res:
        print(chunk.choices[0].delta.content or "", end="")
else:
    print(chat_completion_res.choices[0].message.content)
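Since Gemma 3 27B accepts image input, the same endpoint can in principle take a multimodal message. The snippet below builds an OpenAI-style payload with an `image_url` content part; whether Novita AI’s endpoint accepts this exact format is an assumption to verify against their API documentation, and the image URL is a placeholder.

```python
# OpenAI-style multimodal message: text plus an image_url content part.
# Assumes the endpoint supports this format -- check Novita AI's docs.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this chart in one sentence."},
            {
                "type": "image_url",
                "image_url": {"url": "https://example.com/chart.png"},  # placeholder
            },
        ],
    }
]

# Pass this list as `messages` to client.chat.completions.create(...)
# with model="google/gemma-3-27b-it", as in the text-only example.
print(len(messages[0]["content"]))  # -> 2 content parts
```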

Gemma 3 27B is a powerful open-source model from Google, offering strong reasoning, multimodal capabilities, multilingual support, and easy integration with platforms like Hugging Face, all while running on a single GPU.

Frequently Asked Questions

How many parameters does Gemma 3 27B have?

Gemma 3 27B has 27 billion parameters.

Is Gemma 3 27B multimodal?

Yes, it supports both image and text inputs.

What is the recommended hardware for running Gemma 3 27B?

For local use, a GPU with at least 24GB of VRAM is recommended, with more VRAM being beneficial for larger context sizes. It can also be deployed on cloud platforms such as Hugging Face Inference Endpoints with various GPU options, or accessed through an API provider such as Novita AI.

