Gemma 3 27B on Novita AI: Really a Single-GPU Model?


Key Highlights

Multimodal, Multilingual, Long Context: Gemma 3 27B processes text and images, supports 140+ languages, and handles up to 128K tokens for long inputs.

Elo Score: Achieved an Elo score of 1339, ranking among the top 10 models globally.

Single GPU Compatibility: Delivers top performance on a single NVIDIA H100 GPU, outperforming competitors requiring multiple GPUs.

Novita AI Access: Novita AI offers a cost-effective API and a free playground to explore Gemma 3 27B’s capabilities—try it now!

Released on March 12, 2025, Gemma 3 27B is a significant addition to Google’s open large language model lineup. As the largest model in the Gemma 3 family, it aims to balance performance with accessibility. This article offers a practical and technical overview of Gemma 3 27B: its architecture, capabilities, benchmark and hands-on test results, hardware requirements, and access options.

What is Gemma 3 27B?

Notable Features

  • Advanced Multilingual Support: With its new tokenizer, Gemma 3 is highly effective across 140+ languages.
  • Multimodal Input: The ability to process both images and text makes it a versatile tool for a range of applications.
  • Extended Context Window: The 128K token capacity allows for handling extensive and detailed inputs.
  • Open Source and Community-Friendly: Being open-source, the model encourages community experimentation and broad adoption.

Release Date, Model Size, Open Source

  • March 12, 2025
  • 27 billion parameters
    This size positions Gemma 3 as a substantial model capable of handling complex and diverse tasks effectively.
  • Open Model: Released as an open-source model by Google.

Supported Languages

  • Supports over 140 languages
    • Features a new tokenizer designed for better multilingual support, making it highly versatile in global applications.

Model Architecture

  • Technology: Built using the same research and technology that power Google’s Gemini 2.0 models.
  • Training:
    • Trained on 14 trillion tokens using Google TPUs.
    • Leveraged the JAX Framework for efficient and scalable training.
    • Techniques Used:
      • Distillation
      • Reinforcement Learning (including RLHF, RLMF, RLEF)
      • Model Merging
    • These techniques enhance the model’s performance in critical areas such as math, coding, and instruction following.
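Of the techniques above, distillation is the easiest to picture: the student model is trained to match a temperature-softened teacher distribution rather than only hard labels. Below is a toy sketch of the KL-divergence loss term over a single token position; the logits and temperature are illustrative values, not Gemma 3’s actual training setup.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    z = logits / temperature
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()

teacher_logits = np.array([2.0, 1.0, 0.1])  # toy teacher output for one token
student_logits = np.array([1.5, 1.2, 0.3])  # toy student output

T = 2.0                          # temperature softens both distributions
p = softmax(teacher_logits, T)   # teacher targets
q = softmax(student_logits, T)   # student predictions

# KL(p || q): the distillation loss term the student minimizes
kl = float(np.sum(p * (np.log(p) - np.log(q))))
print(round(kl, 4))
```

Driving this quantity toward zero pulls the student’s full output distribution toward the teacher’s, which transfers more signal per token than matching only the argmax label.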

Multimodal Capability

  • Multimodal: Yes
    • Processes both images and text as input and generates text output.
  • Vision Encoder: Based on SigLIP.

Context Window

  • 128K tokens
    • Allows the model to process and understand large amounts of information for sophisticated tasks.
    • Pretraining Details:
      • Initially pretrained on 32K-token sequences and later scaled to 128K.
      • Achieved through adjustments to positional embeddings using RoPE (Rotary Positional Embeddings).
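The rescaling idea can be sketched numerically. In RoPE, each pair of hidden dimensions rotates at a fixed frequency determined by a base constant; raising that base (a common long-context recipe, with the constants below chosen for illustration rather than taken from Gemma 3’s published configuration) stretches the wavelengths so that positions far apart in a 128K window remain distinguishable:

```python
import numpy as np

def rope_angles(positions, dim, base=10_000.0):
    """Rotation angle for each (position, dimension-pair) in RoPE."""
    inv_freq = 1.0 / (base ** (np.arange(0, dim, 2) / dim))
    return np.outer(positions, inv_freq)  # shape (len(positions), dim // 2)

pos = np.arange(0, 128_000, 16_000)             # sample positions across 128K
short = rope_angles(pos, 64, base=10_000.0)     # original base
long_ = rope_angles(pos, 64, base=1_000_000.0)  # raised base

# With the larger base, angles grow more slowly with position, so the
# 128K range consumes less of each dimension's rotation cycle.
print(long_[-1, 1] < short[-1, 1])  # -> True
```

This is only one of several RoPE-adjustment strategies (base rescaling, positional interpolation); the Gemma 3 technical report describes the exact combination used.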

Quantization Precision

  • Default training precision: bfloat16
    • Performance Note: Models perform best using bfloat16, and quality may degrade with other precisions.
  • Quantization Options:
    • The community experiments with quantization formats such as Q8 GGUF, EXL2, and IQ4_XS to reduce VRAM usage.
Precision Level      Weights Only (GB)   Weights + KV Cache (GB)
bf16 (raw)           54.0                72.7
INT4                 14.1                32.8
INT4 (blocks=32)     15.3                34.0
SFP8                 27.4                46.1
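The weights-only figures follow almost directly from the parameter count: at N bytes per parameter, 27 billion parameters need roughly 27 × N decimal gigabytes. The arithmetic below is a sanity check only; real deployments add KV cache, activations, and runtime overhead, and quantized formats carry extra scale metadata (which is why the table shows 14.1 GB for INT4 rather than the bare 13.5 GB):

```python
PARAMS = 27e9  # 27 billion parameters

# Storage size per parameter for each precision.
bytes_per_param = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}

for name, nbytes in bytes_per_param.items():
    gb = PARAMS * nbytes / 1e9  # decimal gigabytes
    print(f"{name}: {gb:.1f} GB")  # bf16 -> 54.0 GB, matching the table
```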

Gemma 3 27B Benchmark

Chat Performance

Gemma 3 27B has demonstrated outstanding capabilities in recent evaluations, achieving an Elo score of 1339 on the LMSys Chatbot Arena. This places it among the top 10 models, alongside leading closed-source models such as o1-preview, and highlights its strength in human preference evaluations. Notably, Gemma 3 27B achieves this score while running on a single NVIDIA H100 GPU, whereas some competitors reportedly require up to 32 GPUs for comparable performance.

[Figure: Chatbot Arena Elo scores, from Hugging Face]

Performance in Specific Benchmarks

The instruction-tuned version, Gemma 3 27B IT, achieved competitive results across a range of evaluations, often rivaling closed Gemini models:

Benchmark          Score   Description
MMLU-Pro           67.5    Strong multi-task language understanding.
LiveCodeBench      29.7    Moderate success in live coding challenges.
Bird-SQL           54.4    Competitive SQL query generation and understanding.
GPQA Diamond       42.4    Solid performance on graduate-level, “Google-proof” question answering.
MATH               69.0    Strong complex mathematical problem-solving.
FACTS Grounding    74.9    Excellent factual grounding in knowledge-based tasks.
MMMU               64.9    Strong performance in multimodal understanding tasks.
SimpleQA           10.0    Weak on simple fact-based question answering, leaving room for improvement.

Gemma 3 27B Hardware Requirements

Gemma 3 27B is described by Google as the “most capable model you can run on a single GPU.”

[Figure: Elo score comparison, from Google]
Setup              VRAM Requirement                        Notes
Cloud deployment   About 80GB VRAM (single or multi-GPU)   A100 or H100 GPUs are recommended for optimal performance; alternatively, three 24GB RTX 4090s.
Apple Silicon      Gemma 3 4B supported via mlx-vlm        Gemma 3 4B ships with day-zero support in mlx-vlm, an open-source library for running vision-language models on Apple Silicon devices, including Macs and iPhones.

Gemma 3 27B Test

Code

Prompt: Create a JavaScript simulation of a rotating 3D sphere made up of alphabets. The closest letters should be in a brighter color, while the ones farthest away should appear in gray color.

Output: The model did not fully adhere to the prompt; instead of a sphere, it generated a spinning ring of letters. Additionally, the first two previews failed due to insufficient brightness.

[Figure: Gemma 3 27B code test output]

Reasoning

Prompt: You start with 14 apples. Emma takes 3 but gives back 2. You drop 7 and pick up 4. Leo takes 4 and gives 5. You take 1 apple from Emma and trade it with Leo for 3 apples, then give those 3 to Emma, who hands you an apple and an orange. Zara takes your apple and gives you a pear. You trade the pear with Leo for an apple. Later, Zara trades an apple for an orange and swaps it with you for another apple. How many pears do you have? Answer me just what is asked.

Output: Within seconds, it worked through every exchange and correctly reported the pear count.
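The bookkeeping is easy to verify with a short ledger. (The final swap in the prompt is ambiguously worded, but it only moves apples and oranges, so the pear count is unaffected either way.)

```python
# Step-by-step ledger of the fruit puzzle from the prompt above.
apples, oranges, pears = 14, 0, 0
apples -= 3; apples += 2    # Emma takes 3, gives back 2
apples -= 7; apples += 4    # you drop 7, pick up 4
apples -= 4; apples += 5    # Leo takes 4, gives 5
apples += 1                 # you take 1 apple from Emma
apples -= 1; apples += 3    # trade it with Leo for 3 apples
apples -= 3                 # give those 3 to Emma
apples += 1; oranges += 1   # Emma hands you an apple and an orange
apples -= 1; pears += 1     # Zara takes your apple, gives you a pear
pears -= 1; apples += 1     # trade the pear with Leo for an apple
apples -= 1; oranges += 1   # swap an apple with Zara for an orange
print(pears)  # -> 0
```

The only pear ever received is traded away to Leo, so the answer is 0.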

Image Analysis

Prompt: Tell me how many times the elo score increased from gemma 2 27b to gemma 3 27b?

[Figure: Elo score comparison, from Google]

Output: It accurately identified the numbers, but instead of calculating the multiple as the prompt instructed, it only computed the difference.

Based on the image, here’s the calculation:

  • Gemma 2 27B Elo score: 1220
  • Gemma 3 27B Elo score: 1338

Increase: 1338 – 1220 = 118

The Elo score increased by 118 points from Gemma 2 27B to Gemma 3 27B.
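For reference, the ratio the prompt actually asked about (“how many times”) can be computed from the same two scores read off the chart:

```python
gemma2_elo, gemma3_elo = 1220, 1338  # scores read from the chart

print(gemma3_elo - gemma2_elo)            # absolute increase: 118
print(round(gemma3_elo / gemma2_elo, 2))  # multiple: ~1.1x
```

So the model's difference of 118 points corresponds to roughly a 1.1x increase, the figure the prompt was after.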

How to Access Gemma 3 27B?

Novita AI is an AI cloud platform that offers developers an easy way to deploy AI models using a simple API, while also providing an affordable and reliable GPU cloud for building and scaling.

Step 1: Log In and Access the Model Library

Log in to your account and click on the Model Library button.


Step 2: Start Your Free Trial

Begin your free trial to explore the capabilities of the selected model.


Step 3: Get Your API Key

To authenticate with the API, you will need an API key. Open the “Settings“ page and copy the API key as indicated in the image.


Step 4: Install the API

Install the API client library using the package manager for your programming language.


After installation, import the necessary libraries into your development environment and initialize the client with your API key to start interacting with the Novita AI LLM API. Below is an example of using the chat completions API in Python.

from openai import OpenAI
  
client = OpenAI(
    base_url="https://api.novita.ai/v3/openai",
    api_key="<YOUR Novita AI API Key>",
)

model = "google/gemma-3-27b-it"
stream = True  # set to False for a single non-streaming response
max_tokens = 2048
system_content = """Be a helpful assistant"""
temperature = 1
top_p = 1
min_p = 0
top_k = 50
presence_penalty = 0
frequency_penalty = 0
repetition_penalty = 1
response_format = { "type": "text" }

chat_completion_res = client.chat.completions.create(
    model=model,
    messages=[
        {
            "role": "system",
            "content": system_content,
        },
        {
            "role": "user",
            "content": "Hi there!",
        }
    ],
    stream=stream,
    max_tokens=max_tokens,
    temperature=temperature,
    top_p=top_p,
    presence_penalty=presence_penalty,
    frequency_penalty=frequency_penalty,
    response_format=response_format,
    extra_body={
      # sampling options outside the standard OpenAI schema
      "top_k": top_k,
      "repetition_penalty": repetition_penalty,
      "min_p": min_p
    }
  )

if stream:
    for chunk in chat_completion_res:
        print(chunk.choices[0].delta.content or "", end="")
else:
    print(chat_completion_res.choices[0].message.content)
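Since Gemma 3 27B accepts image input, the same endpoint can in principle take a multimodal message. The snippet below builds an OpenAI-style payload with an `image_url` content part; whether Novita AI’s endpoint accepts this exact format is an assumption to verify against their API documentation, and the image URL is a placeholder.

```python
# OpenAI-style multimodal message: text plus an image_url content part.
# Assumes the endpoint supports this format -- check Novita AI's docs.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this chart in one sentence."},
            {
                "type": "image_url",
                "image_url": {"url": "https://example.com/chart.png"},  # placeholder
            },
        ],
    }
]

# Pass this list as `messages` to client.chat.completions.create(...)
# with model="google/gemma-3-27b-it", as in the text-only example.
print(len(messages[0]["content"]))  # -> 2 content parts
```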

Gemma 3 27B is a powerful open-source model from Google, offering strong reasoning, multimodal capabilities, multilingual support, and easy integration with platforms like Hugging Face, all while running on a single GPU.

Frequently Asked Questions

How many parameters does Gemma 3 27B have?

Gemma 3 27B has 27 billion parameters.

Is Gemma 3 27B multimodal?

Yes, it supports both image and text inputs.

What is the recommended hardware for running Gemma 3 27B?

For local use, a GPU with at least 24GB of VRAM is recommended, with more VRAM being beneficial for larger context sizes. It can also be deployed on cloud platforms such as Hugging Face Inference Endpoints with various GPU options, or accessed through an API provider such as Novita AI.

