How to Access Gemma 3 27B Locally, via API, on Cloud GPU

Key Highlights

Gemma 3 27B is an open-source, multimodal LLM released by Google in March 2025.

Supports 140+ languages with a new tokenizer and 128K context window.

Handles both text and image input, outputs text.

Trained on 14 trillion tokens, excels in math, code, and instruction following.

Benchmark scores: 1339 Elo, 69.0 (MATH), 67.5 (MMLU-Pro).

Can run on a single NVIDIA H100 or be deployed via Ollama (local) or Novita AI API / Cloud GPU.

Gemma 3 27B is a powerful, flexible LLM built by Google. It combines multilingual reach, multimodal input, and high performance, making it ideal for diverse AI workloads—locally or in the cloud.

What is Gemma 3 27B?

Notable Features

  • Advanced Multilingual Support: With its new tokenizer, Gemma 3 is highly effective across 140+ languages.
  • Multimodal Input: The ability to process both images and text makes it a versatile tool for a range of applications.
  • Extended Context Window: The 128K token capacity allows for handling extensive and detailed inputs.
  • Open Source and Community-Friendly: Being open-source, the model encourages community experimentation and broad adoption.
| Category | Item | Details |
|---|---|---|
| Basic Info | Release Date | March 12, 2025 |
| Basic Info | Model Size | 27 billion parameters |
| Basic Info | Open Source | Yes (released by Google) |
| Language Support | Supported Languages | Over 140 languages |
| Training | Training Data | 14 trillion tokens |
| Training | Strengths | Math, coding, instruction following |
| Multimodal | Multimodal Capability | Yes (processes images and text, outputs text) |
| Context | Context Window | 128K tokens |
| Model Size by Precision | bf16 (raw) | Weights: 54.0 GB; Weights + KV Cache: 72.7 GB |
| Model Size by Precision | INT4 | Weights: 14.1 GB; Weights + KV Cache: 32.8 GB |
| Model Size by Precision | INT4 (blocks=32) | Weights: 15.3 GB; Weights + KV Cache: 34.0 GB |
| Model Size by Precision | SFP8 | Weights: 27.4 GB; Weights + KV Cache: 46.1 GB |
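
The precision figures above follow directly from parameter count times bytes per parameter; a quick sanity check in Python (the small gap between this estimate and the table comes from embeddings and quantization block scales):

```python
# Rough weight-memory estimate for a 27B-parameter model:
# bytes = parameters * bits_per_param / 8.
PARAMS = 27e9

def weight_gb(bits_per_param: float) -> float:
    """Approximate raw weight size in GB for a given precision."""
    return PARAMS * bits_per_param / 8 / 1e9

print(f"bf16: {weight_gb(16):.1f} GB")  # ~54.0 GB, matching the table
print(f"INT4: {weight_gb(4):.1f} GB")   # ~13.5 GB before block-scale overhead
print(f"FP8:  {weight_gb(8):.1f} GB")   # ~27.0 GB
```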

Gemma 3 27B Benchmark

| Benchmark | Gemma 3 27B | DeepSeek R1 | LLaMA 3.3 70B |
|---|---|---|---|
| LMSys Elo Score | 1339 | ~1360 | ~1260 |
| MMLU-Pro | 67.5 | 84.0 | 66.4 |
| LiveCodeBench | 29.7 | 65.9 | ~29 |
| GPQA Diamond | 42.4 | 71.5 | 50.5 |
| MATH | 69.0 | 97.3 | 77.0 |
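
To put the Elo gaps in perspective: Elo differences map to expected head-to-head win rates through the standard logistic formula, so Gemma 3 27B's roughly 79-point lead over LLaMA 3.3 70B corresponds to about a 61% expected win rate:

```python
def win_prob(elo_a: float, elo_b: float) -> float:
    """Expected probability that model A is preferred over model B
    under the standard Elo model."""
    return 1 / (1 + 10 ** ((elo_b - elo_a) / 400))

print(f"{win_prob(1339, 1260):.2f}")  # Gemma 3 27B vs LLaMA 3.3 70B -> ~0.61
print(f"{win_prob(1339, 1360):.2f}")  # Gemma 3 27B vs DeepSeek R1 -> ~0.47
```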

How to Access Gemma 3 27B Locally?

Hardware Requirements

Google describes Gemma 3 27B as “the most capable model you can run on a single GPU.”

| Setup | VRAM Requirement | Notes |
|---|---|---|
| Cloud Deployment | About 80 GB VRAM (single or multi-GPU) | A100 or H100 GPUs are recommended for optimal performance; alternatively, three RTX 4090 24 GB cards. |
| Apple Silicon | Gemma 3 4B supported via mlx-vlm | Gemma 3 4B ships with day-zero support in mlx-vlm, an open-source library for running vision-language models on Apple Silicon devices, including Macs and iPhones. |
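
As a rule of thumb, you can compare your available VRAM against the weights-plus-KV-cache figures from the specification table to pick a precision. A small illustrative helper (the function name is ours; the thresholds come from the table earlier in this article):

```python
# Approximate "weights + KV cache" VRAM needs in GB, taken from the
# precision table earlier in this article.
VRAM_NEEDED_GB = {"bf16": 72.7, "SFP8": 46.1, "INT4": 32.8}

def precisions_that_fit(available_gb: float) -> list[str]:
    """Return the precisions whose weights + KV cache fit in the given VRAM."""
    return [p for p, need in VRAM_NEEDED_GB.items() if need <= available_gb]

print(precisions_that_fit(80))  # single H100 80 GB -> all three precisions
print(precisions_that_fit(48))  # a 48 GB card -> SFP8 and INT4 only
```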

Step-by-step process to install Gemma 3 27B locally 

# Step 0: Check NVIDIA GPU
nvidia-smi

# Step 1: Update Ubuntu package sources
apt update

# Step 2: Install Ollama dependencies for GPU detection
apt install pciutils lshw

# Step 3: Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Step 4: Start Ollama server (run this in one terminal and keep it open)
ollama serve

# Step 5: (In a new terminal) Verify the Ollama CLI responds
ollama -v

# Step 6: Install Gemma-3 models (choose one)

# Run Gemma-3 1B
# ollama run gemma3:1b

# Run Gemma-3 4B
# ollama run gemma3:4b

# Run Gemma-3 12B
# ollama run gemma3:12b

# ✅ Recommended: Run Gemma-3 27B
ollama run gemma3:27b

# Step 7: Interact with the model directly via prompt in the console
# Example:
# You are an AI-powered trading analyst specializing in cryptocurrency markets.
# Your task is to design an autonomous AI agent that can predict market trends,
# execute trades, and manage risks efficiently. Your response should include:
# - A strategy for analyzing on-chain + off-chain data
# - Model choice for price prediction and sentiment
# - A Python code snippet
# - Risk management methods
# - Ethical concerns
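
Once `ollama serve` is running, the model is also reachable programmatically over Ollama's local REST API (default port 11434). A minimal sketch, assuming the 27B model has been pulled as above; the `ask_ollama` helper name is ours:

```python
import json
import urllib.request

def ask_ollama(prompt: str, model: str = "gemma3:27b",
               host: str = "http://localhost:11434") -> str:
    """Send a non-streaming prompt to a locally running Ollama server
    and return the generated text."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Example (requires `ollama serve` running and the model pulled):
# print(ask_ollama("Summarize Gemma 3 27B in one sentence."))
```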

How to Access Gemma 3 27B via Novita API?

Step 1: Log In and Access the Model Library

Log in to your account and click on the Model Library button.

Step 2: Start Your Free Trial

Begin your free trial to explore the capabilities of the selected model.

Step 3: Get Your API Key

To authenticate with the API, you will need an API key. Open the “Settings” page and copy your API key as shown in the image.

Step 4: Install the Client Library

Install the API client library using the package manager for your programming language; for Python, this is the `openai` package used in the example below.

After installation, import the necessary libraries into your development environment and initialize the client with your API key to start interacting with Novita AI LLMs. Here is an example of using the Chat Completions API in Python:

from openai import OpenAI
  
client = OpenAI(
    base_url="https://api.novita.ai/v3/openai",
    api_key="<YOUR Novita AI API Key>",
)

model = "google/gemma-3-27b-it"
stream = True # or False
max_tokens = 2048
system_content = """Be a helpful assistant"""
temperature = 1
top_p = 1
min_p = 0
top_k = 50
presence_penalty = 0
frequency_penalty = 0
repetition_penalty = 1
response_format = { "type": "text" }

chat_completion_res = client.chat.completions.create(
    model=model,
    messages=[
        {
            "role": "system",
            "content": system_content,
        },
        {
            "role": "user",
            "content": "Hi there!",
        }
    ],
    stream=stream,
    max_tokens=max_tokens,
    temperature=temperature,
    top_p=top_p,
    presence_penalty=presence_penalty,
    frequency_penalty=frequency_penalty,
    response_format=response_format,
    extra_body={
      "top_k": top_k,
      "repetition_penalty": repetition_penalty,
      "min_p": min_p
    }
  )

if stream:
    for chunk in chat_completion_res:
        print(chunk.choices[0].delta.content or "", end="")
else:
    print(chat_completion_res.choices[0].message.content)

Using Gemma 3 27B via Chatbox

Step 1: Install Chatbox

  1. Select the “Settings” option. This ensures compatibility with APIs that follow the OpenAI API standard, such as Novita AI.
  2. Fill in the configuration fields:
    • Base URL: Enter https://api.novita.ai/v3/openai.
    • API Key: Paste your Novita AI API Key here.
    • Model Name: Paste the model name you copied earlier (e.g., google/gemma-3-27b-it).
  3. Once the configuration is filled out, click Done.

Using Gemma 3 27B via Cloud GPU

Step 1: Register an Account

If you’re new to Novita AI, begin by creating an account on our website. Once you’re registered, head to the “GPUs” tab to explore available resources and start your journey.

Step 2: Explore Templates and GPU Servers

Start by selecting a template that matches your project needs, such as PyTorch, TensorFlow, or CUDA. Choose the version that fits your requirements, like PyTorch 2.2.1 or CUDA 11.8.0. Then, select the A100 GPU server configuration, which offers powerful performance to handle demanding workloads with ample VRAM, RAM, and disk capacity.

Step 3: Tailor Your Deployment

After selecting a template and GPU, customize your deployment settings by adjusting parameters like the operating system version (e.g., CUDA 11.8). You can also tweak other configurations to tailor the environment to your project’s specific requirements.

Step 4: Launch an Instance

Once you’ve finalized the template and deployment settings, click “Launch Instance” to set up your GPU instance. This will start the environment setup, enabling you to begin using the GPU resources for your AI tasks.

With strong benchmarks and simple deployment options, Gemma 3 27B is a top choice for developers and researchers seeking open, high-quality AI tools.

Frequently Asked Questions

What is Gemma 3 27B?

Gemma 3 27B is a 27-billion-parameter open-source large language model developed by Google. It supports multimodal input (text + image), over 140 languages, and features a 128K token context window.

What are the hardware requirements for running Gemma 3 27B locally?

You’ll need approximately 80GB VRAM. A single NVIDIA H100 is sufficient. You can also run it with multiple RTX 4090s (e.g., 3×24GB).

Is there an API version of Gemma 3 27B available?

Yes! You can access Gemma 3 27B through the Novita AI API, which is fully compatible with the OpenAI API standard.

Novita AI is an AI cloud platform that offers developers an easy way to deploy AI models using our simple API, while also providing an affordable and reliable GPU cloud for building and scaling.

