Gemma 3 27B is a powerful, flexible LLM built by Google. It combines multilingual reach, multimodal input, and high performance, making it ideal for diverse AI workloads—locally or in the cloud.
For optimal cloud deployment performance, A100 or H100 GPUs are recommended; alternatively, three RTX 4090s (24GB each) can handle the model.
Apple Silicon
Gemma 3 4B supported via mlx-vlm
Gemma 3 4B ships with day zero support in mlx-vlm, an open-source library for running vision-language models on Apple Silicon devices, including Macs and iPhones.
Step-by-step process to install Gemma 3 27B locally
# Step 0: Check NVIDIA GPU
nvidia-smi
# Step 1: Update Ubuntu package sources
apt update
# Step 2: Install Ollama dependencies for GPU detection
apt install pciutils lshw
# Step 3: Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
# Step 4: Start Ollama server (run this in one terminal and keep it open)
ollama serve
# Step 5: (In a new terminal) Verify the Ollama CLI is installed (prints usage help)
ollama
# Step 6: Install Gemma-3 models (choose one)
# Run Gemma-3 1B
# ollama run gemma3:1b
# Run Gemma-3 4B
# ollama run gemma3:4b
# Run Gemma-3 12B
# ollama run gemma3:12b
# ✅ Recommended: Run Gemma-3 27B
ollama run gemma3:27b
# Step 7: Interact with the model directly via prompt in the console
# Example:
# You are an AI-powered trading analyst specializing in cryptocurrency markets.
# Your task is to design an autonomous AI agent that can predict market trends,
# execute trades, and manage risks efficiently. Your response should include:
# - A strategy for analyzing on-chain + off-chain data
# - Model choice for price prediction and sentiment
# - A Python code snippet
# - Risk management methods
# - Ethical concerns
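Beyond the interactive console, the Ollama server from Step 4 also exposes a local REST API (by default at http://localhost:11434). The following is a minimal sketch of building a request against its `/api/generate` endpoint using only the Python standard library; the prompt text is just an illustration:

```python
import json
import urllib.request

# Ollama's default local REST endpoint (served by `ollama serve`).
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request for the local Ollama server."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        url=OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_generate_request("gemma3:27b", "Summarize risk management in one sentence.")
print(req.full_url)
# With the server running, send it like this:
#   with urllib.request.urlopen(req) as resp:
#       print(json.loads(resp.read())["response"])
```

Setting `"stream": False` returns the full answer in one JSON object; leaving it out streams newline-delimited JSON chunks instead.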
How to Access Gemma 3 27B via Novita API?
Step 1: Log In and Access the Model Library
Log in to your account and click on the Model Library button.
Step 2: Start Your Free Trial
Begin your free trial to explore the capabilities of the selected model.
Step 3: Get Your API Key
To authenticate with the API, you will need an API key. Open the “Settings” page and copy your API key from there.
Step 4: Install the Client Library
Install the client library using the package manager for your programming language. Because the Novita AI endpoint follows the OpenAI API standard, Python users can simply run pip install openai.
After installation, import the necessary libraries into your development environment and initialize the client with your API key to start interacting with Novita AI LLMs. Below is an example of using the chat completions API in Python.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.novita.ai/v3/openai",
    api_key="<YOUR Novita AI API Key>",
)

model = "google/gemma-3-27b-it"
stream = True  # or False
max_tokens = 2048
system_content = """Be a helpful assistant"""
temperature = 1
top_p = 1
min_p = 0
top_k = 50
presence_penalty = 0
frequency_penalty = 0
repetition_penalty = 1
response_format = {"type": "text"}

chat_completion_res = client.chat.completions.create(
    model=model,
    messages=[
        {"role": "system", "content": system_content},
        {"role": "user", "content": "Hi there!"},
    ],
    stream=stream,
    max_tokens=max_tokens,
    temperature=temperature,
    top_p=top_p,
    presence_penalty=presence_penalty,
    frequency_penalty=frequency_penalty,
    response_format=response_format,
    extra_body={
        "top_k": top_k,
        "repetition_penalty": repetition_penalty,
        "min_p": min_p,
    },
)

if stream:
    for chunk in chat_completion_res:
        print(chunk.choices[0].delta.content or "", end="")
else:
    print(chat_completion_res.choices[0].message.content)
Using Gemma 3 27B via Chatbox
Step 1: Install Chatbox
Step 2: Configure the API Settings
Select the “Settings” option. Chatbox works with any API that follows the OpenAI API standard, such as Novita AI.
Fill in the configuration fields:
Base URL: Enter https://api.novita.ai/v3/openai.
API Key: Paste your Novita AI API Key here.
Model Name: Enter the model name (e.g., google/gemma-3-27b-it).
Once the configuration is filled out, click Done.
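To double-check that the same Base URL, API key, and model name work outside Chatbox, here is a minimal sketch that assembles the equivalent raw HTTP request with Python's standard library (the API key below is a placeholder):

```python
import json
import urllib.request

# The same values entered into Chatbox.
BASE_URL = "https://api.novita.ai/v3/openai"
MODEL = "google/gemma-3-27b-it"

def build_chat_request(api_key: str, prompt: str) -> urllib.request.Request:
    """Build a chat-completions request for the OpenAI-compatible endpoint."""
    body = json.dumps({
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 32,
    }).encode("utf-8")
    return urllib.request.Request(
        url=BASE_URL + "/chat/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + api_key,
        },
        method="POST",
    )

req = build_chat_request("<YOUR Novita AI API Key>", "Hi there!")
print(req.full_url)
# With a real key, send it like this:
#   with urllib.request.urlopen(req) as resp:
#       print(json.loads(resp.read())["choices"][0]["message"]["content"])
```

If this request succeeds from a terminal, the same three values will work in Chatbox.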
Using Gemma 3 27B via Cloud GPU
Step 1: Register an Account
If you’re new to Novita AI, begin by creating an account on our website. Once you’re registered, head to the “GPUs” tab to explore available resources and start your journey.
Step 2: Explore Templates and GPU Servers
Start by selecting a template that matches your project needs, such as PyTorch, TensorFlow, or CUDA. Choose the version that fits your requirements, like PyTorch 2.2.1 or CUDA 11.8.0. Then, select the A100 GPU server configuration, which offers powerful performance to handle demanding workloads with ample VRAM, RAM, and disk capacity.
Step 3: Customize Deployment Settings
After selecting a template and GPU, customize your deployment settings by adjusting parameters like the operating system version (e.g., CUDA 11.8). You can also tweak other configurations to tailor the environment to your project’s specific requirements.
Step 4: Launch an Instance
Once you’ve finalized the template and deployment settings, click “Launch Instance” to set up your GPU instance. This will start the environment setup, enabling you to begin using the GPU resources for your AI tasks.
With strong benchmarks and simple deployment options, Gemma 3 27B is a top choice for developers and researchers seeking open, high-quality AI tools.
Frequently Asked Questions
What is Gemma 3 27B?
Gemma 3 27B is a 27-billion-parameter open-source large language model developed by Google. It supports multimodal input (text + image), over 140 languages, and features a 128K token context window.
What are the hardware requirements for running Gemma 3 27B locally?
You’ll need approximately 80GB VRAM. A single NVIDIA H100 is sufficient. You can also run it with multiple RTX 4090s (e.g., 3×24GB).
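The ~80GB figure can be sanity-checked with a back-of-envelope calculation: weight memory is roughly parameters times bytes per parameter, plus headroom for the KV cache and activations. The 20% overhead factor below is an assumption for illustration; actual usage grows with context length and depends on the runtime.

```python
def vram_estimate_gb(params_billion: float, bytes_per_param: float, overhead: float = 1.2) -> float:
    """Rough estimate: weight memory plus ~20% headroom for KV cache and activations."""
    return params_billion * bytes_per_param * overhead

# Back-of-envelope numbers for a 27B-parameter model at common precisions.
for name, bpp in [("bf16", 2.0), ("int8", 1.0), ("4-bit", 0.5)]:
    print(f"{name}: ~{vram_estimate_gb(27, bpp):.0f} GB")
```

At bf16 precision this lands in the same ballpark as a single 80GB H100, while quantized variants (as Ollama ships by default) fit in considerably less VRAM, which is why a 3×24GB RTX 4090 setup is viable.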
Is there an API version of Gemma 3 27B available?
Yes! You can access Gemma 3 27B through the Novita AI API, which is fully compatible with the OpenAI API standard.
Novita AI is an AI cloud platform that offers developers an easy way to deploy AI models using our simple API, while also providing an affordable and reliable GPU cloud for building and scaling.