Benchmarking Nemotron 3 Nano 4B on the Jetson Orin Nano Super


Antwon · March 24, 2026 · 10 min read

Nemotron 3 Nano 4B Benchmarks on the Jetson Orin Nano Super via antwon.dev

Introduction

NVIDIA just released Nemotron 3 Nano 4B, its latest small language model (SLM). Let's try running it on NVIDIA's $250 edge AI computer, the Jetson Orin Nano Super!

Prerequisites

You must have correctly set up your Jetson's operating system and upgraded to JetPack 6.2. I'm booting from an SSD; I believe I followed this tutorial when setting that up.

Hardware and Software

  • Device: NVIDIA Jetson Orin Nano Super 8GB
  • Storage: 512GB NVMe M.2 SSD
  • Power mode: MAXN_SUPER
  • Inference engine: llama.cpp b8304
  • CUDA: 12.6
  • Flash Attention: Enabled
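Before benchmarking, it can help to confirm the board is actually in the MAXN_SUPER power mode listed above. The following is a hypothetical helper, not part of the original setup; it assumes `nvpmodel -q` prints a line of the form `NV Power Mode: <NAME>`, and the function names are illustrative.

```python
import subprocess
from typing import Optional


def parse_power_mode(nvpmodel_output: str) -> Optional[str]:
    """Extract the mode name from `nvpmodel -q` output.

    Assumption: the output contains a line like 'NV Power Mode: MAXN_SUPER'.
    """
    for line in nvpmodel_output.splitlines():
        if line.startswith("NV Power Mode:"):
            return line.split(":", 1)[1].strip()
    return None


def current_power_mode() -> Optional[str]:
    """Query the Jetson's current power mode (run on the device itself).

    Some JetPack releases may require sudo for nvpmodel.
    """
    result = subprocess.run(["nvpmodel", "-q"], capture_output=True, text=True, check=True)
    return parse_power_mode(result.stdout)
```

A quick guard such as `assert current_power_mode() == "MAXN_SUPER"` at the top of a benchmark script would catch a board that silently booted into a lower-power profile.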

Methodology

At full precision, Nano 4B won't fit within the Jetson's 8 GB of memory, so we have to quantize it. Thankfully, Unsloth and NVIDIA have both provided quantized GGUFs for us to try out.
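A back-of-the-envelope check of why quantization is needed: weight storage is roughly parameters × bits per weight ÷ 8 bytes. This sketch assumes about 4.0 × 10^9 parameters stored at 16 bits (BF16) when unquantized, and it ignores the KV cache, the CUDA context, and the OS, all of which share the Jetson's 8 GB of unified memory. The helper names are illustrative.

```python
GIB = 2**30  # bytes per GiB


def weights_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GiB."""
    return n_params * bits_per_weight / 8 / GIB


def params_from_gguf(size_gib: float, bits_per_weight: float) -> float:
    """Invert the formula: parameter count implied by a file size and its BPW."""
    return size_gib * GIB * 8 / bits_per_weight


# Assumption: "4B" means roughly 4.0e9 parameters at 16 bits each when unquantized.
full_precision_gib = weights_gib(4.0e9, 16)  # about 7.45 GiB for weights alone

# Cross-check with the UD-IQ2_XXS row of the results table (2.03 GiB at 4.38 BPW).
implied_params = params_from_gguf(2.03, 4.38)  # about 3.98e9 parameters
```

Roughly 7.45 GiB of weights leaves almost nothing for everything else on an 8 GB board, while the quants in the table land between about 2 and 3 GiB.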

Between each quant's benchmark, I kill the existing server, drop the page caches, load the next model, and send a throwaway 10-token request to warm it up.
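The between-run reset described above can be sketched as a few helpers. This is a minimal sketch, not the author's script: it assumes llama-server is running as a child process, exposes its `/health` endpoint, and accepts requests on its native `/completion` endpoint; the warm-up prompt text and function names are hypothetical. Dropping the page cache uses the standard Linux `/proc/sys/vm/drop_caches` interface and needs root.

```python
import json
import subprocess
import time
import urllib.request


def stop_server(proc: subprocess.Popen, timeout: float = 30.0) -> None:
    """Terminate the running llama-server, escalating to SIGKILL if it hangs."""
    proc.terminate()
    try:
        proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.wait()


def drop_page_caches() -> None:
    """Flush dirty pages, then drop the Linux page cache (requires sudo)."""
    subprocess.run(["sudo", "sh", "-c", "sync; echo 3 > /proc/sys/vm/drop_caches"], check=True)


def wait_until_healthy(base_url: str, timeout: float = 300.0, poll_interval: float = 1.0) -> None:
    """Poll /health until the model has finished loading."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
                if resp.status == 200:
                    return
        except OSError:
            # Connection refused, HTTP 503 while weights load, or a timeout.
            pass
        time.sleep(poll_interval)
    raise TimeoutError(f"{base_url} did not become healthy within {timeout} s")


def warm_up(base_url: str, n_predict: int = 10) -> None:
    """Send a short throwaway completion so first-request overhead isn't measured."""
    body = json.dumps({"prompt": "Hello", "n_predict": n_predict}).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/completion",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        resp.read()  # discard the warm-up output
```

Polling `/health` rather than sleeping a fixed interval keeps the warm-up from racing a model that is still loading from the SSD.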

Then I send the actual benchmark prompt, capping the output at 256 tokens:

Explain in detail the history of computing from the invention of the abacus through modern quantum computers. Cover key milestones including mechanical calculators, vacuum tubes, transistors, integrated circuits, microprocessors, the internet, and artificial intelligence. Discuss the contributions of notable figures such as Charles Babbage, Ada Lovelace, Alan Turing, John von Neumann, and others.

The prompt is fairly arbitrary; it was generated with Claude Code. If you're testing Nemotron on your own Jetson, feel free to ask it whatever you'd like.
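The article doesn't say which endpoint the benchmark request used, so the sketch below assumes llama-server's native `/completion` endpoint, where `n_predict` caps the number of generated tokens and the response carries a `timings` object with `prompt_per_second` (PP) and `predicted_per_second` (Gen). Function names are illustrative.

```python
import json
import urllib.request


def build_benchmark_request(prompt: str, max_tokens: int = 256) -> dict:
    """Request body for /completion; n_predict caps the output length."""
    return {"prompt": prompt, "n_predict": max_tokens}


def parse_timings(response: dict) -> tuple[float, float]:
    """Return (prompt-processing tok/s, generation tok/s) from llama-server's timings."""
    timings = response["timings"]
    return timings["prompt_per_second"], timings["predicted_per_second"]


def run_benchmark(base_url: str, prompt: str, max_tokens: int = 256) -> tuple[float, float]:
    """POST one benchmark prompt and return its PP and Gen speeds."""
    body = json.dumps(build_benchmark_request(prompt, max_tokens)).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/completion",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=600) as resp:
        return parse_timings(json.load(resp))
```

Reading the speeds from the server's own `timings` block, rather than timing the HTTP round trip, keeps network and JSON overhead out of the reported tok/s.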

Benchmark Results

15 GGUF quantizations of Nemotron 3 Nano 4B on the Jetson Orin Nano Super, sorted by generation speed.

Gen = generation speed (tok/s). PP = prompt processing speed (tok/s). BPW = bits per weight. RAM in MiB. All GGUFs are from Unsloth unless marked (NVIDIA).

| # | Quant | Size | BPW | RAM (MiB) | 4K Gen | 8K Gen | 16K Gen | 4K PP | 8K PP | 16K PP |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | UD-IQ2_XXS | 2.03 GiB | 4.38 | 1,804 | 20.2 | 20.2 | 20.2 | 322 | 322 | 322 |
| 2 | UD-IQ2_M | 2.13 GiB | 4.61 | 1,915 | 19.6 | 19.6 | 19.5 | 319 | 320 | 319 |
| 3 | Q4_1 | 2.51 GiB | 5.43 | 2,329 | 18.8 | 19.1 | 18.8 | 317 | 317 | 316 |
| 4 | UD-IQ3_XXS | 2.22 GiB | 4.79 | 2,002 | 18.7 | 18.6 | 18.3 | 323 | 322 | 322 |
| 5 | Q4_0 | 2.35 GiB | 5.07 | 2,183 | 18.4 | 18.4 | 18.3 | 324 | 325 | 324 |
| 6 | Q3_K_M | 2.29 GiB | 4.95 | 2,123 | 18.2 | 18.2 | 18.1 | 318 | 318 | 317 |
| 7 | IQ4_XS | 2.36 GiB | 5.11 | 2,150 | 17.5 | 17.6 | 17.5 | 328 | 328 | 328 |
| 8 | IQ4_NL | 2.38 GiB | 5.15 | 2,172 | 17.4 | 17.4 | 17.4 | 324 | 324 | 324 |
| 9 | UD-Q3_K_XL | 2.49 GiB | 5.39 | 2,257 | 17.1 | 17.1 | 17.0 | 313 | 313 | 313 |
| 10 | UD-Q2_K_XL | 2.32 GiB | 5.03 | 2,087 | 16.7 | 16.7 | 16.7 | 311 | 310 | 311 |
| 11 | Q3_K_S | 2.19 GiB | 4.73 | 2,019 | 16.6 | 16.6 | 16.6 | 310 | 310 | 310 |
| 12 | Q4_K_S | 2.63 GiB | 5.69 | 2,424 | 16.1 | 16.1 | 16.1 | 305 | 304 | 304 |
| 13 | Q4_K_M (NVIDIA) | 2.63 GiB | 5.70 | 2,429 | 15.9 | 15.9 | 15.8 | 302 | 302 | 302 |
| 14 | Q4_K_M | 2.69 GiB | 5.82 | 2,489 | 15.6 | 15.4 | 15.4 | 302 | 300 | 299 |
| 15 | Q5_K_S | 2.89 GiB | 6.25 | 2,665 | 15.6 | 15.6 | 15.6 | 298 | 296 | 297 |
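To translate tok/s into how long you actually wait, here is the arithmetic for a full 256-token answer at a few of the 4K-context generation speeds from the table. It is a simple sketch that assumes a constant decode rate and leaves out prompt-processing time; the names are illustrative.

```python
# Generation speeds at 4K context, copied from the table above (tok/s).
GEN_TOK_S_4K = {"UD-IQ2_XXS": 20.2, "Q4_1": 18.8, "Q4_K_M": 15.6, "Q5_K_S": 15.6}


def decode_seconds(n_tokens: int, tok_per_s: float) -> float:
    """Wall time to generate n_tokens at a constant rate (prompt processing excluded)."""
    return n_tokens / tok_per_s


for quant, rate in GEN_TOK_S_4K.items():
    print(f"{quant}: {decode_seconds(256, rate):.1f} s for 256 tokens")
```

That works out to about 12.7 s on UD-IQ2_XXS, 13.6 s on Q4_1, and 16.4 s on Q4_K_M or Q5_K_S, so the spread between the fastest and slowest quant is under four seconds per answer.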

Sample Output

Here's the response it gave when asked about its creator, with its reasoning followed by its answer.

User

Tell me about NVIDIA.

Nemotron 3 Nano 4B (Q4_1)
Thinking

Okay, the user asked me to tell them about NVIDIA. Hmm, this is a pretty broad question, so I need to figure out what level of detail they want. Are they a student researching for a project? An investor looking for company info? Or just someone casually curious? Since they didn't specify, I'll assume they want a general overview but with some key points that matter to different audiences.

First, I should cover the basics: who they are, what they do, and why they matter. NVIDIA is a big name in tech, especially in GPUs and AI. But I shouldn't just list facts—I need to explain why their products are revolutionary. Like, the GPU shift from gaming to AI and data centers is a huge pivot. That's probably the most interesting angle.

Wait, the user might not know what GPUs are. Should I define that? Maybe briefly: "Graphics Processing Units for parallel computing." But since it's a tech term, I'll assume they've heard of it. Focus on their impact.

Key points to include: Founded in 1993 by Jensen Huang (got that name right—Jensen Huang, not "Jensen" as a typo). Started with gaming GPUs (GeForce series), but now dominates AI and data centers. Why it matters: Without NVIDIA's GPUs, modern AI like ChatGPT wouldn't exist. They're the engine behind AI training. Products: RTX series for gaming, A100/H100 for data centers. Recent buzz: Blackwell chips for next-gen AI.

But I shouldn't just list products. Need to connect it to real-world impact. Example: "NVIDIA's GPUs process billions of calculations per second, enabling AI to learn from massive datasets." That makes it tangible.

Also, watch out for common misconceptions. Some think NVIDIA is only about gaming. Gotta clarify their pivot to AI. Maybe mention their data center revenue is now larger than gaming—shows their strategic shift.

User might wonder about competitors. AMD and Intel are trying to catch up, but NVIDIA has a clear lead in AI. Should I note that? Yes, briefly: "While competitors exist, NVIDIA's ecosystem (CUDA, drivers) is unmatched."

Deep needs? If they're asking casually, they might want to understand why NVIDIA's stock is so hot or why AI boom is tied to them. But since they didn't ask, I'll stick to fundamentals. Avoid jargon like "CUDA" unless I explain it ("NVIDIA's software platform for AI development").

Structure the answer: 1. Who/when: Founded 1993, headquartered in Santa Clara. 2. What: GPUs, now AI/data center focus. 3. Why it's big: AI revolution, data center revenue growth. 4. Key products: GeForce, RTX, A100. 5. Impact: Without them, no modern AI. 6. Recent moves: Blackwell chips.

Avoid going too deep into history—like their early struggles with the 486 chip. That's irrelevant now. Stay focused on current relevance.

Also, check for accuracy: Founded 1993? Yes. Jensen Huang? Yes. Data center revenue now bigger than gaming? As of 2023, yes—NVIDIA reported $11B data center revenue in 2023 vs. $10B gaming. Good.

Should I mention their stock price? Maybe not unless asked. User didn't ask for financials.

Finally, wrap up with why it matters: NVIDIA isn't just a company; they're a catalyst for AI. That's the big takeaway.

Double-check: Did I cover "why NVIDIA matters" clearly? Yes—by linking to AI progress.

Okay, draft in mind: Opening: "NVIDIA is a leading tech company..." Then the pivot to AI. Products as examples. Impact statement.

Avoid markdown, keep it conversational. User said "tell me," so it's a chat, not a formal report.

Wait, is "you'll find" too informal? Maybe "NVIDIA's GPUs..." is fine.

Alright, time to write.

Response

NVIDIA Corporation is a leading technology company founded in 1993 by Jensen Huang (a former engineer at Xerox PARC). Headquartered in Santa Clara, California, it revolutionized computing by pioneering graphics processing units (GPUs) and later becoming the engine behind the AI revolution. Here's a concise breakdown of what makes NVIDIA so significant:

Core Identity & Evolution

  • Origins: Started in 1993 as a gaming-focused GPU company (e.g., GeForce series for 3D gaming).
  • Pivot: Shifted focus to data centers and AI starting in the 2010s. Today, AI/data center revenue dominates (over 80% of total revenue), while gaming is now a secondary market.
  • Why it matters: Without NVIDIA's GPUs, modern AI (like ChatGPT, image generators, and self-driving cars) wouldn't exist. Their hardware is the backbone of AI training and inference.

Key Products & Technologies

  1. GPUs (Graphics Processing Units):
    • Originally for gaming, now used for parallel computing (e.g., rendering, physics simulations).
    • RTX Series: Gaming-focused (e.g., RTX 4090), but also accelerates AI workloads via ray tracing and DLSS (deep learning superresolution).
  2. Data Center AI Chips:
    • A100/H100: High-performance GPUs for AI training/inference (e.g., training large language models).
    • Blackwell Architecture (2023): Next-gen chips with 2x faster AI training and 3x better memory bandwidth (e.g., GB200 for cloud AI).
  3. Software Ecosystem:
    • CUDA: A parallel computing platform and programming model (used by 90%+ of AI developers).
    • Omniverse: For 3D/AR/VR and simulation.
    • NVIDIA AI Enterprise: Tools for deploying AI in enterprise workflows.

Why NVIDIA is a Global Phenomenon

| Area | Impact |
|---|---|
| AI Dominance | Powers 95% of AI training at companies like Google, Meta, and Microsoft. |
| Data Center Growth | Reported $11B in data center revenue in 2023 (up 180% YoY). Gaming revenue is now ~$1B (down from $10B+). |
| Competition | AMD and Intel are trying to catch up, but NVIDIA's CUDA ecosystem is unmatched. |
| Stock Price | Surged to $1,000+ per share (2022–2024) due to AI demand. |

Recent Milestones

  • 2023: Blackwell chips launched; NVIDIA's AI revenue hit $3B+.
  • 2024: H100/A100 sales hit $5B+ (2023).
  • 2023: NVIDIA's data center revenue exceeded $12B (vs. gaming at ~$1B).
  • 2024: Blackwell GPUs power Google's Gemini, Microsoft's Azure, and OpenAI's infrastructure.

Criticisms & Challenges

  • Cost: High-end GPUs (e.g., A100/H100) cost $30K–$50K—prohibitive for smaller businesses.
  • Supply Chain: Overwhelmed by AI demand (e.g., 2022 shortage caused factory shutdowns).
  • Environmental Impact: AI training consumes ~250,000 tons of CO2 annually (equivalent to 55M cars' emissions).

In Short

NVIDIA isn't just a tech company—it's the invisible engine of the AI era. Its GPUs enable everything from streaming video to training the next generation of AI models. While it started as a gaming firm, it now shapes how the world uses AI. As Jensen Huang says: "NVIDIA is not just a company—it's a movement."

If you'd like deeper details on a specific product (e.g., H100 specs), market impact, or competitors, just ask!

Setup

Build Flags

llama.cpp was compiled with the following CMake flags:

| Flag | Purpose |
|---|---|
| -DGGML_CUDA=ON | CUDA backend |
| -DGGML_CUDA_FA=ON | CUDA flash attention |
| -DGGML_CUDA_GRAPHS=ON | CUDA graph optimization |
| -DCMAKE_BUILD_TYPE=Release | Compiler optimizations |
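For reference, the four options above map onto a standard CMake configure-and-build pair. The sketch below assumes it is run against an existing llama.cpp checkout with the default `build` directory; the helper names and the parallel job count (defaulting to the CPU count) are illustrative choices, not the author's exact commands.

```python
import os
import subprocess

# The four CMake options listed in the table above, in the same order.
CMAKE_OPTIONS = {
    "GGML_CUDA": "ON",
    "GGML_CUDA_FA": "ON",
    "GGML_CUDA_GRAPHS": "ON",
    "CMAKE_BUILD_TYPE": "Release",
}


def configure_cmd(build_dir: str = "build") -> list[str]:
    """`cmake -S . -B build -D<option>=<value> ...`, run from the llama.cpp checkout."""
    return ["cmake", "-S", ".", "-B", build_dir] + [f"-D{k}={v}" for k, v in CMAKE_OPTIONS.items()]


def build_cmd(build_dir: str = "build", jobs: int = 0) -> list[str]:
    """`cmake --build build --config Release -j N`; N defaults to the CPU count."""
    n = jobs or os.cpu_count() or 1
    return ["cmake", "--build", build_dir, "--config", "Release", "-j", str(n)]


def build_llama_cpp(checkout_dir: str) -> None:
    """Configure and compile inside checkout_dir (hypothetical path to your clone)."""
    subprocess.run(configure_cmd(), cwd=checkout_dir, check=True)
    subprocess.run(build_cmd(), cwd=checkout_dir, check=True)
```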

Server Flags

Each quantization was benchmarked using llama-server with identical flags:

| Flag | Value | Purpose |
|---|---|---|
| -m | <model>.gguf | Model file |
| -c | 4096 | Context length |
| -np | 1 | Parallel slots |
| -ngl | 999 | Offload all layers to GPU |
| -fa | on | Flash attention |
| -ctk | q8_0 | KV cache key type |
| -ctv | q8_0 | KV cache value type |
| -b | 2048 | Batch size |
| -ub | 512 | Micro-batch size |
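The flag table above translates directly into a llama-server command line. This sketch assumes the binary sits at the default CMake output path `./build/bin/llama-server`; the `ctx` parameter and the helper names are illustrative additions for experimenting, defaulting to the 4096 listed above. With no `--host`/`--port` flags, llama-server listens on its default address (127.0.0.1:8080).

```python
import subprocess


def server_argv(model_path: str, ctx: int = 4096,
                binary: str = "./build/bin/llama-server") -> list[str]:
    """llama-server argv using the flags from the table above."""
    return [
        binary,
        "-m", model_path,  # model file
        "-c", str(ctx),    # context length
        "-np", "1",        # one parallel slot
        "-ngl", "999",     # offload all layers to the GPU
        "-fa", "on",       # flash attention
        "-ctk", "q8_0",    # KV cache key type
        "-ctv", "q8_0",    # KV cache value type
        "-b", "2048",      # logical batch size
        "-ub", "512",      # physical micro-batch size
    ]


def start_server(model_path: str, **kwargs) -> subprocess.Popen:
    """Launch llama-server in the background for one quant."""
    return subprocess.Popen(server_argv(model_path, **kwargs))
```

Keeping `-fa on` alongside `-ctv q8_0` matters because llama.cpp has generally required flash attention for a quantized V cache.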

Additional Notes

The speeds aren't bad for a $250 computer! Note, however, that this benchmark did not assess output quality. If you'd like me to compare quality across quants, let me know in the comments of the corresponding YouTube video!