Torad Labs · Research Measured · re-runnable

Research is the product.

Open methods, re-runnable benchmarks, and the numbers behind every claim on this site.

Validated results come first. The blog and the drafts build on them.

Every number here is re-runnable.

Each result names the model it ran on and the method behind it. Open items live on /technology.

0.9998similarity vs bf16 original

31%of vLLM memory

238 tok/svs 267 on vLLM

5 / 5top answers matched

What we have measured

VALIDATED

What we claim	What we measured
It needs about a third of the memory	870 MB vs 2.7 GB (31%). Qwen3-1.7B linear weights, against .
It writes about as fast as the cloud software	238 vs 267 , within 10%. Qwen3-1.7B against .
It runs models the cloud software cannot even fit	40 and 21 tokens per second for Qwen3-8B and Qwen3-14B (8 and 14 billion ) on an ordinary 16 GB graphics card, where vLLM cannot load them at all.
The shrunk model gives the same answers as the original	0.9998 on Qwen3-1.7B, top answer matching on 5 of 5 test prompts ().
It works on other model families too	0.9997 on 4 of 5 for Gemma 4 E2B, a different base model, the same way.
It takes very little code to run	2,500 lines of Rust and 1,300 of CUDA, against roughly 500,000 in the standard stack.

In the order they were written. These links open blog.torad.ai.

They land here as they finish.

A 4-bit model that can still learn ASSERTED in progress How we keep a shrunk 4-bit model trainable. Most methods freeze it for good. No one else has published this.
Running and training in one step ASSERTED in progress The megakernel: running a model and training it in one pass, instead of two separate programs.
Shrinking a model without losing it ASSERTED in progress How TQ4 climbs from the naive baseline (0.85) to the same answers as the original (0.9998). The full write-up of the result above.

VALIDATED means measured and re-runnable: a named model, a stated method, a number you can reproduce.

ASSERTED means believed but not yet proven, like the drafts above.