Skip to content
Torad. LAB / 001
Talk to us
Torad Labs · Research Measured · re-runnable

Research

Research is the product.

The methods are open and the benchmarks are re-runnable. The papers are still in progress, with no invented arXiv numbers.

Measured · re-runnable · open to anyone

The validated set is the one measured original. The blog and the drafts build on it.

Measured, not claimed.

Every number names the model it ran on and the method behind it, and you can re-run them all. Claims we cannot yet stand behind stay on /technology, never here.

What we have measured

VALIDATED
What we claimWhat we measuredStatus
It needs about a quarter of the memoryQwen3-1.7B's linear weights take 870 MB in our 4-bit format (Torad Quant 4-bit, our own format. It stores each number using 4 bits instead of the usual 16, so the model takes about a quarter of the space.), against 2.7 GB for the same weights in the standard 16-bit format (Short for bfloat16, the standard 16-bit format AI models normally run in. Accurate, but it takes a lot of memory.). That is 31%.VALIDATED
It writes about as fast as the cloud softwareQwen3-1.7B writes 238 How fast the model writes. A token is roughly a short word or a few letters, so this is close to words per second. to the standard software's 267, within 10% (The most popular open-source software for running large AI models. Fast, but it needs a lot of memory to load a model.).VALIDATED
It runs models the cloud software cannot even fitQwen3-8B at 40 and Qwen3-14B at 21 tokens per second (8 and 14 billion The adjustable values inside a model. More parameters usually means a bigger, more capable, and more memory-hungry model. An "8B" model has 8 billion of them.) on an ordinary 16 GB graphics card, where vLLM cannot load them at all.VALIDATED
The shrunk model gives the same answers as the originalOn Qwen3-1.7B: A score from 0 to 1.00 for how alike two answers are, called cosine similarity. 1.00 means identical. We use it to compare the shrunk model against the full-size original. 0.9998, with the top answer matching on 5 of 5 test prompts (The shrunk model gives the same answers as the full-size original, down to the same top choice.).VALIDATED
It works on other model families tooGemma 4 E2B, a different base model, scores 0.9997 on 4 of 5 the same way.VALIDATED
It takes very little code to runAbout 2,500 lines of Rust and 1,300 of CUDA, against roughly 500,000 in the standard stack.VALIDATED

In progress, and honest about it.

No invented arXiv numbers. These write-ups are under way, marked for what they are: believed, not yet proven.

  • A 4-bit model that can still learn ASSERTED in progress How we keep a shrunk 4-bit model trainable. Most methods freeze it for good. No one else has published this.
  • Running and training in one step ASSERTED in progress The megakernel: running a model and training it in one pass, instead of two separate programs.
  • Shrinking a model without losing it ASSERTED in progress How TQ4 climbs from the naive baseline (0.85) to the same answers as the original (0.9998). The full write-up of the result above.

How we mark a claim.

VALIDATED means measured and re-runnable: a named model, a stated method, a number you can reproduce. Nothing else earns it.

ASSERTED means believed but not yet proven, like the drafts above. We keep it separate, and never show it as validated.

Be first to know.

When a paper ships or the platform opens, we write. No noise.

Be first to know

We write when a result is validated or a paper ships. That is all.