Research
Research is the product.
The methods are open and the benchmarks are re-runnable. The papers are still in progress, with no invented arXiv numbers.
Measured · re-runnable · open to anyone
Measured, not claimed.
Every number names the model it ran on and the method behind it, and you can re-run them all. Claims we cannot yet stand behind stay on /technology, never here.
What we have measured
VALIDATED| What we claim | What we measured | Status |
|---|---|---|
| It needs about a quarter of the memory | Qwen3-1.7B's linear weights take 870 MB in our 4-bit format (Torad Quant 4-bit, our own format. It stores each number using 4 bits instead of the usual 16, so the model takes about a quarter of the space.), against 2.7 GB for the same weights in the standard 16-bit format (Short for bfloat16, the standard 16-bit format AI models normally run in. Accurate, but it takes a lot of memory.). That is 31%. | VALIDATED |
| It writes about as fast as the cloud software | Qwen3-1.7B writes 238 How fast the model writes. A token is roughly a short word or a few letters, so this is close to words per second. to the standard software's 267, within 10% (The most popular open-source software for running large AI models. Fast, but it needs a lot of memory to load a model.). | VALIDATED |
| It runs models the cloud software cannot even fit | Qwen3-8B at 40 and Qwen3-14B at 21 tokens per second (8 and 14 billion The adjustable values inside a model. More parameters usually means a bigger, more capable, and more memory-hungry model. An "8B" model has 8 billion of them.) on an ordinary 16 GB graphics card, where vLLM cannot load them at all. | VALIDATED |
| The shrunk model gives the same answers as the original | On Qwen3-1.7B: A score from 0 to 1.00 for how alike two answers are, called cosine similarity. 1.00 means identical. We use it to compare the shrunk model against the full-size original. 0.9998, with the top answer matching on 5 of 5 test prompts (The shrunk model gives the same answers as the full-size original, down to the same top choice.). | VALIDATED |
| It works on other model families too | Gemma 4 E2B, a different base model, scores 0.9997 on 4 of 5 the same way. | VALIDATED |
| It takes very little code to run | About 2,500 lines of Rust and 1,300 of CUDA, against roughly 500,000 in the standard stack. | VALIDATED |
The long-form work.
Posts on training, quantization, and model geometry, in the order they were written. These links open blog.torad.ai.
- TrainingSFT copies, GRPO discoversFine-tuning imitates. Reinforcement finds. Why the split shapes the whole pipeline.
- MethodRejection architectureBuild what a model refuses into its design, instead of patching it in later.
- TheoryImpedance matching in communicationTransfer peaks when source and load match. Applied to models and the people who use them.
- MethodManufactured discoveryHow to make a model find the answer instead of being told it.
- The labIntroducing EliThe orchestrator behind the lab, and the voice of the work.
In progress, and honest about it.
No invented arXiv numbers. These write-ups are under way, marked for what they are: believed, not yet proven.
- A 4-bit model that can still learn ASSERTED in progress How we keep a shrunk 4-bit model trainable. Most methods freeze it for good. No one else has published this.
- Running and training in one step ASSERTED in progress The megakernel: running a model and training it in one pass, instead of two separate programs.
- Shrinking a model without losing it ASSERTED in progress How TQ4 climbs from the naive baseline (0.85) to the same answers as the original (0.9998). The full write-up of the result above.
How we mark a claim.
VALIDATED means measured and re-runnable: a named model, a stated method, a number you can reproduce. Nothing else earns it.
ASSERTED means believed but not yet proven, like the drafts above. We keep it separate, and never show it as validated.
Be first to know.
When a paper ships or the platform opens, we write. No noise.