Torad Edge / the model, on your machine
Your machine runs the model. The weights are yours.
Edge runs the model you trained with Etch on hardware you already own, fully offline. No per-token bill, no account, no round trip.
Local · offline · bit-exact at 4-bit
Why it fits
It runs models the cloud software cannot fit.
An 8B or 14B model fits on a card you can buy. Our format shrinks it to about a third of the size at the same speed.
31%
of the memory VALIDATED
Qwen3-1.7B's weights take 870 MB in our 4-bit format (Torad Quant 4-bit, our own format. It stores each number using 4 bits instead of the usual 16, so the model takes about a quarter of the space.) against 2.7 GB in 16-bit (Short for bfloat16, the standard 16-bit format AI models normally run in. Accurate, but it takes a lot of memory.). That headroom is why an 8B or 14B model fits on an ordinary 16 GB graphics card.
238 / 267
tok/s · within 10% VALIDATED
Qwen3-1.7B writes 238 against the cloud software's 267 How fast the model writes. A token is roughly a short word or a few letters, so this is close to words per second., within ten percent. Same speed, a third of the size.
The cloud column cannot fit these models, and your data leaves it every time. Cut the connection and watch which column stays lit.
| Hosted cloud online | Torad Edge local | |
|---|---|---|
| 8B model, 16 GB GPU | cannot fit | 40 tok/s |
| 14B model, 16 GB GPU | cannot fit | 21 tok/s |
| your data leaves | always | never |
The runtime
One small program runs the whole thing.
Your prompt goes in, the answer comes out, and the runtime is small enough for one person to read end to end. Retrain with Etch and the new weights drop in place, no redeploy.
Above / in use
One program answers your prompt, and new weights drop into it without a restart. Nothing leaves the room.
Below / taken apart
Choose a part to see it: A the path that serves an answer, B where retrained weights drop in, C the small program one owner can read.
Lossless at 4-bit
Shrinking an AI usually wrecks it. Done right, you cannot tell.
We ask the full-size model and the shrunk model the same questions, then score how alike the answers come back. That score is the A score from 0 to 1.00 for how alike two models' answers are. 1.00 means identical. It measures whether the answers point the same direction, not just whether the words match.: 1.00 is identical.
How alike is the shrunk model to the original?
VALIDATEDStart
The model is yours either way. Pricing coming soon.
Tell us the model and what it needs to know, and we will tell you what it can do and when Edge ships. No account, no card, no obligation.