Torad Edge · runtime runs offline

Your machine runs the model. The weights are yours.

Edge runs the model you trained with Etch on hardware you already own, fully offline. No per-token bill, no account, no round trip.

Start a model See how it works →

Runs offline. No data leaves the machine.

At a glance

A third of the memory. Parity speed. Zero egress.

31%of vLLM's memory

238 tok/svs 267 on vLLM · within 10%

8B + 14Brun on one 16 GB card

0bytes of egress at runtime

Qwen3-1.7B weights: 870 MB in vs 2.7 GB in bf16 · VALIDATED · Methods

Benchmarks

It runs models the cloud software cannot fit.

An 8B model does not fit a 16 GB card on vLLM. On Edge it runs at 40 tok/s. Cut the connection and compare.

Hosted cloud vs Torad Edge on a 16 GB card VALIDATED

	Hosted cloud online	Torad Edge local
8B · 16 GB	cannot fit	40 tok/s
14B · 16 GB	cannot fit	21 tok/s
data leaves	always	never

Same model, same card · vLLM cannot load the 8B or 14B at 16 GB · All benchmarks

How it works

One small program runs the whole thing.

One program serves the model and applies new weights in place. About 2,500 lines of Rust plus 1,300 of CUDA.

A · prompt in, answer outB · weights update in placeC · one auditable boundary

2,500 lines of Rust + 1,300 of CUDA; vLLM is ~500,000 · VALIDATED · The megakernel →

Why local

Private, unmetered, unthrottled.

Private by construction

Prompts and answers never leave the machine. There is no server to log them.

No per-token bill

The cost is hardware you already own. There is no usage charge.

No rate limits

It is your GPU. Throughput is whatever the card gives you.

Updates drop in

Retrain with Etch and the new weights replace the old file. No redeploy.

Quality

Shrinking a model usually makes it worse. Ours gives the same answers.

We ask the full-size model and the shrunk model the same questions, then score how alike the answers are. That score is the : 1.00 is identical.

Similarity to the full-size original

VALIDATED

Round-to-nearest0.85Round every number off. The model drifts into nonsense within a sentence.

Rotation + codebook0.91The best static methods. Answers still drift from the original.

TQ4 · ours0.9998The same answers, at a third of the size. The top answer matched on 5 of 5 test prompts.

dial · .80 to 1.00needle · 0.9998, oursticks · the static methods

Qwen3-1.7B, TQ4 against the bf16 original. Method on /technology.

Spec

What it runs on.

The megakernel, in full →

FAQ

Questions, answered.

What hardware does it need?

A 16 GB graphics card covers the 8B and 14B models on this page. An RTX Spark or any comparable local GPU works.

Does it ever need the internet?

No. After install, serving is fully offline. Cut the connection and it keeps answering.

Which models does it run?

The models Etch produces: small specialists in our 4-bit format, trained on your data.

How do updates work?

Retrain with Etch, drop the new weights in place, and the same program serves them. No redeploy.

Why not just use the cloud?

The cloud meters you, logs you, and changes under you. Edge is a file on your disk: same answers, none of the above.

Start

The weights stay yours. Pricing coming soon.

Tell us the model and what it needs to know. No account, no card, no obligation.