Skip to content
Torad. LAB / 001
Talk to us
Torad Etch · training your data, in the model

Torad Etch / training

Your data, in the model.

Etch trains a small specialist on your documents and tasks, then hands you the 4-bit weights: the same answers as the full-size original, yours to run anywhere. Not a database it queries.

continued pre-training · fine-tuning · reinforcement

A · IN THE WEIGHTS B · A LOOKUP IT QUERIES
A: your domain in the weights. B: the rejected way, a separate database the model queries.

What you bring, what you get

You give us your domain. We give you a specialist.

Four inputs go in. One artifact comes out, yours on a disk you control.

What you bring

  • Your domaindocuments and corpus it must know cold
  • Task exampleswhat it should do, in your format
  • Behavior spechow it should act, its voice
  • Eval setoptional, how you judge it

What you keep

  • A specialist that is yoursthe weights, on a disk you control. No metering, no account, no per-token bill.
  • Runs offline VALIDATEDon consumer hardware. Nothing leaves the machine.
  • Small, no loss in quality VALIDATEDa quarter of the memory, and it gives the We score it at 0.9998 out of 1.00 against the full-size original, where 1.00 means identical. See the proof below. as the full-size original.
  • Still improvable VALIDATEDmost shrunk models freeze for good. Yours keeps learning.

The pipeline

One pipeline, not three products bolted together.

Your domain goes through continued pre-training, fine-tuning, then reinforcement. Select a stage to see what it does.

Your domain runs through three stages and comes out as a 4-bit model you own.

your domain Torad Etch · one pipeline CPT SFT RL .tq4 4-bit specialist · yours

The 4-bit weights you train are the 4-bit weights you serve. Same kernel, same artifact, end to end. VALIDATED

The proof

Shrinking a model usually makes it worse. Ours gives the same answers.

Running on your machine means shrinking the model to 4-bit, which normally drifts the answers. We score how alike the shrunk and full-size answers come back, the A score from 0 to 1.00 for how alike two models' answers are. 1.00 means identical. It measures whether the answers point the same direction, not just whether the words match.: 1.00 is identical.

How alike is the shrunk model to the original?

VALIDATED
the original · bf16 0.9998 the print · TQ4 4-bit
Round-to-nearest0.85Round every number off. The answers drift into nonsense within a sentence.
Rotation + codebook0.91The careful static methods. You still feel the model get a little duller.
TQ4 · ours0.9998The same answers, at a quarter of the size. On 5 of 5 test prompts the top answer matched the original.
31%of the memory
870 MBvs 2.7 GB in bf16
238 / 267tok/s · within 10%
Measured on Qwen3-1.7B with our 4-bit Torad Quant 4-bit, our own format. It stores each number in 4 bits instead of the usual 16, so the model takes about a quarter of the space, and stays trainable rather than frozen. format: the same answers, a quarter of the size (870 MB, where Short for bfloat16, the standard 16-bit format AI models normally run in. Accurate, but it takes a lot of memory. needs 2.7 GB), at the same speed. It holds on other models: Google's Gemma 4 E2B scores 0.9997 on 4 of 5.

Two ways to run it

Managed, or self-serve. You own the weights either way.

The same pipeline, the same artifact. The only question is whose machine it runs on.

Managed

pricing coming soon

You bring the domain. We run the pipeline and hand back the specialist. Available now. Kandi is proof.

Self-serve

pricing coming soon

Run the same pipeline yourself and keep everything in-house. The weights never leave your machines.

Pricing lands soon. We write when the numbers are set and when self-serve opens. No noise.

Be first to know

What we will not claim yet

The line between what we proved and what we believe.

Measured, and re-runnable

Green.

The 31% memory, the quality score, the trainable 4-bit substrate, the same result across model families. You can run each one yourself.

Believed, not yet proven to your standard

Amber.

"7x fewer steps", "first 2B to chain function calls", QGRE R6 completeness, on-device vs cloud. We keep these off the page above and name them in the open. See them named →

Start a model.

Tell us the domain and we tell you what the specialist can do. Whatever we train, you own.

Start a model