We built a data-free method for compressing heavy LLMs
Hey folks! I’ve been working with the team at Yandex Research on a way to make LLMs easier to run locally, without calibration data, GPU farms, or cloud setups. We just published a paper on HIGGS, a data-free quantization method that skips calibration entirely: no datasets or activations required. It’s meant to help teams compress and deploy big models like DeepSeek-R1 or Llama 4 Maverick on laptops or even mobile devices.

The core idea comes from a theoretical link between per-layer reconstruction error and overall perplexity (see the formula and sketch below). This lets us:

- Quantize models without touching the original data
- Get decent performance at 3–4 bits per parameter
- Cut inference costs and make LLMs more practical for edge use

We’ve been using HIGGS internally for fast iteration and testing, and it’s proven highly effective.
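For those who want the gist of the theory: my rough paraphrase (the notation below is mine, not the paper’s) is that the perplexity degradation from quantization behaves approximately linearly in the per-layer relative reconstruction errors:

```latex
\Delta\,\mathrm{PPL} \;\approx\; \sum_{\ell} c_\ell \,
  \frac{\lVert W_\ell - \widehat{W}_\ell \rVert_F^2}{\lVert W_\ell \rVert_F^2}
```

Here \(W_\ell\) is layer \(\ell\)’s original weight matrix, \(\widehat{W}_\ell\) its quantized version, and \(c_\ell\) a layer-specific sensitivity constant. The practical upshot: you can minimize each layer’s reconstruction error independently, and that objective needs no input data at all.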
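And for intuition about the data-free recipe itself, here’s a minimal NumPy sketch of the general idea: randomly sign-flip and Hadamard-rotate weight groups so their entries look roughly Gaussian, then round them onto one fixed grid. Everything here is my own illustration, not the paper’s code; `higgs_like_quantize`, the group size, and the quantile-based grid are all assumptions (the actual method uses an MSE-optimal grid).

```python
import numpy as np
from scipy.stats import norm


def hadamard(n):
    """Orthonormal Sylvester Hadamard matrix; n must be a power of two."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)  # symmetric and orthonormal, so H is its own inverse


def higgs_like_quantize(W, group_size=64, bits=3, seed=None):
    """Hypothetical data-free round trip: rotate -> grid-round -> undo.

    Assumes W.size is divisible by group_size. The grid below is a quantile
    approximation for a standard Gaussian, not the paper's MSE-optimal grid.
    """
    rng = np.random.default_rng(seed)
    H = hadamard(group_size)
    signs = rng.choice([-1.0, 1.0], size=group_size)

    groups = W.reshape(-1, group_size) * signs   # random sign flips
    rotated = groups @ H                         # entries become ~Gaussian
    scales = rotated.std(axis=1, keepdims=True) + 1e-12
    normalized = rotated / scales                # ~N(0, 1) within each group

    # One fixed 2**bits-level grid for a standard Gaussian: no calibration data
    levels = 2 ** bits
    grid = norm.ppf((np.arange(levels) + 0.5) / levels)
    q = grid[np.abs(normalized[..., None] - grid).argmin(axis=-1)]

    # Dequantize: undo scaling, rotation, and sign flips
    return ((q * scales) @ H * signs).reshape(W.shape)


W = np.random.randn(256, 512)
W_hat = higgs_like_quantize(W, bits=3, seed=0)
rel_err = np.linalg.norm(W - W_hat) ** 2 / np.linalg.norm(W) ** 2
print(f"relative reconstruction error: {rel_err:.4f}")
```

The design point worth noticing: the randomized rotation makes weight groups look approximately Gaussian regardless of which model they came from, which is what lets a single precomputed grid work without ever seeing activations or a calibration set.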