Tag: llm

All the articles with the tag "llm".

Trading MatMuls for SRAM Lookups: A 3-Bit Edge Architecture

16 Apr, 2026

By trading heavy FP16 MatMuls for SRAM lookups and 1-bit additions, our custom quantization pipeline squeezes state-of-the-art models down to approximately 3 bits per weight with minimal accuracy loss. Here is how bypassing Tensor Cores could reshape the design of future edge AI chips.

An approach to calibrating LLM reasoning effort

17 Feb, 2026

In this blog post, we discuss controlling reasoning effort in LLMs (gpt-oss style) and calibrating LLM reasoning effort via Label-Free Alignment.

Stop Using Embeddings for Everything in RAG

11 Dec, 2025

Why deterministic query translation should often come before embeddings in enterprise RAG systems, and how to combine both in a hybrid approach.
