Archives

All the articles I've archived.

2026 ²

April ¹

Trading MatMuls for SRAM Lookups: A 3-Bit Edge Architecture

16 Apr, 2026

By trading heavy FP16 MatMuls for SRAM lookups and 1-bit additions, our custom quantization pipeline squeezes state-of-the-art models down to approximately 3 bits per weight with minimal accuracy loss. Here is how bypassing Tensor Cores could reshape the design of future edge AI chips.

February ¹

An approach to calibrating LLM reasoning effort

17 Feb, 2026

In this blog post, we discuss controlling reasoning effort in LLMs (in the gpt-oss style) and calibrating LLM reasoning effort via Label-Free Alignment.

2025 ¹

December ¹

Stop Using Embeddings for Everything in RAG

11 Dec, 2025

Why deterministic query translation should often come before embeddings in enterprise RAG systems, and how to combine both in a hybrid approach.