Has there been much research into slightly flawed matrix multiplications? If you...

wuubuu · 2025-07-18T23:19:22 1752880762

Randomized matrix sketching is one way to get at this (see https://arxiv.org/abs/2302.11474). The problem is that hardware is heavily optimized for dense multiplies, so what you save in FLOPs doesn't translate into real runtime speedups.
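
To make the idea concrete, here is a minimal sketch of one classic randomized scheme: approximating A @ B by sampling a subset of the inner-dimension terms and rescaling them. This is an illustration only, not necessarily the method covered in the linked paper, and `sampled_matmul` is a hypothetical helper name.

```python
import numpy as np

def sampled_matmul(A, B, k, rng):
    """Approximate A @ B from k sampled rank-one terms (hypothetical helper)."""
    # Importance weight of inner index i: ||A[:, i]|| * ||B[i, :]||.
    # Assumes at least one weight is non-zero.
    weights = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=1)
    probs = weights / weights.sum()
    # Draw k inner-dimension indices with replacement.
    idx = rng.choice(A.shape[1], size=k, replace=True, p=probs)
    # Rescale each sampled term by 1 / (k * p_i) so the estimate is unbiased.
    scale = 1.0 / (k * probs[idx])
    return (A[:, idx] * scale) @ B[idx, :]
```

The gather `A[:, idx]` / `B[idx, :]` is exactly the kind of irregular memory access that dense-multiply hardware handles poorly, which is one reason the FLOP savings rarely show up as wall-clock savings. Accuracy also depends heavily on the data: sampling works best when a few terms dominate the product, and tends to do badly on matrices with i.i.d. zero-mean entries.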

kolinko · 2025-07-19T17:05:38 1752944738

I did research on vector-matrix last year:

https://kolinko.github.io/effort/

For semi-random weights you can get down to 20-30% of the multiplications/memory reads and maintain ~0.98 cosine similarity between the approximated and full outputs.

As far as LLM inference goes, the speedup from removing multiplications is at best comparable to the speedup of quantisation (that is - you get at best similar KL divergence score whether you remove calculations or quantise).
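
For anyone who wants to play with the "fraction of multiplications vs. cosine similarity" trade-off, here is a deliberately naive sketch. It is not the algorithm from the linked page: it simply keeps the largest-magnitude entries of the input vector and skips the corresponding rows of the weight matrix. The function names and the Gaussian test data are assumptions made for illustration.

```python
import numpy as np

def topk_vecmat(x, W, fraction):
    """Approximate x @ W using only the largest-|x_i| entries (naive illustration)."""
    n = x.shape[0]
    k = max(1, int(round(fraction * n)))
    # Indices of the k largest-magnitude input entries.
    keep = np.argpartition(np.abs(x), n - k)[n - k:]
    # Only the kept rows of W are read and multiplied.
    return x[keep] @ W[keep, :]

def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
x = rng.standard_normal(1024)
W = rng.standard_normal((1024, 1024))
for fraction in (0.1, 0.25, 0.5, 1.0):
    sim = cosine_similarity(topk_vecmat(x, W, fraction), x @ W)
    print(f"fraction={fraction:.2f}  cosine similarity={sim:.3f}")
```

Because this version looks only at the input vector and ignores the weights, expect it to fall short of the ~0.98 figure quoted above at the same fraction.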

MayeulC · 2025-07-19T12:05:38 1752926738

Well, approximate computing seems to be a superset of the field you describe here, with many different approaches, including analog computation. As you say, some algorithms care a bit less about precision, especially for LSBs.

WithinReason · 2025-07-19T11:54:26 1752926066

If you do it in 8-bit, it's usually 2x as fast as 16-bit on Tensor Cores.
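
A rough way to see what the 8-bit path costs in accuracy is to simulate it in NumPy. This sketch assumes symmetric per-tensor quantization with int32 accumulation, which is one common setup but not the only one, and the helper names are hypothetical. Running it on a CPU shows only the numerical error, none of the Tensor Core speedup.

```python
import numpy as np

def quantize_int8(M):
    """Symmetric per-tensor quantization to int8 (assumes M is not all zeros)."""
    scale = np.abs(M).max() / 127.0
    q = np.clip(np.round(M / scale), -127, 127).astype(np.int8)
    return q, scale

def int8_matmul(A, B):
    """Multiply quantized operands with int32 accumulation, then dequantize."""
    qa, sa = quantize_int8(A)
    qb, sb = quantize_int8(B)
    acc = qa.astype(np.int32) @ qb.astype(np.int32)
    return acc.astype(np.float32) * (sa * sb)
```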