Has there been much research into slightly flawed matrix multiplications?
If you have a measure of correctness, and a measure of performance. Is there a maximum value of correctness per some unit of processing that exists below a full matrix multiply
Obviously it can be done with precision, since that is what floating point is. But is there anything where you can save x% of computation and have fewer than x% incorrect values in a matrix multiplications?
Gradient descent wouldn't really care about a few (Reliably) dud values.
Randomized matrix sketching is one way to get at this (see https://arxiv.org/abs/2302.11474), the problem is hardware is heavily optimized for dense multiplies so what you save in flops doesn't translate to real runtime speeds ups.
For semi-random weights you cam get down to 20-30% multiplications/mem reads and maintain ~0.98 cosine similarity output between the approximated and full result.
As far as LLM inference goes, the speedup from removing multiplications is at best comparable to the speedup of quantisation (that is - you get at best similar KL divergence score whether you remove calculations or quantise).
Well, approximate computing seems to be a superset of the field you describe here, with many different approaches, including analog computation. As you say, some algorithms care a bit less about precision, especially for LSBs.
If you have a measure of correctness, and a measure of performance. Is there a maximum value of correctness per some unit of processing that exists below a full matrix multiply
Obviously it can be done with precision, since that is what floating point is. But is there anything where you can save x% of computation and have fewer than x% incorrect values in a matrix multiplications?
Gradient descent wouldn't really care about a few (Reliably) dud values.