My kernels run 2x faster than MKL for matrices that fit in L2 cache. That limit makes them a work in progress, since the speedup works best for prompts with fewer than 1,000 tokens.

this post was submitted on 01 Apr 2024
17 points (90.5% liked)

Performance

A community for posts relating to performance
