

Any data sets produced before 2022 will be very valuable compared to anything produced after. Maybe the only way we avoid this is to stick to training LLMs on older data and inject anything newer via the prompt, rather than training on it.
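For what it's worth, a minimal sketch of what that "inject via the prompt" approach could look like; `retrieve_recent` and the prompt format here are made up for illustration, not any particular library:

```python
def retrieve_recent(query: str) -> list[str]:
    """Hypothetical stand-in for a retrieval step over post-2021 documents."""
    return ["(2023) Example snippet relevant to the query ..."]

def build_prompt(query: str) -> str:
    # The model's weights stay frozen on pre-2022 data; newer facts only
    # ever appear in the context window, never in the training set.
    context = "\n".join(retrieve_recent(query))
    return f"Context (recent documents):\n{context}\n\nQuestion: {query}\nAnswer:"

print(build_prompt("What changed after 2022?"))
```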
I’ve got a background in deep learning and I still struggle to understand the attention mechanism. I know it’s often described as a key/value store, but I’m not sure what it’s actually doing to the tensor as it passes through the different layers.
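In case it helps, here is a minimal single-head NumPy sketch of scaled dot-product attention. The dimensions and weight names are illustrative; real transformers add multiple heads, an output projection, masking, and residual connections on top of this:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product attention over a sequence X.

    X: (seq_len, d_model) input token vectors.
    Wq, Wk, Wv: (d_model, d_k) learned projections (random here).
    """
    Q = X @ Wq  # what each token is "looking for"
    K = X @ Wk  # what each token "offers" as a key
    V = X @ Wv  # the content actually retrieved
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # every query scored against every key
    weights = softmax(scores, axis=-1)       # each row sums to 1: a *soft* lookup
    return weights @ V                       # weighted mix of value vectors

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 4, 8, 8
X = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_k)) for _ in range(3))
out = attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8): one updated vector per token
```

Note that the output has the same shape as the input: a layer doesn't reshape the tensor, it replaces each token's vector with a softmax-weighted average of every token's value vector, so stacking layers just repeatedly re-mixes information across positions.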