To make this book club most useful, let me know what you know, and what you don’t know! By doing both you’ll also implicitly tell me what you don’t know that you don’t know.
And with all three sets we can maximize the usefulness-to-time ratio of this study club!
So let me kick it off.

I know:
- The basics of LLM implementation
- The idea behind transformers and the general intuition
- A lot about open source vs closed source
- The fundamental mathematics, such as matrix multiplication, softmax, etc.
- How to code basic models
I don’t know:
- The specifics of why you’d pick, say, 4 self-attention heads versus another number
- The specific details of the newer Llama models versus HF models
- How to comfortably train these models on my own commodity hardware
- The details of methods used to shape models, such as LoRA, SFT, and PEFT
- How hyperparameters like dropout are chosen in neural networks