Hey Folks,
If you haven’t introduced yourself, please do so! It’s great to get to know each other.
Once you’ve done that, here’s the study plan and goals. Each session will be spaced roughly 1 to 3 weeks apart. It’s still fluid, so if you have thoughts, leave them below.
Without further ado…
Goals
Gain a solid working knowledge of how these models work
Understand the various components of a model and the workflows needed to build them
Implement one or two from scratch
Load and run a pretrained checkpoint from Hugging Face (quick sketch after this list)
Learn ways to shape LLMs through fine-tuning and reinforcement learning
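To make the checkpoint goal above concrete, here’s a minimal sketch of loading and running a pretrained model (assuming the transformers library, with GPT-2 purely as an example checkpoint):

```python
# Minimal sketch: load a pretrained checkpoint from Hugging Face and
# generate a few tokens. GPT-2 is just an example model choice.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("Neural networks are", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```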
Session 0: Why the switch, why I’m interested, and study club focus
Kicking things off
Session 1: Building the foundation with basic neural nets
NN Frameworks and what they do
Model creation vs estimation
Backprop vs. SGD vs. Bayesian approaches (see the teaser sketch below)
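As a teaser for this session, here’s a minimal sketch of backprop plus an SGD step in PyTorch (the toy network and random data are just placeholders):

```python
# Minimal sketch: a tiny feedforward net trained with backprop + SGD in PyTorch.
# The data is random noise, purely to show the mechanics of a training step.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

X = torch.randn(32, 4)  # 32 fake examples, 4 features each
y = torch.randn(32, 1)  # fake regression targets

for step in range(100):
    optimizer.zero_grad()        # clear gradients from the last step
    loss = loss_fn(model(X), y)  # forward pass
    loss.backward()              # backprop: compute gradients
    optimizer.step()             # SGD: update the weights
```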
Session 2: Upping the level
More complicated feedforward neural networks
Different types of layers, how they work, and why we use them
Session 3: Transformers and other NN Architectures
What are different neural network architectures
Why do they exist and how do they work
Why are transformers taking over everything?
Session 4: Language model focus
Basic Bayesian language model (rough sketch after this list)
Neural network transformer model
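To give a flavour of the first bullet, here’s a bare-bones count-based bigram model; a properly Bayesian version would put a prior on the next-word probabilities (add-one/Laplace smoothing being the simplest case):

```python
# Sketch of a count-based bigram language model: predict the next word
# from counts of word pairs in a tiny toy corpus.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ran".split()

counts = defaultdict(Counter)
for prev, word in zip(corpus, corpus[1:]):
    counts[prev][word] += 1

def next_word_probs(prev):
    """Maximum-likelihood estimate of P(next word | prev word)."""
    total = sum(counts[prev].values())
    return {w: c / total for w, c in counts[prev].items()}

print(next_word_probs("the"))  # {'cat': 0.67, 'mat': 0.33} (roughly)
```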
Session 5: Shaping LLMs
The strategies used to train your model to do what you want
Glad you mentioned this. We’ll definitely be separating the hype from the math from the reality in this study club. By the end you’ll have everything you need to form your own opinion.
Although I’m sceptical of the hype around all this, I’m expecting to be asked a lot more by higher-ups to offer solutions with these techniques. So I want to be able to see under the hype and understand A) whether using these techniques to solve a problem offers value over simpler systems, and B) the limits of these techniques, so I can push back on higher-ups before ideas become reality.
Looking forward to continuing to learn with you all!
LLMs are quite magical, and it would be nice to know how that magic happens. That is my motivation for joining this book club. On the other hand, how often do people customize an LLM for their own application? My guess is not that often. For that reason, do I really need to study these?
This is a great question, and one that many organizations are actively working to answer. Different companies and individuals are taking different approaches. We’ll talk about it in this book club in a couple of sessions.
Hi, and thank you to Ravin for organizing this book club. I don’t work for a big firm but for a small startup, so my experience is not generalizable. But here is my $0.02.
Currently, training and fine-tuning general LLMs like GPT-3.5/4 is not on the table for us. From our perspective, we are in an exploration phase, figuring out how to best use this new technology. While it might be useful for some tasks, for others it might not be. It’s also a resource question: should we as a company spend time and money on building something LLM-based, or is that effort better spent elsewhere?
That resource question ultimately leads to the decision on whether or not to fine-tune/train a model for a custom use case. There are other avenues we can explore first: embeddings and clever prompting are two good ways of testing and gauging the possibilities with LLMs. Fine-tuning, for us, is still a bit further away.
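To make the embeddings avenue concrete, here’s roughly the shape of it (a minimal sketch; the sentence-transformers package and the all-MiniLM-L6-v2 model are just example choices, and the documents are made up):

```python
# Rough sketch of the "embeddings" route: rank documents by similarity to a
# query, with no fine-tuning involved. Package and model names are examples.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

docs = ["How to reset my password", "Pricing for the enterprise plan"]
query = "I forgot my login credentials"

doc_emb = model.encode(docs, convert_to_tensor=True)
query_emb = model.encode(query, convert_to_tensor=True)

scores = util.cos_sim(query_emb, doc_emb)  # cosine similarity to each doc
print(docs[int(scores.argmax())])          # most relevant doc for the query
```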