Being able to train base LLMs. This is currently an alchemical skill, since you can't learn it at school. It splits further into infrastructure engineering (managing GPU clusters isn't easy), data gathering and cleaning (at terabyte scale), the training itself, and so on.
Being very good at fine-tuning for a particular goal. Fine-tuning is much easier to learn, so the bar to stand out is higher.
Being able to come up with architectural improvements for LLMs, aka the researcher path.
Wages start at $250k for grads at the big AI companies.
1. For a BERT-scale model, all you need is a good codebase from GitHub (I had some luck with this one [0]) and a few weeks of trial and error. I'd like to try training T5 or LLaMA, but I don't have the resources. Of course, training models with more than 100B parameters is another level of labyrinth.
2. Fine-tuning mostly comes down to how well you understand the task and the data you're dealing with. Since the BERT paper focuses on the GLUE benchmark, I became very proficient at fine-tuning on GLUE and eventually got sick of it.
3. I made some architectural improvements to BERT and got decent results, so I wrote a paper. It got rejected because the reviewers wanted a head-to-head evaluation against some well-funded papers from Google.
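For a rough sense of why 100B+ parameters is a different league from BERT scale, here is a back-of-envelope sketch of the memory needed just for model states. It assumes mixed-precision training with Adam at about 16 bytes per parameter (fp16 weights and gradients plus fp32 master weights, momentum, and variance), ignores activations entirely, and takes an 80 GB accelerator as an illustrative assumption. The function names are made up for this example, and the 110M figure is the BERT-base parameter count from the BERT paper.

```python
import math

# Assumption: mixed-precision Adam, counting model states only.
# 2 bytes fp16 weights + 2 bytes fp16 gradients
# + 12 bytes fp32 master weights, momentum, and variance.
BYTES_PER_PARAM = 2 + 2 + 12


def model_state_gb(num_params: float) -> float:
    """Approximate model-state memory in GB (10^9 bytes), activations excluded."""
    return num_params * BYTES_PER_PARAM / 1e9


def min_gpus(num_params: float, gpu_memory_gb: float = 80.0) -> int:
    """Lower bound on GPUs needed to hold the model states if sharded perfectly.

    gpu_memory_gb defaults to 80 GB, an illustrative assumption.
    """
    return math.ceil(model_state_gb(num_params) / gpu_memory_gb)


if __name__ == "__main__":
    for name, params in [("BERT-base (110M)", 110e6), ("100B model", 100e9)]:
        print(f"{name}: ~{model_state_gb(params):,.2f} GB of model states, "
              f"at least {min_gpus(params)} GPU(s)")
```

Real runs need a good deal more than this lower bound once activations, communication buffers, and fragmentation are counted, which is part of why the infrastructure side becomes its own skill.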
Just curious what the current bar is here and which of the LLM-related skills might be worth building.