LLM Theory
Awesome LLM theory articles I chanced upon.
A compilation of tricks I find useful.
With yesterday’s results release, I have finally graduated! Having been tested for many years of my life, I’d like to share some study tips that served me well during my undergraduate studies. I hope they come in handy for a junior out there :).
Need access to NUS computing resources but not sure how to get it? Here’s a quick crash course!
As I’m learning more about 3D parallelism, I wondered: suppose every device takes in a batch of tensors, and the tensor sizes differ across devices. Will 3D parallelism still work? It turns out data and pipeline parallelism handle this fine, but tensor parallelism needs some extra work.
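To make the shape constraint concrete, here is a minimal single-process sketch in plain PyTorch (no torch.distributed; the sizes and the two-way split are illustrative assumptions, not the post’s setup) of why ragged per-device batches are fine for data parallelism but break tensor parallelism.

```python
# Single-process illustration: why variable batch sizes are fine for data
# parallelism but problematic for tensor parallelism. All sizes are toy values.
import torch

hidden = 8
weight = torch.randn(hidden, hidden)

# --- Data parallelism: each "device" gets a different batch size. ---
# Gradients are averaged per parameter, and parameter shapes never depend on
# the batch size, so ragged batches across devices pose no problem.
batches = [torch.randn(2, hidden), torch.randn(5, hidden)]  # different sizes
grads = []
for x in batches:
    w = weight.clone().requires_grad_(True)
    (x @ w).sum().backward()
    grads.append(w.grad)
avg_grad = torch.stack(grads).mean(dim=0)
assert avg_grad.shape == weight.shape  # same shape on every device

# --- Tensor parallelism: the *weight* is sharded across devices. ---
# With column parallelism, every shard must see the SAME input activations,
# because the sharded partial outputs are concatenated (or all-reduced, for
# row parallelism) to reconstruct the full layer output.
shards = weight.chunk(2, dim=1)            # each "device" holds hidden x hidden/2
x = torch.randn(3, hidden)                 # must be identical on both TP ranks
partial_outputs = [x @ shard for shard in shards]
full_output = torch.cat(partial_outputs, dim=1)
assert full_output.shape == (3, hidden)
# If the two shards received inputs with different batch sizes, the concat
# (or all-reduce) would fail: the partial outputs would no longer line up.
```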
Retrieval Augmented Generation (RAG) is a framework for retrieving facts from an external knowledge base to ground large language models (LLMs) in the most accurate, up-to-date information. RAG is increasingly popular in industry because it’s simple to implement yet powerful. Here I’ll share some tricks for improving RAG systems.
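As a rough picture of the retrieve-then-generate loop, here is a self-contained sketch; the bag-of-words similarity, the toy knowledge base, and the llm() stub are stand-ins for a real embedding model and LLM call, not anything from the post.

```python
# Minimal RAG sketch: embed the query, retrieve the most similar documents,
# and stuff them into the prompt before calling the (stubbed) LLM.
from collections import Counter
import math

KNOWLEDGE_BASE = [
    "RAG retrieves documents from an external knowledge base.",
    "Retrieved context is prepended to the prompt to ground the LLM.",
    "Pipeline parallelism splits a model's layers across devices.",
]

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real system would use a dense encoder.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, k: int = 2) -> list[str]:
    q = embed(query)
    return sorted(KNOWLEDGE_BASE, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def llm(prompt: str) -> str:
    # Hypothetical stand-in for a real model call.
    return f"<answer grounded in: {prompt[:60]}...>"

def rag_answer(question: str) -> str:
    context = "\n".join(retrieve(question))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return llm(prompt)

print(rag_answer("How does RAG ground an LLM?"))
```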
If we are allowed to train to convergence, we know that full finetuning beats parameter-efficient finetuning (PEFT). But what if we have a fixed compute budget? For the same budget, PEFT can go through significantly more tokens. Will full finetuning still come out ahead?
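For a sense of the gap behind that question, here is a minimal LoRA-style layer in plain PyTorch (the 4096-dimensional layer and rank 8 are illustrative assumptions) showing how small PEFT’s trainable footprint is next to the full weight matrix.

```python
# Minimal LoRA-style adapter: freeze the base weight, train only two low-rank
# factors, and compare trainable vs. total parameter counts.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, d_in: int, d_out: int, rank: int = 8):
        super().__init__()
        self.base = nn.Linear(d_in, d_out)
        self.base.weight.requires_grad_(False)   # frozen pretrained weight
        self.base.bias.requires_grad_(False)
        self.lora_a = nn.Parameter(torch.randn(rank, d_in) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(d_out, rank))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + x @ self.lora_a.T @ self.lora_b.T

layer = LoRALinear(4096, 4096, rank=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable: {trainable:,} / {total:,} ({trainable / total:.2%})")
# For a 4096x4096 layer at rank 8, well under 1% of the weights are trained;
# that parameter gap is what lets PEFT stretch a fixed budget over more tokens.
```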
How do we train causal language models (e.g. Alpaca, LLaMA, gpt-neox-20b…) with a seq2seq objective? This matters because we want to instruction-tune our causal LMs, especially since Alpaca is the best open model at the time of writing.
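One common recipe, sketched below with made-up toy token IDs, is to concatenate prompt and response into a single causal sequence and mask the prompt positions out of the loss:

```python
# Seq2seq-style objective on a causal LM: compute loss only on response tokens
# by setting prompt positions to ignore_index=-100.
import torch
import torch.nn.functional as F

prompt_ids = torch.tensor([101, 7, 42, 9])       # instruction tokens (toy values)
response_ids = torch.tensor([55, 13, 88, 102])   # target tokens (toy values)

input_ids = torch.cat([prompt_ids, response_ids])
labels = input_ids.clone()
labels[: len(prompt_ids)] = -100                 # no loss on the prompt

# Pretend logits from a causal LM: (seq_len, vocab_size).
vocab_size = 128
logits = torch.randn(len(input_ids), vocab_size)

# Standard causal shift: position t predicts token t+1.
loss = F.cross_entropy(
    logits[:-1],          # predictions for positions 0..T-2
    labels[1:],           # targets shifted left by one
    ignore_index=-100,    # prompt positions contribute nothing to the loss
)
print(loss)
```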
Just my personal notes!
As I step into this world of research, I’ve realised I’m still in the midst of discovering what I’m truly interested in. To this end, I hope to read papers solely because I’m interested in them (instead of, say, to do a literature review for my current research).