Open Source Contributions
While I work in defence, I love open source! I try to contribute to open-source projects in my own small ways. Here's a list of my contributions:
- Fixed a checkpoint-resuming bug in lit_gpt by Lightning-AI. Previously, resuming from a checkpoint might not load the latest checkpoint.
- Fixed a dataloader bug in TinyLlama, a popular pretraining codebase. The bug caused more than 35% of the data in the dataloader to be duplicated.
- Accelerate support for GLM. Enabled 8-bit inference and CPU offloading for GLM, a SOTA Chinese LLM.
- Alpaca Indon dataset. Translated the alpaca dataset to Indonesian using NLLB.
- 豆瓣读书 (Douban Books) dataset. A dataset of Chinese book reviews collected from 豆瓣读书.