alignrl
Open-source LLM post-training playbook covering SFT, GRPO, DPO, evaluation, and inference. Published on PyPI with Colab notebooks, a docs site, and 49 passing tests. One pip install to fine-tune any Hugging Face model.
About
alignrl is a Python toolkit that makes LLM post-training accessible. It wraps Unsloth, TRL, and lm-evaluation-harness behind a clean config-driven API. Supported pipelines include supervised fine-tuning (QLoRA via Unsloth), Group Relative Policy Optimization (GRPO), Direct Preference Optimization (DPO), standardized evaluation benchmarks, and vLLM/Unsloth inference serving. Each pipeline is available as a Python API, a CLI command, or a Colab notebook. Published to PyPI as v0.2.0 with CI/CD via GitHub Actions, a GitHub Pages documentation site, and a comprehensive test suite.
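To make the config-driven workflow concrete, here is a sketch of what an SFT run configuration might look like. The key names, model/dataset identifiers, and CLI invocation below are illustrative assumptions, not alignrl's actual schema:

```yaml
# Hypothetical config sketch -- key names are assumptions, not alignrl's real schema.
method: sft                              # one of: sft, grpo, dpo
model: unsloth/Llama-3.2-1B-Instruct     # any Hugging Face model id
dataset: yahma/alpaca-cleaned            # Hugging Face dataset id
quantization: qlora                      # 4-bit QLoRA via Unsloth
lora:
  r: 16                                  # LoRA rank
  alpha: 32
training:
  epochs: 1
  learning_rate: 2.0e-4
```

With a config like this, the corresponding CLI step would presumably look something like `alignrl sft --config sft.yaml`; check the package's docs site for the actual command names and schema.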