Limited Time Sale: US$5.17 cheaper than the new price!!
| Management number | 231976853 |
|---|---|
| Release Date | 2026/06/18 |
| List Price | US$3.44 |
| Model Number | 231976853 |
| Category | |

About this book

The complete practical guide to reinforcement learning for working data scientists and ML engineers. Covers the full spectrum from foundational theory to the algorithms powering today's frontier AI: RLHF, DPO, GRPO, and the training pipelines behind ChatGPT, Claude, and DeepSeek-R1.

Written intuition-first with real working code throughout. No prior RL experience required. Every major concept builds from a data scientist's existing knowledge: A/B testing, recommenders, hyperparameter tuning, and language models.

Who this book is for

- Data scientists who use LLMs and want to understand how RLHF, DPO, and GRPO actually work under the hood.
- ML engineers building LLM-based products who need to know when to use RL, how to design rewards, and how to avoid failure modes.
- LLM practitioners who want the mental model connecting training methodology to deployment behaviour across different model families.

What you'll learn

- MDPs, Q-learning, policy gradients, PPO, and GAE from first principles
- The complete RLHF pipeline: reward modelling, PPO training, KL divergence control
- DPO and next-generation preference optimisation (SimPO, IPO, KTO, ORPO)
- GRPO and RLVR: DeepSeek-R1's approach to verifiable reward training
- How to build and RL-train LLM agents with tool use and multi-step planning
- End-to-end project: fine-tuning an LLM for SQL generation with RL (Colab-ready)
- RL for recommenders, HPO, bandits, and offline decision-making
- Production monitoring, reward design, and a system design checklist

Table of contents

Part I–III · Foundations & Applications

- 1 · What is RL?
- 2 · Prerequisites
- 3 · Markov chains & MDPs
- 4 · Policies, values & rewards
- 5 · Model-free RL & Q-learning
- 6 · Policy gradients & PPO
- 7 · RL for recommendation
- 8 · RL for HPO & AutoML
- 9 · Bandits & offline RL

Part IV–VI · LLM Agents, RLHF & Production

- 10 · LLM agents
- 11 · Training agents with RL
- 12 · When to use RL
- 13–14 · LLM pipeline & reward models
- 15 · RLHF with TRL
- 16 · DPO & preference optimisation
- 17 · DeepSeek, GRPO & reasoning
- 18 · End-to-end SQL project
- 19–20 · Design framework & future

Prerequisites

- Python fluency
- Basic ML (gradient descent, cross-entropy)
- Familiarity with neural networks
- Basic probability & statistics
- No prior RL experience required

Format & resources

- Digital & print editions
- 36 Google Colab-ready notebooks
- Full GitHub companion repository
- PyTorch · HuggingFace TRL · PEFT
- English · April 2026

"RL is no longer the exotic subfield in the corner of the ML curriculum. It is the mechanism by which the AI systems you use every day were built. The question is not whether to learn it. The question is how to learn it in a way that sticks." (from the Preface)

| ASIN | B0DHV7ZZR8 |
|---|---|
| X-Ray | Not Enabled |
| Language | English |
| File size | 10.3 MB |
| Page Flip | Enabled |
| Word Wise | Not Enabled |
| Print length | 611 pages |
| Accessibility | Learn more |
| Publication date | April 8, 2026 |
| Enhanced typesetting | Enabled |