kitaverband-speyer.de

Reinforcement Learning for Data Scientists: From Intuition to LLMs

Name: Reinforcement Learning for Data Scientists: From Intuition to LLMs
Brand: kitaverband-speyer.de
SKU: 231976853
Price: 3.44 USD
Availability: InStock
Rating: 4.2 (105 reviews)

Limited Time Sale

New: US$8.61 (tax included). Number in stock: 1

Used: US$3.44 (tax included). New Arrivals and Restocks. Number in stock: 1

US$5.17 cheaper than the new price.

Free shipping for purchases over $99
No cash-on-delivery fee for purchases over $99

Please note that the sales price and tax displayed may differ between online and in-store. Also, the product may be out of stock in-store.

Product details

Management number	231976853
Release Date	2026/06/18
List Price	US$3.44
Model Number	231976853
Category	Kindle Store > Kindle eBooks > Computers & Technology > Computer Science > Artificial Intelligence > Generative AI

About this book

The complete practical guide to reinforcement learning for working data scientists and ML engineers. Covers the full spectrum from foundational theory to the algorithms powering today's frontier AI: RLHF, DPO, GRPO, and the training pipelines behind ChatGPT, Claude, and DeepSeek-R1.

Written intuition-first with real working code throughout. No prior RL experience required. Every major concept builds from a data scientist's existing knowledge: A/B testing, recommenders, hyperparameter tuning, and language models.

Who this book is for

Data scientists who use LLMs and want to understand how RLHF, DPO, and GRPO actually work under the hood.
ML engineers building LLM-based products who need to know when to use RL, how to design rewards, and how to avoid failure modes.
LLM practitioners who want the mental model connecting training methodology to deployment behaviour across different model families.

What you'll learn

MDPs, Q-learning, policy gradients, PPO, and GAE from first principles
The complete RLHF pipeline: reward modelling, PPO training, KL divergence control
DPO and next-generation preference optimisation (SimPO, IPO, KTO, ORPO)
GRPO and RLVR: DeepSeek-R1's approach to verifiable reward training
How to build and RL-train LLM agents with tool use and multi-step planning
End-to-end project: fine-tuning an LLM for SQL generation with RL (Colab-ready)
RL for recommenders, HPO, bandits, and offline decision-making
Production monitoring, reward design, and a system design checklist

Table of contents

Part I–III · Foundations & Applications
1 · What is RL?
2 · Prerequisites
3 · Markov chains & MDPs
4 · Policies, values & rewards
5 · Model-free RL & Q-learning
6 · Policy gradients & PPO
7 · RL for recommendation
8 · RL for HPO & AutoML
9 · Bandits & offline RL

Part IV–VI · LLM Agents, RLHF & Production
10 · LLM agents
11 · Training agents with RL
12 · When to use RL
13–14 · LLM pipeline & reward models
15 · RLHF with TRL
16 · DPO & preference optimisation
17 · DeepSeek, GRPO & reasoning
18 · End-to-end SQL project
19–20 · Design framework & future

Prerequisites

+ Python fluency
+ Basic ML (gradient descent, cross-entropy)
+ Familiarity with neural networks
+ Basic probability & statistics
– No prior RL experience required

Format & resources

· Digital & print editions
· 36 Google Colab-ready notebooks
· Full GitHub companion repository
· PyTorch · HuggingFace TRL · PEFT
· English · April 2026

"RL is no longer the exotic subfield in the corner of the ML curriculum. It is the mechanism by which the AI systems you use every day were built. The question is not whether to learn it. The question is how to learn it in a way that sticks." (from the Preface)

ASIN	B0DHV7ZZR8
XRay	Not Enabled
Language	English
File size	10.3 MB
Page Flip	Enabled
Word Wise	Not Enabled
Print length	611 pages
Publication date	April 8, 2026
Enhanced typesetting	Enabled

Shipping Rates

Order Amount	Shipping Fee	Handling Fee
Under $99	$12.99	$24.00
$99 - $499	FREE	$24.00
$500 and above	FREE	FREE

Delivery Time

Standard Shipping: 5-7 business days
Express Shipping: 2-3 business days (additional $15)
Overnight Shipping: Next business day (additional $35)

Available Regions

We ship to all 50 US states, Canada, and select international destinations through our partner Neokyo.
