Changes to Reinforcement Learning (強化学習)

The "[[.NET 開発基盤部会 Wiki>http://dotnetdevelopmentinfrastructure.osscons.jp]]" is run by "[[Open棟梁Project>https://github.com/OpenTouryoProject/]]" and "[[OSSコンソーシアム .NET開発基盤部会>https://www.osscons.jp/dotNetDevelopmentInfrastructure/]]".

-Back to [[Artificial Intelligence (AI)>人工知能（AI）]]
--[[Machine learning>機械学習（machine learning）]]
--[[Deep learning>深層学習（deep learning）]]
--Reinforcement learning
--[[Generative AI>生成系AI（Generative AI）]]

*Contents [#c9745c17]
#contents

*Overview [#hcf0e200]

*Details [#ecde0dff]

**[[Reinforcement learning>#hef24735]] [#pf1151d2]

-Value-based
--Multi-armed bandit problem
--Dynamic programming
--Monte Carlo methods
--Temporal-difference (TD) family
---TD learning
---Q-learning
---SARSA
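Within the TD family listed above, Q-learning and SARSA differ mainly in the update target. The following is a minimal sketch, assuming a hypothetical 5-state corridor environment, hyperparameters, and helper names (`step`, `epsilon_greedy`, `train`, `greedy_policy`) that are not from the source: Q-learning bootstraps from the greedy next action (off-policy), while SARSA bootstraps from the action the epsilon-greedy policy actually takes next (on-policy).

```python
import random
from collections import defaultdict

N_STATES = 5        # hypothetical corridor: states 0..4, state 4 is the goal
ACTIONS = (0, 1)    # 0 = move left, 1 = move right


def step(state, action):
    """Toy environment: reward 1.0 on reaching the goal, otherwise 0.0."""
    nxt = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def epsilon_greedy(Q, state, epsilon, rng):
    """Random action with probability epsilon, else a greedy one (ties broken randomly)."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(Q[(state, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if Q[(state, a)] == best])


def train(method, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Learn a tabular Q(s, a) with "q_learning" (off-policy) or "sarsa" (on-policy)."""
    if method not in ("q_learning", "sarsa"):
        raise ValueError(f"unknown method: {method}")
    rng = random.Random(seed)
    Q = defaultdict(float)  # unseen (state, action) pairs start at 0.0
    for _ in range(episodes):
        state, done = 0, False
        action = epsilon_greedy(Q, state, epsilon, rng)
        while not done:
            nxt, reward, done = step(state, action)
            next_action = epsilon_greedy(Q, nxt, epsilon, rng)
            if done:
                target = reward  # terminal state: nothing to bootstrap from
            elif method == "q_learning":
                # Q-learning: bootstrap from the best next action (greedy target)
                target = reward + gamma * max(Q[(nxt, a)] for a in ACTIONS)
            else:
                # SARSA: bootstrap from the next action the policy actually chose
                target = reward + gamma * Q[(nxt, next_action)]
            # TD update: move Q(s, a) a step of size alpha toward the target
            Q[(state, action)] += alpha * (target - Q[(state, action)])
            state, action = nxt, next_action
    return Q


def greedy_policy(Q):
    """Greedy action for each non-terminal state."""
    return [max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)]
```

For example, `greedy_policy(train("q_learning"))` and `greedy_policy(train("sarsa"))` should both settle on "move right" in this toy task; SARSA's values tend to be somewhat lower because its target includes the cost of exploratory moves.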

-Policy-based
--Policy gradient methods
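Policy gradient methods adjust the policy parameters directly instead of learning action values. Below is a minimal REINFORCE-style sketch on a hypothetical single-step, two-action task; the reward means, noise level, learning rate, and function names are assumptions for illustration. A softmax policy over preferences `theta` is nudged along (reward - baseline) * grad log pi(action).

```python
import math
import random

TRUE_MEANS = (0.2, 0.8)  # hypothetical expected rewards; action 1 is better


def softmax(prefs):
    """Numerically stable softmax over a list of preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce(steps=2000, lr=0.1, seed=0):
    """REINFORCE with a running-average baseline on single-step episodes."""
    rng = random.Random(seed)
    theta = [0.0, 0.0]   # policy parameters: one preference per action
    baseline = 0.0       # average of past rewards, used to reduce variance
    for t in range(1, steps + 1):
        probs = softmax(theta)
        action = 0 if rng.random() < probs[0] else 1
        reward = rng.gauss(TRUE_MEANS[action], 0.1)  # noisy reward sample
        advantage = reward - baseline
        for k in range(len(theta)):
            # For a softmax policy, d log pi(action) / d theta[k] = 1[k == action] - pi(k)
            grad_log = (1.0 if k == action else 0.0) - probs[k]
            theta[k] += lr * advantage * grad_log
        # Update the baseline after the step so it excludes the current reward
        baseline += (reward - baseline) / t
    return softmax(theta)
```

The baseline does not change the expected gradient as long as it does not depend on the current action's reward, which is why it is updated only after the parameter step.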

**Deep reinforcement learning [#s6c0e580]

-Value-based
--DQN
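DQN combines Q-learning with a neural-network Q-function and two stabilizing ideas: experience replay and a separate target network. The sketch below covers only those two pieces in plain Python and is not a complete DQN; the network is abstracted as a hypothetical callable `target_q(state)` returning a list of action values, and the class and function names are assumptions.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size experience replay buffer of (s, a, r, s_next, done) tuples."""

    def __init__(self, capacity, seed=0):
        self.buffer = deque(maxlen=capacity)  # oldest transitions drop out first
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # A uniform random minibatch breaks the correlation between consecutive steps
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def td_targets(batch, target_q, gamma=0.99):
    """DQN targets y = r + gamma * max_a' Q_target(s', a'), with no bootstrap at terminal states.

    target_q(state) should come from the *target* network, a periodically
    synchronized copy of the online network, so the targets do not shift
    with every gradient step.
    """
    targets = []
    for _state, _action, reward, next_state, done in batch:
        bootstrap = 0.0 if done else gamma * max(target_q(next_state))
        targets.append(reward + bootstrap)
    return targets
```

For instance, `td_targets([(0, 1, 1.0, 1, False)], lambda s: [0.0, 2.0], gamma=0.9)` yields a target of about 2.8 (1.0 + 0.9 * 2.0); the online network would then be trained to regress Q(s, a) toward these targets.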

-Policy-based
--Actor-Critic
--A3C
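Actor-Critic methods pair a policy (the actor) with a learned value function (the critic) whose TD error serves as the advantage signal for the policy gradient. A minimal one-step, tabular sketch follows, assuming a hypothetical 5-state corridor, step sizes, and names that are not from the source. A3C runs many such learners asynchronously in parallel with neural-network function approximation; this single-worker sketch does not attempt that.

```python
import math
import random

N_STATES = 5        # hypothetical corridor: states 0..4, state 4 is the goal
ACTIONS = (0, 1)    # 0 = move left, 1 = move right


def step(state, action):
    """Toy environment: reward 1.0 on reaching the goal, otherwise 0.0."""
    nxt = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def policy(theta, state):
    """Actor: softmax over the action preferences of one state."""
    prefs = theta[state]
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def actor_critic(episodes=500, alpha_actor=0.1, alpha_critic=0.1, gamma=0.9, seed=0):
    """One-step tabular Actor-Critic; the critic's TD error drives both updates."""
    rng = random.Random(seed)
    theta = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]  # actor parameters
    V = [0.0] * N_STATES                                       # critic: V(s)
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            probs = policy(theta, state)
            action = 0 if rng.random() < probs[0] else 1
            nxt, reward, done = step(state, action)
            # TD error: a one-sample estimate of the advantage of `action`
            target = reward if done else reward + gamma * V[nxt]
            delta = target - V[state]
            V[state] += alpha_critic * delta  # critic update
            for a in ACTIONS:
                # Actor update along delta * d log pi(action|state) / d theta[state][a]
                grad_log = (1.0 if a == action else 0.0) - probs[a]
                theta[state][a] += alpha_actor * delta * grad_log
            state = nxt
    return theta, V
```

After `theta, V = actor_critic()`, `policy(theta, s)` gives the learned action probabilities for each state; in this toy task the probability of moving right should rise in every non-terminal state.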

*References [#hef24735]
-Links to content - OSSコンソーシアム (OSS Consortium)~
https://www.osscons.jp/joy1y64w3-537
--Report on deep learning (reinforcement learning and deep reinforcement learning)~
https://1drv.ms/p/s!Amfs5caPP9r5kDBBIzKQ-QAF1tWl
--Notebooks on machine learning and deep learning~
https://github.com/OpenTouryoProject/DxCommon/tree/master/Notebook/Jupyter~
https://github.com/OpenTouryoProject/DxCommon/tree/master/Notebook/path

**YouTube [#dac5202f]

***AIcia Solid Project [#x77fa067]
Playlist: 強化学習の探検 (Exploring reinforcement learning) - YouTube~
https://www.youtube.com/playlist?list=PLhDAH9aTfnxI1OywfnxXCDTWGtYL2NxJR

***データサイエンス研究所 [#saeb93a9]
Playlist: 強化学習 (Reinforcement learning) - YouTube~
https://www.youtube.com/playlist?list=PL7BUpEjz_maQjfwIhAzkwxaLYIecfN7QP

**ゼロから作るDeep Learning [#ya905060]
-Reinforcement learning volume (強化学習編)~
https://www.oreilly.co.jp/books/9784873119755/

***Sample code [#v4432aa1]
https://github.com/oreilly-japan/deep-learning-from-scratch-4
-https://github.com/oreilly-japan/deep-learning-from-scratch-4/tree/master/ch01
-https://github.com/oreilly-japan/deep-learning-from-scratch-4/tree/master/ch02
-https://github.com/oreilly-japan/deep-learning-from-scratch-4/tree/master/ch03
-https://github.com/oreilly-japan/deep-learning-from-scratch-4/tree/master/ch04
-https://github.com/oreilly-japan/deep-learning-from-scratch-4/tree/master/ch05
-https://github.com/oreilly-japan/deep-learning-from-scratch-4/tree/master/ch06
-https://github.com/oreilly-japan/deep-learning-from-scratch-4/tree/master/ch07
-https://github.com/oreilly-japan/deep-learning-from-scratch-4/tree/master/ch08
-https://github.com/oreilly-japan/deep-learning-from-scratch-4/tree/master/notebooks

***Other references [#d6a5884b]
-ゼロつく4 - からっぽのしょこ [notes on ゼロつく4]~
https://www.anarchive-beta.com/archive/category/%E6%94%BB%E7%95%A5%E3%83%8E%E3%83%BC%E3%83%88-%E3%82%BC%E3%83%AD%E3%81%A4%E3%81%8F4

--Chapter 1: Bandit problems
---1.2: Bandit problems~
https://www.anarchive-beta.com/entry/2022/05/01/180000
---1.4.1: Implementing the slot machine~
https://www.anarchive-beta.com/entry/2022/05/02/180000
---1.4.2: Implementing the agent~
https://www.anarchive-beta.com/entry/2022/05/03/180000
---1.4.3-4: Learning in the bandit problem~
https://www.anarchive-beta.com/entry/2022/05/04/180000
---1.5.0: Implementing the slot machine for the non-stationary problem~
https://www.anarchive-beta.com/entry/2022/05/05/180000
---1.5.1: Implementing the agent for the non-stationary problem~
https://www.anarchive-beta.com/entry/2022/05/06/180000
---1.5.2: Learning in the non-stationary bandit problem~
https://www.anarchive-beta.com/entry/2022/05/07/180000

--Chapter 2: Markov decision processes
---2.2: Formulating the environment and the agent~
https://www.anarchive-beta.com/entry/2022/05/18/180000
---2.3: Return and the state-value function~
https://www.anarchive-beta.com/entry/2022/05/19/180000

--Chapter 3: Bellman equations
---3.1.1: Computing the expected reward~
https://www.anarchive-beta.com/entry/2022/05/20/180000
---3.1.2: Deriving the Bellman equation for the state-value function~
https://www.anarchive-beta.com/entry/2022/05/21/180000
---3.2.1: An example of the Bellman equation for the state-value function~
https://www.anarchive-beta.com/entry/2022/05/22/180000
---3.3.1: The action-value function~
https://www.anarchive-beta.com/entry/2022/05/25/180000
---3.3.2: Deriving the Bellman equation for the action-value function~
https://www.anarchive-beta.com/entry/2022/05/26/180000
---3.4: The Bellman optimality equation~
https://www.anarchive-beta.com/entry/2022/05/27/180000
---3.5.1: Applying the Bellman optimality equation~
https://www.anarchive-beta.com/entry/2022/05/28/180000
---3.5.2: The optimal policy~
https://www.anarchive-beta.com/entry/2022/05/29/180000

--Chapter 4: Dynamic programming
---4.1: Dynamic programming and policy evaluation~
https://www.anarchive-beta.com/entry/2022/06/03/190000
---4.2.1: Implementing the GridWorld class: evaluation and improvement methods~
https://www.anarchive-beta.com/entry/2022/06/05/190000
---4.2.1: Implementing the GridWorld class: visualization methods~
https://www.anarchive-beta.com/entry/2022/10/24/190000
---4.2.3: Implementing iterative policy evaluation~
https://www.anarchive-beta.com/entry/2022/06/07/190000
---4.3: Policy iteration~
https://www.anarchive-beta.com/entry/2022/06/08/190000
---4.4: Implementing policy iteration~
https://www.anarchive-beta.com/entry/2022/06/09/190000
---4.5.1: Deriving value iteration~
https://www.anarchive-beta.com/entry/2022/06/10/190000
---4.5.2: Implementing value iteration~
https://www.anarchive-beta.com/entry/2022/06/11/190000

--Chapter 5: Monte Carlo methods
---5.2: Policy evaluation with Monte Carlo methods~
https://www.anarchive-beta.com/entry/2022/10/25/190000
---5.3: Implementing Monte Carlo policy evaluation~
https://www.anarchive-beta.com/entry/2022/10/26/190000
---5.4.1-2: Implementing Monte Carlo policy control~
https://www.anarchive-beta.com/entry/2022/10/27/190000
---5.4.3-5: Implementing policy iteration with Monte Carlo methods~
https://www.anarchive-beta.com/entry/2022/10/28/190000
---5.5: Importance sampling~
https://www.anarchive-beta.com/entry/2022/11/02/190000
---Appendix A: Off-policy Monte Carlo methods~
https://www.anarchive-beta.com/entry/2022/11/03/190000

--Chapter 6: TD methods
---6.1: Policy evaluation with TD methods~
https://www.anarchive-beta.com/entry/2022/11/08/193000
---6.2: SARSA~
https://www.anarchive-beta.com/entry/2022/11/09/193000
---6.3: Off-policy SARSA~
https://www.anarchive-beta.com/entry/2022/11/10/193000
---6.4: Q-learning~
https://www.anarchive-beta.com/entry/2022/11/11/193000
---6.5: Q-learning with a sample model~
https://www.anarchive-beta.com/entry/2022/11/12/193000

--Chapter 7: Neural networks and Q-learning
---7.1.3: Gradient descent~
https://www.anarchive-beta.com/entry/2022/11/13/193000
---7.2: Linear regression~
https://www.anarchive-beta.com/entry/2022/11/14/193000
---7.3.1-3: Neural networks~
https://www.anarchive-beta.com/entry/2022/11/15/193000
---7.3.5: Optimizers (optimization methods)~
https://www.anarchive-beta.com/entry/2022/11/16/193000
---7.4: Q-learning and neural networks~
https://www.anarchive-beta.com/entry/2022/11/17/193000

--Chapter 8: DQN
---8.1: OpenAI Gym: Classic Control~
https://www.anarchive-beta.com/entry/2022/11/22/180000
---8.2: Core techniques of DQN~
https://www.anarchive-beta.com/entry/2022/11/26/200000