Data Mining (DM) - Python - RNN (Backup No. 4)


The ".NET 開発基盤部会 Wiki" is operated by "Open棟梁Project" and "OSSコンソーシアム .NET開発基盤部会".


Overview

RNN, LSTM

Details

Time-series prediction

Data

https://github.com/AileenNielsen/TimeSeriesAnalysisWithPython/blob/master/data/AirPassengers.csv

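Load

Download
```
import os
from urllib import request

url = 'https://raw.githubusercontent.com/AileenNielsen/TimeSeriesAnalysisWithPython/master/data/AirPassengers.csv'
os.makedirs('./work', exist_ok=True)  # urlretrieve does not create the target directory
request.urlretrieve(url, './work/AirPassengers.csv')
```

Read into a DataFrame
```
import pandas as pd

df = pd.read_csv('./work/AirPassengers.csv')
```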

Check

First rows
```
df.head()
```

Last rows
```
df.tail()
```

Rename columns
```
df.columns = ['Month', 'Passengers']
```

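Visualization
```
import numpy as np
import matplotlib.pyplot as plt

plt.plot(df['Passengers'])
plt.xticks(np.arange(0, 145, 12))  # one tick per year (144 monthly points)
plt.grid()
plt.show()
```

Decomposition into basic components
```
from statsmodels.tsa.seasonal import seasonal_decompose
sd = seasonal_decompose(df['Passengers'].values, period=12)  # period sets the cycle length (12 months)
sd.plot()
plt.show()
```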
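To make the decomposition step less of a black box, here is a minimal NumPy sketch of a classical additive decomposition (series = trend + seasonal + residual). It follows the textbook moving-average approach and is not a copy of the statsmodels implementation; the function name `decompose_additive` and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def decompose_additive(series, period):
    """Classical additive decomposition: series = trend + seasonal + resid."""
    series = np.asarray(series, dtype=float)
    n = len(series)
    # Centered moving average; for an even period the two end weights are halved.
    if period % 2 == 0:
        weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
    else:
        weights = np.ones(period) / period
    half = len(weights) // 2
    trend = np.full(n, np.nan)  # the edges have no full window, so they stay NaN
    trend[half:n - half] = np.convolve(series, weights, mode='valid')
    # Average the detrended values at each position in the cycle, then center them.
    detrended = series - trend
    means = np.array([np.nanmean(detrended[i::period]) for i in range(period)])
    means -= means.mean()
    seasonal = np.tile(means, n // period + 1)[:n]
    resid = series - trend - seasonal
    return trend, seasonal, resid

# Synthetic monthly data (hypothetical): linear trend plus a 12-month cycle.
t = np.arange(48)
demo = 100 + 2 * t + 10 * np.sin(2 * np.pi * t / 12)
trend, seasonal, resid = decompose_additive(demo, period=12)
```

On this synthetic input the trend recovers the straight line and the residual is close to zero wherever the moving average is defined.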

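Preparation

Type conversion
Convert to a type that Keras can handle (float32).
```
data = df['Passengers'].values.astype('f')
```

Normalization
- Without normalization, training does not go well.
- Both X and Y are normalized, so inference results are multiplied by scale to return to the original units.
```
scale = data.max()
data /= scale
```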
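The normalization above takes the maximum over the whole series, which includes the period later used for testing. A possible variant, sketched here under the assumption that a 70% training split is used, takes the scale from the training portion only and shows the inverse transform. The small array is illustrative and stands in for the full column.

```python
import numpy as np

# Small illustrative array standing in for df['Passengers'].
raw = np.array([112, 118, 132, 129, 121, 135, 148, 148, 136, 119], dtype='f')

train_size = int(len(raw) * 0.7)
scale = raw[:train_size].max()   # scale taken from the training portion only
normalized = raw / scale         # later values may exceed 1.0, which is acceptable

restored = normalized * scale    # inverse transform back to passenger counts
```

Dividing into a new array (rather than `data /= scale`) keeps the raw values available for later comparison.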

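Explanatory series, target series, and turning the target series into labels
In this case the explanatory series and the target series are the same series, so the target is simply the input shifted one step ahead.
```
x = data[:-1]
y = data[1:]
print('x:', len(x))
print('y:', len(y))
```

Shape conversion
Reshape the arrays for time-series analysis with Keras.
・ x : 3-D (batch, time steps, number of explanatory series)
・ y : 2-D (batch, target values)
```
print('x:', np.shape(x), ' y:', np.shape(y))
x = x.reshape(len(x), 1, 1)
y = y.reshape(len(y), 1)
print('x:', np.shape(x), ' y:', np.shape(y))
```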
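The reshape above gives every sample exactly one time step. As a sketch of how the same layout generalizes to longer input windows, the hypothetical helper `make_windows` below (not part of the original notes) builds x with shape (samples, time_steps, 1) and y with shape (samples, 1).

```python
import numpy as np

def make_windows(series, time_steps):
    """Return x with shape (samples, time_steps, 1) and y with shape (samples, 1)."""
    series = np.asarray(series, dtype='f')
    n_samples = len(series) - time_steps
    # Each sample holds time_steps consecutive values; its label is the next value.
    x = np.stack([series[i:i + time_steps] for i in range(n_samples)])
    y = series[time_steps:]
    return x.reshape(n_samples, time_steps, 1), y.reshape(n_samples, 1)

demo = np.arange(10, dtype='f')              # stand-in for the normalized series
x1, y1 = make_windows(demo, time_steps=1)    # same layout as x = data[:-1], y = data[1:]
x3, y3 = make_windows(demo, time_steps=3)    # three past values per sample
print(x1.shape, y1.shape)
print(x3.shape, y3.shape)
```

With a window longer than 1, the time-step entry of the LSTM input shape would have to be changed to match.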

Train/test split that preserves the time order
```
# Number of training samples
train_size = int(len(data) * 0.7)
# Split the data (no shuffling: the test samples all come after the training samples)
x_train = x[:train_size]
x_test = x[train_size:]
y_train = y[:train_size]
y_test = y[train_size:]
# Check the shapes
print('x_train:', x_train.shape)
print('x_test :', x_test.shape)
print('y_train:', y_train.shape)
print('y_test :', y_test.shape)
```

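Model

Defining the LSTM
- An LSTM layer with 30 units + a Dense layer = predicts a single value
- LSTM(units, batch_input_shape=(batch_size, time_step, input_dim))
  - units: number of nodes in the hidden layer (output dimension of the hidden layer)
  - batch_input_shape: shape of the input data
    ・batch size
    ・number of time steps
    ・input dimension (number of features, i.e. explanatory series)
```
from tensorflow.keras.models import Sequential  # assumes the Keras bundled with TensorFlow 2.x
from tensorflow.keras.layers import LSTM, Dense
from tensorflow.keras.optimizers import Adam

model = Sequential()
model.add(LSTM(30, batch_input_shape=(None, 1, 1)))  # LSTM with 30 hidden units
model.add(Dense(1))  # regression, so a single output value
```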

Compile
The loss function for this regression is the mean squared error (MSE).
```
model.compile(loss='mean_squared_error', optimizer=Adam())
```

Check
```
model.summary()
```
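As a cross-check for the parameter counts that `model.summary()` reports, the arithmetic can be sketched in plain Python. This assumes the standard Keras LSTM layout with biases enabled: four gates, each with input weights, recurrent weights and one bias vector. The helper names are made up for this example.

```python
def lstm_param_count(units, input_dim):
    """4 gates, each with input weights, recurrent weights and a bias."""
    return 4 * (units * input_dim + units * units + units)

def dense_param_count(inputs, outputs):
    """One weight per input-output pair plus one bias per output."""
    return inputs * outputs + outputs

lstm_params = lstm_param_count(units=30, input_dim=1)    # 4 * (30 + 900 + 30)
dense_params = dense_param_count(inputs=30, outputs=1)   # 30 weights + 1 bias
total_params = lstm_params + dense_params
print(lstm_params, dense_params, total_params)  # 3840 31 3871
```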

Training
```
batch_size = 20
n_epoch = 200
hist = model.fit(x_train,
                 y_train,
                 epochs=n_epoch,
                 validation_data=(x_test, y_test),
                 verbose=0,
                 batch_size=batch_size)
```

Inference
```
y_pred = model.predict(x)
```

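Function that undoes the output normalization
```
def pred_n_passengers(y_pred, scale, year, month):
    index = ((year - 1949) * 12) + (month - 1)  # position in data; the data starts at 1949/1
    # y_pred[i] is the forecast for data[i + 1], so shift by one (1949/1 itself has no forecast)
    return y_pred[index - 1] * scale  # undo the normalization
```

Predict the number of passengers for April 1960
```
year = 1960
month = 4
print("org : ", data[((year - 1949) * 12) + (month - 1)] * scale)
print("pred: ", pred_n_passengers(y_pred, scale, year, month))
```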
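Because the inputs are `data[:-1]` and the labels are `data[1:]`, the position of a month in `data` and the position of its forecast in `y_pred` differ by one. The sketch below spells out that mapping; `month_to_index` and `pred_index` are hypothetical helper names, and the 1949/1 start is taken from the comment in the code above.

```python
def month_to_index(year, month, start_year=1949, start_month=1):
    """Position of (year, month) in a monthly series starting at start_year/start_month."""
    return (year - start_year) * 12 + (month - start_month)

def pred_index(year, month):
    """Position in y_pred of the forecast for (year, month), given y_pred = model.predict(data[:-1])."""
    index = month_to_index(year, month)
    if index < 1:
        raise ValueError("the first month has no earlier value to predict it from")
    return index - 1

print(month_to_index(1960, 4), pred_index(1960, 4))  # 135 134
```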

Evaluation
Since this is a regression task, accuracy is not reported.
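In place of accuracy, error metrics such as RMSE and MAE are commonly reported for regression. The following is a minimal sketch, assuming the predictions and targets are still normalized and need to be multiplied by the scale factor to get errors in passengers. The numbers are hypothetical and are not results from the model above.

```python
import numpy as np

def regression_errors(y_true, y_pred, scale=1.0):
    """Return (RMSE, MAE) after multiplying both arrays by scale."""
    y_true = np.asarray(y_true, dtype=float).ravel() * scale
    y_pred = np.asarray(y_pred, dtype=float).ravel() * scale
    diff = y_pred - y_true
    return float(np.sqrt(np.mean(diff ** 2))), float(np.mean(np.abs(diff)))

# Hypothetical normalized values and scale factor, for illustration only.
y_true_demo = [0.50, 0.60, 0.70, 0.80]
y_pred_demo = [0.55, 0.55, 0.75, 0.70]
rmse, mae = regression_errors(y_true_demo, y_pred_demo, scale=622.0)
```

For the model above, a call along the lines of `regression_errors(y_test, model.predict(x_test), scale)` would restrict the metrics to the held-out period.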

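Plot actual and predicted values
```
plt.plot(data, color='blue')  # actual values
plt.plot(range(1, len(data)), y_pred, color='red')  # predictions; y_pred[i] forecasts data[i + 1]
plt.show()
```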

Plot the training history
```
def plot_history_loss(hist):
    plt.plot(hist.history['loss'], label="loss for training")
    plt.plot(hist.history['val_loss'], label="loss for validation")
    plt.title('model loss')
    plt.xlabel('epoch')
    plt.ylabel('loss')
    plt.legend(loc='best')
    plt.show()

plot_history_loss(hist)
```
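Besides plotting, the same history can be inspected numerically. A validation loss that starts rising while the training loss keeps falling is a common sign of overfitting, so it can help to find the epoch with the lowest validation loss. The sketch below assumes a dictionary with the same layout as `hist.history`; the helper `best_epoch` and the loss values are hypothetical.

```python
def best_epoch(history, key='val_loss'):
    """Return (1-based epoch, value) where history[key] is smallest."""
    values = history[key]
    index = min(range(len(values)), key=values.__getitem__)
    return index + 1, values[index]

# Hypothetical loss curves in the same layout as hist.history.
demo_history = {
    'loss':     [0.090, 0.040, 0.020, 0.015, 0.012],
    'val_loss': [0.120, 0.060, 0.035, 0.038, 0.045],
}
epoch, value = best_epoch(demo_history)
```

With the training run above this would be called as `best_epoch(hist.history)`. Note that the validation data there is the test set, so an epoch chosen this way is no longer an unbiased estimate of test performance.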

...

References

scikit-learn


TensorFlow / Keras