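R Random Forest Tutorial with Example

What is Random Forest in R?

Random forests are based on a simple idea: "the wisdom of the crowd". Aggregating the results of multiple predictors often gives a better prediction than the best individual predictor. A group of predictors is called an ensemble, so this technique is called ensemble learning.

In the earlier tutorial, you learned how to use decision trees to make a binary prediction. To improve the technique, you can train a group of decision tree classifiers, each on a different random subset of the training set. To make a prediction, you obtain the predictions of all the individual trees and predict the class that gets the most votes. This technique is called Random Forest.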
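The voting step can be sketched in a few lines of base R. The three "trees" and their predictions below are made-up values for illustration only; they do not come from a fitted model.

```r
# Hypothetical predictions from three trees for five passengers (made-up data).
tree_votes <- data.frame(
  tree_1 = c("Yes", "No", "No",  "Yes", "No"),
  tree_2 = c("Yes", "No", "Yes", "Yes", "No"),
  tree_3 = c("No",  "No", "Yes", "Yes", "Yes"),
  stringsAsFactors = FALSE
)

# Return the most frequent label among the votes for one observation.
majority_vote <- function(votes) {
  counts <- table(votes)
  names(counts)[which.max(counts)]
}

# Apply the vote row by row: one ensemble prediction per observation.
ensemble_prediction <- apply(tree_votes, 1, majority_vote)
print(ensemble_prediction)
```

With an odd number of trees and two classes there can be no tie, which keeps the sketch simple.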

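Step 1) Import the data

To make sure you have the same dataset as in the decision tree tutorial, the train set and the test set are stored on the internet. You can import them without making any change.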

library(dplyr)
data_train <- read.csv("https://raw.githubusercontent.com/guru99-edu/R-Programming/master/train.csv")
glimpse(data_train)
data_test <- read.csv("https://raw.githubusercontent.com/guru99-edu/R-Programming/master/test.csv") 
glimpse(data_test)

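Step 2) Train the model

One way to evaluate the performance of a model is to train it on a number of different smaller datasets and evaluate it on the remaining smaller test set. This is called K-fold cross-validation. R has a function to randomly split the data into folds of almost the same size. For example, if k=9, the model is trained on eight of the folds and tested on the remaining one. This process is repeated until every fold has been used as the test set. This technique is widely used for model selection, especially when the model has parameters to tune.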
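A minimal base R sketch of the fold assignment follows. The number of rows (90) is an assumed value chosen so that the folds divide evenly; in the tutorial itself the split is handled by caret rather than by hand.

```r
set.seed(1234)   # make the random split reproducible
n_obs <- 90      # hypothetical number of training rows
k <- 9           # number of folds

# Assign each row to one of k folds of (almost) equal size, in random order.
fold_id <- sample(rep(seq_len(k), length.out = n_obs))

for (fold in seq_len(k)) {
  test_rows  <- which(fold_id == fold)   # the held-out fold
  train_rows <- which(fold_id != fold)   # the other k - 1 folds
  cat("fold", fold, ": train on", length(train_rows),
      "rows, test on", length(test_rows), "rows\n")
}
```

Each row is used for testing exactly once and for training k - 1 times.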

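Now that you have a way to evaluate the model, you need to figure out how to choose the parameters that generalize best to the data.

Random forest chooses a random subset of features and builds many decision trees. The model then combines the predictions of all the decision trees: a majority vote for classification, an average for regression.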
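The two sources of randomness can be illustrated with base R alone. This is only a sketch: the feature names are taken from the variable importance output later in this tutorial, while the number of rows and the value of mtry are assumptions made for the example.

```r
set.seed(1234)
features <- c("pclass", "sex", "age", "sibsp", "parch", "fare", "embarked")
n_rows <- 10   # hypothetical size of the training set
mtry <- 2      # assumed number of candidate features per split

# 1) Each tree is grown on a bootstrap sample: rows drawn with replacement.
boot_rows <- sample(n_rows, size = n_rows, replace = TRUE)

# 2) At each split, only a random subset of mtry features is considered.
candidate_features <- sample(features, size = mtry)

print(boot_rows)
print(candidate_features)
```

Because the rows are drawn with replacement, some rows usually appear several times in boot_rows and others not at all.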

Random forest has some parameters that can be changed to improve the generalization of the prediction. You will use the function randomForest() to train the model.

The syntax for Random Forest is:

randomForest(formula, ntree=n, mtry=m, maxnodes = NULL)
Arguments:
- formula: Formula of the fitted model
- ntree: Number of trees in the forest
- mtry: Number of candidate variables drawn at each split. By default, for classification, it is the square root of the number of predictors.
- maxnodes: Set the maximum number of terminal nodes each tree in the forest can have
- importance=TRUE: Whether the importance of the independent variables in the random forest should be assessed
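As a small worked example of the mtry default, the helper below reproduces the rule documented for randomForest(): the floor of the square root of the number of predictors for classification, and one third of the predictors (at least 1) for regression. The function name default_mtry is hypothetical and is not part of any package.

```r
# Hypothetical helper mirroring the documented randomForest() defaults.
default_mtry <- function(p, classification = TRUE) {
  if (classification) {
    floor(sqrt(p))          # classification: floor(sqrt(p))
  } else {
    max(floor(p / 3), 1)    # regression: floor(p / 3), but at least 1
  }
}

default_mtry(7)                           # 7 predictors, as in this dataset
default_mtry(10, classification = FALSE)  # a regression example
```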

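Note: Random forest can be trained with more parameters. You can refer to the vignette to see the different parameters.

Tuning a model is very tedious work. There are many possible combinations of parameters, and you don't necessarily have the time to try all of them. A good alternative is to let the machine find the best combination for you. There are two methods available:

Random Search
Grid Search

We will define both methods, but during the tutorial we will train the model using grid search.

Grid Search definition

The grid search method is simple: the model is evaluated with cross-validation over all the combinations you pass to the function.

For instance, suppose you want to try the model with 10, 20 and 30 trees, and each number of trees is tested with mtry equal to 1, 2, 3, 4 and 5. The machine will then test 15 different models: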

    .mtry ntrees
 1      1     10
 2      2     10
 3      3     10
 4      4     10
 5      5     10
 6      1     20
 7      2     20
 8      3     20
 9      4     20
 10     5     20
 11     1     30
 12     2     30
 13     3     30
 14     4     30
 15     5     30
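The table above can be generated with expand.grid() from base R. The sketch below only builds and counts the combinations; the 10 folds are an assumption matching the control used later in this tutorial. Note that caret's "rf" method tunes only mtry through tuneGrid, which is why the number of trees is handled in a separate loop later on.

```r
# Every combination of the candidate values; the first column varies fastest.
grid <- expand.grid(.mtry = 1:5, ntrees = c(10, 20, 30))
print(grid)

n_models <- nrow(grid)          # number of parameter combinations
n_folds  <- 10                  # assumed 10-fold cross-validation
n_fits   <- n_models * n_folds  # total number of model fits
cat(n_models, "combinations,", n_fits, "fits with", n_folds, "folds\n")
```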

The algorithm will evaluate:

randomForest(formula, ntree=10, mtry=1)
randomForest(formula, ntree=10, mtry=2)
randomForest(formula, ntree=10, mtry=3)
randomForest(formula, ntree=20, mtry=2)
...

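Each time, the random forest is evaluated with cross-validation. One shortcoming of the grid search is the number of experiments: it can explode very quickly when the number of combinations is high. To overcome this issue, you can use random search.

Random Search definition

The big difference between random search and grid search is that random search does not evaluate every combination of hyperparameters in the search space. Instead, it randomly chooses a combination at each iteration. The advantage is a lower computational cost.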
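A simplified way to picture random search is to draw a few rows at random from the full grid instead of evaluating all of them. The sketch below uses base R only; the grid values and the budget of 5 candidates are assumptions for illustration, and real implementations may instead sample each parameter from a range.

```r
set.seed(1234)

# The full search space: 5 x 3 = 15 combinations.
full_grid <- expand.grid(mtry = 1:5, ntree = c(10, 20, 30))

# Random search: evaluate only a fixed budget of randomly chosen combinations.
n_candidates <- 5
picked <- full_grid[sample(nrow(full_grid), n_candidates), ]
print(picked)
```

Here only 5 of the 15 combinations would be trained, at the cost of possibly missing the best one.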

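Set the control parameter

You will proceed as follows to construct and evaluate the model:

Evaluate the model with the default setting
Find the best number of mtry
Find the best number of maxnodes
Find the best number of ntrees
Evaluate the model on the test dataset

Before you begin with the parameter exploration, you need to install two libraries.

caret: R machine learning library. If you installed R with r-essentials, it is already in the library.
- Anaconda: conda install -c r r-caret
e1071: R machine learning library.
- Anaconda: conda install -c r r-e1071

You can import them along with randomForest: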

library(randomForest)
library(caret)
library(e1071)

Default setting

K-fold cross-validation is controlled by the trainControl() function.

trainControl(method = "cv", number = n, search ="grid")
arguments
- method = "cv": The method used to resample the dataset. 
- number = n: Number of folds to create
- search = "grid": Use the grid search method. For the randomized method, use "random"
Note: You can refer to the vignette to see the other arguments of the function.

You can try to run the model with the default parameters and see the accuracy score.

Note: You will use the same controls throughout the tutorial.

# Define the control
trControl <- trainControl(method = "cv",
    number = 10,
    search = "grid")

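You will use the caret library to evaluate your model. The library has one function called train() that can evaluate almost all machine learning algorithms. In other words, you can use this function to train other algorithms as well.

The basic syntax is: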

train(formula, df, method = "rf", metric= "Accuracy", trControl = trainControl(), tuneGrid = NULL)
argument
- `formula`: Define the formula of the algorithm
- `method`: Define which model to train. Note, at the end of the tutorial, there is a list of all the models that can be trained
- `metric` = "Accuracy": Define how to select the optimal model
- `trControl = trainControl()`: Define the control parameters
- `tuneGrid = NULL`: A data frame with the combinations of tuning parameters to evaluate. If NULL, caret builds a default grid

Let's try to build the model with the default values.

set.seed(1234)
# Run the model
rf_default <- train(survived~.,
    data = data_train,
    method = "rf",
    metric = "Accuracy",
    trControl = trControl)
# Print the results
print(rf_default)

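Code Explanation

trainControl(method = "cv", number = 10, search = "grid"): Evaluate the model with a grid search under 10-fold cross-validation
train(...): Train a random forest model. The best model is chosen according to the accuracy measure.

Output: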

## Random Forest 
## 
## 836 samples
##   7 predictor
##   2 classes: 'No', 'Yes' 
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 753, 752, 753, 752, 752, 752, ... 
## Resampling results across tuning parameters:
## 
##   mtry  Accuracy   Kappa    
##    2    0.7919248  0.5536486
##    6    0.7811245  0.5391611
##   10    0.7572002  0.4939620
## 
## Accuracy was used to select the optimal model using  the largest value.
## The final value used for the model was mtry = 2.

The algorithm used 500 trees and tested three different values of mtry: 2, 6 and 10.

The final value used for the model was mtry = 2, with an accuracy of 0.79. Let's try to get a higher score.

Step 2) Search best mtry

You can test the model with values of mtry from 1 to 10.

set.seed(1234)
tuneGrid <- expand.grid(.mtry = c(1: 10))
rf_mtry <- train(survived~.,
    data = data_train,
    method = "rf",
    metric = "Accuracy",
    tuneGrid = tuneGrid,
    trControl = trControl,
    importance = TRUE,
    nodesize = 14,
    ntree = 300)
print(rf_mtry)

Code Explanation

tuneGrid <- expand.grid(.mtry = c(1:10)): Construct a data frame with the values 1 to 10 for mtry

The final value used for the model was mtry = 4.

Output:

## Random Forest 
## 
## 836 samples
##   7 predictor
##   2 classes: 'No', 'Yes' 
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 753, 752, 753, 752, 752, 752, ... 
## Resampling results across tuning parameters:
## 
##   mtry  Accuracy   Kappa    
##    1    0.7572576  0.4647368
##    2    0.7979346  0.5662364
##    3    0.8075158  0.5884815
##    4    0.8110729  0.5970664
##    5    0.8074727  0.5900030
##    6    0.8099111  0.5949342
##    7    0.8050918  0.5866415
##    8    0.8050918  0.5855399
##    9    0.8050631  0.5855035
##   10    0.7978916  0.5707336
## 
## Accuracy was used to select the optimal model using  the largest value.
## The final value used for the model was mtry = 4.

The best value of mtry is stored in:

rf_mtry$bestTune$mtry

You can store it and use it when you need to tune the other parameters.

max(rf_mtry$results$Accuracy)

Output:

## [1] 0.8110729

best_mtry <- rf_mtry$bestTune$mtry 
best_mtry

Output:

## [1] 4
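To see what bestTune and the maximum accuracy correspond to, the selection can be reproduced by hand. The sketch below copies the mtry and Accuracy columns from the output above into a plain data frame (the name cv_results is hypothetical) and picks the row with the largest accuracy.

```r
# Cross-validated accuracies copied from the output above.
cv_results <- data.frame(
  mtry = 1:10,
  Accuracy = c(0.7572576, 0.7979346, 0.8075158, 0.8110729, 0.8074727,
               0.8099111, 0.8050918, 0.8050918, 0.8050631, 0.7978916)
)

# Row with the highest accuracy: the same choice caret reports as bestTune.
best_row <- cv_results[which.max(cv_results$Accuracy), ]
print(best_row)
```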

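Step 3) Search the best maxnodes

You need to create a loop to evaluate the different values of maxnodes. In the following code, you will:

Create a list
Create a variable with the best value of the parameter mtry (compulsory)
Create the loop
Store the current value of maxnodes
Summarize the results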

store_maxnode <- list()
tuneGrid <- expand.grid(.mtry = best_mtry)
for (maxnodes in c(5: 15)) {
    set.seed(1234)
    rf_maxnode <- train(survived~.,
        data = data_train,
        method = "rf",
        metric = "Accuracy",
        tuneGrid = tuneGrid,
        trControl = trControl,
        importance = TRUE,
        nodesize = 14,
        maxnodes = maxnodes,
        ntree = 300)
    current_iteration <- toString(maxnodes)
    store_maxnode[[current_iteration]] <- rf_maxnode
}
results_mtry <- resamples(store_maxnode)
summary(results_mtry)

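Code explanation:

store_maxnode <- list(): The results of the models will be stored in this list
expand.grid(.mtry = best_mtry): Use the best value of mtry
for (maxnodes in c(5:15)) { … }: Compute the model with values of maxnodes from 5 to 15
maxnodes = maxnodes: For each iteration, maxnodes is equal to the current value of maxnodes, i.e. 5, 6, 7, …
current_iteration <- toString(maxnodes): Store the value of maxnodes as a string variable
store_maxnode[[current_iteration]] <- rf_maxnode: Save the result of the model in the list
resamples(store_maxnode): Arrange the results of the models
summary(results_mtry): Print the summary of all the combinations

Output: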

## 
## Call:
## summary.resamples(object = results_mtry)
## 
## Models: 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 
## Number of resamples: 10 
## 
## Accuracy 
##         Min.   1st Qu.    Median      Mean   3rd Qu.      Max. NA's
## 5  0.6785714 0.7529762 0.7903758 0.7799771 0.8168388 0.8433735    0
## 6  0.6904762 0.7648810 0.7784710 0.7811962 0.8125000 0.8313253    0
## 7  0.6904762 0.7619048 0.7738095 0.7788009 0.8102410 0.8333333    0
## 8  0.6904762 0.7627295 0.7844234 0.7847820 0.8184524 0.8433735    0
## 9  0.7261905 0.7747418 0.8083764 0.7955250 0.8258749 0.8333333    0
## 10 0.6904762 0.7837780 0.7904475 0.7895869 0.8214286 0.8433735    0
## 11 0.7023810 0.7791523 0.8024240 0.7943775 0.8184524 0.8433735    0
## 12 0.7380952 0.7910929 0.8144005 0.8051205 0.8288511 0.8452381    0
## 13 0.7142857 0.8005952 0.8192771 0.8075158 0.8403614 0.8452381    0
## 14 0.7380952 0.7941050 0.8203528 0.8098967 0.8403614 0.8452381    0
## 15 0.7142857 0.8000215 0.8203528 0.8075301 0.8378873 0.8554217    0
## 
## Kappa 
##         Min.   1st Qu.    Median      Mean   3rd Qu.      Max. NA's
## 5  0.3297872 0.4640436 0.5459706 0.5270773 0.6068751 0.6717371    0
## 6  0.3576471 0.4981484 0.5248805 0.5366310 0.6031287 0.6480921    0
## 7  0.3576471 0.4927448 0.5192771 0.5297159 0.5996437 0.6508314    0
## 8  0.3576471 0.4848320 0.5408159 0.5427127 0.6200253 0.6717371    0
## 9  0.4236277 0.5074421 0.5859472 0.5601687 0.6228626 0.6480921    0
## 10 0.3576471 0.5255698 0.5527057 0.5497490 0.6204819 0.6717371    0
## 11 0.3794326 0.5235007 0.5783191 0.5600467 0.6126720 0.6717371    0
## 12 0.4460432 0.5480930 0.5999072 0.5808134 0.6296780 0.6717371    0
## 13 0.4014252 0.5725752 0.6087279 0.5875305 0.6576219 0.6678832    0
## 14 0.4460432 0.5585005 0.6117973 0.5911995 0.6590982 0.6717371    0
## 15 0.4014252 0.5689401 0.6117973 0.5867010 0.6507194 0.6955990    0

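Accuracy tends to rise toward the largest values of maxnodes (the best mean accuracy, 0.8099, is at maxnodes = 14). You can try higher values to see if you can get a higher score.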

store_maxnode <- list()
tuneGrid <- expand.grid(.mtry = best_mtry)
for (maxnodes in c(20: 30)) {
    set.seed(1234)
    rf_maxnode <- train(survived~.,
        data = data_train,
        method = "rf",
        metric = "Accuracy",
        tuneGrid = tuneGrid,
        trControl = trControl,
        importance = TRUE,
        nodesize = 14,
        maxnodes = maxnodes,
        ntree = 300)
    key <- toString(maxnodes)
    store_maxnode[[key]] <- rf_maxnode
}
results_node <- resamples(store_maxnode)
summary(results_node)

Output:

## 
## Call:
## summary.resamples(object = results_node)
## 
## Models: 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30 
## Number of resamples: 10 
## 
## Accuracy 
##         Min.   1st Qu.    Median      Mean   3rd Qu.      Max. NA's
## 20 0.7142857 0.7821644 0.8144005 0.8075301 0.8447719 0.8571429    0
## 21 0.7142857 0.8000215 0.8144005 0.8075014 0.8403614 0.8571429    0
## 22 0.7023810 0.7941050 0.8263769 0.8099254 0.8328313 0.8690476    0
## 23 0.7023810 0.7941050 0.8263769 0.8111302 0.8447719 0.8571429    0
## 24 0.7142857 0.7946429 0.8313253 0.8135112 0.8417599 0.8690476    0
## 25 0.7142857 0.7916667 0.8313253 0.8099398 0.8408635 0.8690476    0
## 26 0.7142857 0.7941050 0.8203528 0.8123207 0.8528758 0.8571429    0
## 27 0.7023810 0.8060456 0.8313253 0.8135112 0.8333333 0.8690476    0
## 28 0.7261905 0.7941050 0.8203528 0.8111015 0.8328313 0.8690476    0
## 29 0.7142857 0.7910929 0.8313253 0.8087063 0.8333333 0.8571429    0
## 30 0.6785714 0.7910929 0.8263769 0.8063253 0.8403614 0.8690476    0
## 
## Kappa 
##         Min.   1st Qu.    Median      Mean   3rd Qu.      Max. NA's
## 20 0.3956835 0.5316120 0.5961830 0.5854366 0.6661120 0.6955990    0
## 21 0.3956835 0.5699332 0.5960343 0.5853247 0.6590982 0.6919315    0
## 22 0.3735084 0.5560661 0.6221836 0.5914492 0.6422128 0.7189781    0
## 23 0.3735084 0.5594228 0.6228827 0.5939786 0.6657372 0.6955990    0
## 24 0.3956835 0.5600352 0.6337821 0.5992188 0.6604703 0.7189781    0
## 25 0.3956835 0.5530760 0.6354875 0.5912239 0.6554912 0.7189781    0
## 26 0.3956835 0.5589331 0.6136074 0.5969142 0.6822128 0.6955990    0
## 27 0.3735084 0.5852459 0.6368425 0.5998148 0.6426088 0.7189781    0
## 28 0.4290780 0.5589331 0.6154905 0.5946859 0.6356141 0.7189781    0
## 29 0.4070588 0.5534173 0.6337821 0.5901173 0.6423101 0.6919315    0
## 30 0.3297872 0.5534173 0.6202632 0.5843432 0.6590982 0.7189781    0

The highest mean accuracy score (0.8135) is obtained with maxnodes equal to 24, tied with 27. The rest of the tutorial uses 24.
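The choice can be checked directly against the summary above. The sketch below copies the Mean column of the Accuracy table into a named vector and lists every value of maxnodes that reaches the maximum; preferring the smaller value when there is a tie is an assumption made here (smaller trees are cheaper), not something the output dictates.

```r
# Mean cross-validated accuracy per maxnodes value, copied from the output above.
mean_accuracy <- c(
  "20" = 0.8075301, "21" = 0.8075014, "22" = 0.8099254, "23" = 0.8111302,
  "24" = 0.8135112, "25" = 0.8099398, "26" = 0.8123207, "27" = 0.8135112,
  "28" = 0.8111015, "29" = 0.8087063, "30" = 0.8063253
)

# All maxnodes values that reach the best mean accuracy.
best <- names(mean_accuracy)[mean_accuracy == max(mean_accuracy)]
print(best)

# Assumed tie-break: keep the smallest, i.e. the simplest trees.
smallest_best <- min(as.integer(best))
print(smallest_best)
```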

Step 4) Search the best ntrees

Now that you have the best values of mtry and maxnodes, you can tune the number of trees. The method is exactly the same as for maxnodes.

store_maxtrees <- list()
for (ntree in c(250, 300, 350, 400, 450, 500, 550, 600, 800, 1000, 2000)) {
    set.seed(5678)
    rf_maxtrees <- train(survived~.,
        data = data_train,
        method = "rf",
        metric = "Accuracy",
        tuneGrid = tuneGrid,
        trControl = trControl,
        importance = TRUE,
        nodesize = 14,
        maxnodes = 24,
        ntree = ntree)
    key <- toString(ntree)
    store_maxtrees[[key]] <- rf_maxtrees
}
results_tree <- resamples(store_maxtrees)
summary(results_tree)

Output:

## 
## Call:
## summary.resamples(object = results_tree)
## 
## Models: 250, 300, 350, 400, 450, 500, 550, 600, 800, 1000, 2000 
## Number of resamples: 10 
## 
## Accuracy 
##           Min.   1st Qu.    Median      Mean   3rd Qu.      Max. NA's
## 250  0.7380952 0.7976190 0.8083764 0.8087010 0.8292683 0.8674699    0
## 300  0.7500000 0.7886905 0.8024240 0.8027199 0.8203397 0.8452381    0
## 350  0.7500000 0.7886905 0.8024240 0.8027056 0.8277623 0.8452381    0
## 400  0.7500000 0.7886905 0.8083764 0.8051009 0.8292683 0.8452381    0
## 450  0.7500000 0.7886905 0.8024240 0.8039104 0.8292683 0.8452381    0
## 500  0.7619048 0.7886905 0.8024240 0.8062914 0.8292683 0.8571429    0
## 550  0.7619048 0.7886905 0.8083764 0.8099062 0.8323171 0.8571429    0
## 600  0.7619048 0.7886905 0.8083764 0.8099205 0.8323171 0.8674699    0
## 800  0.7619048 0.7976190 0.8083764 0.8110820 0.8292683 0.8674699    0
## 1000 0.7619048 0.7976190 0.8121510 0.8086723 0.8303571 0.8452381    0
## 2000 0.7619048 0.7886905 0.8121510 0.8086723 0.8333333 0.8452381    0
## 
## Kappa 
##           Min.   1st Qu.    Median      Mean   3rd Qu.      Max. NA's
## 250  0.4061697 0.5667400 0.5836013 0.5856103 0.6335363 0.7196807    0
## 300  0.4302326 0.5449376 0.5780349 0.5723307 0.6130767 0.6710843    0
## 350  0.4302326 0.5449376 0.5780349 0.5723185 0.6291592 0.6710843    0
## 400  0.4302326 0.5482030 0.5836013 0.5774782 0.6335363 0.6710843    0
## 450  0.4302326 0.5449376 0.5780349 0.5750587 0.6335363 0.6710843    0
## 500  0.4601542 0.5449376 0.5780349 0.5804340 0.6335363 0.6949153    0
## 550  0.4601542 0.5482030 0.5857118 0.5884507 0.6396872 0.6949153    0
## 600  0.4601542 0.5482030 0.5857118 0.5884374 0.6396872 0.7196807    0
## 800  0.4601542 0.5667400 0.5836013 0.5910088 0.6335363 0.7196807    0
## 1000 0.4601542 0.5667400 0.5961590 0.5857446 0.6343666 0.6678832    0
## 2000 0.4601542 0.5482030 0.5961590 0.5862151 0.6440678 0.6656337    0
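Reading the table by eye is error-prone, so here is a small base R check on the Mean column of the Accuracy table above. It finds the number of trees with the best mean accuracy and also measures how far apart the best and worst settings are; the variable names are hypothetical.

```r
# Mean cross-validated accuracy per ntree value, copied from the output above.
ntree_accuracy <- c(
  "250" = 0.8087010, "300" = 0.8027199, "350" = 0.8027056, "400" = 0.8051009,
  "450" = 0.8039104, "500" = 0.8062914, "550" = 0.8099062, "600" = 0.8099205,
  "800" = 0.8110820, "1000" = 0.8086723, "2000" = 0.8086723
)

# Number of trees with the highest mean accuracy.
best_ntree <- as.integer(names(which.max(ntree_accuracy)))
print(best_ntree)

# Gap between the best and the worst setting.
accuracy_spread <- max(ntree_accuracy) - min(ntree_accuracy)
print(round(accuracy_spread, 4))
```

The gap is below one percentage point, which suggests that in this run the number of trees matters much less than mtry did.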

You have your final model. You can train the random forest with the following parameters:

ntree = 800: 800 trees will be trained
mtry = 4: 4 features are considered at each split
maxnodes = 24: Each tree has at most 24 terminal nodes (leaves)

fit_rf <- train(survived~.,
    data_train,
    method = "rf",
    metric = "Accuracy",
    tuneGrid = tuneGrid,
    trControl = trControl,
    importance = TRUE,
    nodesize = 14,
    ntree = 800,
    maxnodes = 24)

Step 5) Evaluate the model

The caret library has a function to make predictions.

predict(model, newdata= df)
argument
- `model`: Define the model evaluated before. 
- `newdata`: Define the dataset to make prediction

prediction <-predict(fit_rf, data_test)

You can use the prediction to compute the confusion matrix and see the accuracy score.

confusionMatrix(prediction, data_test$survived)

Output:

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  No Yes
##        No  110  32
##        Yes  11  56
##                                          
##                Accuracy : 0.7943         
##                  95% CI : (0.733, 0.8469)
##     No Information Rate : 0.5789         
##     P-Value [Acc > NIR] : 3.959e-11      
##                                          
##                   Kappa : 0.5638         
##  Mcnemar's Test P-Value : 0.002289       
##                                          
##             Sensitivity : 0.9091         
##             Specificity : 0.6364         
##          Pos Pred Value : 0.7746         
##          Neg Pred Value : 0.8358         
##              Prevalence : 0.5789         
##          Detection Rate : 0.5263         
##    Detection Prevalence : 0.6794         
##       Balanced Accuracy : 0.7727         
##                                          
##        'Positive' Class : No             
##

You have an accuracy of 0.7943 (79.43 percent), which is higher than with the default values.
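The headline statistics can be recomputed from the four counts in the confusion matrix, which helps in reading the output. The sketch below uses base R only and copies the counts from the output above; as in that output, "No" is treated as the positive class.

```r
# Counts copied from the confusion matrix above (rows: prediction, columns: reference).
cm <- matrix(c(110, 11, 32, 56), nrow = 2,
             dimnames = list(Prediction = c("No", "Yes"),
                             Reference  = c("No", "Yes")))

# Accuracy: share of all passengers classified correctly.
accuracy <- sum(diag(cm)) / sum(cm)

# Sensitivity: share of true "No" cases predicted as "No" (positive class).
sensitivity <- cm["No", "No"] / sum(cm[, "No"])

# Specificity: share of true "Yes" cases predicted as "Yes".
specificity <- cm["Yes", "Yes"] / sum(cm[, "Yes"])

print(round(c(accuracy = accuracy, sensitivity = sensitivity,
              specificity = specificity), 4))
```

The gap between sensitivity and specificity shows that the model is much better at identifying "No" than "Yes", something the accuracy alone hides.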

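Step 6) Visualize the result

Lastly, you can look at the feature importance with the function varImp(). It seems that the most important features are sex and age. That is not surprising, because important features are likely to appear closer to the root of a tree, while less important features often appear close to the leaves.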

varImp(fit_rf)

Output:

## rf variable importance
## 
##              Importance
## sexmale         100.000
## age              28.014
## pclassMiddle     27.016
## fare             21.557
## pclassUpper      16.324
## sibsp            11.246
## parch             5.522
## embarkedC         4.908
## embarkedQ         1.420
## embarkedS         0.000

Summary

The table below summarizes how to train and evaluate a random forest:

Library	Objective	Function	Parameter
randomForest	Create a random forest	randomForest()	formula, ntree = n, mtry = m, maxnodes = NULL
caret	Create K-fold cross-validation	trainControl()	method = "cv", number = n, search = "grid"
caret	Train a random forest	train()	formula, df, method = "rf", metric = "Accuracy", trControl = trainControl(), tuneGrid = NULL
caret	Predict out of sample	predict()	model, newdata = df
caret	Confusion matrix and statistics	confusionMatrix()	prediction, y test
caret	Variable importance	varImp()	model

Appendix

List of models used in caret

names(getModelInfo())

Output:

##   [1] "ada"                 "AdaBag"              "AdaBoost.M1"
##   [4] "adaboost"            "amdai"               "ANFIS"
##   [7] "avNNet"              "awnb"                "awtan"
##  [10] "bag"                 "bagEarth"            "bagEarthGCV"
##  [13] "bagFDA"              "bagFDAGCV"           "bam"
##  [16] "bartMachine"         "bayesglm"            "binda"
##  [19] "blackboost"          "blasso"              "blassoAveraged"
##  [22] "bridge"              "brnn"                "BstLm"
##  [25] "bstSm"               "bstTree"             "C5.0"
##  [28] "C5.0Cost"            "C5.0Rules"           "C5.0Tree"
##  [31] "cforest"             "chaid"               "CSimca"
##  [34] "ctree"               "ctree2"              "cubist"
##  [37] "dda"                 "deepboost"           "DENFIS"
##  [40] "dnn"                 "dwdLinear"           "dwdPoly"
##  [43] "dwdRadial"           "earth"               "elm"
##  [46] "enet"                "evtree"              "extraTrees"
##  [49] "fda"                 "FH.GBML"             "FIR.DM"
##  [52] "foba"                "FRBCS.CHI"           "FRBCS.W"
##  [55] "FS.HGD"              "gam"                 "gamboost"
##  [58] "gamLoess"            "gamSpline"           "gaussprLinear"
##  [61] "gaussprPoly"         "gaussprRadial"       "gbm_h3o"
##  [64] "gbm"                 "gcvEarth"            "GFS.FR.MOGUL"
##  [67] "GFS.GCCL"            "GFS.LT.RS"           "GFS.THRIFT"
##  [70] "glm.nb"              "glm"                 "glmboost"
##  [73] "glmnet_h3o"          "glmnet"              "glmStepAIC"
##  [76] "gpls"                "hda"                 "hdda"
##  [79] "hdrda"               "HYFIS"               "icr"
##  [82] "J48"                 "JRip"                "kernelpls"
##  [85] "kknn"                "knn"                 "krlsPoly"
##  [88] "krlsRadial"          "lars"                "lars2"
##  [91] "lasso"               "lda"                 "lda2"
##  [94] "leapBackward"        "leapForward"         "leapSeq"
##  [97] "Linda"               "lm"                  "lmStepAIC"
## [100] "LMT"                 "loclda"              "logicBag"
## [103] "LogitBoost"          "logreg"              "lssvmLinear"
## [106] "lssvmPoly"           "lssvmRadial"         "lvq"
## [109] "M5"                  "M5Rules"             "manb"
## [112] "mda"                 "Mlda"                "mlp"
## [115] "mlpKerasDecay"       "mlpKerasDecayCost"   "mlpKerasDropout"
## [118] "mlpKerasDropoutCost" "mlpML"               "mlpSGD"
## [121] "mlpWeightDecay"      "mlpWeightDecayML"    "monmlp"
## [124] "msaenet"             "multinom"            "mxnet"
## [127] "mxnetAdam"           "naive_bayes"         "nb"
## [130] "nbDiscrete"          "nbSearch"            "neuralnet"
## [133] "nnet"                "nnls"                "nodeHarvest"
## [136] "null"                "OneR"                "ordinalNet"
## [139] "ORFlog"              "ORFpls"              "ORFridge"
## [142] "ORFsvm"              "ownn"                "pam"
## [145] "parRF"               "PART"                "partDSA"
## [148] "pcaNNet"             "pcr"                 "pda"
## [151] "pda2"                "penalized"           "PenalizedLDA"
## [154] "plr"                 "pls"                 "plsRglm"
## [157] "polr"                "ppr"                 "PRIM"
## [160] "protoclass"          "pythonKnnReg"        "qda"
## [163] "QdaCov"              "qrf"                 "qrnn"
## [166] "randomGLM"           "ranger"              "rbf"
## [169] "rbfDDA"              "Rborist"             "rda"
## [172] "regLogistic"         "relaxo"              "rf"
## [175] "rFerns"              "RFlda"               "rfRules"
## [178] "ridge"               "rlda"                "rlm"
## [181] "rmda"                "rocc"                "rotationForest"
## [184] "rotationForestCp"    "rpart"               "rpart1SE"
## [187] "rpart2"              "rpartCost"           "rpartScore"
## [190] "rqlasso"             "rqnc"                "RRF"
## [193] "RRFglobal"           "rrlda"               "RSimca"
## [196] "rvmLinear"           "rvmPoly"             "rvmRadial"
## [199] "SBC"                 "sda"                 "sdwd"
## [202] "simpls"              "SLAVE"               "slda"
## [205] "smda"                "snn"                 "sparseLDA"
## [208] "spikeslab"           "spls"                "stepLDA"
## [211] "stepQDA"             "superpc"             "svmBoundrangeString"
## [214] "svmExpoString"       "svmLinear"           "svmLinear2"
## [217] "svmLinear3"          "svmLinearWeights"    "svmLinearWeights2"
## [220] "svmPoly"             "svmRadial"           "svmRadialCost"
## [223] "svmRadialSigma"      "svmRadialWeights"    "svmSpectrumString"
## [226] "tan"                 "tanSearch"           "treebag"
## [229] "vbmpRadial"          "vglmAdjCat"          "vglmContRatio"
## [232] "vglmCumulative"      "widekernelpls"       "WM"
## [235] "wsrf"                "xgbLinear"           "xgbTree"
## [238] "xyf"
