where $cor_{t_i, t_j} = correlation(predict_{t_i}, predict_{t_j})$ is the correlation between the prediction vectors of trees $t_i$ and $t_j$ (sketched in code at the end of this item).
\item $measure_3$
\end{itemize}
For their experiments, they use a breast cancer prognosis task. They reduce a forest of 100 trees to one of, on average, 26 trees while keeping the same error rate.
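A minimal sketch of the correlation term $cor_{t_i, t_j}$ defined above, assuming each tree's predictions over a fixed evaluation set are available as a numeric vector (the function and variable names are illustrative, not from the paper):
\begin{verbatim}
import numpy as np

def pairwise_prediction_correlation(predictions):
    # predictions: (n_trees, n_samples) array; row i is predict_{t_i},
    # the prediction vector of tree t_i on a fixed evaluation set.
    # Returns the (n_trees, n_trees) matrix whose (i, j) entry is
    # cor_{t_i, t_j}, the Pearson correlation between rows i and j.
    return np.corrcoef(np.asarray(predictions, dtype=float))

# Example: three trees, four samples (binary class labels).
preds = np.array([[1, 0, 1, 1],
                  [1, 1, 1, 0],
                  [0, 0, 1, 1]])
cor = pairwise_prediction_correlation(preds)  # cor[i, j] = cor_{t_i, t_j}
\end{verbatim}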
\item\cite{Fawagreh2015}: The goal is to obtain a much smaller forest while remaining accurate and diverse. To do so, they use a clustering algorithm. Let $C(t_i, T)=\{c_{i1}, \dots, c_{im}\}$ denote the vector of class labels obtained by having $t_i$ classify the training set $T$ of size $m$, with $t_i \in F$, where $F$ is the forest of size $n$. Let $\mathcal{C}=\bigcup^n_{i=1} C(t_i, T)$ be the super vector gathering the class vectors of all trees. They then apply a clustering algorithm to $\mathcal{C}$ to find $k =\sqrt{\frac{n}{2}}$ clusters. The final forest $F'$ is composed of the most representative tree of each cluster, so a forest of 100 trees clustered into 7 clusters is reduced to 7 trees. They obtained performance at least similar to that of the regular RF algorithm.
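A sketch of this clustering-based pruning, assuming $k$-means over the class-label vectors and taking, as the most representative tree of a cluster, the tree whose label vector is closest to the cluster centroid (that representativeness criterion is an assumption here, as is the scikit-learn-style estimator interface):
\begin{verbatim}
import numpy as np
from sklearn.cluster import KMeans

def prune_forest(trees, X_train):
    # Super vector: one row of predicted class labels C(t_i, T) per tree.
    C = np.array([t.predict(X_train) for t in trees], dtype=float)
    n = len(trees)
    k = max(1, int(round(np.sqrt(n / 2))))  # k = sqrt(n/2); 100 trees -> 7
    km = KMeans(n_clusters=k, n_init=10).fit(C)
    # Keep, per cluster, the tree closest to the centroid
    # (assumed notion of "most representative").
    pruned = []
    for c in range(k):
        members = np.flatnonzero(km.labels_ == c)
        dists = np.linalg.norm(C[members] - km.cluster_centers_[c], axis=1)
        pruned.append(trees[members[np.argmin(dists)]])
    return pruned
\end{verbatim}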