Commit 7d764f9c authored by farah.cherfaoui

add tree definition, and address part of Luc's remarks

parent 20c640a2
1 merge request: !7 Farah notation and related work
......@@ -34,52 +34,55 @@
introduce the problem and the motivations ...
\subsection{Notation}
Let $ X \in \mathbb{R}^{n \times d}$ \todo{is non-bold uppercase common for matrices?} be the data matrix and $Y \in \mathbb{R}^{n}$ be the label vector associated with the matrix $X$, where for each $i$, $\textbf{x}_i \in \mathcal{X} \subseteq \mathbb{R}^{d}$ and $y_i \in {\mathcal Y} \subseteq \mathbb{R}$. \\
Let $ \textbf{X} \in \mathbb{R}^{n \times d}$ \todo{is non-bold uppercase common for matrices?} be the data matrix and $\textbf{y} \in \mathbb{R}^{n}$ be the label vector associated with the matrix $\textbf{X}$, where for each $i$, $y_i \in {\mathcal Y}$. \\
A random forest $F_{t_1, \dots, t_l}$ \todo[inline]{possible notation confusion: non-bold uppercase: function or matrix? both, to be clarified} is a classifier made of a collection of $l$ trees $\{t_1, \dots, t_l\}$. This forest can be seen as a function, written as:
A random forest $F_{T_1, \dots, T_l}$ is a classifier made of a collection of $l$ trees $\{T_1, \dots, T_l\}$. A single tree and a forest can both be seen as functions. To define these tools, let us introduce ${\cal H}$ as the set of all possible trees:
\todo[inline]{possible notation confusion: non-bold uppercase: function or matrix? both, to be clarified}
%
$$ {\cal H} = \{T \ | \ T :\ {\cal X} \ \to \ {\cal Y}\}.$$
%
With this notation, a forest can be written as:
%
$$\begin{array}{ccccc}
F_{t_1, \dots, t_l} & : & \cal{X} & \to & \cal{Y} \\
& & \textbf{x} & \mapsto & F_{t_1, \dots, t_l}(\textbf{x}) = f(\{t_1, \dots, t_l\} , \textbf{x}) \\
F_{T_1, \dots, T_l} & : & \cal{X} & \to & \cal{Y} \\
& & \textbf{x} & \mapsto & F_{T_1, \dots, T_l}(\textbf{x}) = H(\{T_1, \dots, T_l\} , \textbf{x}) \\
\end{array}$$
%
where $f$ is a function which depends on the task\todo{f unclear: why introduce it?}. In a regression setup, where ${\cal Y} = \mathbb{R}$\todo{I don't think it is useful}, this function can be defined as:
where $H$ is a function which depends on the task\todo{f unclear: why introduce it?}. In a regression setup, where ${\cal Y} = \mathbb{R}$\todo{I don't think it is useful}, this function can be defined as:
%
$$f(\{t_1, \dots, t_l \} , \textbf{x}) = \sum_{i = 1}^{l} \alpha_i t_i(\textbf{x}) \ \text{ where } \alpha_i \in \mathbb{R},$$
$$H(\{T_1, \dots, T_l \} , \textbf{x}) = \sum_{i = 1}^{l} \alpha_i T_i(\textbf{x}) \ \text{ where } \alpha_i \in \mathbb{R},$$
%
while in a classification setup, in which ${\cal Y} = \{ c_1, \dots, c_m \}$, $f$ will be a majority vote function:
while in a classification setup, in which ${\cal Y} = \{ c_1, \dots, c_m \}$, $H$ will be a majority vote function:
%
$$f(\{t_1, \dots, t_l \} , \textbf{x}) = \argmax_{c \in {\cal Y}} \sum_{i = 1}^{l} \mathds{1}(t_i(\textbf{x}) = c).$$
$$H(\{T_1, \dots, T_l \} , \textbf{x}) = \argmax_{c \in {\cal Y}} \sum_{i = 1}^{l} \mathds{1}(T_i(\textbf{x}) = c),$$
where $\mathds{1}$ is the indicator function, which returns $1$ if its argument is true and $0$ otherwise.
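%
As a small illustration of the two aggregation rules above, with hypothetical values chosen only for this example, consider a forest of $l = 3$ trees. In the regression setup, assuming uniform weights $\alpha_i = 1/3$ and tree outputs $T_1(\textbf{x}) = 1$, $T_2(\textbf{x}) = 2$ and $T_3(\textbf{x}) = 3$, the forest would predict
%
$$H(\{T_1, T_2, T_3\} , \textbf{x}) = \frac{1}{3} (1 + 2 + 3) = 2.$$
%
In the classification setup, assuming $T_1(\textbf{x}) = c_1$, $T_2(\textbf{x}) = c_2$ and $T_3(\textbf{x}) = c_1$, the vote counts are
%
$$\sum_{i = 1}^{3} \mathds{1}(T_i(\textbf{x}) = c_1) = 2 \ \text{ and } \ \sum_{i = 1}^{3} \mathds{1}(T_i(\textbf{x}) = c_2) = 1,$$
%
so the majority vote would give $H(\{T_1, T_2, T_3\} , \textbf{x}) = c_1$.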
%
\todo{$\mathds{1}$ not defined}We \todo{no we}will need to define the prediction vector of a forest over the whole data matrix: $F_{t_1, \dots, t_l}(X) = \begin{pmatrix}
F_{t_1, \dots, t_l}(x_1) \\
\todo{$\mathds{1}$ not defined}We \todo{no we}will need to define the prediction vector of a forest over the whole data matrix: $F_{T_1, \dots, T_l}(\textbf{X}) = \begin{pmatrix}
F_{T_1, \dots, T_l}(\textbf{x}_1) \\
\vdots \\
F_{t_1, \dots, t_l}(x_n)
F_{T_1, \dots, T_l}(\textbf{x}_n)
\end{pmatrix}.$\\
%
%
%
All these notations are summarized in the following table:\\
All these notations are summarized in Table~\ref{table: notation}.
\begin{table}
\begin{tabular}{ l c }
%\hline
\textbf{x} & the vector x \\
$k$ & the desired (pruned) forest size \\
$X$ & the matrix $X$ \\
${\cal X}$ & the data representation space \\
${\cal Y}$ & the label representation space \\
lowercase & integer \\
bold lowercase& vector \\
bold capital & matrix \\
calligraphic letters & vector space \\
$F_{T_1, \dots, T_l}$ & a forest of $l$ trees \\
$F_{T_1, \dots, T_l}(\textbf{x}) \in {\cal Y}$ & the predicted label of \textbf{x} by the forest $F_{T_1, \dots, T_l}$ \\
$F_{T_1, \dots, T_l}(\textbf{X}) \in {\cal Y}^n$ & the predicted labels of all the data in $\textbf{X}$ by the forest $F_{T_1, \dots, T_l}$\\
$n$ & the number of data points \\
$d$ & the data dimension \\
$l$ & the forest size \\
$F_{t_1, \dots, t_l}$ & a forest of $l$ trees \\
$F_{t_1, \dots, t_l}(\textbf{x}) \in {\cal Y}$ & the predicted label of \textbf{x} by the forest $F_{t_1, \dots, t_l}$ \\
$F_{t_1, \dots, t_l}(X) \in {\cal Y}^n$ & the predicted labels of all the data in $X$ by the forest $F_{t_1, \dots, t_l}$\\
%\hline
$l$ & the initial forest size \\
$k$ & the desired (pruned) forest size \\
\end{tabular}
\caption{Notations}
\caption{Notations used in this paper}
\label{table: notation}
\end{table}\todo[inline]{add the notation conventions: bold lowercase: vector; non-bold uppercase: matrix, etc.}
......