Commit 3296bc2f authored by Franck Dary

Adding unknown forms & lemmas to dictionaries when decoding; this helps a lot with lemma prediction and word embeddings
parent 75386f69
@@ -97,7 +97,16 @@ int mcd_get_code(mcd *m, char *str, int col){
  if(!dic)
    return -1;
  // This ensures that unknown forms and lemmas are added to the dicos when decoding.
  // This is required to have access to word embeddings for words never seen in training,
  // and to predict the lemma from the form (rule-based lemma prediction) even if the form
  // was never seen in training.
  bool dico_current_add_policy = dico_get_add_unknown_strings();
  if(m->wf[col] == MCD_WF_FORM || m->wf[col] == MCD_WF_LEMMA)
    dico_set_add_unknown_strings();
  int code = dico_string2int(dic, str);
  if(!dico_current_add_policy)
    dico_unset_add_unknown_strings();
  return code;
}
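
The save, set, and restore pattern used in this hunk can be tried in isolation with the minimal sketch below. It is not the project's code: `toy_dico`, `toy_string2int`, `toy_get_code`, and the `toy_*_add_unknown` helpers are hypothetical stand-ins for the `dico_*` functions, and the `open_vocabulary` flag stands in for the `MCD_WF_FORM` / `MCD_WF_LEMMA` test. It assumes, as the patch suggests, that the add policy is a single global flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_DICO_MAX 16

/* Hypothetical stand-in for a dico: a tiny fixed-size string table. */
typedef struct {
    const char *entries[TOY_DICO_MAX];
    int size;
} toy_dico;

/* Global "add unknown strings" flag (assumed to mirror the global policy). */
static bool toy_add_unknown = false;

static bool toy_get_add_unknown(void)   { return toy_add_unknown; }
static void toy_set_add_unknown(void)   { toy_add_unknown = true; }
static void toy_unset_add_unknown(void) { toy_add_unknown = false; }

/* Return the code of str. Unknown strings are added only when the
 * global policy allows it (and there is room); otherwise return -1. */
static int toy_string2int(toy_dico *d, const char *str)
{
    assert(d != NULL && str != NULL);
    for (int i = 0; i < d->size; i++)
        if (strcmp(d->entries[i], str) == 0)
            return i;
    if (!toy_add_unknown || d->size >= TOY_DICO_MAX)
        return -1;
    d->entries[d->size] = str;  /* stores the pointer; caller keeps str alive */
    return d->size++;
}

/* Look up str, forcing insertion for open-vocabulary columns, and
 * switch the policy back off afterwards if it was off on entry. */
static int toy_get_code(toy_dico *d, const char *str, bool open_vocabulary)
{
    bool saved_policy = toy_get_add_unknown();
    if (open_vocabulary)
        toy_set_add_unknown();
    int code = toy_string2int(d, str);
    if (!saved_policy)
        toy_unset_add_unknown();
    return code;
}
```

With the policy initially off, a call such as `toy_get_code(&d, "chat", false)` on an empty table leaves the string unknown, while `toy_get_code(&d, "chat", true)` inserts it and leaves the global flag off again for later callers. Restoring only when the saved policy was off matches the hunk: if the flag was already on, it is left on.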