Similar to `to_int_from_vocab` above, but applies to a single `sent`
represented as a `TokenList`. If `lowercase` is set, sentence elements are
lowercased before being looked up in `vocab`.
"""
int_list = {}
for col_name in col_names:
  unk_tok_id = vocab[col_name].get(unk_token, None)
...
...
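The per-sentence lookup described in the docstring above can be sketched as follows. This is a minimal sketch under stated assumptions, not the reader's actual method. It assumes that each token is a plain `dict` keyed by column name, standing in for a `TokenList` entry, and that `vocab` maps each column name to a string-to-int dictionary. The name `to_int_sketch` is hypothetical. Lowercasing every looked-up value and raising `KeyError` when neither the value nor `unk_token` is in the vocabulary are also assumptions.

```python
from typing import Dict, List, Optional


def to_int_sketch(
    sent: List[Dict[str, str]],
    vocab: Dict[str, Dict[str, int]],
    unk_token: Optional[str] = None,
    col_names: Optional[List[str]] = None,
    lowercase: bool = False,
) -> Dict[str, List[int]]:
    """Hypothetical sketch: map each column of one sentence to integer ids."""
    if col_names is None:
        # Assumption: default to every column present in the vocabulary.
        col_names = list(vocab.keys())
    int_list: Dict[str, List[int]] = {}
    for col_name in col_names:
        # Fallback id for unseen values; None if the vocab has no unknown token.
        unk_tok_id = vocab[col_name].get(unk_token, None)
        ids = []
        for tok in sent:
            value = tok[col_name]
            if lowercase:
                # Assumption: lowercasing applies to every looked-up column.
                value = value.lower()
            tok_id = vocab[col_name].get(value, unk_tok_id)
            if tok_id is None:
                # Assumed behavior when neither the value nor unk_token is known.
                raise KeyError(f"{value!r} not in vocab[{col_name!r}] and no unk_token")
            ids.append(tok_id)
        int_list[col_name] = ids
    return int_list
```

For example, take `vocab = {"form": {"<unk>": 0, "le": 1, "petit": 2, "prince": 3}}` and the forms `Le Petit Prince de`. With `unk_token="<unk>"` and `lowercase=True`, the result is `{"form": [1, 2, 3, 0]}`. Without lowercasing, every capitalized form falls back to the unknown id `0`. Looking up the unknown id once per column, outside the token loop, mirrors the `unk_tok_id` line in the diff.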
@@ -239,6 +292,13 @@ class CoNLLUReader(object):
belonging to the same NE get the same int, and the first one gets a ":category" suffix).
The output has the category appended to 'B' and 'I' tags. `bio_style` can
be 'bio' or 'io'; the latter produces only 'I-category' tags, with no 'B' tags.
Example:
>>> test = \"\"\"# global.columns = ID FORM parseme:ne\n1\tLe\t1:PROD\n2\tPetit\t1\n3\tPrince\t1\n4\tde\t*\n5\tSaint-Exupéry\t2:PERS\n6\test\t*\n7\tentré\t*\n8\tà\t*\n9\tl'\t*\n10\tÉcole\t3:ORG\n11\tJules-Romains\t3\"\"\"