Tatiana BLADIER authored
# Compo Text Eval
## Metrics
Possible metrics for evaluating generated text:
- Distribution of sentence lengths
- Distribution of word lengths
- Distribution of POS tags
- Distribution of lexemes
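- Type/token ratio (TTR), Corrected TTR (CTTR), Measure of Textual Lexical Diversity (MTLD), D-value (Voc-D)
  - CTTR formula: $\mathrm{CTTR} = \frac{V}{\sqrt{2N}}$, where $V$ is the number of types (distinct tokens) and $N$ is the number of tokens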
- Lexical redundancy
  - Lexical redundancy formula:
- Frequency of passive voice
- Frequency of adverbs
- Percentages of verbs, adjectives, and adverbs
- Verb / noun ratio
- Distribution of tree depths
- Distribution of syntactic functions
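The sentence-length and word-length distributions listed above can be computed with a few lines of standard-library Python. This is a minimal sketch, assuming a naive regex sentence splitter (split after `.`, `!` or `?`) and `\w+` as the word definition; both are illustrative stand-ins for a real tokenizer, and the function names are hypothetical.

```python
import re
from collections import Counter

# Naive splitters (an assumption for illustration, not a real tokenizer).
SENT_SPLIT = re.compile(r"(?<=[.!?])\s+")
WORD = re.compile(r"\w+")


def sentence_lengths(text: str) -> list[int]:
    """Number of word tokens in each sentence."""
    sentences = [s for s in SENT_SPLIT.split(text.strip()) if s]
    return [len(WORD.findall(s)) for s in sentences]


def word_lengths(text: str) -> list[int]:
    """Number of characters in each word token."""
    return [len(w) for w in WORD.findall(text)]


def distribution(values: list[int]) -> dict[int, float]:
    """Relative frequency of each value, keyed in ascending order."""
    counts = Counter(values)
    total = sum(counts.values())
    return {k: counts[k] / total for k in sorted(counts)}
```

For `"The cat sat. The dog barked loudly!"`, `sentence_lengths` gives `[3, 4]`, and `distribution(word_lengths(...))` maps length 3 to 5/7 and length 6 to 2/7. Normalizing to relative frequencies lets texts of different sizes be compared directly.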
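A minimal sketch of three of the lexical-diversity measures listed above, operating on an already-tokenized text. CTTR follows $V / \sqrt{2N}$. MTLD is implemented in its usual bidirectional form: a "factor" ends when the running TTR falls to a threshold (0.72 is the commonly used default), the leftover segment counts as a partial factor, and the score is tokens divided by factors, averaged over a forward and a backward pass. Returning `inf` when no factor completes is an assumption of this sketch; Voc-D is omitted because it needs random sampling and curve fitting.

```python
import math


def ttr(tokens: list[str]) -> float:
    """Type/token ratio: distinct tokens / total tokens (tokens must be non-empty)."""
    return len(set(tokens)) / len(tokens)


def cttr(tokens: list[str]) -> float:
    """Corrected TTR: types / sqrt(2 * tokens)."""
    return len(set(tokens)) / math.sqrt(2 * len(tokens))


def _mtld_pass(tokens: list[str], threshold: float) -> float:
    """One directional MTLD pass: number of tokens divided by number of factors."""
    factors = 0.0
    types: set[str] = set()
    count = 0
    for tok in tokens:
        types.add(tok)
        count += 1
        # A factor ends as soon as the running TTR falls to the threshold.
        if len(types) / count <= threshold:
            factors += 1
            types, count = set(), 0
    if count:
        # The leftover segment contributes a partial factor.
        factors += (1 - len(types) / count) / (1 - threshold)
    # No factor at all (e.g. every token distinct): undefined, returned as inf here.
    return len(tokens) / factors if factors else math.inf


def mtld(tokens: list[str], threshold: float = 0.72) -> float:
    """Bidirectional MTLD: mean of the forward and backward passes."""
    return (_mtld_pass(tokens, threshold) + _mtld_pass(tokens[::-1], threshold)) / 2
```

Tokenization and lowercasing change all three scores, so generated and reference texts should be preprocessed identically before comparing them.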
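The POS-based metrics above (POS distribution, adverb frequency, percentages of verbs, adjectives and adverbs, verb/noun ratio) all reduce to counting tags. This sketch assumes the text has already been tagged with Universal Dependencies UPOS tags (for example spaCy's `token.pos_` or Stanza's `upos`), one tag per token; the function name and the choice to count only `NOUN` (not `PROPN`) as a noun are assumptions.

```python
from collections import Counter


def pos_metrics(tags: list[str]) -> dict:
    """POS-based metrics from one UPOS tag per token (tags must be non-empty)."""
    counts = Counter(tags)
    total = len(tags)

    def share(tag: str) -> float:
        # Counter returns 0 for missing tags without inserting them.
        return counts[tag] / total

    return {
        # Relative frequency of every tag: the POS distribution.
        "distribution": {tag: n / total for tag, n in counts.items()},
        # Adverb frequency as a share of all tokens.
        "adverb_freq": share("ADV"),
        "pct_verbs": 100 * share("VERB"),
        "pct_adjectives": 100 * share("ADJ"),
        "pct_adverbs": 100 * share("ADV"),
        # Only NOUN counts as a noun here; NaN when the text has no nouns.
        "verb_noun_ratio": counts["VERB"] / counts["NOUN"] if counts["NOUN"] else float("nan"),
    }
```

Because every value is a proportion or ratio, the results are comparable across generated and reference texts of different lengths.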
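Tree depth, syntactic functions and passive voice can all be read off a dependency parse. This is a sketch under a hypothetical input format: each sentence is a list of `(head, deprel)` pairs in token order, CoNLL-U style, with 1-based heads and 0 marking the root. Passive voice is detected through the Universal Dependencies `:pass` subtypes (`nsubj:pass`, `csubj:pass`, `aux:pass`); counting depth in nodes, so that a lone root has depth 1, is a convention chosen here.

```python
from collections import Counter

# Hypothetical format: (head, deprel) per token; heads are 1-based, 0 = root.
Sentence = list[tuple[int, str]]


def tree_depth(sent: Sentence) -> int:
    """Nodes on the longest root-to-leaf path (a lone root has depth 1).

    Assumes a well-formed tree; a cycle in the heads raises RecursionError.
    """
    depths: dict[int, int] = {}

    def depth(i: int) -> int:
        if i not in depths:
            head = sent[i - 1][0]
            depths[i] = 1 if head == 0 else depth(head) + 1
        return depths[i]

    return max(depth(i) for i in range(1, len(sent) + 1))


def deprel_distribution(sents: list[Sentence]) -> dict[str, float]:
    """Relative frequency of each syntactic function (UD deprel) over all tokens."""
    counts = Counter(rel for sent in sents for _, rel in sent)
    total = sum(counts.values())
    return {rel: n / total for rel, n in counts.items()}


def passive_rate(sents: list[Sentence]) -> float:
    """Share of sentences containing at least one UD passive relation."""
    passive = sum(any(rel.endswith(":pass") for _, rel in sent) for sent in sents)
    return passive / len(sents)
```

The depth distribution over a whole text is then `Counter(tree_depth(s) for s in sents)`. Keeping full deprel labels (with subtypes) in the distribution is a choice; splitting on `:` would merge, for example, `nsubj` and `nsubj:pass`.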
We can also evaluate next-token prediction. If the gold next token $w_{i+1}$ following the sequence $w_{1}, \dots, w_{i}$ is known, we can take the $n$ most probable next tokens predicted by the model and find the position of the gold token among them.
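A minimal sketch of the gold-token position check described above. It assumes, hypothetically, that the model's next-token distribution at position $i$ is available as a dict from candidate token to probability (with a real language model this would come from a softmax over the logits); `gold_rank` and `rank_distribution` are illustrative names, not an existing API.

```python
from collections import Counter
from typing import Optional


def gold_rank(next_token_probs: dict[str, float], gold: str, n: int) -> Optional[int]:
    """1-based position of `gold` among the n most probable next tokens, else None."""
    # Sort candidates by probability; equal probabilities keep insertion order.
    top_n = sorted(next_token_probs, key=next_token_probs.get, reverse=True)[:n]
    return top_n.index(gold) + 1 if gold in top_n else None


def rank_distribution(steps: list[tuple[dict[str, float], str]], n: int) -> Counter:
    """Count gold positions over many prediction steps; key None collects misses."""
    return Counter(gold_rank(probs, gold, n) for probs, gold in steps)
```

With `probs = {"mat": 0.5, "floor": 0.3, "sofa": 0.15, "moon": 0.05}`, the gold token `"sofa"` sits at position 3 of the top 3, while `"moon"` falls outside it and yields `None`. Collecting these positions over every step of a text gives the distribution of gold-token ranks.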