    Compo Text Eval

    Metrics

    Possible metrics for evaluating a generated text:

    • Distribution of sentence lengths
    • Distribution of word lengths
    • Distribution of POS tags
    • Distribution of lexemes
    • Type / token ratio (TTR), Corrected TTR (CTTR), Measure of Textual Lexical Diversity (MTLD), D-value (Voc-D)

    CTTR formula:
    $$\text{CTTR} = \frac{\text{Number of Types}}{\sqrt{2 \times \text{Number of Tokens}}}$$

    • Lexical redundancy

    Lexical Redundancy formula:
    $$\text{Lexical Redundancy} = 1 - \frac{\text{Number of Types}}{\text{Number of Tokens}}$$
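
The type/token ratio, CTTR, and lexical redundancy formulas above can be computed directly from a token list. The following is a minimal sketch, not this project's implementation: the helper names `tokenize` and `lexical_metrics` are hypothetical, and the lowercase regex tokenizer is an assumption, since the README does not specify how tokens and types are counted.

```python
import math
import re


def tokenize(text):
    """Naive lowercase word tokenizer (an assumption, not the project's tokenizer)."""
    return re.findall(r"\w+", text.lower())


def lexical_metrics(tokens):
    """Return TTR, CTTR, and lexical redundancy for a list of tokens."""
    n_tokens = len(tokens)
    if n_tokens == 0:
        raise ValueError("need at least one token")
    n_types = len(set(tokens))  # types = distinct tokens
    ttr = n_types / n_tokens
    return {
        "ttr": ttr,
        # CTTR = types / sqrt(2 * tokens)
        "cttr": n_types / math.sqrt(2 * n_tokens),
        # Lexical redundancy = 1 - types / tokens
        "lexical_redundancy": 1 - ttr,
    }


# 11 tokens, 8 distinct types
metrics = lexical_metrics(tokenize("the cat sat on the mat and the dog sat too"))
```

Because TTR falls as a text grows longer, comparing texts of different lengths on TTR alone can mislead; CTTR divides by the square root of twice the token count to reduce that effect.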

    • Frequency of passive voice
    • Frequency of adverbs
    • Percentages of verbs, adjectives, and adverbs
    • Verb / noun ratio
    • Distribution of tree depths
    • Distribution of syntactic functions
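
Of the distributions listed above, sentence lengths and word lengths need no linguistic annotation, so they can be illustrated with the standard library alone. The sketch below is hypothetical: `length_distributions` is an illustrative name, and the punctuation-based sentence splitter and regex word tokenizer are simplifying assumptions. The POS, passive voice, tree depth, and syntactic function metrics would need a tagger or parser, which is not shown here.

```python
import re
from collections import Counter


def length_distributions(text):
    """Count sentence lengths (in words) and word lengths (in characters)."""
    # Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    sentence_lengths = Counter(len(re.findall(r"\w+", s)) for s in sentences)
    word_lengths = Counter(len(w) for w in re.findall(r"\w+", text))
    return sentence_lengths, word_lengths


sent_dist, word_dist = length_distributions("The cat sat. It purred loudly on the mat!")
```

Each `Counter` maps a length to the number of sentences or words with that length, so two texts can be compared by normalizing the counts and comparing the resulting histograms.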

    We can also evaluate next-token prediction:
    If the gold token $w_{i+1}$ following the sequence $w_{1}, \dots, w_{i}$ is known, we can take the n most probable next tokens predicted by the model and record the rank of the gold token among them.
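
As a rough illustration of this idea, the sketch below ranks the gold token within a model's top-n candidates. It assumes the model's prediction is available as a plain dictionary mapping tokens to probabilities; the function names `gold_rank` and `mean_reciprocal_rank` are hypothetical, and aggregating the ranks with mean reciprocal rank is one possible choice, not something the README prescribes.

```python
def gold_rank(probs, gold, n):
    """1-based rank of `gold` among the n most probable tokens, or None if absent."""
    top_n = sorted(probs, key=probs.get, reverse=True)[:n]
    return top_n.index(gold) + 1 if gold in top_n else None


def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all positions; a missing gold token contributes 0."""
    return sum(1 / r if r else 0.0 for r in ranks) / len(ranks)


# Hypothetical predicted distribution for one position in the text.
probs = {"mat": 0.5, "sofa": 0.3, "roof": 0.15, "moon": 0.05}
ranks = [gold_rank(probs, "sofa", 3), gold_rank(probs, "moon", 3), gold_rank(probs, "mat", 3)]
score = mean_reciprocal_rank(ranks)
```

A rank of 1 means the model's most probable token was the gold token; `None` means the gold token fell outside the top n.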