From c1e0fccf246fa73f8e756b4cda7afa0c3b01366a Mon Sep 17 00:00:00 2001
From: Benoit Favre <benoit.favre@lif.univ-mrs.fr>
Date: Wed, 30 Nov 2016 09:39:54 +0100
Subject: [PATCH] add README

---
 README | 61 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 61 insertions(+)
 create mode 100644 README

diff --git a/README b/README
new file mode 100644
index 0000000..3917d3b
--- /dev/null
+++ b/README
@@ -0,0 +1,61 @@
+SUMMARIZATION MODULE
+====================
+This module generates synopses from the conversation you give it. It learns
+features from annotations on previous synopses and applies them to a new
+conversation. This file lists what the module needs and what to change if you
+want to improve the method or adapt it to your data.
+
+HOW TO
+======
+See example.py
+
+NEEDS
+=====
+- Python 2.7
+- Icsiboost
+
+FILES & ANNOTATIONS
+===================
+source/syn.annot
+----------------
+Synopsis annotations look like this:
+topic conversation_ID Annotator text <a class="instance" variable="$SLOT_NAME"
+style="color:cyan" title="$SLOT_NAME" href="#"> slot_value </a> text.
+
+/!\ If you create new slot names, make sure to update icsiboost.names /!\
+
+conversation files
+------------------
+The TSV format was defined for storing Decoda annotations. Each word has the
+following fields:
+<filename> <global-wordnum> <wordnum-in-sentence> <word> NULL <postag> NULL NULL
+<dependency-label> <governor-wordnum> <text-id> <lemma> <morphology> <speaker>
+0.0 0.0 0.0 _ <mention> <features> <coreference-label>
+
+predsyn.py
+----------
+Generates the data needed to learn the features.
+For each phrase, it outputs:
+- name of the conversation
+- word (text)
+- postag
+- lemma
+- named entity
+- parent
+- parent pos
+- dependency label
+- topic
+- length
+- sentence number
+- speaker
+
+summarizer.py
+-------------
+Generates the final synopsis based on all the previous data.
+  example:
+    import summarizer
+    #conv = summarizer.Word()
+    summarizer.summarize(conversation, threshold, convID)
+
+  threshold is the fixed value from which to start looking for results. In our
+  experiments, 0.02 was optimal for the icsiboost system.
-- 
GitLab