SUMMARIZATION MODULE
====================
This module generates synopses for the conversations you give it. It learns
features from annotations of previous synopses and applies them to new
conversations. This file describes the requirements and what you need to
change if you want to improve the method or adapt it to your data.

HOW TO
======
See example.py

REQUIREMENTS
============
- Python 2.7
- Icsiboost

FILES & ANNOTATIONS
===================
source/syn.annot
----------------
The synopsis annotations look like this:
topic conversation_ID annotator text <a class="instance" variable="$SLOT_NAME"
style="color:cyan" title="$SLOT_NAME" href="#"> slot_value </a> text.

/!\ If you create new slot names, make sure to update icsiboost.names. /!\
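
A hypothetical example of one annotated line; the topic, conversation ID,
annotator and slot values are invented for illustration, only the markup
follows the format above:

    PROBLEM_SCHEDULE conv_0001 annotator1 The caller asks about
    <a class="instance" variable="line" style="color:cyan" title="line"
    href="#"> line 14 </a> schedules at night.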

conversations files
-------------------
The TSV format was defined for storing Decoda annotations; each word has the
following fields:
<filename> <global-wordnum> <wordnum-in-sentence> <word> NULL <postag> NULL NULL
<dependency-label> <governor-wordnum> <text-id> <lemma> <morphology> <speaker>
0.0 0.0 0.0 _ <mention> <features> <coreference-label>
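
A minimal parsing sketch, assuming the fields are tab-separated and appear in
exactly the order listed above; the field names and the parse_tsv_line helper
are our own naming, not part of the module:

    # field names chosen by us to mirror the list above; the three 0.0 columns
    # and the "_" column are kept as unused placeholders
    FIELDS = ["filename", "global_wordnum", "wordnum_in_sentence", "word",
              "null_1", "postag", "null_2", "null_3", "dependency_label",
              "governor_wordnum", "text_id", "lemma", "morphology", "speaker",
              "unused_1", "unused_2", "unused_3", "unused_4",
              "mention", "features", "coreference_label"]

    def parse_tsv_line(line):
        """Split one word line of a conversation file into a field dict."""
        values = line.rstrip("\n").split("\t")
        return dict(zip(FIELDS, values))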

predsyn.py
----------
Generates the data you will need to learn the features.
For each phrase, it gives you (see the sketch after the list):
- name of the conversation
- word (text)
- postag
- lemma
- named entity
- parent
- parent pos
- dependency label
- topic
- length
- sentence number
- speaker
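
A sketch of how one such feature record could be represented in Python; the
WordFeatures container and the sample values are hypothetical, only the field
list mirrors the enumeration above:

    from collections import namedtuple

    # hypothetical container mirroring the feature list above
    WordFeatures = namedtuple("WordFeatures", [
        "conversation", "word", "postag", "lemma", "named_entity",
        "parent", "parent_pos", "dependency_label", "topic",
        "length", "sentence_number", "speaker",
    ])

    # invented sample values, for illustration only
    sample = WordFeatures("conv_0001", "ticket", "NOUN", "ticket", "O",
                          "buy", "VERB", "obj", "TRANSPORT",
                          6, 3, "caller")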

summarizer.py
-------------
Generates the final synopsis based on all the previous data.
    example:
        import summarizer
        #conv = summarizer.Word()
        summarizer.summarize(conversation, threshold, convID)

    Here, threshold is a fixed score value from which results are considered.
    In our experiments, 0.02 was optimal with the icsiboost system.
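
A slightly fuller usage sketch, assuming summarize returns the synopsis text;
load_conversation is a hypothetical placeholder for the loading code shown in
example.py:

    import summarizer

    # hypothetical loading step; see example.py for the real one
    conversation = load_conversation("conv_0001.tsv")

    # 0.02 was the best threshold for icsiboost in our experiments
    synopsis = summarizer.summarize(conversation, 0.02, "conv_0001")
    print(synopsis)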