SUMMARIZATION MODULE
====================
This module generates a synopsis from the conversation you give it. It learns
features from annotations of previous synopses and applies them to a new
conversation. This file describes the requirements and what you need to change
if you want to improve the method or adapt it to your data.
HOW TO
======
See example.py
NEEDS
=====
- Python 2.7
- Icsiboost
FILES & ANNOTATIONS
===================
source/syn.annot
----------------
The synopses are annotated as follows:
topic conversation_ID Annotator text <a class="instance" variable="$SLOT_NAME"
style="color:cyan" title="$SLOT_NAME" href="#"> slot_value </a> text.
/!\ If you create new slot names, make sure to update icsiboost.names /!\
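The slot annotations above can be pulled out of a synopsis line with a small
regular expression. This is a minimal sketch, not part of the module: the tag
attributes in the pattern are taken from the example format above, and the
sample line (conversation ID, slot name, slot value) is invented for
illustration.

```python
import re

# Match one annotated slot: <a class="instance" variable="NAME" ...> value </a>
SLOT_RE = re.compile(
    r'<a class="instance" variable="([^"]+)"[^>]*>\s*(.*?)\s*</a>')

def extract_slots(line):
    """Return a list of (slot_name, slot_value) tuples found in `line`."""
    return SLOT_RE.findall(line)

# Hypothetical annotated synopsis line in the format described above.
line = ('topic conv_001 A1 caller asks about '
        '<a class="instance" variable="DESTINATION" style="color:cyan" '
        'title="DESTINATION" href="#"> Marseille </a> ticket.')
print(extract_slots(line))  # [('DESTINATION', 'Marseille')]
```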
conversations files
-------------------
The tsv format was defined for storing Decoda annotations; each word has the
following fields:
<filename> <global-wordnum> <wordnum-in-sentence> <word> NULL <postag> NULL NULL
<dependency-label> <governor-wordnum> <text-id> <lemma> <morphology> <speaker>
0.0 0.0 0.0 _ <mention> <features> <coreference-label>
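One simple way to read this format is to split each line on tabs and zip the
values with the column names listed above. This is a sketch, not the module's
own loader; the field names are mine (the fixed columns NULL, 0.0 and _ are
kept but ignored), and the sample line is invented for illustration.

```python
# Column names mirroring the field list above; "_..." marks fixed/unused columns.
TSV_FIELDS = [
    "filename", "global_wordnum", "wordnum_in_sentence", "word", "_null1",
    "postag", "_null2", "_null3", "dependency_label", "governor_wordnum",
    "text_id", "lemma", "morphology", "speaker", "_f1", "_f2", "_f3",
    "_underscore", "mention", "features", "coreference_label",
]

def parse_tsv_line(line):
    """Parse one tab-separated word line into a dict keyed by field name."""
    values = line.rstrip("\n").split("\t")
    return dict(zip(TSV_FIELDS, values))

# Hypothetical word line with all 21 columns.
sample = "\t".join([
    "conv_001", "12", "3", "bonjour", "NULL", "INT", "NULL", "NULL",
    "root", "0", "1", "bonjour", "_", "spk1", "0.0", "0.0", "0.0",
    "_", "_", "_", "_",
])
tok = parse_tsv_line(sample)
print(tok["word"], tok["postag"], tok["speaker"])  # bonjour INT spk1
```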
predsyn.py
----------
Generates the data you need to learn the features.
For each phrase it gives you:
- name of the conversation
- word (text)
- postag
- lemma
- named entity
- parent
- parent pos
- dependency label
- topic
- length
- sentence number
- speaker
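The per-phrase feature rows listed above can be held in a simple named record.
This is one possible in-memory representation, not the structure predsyn.py
actually emits; the field names mirror the list above and the sample values
are invented.

```python
from collections import namedtuple

# One feature row per phrase, fields matching the list above.
PhraseFeatures = namedtuple("PhraseFeatures", [
    "conversation", "word", "postag", "lemma", "named_entity",
    "parent", "parent_pos", "dependency_label", "topic",
    "length", "sentence_number", "speaker",
])

# Hypothetical row for illustration only.
row = PhraseFeatures("conv_001", "billet", "NOM", "billet", "O",
                     "acheter", "VER", "obj", "transport", 6, 2, "caller")
print(row.postag)  # NOM
```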
summarizer.py
-------------
Generates the final synopsis from all the previous data.
example:
import summarizer
#conv = summarizer.Word()
summarizer.summarize(conversation, threshold, convID)
Here, threshold is a fixed score cutoff below which candidate results are
ignored. In our experiments the optimal value was 0.02 for the icsiboost
system.
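The threshold above acts as a score cutoff on the classifier's candidates. A
minimal sketch of that filtering step, assuming a hypothetical list of
(phrase, score) pairs coming out of the icsiboost classifier; this is not
summarizer.py's actual internal logic.

```python
# Keep only the candidate phrases whose classifier score reaches the cutoff
# (0.02 was the optimal value in the authors' icsiboost experiments).
def select_candidates(scored_phrases, threshold=0.02):
    return [phrase for phrase, score in scored_phrases if score >= threshold]

# Hypothetical (phrase, score) pairs for illustration.
scored = [("le client appelle", 0.15), ("euh", 0.005), ("billet perdu", 0.04)]
print(select_candidates(scored))  # ['le client appelle', 'billet perdu']
```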