utils.annotation_scripts package

Submodules

utils.annotation_scripts.annotation_CRF module

annotation_CRF.py: generation of synthetic annotations for the wapiti CRF classifier.

utils.annotation_scripts.annotation_CRF.annotate_CRF_test(n, test, temp_folder)

Returns a wapiti-formatted version of the test entities for the CRF classifier.

Args:

n (int): step identifier.
test (list): test entities.
temp_folder (str): path to the directory for temporary files.

Returns:

n_sentences_test (int): number of sequences in the test database.
test_entities_indices (list): indices and identifiers of the entities of interest in the test database.
test_file (str): path to the formatted test data.
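
This page does not show the structure of the entities or of the written file. As a hedged illustration of the kind of layout wapiti reads, the sketch below assumes each sequence is a list of (token, label) pairs and writes one token per line, columns separated by a space, the label in the last column, and a blank line after each sequence. The helper `to_wapiti_text` and the input structure are hypothetical; the number of sequences it writes plays the role of `n_sentences_test`.

```python
from typing import List, Tuple

# Hypothetical entity structure: a sequence is a list of (token, label) pairs.
Sequence = List[Tuple[str, str]]


def to_wapiti_text(sequences: List[Sequence]) -> Tuple[str, int]:
    """Serialize sequences in a wapiti-style column layout (hypothetical helper).

    Returns the text and the number of sequences written.
    """
    lines: List[str] = []
    for seq in sequences:
        for token, label in seq:
            # Whitespace separates columns, so tokens must not contain spaces.
            lines.append(f"{token} {label}")  # the label goes in the last column
        lines.append("")  # a blank line closes each sequence
    return "\n".join(lines) + "\n", len(sequences)


text, n_sentences = to_wapiti_text([[("Paris", "L1"), ("is", "O")], [("Rome", "L2")]])
print(n_sentences)  # 2
print(text, end="")
```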

utils.annotation_scripts.annotation_CRF.annotate_CRF_train(n, train, distrib, n_labels, to_annotate, temp_folder, preclustering)

Returns a synthetic annotation of the data for the (wapiti) CRF classifier.

Args:

n (int): iteration identifier.
train (list): training entities.
distrib (str): type of the synthetic annotation.
n_labels (int): random max number of synthetic labels for this step.
to_annotate (list): for UNI annotation, indices of the entities that receive their own class.
with_common_label_wordform (bool, optional): if True, all occurrences of the same entity wordform receive the same label. Defaults to False.
temp_folder (str): path to the directory for temporary files.

Returns:

n_unique_labels_used (int): number of synthetic labels that were actually used.
n_sentences_train (int): number of sequences in the training database.
n_entities_train (int): number of entities in the training database.
train_file (str): path to the formatted train data.
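
How each `distrib` value builds its annotation is not described on this page. As an illustration only, the sketch below implements one simple scheme: each entity receives a label drawn uniformly at random among `n_labels` synthetic labels, and the number of distinct labels that actually occur is counted, which is what `n_unique_labels_used` reports. Because the draw is random, that count can be smaller than `n_labels`. The helper name, the `L0`, `L1` label names, and the uniform scheme itself are assumptions.

```python
import random
from typing import List, Optional, Tuple


def uniform_synthetic_labels(
    n_entities: int, n_labels: int, seed: Optional[int] = None
) -> Tuple[List[str], int]:
    """Draw one synthetic label per entity, uniformly among n_labels labels.

    Hypothetical scheme: returns the labels and the number of distinct
    labels actually used (at most min(n_entities, n_labels)).
    """
    rng = random.Random(seed)  # seeded generator for reproducible runs
    labels = [f"L{rng.randrange(n_labels)}" for _ in range(n_entities)]
    return labels, len(set(labels))


labels, n_unique_labels_used = uniform_synthetic_labels(5, 10, seed=42)
print(labels, n_unique_labels_used)
```

With 5 entities and 10 candidate labels, at most 5 labels can be used, so `n_unique_labels_used` is necessarily below `n_labels` here.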

utils.annotation_scripts.annotation_DT module

annotation_DT.py: generation of synthetic annotations for weka J48 decision tree classifiers.

utils.annotation_scripts.annotation_DT.annotate_DT_test(n, test, features, fake_class)

Returns a weka-formatted version of the test entities for the DT classifier.

Args:

n (int): step identifier.
test (list): test entities.
features (list): pattern for the feature selection.
fake_class: placeholder weka class assigned to every test entity so that the data fits the weka format.

Returns:

n_sentences_test (int): number of sequences in the test database.
test_entities (list): indices and identifiers of the entities of interest in the test database.
test_file (str): path to the formatted test data.
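
weka reads ARFF files: a `@relation` line, one `@attribute` declaration per column, then comma-separated rows after `@data`. The sketch below, assuming each test entity has already been reduced to a list of nominal feature values, writes such a file and puts `fake_class` in the class column of every row. The helper names, the `f0`, `f1` attribute names, and the quoting rule in `quote` are hypothetical (the behaviour of `weka_compatible_string` is not documented here). For weka to apply a trained model, the test header must declare the same attributes and values as the training file; the sketch derives the header from the test rows only to stay short.

```python
from typing import List


def quote(value: str) -> str:
    """Single-quote an ARFF value, escaping backslashes and single quotes."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


def to_arff(rows: List[List[str]], classes: List[str], fake_class: str) -> str:
    """Build an ARFF document whose class column is fake_class on every row."""
    n_features = len(rows[0]) if rows else 0
    # The placeholder class must be a declared value of the class attribute.
    class_values = classes if fake_class in classes else classes + [fake_class]
    out = ["@relation test_entities", ""]
    for i in range(n_features):
        values = sorted({row[i] for row in rows})  # nominal values seen in column i
        declared = ",".join(quote(v) for v in values)
        out.append(f"@attribute f{i} {{{declared}}}")
    class_decl = ",".join(quote(c) for c in class_values)
    out.append(f"@attribute class {{{class_decl}}}")
    out += ["", "@data"]
    for row in rows:
        # Feature values followed by the same placeholder class on every row.
        out.append(",".join(quote(v) for v in row + [fake_class]))
    return "\n".join(out) + "\n"


print(to_arff([["a", "x"], ["b", "x"]], ["C0", "C1"], "C0"), end="")
```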

utils.annotation_scripts.annotation_DT.annotate_DT_train(n, train, distrib, n_labels, features, to_annotate, temp_folder, preclustering)

Returns a synthetic annotation of the data (train + test) for the (weka) DT classifier.

Args:

n (int): step identifier.
train (list): training entities.
distrib (str): type of the synthetic annotation.
n_labels (int): random max number of synthetic labels for this step.
features (list): pattern for the feature selection.
to_annotate (list): for UNI annotation, indices of the entities that receive their own class.
with_common_label_wordform (bool, optional): if True, all occurrences of the same entity wordform receive the same label. Defaults to False.
temp_folder (str): path to the directory for temporary files.

Returns:

N (int): random max number of synthetic labels for this step.
n_unique_labels_used (int): number of synthetic labels that were actually used.
n_sentences_train (int): number of sequences in the training database.
n_entities_train (int): number of entities in the training database.
train_file (str): path to the formatted train data.

utils.annotation_scripts.annotation_DT.weka_compatible_string(s)

utils.annotation_scripts.annotation_DT.weka_format(n, wekadata, train_length, test_length, temp_folder, verbose)

Formats the input weka file to be compatible with the J48 tree and splits it into a train file and a test file.

Args:

n (int): iteration identifier.
wekadata (str): input weka file.
train_length (int): number of attributes in train.
test_length (int): number of attributes in test.
temp_folder (str): path to temporary folder.
verbose (int): verbosity level.

utils.annotation_scripts.annotation_HTK module

annotation_HTK.py: generation of synthetic annotations for HTK HMM classifiers.

utils.annotation_scripts.annotation_HTK.annotate_HTK_test(n, test)

Returns an HTK-formatted version of the test entities for the HTK classifier.

Args:

n (int): step identifier.
test (list): test entities.

Returns:

n_sentences_test (int): number of sequences in the test database.
test_entities (list): indices and identifiers of the entities of interest in the test database.
test_file (str): path to the formatted test data.

utils.annotation_scripts.annotation_HTK.annotate_HTK_train(n, train, distrib, n_labels, temp_folder, preclustering)

Returns a synthetic annotation of the data (train + test) for the (HTK) HMM classifier.

Args:

n (int): step identifier.
train (list): training entities.
distrib (str): type of the synthetic annotation.
n_labels (int): random max number of synthetic labels for this step.
temp_folder (str): path to the directory for temporary files.

Returns:

n_unique_labels_used (int): number of synthetic labels that were actually used.
n_entities_train (int): number of entities in the training database.
train_file (str): path to the formatted train data.
mlf (str): HTK master label file.
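
The `mlf` return value is an HTK master label file. In that format the file starts with the line `#!MLF!#`; each transcription is then introduced by a quoted file-name pattern, lists one label per line, and ends with a line containing a single period. The sketch below assumes a mapping from a hypothetical label-file name to its sequence of synthetic labels and writes labels without start and end times; the helper name and the `*/<name>.lab` patterns are assumptions, not taken from this module.

```python
from typing import Dict, List


def write_mlf(entries: Dict[str, List[str]]) -> str:
    """Build an HTK master label file from {name: [label, ...]} (hypothetical helper).

    Labels are written without time stamps, one per line.
    """
    lines = ["#!MLF!#"]  # mandatory MLF header
    for name, labels in entries.items():
        lines.append(f'"*/{name}.lab"')  # pattern matched against label file names
        lines.extend(labels)
        lines.append(".")  # a lone period terminates this transcription
    return "\n".join(lines) + "\n"


print(write_mlf({"e1": ["L0", "L2"], "e2": ["L1"]}), end="")
```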
