utils.classification_scripts package

Submodules

utils.classification_scripts.basic_hmm module

basic_hmm.py. Generates a basic initial HMM for HTK training.

utils.classification_scripts.basic_hmm.generate_basic_hmm(features_type, components, name, output_folder, n_state=12, hmm_type=1)

On-the-fly generation of an initial HMM.

Args:
  • features_type (str): HTK features target kind.
  • components (int): number of base components.
  • name (str): HMM/model name.
  • output_folder (str): folder where to output the HMM.
  • n_state (int, optional): number of emitting states in the HMM (not counting initial and final states). Defaults to 12.
  • hmm_type (int, optional): determines the HMM topology to use (1: basic left/right; 2: left/right1/right2). Defaults to 1.
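
To make the n_state and hmm_type parameters more concrete, the sketch below builds a transition matrix for a left-to-right HMM with non-emitting entry and exit states, the layout HTK prototype definitions use. It is an illustration only, not the code of generate_basic_hmm: the helper name left_right_transitions, the uniform initial probabilities, and the reading of type 2 as "stay, advance by one, or skip one state" are all assumptions.

```python
def left_right_transitions(n_state=12, hmm_type=1):
    """Return an (n_state + 2) x (n_state + 2) row-stochastic transition matrix.

    Hypothetical helper: row 0 is the non-emitting entry state and the last
    row is the non-emitting exit state, which has no outgoing transitions.
    """
    if hmm_type not in (1, 2):
        raise ValueError("hmm_type must be 1 or 2")
    total = n_state + 2
    matrix = [[0.0] * total for _ in range(total)]
    matrix[0][1] = 1.0  # the entry state always moves to the next state
    max_jump = hmm_type  # assumption: type 1 advances by 1, type 2 by 1 or 2
    for i in range(1, total - 1):
        # Allowed targets: self-loop plus forward jumps that stay inside the matrix.
        targets = [j for j in range(i, i + max_jump + 1) if j < total]
        for j in targets:
            matrix[i][j] = 1.0 / len(targets)
    return matrix


# Print a small 3-state example for each topology.
for kind in (1, 2):
    for row in left_right_transitions(n_state=3, hmm_type=kind):
        print(" ".join(f"{p:.2f}" for p in row))
    print()
```

Uniform starting values are a common choice for an initial model because training re-estimates them anyway; the real function may initialize them differently.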

utils.classification_scripts.classification_CRF module

classify_CRF.py. For training and applying a wapiti CRF.

utils.classification_scripts.classification_CRF.label_CRF(model, test, test_entities_indices, classification_params, verbose=1)

Labels a testing set using a wapiti CRF classifier and returns the resulting entities.

Args:
  • model: model built from training the classifier.
  • test: formatted testing set.
  • test_entities_indices (list): location of the interesting entities in the test dataset.
  • classification_params (dict): additional classification parameters.
  • verbose (int, optional): controls verbosity level. Defaults to 1.
Returns:
  • result_iter: a generator expression on the result
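
Because result_iter is described as a generator expression, it can be traversed only once. The sketch below uses a stand-in generator to show the usual pattern of materializing the results in a list when they are needed more than once. The real label_CRF needs wapiti and a trained model, so fake_result_iter and its string items are purely hypothetical.

```python
def fake_result_iter(labels):
    """Hypothetical stand-in for the generator returned by a labelling function."""
    return (label.upper() for label in labels)


result_iter = fake_result_iter(["person", "location"])
entities = list(result_iter)   # consumes the generator
leftover = list(result_iter)   # a second pass over the same generator is empty
print(entities, leftover)
```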

utils.classification_scripts.classification_CRF.train_CRF(n, train, temp_folder, classification_params, verbose=1, debug=False)

Trains a CRF classifier with wapiti and returns the resulting model.

Args:
  • n (int): step number.
  • train: annotated training set (structure may depend on the classifier).
  • temp_folder (str): path to the directory for storing temporary files.
  • classification_params (dict): additional classification parameters.
  • verbose (int, optional): controls verbosity level. Defaults to 1.
  • debug (bool, optional): if False, removes the temporary files that were created. Defaults to False.
Returns:
  • model: model built by the classifier from the given training set.
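
The temp_folder argument and the debug flag in the signature suggest a create-then-clean-up pattern for temporary files. A minimal sketch of one way a caller might manage such a folder follows. working_folder is a hypothetical helper, not part of this package, and the file written inside it is only a placeholder for whatever the training step produces.

```python
import os
import shutil
import tempfile
from contextlib import contextmanager


@contextmanager
def working_folder(debug=False):
    """Yield a fresh temporary directory; delete it afterwards unless debug is True."""
    path = tempfile.mkdtemp(prefix="classification_")
    try:
        yield path
    finally:
        if not debug:
            shutil.rmtree(path, ignore_errors=True)


with working_folder() as temp_folder:
    # A call such as train_CRF(n, train, temp_folder, classification_params)
    # would go here; a placeholder file stands in for its temporary output.
    scratch_file = os.path.join(temp_folder, "train.txt")
    with open(scratch_file, "w") as handle:
        handle.write("example\n")
```

Passing debug=True to the helper keeps the folder on disk so its contents can be inspected afterwards.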

utils.classification_scripts.classification_DT module

classify.py. For training and applying a weka decision tree classifier on the artificially annotated data set.

utils.classification_scripts.classification_DT.label_DT(model, test, test_entities_indices, verbose=1)

Labels a testing set using a weka decision tree and returns the resulting entities.

Args:
  • model: model built from training the classifier.
  • test: formatted testing set.
  • test_entities_indices (list): location of the interesting entities in the test dataset.
  • verbose (int, optional): controls verbosity level. Defaults to 1.
Returns:
  • result_iter: a generator expression on the result

utils.classification_scripts.classification_DT.train_DT(n, train, temp_folder, classification_params, verbose=1)

Trains a decision tree classifier with weka and returns the resulting model.

Args:
  • n (int): step number.
  • train: annotated training set (structure may depend on the classifier).
  • temp_folder (str): path to the directory for storing temporary files.
  • classification_params (list): additional classification parameters.
  • verbose (int, optional): controls verbosity level. Defaults to 1.
Returns:
  • model: model built by the classifier from the given training set.

utils.classification_scripts.classification_HTK module

classify.py. For training and applying a classifier on the artificially annotated data set.

utils.classification_scripts.classification_HTK.label_HTK(n, hmmdef_file, hmmlist_file, wnet_file, dic_file, test, test_entities_indices, temp_folder, verbose=1, debug=False)

Labels a testing set using an HTK HMM classifier and returns the resulting entities.

Args:
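  • n (int): step number.
  • hmmdef_file: HMMs master file.
  • hmmlist_file: list of HMMs in the model.
  • wnet_file: HTK word network file.
  • dic_file: HTK dictionary file.
  • test: formatted testing set.
  • test_entities_indices (list): location of the interesting entities in the test dataset.
  • temp_folder (str): path to the directory for storing temporary files.
  • verbose (int, optional): controls verbosity level. Defaults to 1.
  • debug (bool, optional): if True, some outputs are kept in the temporary directory. Defaults to False.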
Returns:
  • result_iter: a generator expression on the result

utils.classification_scripts.classification_HTK.train_HTK(n, train, temp_folder, classification_params, verbose=1, debug=False)

Trains an HMM classifier with HTK and returns the resulting model.

Args:
  • n (int): step number.
  • train: annotated training set (structure may depend on the classifier).
  • temp_folder (str): path to the directory for storing temporary files.
  • classification_params (list): additional classification parameters.
  • verbose (int, optional): controls verbosity level. Defaults to 1.
  • debug (bool, optional): if True, some outputs are kept in the temporary directory. Defaults to False.
Returns:
  • hmmdef: HMMs master file.
  • hmmlist: list of HMMs in the model.

Module contents