evaluation_clustering module

evaluation_clustering.py. Main script for running the clustering process and evaluating its results.

Usage:
python evaluation_clustering.py [1] -i [2] -p [3] -t [4] -cfg [5] --mcl --help
where:

[1] : input similarity matrix (unnormalized similarities or pre-treated MCL format). The script expects an ‘exp_configuration.ini’ file in the same folder as the matrix; this file is usually generated when using main.py.

[2] -i: MCL inflation parameter. Defaults to 1.4.

[3] -p: MCL pre-inflation parameter. Defaults to 1.0.

[4] -t: number of cores to use for MCL.

[5] -cfg: provide a custom configuration file to replace ‘exp_configuration.ini’.

-m, --mcl: if present, the script expects an input matrix in MCL label format.

-h, --help: display the help message and exit.

This outputs the results of the MCL clustering with the given inflation and pre-inflation parameters.
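The options listed above can be pictured with a small argparse sketch. This is not the script's actual parser: the `build_parser` helper, the `dest` names, the help strings, and the `similarities.npy` file name are all assumptions made for illustration. Only the flags and the two defaults (1.4 and 1.0) come from the usage description.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented command-line options."""
    parser = argparse.ArgumentParser(prog="evaluation_clustering.py")
    # [1] positional input: similarity matrix or pre-treated MCL file
    parser.add_argument("input_matrix", help="input similarity matrix")
    # [2] MCL inflation parameter (documented default: 1.4)
    parser.add_argument("-i", dest="inflation", type=float, default=1.4)
    # [3] MCL pre-inflation parameter (documented default: 1.0)
    parser.add_argument("-p", dest="pre_inflation", type=float, default=1.0)
    # [4] number of cores; the documentation states no default, so None here
    parser.add_argument("-t", dest="cores", type=int, default=None)
    # [5] optional replacement for 'exp_configuration.ini'
    parser.add_argument("-cfg", dest="config", default=None)
    # flag: the input matrix is already in MCL label format
    parser.add_argument("-m", "--mcl", action="store_true")
    return parser


# Example invocation with a placeholder file name (assumption).
args = build_parser().parse_args(
    ["similarities.npy", "-i", "2.0", "-t", "4", "--mcl"]
)
print(args.inflation, args.pre_inflation, args.cores, args.mcl)
```

Options left out on the command line keep their defaults, so `-p` stays at 1.0 in this call.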

evaluation_clustering.cluster(co_occ, output_folder, index_to_label, cores, task_params, **kwargs)

Returns the clustering obtained by applying the chosen algorithm to the co-occurrence matrix.

Args:

co_occ (ndarray): co-occurrence matrix.
output_folder (str): path to the output folder.
index_to_label (list): list mapping an index to the corresponding named entity.
cores (int): Number of cores to use for the clustering algorithm (if threading option available).
task_params (dict): additional clustering parameters.
formated (bool, optional): if True, the co-occurrence matrix is expected to be already formatted for MCL input.
verbose (int, optional): controls verbosity level.

Returns:

clustering (list): Resulting clustering (as a list mapping a sample’s index to the index of its cluster).
n_clusters (int): number of retrieved clusters.
step_id (str): step identifier (optional), used for output.
summary (str): string representation of the execution (for displaying purpose).
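To illustrate the return format, here is a minimal sketch of how a `clustering` list (sample index to cluster index) can be combined with `index_to_label` to obtain readable clusters. The `group_by_cluster` helper and the sample data are hypothetical and are not part of the module.

```python
from collections import defaultdict


def group_by_cluster(clustering, index_to_label):
    """Group entity labels by cluster index (hypothetical helper)."""
    clusters = defaultdict(list)
    for sample_index, cluster_index in enumerate(clustering):
        # clustering[k] is the cluster of the sample whose label is index_to_label[k]
        clusters[cluster_index].append(index_to_label[sample_index])
    return dict(clusters)


# Made-up data: four named entities assigned to two clusters.
index_to_label = ["Paris", "London", "Alice", "Bob"]
clustering = [0, 0, 1, 1]

readable = group_by_cluster(clustering, index_to_label)
n_clusters = len(readable)
print(readable, n_clusters)
```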

evaluation_clustering.evaluate(co_occ, output_folder, temp_folder, ground_truth, index_to_label, cores, task_params, **kwargs)

Evaluate a clustering method given a similarity matrix and various clustering parameters.

Args:

co_occ (ndarray): co-occurrence matrix.
output_folder (str): path to the output folder.
temp_folder (str): path to temporary folder.
ground_truth (dict): ground truth clustering to compare against.
index_to_label (list): list mapping an index to the corresponding named entity. Used to generate a readable clustering.
cores (int): number of cores to use.
task_params (list): additional clustering parameters.
formated (bool, optional): if True, the co-occurrence matrix is expected to be already formatted for MCL input.
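The documentation does not say which measures `evaluate` computes or how `ground_truth` is keyed. As one possible illustration, the sketch below compares a clustering to a ground truth using pair-counting precision and recall. It assumes that `ground_truth` maps each entity label to its class and that every label appears in it. The `pairwise_scores` helper and the data are hypothetical.

```python
from itertools import combinations


def pairwise_scores(clustering, index_to_label, ground_truth):
    """Pair-counting precision and recall (illustrative, not the module's metric)."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(clustering)), 2):
        same_cluster = clustering[i] == clustering[j]
        # Assumption: ground_truth maps an entity label to its true class.
        same_class = ground_truth[index_to_label[i]] == ground_truth[index_to_label[j]]
        if same_cluster and same_class:
            tp += 1  # pair correctly placed together
        elif same_cluster:
            fp += 1  # pair wrongly placed together
        elif same_class:
            fn += 1  # pair wrongly separated
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up data: "Alice" is wrongly clustered with the two cities.
index_to_label = ["Paris", "London", "Alice", "Bob"]
ground_truth = {"Paris": "city", "London": "city", "Alice": "person", "Bob": "person"}
clustering = [0, 0, 0, 1]

precision, recall = pairwise_scores(clustering, index_to_label, ground_truth)
print(precision, recall)
```

Pair counting is used here because it does not require matching cluster indices to class names, which are arbitrary on both sides.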
