evaluation_clustering module
evaluation_clustering.py. Main script for running the clustering process and evaluating its results.
Usage:
python evaluation_clustering.py [1] -i [2] -p [3] -t [4] -cfg [5] --mcl --help
where:
- [1] : input similarity matrix (unnormalized similarities or pre-treated MCL format). The script expects an ‘exp_configuration.ini’ file in the same folder, usually generated when using main.py.
- [2] -i: MCL inflation parameter. Defaults to 1.4.
- [3] -p: MCL pre-inflation parameter. Defaults to 1.0.
- [4] -t: number of cores to use for MCL.
- [5] -cfg: provide a custom configuration file to replace ‘exp_configuration.ini’.
- -m, --mcl: if present, the script expects an input matrix in MCL label format.
- -h, --help
This outputs the results of the MCL clustering with the given inflation and pre-inflation parameters.
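The flags listed above can be pictured with a small argparse sketch. This is a hypothetical parser that mirrors the documented options, not the script's actual code: the `dest` names, the `build_parser` helper, and the default of 1 for `-t` are assumptions made for illustration.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented flags (not the real script)."""
    parser = argparse.ArgumentParser(prog="evaluation_clustering.py")
    # [1] positional input similarity matrix
    parser.add_argument("input_matrix", help="similarity matrix or pre-treated MCL input")
    # [2] and [3]: documented defaults are 1.4 and 1.0
    parser.add_argument("-i", dest="inflation", type=float, default=1.4)
    parser.add_argument("-p", dest="pre_inflation", type=float, default=1.0)
    # [4]: the default of 1 is an assumption, the documentation gives none
    parser.add_argument("-t", dest="cores", type=int, default=1)
    # [5]: optional replacement for exp_configuration.ini
    parser.add_argument("-cfg", dest="config", default=None)
    parser.add_argument("-m", "--mcl", action="store_true")
    return parser


args = build_parser().parse_args(["similarities.txt", "-i", "2.0", "-t", "4", "--mcl"])
print(args.inflation, args.pre_inflation, args.cores, args.mcl)
# 2.0 1.0 4 True
```

Options that are left out fall back to the documented defaults, so `-p` is 1.0 in the call above.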
evaluation_clustering.cluster(co_occ, output_folder, index_to_label, cores, task_params, **kwargs)
Returns the clustering obtained after applying the chosen algorithm to the co-occurrence matrix.
- Args:
  - co_occ (ndarray): co-occurrence matrix.
  - output_folder (str): path to the output folder.
  - index_to_label (list): list mapping an index to the corresponding named entity.
  - cores (int): number of cores to use for the clustering algorithm (if a threading option is available).
  - task_params (dict): additional parameters for the clustering algorithm.
  - formated (bool, optional): if True, the co-occurrence matrix is expected to be already formatted for MCL input.
  - verbose (int, optional): controls the verbosity level.
- Returns:
  - clustering (list): resulting clustering (as a list mapping a sample’s index to the index of its cluster).
  - n_clusters (int): number of retrieved clusters.
  - step_id (str): step identifier, optional, for output.
  - summary (str): string representation of the execution (for display purposes).
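To illustrate the return format described above, the following sketch turns a clustering list and an index_to_label list into readable groups. It does not call evaluation_clustering.cluster: the helper name `group_by_cluster` and the sample data are made up for the example.

```python
from collections import defaultdict


def group_by_cluster(clustering, index_to_label):
    """Group entity labels by cluster index (illustrative helper, not part of the module)."""
    groups = defaultdict(list)
    # clustering[i] is the cluster index of sample i; index_to_label[i] is its name
    for sample_index, cluster_index in enumerate(clustering):
        groups[cluster_index].append(index_to_label[sample_index])
    return dict(groups)


# Made-up data in the documented shapes
index_to_label = ["Paris", "London", "Einstein", "Curie"]
clustering = [0, 0, 1, 1]

groups = group_by_cluster(clustering, index_to_label)
print(groups)
# {0: ['Paris', 'London'], 1: ['Einstein', 'Curie']}
print(len(groups))  # plays the role of n_clusters for this toy input
# 2
```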
evaluation_clustering.evaluate(co_occ, output_folder, temp_folder, ground_truth, index_to_label, cores, task_params, **kwargs)
Evaluate a clustering method given a similarity matrix and various clustering parameters.
- Args:
  - co_occ (ndarray): co-occurrence matrix.
  - output_folder (str): path to the output folder.
  - temp_folder (str): path to the temporary folder.
  - ground_truth (dict): ground truth clustering to compare against.
  - index_to_label (list): list mapping an index to the corresponding named entity. Used to generate a readable clustering.
  - cores (int): number of cores to use.
  - task_params (list): additional clustering parameters.
  - formated (bool, optional): if True, the co-occurrence matrix is expected to be already formatted for MCL input.
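The documentation does not say which metric evaluate computes, nor how the ground_truth dict is laid out. As one illustrative possibility only, the sketch below compares a predicted clustering with a ground truth using the Rand index, assuming that ground_truth maps each entity label to its true class. The function name `rand_index` and the sample data are hypothetical.

```python
from itertools import combinations


def rand_index(clustering, ground_truth, index_to_label):
    """Fraction of sample pairs on which prediction and ground truth agree.

    Assumes ground_truth maps an entity label to its true class (an assumption,
    the real layout of the dict is not documented).
    """
    truth = [ground_truth[label] for label in index_to_label]
    pairs = list(combinations(range(len(clustering)), 2))
    if not pairs:
        return 1.0  # fewer than two samples: nothing to disagree on
    agreements = sum(
        (clustering[a] == clustering[b]) == (truth[a] == truth[b])
        for a, b in pairs
    )
    return agreements / len(pairs)


# Made-up data
index_to_label = ["Paris", "London", "Einstein", "Curie"]
ground_truth = {"Paris": "city", "London": "city", "Einstein": "person", "Curie": "person"}

print(rand_index([5, 5, 9, 9], ground_truth, index_to_label))  # same partition
# 1.0
print(rand_index([0, 0, 1, 0], ground_truth, index_to_label))  # "Curie" misplaced
# 0.5
```

The score depends only on which samples share a cluster, so the cluster indices themselves do not need to match the ground truth class names.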