evaluation_clustering module
evaluation_clustering.py. Main script for running the clustering process and evaluating its results.
Usage:
python evaluation_clustering.py [1] -i [2] -p [3] -t [4] -cfg [5] --mcl --help
where:
- [1] : input similarity matrix (unnormalized similarities or pre-treated MCL format). The script expects an ‘exp_configuration.ini’ file in the same folder, usually generated when using main.py.
- [2] -i : MCL inflation parameter. Defaults to 1.4.
- [3] -p : MCL pre-inflation parameter. Defaults to 1.0.
- [4] -t : number of cores to use for MCL.
- [5] -cfg : provide a custom configuration file to replace ‘exp_configuration.ini’.
- -m, --mcl : if present, the script expects an input matrix in MCL label format.
- -h, --help

This outputs the results of the MCL clustering with the given inflation and pre-inflation parameters.
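
The command line described above can also be assembled from Python, for instance when sweeping over several inflation values. The following is a minimal sketch, not part of the module: `build_command` and `run_clustering` are hypothetical helpers, `similarities.txt` is a placeholder path, and the script is assumed to sit in the current working directory.

```python
import subprocess
import sys


def build_command(matrix_path, inflation=1.4, pre_inflation=1.0, cores=None,
                  cfg=None, mcl_format=False):
    """Build the argument list for evaluation_clustering.py (hypothetical helper)."""
    # Positional input matrix first, then the documented -i and -p options.
    cmd = [sys.executable, "evaluation_clustering.py", matrix_path,
           "-i", str(inflation), "-p", str(pre_inflation)]
    if cores is not None:
        cmd += ["-t", str(cores)]      # number of cores to use for MCL
    if cfg is not None:
        cmd += ["-cfg", cfg]           # custom configuration file
    if mcl_format:
        cmd.append("--mcl")            # input matrix is in MCL label format
    return cmd


def run_clustering(**options):
    """Launch the script and raise if it exits with a non-zero status."""
    return subprocess.run(build_command(**options), check=True)


# Only print the command here; call run_clustering(...) to actually execute it.
example = build_command("similarities.txt", inflation=2.0, cores=4)
print(" ".join(example))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with paths that contain spaces.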
evaluation_clustering.cluster(co_occ, output_folder, index_to_label, cores, task_params, **kwargs)
Returns the clustering obtained after applying the chosen algorithm to the co-occurrence matrix.
- Args:
  co_occ (ndarray): co-occurrence matrix.
  output_folder (str): path to the output folder.
  index_to_label (list): list mapping an index to the corresponding named entity.
  cores (int): number of cores to use for the clustering algorithm (if a threading option is available).
  task_params (dict): additional clustering parameters.
  formated (bool, optional): if True, the co-occurrence matrix is expected to be already formatted for MCL input.
  verbose (int, optional): controls the verbosity level.
- Returns:
  clustering (list): resulting clustering (as a list mapping a sample’s index to the index of its cluster).
  n_clusters (int): number of retrieved clusters.
  step_id (str): step identifier, optional, for output.
  summary (str): string representation of the execution (for display purposes).
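
To illustrate the documented return format, the sketch below turns a clustering list (sample index to cluster index) into readable groups of entity names using index_to_label. It is an illustration under assumptions, not code from the module: `group_by_cluster` is a hypothetical helper and the data is a toy example rather than real output.

```python
from collections import defaultdict


def group_by_cluster(clustering, index_to_label):
    """Group entity labels by cluster index (hypothetical helper)."""
    clusters = defaultdict(list)
    # clustering[i] is the cluster index assigned to sample i.
    for sample_index, cluster_index in enumerate(clustering):
        clusters[cluster_index].append(index_to_label[sample_index])
    return dict(clusters)


# Toy data standing in for the values returned by cluster().
index_to_label = ["Paris", "London", "Alice", "Bob"]
clustering = [0, 0, 1, 1]

readable = group_by_cluster(clustering, index_to_label)
n_clusters = len(readable)  # number of distinct cluster indices
print(readable)
print(n_clusters)
```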
evaluation_clustering.evaluate(co_occ, output_folder, temp_folder, ground_truth, index_to_label, cores, task_params, **kwargs)
Evaluate a clustering method given a similarity matrix and various clustering parameters.
- Args:
  co_occ (ndarray): co-occurrence matrix.
  output_folder (str): path to the output folder.
  temp_folder (str): path to the temporary folder.
  ground_truth (dict): ground truth clustering to compare against.
  index_to_label (list): list mapping an index to the corresponding named entity. Used to generate a readable clustering.
  cores (int): number of cores to use.
  task_params (list): additional clustering parameters.
  formated (bool, optional): if True, the co-occurrence matrix is expected to be already formatted for MCL input.