target_extraction.analysis package

Submodules

target_extraction.analysis.dataset_plots module

This module contains plot functions that use the statistics produced from target_extraction.analysis.dataset_statistics

target_extraction.analysis.dataset_plots.sentence_length_plot(collections, tokeniser, as_percentage=True, sentences_with_targets_only=True, ax=None)[source]
Parameters
  • collections (List[TargetTextCollection]) – A list of collections to generate sentence length plots for.

  • tokeniser – The tokeniser to use to split the sentences into tokens. See target_extraction.tokenizers for a module of compatible tokenisers.

  • as_percentage (bool) – Whether the frequency of each sentence length should be normalised by the number of sentences in the relevant dataset and expressed as a percentage.

  • sentences_with_targets_only (bool) – Only use the sentences that have targets within them.

  • ax (Optional[Axes]) – An optional Axes to plot onto.

Return type

Axes

Returns

A line plot where the X-axis represents the sentence length, the Y-axis the frequency of that sentence length, and the color represents the collection.

target_extraction.analysis.dataset_plots.target_length_plot(collections, target_key, tokeniser, max_target_length=None, cumulative_percentage=False, ax=None)[source]
Parameters
  • collections (List[TargetTextCollection]) – A list of collections to generate target length plots for.

  • target_key (str) – The key within each sample in the collection that contains the list of targets to be analysed. This can also be the predicted target key, which might be useful for error analysis.

  • tokeniser – The tokeniser to use to split the target(s) into tokens. See target_extraction.tokenizers for a module of compatible tokenisers.

  • max_target_length (Optional[int]) – The maximum target length to plot on the X-axis.

  • cumulative_percentage (bool) – If True, plot the cumulative percentage of targets with at most that number of tokens rather than the percentage of targets with exactly that number of tokens.

  • ax (Optional[Axes]) – An optional Axes to plot onto.

Return type

Axes

Returns

A point plot where the X-axis represents the target length, the Y-axis the percentage of samples with that target length, and the hue represents the collection.
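
Example

A minimal usage sketch of the two plot functions above (assuming train and test are TargetTextCollection objects loaded elsewhere, and that target_extraction.tokenizers.spacy_tokenizer() returns a compatible tokeniser; the collection names are illustrative):

    import matplotlib.pyplot as plt

    from target_extraction.tokenizers import spacy_tokenizer
    from target_extraction.analysis.dataset_plots import (
        sentence_length_plot, target_length_plot)

    tokeniser = spacy_tokenizer()
    fig, (sentence_ax, target_ax) = plt.subplots(1, 2, figsize=(12, 4))
    # Sentence length distribution of each collection, as percentages.
    sentence_length_plot([train, test], tokeniser, as_percentage=True,
                         sentences_with_targets_only=True, ax=sentence_ax)
    # Cumulative percentage of targets that are at most 5 tokens long.
    target_length_plot([train, test], 'targets', tokeniser,
                       max_target_length=5, cumulative_percentage=True,
                       ax=target_ax)
    fig.tight_layout()
    plt.show()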

target_extraction.analysis.dataset_statistics module

This module allows TargetTextCollection objects to be analysed and overall statistics to be reported.

target_extraction.analysis.dataset_statistics.average_target_per_sentences(collection, sentence_must_contain_targets)[source]
Parameters
  • collection (TargetTextCollection) – Collection to calculate average target per sentence (ATS) on.

  • sentence_must_contain_targets (bool) – Whether or not the sentences within the collection must contain at least one target. This filtering affects the value of the denominator stated in the returns.

Return type

float

Returns

The ATS for the given collection, which is: number of targets / number of sentences.
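
Example

A short sketch of the ATS statistic (assuming collection is a TargetTextCollection loaded elsewhere):

    from target_extraction.analysis.dataset_statistics import (
        average_target_per_sentences)

    # ATS where the denominator is every sentence in the collection.
    ats = average_target_per_sentences(collection,
                                       sentence_must_contain_targets=False)
    # ATS(t): the denominator only counts sentences with at least one target.
    ats_t = average_target_per_sentences(collection,
                                         sentence_must_contain_targets=True)
    print(f'ATS: {ats:.2f}, ATS(t): {ats_t:.2f}')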

target_extraction.analysis.dataset_statistics.dataset_target_extraction_statistics(collections, lower_target=True, target_key='targets', tokeniser=<function spacy_tokenizer.<locals>._spacy_token_to_text>, dataframe_format=False, incl_sentence_statistics=True)[source]
Parameters
  • collections (List[TargetTextCollection]) – A list of collections

  • lower_target (bool) – Whether to lower case the targets before counting them

  • target_key (str) – The key within each sample in each collection that contains the list of targets to be analysed. This can also be the predicted target key, which might be useful for error analysis.

  • tokeniser – The tokeniser to use to split the target(s) into tokens. See target_extraction.tokenizers for a module of compatible tokenisers. This is required to give statistics on target length.

  • dataframe_format (bool) – If True instead of a list of dictionaries the return will be a pandas dataframe

  • incl_sentence_statistics (bool) – If False statistics about the sentence will not be included. This is so that the statistics can still be created for datasets that have been anonymised.

Return type

List[Dict[str, Union[str, int, float]]]

Returns

A list of dictionaries, each containing the statistics for the associated collection. Each dictionary will have the following keys:

  1. Name – this comes from the collection’s name attribute

  2. No. Sentences – number of sentences in the collection

  3. No. Sentences(t) – number of sentences that contain targets

  4. No. Targets – number of targets

  5. No. Uniq Targets – number of unique targets

  6. ATS – Average Target per Sentence (ATS)

  7. ATS(t) – ATS but where all sentences in the collection must contain at least one target.

  8. TL (1) – Percentage of targets that are length 1 based on the number of tokens.

  9. TL (2) – Percentage of targets that are length 2 based on the number of tokens.

  10. TL (3+) – Percentage of targets that are length 3+ based on the number of tokens.

  11. Mean Sent L – Mean sentence length based on the tokens provided by the tokenized_text key in each TargetText within the collections. If this key does not exist then the collection will be tokenised using the given tokeniser argument.

  12. Mean Sent L(t) – Mean Sent L but where all sentences in the collection must contain at least one target.
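
Example

A sketch showing how the statistics can be produced as a DataFrame (assuming train and test are TargetTextCollection objects loaded elsewhere; the column selection uses the key names listed above):

    from target_extraction.analysis.dataset_statistics import (
        dataset_target_extraction_statistics)

    # One row of statistics per collection.
    stats_df = dataset_target_extraction_statistics([train, test],
                                                    dataframe_format=True)
    print(stats_df[['Name', 'No. Targets', 'No. Uniq Targets', 'ATS']])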

target_extraction.analysis.dataset_statistics.dataset_target_sentiment_statistics(collections, lower_target=True, target_key='targets', tokeniser=<function spacy_tokenizer.<locals>._spacy_token_to_text>, sentiment_key='target_sentiments', dataframe_format=False, incl_sentence_statistics=True)[source]
Parameters
  • collections (List[TargetTextCollection]) – A list of collections

  • lower_target (bool) – Whether to lower case the targets before counting them

  • target_key (str) – The key within each sample in each collection that contains the list of targets to be analysed. This can also be the predicted target key, which might be useful for error analysis.

  • tokeniser – The tokeniser to use to split the target(s) into tokens. See target_extraction.tokenizers for a module of compatible tokenisers. This is required to give statistics on target length.

  • sentiment_key (str) – The key in each TargetText within each collection that contains the True sentiment value.

  • dataframe_format (bool) – If True instead of a list of dictionaries the return will be a pandas dataframe

  • incl_sentence_statistics (bool) – If False statistics about the sentence will not be included. This is so that the statistics can still be created for datasets that have been anonymised.

Return type

Union[List[Dict[str, Union[str, int, float]]], DataFrame]

Returns

A list of dictionaries, each containing the statistics for the associated collection. Each dictionary will have the keys from dataset_target_extraction_statistics() and, in addition, the following:

  1. POS (%) – Number (Percentage) of positive targets

  2. NEU (%) – Number (Percentage) of neutral targets

  3. NEG (%) – Number (Percentage) of negative targets

target_extraction.analysis.dataset_statistics.get_sentiment_counts(collection, sentiment_key, normalised=True)[source]
Parameters
  • collection (TargetTextCollection) – The collection containing the sentiment data

  • sentiment_key (str) – The key in each TargetText within the collection that contains the True sentiment value.

  • normalised (bool) – Whether to normalise the values in the dictionary by the number of targets in the collection.

Return type

Dict[str, float]

Returns

A dictionary where the keys are sentiment values and the values are the number of times they occur in the collection.
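
Example

A sketch (assuming collection is a TargetTextCollection whose samples contain the target_sentiments key; the commented values are invented for illustration):

    from target_extraction.analysis.dataset_statistics import (
        get_sentiment_counts)

    # Raw counts, e.g. {'positive': 500, 'negative': 300, 'neutral': 200}
    counts = get_sentiment_counts(collection, 'target_sentiments',
                                  normalised=False)
    # Fraction of targets per sentiment label, e.g. {'positive': 0.5, ...}
    fractions = get_sentiment_counts(collection, 'target_sentiments',
                                     normalised=True)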

target_extraction.analysis.dataset_statistics.tokens_per_sentence(collection, tokeniser)[source]
Parameters
  • collection (TargetTextCollection) – The collection to generate the statistic for.

  • tokeniser (Callable[[str], List[str]]) – The tokeniser to use to split the sentences/texts into tokens. If the collection has already been tokenised then it will use the tokens in the tokenized_text key within each sample in the collection, else it will produce the tokens within this function and save them to that key as well. See target_extraction.tokenizers for a module of compatible tokenisers.

Return type

Dict[int, int]

Returns

A dictionary of sentence lengths and their frequency. This is a defaultdict where the value will be 0 if the key does not exist.

target_extraction.analysis.dataset_statistics.tokens_per_target(collection, target_key, tokeniser, normalise=False, cumulative_percentage=False)[source]
Parameters
  • collection (TargetTextCollection) – collection to analyse

  • target_key (str) – The key within each sample in the collection that contains the list of targets to be analysed. This can also be the predicted target key, which might be useful for error analysis.

  • tokeniser – The tokeniser to use to split the target(s) into tokens. See target_extraction.tokenizers for a module of compatible tokenisers.

  • normalise (bool) – If True, the values are normalised by the total number of targets. (This has no effect on the return if cumulative_percentage is True.)

  • cumulative_percentage (bool) – If True, return the cumulative percentage of targets with at most that number of tokens rather than frequency counts of the number of tokens in each target.

Return type

Dict[int, int]

Returns

A dictionary where the keys are target lengths, based on the number of tokens in the target, and the values are the number of targets in the dataset that contain that number of tokens (the same target can be counted more than once if it occurs in the dataset more than once). This is a defaultdict where the value will be 0 if the key does not exist.
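
Example

A sketch (assuming collection is a TargetTextCollection and that target_extraction.tokenizers.spacy_tokenizer() provides a compatible tokeniser):

    from target_extraction.tokenizers import spacy_tokenizer
    from target_extraction.analysis.dataset_statistics import tokens_per_target

    length_counts = tokens_per_target(collection, 'targets', spacy_tokenizer())
    # As the return is a defaultdict, unseen lengths safely return 0.
    print(length_counts[1], length_counts[2], length_counts[20])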

target_extraction.analysis.sentiment_error_analysis module

This module is dedicated to creating new TargetTextCollections that are subsamples of the original(s), allowing the user to analyse the data with respect to a certain property.

target_extraction.analysis.sentiment_error_analysis.ERROR_SPLIT_SUBSET_NAMES = {'DS': ['distinct_sentiment_1', 'distinct_sentiment_2', 'distinct_sentiment_3'], 'NT': ['1-target', 'low-targets', 'med-targets', 'high-targets'], 'TSR': ['unknown_sentiment_known_target', 'unknown_targets', 'known_sentiment_known_target'], 'TSSR': ['1-TSSR', '1-multi-TSSR', 'low-TSSR', 'high-TSSR'], 'n-shot': ['zero-shot', 'low-shot', 'med-shot', 'high-shot']}
exception target_extraction.analysis.sentiment_error_analysis.NoSamplesError(error_string)[source]

Bases: Exception

If there are or will be no samples within a Dataset or subset.

target_extraction.analysis.sentiment_error_analysis.PLOT_SUBSET_ABBREVIATION = {'1-TSSR': '1', '1-multi-TSSR': '1-Multi', '1-target': '1', 'distinct_sentiment_1': 'DS1', 'distinct_sentiment_2': 'DS2', 'distinct_sentiment_3': 'DS3', 'high-TSSR': 'High', 'high-shot': 'High', 'high-targets': 'High', 'known_sentiment_known_target': 'KSKT', 'low-TSSR': 'Low', 'low-shot': 'Low', 'low-targets': 'Low', 'med-shot': 'Med', 'med-targets': 'Med', 'unknown_sentiment_known_target': 'USKT', 'unknown_targets': 'UT', 'zero-shot': 'Zero'}
target_extraction.analysis.sentiment_error_analysis.SUBSET_NAMES_ERROR_SPLIT = {}
target_extraction.analysis.sentiment_error_analysis.count_error_key_occurrence(dataset, error_key)[source]
Parameters
  • dataset (TargetTextCollection) – The dataset that contains the error analysis keys, which are one-hot encodings of whether a target is in that error analysis class or not. An example function that produces these error keys is target_extraction.error_analysis.same_one_sentiment()

  • error_key (str) – Name of the error key e.g. same_one_sentiment

Return type

int

Returns

The number of targets within the dataset that are in that error class.

Raises

KeyError – If the error_key does not exist in one or more of the TargetText objects within the dataset

target_extraction.analysis.sentiment_error_analysis.different_sentiment(test_dataset, train_dataset, lower=True)[source]

Given a test and train dataset, this will return the same test dataset but with an additional key, different_sentiment, for each TargetText object in the test collection. This different_sentiment key will contain a list the same length as the number of targets in that TargetText object, of 0’s and 1’s, where a 1 represents that the associated target has no overlap in sentiment labels between the test and the train sets.

Note

If the TargetText object’s targets value is None, i.e. there are no targets in that sample, then the different_sentiment key will be an empty list

Parameters
  • test_dataset (TargetTextCollection) – Test dataset to sub-sample

  • train_dataset (TargetTextCollection) – Train dataset to reference

  • lower (bool) – Whether to lower case the target words

Return type

TargetTextCollection

Returns

The test dataset but with each TargetText object containing a different_sentiment key and associated list of values.

target_extraction.analysis.sentiment_error_analysis.distinct_sentiment(dataset, separate_labels=False, true_sentiment_key='target_sentiments')[source]
Parameters
  • dataset (TargetTextCollection) – The dataset to add the distinct sentiment labels to

  • separate_labels (bool) – If True, instead of having one error key, distinct_sentiment, whose value is a list of the number of distinct sentiments, there will be n error keys of the format distinct_sentiment_n, where for each TargetText object each one will contain 0’s apart from the key whose n value is the correct number of distinct sentiments. The values of n are computed from the unique numbers of distinct sentiments in the collection. Example: if the distinct sentiment counts in the collection are {2, 3} and the current TargetText contains 2 targets with 2 distinct sentiments, then it will contain the following keys and values: distinct_sentiment_2: [1,1] and distinct_sentiment_3: [0,0].

  • true_sentiment_key (str) – Key in the target_collection targets that contains the true sentiment scores for each target in the TargetTextCollection.

Return type

TargetTextCollection

Returns

The same dataset but with each TargetText object containing a distinct_sentiment or distinct_sentiment_n key(s) and associated number of distinct sentiments that are in that TargetText object per target.

Example

Given a TargetTextCollection that contains a single TargetText object with three targets, where the first two have the label positive and the last is negative, it will add the distinct_sentiment key to the TargetText object with the value [2,2,2], as there are two unique/distinct sentiments in that TargetText object.

Raises

ValueError – If separate_labels is True and there are no sentiment labels in the collection.
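
Example

A sketch of both modes of this function (assuming dataset is a TargetTextCollection with true sentiment labels under target_sentiments):

    from target_extraction.analysis.sentiment_error_analysis import (
        distinct_sentiment)

    # One key, distinct_sentiment, listing the distinct-sentiment count
    # per target.
    dataset = distinct_sentiment(dataset, separate_labels=False)
    # Or one-hot style keys: distinct_sentiment_1, distinct_sentiment_2, ...
    dataset = distinct_sentiment(dataset, separate_labels=True)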

target_extraction.analysis.sentiment_error_analysis.error_analysis_wrapper(error_function_name)[source]

To get a list of all possible function names easily use the keys of target_extraction.analysis.sentiment_error_analysis.ERROR_SPLIT_SUBSET_NAMES dictionary.

Parameters

error_function_name (str) – This can be either 1. DS, 2. NT, 3. TSSR, 4. n-shot, 5. TSR

Return type

Callable[[TargetTextCollection, TargetTextCollection, bool], TargetTextCollection]

Returns

The relevant error function. All error functions have the same signature, where the inputs are: 1. Train TargetTextCollection, 2. Test TargetTextCollection, and 3. Lower bool – whether to lower the targets. Each returns the Test TargetTextCollection with the relevant new keys. Of these inputs, Train and Lower are only applicable to the n-shot and TSR error functions, as both are global functions that rely on target text information.

Raises

ValueError – If the error_function_name is not one of the 5 listed.
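
Example

A sketch that applies every error split to a test collection (assuming train and test are TargetTextCollection objects loaded elsewhere):

    from target_extraction.analysis.sentiment_error_analysis import (
        ERROR_SPLIT_SUBSET_NAMES, error_analysis_wrapper)

    for error_function_name in ERROR_SPLIT_SUBSET_NAMES:
        error_function = error_analysis_wrapper(error_function_name)
        # All error functions share the (train, test, lower) signature.
        test = error_function(train, test, True)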

target_extraction.analysis.sentiment_error_analysis.error_split_df(train_collection, test_collection, prediction_keys, true_sentiment_key, error_split_and_subset_names, metric_func, metric_kwargs=None, num_cpus=None, lower_targets=True, collection_subsetting=None, include_dataset_size=False, table_format_return=True)[source]

This will perform error_analysis_wrapper() over all error_split_and_subset_names keys and then return the output from _error_split_df().

Parameters
  • train_collection (TargetTextCollection) – The collection that was used to train the models that have made the predictions within test_collection

  • test_collection (TargetTextCollection) – The collection where all TargetText’s contain all prediction_keys, and true_sentiment_key.

  • prediction_keys (List[str]) – A list of keys that contain the predicted sentiment scores for each target in the TargetTextCollection

  • true_sentiment_key (str) – Key that contains the true sentiment scores for each target in the TargetTextCollection

  • error_split_and_subset_names (Dict[str, List[str]]) – The keys do not matter but the List values must represent error subset names. An example dictionary would be: ERROR_SPLIT_SUBSET_NAMES

  • metric_func (Callable[[TargetTextCollection, str, str, bool, bool, Optional[int], bool], Union[float, List[float]]]) –

    A Metric function from target_extraction.analysis.sentiment_metrics. Example

  • metric_kwargs (Optional[Dict[str, Any]]) – Keyword arguments to give to the metric_func the arguments given are: 1. target_collection, 2. true_sentiment_key, 3. predicted_sentiment_key, 4. average, and 5. array_scores

  • num_cpus (Optional[int]) – Number of cpus to use for multiprocessing. The work of subsetting and metric scoring is split into individual tasks and all tasks are then multiprocessed. This is also done in a lazy fashion.

  • lower_targets (bool) – Whether or not the targets should be lowered during the error_analysis_wrapper function.

  • collection_subsetting (Optional[List[List[str]]]) – A list of lists where the outer list represents the order of subsetting and each inner list specifies the subset names to subset on. For example, [[‘1-TSSR’, ‘high-shot’], [‘distinct_sentiment_2’]] would first subset the test_collection so that only samples within the ‘1-TSSR’ or ‘high-shot’ subsets remain in the collection, and then subset that collection further so that only ‘distinct_sentiment_2’ samples exist in the collection.

  • include_dataset_size (bool) – If True, the returned DataFrame will contain two values per error subset: the metric associated with the error split and the size of the dataset subset.

  • table_format_return (bool) – If this is True then the return will not be a pivot table but the raw dataframe. This can be more useful as a return format if include_dataset_size is True.

Return type

DataFrame

Returns

A dataframe that has a multi index of [prediction key, run number] and the columns are the error split subset names and the values are the metric associated to those error splits given the prediction key and the model run (run number)

target_extraction.analysis.sentiment_error_analysis.known_sentiment_known_target(test_dataset, train_dataset, lower=True)[source]

Given a test and train dataset, this will return the same test dataset but with an additional key, known_sentiment_known_target, for each TargetText object in the test collection. This known_sentiment_known_target key will contain a list the same length as the number of targets in that TargetText object, of 0’s and 1’s, where a 1 represents a target that exists in both train and test and whose sentiment for that instance in the test set has been seen before in the training set for that target.

Note

If the TargetText object’s targets value is None, i.e. there are no targets in that sample, then the known_sentiment_known_target key will be an empty list

Parameters
  • test_dataset (TargetTextCollection) – Test dataset to sub-sample

  • train_dataset (TargetTextCollection) – Train dataset to reference

  • lower (bool) – Whether to lower case the target words

Return type

TargetTextCollection

Returns

The test dataset but with each TargetText object containing a known_sentiment_known_target key and associated list of values.

target_extraction.analysis.sentiment_error_analysis.n_shot_subsets(test_dataset, train_dataset, lower=True, return_n_values=False)[source]

Given a test and train dataset, this will return the same test dataset but with 4 additional keys denoted zero-shot, low-shot, med-shot, and high-shot. Each one of these represents a different set of n values within the n-shot setup. The n-shot setup is the number of times the target within the test sample has been seen in the training dataset. The zero-shot subset contains all targets that have n=0. The low, med, and high subsets contain increasing values of n respectively, where each subset will contain approximately 1/3 of all samples in the test dataset once the zero-shot subset has been removed.

Parameters
  • test_dataset (TargetTextCollection) – Test dataset to sub-sample

  • train_dataset (TargetTextCollection) – Train dataset to reference

  • lower (bool) – Whether to lower case the target words

  • return_n_values (bool) – If True, will return a tuple containing 1. the TargetTextCollection with the new error keys and 2. a list of tuples, one for each of the error keys, stating the values of n that the error keys are associated with.

Return type

Union[TargetTextCollection, Tuple[TargetTextCollection, List[Tuple[int, int]]]]

Returns

The test dataset but with each TargetText object containing a zero-shot, low-shot, med-shot, and high-shot key and associated list of values.

target_extraction.analysis.sentiment_error_analysis.n_shot_targets(test_dataset, train_dataset, n_condition, error_name, lower=True)[source]

Given a test and train dataset, this will return the same test dataset but with an additional key, denoted by the error_name argument, for each TargetText object in the test collection. This error_name key will contain a list the same length as the number of targets in that TargetText object, of 0’s and 1’s, where a 1 represents a target that has met the n_condition. This allows you to find the performance of n-shot target learning, where the n_condition can be used to find zero-shot targets (targets not seen in training but present in test, also known as unknown targets) or >K-shot targets (targets that have been seen K or more times).

Note

If the TargetText object’s targets value is None, i.e. there are no targets in that sample, then the error_name argument key will be an empty list

Parameters
  • test_dataset (TargetTextCollection) – Test dataset to sub-sample

  • train_dataset (TargetTextCollection) – Train dataset to reference

  • n_condition (Callable[[int], bool]) – A callable that denotes the number of times the target has to be seen in the training dataset to represent a 1 in the error key. An example n_condition is lambda x: x > 5, which means that a target has to be seen more than 5 times in the training set.

  • error_name (str) – The name of the error key

  • lower (bool) – Whether to lower case the target words

Return type

TargetTextCollection

Returns

The test dataset but with each TargetText object containing the error_name key and associated list of values.
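
Example

A sketch of two n_condition callables (assuming train and test are TargetTextCollection objects loaded elsewhere; the error key names used here are illustrative):

    from target_extraction.analysis.sentiment_error_analysis import (
        n_shot_targets)

    # Targets never seen in training (zero-shot, a.k.a. unknown targets).
    test = n_shot_targets(test, train, lambda n: n == 0, 'zero-shot-targets')
    # Targets seen more than 5 times in training.
    test = n_shot_targets(test, train, lambda n: n > 5, 'gt-5-shot-targets')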

target_extraction.analysis.sentiment_error_analysis.num_targets_subset(dataset, return_n_values=False)[source]

Given a dataset it will add the following four error keys to each target text object: 1-target, low-targets, med-targets, high-targets. Each value associated with the error keys is a list of 1’s and 0’s the length of the number of samples, where 1 denotes the error key is True and 0 otherwise. 1-target is 1 when the target text object contains exactly one target. The others are based on the frequency of targets with respect to the number of samples in the dataset: if the target is in the lowest 1/3 of most frequent targets based on samples then it is binned in low-targets, the middle 1/3 in med-targets, etc.

Parameters
  • dataset (TargetTextCollection) – The dataset to add the following four error keys: 1-target, low-targets, med-targets, high-targets.

  • return_n_values (bool) – Whether to return, as a list of tuples, the numbers of targets in the sentence that are associated with the 4 error keys.

Return type

Union[TargetTextCollection, Tuple[TargetTextCollection, List[Tuple[int, int]]]]

Returns

The same dataset but with each TargetText object containing those four stated error keys and associated list of 1’s or 0’s denoting if the error key exists or not.

target_extraction.analysis.sentiment_error_analysis.reduce_collection_by_key_occurrence(dataset, error_key, associated_keys)[source]
Parameters
  • dataset (TargetTextCollection) – The dataset that contains the error analysis keys, which are one-hot encodings of whether a target is in that error analysis class or not. An example function that produces these error keys is target_extraction.error_analysis.same_one_sentiment()

  • error_key (Union[str, List[str]]) – Name of the error key e.g. same_one_sentiment. Or it can be a list of error keys for which this will reduce the collection so that it includes all samples that contain at least one of these error keys.

  • associated_keys (List[str]) – The keys that are associated to the target that must be kept and are linked to that target. E.g. target_sentiments, targets, spans, and subset error keys.

Return type

TargetTextCollection

Returns

A new TargetTextCollection that contains only those targets, and the relevant associated_keys, within the TargetText’s for which the error analysis key(s) were True (1 in the one-hot encoding). This could mean that some TargetText’s will no longer exist.

Raises

KeyError – If the error_key or one or more of the associated_keys does not exist in one or more of the TargetText objects within the dataset
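
Example

A sketch (assuming test is a TargetTextCollection to which an error function has already added the zero-shot key; the associated keys listed are the usual target-aligned keys):

    from target_extraction.analysis.sentiment_error_analysis import (
        reduce_collection_by_key_occurrence)

    # Keep only the targets flagged as zero-shot, along with the values
    # in the keys that are aligned to each kept target.
    reduced = reduce_collection_by_key_occurrence(
        test, 'zero-shot', ['targets', 'spans', 'target_sentiments'])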

target_extraction.analysis.sentiment_error_analysis.reduce_collection_by_sentiment_class(dataset, reduce_sentiment, associated_keys, sentiment_key='target_sentiments')[source]
Parameters
  • dataset (TargetTextCollection) – The dataset that is to be reduced so that it only contains the given sentiment class.

  • reduce_sentiment (str) – The sentiment class that the target must be associated with to be returned in this TargetTextCollection.

  • associated_keys (List[str]) – The keys that are associated to the target that must be kept and are linked to that target. E.g. target_sentiments, targets, spans, and subset error keys.

  • sentiment_key (str) – The key in the TargetText samples within the collection that contains the sentiment values to reduce by.

Return type

TargetTextCollection

Returns

A new TargetTextCollection that contains only those targets, and the relevant associated_keys, within the TargetText’s for which the target is associated with the given sentiment.

Raises

KeyError – If the sentiment_key or one or more of the associated_keys does not exist in one or more of the TargetText objects within the dataset

target_extraction.analysis.sentiment_error_analysis.same_multi_sentiment(test_dataset, train_dataset, lower=True)[source]

Given a test and train dataset, this will return the same test dataset but with an additional key, same_multi_sentiment, for each TargetText object in the test collection. This same_multi_sentiment key will contain a list the same length as the number of targets in that TargetText object, of 0’s and 1’s, where a 1 represents that the associated target has the same set of multiple sentiment labels (more than one sentiment label, e.g. positive and negative, not just positive or just negative) in the train and test sets, whereas a 0 means it does not.

Note

If the TargetText object’s targets value is None, i.e. there are no targets in that sample, then the same_multi_sentiment key will be an empty list

Parameters
  • test_dataset (TargetTextCollection) – Test dataset to sub-sample

  • train_dataset (TargetTextCollection) – Train dataset to reference

  • lower (bool) – Whether to lower case the target words

Return type

TargetTextCollection

Returns

The test dataset but with each TargetText object containing a same_multi_sentiment key and associated list of values.

target_extraction.analysis.sentiment_error_analysis.same_one_sentiment(test_dataset, train_dataset, lower=True)[source]

Given a test and train dataset, this will return the same test dataset but with an additional key, same_one_sentiment, for each TargetText object in the test collection. This same_one_sentiment key will contain a list the same length as the number of targets in that TargetText object, of 0’s and 1’s, where a 1 represents that the associated target has the same single sentiment label in the train and test sets, whereas a 0 means it does not.

Note

If the TargetText object’s targets value is None, i.e. there are no targets in that sample, then the same_one_sentiment key will be an empty list

Parameters
  • test_dataset (TargetTextCollection) – Test dataset to sub-sample

  • train_dataset (TargetTextCollection) – Train dataset to reference

  • lower (bool) – Whether to lower case the target words

Return type

TargetTextCollection

Returns

The test dataset but with each TargetText object containing a same_one_sentiment key and associated list of values.

target_extraction.analysis.sentiment_error_analysis.similar_sentiment(test_dataset, train_dataset, lower=True)[source]

Given a test and train dataset, this will return the same test dataset but with an additional key, similar_sentiment, for each TargetText object in the test collection. This similar_sentiment key will contain a list the same length as the number of targets in that TargetText object, of 0’s and 1’s, where a 1 represents that the associated target has occurred more than once in the train or test sets with at least some overlap between the test and train sentiments, but not identical sentiments. E.g. the target camera could occur with positive and negative sentiment in the test set and only negative in the train set.

Note

If the TargetText object’s targets value is None, i.e. there are no targets in that sample, then the similar_sentiment key will be an empty list

Parameters
  • test_dataset (TargetTextCollection) – Test dataset to sub-sample

  • train_dataset (TargetTextCollection) – Train dataset to reference

  • lower (bool) – Whether to lower case the target words

Return type

TargetTextCollection

Returns

The test dataset but with each TargetText object containing a similar_sentiment key and associated list of values.

target_extraction.analysis.sentiment_error_analysis.subset_metrics(target_collection, subset_error_key, metric_funcs, metric_names, metric_kwargs, include_dataset_size=False)[source]

This is most useful for finding the metric score(s) of an error subset.

Parameters
  • target_collection (TargetTextCollection) – TargetTextCollection that contains the subset_error_key in each TargetText within the collection

  • subset_error_key (Union[str, List[str]]) – The error key(s) to reduce the collection by. The samples left will only be those where the error key is True. An example of a subset_error_key would be zero-shot from the n_shot_targets(). This can also be a list of keys e.g. [zero-shot, low-shot] from the n_shot_targets().

  • metric_funcs (List[Callable[[TargetTextCollection, str, str, bool, bool, Optional[int], bool], Union[float, List[float]]]]) – A list of metric functions from target_extraction.analysis.sentiment_metrics. Example metric function is target_extraction.analysis.sentiment_metrics.accuracy()

  • metric_names (List[str]) – Names to give to each metric_funcs

  • metric_kwargs (Dict[str, Union[str, bool, int]]) – Keyword arguments to give to the metric_funcs; the only argument given positionally is the first, which will always be target_collection

  • include_dataset_size (bool) – If True the returned dictionary will also include a key dataset size that will contain an integer specifying the size of the dataset the metric(s) was calculated on.

Return type

Dict[str, Union[List[float], float, int]]

Returns

A dictionary where the keys are the metric_names and the values are the respective metric applied to the reduced/subsetted dataset. Thus if average in metric_kwargs is True then the return will be Dict[str, float], whereas if array_scores is True then the return will be Dict[str, List[float]]. If no targets exist in the collection after subsetting then the metric returned is 0.0, or [0.0] if array_scores is True in metric_kwargs.
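
Example

A sketch (assuming test is a TargetTextCollection that contains the zero-shot error key and a predicted sentiment key named model_1_predictions; the prediction key name is illustrative):

    from target_extraction.analysis.sentiment_error_analysis import (
        subset_metrics)
    from target_extraction.analysis.sentiment_metrics import accuracy, macro_f1

    metric_kwargs = {'true_sentiment_key': 'target_sentiments',
                     'predicted_sentiment_key': 'model_1_predictions',
                     'average': False, 'array_scores': True}
    # One accuracy and one Macro F1 score per model run, plus the size
    # of the zero-shot subset.
    scores = subset_metrics(test, 'zero-shot', [accuracy, macro_f1],
                            ['Accuracy', 'Macro F1'], metric_kwargs,
                            include_dataset_size=True)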

target_extraction.analysis.sentiment_error_analysis.subset_name_to_error_split(subset_name)[source]

This in effect inverts the ERROR_SPLIT_SUBSET_NAMES dictionary and returns the relevant error split name. It also initialises SUBSET_NAMES_ERROR_SPLIT.

Parameters

subset_name (str) – Name of the subset you want to know which error split it has come from.

Return type

str

Returns

Associated error split name that the subset name has come from.

target_extraction.analysis.sentiment_error_analysis.swap_and_reduce(_collection, subset_key, true_sentiment_key, prediction_keys)[source]

This reduces the collection so that it only contains the samples that are within the given subset_key error subset(s), using reduce_collection_by_key_occurrence(). Furthermore, the keys that will be reduced won’t just be the targets, spans, true_sentiment_key, and all prediction_keys, but also any error subset name from within PLOT_SUBSET_ABBREVIATION that is in the TargetTexts in the collection.

Parameters
  • _collection (TargetTextCollection) – TargetTextCollection to reduce the samples based on the subset_key argument given.

  • subset_key (Union[str, List[str]]) – Name of the error key e.g. same_one_sentiment. Or it can be a list of error keys for which this will reduce the collection so that it includes all samples that contain at least one of these error keys.

  • true_sentiment_key (str) – The key in each TargetText within the collection that contains the true sentiment labels.

  • prediction_keys (List[str]) – The list of keys in each TargetText where each key contains a list of predicted sentiments. These predicted sentiments are expected to be in a list of a list where the outer list defines the number of models trained e.g. number of model runs and the inner list is the length of the number of predictions required for that text/sentence.

Return type

TargetTextCollection

Returns

A collection that has been reduced based on the subset_key argument. This is a helper function for reduce_collection_by_key_occurrence(), as it ensures that the predicted sentiment keys are swapped before and after reducing the collection so that they are processed properly: the predicted sentiment labels are of shape (number of model runs, number of sentiments), whereas all other lists in the TargetText are of size (number of sentiments). Furthermore, if the reduction causes any of the TargetText’s in the collection to have no targets then that TargetText will be removed from the collection; thus the returned collection could be of size zero.

target_extraction.analysis.sentiment_error_analysis.swap_list_dimensions(collection, key)[source]
Parameters
  • collection (TargetTextCollection) – The TargetTextCollection to change

  • key (str) – The key within the TargetText objects in the collection that contain a List Value of shape (dim 1, dim 2)

Return type

TargetTextCollection

Returns

The collection but with the key values shape changed from (dim 1, dim 2) to (dim 2, dim 1)

Note

This is a useful function when you need to change the predicted values from shape (number runs, number targets) to (number targets, number runs) before using reduce_collection_by_key_occurrence, where one of the associated_keys is a predicted-values key. It is required that the sentiment predictions are of shape (number runs, number targets) for the sentiment_metrics functions.
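
Example

A sketch of the swap-reduce-swap pattern the Note describes (assuming test contains the zero-shot error key and a prediction key named model_1_predictions, which is an illustrative name):

    from target_extraction.analysis.sentiment_error_analysis import (
        reduce_collection_by_key_occurrence, swap_list_dimensions)

    # (number runs, number targets) -> (number targets, number runs)
    test = swap_list_dimensions(test, 'model_1_predictions')
    reduced = reduce_collection_by_key_occurrence(
        test, 'zero-shot',
        ['targets', 'spans', 'target_sentiments', 'model_1_predictions'])
    # Swap back so that the sentiment_metrics functions can be used.
    reduced = swap_list_dimensions(reduced, 'model_1_predictions')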

target_extraction.analysis.sentiment_error_analysis.tssr_raw(dataset)[source]

Given a dataset it will add a number of error keys to each target text object, one for each raw TSSR value, where each key represents the TSSR value that the associated target falls within. Each value associated with the error keys is a list of 1’s and 0’s the length of the number of samples, where 1 denotes the error key is True and 0 otherwise. See target_extraction.error_analysis.tssr_target_value() for an explanation of how the TSSR value is calculated.

Parameters

dataset (TargetTextCollection) – The dataset to add the raw TSSR error keys to.

Return type

Tuple[TargetTextCollection, Dict[str, int]]

Returns

The same dataset but with each TargetText object containing the raw TSSR error keys and associated list of 1’s and 0’s denoting whether the error key applies or not. The dictionary contains keys which are the TSSR values detected in the dataset, and the values are the number of targets that have that TSSR value.

target_extraction.analysis.sentiment_error_analysis.tssr_subset(dataset, return_tssr_boundaries=False)[source]

Given a dataset it will add either the 1-multi-TSSR, 1-TSSR, high-TSSR, or low-TSSR error key to each target text object. Each value associated with the error keys is a list of 1’s and 0’s the length of the number of samples, where 1 denotes the error key is True and 0 otherwise. For more information on how TSSR is calculated see target_extraction.error_analysis.tssr_target_value(). Once you know what TSSR is: 1-TSSR contains all of the targets that have a TSSR value of 1 where each one is the only target in its sentence, 1-multi-TSSR contains all of the targets that have a TSSR value of 1 where the sentence they come from contains more than one target, high-TSSR are targets that are in the top 50% of the TSSR values for this dataset excluding the 1-TSSR samples, and low-TSSR are the bottom 50% of the TSSR values.

Parameters
  • dataset (TargetTextCollection) – The dataset to add the TSSR subset error keys to.

  • return_tssr_boundaries (bool) – Whether to return the TSSR value boundaries for the 1-TSSR, high-TSSR, and low-TSSR subsets. NOTE that 1-multi-TSSR is not in that list as it would have the same TSSR value boundaries as 1-TSSR.

Return type

Union[TargetTextCollection, Tuple[TargetTextCollection, List[Tuple[float, float]]]]

Returns

The same dataset but with each TargetText object containing the TSSR subset error keys and associated list of 1’s or 0’s denoting if the error key exists or not. The optional second Tuple return are a list of the tssr boundaries.

Raises

NoSamplesError – If there are no samples within a subset.

target_extraction.analysis.sentiment_error_analysis.tssr_target_value(target_data, current_target_sentiment, subset_values=False)[source]

The TSSR value equation is yet to be inserted here.

Parameters
  • target_data (TargetText) – The TargetText object that contains the target associated to the current_target_sentiment

  • current_target_sentiment (Union[str, int]) – The sentiment value associated to the target you want the TSSR value for.

  • subset_values (bool) – If True it produces two different values for when the TSSR value is 1.0: just 1.0 when there is only one target in the sentence, and 1.1 when there is more than one target in the sentence but all of them have a TSSR value of 1.0, i.e. the sentence only contains one sentiment.

Return type

float

Returns

The TSSR value for a target within target_data with current_target_sentiment sentiment value.

target_extraction.analysis.sentiment_error_analysis.unknown_sentiment_known_target(test_dataset, train_dataset, lower=True)[source]

Given a test and train dataset, this will return the same test dataset but with an additional key, unknown_sentiment_known_target, for each TargetText object in the test collection. This unknown_sentiment_known_target key will contain a list the same length as the number of targets in that TargetText object, of 0’s and 1’s, where a 1 represents a target that exists in both train and test and whose sentiment for that instance in the test set has NOT been seen before in the training set for that target.

Note

If the TargetText object’s targets value is None, i.e. there are no targets in that sample, then the unknown_sentiment_known_target key will be an empty list

Parameters
  • test_dataset (TargetTextCollection) – Test dataset to sub-sample

  • train_dataset (TargetTextCollection) – Train dataset to reference

  • lower (bool) – Whether to lower case the target words

Return type

TargetTextCollection

Returns

The test dataset but with each TargetText object containing a unknown_sentiment_known_target key and associated list of values.

target_extraction.analysis.sentiment_error_analysis.unknown_targets(test_dataset, train_dataset, lower=True)[source]

Given a test and train dataset, this will return the same test dataset but with an additional key, unknown_targets, for each TargetText object in the test collection. This unknown_targets key will contain a list the same length as the number of targets in that TargetText object, of 0’s and 1’s, where a 1 represents a target that exists in the test set but not in the train set.

Note

If the TargetText object’s targets value is None, i.e. there are no targets in that sample, then the unknown_targets key will be an empty list

Parameters
  • test_dataset (TargetTextCollection) – Test dataset to sub-sample

  • train_dataset (TargetTextCollection) – Train dataset to reference

  • lower (bool) – Whether to lower case the target words

Return type

TargetTextCollection

Returns

The test dataset but with each TargetText object containing a unknown_targets key and associated list of values.

target_extraction.analysis.sentiment_metrics module

This module contains functions that expect a TargetTextCollection that contains a target_sentiments key representing the true sentiment values and a prediction key, e.g. sentiment_predictions. Given these, each function will return either a metric score, e.g. Accuracy, or a list of scores, based on the arguments given to the function and whether the sentiment_predictions key contains an array of values.

Arguments for all functions in this module:

  1. TargetTextCollection – Contains the true and predicted sentiment scores

  2. true_sentiment_key – Key that contains the true sentiment scores for each target in the TargetTextCollection

  3. predicted_sentiment_key – Key that contains the predicted sentiment scores for each target in the TargetTextCollection

  4. average – If the predicting model was run N times, whether or not to average the score over the N runs. Assumes array_scores is False.

  5. array_scores – If average is False and you have a model that has predicted N times, then this will return the N scores, one for each run.

  6. assert_number_labels – Whether or not to assert this many number of unique labels must exist in the true sentiment key. If this is None then the assertion is not raised.

  7. ignore_label_differences – If True then the ValueError will not be raised if the predicted sentiment values are not in the true sentiment values. See get_labels() for more details.

raises ValueError

If the prediction model has been run N times, where N>1, and average and array_scores are either both True or both False.

raises ValueError

If the number of predictions made per target are different or zero.

raises ValueError

If only one set of model predictions exists, then average and array_scores should both be False.

raises KeyError

If either the true_sentiment_key or predicted_sentiment_key does not exist.

raises LabelError

If assert_number_labels is not None and the number of unique true labels does not equal the assert_number_labels this is raised.

exception target_extraction.analysis.sentiment_metrics.LabelError(true_number_unique_labels, number_unique_labels_wanted)[source]

Bases: Exception

If the number of unique labels does not match your expected number of unique labels.

target_extraction.analysis.sentiment_metrics.accuracy(target_collection, true_sentiment_key, predicted_sentiment_key, average, array_scores, assert_number_labels=None, ignore_label_differences=True)[source]
Parameters

ignore_label_differences (bool) – See get_labels()

Accuracy score. Description at top of module explains arguments.

Return type

Union[float, List[float]]
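
Example

A sketch (assuming test is a TargetTextCollection whose samples contain true labels under target_sentiments and N model runs of predictions under model_1_predictions, an illustrative key name):

    from target_extraction.analysis.sentiment_metrics import accuracy

    # One accuracy score per model run.
    run_scores = accuracy(test, 'target_sentiments', 'model_1_predictions',
                          average=False, array_scores=True)
    # Or a single score averaged over the N runs.
    mean_score = accuracy(test, 'target_sentiments', 'model_1_predictions',
                          average=True, array_scores=False)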

target_extraction.analysis.sentiment_metrics.get_labels(target_collection, true_sentiment_key, predicted_sentiment_key, labels_per_text=False, ignore_label_differences=True)[source]
Parameters
  • target_collection (TargetTextCollection) – Collection of targets that have true and predicted sentiment values.

  • true_sentiment_key (str) – Key that contains the true sentiment scores for each target in the TargetTextCollection

  • predicted_sentiment_key (str) – Key that contains the predicted sentiment scores for each target in the TargetTextCollection. It assumes that the predictions are a List of Lists where the outer list is the number of model runs and the inner list is the number of targets to predict for; see the second Tuple of the example return for an example of this.

  • labels_per_text (bool) – If True, instead of returning a List[Any] it will return a List[List[Any]] where the inner list represents the predictions per text, rather than the normal case where all predictions are returned ignoring which text they came from.

  • ignore_label_differences (bool) – If True then the ValueError will not be raised if the predicted sentiment values are not in the true sentiment values.

Return type

Tuple[Union[List[Any], List[List[Any]]], Union[List[List[Any]], List[List[List[Any]]]]]

Returns

A tuple of 1. the true sentiment values and 2. the predicted sentiment values, where the predicted sentiment values are a list of predicted sentiment values, one for each model’s predictions. See Example of return 2 for an example of what this means; in that example there are two texts/sentences.

Raises
  • ValueError – If the number of predicted sentiment values are not equal to the number true sentiment values.

  • ValueError – If the labels in the predicted sentiment values are not in the true sentiment values.

Example of return 1

([‘pos’, ‘neg’, ‘neu’], [[‘neg’, ‘pos’, ‘neu’], [‘neu’, ‘pos’, ‘neu’]])

Example of return 2

([[‘pos’], [‘neg’, ‘neu’]], [[[‘neg’], [‘pos’, ‘neu’]], [[‘neu’], [‘pos’, ‘neu’]]])

target_extraction.analysis.sentiment_metrics.macro_f1(target_collection, true_sentiment_key, predicted_sentiment_key, average, array_scores, assert_number_labels=None, ignore_label_differences=True, **kwargs)[source]
Parameters
  • ignore_label_differences (bool) – See get_labels()

  • **kwargs

    These are the keyword arguments to give to the underlying scikit-learn f1_score() function. Note that the only f1_score() argument that cannot be changed is average. If you want the F1 score for one label this can still be done by providing the labels argument, where the value would be the label you want the F1 score for, e.g. labels = [positive].

Macro F1 score. Description at top of module explains arguments.

Return type

Union[float, List[float]]

target_extraction.analysis.sentiment_metrics.metric_error_checks(func)[source]

Decorator for the metric functions within this module. Will raise any of the Errors stated above in the module documentation before the metric function is called.

Return type

Callable[[TargetTextCollection, str, str, bool, bool, Optional[int], bool], Union[float, ndarray]]

target_extraction.analysis.sentiment_metrics.strict_text_accuracy(target_collection, true_sentiment_key, predicted_sentiment_key, average, array_scores, assert_number_labels=None, ignore_label_differences=True)[source]

This is performed at the text/sentence level, where a sample is not denoted as one target but as all targets within a text. A sample is correct if all targets within the text have been predicted correctly. This will return the average of the correct predictions. This metric is also known as Strict Text ACcuracy (STAC).

This metric also assumes that all the texts within the target_collection contain at least one target. If not, a ValueError will be raised.

Parameters

ignore_label_differences (bool) – See get_labels()

Return type

Union[float, List[float]]

target_extraction.analysis.statistical_analysis module

target_extraction.analysis.statistical_analysis.find_k_estimator(p_values, alpha, method='B')[source]

Given a list of p-values, returns the number of those p-values that are significant at the level of alpha according to either the Bonferroni or Fisher correction method. This code comes from the Dror et al. 2017 paper. Fisher is used if the p-values come from an independent set, i.e. the p-values result from independent datasets; Bonferroni is used if this independence assumption does not hold.

Fisher is currently not implemented.

Parameters
  • p_values (List[float]) – list of p-values.

  • alpha (float) – significance level.

  • method (str) – ‘B’ for Bonferroni

Return type

int

Returns

Number of datasets that are significant at the level of alpha for the p_values given.

Raises

NotImplementedError – If F is given for the method argument.

target_extraction.analysis.statistical_analysis.one_tailed_p_value(scores_1, scores_2, assume_normal)[source]
Parameters
  • scores_1 (List[float]) – The scores, e.g. a list of accuracy values, that represent one model/method’s results (multiple scores can come from running the same model/method over different random seeds and/or dataset splits).

  • scores_2 (List[float]) – Same as scores_1 but coming from a different method/model

  • assume_normal (bool) – Whether the scores are assumed to come from a normal distribution. See the following guide by Dror and Reichart 2018 to know if your metric/scores can be assumed to be normal or not. The test used when the scores are normal is Welch’s t-test; when not normal it is the Wilcoxon signed-rank test.

Return type

float

Returns

The p-value of a one-tailed test to determine if scores_1 is better than scores_2.
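
Example

A sketch combining the two functions in this module (the score values are invented purely for illustration):

    from target_extraction.analysis.statistical_analysis import (
        find_k_estimator, one_tailed_p_value)

    model_a_scores = [0.71, 0.73, 0.72, 0.74, 0.70]
    model_b_scores = [0.69, 0.70, 0.68, 0.71, 0.69]
    # Accuracy scores are typically assumed normal, so Welch's t-test is used.
    p_value = one_tailed_p_value(model_a_scores, model_b_scores,
                                 assume_normal=True)
    # Number of datasets significant at alpha=0.05 (Bonferroni correction),
    # given one p-value per dataset.
    k = find_k_estimator([p_value, 0.03, 0.2], alpha=0.05, method='B')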

target_extraction.analysis.util module

target_extraction.analysis.util.add_metadata_to_df(df, target_collection, metadata_prediction_key, metadata_keys=None)[source]
Parameters
  • df (DataFrame) – A DataFrame that contains at least one column named prediction key, whose values relate to the keys within the TargetTextCollection that store the predicted values

  • target_collection (TargetTextCollection) – The collection that stores prediction key and the metadata within target_collection.metadata

  • metadata_prediction_key (str) – The key that stores all of the metadata associated to the prediction key values within target_collection.metadata

  • metadata_keys (Optional[List[str]]) – If not None, only the metadata keys stated in this list of strings that relate to the prediction key will be added, else all will be added.

Return type

DataFrame

Returns

The df dataframe but with new columns that are the names of the metadata fields with the values being the values from those metadata fields that relate to the prediction key value.

Raises
  • KeyError – If any of the prediction key values are not keys within the TargetTextCollection targets.

  • KeyError – If any of the prediction key values in the dataframe are not in the target_collection metadata.

target_extraction.analysis.util.combine_metrics(metric_df, other_metric_df, other_metric_name)[source]
Parameters
  • metric_df (DataFrame) – DataFrame that contains all the metrics to be kept

  • other_metric_df (DataFrame) – Contains metric scores that are to be added to a copy of metric_df

  • other_metric_name (str) – Name of the column of the metric scores to be copied from other_metric_df

Return type

DataFrame

Returns

A copy of the metric_df with a new column other_metric_name that contains the other metric scores.

Note

This assumes that the two dataframes come from target_extraction.analysis.util.metric_df() with the argument include_run_number as True. This is because the columns used to combine the metric scores are prediction key and run number.

Raises

KeyError – If prediction key and run number are not columns within metric_df and other_metric_df

target_extraction.analysis.util.create_subset_heatmap(subset_df, value_column, pivot_table_agg_func=None, font_label_size=10, cubehelix_palette_kwargs=None, value_range=None, lines=True, line_color='k', vertical_lines_index=None, horizontal_lines_index=None, ax=None, heatmap_kwargs=None)[source]
Parameters
  • subset_df (DataFrame) – A DataFrame that contains the following columns: 1. Error Split, 2. Error Subset, 3. Dataset, and 4. value_column

  • value_column (str) – The column that contains the value to be plotted in the heatmap.

  • pivot_table_agg_func (Optional[Callable[[Series], Any]]) – As a pivot table is created to create the heatmap, this allows the default aggregation function (np.mean) to be replaced with a custom function. The pivot table aggregates the value_column by Dataset, Error Split, and Error Subset.

  • font_label_size (int) – Font sizes of the labels on the returned plot

  • cubehelix_palette_kwargs (Optional[Dict[str, Any]]) – Keywords arguments to give to the seaborn.cubehelix_palette https://seaborn.pydata.org/generated/seaborn.cubehelix_palette.html. Default produces white to dark red.

  • value_range (Optional[List[int]]) – This can also be interpreted as the values allowed in the color range and should cover at least all unique values in value_column.

  • lines (bool) – Whether or not lines should appear on the plot to define the different error splits.

  • line_color (str) – Color of the lines if the lines are to be displayed. The choice of color names can be found here: https://matplotlib.org/3.1.1/gallery/color/named_colors.html#sphx-glr-gallery-color-named-colors-py

  • vertical_lines_index (Optional[List[int]]) – The index of the lines in the vertical/column direction. If None the default is [0,3,7,11,15,18]

  • horizontal_lines_index (Optional[List[int]]) – The index of the lines in the horizontal/row direction. If None the default is [0,1,2,3]

  • ax (Optional[Axes]) – A matplotlib Axes to give to the seaborn function to plot the heatmap on to.

  • heatmap_kwargs (Optional[Dict[str, Any]]) – Keyword arguments to pass to the seaborn.heatmap function

Return type

Axes

Returns

A heatmap where the Y-axis represents the datasets, the X-axis represents the Error subsets, formatted where appropriate with the Error split name, and the values come from the value_column. The heatmap assumes the value_column contains discrete values, as the color bar is discrete rather than continuous. If you want a continuous color bar it is recommended that you use the Seaborn heatmap directly.

target_extraction.analysis.util.long_format_metrics(metric_df, metric_column_names)[source]
Parameters
  • metric_df (DataFrame) – DataFrame from target_extraction.analysis.util.metric_df() that contains more than one metric score e.g. Accuracy and Macro F1

  • metric_column_names (List[str]) – The list of the metrics columns names that exist in metric_df

Return type

DataFrame

Returns

A long format metric version of the metric_df, e.g. converts a DataFrame that contains Accuracy and Macro F1 scores to a DataFrame that contains Metric and Metric Score columns, where the Metric column contains either Accuracy or Macro F1 and the Metric Score column contains the relevant metric score. This will increase the number of rows in metric_df by N, where N is the length of metric_column_names.

target_extraction.analysis.util.metric_df(target_collection, metric_function, true_sentiment_key, predicted_sentiment_keys, average, array_scores, assert_number_labels=None, ignore_label_differences=True, metric_name='metric', include_run_number=False)[source]
Parameters
  • target_collection (TargetTextCollection) – Collection of targets that have true and predicted sentiment values.

  • metric_function (Callable[[TargetTextCollection, str, str, bool, bool, Optional[int], bool], Union[float, List[float]]]) – A metric function from target_extraction.analysis.sentiment_metrics, e.g. target_extraction.analysis.sentiment_metrics.accuracy()

  • true_sentiment_key (str) – Key in the target_collection targets that contains the true sentiment scores for each target in the TargetTextCollection

  • predicted_sentiment_keys (List[str]) – The name of the predicted sentiment keys within the TargetTextCollection for which the metric function should be applied to.

  • average (bool) – For each predicted sentiment key, return the average metric score across the N predictions made for that key.

  • array_scores (bool) – If average is False then this will return all of the N model runs metric scores.

  • assert_number_labels (Optional[int]) – Whether or not to assert this many number of unique labels must exist in the true sentiment key. If this is None then the assertion is not raised.

  • ignore_label_differences (bool) – If True then the ValueError will not be raised if the predicted sentiment values are not in the true sentiment values.

  • metric_name (str) – The name to give to the metric value column.

  • include_run_number (bool) – If array_scores is True then this will add an extra column to the returned dataframe (run number) which will include the model run number. This can be used to uniquely identify each row when combined with the prediction key string.

Return type

DataFrame

Returns

A pandas DataFrame with two columns: 1. The prediction key string, and 2. The metric value. The number of rows in the DataFrame is either the number of prediction keys (when average is True) or the number of prediction keys * the number of model runs (when array_scores is True).

Raises

ValueError – If include_run_number is True and array_scores is False.
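
Example

A sketch (assuming test is a TargetTextCollection with true labels under target_sentiments and two illustrative prediction keys):

    from target_extraction.analysis.sentiment_metrics import accuracy
    from target_extraction.analysis.util import metric_df

    # One row per (prediction key, run number) pair.
    accuracy_df = metric_df(test, accuracy, 'target_sentiments',
                            ['model_1_predictions', 'model_2_predictions'],
                            average=False, array_scores=True,
                            metric_name='Accuracy', include_run_number=True)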

target_extraction.analysis.util.metric_p_values(data_split_df, better_split, compare_splits, datasets, metric_names_assume_normals, better_and_compare_column_name='Model')[source]
Parameters
  • data_split_df (DataFrame) – The DataFrame that contains at least the following columns: 1. the value for better_and_compare_column_name, 2. Dataset, and 3. all metric names

  • better_split (str) – The name of the model you are testing if it is better than all other models in the compare_splits

  • compare_splits (List[str]) – The names of the models you assume are no different in score to the better_split model.

  • datasets (List[str]) – Datasets to test the hypothesis on.

  • metric_names_assume_normals (List[Tuple[str, bool]]) – A list of Tuples that contain (metric name, assumed to be normal) where the assumed to be normal is False or True based on whether the metric scores from metric name column can be assumed to be normal or not. e.g. [(Accuracy, True)]

  • better_and_compare_column_name (str) – The column that contains the better_split and compare_splits values.

Return type

DataFrame

Returns

A DataFrame containing the following columns: 1. Metric, 2. Dataset, 3. P-Value, 4. Compared {better_and_compare_column_name}, and 5. Better {better_and_compare_column_name}. Where it tests that one Model is statistically better than the compare models on each given dataset for each metric given.

target_extraction.analysis.util.overall_metric_results(collection, prediction_keys=None, true_sentiment_key='target_sentiments', strict_accuracy_metrics=False)[source]
Parameters
  • collection (TargetTextCollection) – Dataset that contains all of the results. Furthermore it should have the name attribute as something meaningful e.g. Laptop for the Laptop dataset.

  • prediction_keys (Optional[List[str]]) – A list of prediction keys that you want the results for. If None then it will get all of the prediction keys from collection.metadata[‘predicted_target_sentiment_key’].

  • true_sentiment_key (str) – Key in the target_collection targets that contains the true sentiment scores for each target in the TargetTextCollection.

  • strict_accuracy_metrics (bool) – If this is True the dataframe will also contain three additional columns: ‘STAC’, ‘STAC 1’, and ‘STAC Multi’. ‘STAC’ is the Strict Text ACcuracy (STAC) on the whole dataset, while ‘STAC 1’ and ‘STAC Multi’ are the STAC metric performed on the subsets of the dataset that contain either one unique sentiment or more than one unique sentiment per text, respectively.

Return type

DataFrame

Returns

A pandas dataframe with the following columns: [‘prediction key’, ‘run number’, ‘Accuracy’, ‘Macro F1’, ‘Dataset’]. The Dataset column will contain one unique value and that will come from the name attribute of the collection. The DataFrame will also contain columns and values from the associated metadata see add_metadata_to_df() for more details.

target_extraction.analysis.util.plot_error_subsets(metric_df, df_column_name, df_row_name, df_x_name, df_y_name, df_hue_name='Model', seaborn_plot_name='pointplot', seaborn_kwargs=None, legend_column=0, figsize=None, legend_bbox_to_anchor=(-0.13, 1.1), fontsize=14, legend_fontsize=10, tick_font_size=12, title_on_every_plot=False, df_overall_metric=None, overall_seaborn_plot_name=None, overall_seaborn_kwargs=None, df_dataset_size=None, dataset_h_line_offset=0.2, dataset_h_line_color='k', h_line_legend_name='Dataset Size (Number of Samples)', h_line_legend_bbox_to_anchor=None, dataset_y_label='Dataset Size\n(Number of Samples)', gridspec_kw=None, row_order=None, column_order=None)[source]

This function is named what it is as it is a good way to visualise the different error subsets, and thus error splits, after running the different error functions from error_analysis_wrapper(), especially if you are exploring them over different datasets. To create a graph with these different error analysis subsets, Models, and datasets, the following column and row names may be useful: df_column_name = Dataset, df_row_name = Error Split, df_x_name = Error Subset, df_y_name = Accuracy (%), and df_hue_name = Model.

Parameters
  • metric_df (DataFrame) – A DataFrame that contains the data to plot; it must contain the columns named by the df_column_name, df_row_name, df_x_name, df_y_name, and df_hue_name arguments.

  • df_column_name (str) – Name of the column in metric_df that will be used to determine the categorical variables to facet the column part of the returned figure

  • df_row_name (str) – Name of the column in metric_df that will be used to determine the categorical variables to facet the row part of the returned figure

  • df_x_name (str) – Name of the column in metric_df that will be used to represent the X-axis in the figure.

  • df_y_name (str) – Name of the column in metric_df that will be used to represent the Y-axis in the figure.

  • df_hue_name (str) – Name of the column in metric_df that will be used to represent the hue in the figure

  • seaborn_plot_name (str) – Name of the seaborn plotting function to use as the plots within the figure

  • seaborn_kwargs (Optional[Dict[str, Any]]) – The key word arguments to give to the seaborn plotting function.

  • legend_column (Optional[int]) – Which column in the figure the legend should be associated with. The row the legend is associated with is fixed at row 0.

  • figsize (Optional[Tuple[float, float]]) – Size of the figure, this is passed to the matplotlib.pyplot.subplots() as an argument.

  • legend_bbox_to_anchor (Tuple[float, float]) – Where the legend box should be within the figure. This is passed as the bbox_to_anchor argument to matplotlib.pyplot.Axes.legend()

  • fontsize (int) – Size of the font for the title, y-axis label, and x-axis label.

  • legend_fontsize (int) – Size of the font for the legend.

  • tick_font_size (int) – Size of the font on the y and x axis ticks.

  • title_on_every_plot (bool) – Whether or not to have the title above every plot in the grid or just over the top row of plots.

  • df_overall_metric (Optional[str]) – Name of the column in metric_df that stores the overall metric score for the entire dataset and not just the subsets.

  • overall_seaborn_plot_name (Optional[str]) – Same as the seaborn_plot_name but for plotting the overall metric

  • overall_seaborn_kwargs (Optional[Dict[str, Any]]) – Same as the seaborn_kwargs but for the overall metric plot.

  • df_dataset_size (Optional[str]) – Name of the column in metric_df that stores the dataset size for each X-axis value. If this is given it will create h-lines for each X-axis value representing the dataset size

  • dataset_h_line_offset (float) – +/- offsets indicating the length of each hline

  • dataset_h_line_color (str) – Color of the hline

  • h_line_legend_name (str) – Name to give to the h_line legend.

  • h_line_legend_bbox_to_anchor (Optional[Tuple[float, float]]) – Where the h line legend box should be within the figure. This is passed as the bbox_to_anchor argument to matplotlib.pyplot.Axes.legend()

  • dataset_y_label (str) – The Y-Label for the right hand side Y-axis.

  • gridspec_kw (Optional[Dict[str, Any]]) – matplotlib.pyplot.subplots() gridspec_kw argument

  • row_order (Optional[List[Any]]) – A list of all unique df_row_name values in the order the rows should appear in.

  • column_order (Optional[List[Any]]) – A list of all unique df_column_name values in the order the columns should appear in.

Return type

Tuple[Figure, List[List[Axes]]]

Returns

A tuple of 1. The figure 2. The associated axes within the figure. The figure will contain N x M plots where N is the number of unique values in the metric_df df_column_name column and M is the number of unique values in the metric_df df_row_name column.

Module contents