target_extraction.allen package

Subpackages

Submodules

target_extraction.allen.allennlp_model module

class target_extraction.allen.allennlp_model.AllenNLPModel(name, model_param_fp, predictor_name, save_dir=None)[source]

Bases: object

This is a wrapper around the AllenNLP dataset readers, models, and predictors so that functions can take target_extraction.data_types.TargetTextCollection objects as input and return a metric or metrics as well as predictions stored within those target_extraction.data_types.TargetTextCollection objects. This avoids running everything through multiple bash scripts calling allennlp train etc.
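
For example, a model wrapper can be constructed directly from a model parameter (config) file. The name, config path, predictor name, and save directory below are illustrative placeholders, not values shipped with the package:

    from target_extraction.allen.allennlp_model import AllenNLPModel

    # Hypothetical values: the config file path, model name, predictor name,
    # and save directory all depend on your own setup.
    model = AllenNLPModel(name='target-tagger',
                          model_param_fp='configs/target_extraction.jsonnet',
                          predictor_name='target-tagger',
                          save_dir='models/target_tagger')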

fit(train_data, val_data, test_data=None)[source]

Given the training, validation, and optionally the test data, it will train the model that is defined in the model params file provided as an argument to the constructor of the class. Once trained, the model can be accessed through the model attribute.

NOTE: If the test data is given, the model only uses it to fit the vocabulary that is within the test data; the model NEVER trains on the test data.

Parameters
  • train_data (TargetTextCollection) – The data to train the model on.

  • val_data (TargetTextCollection) – The data to validate the model on during training.

  • test_data (Optional[TargetTextCollection]) – Optional test data. If given, it is only used to fit the vocabulary; the model never trains on it.

Return type

None
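
A minimal sketch of a training call, assuming train_collection, val_collection, and test_collection are TargetTextCollection objects that have already been created (for example by one of the package's dataset parsers) and model is the AllenNLPModel instance constructed above:

    # train_collection, val_collection, and test_collection are assumed to be
    # existing TargetTextCollection objects; test_collection is only used to
    # fit the vocabulary, the model never trains on it.
    model.fit(train_collection, val_collection, test_data=test_collection)

    # After fitting, the trained AllenNLP model is available on the instance.
    trained_model = model.model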

load(cuda_device=-1)[source]

Loads the model. This does not require you to train the model if the save_dir attribute is pointing to a folder containing a trained model. This is just a wrapper around the load_archive function.

Parameters

cuda_device (int) – Whether the model should be loaded onto the CPU (-1) or the GPU (0). Defaults to CPU (-1).

Return type

Model

Returns

The model that was saved at self.save_dir

Raises
  • AssertionError – If the save_dir argument is None

  • FileNotFoundError – If the save directory does not exist.
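
A sketch of loading a previously trained model, assuming the save_dir given to the constructor already contains a trained model archive:

    # Assumes `model` was constructed with a save_dir that already holds a
    # trained model, e.g. after calling fit or training separately.
    loaded_model = model.load(cuda_device=-1)  # -1 = CPU, 0 = first GPU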

predict_into_collection(collection, key_mapping, batch_size=None, append_if_exists=True)[source]

Predicts on the given collection and stores the predictions within the collection under the keys given as values in the key_mapping argument.

Parameters
  • collection (TargetTextCollection) – The TargetTextCollection that is to be predicted on and to be the store of the predicted data.

  • key_mapping (Dict[str, str]) – Dictionary mapping the prediction keys that contain the prediction values to the keys that will store those prediction values within the collection that has been predicted on.

  • batch_size (Optional[int]) – Specify the batch size to predict on. If left as None it defaults to 64, unless a batch size is specified in the model_param_fp given to the constructor, in which case the batch size from the param file is used.

  • append_if_exists (bool) – If False and a TargetText within the collection already has a prediction stored under the given key from key_mapping, a KeyError is raised.

Return type

TargetTextCollection

Returns

The collection that was predicted on, with the new predictions stored in the keys that are the values of the key_mapping argument. Note that all predictions are stored within Lists under their respective keys in the collection.

Raises
  • KeyError – If the keys from key_mapping are not within the prediction dictionary.

  • KeyError – If append_if_exists is False and a TargetText within the collection already has a prediction stored under the given key from key_mapping.
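
A sketch of writing predictions back into a collection; the prediction key sequence_labels and the storage key predicted_sequence_labels are illustrative and depend on the predictor in use:

    # Hypothetical key names: the predictor is assumed to output 'sequence_labels',
    # which will be stored under 'predicted_sequence_labels' in each TargetText.
    key_mapping = {'sequence_labels': 'predicted_sequence_labels'}
    collection = model.predict_into_collection(collection, key_mapping,
                                               batch_size=32,
                                               append_if_exists=False)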

predict_sequences(data, batch_size=None)[source]

Given the data it will predict the sequence labels and return the confidence scores in those labels, as well as the words and text that the predictions were made on.

Parameters
  • data (Union[Iterable[Dict[str, Any]], List[Dict[str, Any]]]) – Iterable or list of dictionaries that contain at least a text key and value; if you do not want the predictor to perform the tokenization, provide tokens as well. Some models may also expect pos_tags, which the predictor will provide if only the text key is given.

  • batch_size (Optional[int]) – Specify the batch size to predict on. If left as None it defaults to 64, unless a batch size is specified in the model_param_fp given to the constructor, in which case the batch size from the param file is used.

Yields

A dictionary containing all of the following keys and values:

  1. sequence_labels: A list of predicted sequence labels. This will be a List of Strings.

  2. confidence: The confidence the model had in predicting each sequence label; this comes from the softmax score. This will be a List of floats.

  3. tokens: The tokens that the confidence and sequence labels are associated with.

  4. text: The text that the tokens/words relate to.

Return type

Iterable[Dict[str, Any]]
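
A sketch of predicting over raw text, where the input dictionaries and the printed keys follow the description above; the example sentences are illustrative:

    # Each dictionary must contain at least the `text` key; `tokens` can be
    # provided if you want to skip the predictor's own tokenization.
    data = [{'text': 'The food was great but the service was slow.'},
            {'text': 'Fantastic battery life.'}]
    for prediction in model.predict_sequences(data, batch_size=16):
        # Each yielded dictionary contains sequence_labels, confidence,
        # tokens, and text as described above.
        print(prediction['tokens'])
        print(prediction['sequence_labels'])
        print(prediction['confidence'])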

Module contents