bella.models package

Submodules

bella.models.base module

Module contains all of the main base classes for the machine learning models. These are grouped into three categories: 1. Mixin, 2. Abstract, and 3. Concrete.

Mixin classes - Function-based classes that contain functions which do not rely on the type of model and are therefore useful for all models:

  1. bella.models.base.ModelMixin

Abstract classes - These are used to enforce the functions that all the machine learning models must have. The abstract class is also the class that inherits the Mixin class:

  1. bella.models.base.BaseModel

Concrete classes - These are more concrete classes that still contain some abstract methods. However, they are the classes to inherit from to create a machine learning model based on a certain framework, e.g. scikit-learn or Keras:

  1. bella.models.base.SKLearnModel
  2. bella.models.base.KerasModel
class bella.models.base.BaseModel[source]

Bases: bella.models.base.ModelMixin, abc.ABC

Abstract class for all of the machine learning models.

Attributes:

  1. model – Machine learning model that is associated with this instance.
  2. fitted – If the machine learning model has been fitted (default False)

Methods:

  1. fit – Fit the model according to the given training data.
  2. predict – Predict class labels for samples in X.
  3. probabilities – The probability of each class label for all samples in X.
  4. __repr__ – Name of the machine learning model.

Class Methods:

  1. name – Returns the name of the model.

Functions:

  1. save – Saves the given machine learning model instance to a file.
  2. load – Loads the entire machine learning model from a file.
  3. evaluate_parameter – Fits and predicts the given model on the given training, validation and test data when the given parameter is changed on the model.
  4. evaluate_parameters – Same as evaluate_parameter, however it evaluates many parameter values for the same parameter.
static evaluate_parameter(model, train, val, test, parameter_name, parameter)[source]

Given a model, sets parameter_name to parameter, fits the model using the train and validation data, and returns a tuple of the parameter value and the predictions of the model on the test data.

Parameters:
  • model (BaseModel) – bella.models.base.BaseModel instance
  • train (Tuple[ndarray, ndarray]) – Tuple of (X_train, y_train). Used to fit the model.
  • val (Union[None, Tuple[ndarray, ndarray]]) – Tuple of (X_val, y_val), or None if not required. This is only required if the model requires validation data, as the bella.models.base.KerasModel models do.
  • test (ndarray) – X_test data to predict on.
  • parameter_name (str) – Name of the parameter to change e.g. optimiser
  • parameter (Any) – value to assign to the parameter e.g. keras.optimizers.RMSprop
Return type:

Tuple[Any, ndarray]

Returns:

A tuple of (parameter value, predictions)
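As a minimal sketch, assuming a Keras-based bella model instance and pre-prepared numpy arrays (all placeholder names here), a single call could look like:

    from keras.optimizers import RMSprop

    # `model`, `X_train`, `y_train`, `X_val`, `y_val` and `X_test` are
    # hypothetical placeholders prepared elsewhere.
    param_value, predictions = model.evaluate_parameter(
        model, (X_train, y_train), (X_val, y_val), X_test,
        parameter_name='optimiser', parameter=RMSprop)

The parameter name and value mirror the examples given above; SKLearn-based models would pass None for the validation data.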

static evaluate_parameters(model, train, val, test, parameter_name, parameters, n_jobs)[source]

Performs bella.models.base.BaseModel.evaluate_parameter() on one parameter_name but with multiple parameter values.

This is useful if you would like to know the effect of changing the values of a parameter. It can also perform the task in a multiprocessing manner if n_jobs > 1.

Parameters:
  • model (BaseModel) – bella.models.base.BaseModel instance
  • train (Tuple[ndarray, ndarray]) – Tuple of (X_train, y_train). Used to fit the model.
  • val (Union[None, Tuple[ndarray, ndarray]]) – Tuple of (X_val, y_val), or None if not required. This is only required if the model requires validation data, as the bella.models.base.KerasModel models do.
  • test (ndarray) – X_test data to predict on.
  • parameter_name (str) – Name of the parameter to change e.g. optimiser
  • parameters (List[Any]) – A list of values to assign to the parameter e.g. [keras.optimizers.RMSprop]
  • n_jobs (int) – Number of CPUs to use for multiprocessing. If 1, it will not multiprocess.
Return type:

List[Tuple[Any, ndarray]]

Returns:

A list of tuples of (parameter value, predictions)

fit(X, y)[source]

Fit the model according to the given training data.

Parameters:
  • X (ndarray) – Training samples matrix, shape = [n_samples, n_features]
  • y (ndarray) – Training targets, shape = [n_samples]
Return type:

None

Returns:

The model attribute will now be trained.

fitted

If the machine learning model has been fitted (default False)

Return type:bool
Returns:True or False
static load(load_fp)[source]

Loads the entire machine learning model from a file.

Parameters:load_fp (Path) – File path of the location that the model was saved to.
Return type:BaseModel
Returns:self
model

Machine learning model that is associated with this instance.

Return type:Any
Returns:The machine learning model
classmethod name()[source]

Returns the name of the model.

Return type:str
Returns:Name of the model
predict(X)[source]

Predict class labels for samples in X.

Parameters:X (ndarray) – Test samples matrix, shape = [n_samples, n_features]
Return type:ndarray
Returns:Predicted class label per sample, shape = [n_samples]
probabilities(X)[source]

The probability of each class label for all samples in X.

Parameters:X (ndarray) – Test samples matrix, shape = [n_samples, n_features]
Return type:ndarray
Returns:Probability of each class label for all samples, shape = [n_samples, n_classes]
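The fit, predict and probabilities methods above follow scikit-learn's shape conventions. A shape-level sketch, where `model` stands in for any concrete subclass and the random matrices are placeholders for real features:

    import numpy as np

    X_train = np.random.rand(100, 20)            # [n_samples, n_features]
    y_train = np.random.randint(0, 3, size=100)  # [n_samples]
    X_test = np.random.rand(10, 20)

    model.fit(X_train, y_train)          # after this, model.fitted is True
    labels = model.predict(X_test)       # shape = [10]
    probs = model.probabilities(X_test)  # shape = [10, n_classes]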
static save(model, save_fp)[source]

Saves the entire machine learning model to a file.

Parameters:
  • model (BaseModel) – The machine learning model instance to be saved.
  • save_fp (Path) – File path of the location that the model is to be saved to.
Return type:

None

Returns:

Nothing.
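A save/load round trip, assuming `model` is a fitted concrete instance and the file path is arbitrary; since save and load are defined per concrete class, they are called through the model's own class here:

    from pathlib import Path

    save_fp = Path('my_model.joblib')     # hypothetical location
    type(model).save(model, save_fp)      # concrete classes raise ValueError
                                          # if the model is not fitted
    restored = type(model).load(save_fp)
    assert restored.fitted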

class bella.models.base.KerasModel[source]

Bases: bella.models.base.BaseModel

Concrete class that is designed to be used as the base class for all machine learning models that are based on the Keras library.

Attributes:

  1. tokeniser – Tokeniser the model uses, e.g. str.split().
  2. embeddings – The word embeddings the model uses, e.g. bella.word_vectors.SSWE
  3. lower – Whether the model lower cases the words when pre-processing the data
  4. reproducible – Whether to be reproducible. If None then it is quicker to run. Else provide an int that will represent the random seed value.
  5. patience – Number of epochs with no improvement before training is stopped.
  6. batch_size – Number of samples per gradient update.
  7. epochs – Number of times to train over the entire training set before stopping.
  8. optimiser – Optimiser the model uses, e.g. keras.optimizers.SGD
  9. optimiser_params – Parameters for the optimiser. If None uses the defaults for the optimiser being used.

Abstract Methods:

  1. keras_model – Keras machine learning model that represents the class, e.g. a single forward LSTM.
  2. create_training_text – Converts the training and validation data into a format that the keras model can take as input.
  3. create_training_y – Converts the training and validation targets into a format that can be used by the keras model.

Methods:

  1. fit – Fit the model according to the given training and validation data.
  2. probabilities – The probability of each class label for all samples in X.
  3. predict – Predict class labels for samples in X.

Functions:

  1. save – Given an instance of this class, saves it to a file.
  2. load – Loads an instance of this class from a file.
  3. evaluate_parameter – Fits and predicts the given model on the given training, validation and test data when the given parameter is changed on the model.
  4. evaluate_parameters – Same as evaluate_parameter, however it evaluates many parameter values for the same parameter.
batch_size

batch_size attribute

Return type:int
Returns:The batch_size used in the model
create_training_text(train_data, validation_data)[source]

Converts the training and validation data into a format that the keras model can take as input.

Return type:Tuple[Any, Any]
Returns:A tuple of length two containing the keras model training and validation input respectively.
create_training_y(train_y, validation_y)[source]

Converts the training and validation targets into a format that can be used by the keras model

Return type:Tuple[ndarray, ndarray]
Returns:A tuple of length two containing two arrays, the first for training and the second for validation.
embeddings

embeddings attribute

Return type:WordVectors
Returns:The embeddings used in the model
epochs

epochs attribute

Return type:int
Returns:The epochs used in the model
static evaluate_parameter(model, train, val, test, parameter_name, parameter)[source]

Given a model, sets parameter_name to parameter, fits the model using the train and validation data, and returns a tuple of the parameter value and the predictions of the model on the test data.

Parameters:
  • model (KerasModel) – KerasModel instance
  • train (Tuple[ndarray, ndarray]) – Tuple of (X_train, y_train). Used to fit the model.
  • val (Tuple[ndarray, ndarray]) – Tuple of (X_val, y_val). Used to evaluate the model at each epoch. Will not be trained on this data.
  • test (ndarray) – X_test data to predict on.
  • parameter_name (str) – Name of the parameter to change e.g. optimiser
  • parameter (Any) – value to assign to the parameter e.g. keras.optimizers.RMSprop
Return type:

Tuple[Any, ndarray]

Returns:

A tuple of (parameter value, predictions)

static evaluate_parameters(model, train, val, test, parameter_name, parameters, n_jobs)[source]

Performs bella.models.base.KerasModel.evaluate_parameter() on one parameter_name but with multiple parameter values.

This is useful if you would like to know the effect of changing the values of a parameter. It can also perform the task in a multiprocessing manner if n_jobs > 1.

Parameters:
  • model (KerasModel) – bella.models.base.KerasModel instance
  • train (Tuple[ndarray, ndarray]) – Tuple of (X_train, y_train). Used to fit the model.
  • val (Tuple[ndarray, ndarray]) – Tuple of (X_val, y_val). Used to evaluate the model at each epoch. Will not be trained on this data.
  • test (ndarray) – X_test data to predict on.
  • parameter_name (str) – Name of the parameter to change e.g. optimiser
  • parameters (List[Any]) – A list of values to assign to the parameter e.g. [keras.optimizers.RMSprop]
  • n_jobs (int) – Number of CPUs to use for multiprocessing. If 1, it will not multiprocess.
Return type:

List[Tuple[Any, ndarray]]

Returns:

A list of tuples of (parameter value, predictions)

fit(X, y, validation_data, verbose=0, continue_training=False)[source]

Fit the model according to the given training and validation data.

Parameters:
  • X (ndarray) – Training samples matrix, shape = [n_samples, n_features]
  • y (ndarray) – Training targets, shape = [n_samples]
  • validation_data (Tuple[ndarray, ndarray]) – Tuple of (x_val, y_val). Used to evaluate the model at each epoch. Will not be trained on this data.
  • verbose (int) – 0 = silent, 1 = progress
  • continue_training (bool) – Whether the model that has already been trained should be trained further.
Return type:

History

Returns:

A record of training loss values and metrics values at successive epochs, as well as validation loss values and validation metrics values.
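A sketch of a typical training call, with placeholder arrays prepared elsewhere; the returned keras History object exposes per-epoch metrics through its history dict under Keras's usual keys:

    history = model.fit(X_train, y_train,
                        validation_data=(X_val, y_val),
                        verbose=1)
    print(history.history['loss'])      # training loss per epoch
    print(history.history['val_loss'])  # validation loss per epoch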

keras_model(num_classes)[source]

Keras machine learning model that represents the class, e.g. a single forward LSTM.

Return type:Model
Returns:Keras machine learning model
static load(load_fp)[source]

Loads an instance of this class from a file.

Parameters:load_fp (Path) – File path of the location that the model was saved to.
Return type:KerasModel
Returns:self
lower

lower attribute

Return type:bool
Returns:The lower used in the model
optimiser

optimiser attribute

Return type:OptimizerV2
Returns:The optimiser used in the model
optimiser_params

optimiser_params attribute

Return type:Optional[Dict[str, Any]]
Returns:The optimiser_params used in the model
patience

patience attribute

Return type:int
Returns:The patience used in the model
predict(X)[source]

Predict class labels for samples in X.

Parameters:X (ndarray) – Test samples matrix, shape = [n_samples, n_features]
Return type:ndarray
Returns:Predicted class label per sample, shape = [n_samples]
probabilities(X)[source]

The probability of each class label for all samples in X.

Parameters:X (ndarray) – Test samples matrix, shape = [n_samples, n_features]
Return type:ndarray
Returns:Probability of each class label for all samples, shape = [n_samples, n_classes]
process_text(texts, max_length, padding='pre', truncate='pre')[source]

Given a list of strings, tokenises the text, lower cases it if set, and then converts the tokens into integers representing the tokens in the embeddings. Lastly it pads the data based on the max_length parameter.

If max_length is smaller than the size of a sentence, the sentence is truncated. If max_length = -1 then the max_length is that of the longest sentence in the texts.

Parameters:
  • texts – List of texts
  • max_length – How many tokens a sentence can contain. If it is -1 then it uses the sentence with the most tokens as the max_length parameter.
  • padding – Which side of the sentence to pad: pre beginning, post end.
  • truncate – Which side of the sentence to truncate: pre beginning, post end.
Return type:Tuple[int, ndarray]
Returns:A tuple of length 2 containing: 1. The max_length parameter, 2. A matrix of shape [n_samples, pad_size] where each integer in the matrix represents the word embedding lookup.
Raises:ValueError – If the max_length argument is equal to or less than 0, or if the calculated max_length is 0.
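A small sketch, assuming `model` is an instance of a concrete KerasModel subclass whose tokeniser and embeddings are already set:

    texts = ['This phone is great', 'terrible battery life']
    # -1 pads to the longest sentence; pad and truncate at the start ('pre')
    max_length, matrix = model.process_text(texts, max_length=-1,
                                            padding='pre', truncate='pre')
    # matrix.shape == (2, max_length); each integer is an embedding lookup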
reproducible

reproducible attribute

Return type:Optional[int]
Returns:The reproducible used in the model
static save(model, save_fp)[source]

Given a KerasModel instance and a file path to save to, saves the data required to restore the model.

Parameters:
  • model (KerasModel) – The machine learning model instance to be saved.
  • save_fp (Path) – File path of the location that the model is to be saved to.
Return type:

None

Returns:

Nothing.

Raises:

ValueError – If the model has not been fitted or if the model is not of type bella.models.base.KerasModel

tokeniser

tokeniser attribute

Return type:Callable[[str], List[str]]
Returns:The tokeniser used in the model
class bella.models.base.ModelMixin[source]

Bases: object

Mixin class for all of the machine learning models. It contains only functions so that they are as generic as possible.

Functions:

  1. train_val_split – Splits the training dataset into a train and validation set in a stratified split.
static train_val_split(train, split_size=0.2, seed=42)[source]

Splits the training dataset into a train and validation set in a stratified split.

Parameters:
  • train (TargetCollection) – The training dataset that is to be split into train and validation sets.
  • split_size (float) – Fraction of the dataset to assign to the validation set.
  • seed (Union[None, int]) – Seed value to give to the stratified splitter. If None then it uses the random state of numpy.
Return type:

Tuple[Tuple[ndarray, ndarray], Tuple[ndarray, ndarray]]

Returns:

Two tuples of length two where each tuple is the train and validation splits respectively, and each tuple contains the data (X) and class labels (y) respectively. Returns ((X_train, y_train), (X_val, y_val))
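Because train_val_split is static, it can be called straight off the class; `dataset` below is a placeholder for a bella TargetCollection:

    from bella.models.base import ModelMixin

    # 80/20 stratified split with a fixed seed
    (X_train, y_train), (X_val, y_val) = ModelMixin.train_val_split(
        dataset, split_size=0.2, seed=42)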

class bella.models.base.SKLearnModel(*args, **kwargs)[source]

Bases: bella.models.base.BaseModel

Concrete class that is designed to be used as the base class for all machine learning models that are based on the scikit-learn library.

At the moment it expects all of the machine learning models to use an SVM as their classifier. This is due to assuming the model will have the sklearn.svm.SVC.decision_function() method to get probabilities.

NOTE: each time the model_parameters are set, the model is reset, i.e. the fitted attribute becomes False.

Attributes:

  1. model – Machine learning model. Expects it to be a sklearn.pipeline.Pipeline instance.
  2. fitted – If the machine learning model has been fitted (default False)
  3. model_parameters – The parameters that are set in the machine learning model. E.g. Parameter could be the tokeniser used.

Abstract Class Methods:

  1. get_parameters – Transform the given parameters into a dictionary that is accepted as model parameters.
  2. get_cv_parameters – Transform the given parameters into a list of dictionaries that is accepted as the param_grid parameter in sklearn.model_selection.GridSearchCV
  3. normalise_parameter_names – Converts the output of get_parameters() into a dictionary that can be used as input into get_parameters(). This is required so that evaluate_parameters() can work with this class.

Methods:

  1. fit – Fit the model according to the given training data.
  2. predict – Predict class labels for samples in X.
  3. probabilities – The probability of each class label for all samples in X.
  4. __repr__ – Name of the machine learning model.

Functions:

  1. save – Given an instance of this class, saves it to a file.
  2. load – Loads an instance of this class from a file.
  3. evaluate_parameter – Fits and predicts the given model on the given training, validation and test data when the given parameter is changed on the model.
  4. evaluate_parameters – Same as evaluate_parameter, however it evaluates many parameter values for the same parameter.
  5. grid_search_model – Given a model class it will perform a Grid Search over the parameters you give to the model's bella.models.base.SKLearnModel.get_cv_parameters() function via the keyword arguments. Returns a pandas dataframe representation of the grid search results.
  6. get_grid_score – Given the return value of grid_search_model(), returns the grid scores as a List of the mean test accuracy results.
  7. models_best_parameter – Given a list of models and their base model arguments, it will find the best parameter value out of the values given for that parameter while keeping the base model arguments constant for each model.

Abstract Functions:

  1. pipeline – Machine Learning model that is used as the base template for the model attribute. Expects it to be a sklearn.pipeline.Pipeline instance.
__init__(*args, **kwargs)[source]
Return type:None
static evaluate_parameter(model, train, val, test, parameter_name, parameter)[source]

Given a model, sets parameter_name to parameter, fits the model using the train and validation data, and returns a tuple of the parameter value and the predictions of the model on the test data.

Parameters:
  • model (SKLearnModel) – bella.models.base.SKLearnModel instance
  • train (Tuple[ndarray, ndarray]) – Tuple of (X_train, y_train). Used to fit the model.
  • val (None) – Use None. This argument exists only to keep the API consistent.
  • test (ndarray) – X_test data to predict on.
  • parameter_name (str) – Name of the parameter to change e.g. word_vectors
  • parameter (Any) – value to assign to the parameter e.g. bella.word_vectors.SSWE
Return type:

Tuple[Any, ndarray]

Returns:

A tuple of (parameter value, predictions)

static evaluate_parameters(model, train, val, test, parameter_name, parameters, n_jobs)[source]

Performs bella.models.base.SKLearnModel.evaluate_parameter() on one parameter_name but with multiple parameter values.

This is useful if you would like to know the effect of changing the values of a parameter. It can also perform the task in a multiprocessing manner if n_jobs > 1.

Parameters:
  • model (SKLearnModel) – bella.models.base.SKLearnModel instance
  • train (Tuple[ndarray, ndarray]) – Tuple of (X_train, y_train). Used to fit the model.
  • val (None) – Use None. This argument exists only to keep the API consistent.
  • test (ndarray) – X_test data to predict on.
  • parameter_name (str) – Name of the parameter to change e.g. word_vectors
  • parameters (List[Any]) – A list of values to assign to the parameter e.g. [bella.word_vectors.SSWE]
  • n_jobs (int) – Number of CPUs to use for multiprocessing. If 1, it will not multiprocess.
Return type:

List[Tuple[Any, ndarray]]

Returns:

A list of tuples of (parameter value, predictions)

fit(X, y)[source]

Fit the model according to the given training data.

Parameters:
  • X (ndarray) – Training samples matrix, shape = [n_samples, n_features]
  • y (ndarray) – Training targets, shape = [n_samples]
Returns:

The model attribute will now be trained.

classmethod get_cv_parameters()[source]

Transform the given parameters into a list of dictionaries that is accepted as the param_grid parameter in sklearn.model_selection.GridSearchCV

Return type:List[Dict[str, List[Any]]]
static get_grid_score(grid_scores, associated_param=None)[source]

Given the return value of grid_search_model(), returns the grid scores as a List of the mean test accuracy results.

Parameters:
  • grid_scores (DataFrame) – Return of the grid_search_model()
  • associated_param (Optional[str]) – Optional. The name of the parameter you want to associate to the score, e.g. lexicon if you have grid searched over different lexicons and you want the return to be associated with the lexicon name, e.g. [(0.68, 'MPQA'), (0.70, 'NRC')]
Return type:

Union[List[float], List[Tuple[float, str]]]

Returns:

A list of test scores from the grid search and if associated_param is not None a list of scores and parameter names.

classmethod get_parameters()[source]

Transform the given parameters into a dictionary that is accepted as model parameters

Return type:Dict[str, Any]
static grid_search_model(model, X, y, n_cpus=1, num_folds=5, **kwargs)[source]

Given a model class it will perform a Grid Search over the parameters you give to the model's bella.models.base.SKLearnModel.get_cv_parameters() function via the keyword arguments. Returns a pandas dataframe representation of the grid search results.

Parameters:
  • model (SKLearnModel) – The class of the model to use not an instance of the model.
  • X (ndarray) – Training samples matrix, shape = [n_samples, n_features]
  • y (ndarray) – Training targets, shape = [n_samples]
  • n_cpus (int) – Number of estimators to fit in parallel. Default 1.
  • num_folds (int) – Number of Stratified cross validation folds. Default 5.
  • kwargs – Keyword arguments to give to the model's bella.models.base.SKLearnModel.get_cv_parameters() function.
Return type:

DataFrame

Returns:

Pandas dataframe representation of the grid search results.
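A hedged sketch combining grid_search_model() with get_grid_score(), using TargetInd and SSWE (the classes this documentation uses as its running examples); the data arrays are placeholders, and the word_vectors keyword follows the list-of-lists convention documented for get_cv_parameters:

    from bella.models.target import TargetInd
    from bella.word_vectors import SSWE

    # Note: the first argument is the class itself, not an instance.
    results = TargetInd.grid_search_model(
        TargetInd, X_train, y_train, n_cpus=1, num_folds=5,
        word_vectors=[[SSWE()]])
    mean_accuracies = TargetInd.get_grid_score(results)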

static load(load_fp)[source]

Loads an instance of this class from a file.

Parameters:load_fp (Path) – File path of the location that the model was saved to.
Return type:SKLearnModel
Returns:self
model_parameters

The parameters that are set in the machine learning model. E.g. Parameter could be the tokeniser used.

Return type:Dict[str, Any]
Returns:parameters of the machine learning model
static models_best_parameter(models_kwargs, param_name, param_values, X, y, n_cpus=1, num_folds=5)[source]

Given a list of models and their base model arguments, it will find the best parameter value out of the values given for that parameter while keeping the base model arguments constant for each model.

This essentially performs 5 fold cross validation grid search for the one parameter given, across all models given.

Parameters:
  • models_kwargs (List[Tuple[SKLearnModel, Dict[str, Any]]]) – A list of tuples where each tuple contains a model and the model's keyword arguments to give to its get_cv_parameters method. These arguments are the model's standard arguments that are not to be changed.
  • param_name (str) – Name of the parameter to be changed. This name has to be the name of the keyword argument in the model's get_cv_parameters method.
  • param_values (List[Any]) – The different values to assign to the param_name argument.
  • X (List[Any]) – The training samples.
  • y (ndarray) – The training target samples.
Return type:

Dict[SKLearnModel, str]

Returns:

A dictionary of model and the name of the best parameter.
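A sketch under the documented signature; whether the parameter values need the same nesting as the matching get_cv_parameters keyword is an assumption here, and the data arrays are placeholders:

    from bella.models.base import SKLearnModel
    from bella.models.target import TargetDep, TargetInd
    from bella.word_vectors import SSWE

    models_kwargs = [(TargetInd, {'word_vectors': [[SSWE()]]}),
                     (TargetDep, {'word_vectors': [[SSWE()]]})]
    best = SKLearnModel.models_best_parameter(
        models_kwargs, 'C', [0.01, 0.1], X_train, y_train)
    # e.g. {TargetInd: '0.01', TargetDep: '0.1'}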

classmethod normalise_parameter_names(parameter_dict)[source]

Converts the output of get_parameters() into a dictionary that can be used as input into get_parameters().

Return type:Dict[str, Any]
Returns:A dictionary that can be used as keyword arguments into the get_parameters() method
static pipeline()[source]

Machine Learning model that is used as the base template for the model attribute.

Return type:Pipeline
Returns:The template machine learning model
predict(X)[source]

Predict class labels for samples in X.

Parameters:X (ndarray) – Test samples matrix, shape = [n_samples, n_features]
Returns:Predicted class label per sample, shape = [n_samples]
Raises:ValueError – If the model has not been fitted
probabilities(X)[source]

The probability of each class label for all samples in X.

Parameters:X (ndarray) – Test samples matrix, shape = [n_samples, n_features]
Returns:Probability of each class label for all samples, shape = [n_samples, n_classes]
Raises:ValueError – If the model has not been fitted
static save(model, save_fp, compress=0)[source]

Given an instance of this class, saves it to a file.

Parameters:
  • model (SKLearnModel) – The machine learning model instance to be saved.
  • save_fp (Path) – File path of the location that the model is to be saved to.
  • compress (int) – Optional (default 0). Level of compression: 0 is no compression and 9 is the most compressed. The more compressed, the slower the read/write time.
Return type:

None

Returns:

Nothing.

Raises:

ValueError – If the model has not been fitted or if the model is not of type bella.models.base.SKLearnModel

bella.models.target module

Module contains all of the classes that represent Machine Learning models that are within the Vo and Zhang 2015 paper (a short usage sketch follows the class list):

  1. bella.models.target.TargetInd – Target Independent model
  2. bella.models.target.TargetDepMinus – Target Dependent Minus model
  3. bella.models.target.TargetDep – Target Dependent model
  4. bella.models.target.TargetDepPlus – Target Dependent Plus model
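A hedged end-to-end sketch for these models, using SSWE (the word vectors used as the running example throughout this documentation); the training and test variables are placeholders prepared elsewhere:

    from bella.models.target import TargetInd
    from bella.word_vectors import SSWE

    model = TargetInd(word_vectors=[SSWE()])  # defaults: ark_twokenize, C=0.01
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)       # one label per sample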
class bella.models.target.TargetDep(word_vectors, tokeniser=<function ark_twokenize>, lower=True, C=0.01, random_state=42, scale=MinMaxScaler(copy=True, feature_range=(0, 1)))[source]

Bases: bella.models.target.TargetInd

Target-dep model from Vo and Zhang 2015 paper.

__init__(word_vectors, tokeniser=<function ark_twokenize>, lower=True, C=0.01, random_state=42, scale=MinMaxScaler(copy=True, feature_range=(0, 1)))[source]
Parameters:
  • word_vectors (List[WordVectors]) – A list of one or more word vectors to be used as feature vector lookups. If more than one is used the word vectors are concatenated together to create the feature vector for each word.
  • tokeniser (Callable[[str], List[str]]) – Tokeniser to be used e.g. str.split()
  • lower (bool) – Whether to lower case the words
  • C (float) – The C value for the sklearn.svm.SVC estimator that is used in the pipeline.
  • random_state (int) – The random_state value for the sklearn.svm.SVC estimator that is used in the pipeline.
  • scale (Any) – How to scale the data before input into the estimator. If no scaling is to be used set this to None.
Return type:

None

classmethod get_cv_parameters(word_vectors, tokeniser=[<function ark_twokenize>], lower=[True], C=[0.01], random_state=[42], scale=[MinMaxScaler(copy=True, feature_range=(0, 1))])[source]

Transform the given parameters into a list of dictionaries that is accepted as the param_grid parameter in sklearn.model_selection.GridSearchCV

Parameters:
  • word_vectors – A list of a list of word vectors e.g. [[SSWE()], [SSWE(), GloveCommonCrawl()]].
  • tokenisers – A list of tokenisers to be used e.g. str.split(). Default [ark_twokenize]
  • lowers – A list of bool values which indicate whether to lower case the input words. Default [True]
  • C – A list of C values for the sklearn.svm.SVC estimator that is used in the pipeline. Default [0.01]
  • random_state – A list of random_state values for the sklearn.svm.SVC estimator that is used in the pipeline. Default [42]
  • scale – List of scale values. The list can include sklearn.preprocessing.MinMaxScaler type of classes or None if no scaling is to be used. Default [sklearn.preprocessing.MinMaxScaler]
Returns:

Parameters to explore through cross validation

classmethod get_parameters(word_vectors, tokeniser=<function ark_twokenize>, lower=True, C=0.01, random_state=42, scale=MinMaxScaler(copy=True, feature_range=(0, 1)))[source]

Transform the given parameters into a dictionary that is accepted as model parameters

Parameters:
  • word_vectors (List[WordVectors]) – A list of one or more word vectors to be used as feature vector lookups. If more than one is used the word vectors are concatenated together to create the feature vector for each word.
  • tokeniser (Callable[[str], List[str]]) – Tokeniser to be used e.g. str.split()
  • lower (bool) – Whether to lower case the words
  • C (float) – The C value for the sklearn.svm.SVC estimator that is used in the pipeline.
  • random_state (int) – The random_state value for the sklearn.svm.SVC estimator that is used in the pipeline.
  • scale (Any) – How to scale the data before input into the estimator. If no scaling is to be used set this to None.
Return type:

Dict[str, Any]

Returns:

Model parameters

classmethod name()[source]
Return type:str
static pipeline()[source]

Machine Learning model that is used as the base template for the model attribute.

Return type:Pipeline
Returns:The template machine learning model
class bella.models.target.TargetDepMinus(word_vectors, tokeniser=<function ark_twokenize>, lower=True, C=0.025, random_state=42, scale=MinMaxScaler(copy=True, feature_range=(0, 1)))[source]

Bases: bella.models.target.TargetInd

__init__(word_vectors, tokeniser=<function ark_twokenize>, lower=True, C=0.025, random_state=42, scale=MinMaxScaler(copy=True, feature_range=(0, 1)))[source]
Parameters:
  • word_vectors (List[WordVectors]) – A list of one or more word vectors to be used as feature vector lookups. If more than one is used the word vectors are concatenated together to create the feature vector for each word.
  • tokeniser (Callable[[str], List[str]]) – Tokeniser to be used e.g. str.split()
  • lower (bool) – Whether to lower case the words
  • C (float) – The C value for the sklearn.svm.SVC estimator that is used in the pipeline.
  • random_state (int) – The random_state value for the sklearn.svm.SVC estimator that is used in the pipeline.
  • scale (Any) – How to scale the data before input into the estimator. If no scaling is to be used set this to None.
Return type:

None

classmethod get_cv_parameters(word_vectors, tokeniser=[<function ark_twokenize>], lower=[True], C=[0.025], random_state=[42], scale=[MinMaxScaler(copy=True, feature_range=(0, 1))])[source]

Transform the given parameters into a list of dictionaries that is accepted as the param_grid parameter in sklearn.model_selection.GridSearchCV

Parameters:
  • word_vectors – A list of a list of word vectors e.g. [[SSWE()], [SSWE(), GloveCommonCrawl()]].
  • tokenisers – A list of tokenisers to be used e.g. str.split(). Default [ark_twokenize]
  • lowers – A list of bool values which indicate whether to lower case the input words. Default [True]
  • C – A list of C values for the sklearn.svm.SVC estimator that is used in the pipeline. Default [0.025]
  • random_state – A list of random_state values for the sklearn.svm.SVC estimator that is used in the pipeline. Default [42]
  • scale – List of scale values. The list can include sklearn.preprocessing.MinMaxScaler type of classes or None if no scaling is to be used. Default [sklearn.preprocessing.MinMaxScaler]
Returns:

Parameters to explore through cross validation

classmethod get_parameters(word_vectors, tokeniser=<function ark_twokenize>, lower=True, C=0.025, random_state=42, scale=MinMaxScaler(copy=True, feature_range=(0, 1)))[source]

Transform the given parameters into a dictionary that is accepted as model parameters

Parameters:
  • word_vectors (List[WordVectors]) – A list of one or more word vectors to be used as feature vector lookups. If more than one is used the word vectors are concatenated together to create the feature vector for each word.
  • tokeniser (Callable[[str], List[str]]) – Tokeniser to be used e.g. str.split()
  • lower (bool) – Whether to lower case the words
  • C (float) – The C value for the sklearn.svm.SVC estimator that is used in the pipeline.
  • random_state (int) – The random_state value for the sklearn.svm.SVC estimator that is used in the pipeline.
  • scale (Any) – How to scale the data before input into the estimator. If no scaling is to be used set this to None.
Return type:

Dict[str, Any]

Returns:

Model parameters

classmethod name()[source]
Return type:str
static pipeline()[source]

Machine Learning model that is used as the base template for the model attribute.

Return type:Pipeline
Returns:The template machine learning model
class bella.models.target.TargetDepPlus(word_vectors, senti_lexicon, tokeniser=<function ark_twokenize>, lower=True, C=0.01, random_state=42, scale=MinMaxScaler(copy=True, feature_range=(0, 1)))[source]

Bases: bella.models.target.TargetInd

__init__(word_vectors, senti_lexicon, tokeniser=<function ark_twokenize>, lower=True, C=0.01, random_state=42, scale=MinMaxScaler(copy=True, feature_range=(0, 1)))[source]
Parameters:
  • word_vectors (List[WordVectors]) – A list of one or more word vectors to be used as feature vector lookups. If more than one is used the word vectors are concatenated together to create the feature vector for each word.
  • senti_lexicon (Lexicon) – Sentiment Lexicon to be used for the Left and Right sentiment context (LS and RS).
  • tokeniser (Callable[[str], List[str]]) – Tokeniser to be used e.g. str.split()
  • lower (bool) – Whether to lower case the words
  • C (float) – The C value for the sklearn.svm.SVC estimator that is used in the pipeline.
  • random_state (int) – The random_state value for the sklearn.svm.SVC estimator that is used in the pipeline.
  • scale (Any) – How to scale the data before input into the estimator. If no scaling is to be used set this to None.
Return type:

None

classmethod get_cv_parameters(word_vectors, senti_lexicon, tokeniser=[<function ark_twokenize>], lower=[True], C=[0.01], random_state=[42], scale=[MinMaxScaler(copy=True, feature_range=(0, 1))])[source]

Transform the given parameters into a list of dictionaries that is accepted as the param_grid parameter in sklearn.model_selection.GridSearchCV

Parameters:
  • word_vectors (List[List[WordVectors]]) – A list of a list of word vectors e.g. [[SSWE()], [SSWE(), GloveCommonCrawl()]].
  • senti_lexicon (List[Lexicon]) – A list of Sentiment Lexicons to be explored for the Left and Right sentiment context (LS and RS). Default None, use the sentiment lexicons already within the model.
  • tokenisers – A list of tokenisers to be used e.g. str.split(). Default [ark_twokenize]
  • lowers – A list of bool values which indicate whether to lower case the input words. Default [True]
  • C – A list of C values for the sklearn.svm.SVC estimator that is used in the pipeline. Default [0.01]
  • random_state – A list of random_state values for the sklearn.svm.SVC estimator that is used in the pipeline. Default [42]
  • scale – List of scale values. The list can include sklearn.preprocessing.MinMaxScaler type of classes or None if no scaling is to be used. Default [sklearn.preprocessing.MinMaxScaler]
Returns:

Parameters to explore through cross validation

classmethod get_parameters(word_vectors, senti_lexicon, tokeniser=<function ark_twokenize>, lower=True, C=0.01, random_state=42, scale=MinMaxScaler(copy=True, feature_range=(0, 1)))[source]

Transform the given parameters into a dictionary that is accepted as model parameters

Parameters:
  • word_vectors (List[WordVectors]) – A list of one or more word vectors to be used as feature vector lookups. If more than one is used the word vectors are concatenated together to create the feature vector for each word.
  • senti_lexicon (Lexicon) – Sentiment Lexicon to be used for the Left and Right sentiment context (LS and RS).
  • tokeniser (Callable[[str], List[str]]) – Tokeniser to be used e.g. str.split()
  • lower (bool) – Whether to lower case the words
  • C (float) – The C value for the sklearn.svm.SVC estimator that is used in the pipeline.
  • random_state (int) – The random_state value for the sklearn.svm.SVC estimator that is used in the pipeline.
  • scale (Any) – How to scale the data before input into the estimator. If no scaling is to be used set this to None.
Return type:

Dict[str, Any]

Returns:

Model parameters

classmethod name()[source]
Return type:str
classmethod normalise_parameter_names(parameter_dict)[source]

Converts the output of get_parameters() into a dictionary that can be used as input into get_parameters().

Return type:Dict[str, Any]
Returns:A dictionary that can be used as keyword arguments into the get_parameters() method
static pipeline()[source]

Machine Learning model that is used as the base template for the model attribute.

Return type:Pipeline
Returns:The template machine learning model
class bella.models.target.TargetInd(word_vectors, tokeniser=<function ark_twokenize>, lower=True, C=0.01, random_state=42, scale=MinMaxScaler(copy=True, feature_range=(0, 1)))[source]

Bases: bella.models.base.SKLearnModel

Attributes:

  1. model – Machine learning model. Expects it to be a sklearn.pipeline.Pipeline instance.
  2. fitted – If the machine learning model has been fitted (default False)
  3. model_parameters – The parameters that are set in the machine learning model. E.g. Parameter could be the tokeniser used.

Methods:

  1. fit – Fit the model according to the given training data.
  2. predict – Predict class labels for samples in X.
  3. probabilities – The probability of each class label for all samples in X.
  4. __repr__ – Name of the machine learning model.

Class Methods:

  1. get_parameters – Transform the given parameters into a dictionary that is accepted as model parameters.
  2. get_cv_parameters – Transform the given parameters into a list of dictionaries that is accepted as the param_grid parameter in sklearn.model_selection.GridSearchCV
  3. name – Returns the name of the model.

Functions:

  1. save – Given an instance of this class, saves it to a file.
  2. load – Loads an instance of this class from a file.
  3. pipeline – Machine Learning model that is used as the base template for the model attribute.
__init__(word_vectors, tokeniser=<function ark_twokenize>, lower=True, C=0.01, random_state=42, scale=MinMaxScaler(copy=True, feature_range=(0, 1)))[source]
Parameters:
  • word_vectors (List[WordVectors]) – A list of one or more word vectors to be used as feature vector lookups. If more than one is used the word vectors are concatenated together to create the feature vector for each word.
  • tokeniser (Callable[[str], List[str]]) – Tokeniser to be used e.g. str.split()
  • lower (bool) – Whether to lower case the words
  • C (float) – The C value for the sklearn.svm.SVC estimator that is used in the pipeline.
  • random_state (int) – The random_state value for the sklearn.svm.SVC estimator that is used in the pipeline.
  • scale (Any) – How to scale the data before input into the estimator. If no scaling is to be used set this to None.
Return type:

None

classmethod get_cv_parameters(word_vectors, tokeniser=[<function ark_twokenize>], lower=[True], C=[0.01], random_state=[42], scale=[MinMaxScaler(copy=True, feature_range=(0, 1))])[source]

Transform the given parameters into a list of dictionaries that is accepted as the param_grid parameter in sklearn.model_selection.GridSearchCV

Parameters:
  • word_vectors – A list of a list of word vectors e.g. [[SSWE()], [SSWE(), GloveCommonCrawl()]].
  • tokenisers – A list of tokenisers to be used e.g. str.split(). Default [ark_twokenize]
  • lowers – A list of bool values which indicate whether to lower case the input words. Default [True]
  • C – A list of C values for the sklearn.svm.SVC estimator that is used in the pipeline. Default [0.01]
  • random_state – A list of random_state values for the sklearn.svm.SVC estimator that is used in the pipeline. Default [42]
  • scale – List of scale values. The list can include sklearn.preprocessing.MinMaxScaler type of classes or None if no scaling is to be used. Default [sklearn.preprocessing.MinMaxScaler]
Returns:

Parameters to explore through cross validation

classmethod get_parameters(word_vectors, tokeniser=<function ark_twokenize>, lower=True, C=0.01, random_state=42, scale=MinMaxScaler(copy=True, feature_range=(0, 1)))[source]

Transform the given parameters into a dictionary that is accepted as model parameters

Parameters:
  • word_vectors (List[WordVectors]) – A list of one or more word vectors to be used as feature vector lookups. If more than one is used the word vectors are concatenated together to create the feature vector for each word.
  • tokeniser (Callable[[str], List[str]]) – Tokeniser to be used e.g. str.split()
  • lower (bool) – Whether to lower case the words
  • C (float) – The C value for the sklearn.svm.SVC estimator that is used in the pipeline.
  • random_state (int) – The random_state value for the sklearn.svm.SVC estimator that is used in the pipeline.
  • scale (Any) – How to scale the data before input into the estimator. If no scaling is to be used set this to None.
Return type:

Dict[str, Any]

Returns:

Model parameters

classmethod name()[source]
Return type:str
classmethod normalise_parameter_names(parameter_dict)[source]

Converts the output of get_parameters() into a dictionary that can be used as input into get_parameters().

Return type:Dict[str, Any]
Returns:A dictionary that can be used as keyword arguments into the get_parameters() method
static pipeline()[source]

Machine Learning model that is used as the base template for the model attribute.

Return type:Pipeline
Returns:The template machine learning model

bella.models.tdlstm module

Module contains all of the classes that represent Machine Learning models that are within the Tang et al. 2016 paper (a short usage sketch follows the class list):

  1. bella.models.tdlstm.LSTM – LSTM model.
  2. bella.models.tdlstm.TDLSTM – TDLSTM model.
  3. bella.models.tdlstm.TCLSTM – TCLSTM model.
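A hedged construction sketch for the LSTM variant; str.split and SSWE are the documented example tokeniser and embeddings, and the fit data placeholders are assumed to already be in the format create_training_text() expects:

    from bella.models.tdlstm import LSTM
    from bella.word_vectors import SSWE

    model = LSTM(tokeniser=str.split, embeddings=SSWE(),
                 reproducible=42,       # fixed seed for reproducibility
                 patience=5, epochs=100)
    history = model.fit(X_train, y_train, validation_data=(X_val, y_val))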
class bella.models.tdlstm.LSTM(tokeniser, embeddings, reproducible=None, pad_size=-1, lower=True, patience=10, batch_size=32, epochs=300, embedding_layer_kwargs=None, lstm_layer_kwargs=None, dense_layer_kwargs=None, optimiser=<class 'tensorflow.python.keras.optimizer_v2.gradient_descent.SGD'>, optimiser_params=None)[source]

Bases: bella.models.base.KerasModel

Attributes:

  1. pad_size – The max number of tokens to use per sequence. If -1 use the text sequence in the training data that has the most tokens as the pad size.
  2. embedding_layer_kwargs – Keyword arguments to pass to the embedding layer which is a keras.layers.Embedding object. Can be None if no parameters to pass.
  3. lstm_layer_kwargs – Keyword arguments to pass to the lstm layer(s) which is a keras.layers.LSTM object. Can be None if no parameters to pass.
  4. dense_layer_kwargs – Keyword arguments to pass to the dense (final layer) which is a keras.layers.Dense object. Can be None if no parameters to pass.

Methods:

  1. model_parameters – Returns a dictionary containing the attributes of the class instance, the parameters to give to the class constructor to re-create this instance, and the class itself.
  2. create_training_text – Converts the training and validation data into a format that the keras model can take as input.
  3. create_training_y – Converts the training and validation target values from a vector of class labels into a matrix of binary values of shape [n_samples, n_classes].
  4. keras_model – The model that represents this class. This is a single forward LSTM.
__init__(tokeniser, embeddings, reproducible=None, pad_size=-1, lower=True, patience=10, batch_size=32, epochs=300, embedding_layer_kwargs=None, lstm_layer_kwargs=None, dense_layer_kwargs=None, optimiser=<class 'tensorflow.python.keras.optimizer_v2.gradient_descent.SGD'>, optimiser_params=None)[source]
Parameters:
  • tokeniser (Callable[[str], List[str]]) – Tokeniser to be used e.g. str.split().
  • embeddings (WordVectors) – Embedding (Word vectors) to be used e.g. bella.word_vectors.SSWE
  • reproducible (Optional[int]) – Whether to be reproducible. If None then it is quicker to run. Else provide an int that will represent the random seed value.
  • pad_size (int) – The max number of tokens to use per sequence. If -1 use the text sequence in the training data that has the most tokens as the pad size.
  • lower (bool) – Whether to lower case the words being processed.
  • patience (int) – Number of epochs with no improvement before training is stopped.
  • batch_size (int) – Number of samples per gradient update.
  • epochs (int) – Number of times to train over the entire training set before stopping. If patience is set, then it may stop before reaching the number of epochs specified here.
  • embedding_layer_kwargs (Optional[Dict[str, Any]]) – Keyword arguments to pass to the embedding layer which is a keras.layers.Embedding object. If no parameters to pass leave as None.
  • lstm_layer_kwargs (Optional[Dict[str, Any]]) – Keyword arguments to pass to the lstm layer(s) which is a keras.layers.LSTM object. If no parameters to pass leave as None.
  • dense_layer_kwargs (Optional[Dict[str, Any]]) – Keyword arguments to pass to the dense (final layer) which is a keras.layers.Dense object. If no parameters to pass leave as None.
  • optimiser (OptimizerV2) – Optimiser to be used accepts any keras optimiser. Default is keras.optimizers.SGD
  • optimiser_params (Optional[Dict[str, Any]]) – Parameters for the optimiser. If None uses default optimiser parameters.
Return type:

None

create_training_text(train_data, validation_data)[source]

Converts the training and validation data into a format that the keras model can take as input.

Parameters:
  • train_data (List[Dict[str, str]]) – Data to be trained on. Which is a list of dictionaries where each dictionary has a text field containing text.
  • validation_data (List[Dict[str, str]]) – Data to evaluate the model at training time. Which is a list of dictionaries where each dictionary has a text field containing text.
Return type:

Tuple[ndarray, ndarray]

Returns:

A tuple of length two containing the train and validation input that are both the output of _pre_process()

create_training_y(train_y, validation_y)[source]

Converts the training and validation target values from a vector of class labels into a matrix of binary values of shape [n_samples, n_classes].

To convert the vector of classes to a matrix we use the keras.utils.to_categorical() function.

Parameters:
  • train_y (ndarray) – Vector of class labels, shape = [n_samples]
  • validation_y (ndarray) – Vector of class labels, shape = [n_samples]
Return type:

Tuple[ndarray, ndarray]

Returns:

A tuple of length two containing the train and validation matrices respectively. The shape of each matrix is: [n_samples, n_classes]
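For reference, keras.utils.to_categorical() performs exactly this label-to-matrix conversion:

    import numpy as np
    from keras.utils import to_categorical

    train_y = np.array([0, 2, 1])   # vector of class labels, shape [3]
    print(to_categorical(train_y))
    # [[1. 0. 0.]
    #  [0. 0. 1.]
    #  [0. 1. 0.]]  -> shape = [n_samples, n_classes]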

dense_layer_kwargs

dense_layer_kwargs attribute

Return type:Dict[str, Any]
Returns:The dense_layer_kwargs used in the model
embedding_layer_kwargs

embedding_layer_kwargs attribute

Return type:Dict[str, Any]
Returns:The embedding_layer_kwargs used in the model
keras_model(num_classes)[source]

The model that represents this class. This is a single forward LSTM.

Parameters:num_classes (int) – Number of classes to predict.
Return type:Model
Returns:Forward LSTM keras model.
lstm_layer_kwargs

lstm_layer_kwargs attribute

Return type:Dict[str, Any]
Returns:The lstm_layer_kwargs used in the model
model_parameters()[source]

Returns a dictionary containing the attributes of the class instance, the parameters to give to the class constructor to re-create this instance, and the class itself.

This is used by the save() method so that the instance can be re-created when loaded by the load() method.

Return type:Dict[str, Any]
classmethod name()[source]
Return type:str
pad_size

pad_size attribute

Return type:int
Returns:The pad_size used in the model
class bella.models.tdlstm.TCLSTM(tokeniser, embeddings, reproducible=None, pad_size=-1, lower=True, patience=10, batch_size=32, epochs=300, embedding_layer_kwargs=None, lstm_layer_kwargs=None, dense_layer_kwargs=None, optimiser=<class 'tensorflow.python.keras.optimizer_v2.gradient_descent.SGD'>, optimiser_params=None, include_target=True)[source]

Bases: bella.models.tdlstm.TDLSTM

create_training_text(train_data, validation_data)[source]

Converts the training and validation data into a format that the keras model can take as input.

Parameters:
  • train_data (List[Dict[str, Any]]) – See the bella.models.tdlstm.TDLSTM.create_training_text() train_data parameter.
  • validation_data (List[Dict[str, Any]]) – See the bella.models.tdlstm.TDLSTM.create_training_text() validation_data parameter.
Return type:

Tuple[List[ndarray], List[ndarray]]

Returns:

A tuple of length two containing the train and validation input that are both the output of _pre_process()

keras_model(num_classes)[source]

The model that represents this class. This is the same as the bella.models.tdlstm.TDLSTM.keras_model() model, however the word embeddings, before being input into the LSTM, are concatenated with the word embedding of the target. If the target is more than one word then the word embedding of the target is the average (median in our case) of the embeddings of the target words.

Parameters:num_classes (int) – Number of classes to predict.
Return type:Model
Returns:Two LSTMs one forward from the left context and the other backward from the right context taking into account the target vector embedding.
classmethod name()[source]
Return type:str
class bella.models.tdlstm.TDLSTM(tokeniser, embeddings, reproducible=None, pad_size=-1, lower=True, patience=10, batch_size=32, epochs=300, embedding_layer_kwargs=None, lstm_layer_kwargs=None, dense_layer_kwargs=None, optimiser=<class 'tensorflow.python.keras.optimizer_v2.gradient_descent.SGD'>, optimiser_params=None, include_target=True)[source]

Bases: bella.models.tdlstm.LSTM

Attributes:

  1. include_target – Whether to include the target in the LSTM representations.
__init__(tokeniser, embeddings, reproducible=None, pad_size=-1, lower=True, patience=10, batch_size=32, epochs=300, embedding_layer_kwargs=None, lstm_layer_kwargs=None, dense_layer_kwargs=None, optimiser=<class 'tensorflow.python.keras.optimizer_v2.gradient_descent.SGD'>, optimiser_params=None, include_target=True)[source]
Parameters:
  • tokeniser (Callable[[str], List[str]]) – Tokeniser to be used e.g. str.split().
  • embeddings (WordVectors) – Embedding (Word vectors) to be used e.g. bella.word_vectors.SSWE
  • reproducible (Optional[int]) – Whether to be reproducible. If None then it is quicker to run. Else provide an int that will represent the random seed value.
  • pad_size (int) – The max number of tokens to use per sequence. If -1 use the text sequence in the training data that has the most tokens as the pad size.
  • lower (bool) – Whether to lower case the words being processed.
  • patience (int) – Number of epochs with no improvement before training is stopped.
  • batch_size (int) – Number of samples per gradient update.
  • epochs (int) – Number of times to train over the entire training set before stopping. If patience is set, then it may stop before reaching the number of epochs specified here.
  • embedding_layer_kwargs (Optional[Dict[str, Any]]) – Keyword arguments to pass to the embedding layer which is a keras.layers.Embedding object. If no parameters to pass leave as None.
  • lstm_layer_kwargs (Optional[Dict[str, Any]]) – Keyword arguments to pass to the lstm layer(s) which is a keras.layers.LSTM object. If no parameters to pass leave as None.
  • dense_layer_kwargs (Optional[Dict[str, Any]]) – Keyword arguments to pass to the dense (final layer) which is a keras.layers.Dense object. If no parameters to pass leave as None.
  • optimiser (OptimizerV2) – Optimiser to be used, accepts any keras optimiser. Default is keras.optimizers.SGD
  • optimiser_params (Optional[Dict[str, Any]]) – Parameters for the optimiser. If None uses default optimiser parameters.
  • include_target (bool) – Whether to include the target in the LSTM representations.
Return type:

None

create_training_text(train_data, validation_data)[source]

Converts the training and validation data into a format that the keras model can take as input.

Parameters:
  • train_data (List[Dict[str, Any]]) – Data to be trained on. This is a list of dictionaries where each dictionary has a text field containing text and a spans field containing a list of Tuples, where each Tuple represents an occurrence of the target and contains the starting and ending character indices. (The List is expected to be of size 1 as there should only be one target per target sample. This is not true for the Dong et al. dataset, therefore only the first target instance in the sentence is taken as the target.)
  • validation_data (List[Dict[str, Any]]) – Data to evaluate the model at training time. Expects the same data as the train_data parameter.
Return type:

Tuple[List[ndarray], List[ndarray]]

Returns:

A tuple of length two containing the train and validation input that are both the output of _pre_process()

include_target

include_target attribute

Return type:bool
Returns:The include_target used in the model
keras_model(num_classes)[source]

The model that represents this class. This is a custom combination of two LSTMs.

Parameters:num_classes (int) – Number of classes to predict.
Return type:Model
Returns:Two LSTMs, one forward from the left context and the other backward from the right context. The output of the two are concatenated and are input to the output layer.
model_parameters()[source]

Returns a dictionary containing the attributes of the class instance, the parameters to give to the class constructor to re-create this instance, and the class itself.

This is used by the save() method so that the instance can be re-created when loaded by the load() method.

Return type:Dict[str, Any]
classmethod name()[source]
Return type:str

bella.models.tdparse module

Module contains all of the classes that represent Machine Learning models that are within the Wang et al. paper (a short usage sketch follows the class list).

  1. bella.models.tdparse.TDParseMinus – TDParse Minus model
  2. bella.models.tdparse.TDParse – TDParse model
  3. bella.models.tdparse.TDParsePlus – TDParse Plus model
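A hedged sketch for the TDParse models; the parser below is a hypothetical placeholder, as this section only documents that a dependency parser is required, not which one to use:

    from bella.models.tdparse import TDParse
    from bella.word_vectors import SSWE

    # `my_parser` and the data arrays are hypothetical placeholders.
    model = TDParse(word_vectors=[SSWE()], parser=my_parser)
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)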
class bella.models.tdparse.TDParse(word_vectors, parser, tokeniser=<function ark_twokenize>, lower=True, C=0.01, random_state=42, scale=MinMaxScaler(copy=True, feature_range=(0, 1)))[source]

Bases: bella.models.tdparse.TDParseMinus

__init__(word_vectors, parser, tokeniser=<function ark_twokenize>, lower=True, C=0.01, random_state=42, scale=MinMaxScaler(copy=True, feature_range=(0, 1)))[source]
Parameters:
  • word_vectors (List[WordVectors]) – A list of one or more word vectors to be used as feature vector lookups. If more than one is used the word vectors are concatenated together to create the feature vector for each word.
  • parser (Any) – The dependency parser to be used.
  • tokeniser (Callable[[str], List[str]]) – Tokeniser to be used e.g. str.split()
  • lower (bool) – Whether to lower case the words
  • C (float) – The C value for the sklearn.svm.SVC estimator that is used in the pipeline.
  • random_state (int) – The random_state value for the sklearn.svm.SVC estimator that is used in the pipeline.
  • scale (Any) – How to scale the data before input into the estimator. If no scaling is to be used set this to None.
Return type:

None

classmethod name()[source]
Return type:str
static pipeline()[source]

Machine Learning model that is used as the base template for the model attribute.

Return type:Pipeline
Returns:The template machine learning model
class bella.models.tdparse.TDParseMinus(word_vectors, parser, tokeniser=<function ark_twokenize>, lower=True, C=0.01, random_state=42, scale=MinMaxScaler(copy=True, feature_range=(0, 1)))[source]

Bases: bella.models.target.TargetInd

__init__(word_vectors, parser, tokeniser=<function ark_twokenize>, lower=True, C=0.01, random_state=42, scale=MinMaxScaler(copy=True, feature_range=(0, 1)))[source]
Parameters:
  • word_vectors (List[WordVectors]) – A list of one or more word vectors to be used as feature vector lookups. If more than one is used the word vectors are concatenated together to create the feature vector for each word.
  • parser (Any) – The dependency parser to be used.
  • tokeniser (Callable[[str], List[str]]) – Tokeniser to be used e.g. str.split()
  • lower (bool) – Whether to lower case the words
  • C (float) – The C value for the sklearn.svm.SVC estimator that is used in the pipeline.
  • random_state (int) – The random_state value for the sklearn.svm.SVC estimator that is used in the pipeline.
  • scale (Any) – How to scale the data before input into the estimator. If no scaling is to be used set this to None.
Return type:

None

classmethod get_cv_parameters(word_vectors, parser, tokeniser=[<function ark_twokenize>], lower=[True], C=[0.01], random_state=[42], scale=[MinMaxScaler(copy=True, feature_range=(0, 1))])[source]

Transform the given parameters into a list of dictionaries that is accepted as the param_grid parameter in sklearn.model_selection.GridSearchCV

Parameters:
  • word_vectors (List[List[WordVectors]]) – A list of lists of word vectors e.g. [[SSWE()], [SSWE(), GloveCommonCrawl()]].
  • parser (List[Any]) – A list of dependency parsers to be used.
  • tokeniser – A list of tokenisers to be used e.g. str.split(). Default [ark_twokenize]
  • lower – A list of bool values which indicate whether to lower case the input words. Default [True]
  • C – A list of C values for the sklearn.svm.SVC estimator that is used in the pipeline. Default [0.01]
  • random_state – A list of random_state values for the sklearn.svm.SVC estimator that is used in the pipeline. Default [42]
  • scale – A list of scale values. The list can include sklearn.preprocessing.MinMaxScaler type classes or None if no scaling is to be used. Default [sklearn.preprocessing.MinMaxScaler]
Returns:Parameters to explore through cross validation
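
For example, the returned list can be passed straight to GridSearchCV together with the template pipeline; SSWE and GloveCommonCrawl come from the parameter examples above, while my_parser and the training data are placeholders:

    from sklearn.model_selection import GridSearchCV

    param_grid = TDParseMinus.get_cv_parameters(
        word_vectors=[[SSWE()], [SSWE(), GloveCommonCrawl()]],
        parser=[my_parser],
        C=[0.01, 0.1, 1.0])
    search = GridSearchCV(TDParseMinus.pipeline(), param_grid=param_grid, cv=5)
    search.fit(X_train, y_train)  # best settings end up in search.best_params_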

classmethod get_parameters(word_vectors, parser, tokeniser=<function ark_twokenize>, lower=True, C=0.01, random_state=42, scale=MinMaxScaler(copy=True, feature_range=(0, 1)))[source]

Transform the given parameters into a dictionary that is accepted as model parameters

Parameters:
  • word_vectors (List[WordVectors]) – A list of one or more word vectors to be used as feature vector lookups. If more than one is used the word vectors are concatenated together to create the feature vector for each word.
  • parser (Any) – The dependency parser to be used.
  • tokeniser (Callable[[str], List[str]]) – Tokeniser to be used e.g. str.split()
  • lower (bool) – Whether to lower case the words
  • C (float) – The C value for the sklearn.svm.SVC estimator that is used in the pipeline.
  • random_state (int) – The random_state value for the sklearn.svm.SVC estimator that is used in the pipeline.
  • scale (Any) – How to scale the data before input into the estimator. If no scaling is to be used set this to None.
Return type:Dict[str, Any]
Returns:Model parameters

classmethod name()[source]
Return type:str
classmethod normalise_parameter_names(parameter_dict)[source]

Converts the output of get_parameters() into a dictionary that can be used as input into get_parameters().

Return type:Dict[str, Any]
Returns:A dictionary that can be used as keyword arguments into the get_parameters() method
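
In other words the two methods form a round trip, sketched here with placeholder arguments (my_parser stands in for any dependency parser):

    # get_parameters() returns pipeline-style parameter names;
    # normalise_parameter_names() maps them back to plain keyword arguments.
    pipeline_params = TDParseMinus.get_parameters(word_vectors=[SSWE()],
                                                  parser=my_parser)
    plain_kwargs = TDParseMinus.normalise_parameter_names(pipeline_params)
    # Feeding them back in should reproduce the original dictionary:
    pipeline_params_again = TDParseMinus.get_parameters(**plain_kwargs)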
static pipeline()[source]

Machine Learning model that is used as the base template for the model attribute.

Return type:Pipeline
Returns:The template machine learning model
class bella.models.tdparse.TDParsePlus(word_vectors, parser, senti_lexicon, tokeniser=<function ark_twokenize>, lower=True, C=0.01, random_state=42, scale=MinMaxScaler(copy=True, feature_range=(0, 1)))[source]

Bases: bella.models.tdparse.TDParseMinus

__init__(word_vectors, parser, senti_lexicon, tokeniser=<function ark_twokenize>, lower=True, C=0.01, random_state=42, scale=MinMaxScaler(copy=True, feature_range=(0, 1)))[source]
Parameters:
  • word_vectors (List[WordVectors]) – A list of one or more word vectors to be used as feature vector lookups. If more than one is used the word vectors are concatenated together to create the feature vector for each word.
  • parser (Any) – The dependency parser to be used.
  • senti_lexicon (Lexicon) – Sentiment Lexicon to be used for the Left and Right sentiment context (LS and RS).
  • tokeniser (Callable[[str], List[str]]) – Tokeniser to be used e.g. str.split()
  • lower (bool) – Whether to lower case the words
  • C (float) – The C value for the sklearn.svm.SVC estimator that is used in the pipeline.
  • random_state (int) – The random_state value for the sklearn.svm.SVC estimator that is used in the pipeline.
  • scale (Any) – How to scale the data before input into the estimator. If no scaling is to be used set this to None.
Return type:None
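
Construction is the same as for TDParse apart from the extra senti_lexicon argument; my_lexicon and my_parser below are placeholders for whichever Lexicon and parser are used:

    from bella.models.tdparse import TDParsePlus

    model = TDParsePlus(word_vectors=[SSWE()],
                        parser=my_parser,
                        senti_lexicon=my_lexicon)  # drives the LS and RS features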

classmethod get_cv_parameters(word_vectors, parser, senti_lexicon, tokeniser=[<function ark_twokenize>], lower=[True], C=[0.01], random_state=[42], scale=[MinMaxScaler(copy=True, feature_range=(0, 1))])[source]

Transform the given parameters into a list of dictionaries that is accepted as the param_grid parameter in sklearn.model_selection.GridSearchCV

Parameters:
  • word_vectors (List[List[WordVectors]]) – A list of lists of word vectors e.g. [[SSWE()], [SSWE(), GloveCommonCrawl()]].
  • parser (List[Any]) – A list of dependency parsers to be used.
  • senti_lexicon (List[Lexicon]) – A list of Sentiment Lexicons to be explored for the Left and Right sentiment context (LS and RS).
  • tokeniser – A list of tokenisers to be used e.g. str.split(). Default [ark_twokenize]
  • lower – A list of bool values which indicate whether to lower case the input words. Default [True]
  • C – A list of C values for the sklearn.svm.SVC estimator that is used in the pipeline. Default [0.01]
  • random_state – A list of random_state values for the sklearn.svm.SVC estimator that is used in the pipeline. Default [42]
  • scale – A list of scale values. The list can include sklearn.preprocessing.MinMaxScaler type classes or None if no scaling is to be used. Default [sklearn.preprocessing.MinMaxScaler]
Returns:Parameters to explore through cross validation

classmethod get_parameters(word_vectors, parser, senti_lexicon, tokeniser=<function ark_twokenize>, lower=True, C=0.01, random_state=42, scale=MinMaxScaler(copy=True, feature_range=(0, 1)))[source]

Transform the given parameters into a dictionary that is accepted as model parameters

Parameters:
  • word_vectors (List[WordVectors]) – A list of one or more word vectors to be used as feature vector lookups. If more than one is used the word vectors are concatenated together to create the feature vector for each word.
  • parser (Any) – The dependency parser to be used.
  • senti_lexicon (Lexicon) – Sentiment Lexicon to be used for the Left and Right sentiment context (LS and RS).
  • tokeniser (Callable[[str], List[str]]) – Tokeniser to be used e.g. str.split()
  • lower (bool) – Whether to lower case the words
  • C (float) – The C value for the sklearn.svm.SVC estimator that is used in the pipeline.
  • random_state (int) – The random_state value for the sklearn.svm.SVC estimator that is used in the pipeline.
  • scale (Any) – How to scale the data before input into the estimator. If no scaling is to be used set this to None.
Return type:Dict[str, Any]
Returns:Model parameters

classmethod name()[source]
Return type:str
classmethod normalise_parameter_names(parameter_dict)[source]

Converts the output of get_parameters() into a dictionary that can be used as input into get_parameters().

Return type:Dict[str, Any]
Returns:A dictionary that can be used as keyword arguments into the get_parameters() method
static pipeline()[source]

Machine Learning model that is used as the base template for the model attribute.

Return type:Pipeline
Returns:The template machine learning model

Module contents