bella.scikit_features package

Submodules

bella.scikit_features.context module

Module contains a class that is a scikit-learn Transformer.

class bella.scikit_features.context.Context(context='left', inc_target=False)[source]

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

__init__(context='left', inc_target=False)[source]
fit(target_dicts, y=None)[source]

Kept for consistency with the TransformerMixin

fit_transform(target_dicts, y=None)[source]

see self.transform

transform(target_dicts)[source]

Given a list of target dictionaries containing the spans of the targets and the texts the targets occur in, returns the relevant left, right, and target contexts with respect to the target word(s). Returns a list of contexts.

Parameters:target_dicts (list) – list of dictionaries containing at least spans and text keys.
Returns:a list of left, right and target contexts with respect to the target word, determined by the value of self.context; e.g. if self.context = ‘lt’ only the left and target contexts are returned, not the right.
Return type:list
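The slicing behaviour described above can be sketched as follows. This is a minimal illustration of the documented behaviour, not bella's actual implementation; the `spans` and `text` keys follow the parameter description, and the exact handling of inc_target is an assumption:

```python
def contexts(target_dict, context='left', inc_target=False):
    # Slice the text around each target span to produce the requested
    # context; inc_target controls whether the target word itself is kept.
    text = target_dict['text']
    results = []
    for start, end in target_dict['spans']:
        if context == 'left':
            results.append(text[:end] if inc_target else text[:start])
        elif context == 'right':
            results.append(text[start:] if inc_target else text[end:])
        elif context == 'target':
            results.append(text[start:end])
    return results

target = {'text': 'The camera is great', 'spans': [(4, 10)]}
print(contexts(target, context='left'))    # ['The ']
print(contexts(target, context='target'))  # ['camera']
```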

bella.scikit_features.debug module

SKLearn Transformer

Classes:

  1. Debug – Allows you to debug previous transformers
class bella.scikit_features.debug.Debug[source]

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

SKLearn transformer that is to be used as a debugger between different SKLearn transformers.

Methods:

  1. fit - Does nothing as nothing is done at fit time.
  2. fit_transform - Performs the transform method.
  3. transform - Creates a REPL terminal to inspect the input transformed data
__init__()[source]
fit(data, y=None)[source]

Kept for consistency with the TransformerMixin

fit_transform(data, y=None)[source]

see self.transform

transform(data)[source]
Parameters:data (array) – Data from the previous transformer in the pipeline
Returns:Nothing. Creates a Python interactive shell to inspect the data from the previous transformation.
Return type:None
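A stand-in with the same shape can be written in a few lines using the standard-library code module. This is a hypothetical sketch, not the bella class (the real class also subclasses sklearn's BaseEstimator and TransformerMixin):

```python
import code


class DebugSketch:
    # Hypothetical stand-in for bella's Debug transformer: fit is a
    # no-op, and transform drops into an interactive shell so the data
    # flowing through the pipeline can be inspected.

    def fit(self, data, y=None):
        return self  # nothing is learnt at fit time

    def transform(self, data):
        # `data` is whatever the previous pipeline step produced;
        # exit the shell (Ctrl-D) to let the pipeline continue.
        code.interact(local={'data': data})
```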

bella.scikit_features.join_context_vectors module

class bella.scikit_features.join_context_vectors.JoinContextVectors(pool_func=<function matrix_median>)[source]

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

__init__(pool_func=<function matrix_median>)[source]
fit(context_pool_vectors, y=None)[source]

Kept for consistency with the TransformerMixin

fit_transform(context_pool_vectors, y=None)[source]

see self.transform

transform(context_pool_vectors)[source]

Given a list of training samples, each containing a list of numpy.ndarrays (one per context), returns a list of training samples of numpy.ndarrays in which the contexts have been joined together using one of the pool functions.
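The joining step can be sketched as below. The pool function shown here is an assumed element-wise median; bella's actual matrix_median may behave differently:

```python
import numpy as np


def matrix_median(matrix):
    # Assumed pool function: the element-wise median across contexts.
    return np.median(matrix, axis=0)


def join_context_vectors(samples, pool_func=matrix_median):
    # Each sample is a list of per-context vectors; stack them and
    # pool them into a single vector per sample.
    return [pool_func(np.stack(vectors)) for vectors in samples]


samples = [[np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]]
print(join_context_vectors(samples))  # [array([3., 4.])]
```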

bella.scikit_features.lexicon_filter module

class bella.scikit_features.lexicon_filter.LexiconFilter(lexicon=None, zero_token='$$$ZERO_TOKEN$$$')[source]

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

__init__(lexicon=None, zero_token='$$$ZERO_TOKEN$$$')[source]
fit(context_tokens, y=None)[source]

Kept for consistency with the TransformerMixin

fit_transform(context_tokens, y=None)[source]

see self.transform

transform(contexts_tokens)[source]

bella.scikit_features.neural_pooling module

class bella.scikit_features.neural_pooling.NeuralPooling(pool_func=<function matrix_max>)[source]

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

__init__(pool_func=<function matrix_max>)[source]
fit(context_word_matrixs, y=None)[source]

Kept for consistency with the TransformerMixin

fit_transform(context_word_matrixs, y=None)[source]

see self.transform

transform(context_word_matrixs)[source]
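The pooling can be sketched as follows. The word matrices are assumed to have the shape documented in ContextWordVectors.transform, (length of word vector, number of tokens), and the max-over-tokens behaviour of matrix_max is an assumption, not bella's confirmed implementation:

```python
import numpy as np


def matrix_max(word_matrix):
    # Assumed behaviour of the default pool function: for a matrix of
    # shape (vector_dim, num_tokens), keep the maximum value of each
    # word-vector dimension across all tokens.
    return word_matrix.max(axis=1)


def neural_pooling(context_word_matrices, pool_func=matrix_max):
    # Reduce each word matrix to a single fixed-length vector.
    return [pool_func(matrix) for matrix in context_word_matrices]


matrices = [np.array([[1.0, 3.0],    # dimension 0 across 2 tokens
                      [5.0, 2.0]])]  # dimension 1 across 2 tokens
print(neural_pooling(matrices))  # [array([3., 5.])]
```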

bella.scikit_features.syntactic_context module

Module contains a class that is a scikit-learn Transformer.

Classes:

  1. SyntacticContext - Converts a list of dictionaries containing text, targets, and target spans into text contexts defined by the target's dependency tree. Returns a list of a list of dictionaries containing text and span.
  2. DependencyChildContext - Similar to SyntacticContext but returns a list of a list of Strings instead of dicts. Each String represents a target's child dependency relations.
  3. Context - Given the output from SyntacticContext returns the left, right, target, or full text contexts, where left and right are with respect to the target word the text is about.

class bella.scikit_features.syntactic_context.Context(context='left', inc_target=False)[source]

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

Given the output from SyntacticContext returns the left, right, target, or full text contexts, where left and right are with respect to the target word the text is about.

Attributes:

  1. context - Defines the text context that is returned
  2. inc_target - Whether to include the target word in the context

Methods:

  1. fit - Does nothing as nothing is done at fit time.
  2. fit_transform - Performs the transform method.
  3. transform - Converts a list of a list of dictionaries into a list of a list of Strings where each String represents a the targets context.
__init__(context='left', inc_target=False)[source]
Parameters:
  • context (String. Can only be left, right, target, or full. Default left) – left, right, target or full context will be returned with respect to the target word in the target sentence.
  • inc_target (bool. Default False) – Whether to include the target word in the text context.
fit(target_contexts, y=None)[source]

Kept for consistency with the TransformerMixin

fit_transform(target_contexts, y=None)[source]

see self.transform

transform(target_contexts)[source]

Given a list of a list of dicts, where each dict contains text and span keys (the output of SyntacticContext), returns the requested context for each.

Parameters:target_contexts (list) – list of a list of dictionaries
Returns:A list of a list of Strings where each String represents a different context; a text can contain many targets and thus multiple contexts. The context returned depends on self.context
Return type:list
class bella.scikit_features.syntactic_context.DependencyChildContext(parser=<function tweebo>, lower=False, rel_depth=(1, 1))[source]

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

Similar to SyntacticContext but returns a list of a list of Strings instead of dicts. Each String represents a target's child dependency relations.

Attributes:

  1. parser - dependency parser to use.
  2. lower - whether or not the parser should process the text in lower case
  3. rel_depth - The depth of the child relations, e.g. (1, 1) = first-order relations only

Methods:

  1. fit - Does nothing as nothing is done at fit time.
  2. fit_transform - Performs the transform method.
  3. transform - Converts the list of dicts into a list of a list of Strings which are the target word's child relations within the target's dependency tree.
__init__(parser=<function tweebo>, lower=False, rel_depth=(1, 1))[source]

For more information on what the function does see the following functions documentation: dependency_relation_context

Parameters:
  • parser (function. Default tweebo) – Dependency parser to use.
  • lower (bool. Default False) – Whether to lower case the words before going through the dependency parser.
  • rel_depth (tuple. Default (1, 1)) – Depth of the dependency relations to use as context words
fit(target_dicts, y=None)[source]

Kept for consistency with the TransformerMixin

fit_transform(target_dicts, y=None)[source]

see self.transform

transform(target_dicts)[source]

Given a list of target dictionaries it returns a list of a list of Strings, where each String is the concatenation of the target's child-related words and rel_depth determines the depth of child dependency relations to include.

Parameters:target_dicts (list) – list of dictionaries
Returns:A list of a list of Strings
Return type:list
class bella.scikit_features.syntactic_context.SyntacticContext(parser=<function tweebo>, lower=False)[source]

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

Converts a list of dictionaries containing text, targets, and target spans into text contexts defined by the target's dependency tree. Returns a list of a list of dictionaries containing text and span.

Attributes:

  1. parser - dependency parser to use.
  2. lower - whether or not the parser should process the text in lower case

Methods:

  1. fit - Does nothing as nothing is done at fit time.
  2. fit_transform - Performs the transform method.
  3. transform - Converts the list of dicts into a list of a list of dicts where each dict contains the target text and span, and the text is the target's full dependency-tree word context.
__init__(parser=<function tweebo>, lower=False)[source]

For more information on what the function does see the following functions documentation: dependency_context

Parameters:
  • parser (function. Default tweebo) – Dependency parser to use.
  • lower (bool. Default False) – Whether to lower case the words before going through the dependency parser.
fit(target_dicts, y=None)[source]

Kept for consistency with the TransformerMixin

fit_transform(target_dicts, y=None)[source]

see self.transform

transform(target_dicts)[source]

Given a list of target dictionaries it returns the syntactic context of each target, i.e. a list of a list of dicts where each dict represents a target's syntactic context within the associated text. The syntactic context depends on the self.parser function.

Parameters:target_dicts (list) – list of dictionaries
Returns:A list of a list of dicts
Return type:list

bella.scikit_features.tokeniser module

Module contains a class that is a scikit-learn Transformer.

Classes:

1. ContextTokeniser - Converts a list of String lists into token lists. See the transformer method of the class for more details.

class bella.scikit_features.tokeniser.ContextTokeniser(tokeniser=<function whitespace>, lower=False)[source]

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

Scikit learn transformer class. Converts list of String lists into tokens.

Attributes:

  1. self.tokeniser - tokeniser function. Given a String returns a list of Strings. Default whitespace tokeniser.
  2. self.lower - whether to lower case the tokens. Default False.

See bella.tokenisers() for more tokeniser functions that can be used here or create your own function.

__init__(tokeniser=<function whitespace>, lower=False)[source]
fit(target_contexts, y=None)[source]

Kept for consistency with the TransformerMixin

fit_transform(target_contexts, y=None)[source]

see self.transform

transform(target_contexts)[source]

Given a list of String lists, where each String represents a context per target span, it returns those Strings as lists of Strings (tokens).

Parameters:target_contexts (list) – A list of String lists e.g. [[‘It was nice this morning’, ‘It was nice this morning but not yesterday morning’], [‘another day’]] where each String is a span context for a target.
Returns:A list of Strings (tokens) per span context. e.g. [[[‘It’, ‘was’, ‘nice’, ‘this’, ‘morning’], [‘It’, ‘was’, ‘nice’, ‘this’, ‘morning’, ‘but’, ‘not’, ‘yesterday’, ‘morning’]], [[‘another’, ‘day’]]]
Return type:list
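The transform can be sketched as below. The whitespace tokeniser shown is an assumed default (a plain `str.split`); bella.tokenisers may implement it differently:

```python
def whitespace(text):
    # Assumed default tokeniser: split on any run of whitespace.
    return text.split()


def tokenise_contexts(target_contexts, tokeniser=whitespace, lower=False):
    # Tokenise every context String of every target, optionally
    # lower-casing the text first.
    return [[tokeniser(c.lower() if lower else c) for c in contexts]
            for contexts in target_contexts]


data = [['It was nice this morning'], ['another day']]
print(tokenise_contexts(data))
# [[['It', 'was', 'nice', 'this', 'morning']], [['another', 'day']]]
```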

bella.scikit_features.word_vector module

class bella.scikit_features.word_vector.ContextWordVectors(vectors=None, zero_token='$$$ZERO_TOKEN$$$')[source]

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

__init__(vectors=None, zero_token='$$$ZERO_TOKEN$$$')[source]
fit(context_tokens, y=None)[source]

Kept for consistency with the TransformerMixin

fit_transform(context_tokens, y=None)[source]

see self.transform

static list_to_matrix(word_vector_list)[source]

Converts a list of numpy.ndarrays (vectors) into a numpy.ndarray (matrix).

Parameters:word_vector_list (list) – list of numpy.ndarray
Returns:a matrix (numpy.ndarray) formed by stacking the vectors
Return type:numpy.ndarray
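A plausible implementation of this stacking is sketched below. The column-wise orientation is an assumption chosen to match the (length of word vector, number of tokens) shape documented for transform; the real method may stack the vectors the other way:

```python
import numpy as np


def list_to_matrix(word_vector_list):
    # Stack the 1-D vectors column-wise so the result has shape
    # (length of word vector, number of tokens).
    return np.stack(word_vector_list, axis=1)


vectors = [np.array([1.0, 2.0, 3.0]), np.array([4.0, 5.0, 6.0])]
print(list_to_matrix(vectors).shape)  # (3, 2)
```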
transform(contexts_tokens)[source]

Given a list of contexts (either right, left, or target), each made up of a list of tokens, returns the tokens as a word vector matrix.

The word vector matrix is a word vector for each token but instead of storing in a list it is stored in a numpy.ndarray of shape: (length of word vector, number of tokens).

Example of the input: [[[‘context’, ‘one’], [‘context’, ‘two’]], [[‘another context’]]]

Parameters:contexts_tokens – A list of samples, where each sample contains a list of contexts, each of which is a list of tokens.
Returns:The same nested list but with each context's tokens (Strings) replaced by a word vector matrix (numpy.ndarray)
Return type:list
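The lookup-and-stack step can be sketched as follows. The toy `vectors` dict stands in for a real embedding model, and the handling of out-of-vocabulary tokens (which the real class presumably covers via zero_token) is left out:

```python
import numpy as np

# Toy word-vector lookup standing in for a real embedding model.
vectors = {'context': np.array([1.0, 0.0]),
           'one': np.array([0.0, 1.0]),
           'two': np.array([0.5, 0.5])}


def context_word_vectors(contexts_tokens):
    # Replace each context's token list with a matrix of shape
    # (vector_dim, num_tokens) holding the tokens' word vectors.
    return [[np.stack([vectors[t] for t in context], axis=1)
             for context in sample]
            for sample in contexts_tokens]


data = [[['context', 'one'], ['context', 'two']]]
matrices = context_word_vectors(data)
print(matrices[0][0].shape)  # (2, 2)
```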

Module contents