bella.scikit_features package¶
Submodules¶
bella.scikit_features.context module¶
Module contains a Class that is a scikit learn Transformer.
class bella.scikit_features.context.Context(context='left', inc_target=False)¶
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
transform(target_dicts)¶
Given a list of target dictionaries containing the spans of the targets and the texts the targets occur in, returns the relevant left, right and target contexts with respect to the target word(s).
Parameters: target_dicts (list) – list of dictionaries containing at least spans and text keys.
Returns: a list of left, right and target contexts with respect to the target word, depending on the value of self.context; e.g. if self.context = 'lt' only the left and target contexts are returned, not the right.
Return type: list
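To make the contract concrete, here is a hypothetical re-implementation of the span-splitting behaviour described above. This is a sketch of the documented behaviour, not bella's actual code; the function name split_contexts and the single-span assumption are illustrative.

```python
# Hypothetical sketch of the documented contract of Context.transform,
# NOT bella's actual implementation: split each text around the first
# target span to obtain the left, right and target contexts.
def split_contexts(target_dict, inc_target=False):
    text = target_dict['text']
    start, end = target_dict['spans'][0]
    left = text[:end] if inc_target else text[:start]
    right = text[start:] if inc_target else text[end:]
    return left, right, text[start:end]

sample = {'text': 'The camera quality is great', 'spans': [(4, 18)]}
left, right, target = split_contexts(sample)
# left == 'The ', right == ' is great', target == 'camera quality'
```

With inc_target=True the left and right contexts would also include the target span itself, which matches the documented role of the inc_target parameter.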
bella.scikit_features.debug module¶
SKLearn Transformer
Classes:
- Debug – Allows you to debug previous transformers
class bella.scikit_features.debug.Debug¶
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
SKLearn transformer that is to be used as a debugger between different SKLearn transformers.
Methods:
- fit - Does nothing as nothing is done at fit time.
- fit_transform - Performs the transform method.
- transform - Creates a REPL terminal to inspect the incoming transformed data.
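A pass-through transformer of this kind is simple to sketch in plain Python. The class name InspectStep is hypothetical; bella's Debug additionally inherits from sklearn.base.BaseEstimator and TransformerMixin and opens an interactive terminal rather than just recording the data.

```python
# Minimal sketch of a Debug-style pass-through transformer (assumed
# behaviour; bella's Debug opens a REPL instead of stashing the data).
class InspectStep:
    def fit(self, X, y=None):
        return self                 # nothing happens at fit time

    def transform(self, X):
        self.last_seen_ = X         # record the data flowing through
        # Debug would drop into an interactive terminal here instead,
        # e.g. via the stdlib code.interact(local=locals())
        return X                    # pass the data through unchanged

step = InspectStep()
out = step.fit(None).transform(['a', 'b'])
```

Because transform returns its input unchanged, such a step can be dropped between any two stages of a scikit-learn Pipeline without affecting the result.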
bella.scikit_features.join_context_vectors module¶
class bella.scikit_features.join_context_vectors.JoinContextVectors(pool_func=<function matrix_median>)¶
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
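The class itself is undocumented here, but the default pool_func=<function matrix_median> suggests it collapses several per-context vectors into one by an element-wise median. The following is an assumed sketch of that pooling idea, not bella's matrix_median (which presumably operates on numpy arrays).

```python
from statistics import median

# Assumed behaviour inferred from the default pool_func=matrix_median:
# join several equally sized context vectors into a single vector by
# taking the element-wise median across them.
def join_vectors(context_vectors):
    return [median(dim) for dim in zip(*context_vectors)]

joined = join_vectors([[1.0, 4.0], [3.0, 2.0], [2.0, 9.0]])
# joined == [2.0, 4.0]
```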
bella.scikit_features.lexicon_filter module¶
class bella.scikit_features.lexicon_filter.LexiconFilter(lexicon=None, zero_token='$$$ZERO_TOKEN$$$')¶
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
bella.scikit_features.neural_pooling module¶
class bella.scikit_features.neural_pooling.NeuralPooling(pool_func=<function matrix_max>)¶
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
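The default pool_func=<function matrix_max> points at max pooling: reducing a word-vector matrix to a single vector by taking the maximum over the token axis. A sketch of that idea, under the assumption that rows are tokens and columns are vector dimensions (bella's matrix_max may orient the matrix differently):

```python
# Assumed behaviour inferred from the default pool_func=matrix_max:
# reduce a (num_tokens x vector_dim) word-vector matrix to one vector
# by taking the per-dimension maximum over all tokens (max pooling).
def max_pool(word_matrix):
    return [max(dim) for dim in zip(*word_matrix)]

pooled = max_pool([[0.1, 0.9], [0.4, 0.2], [0.3, 0.8]])
# pooled == [0.4, 0.9]
```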
bella.scikit_features.syntactic_context module¶
Module contains a Class that is a scikit learn Transformer.
Classes:
- SyntacticContext - Converts a list of dictionaries containing text, targets, and target spans into text contexts defined by the target's dependency tree. Returns a list of a list of dictionaries containing text and span.
- DependencyChildContext - Similar to SyntacticContext but returns a list of a list of Strings instead of dicts. Each String represents a target's child dependency relations.
- Context - Given the output from SyntacticContext, returns the left, right, target or full text contexts, where left and right are with respect to the target word the text is about.
class bella.scikit_features.syntactic_context.Context(context='left', inc_target=False)¶
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
Given the output from SyntacticContext, returns the left, right, target or full text contexts, where left and right are with respect to the target word the text is about.
Attributes:
- context - Defines the text context that is returned
- inc_target - Whether to include the target word in the context
Methods:
- fit - Does nothing as nothing is done at fit time.
- fit_transform - Performs the transform method.
- transform - Converts a list of a list of dictionaries into a list of a list of Strings where each String represents the target's context.
__init__(context='left', inc_target=False)¶
Parameters:
- context (String. Can only be left, right, target, or full. Default left) – left, right, target or full context will be returned with respect to the target word in the target sentence.
- inc_target (bool. Default False) – Whether to include the target word in the text context.
transform(target_contexts)¶
Given a list of a list of dicts, where each dict contains a text and the span of the target within it, returns the corresponding context Strings.
Parameters: target_contexts (list) – list of a list of dictionaries
Returns: A list of a list of Strings where each String represents a different context, as a text can contain many targets and thus have multiple contexts. The context returned depends on self.context.
Return type: list
class bella.scikit_features.syntactic_context.DependencyChildContext(parser=<function tweebo>, lower=False, rel_depth=(1, 1))¶
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
Similar to SyntacticContext but returns a list of a list of Strings instead of dicts. Each String represents a target's child dependency relations.
Attributes:
- parser - dependency parser to use.
- lower - whether or not the parser should process the text in lower case
- rel_depth - The depth of the child relations; (1, 1) = first-order relations only
Methods:
- fit - Does nothing as nothing is done at fit time.
- fit_transform - Performs the transform method.
- transform - Converts the list of dicts into a list of a list of Strings which are the target words child relations within the targets dependency tree.
__init__(parser=<function tweebo>, lower=False, rel_depth=(1, 1))¶
For more information on what the function does see the documentation of dependency_relation_context.
Parameters:
- parser (function. Default tweebo) – Dependency parser to use.
- lower (bool. Default False) – Whether to lower case the words before going through the dependency parser.
- rel_depth (tuple. Default (1, 1)) – Depth of the dependency relations to use as context words.
transform(target_dicts)¶
Given a list of target dictionaries, returns a list of a list of Strings where each String is the concatenation of the target's child-related words, with rel_depth determining how many levels of child dependency relations to include.
Parameters: target_dicts (list) – list of dictionaries
Returns: A list of a list of Strings
Return type: list
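The rel_depth idea can be illustrated on a toy dependency tree without running a real parser. This is a hypothetical sketch: the child_context function and the children adjacency dict stand in for bella's actual parser-driven implementation.

```python
# Toy illustration of the assumed rel_depth behaviour: collect a target's
# child words from a dependency tree, keeping only relation depths inside
# the (min_depth, max_depth) range. bella's real version runs a dependency
# parser (tweebo by default) to build the tree.
def child_context(children, word, rel_depth=(1, 1), depth=1):
    lo, hi = rel_depth
    words = []
    for child in children.get(word, []):
        if lo <= depth <= hi:
            words.append(child)
        if depth < hi:  # descend while deeper relations are still in range
            words.extend(child_context(children, child, rel_depth, depth + 1))
    return words

tree = {'phone': ['great', 'the'], 'great': ['really']}
first_order = child_context(tree, 'phone')            # ['great', 'the']
two_levels = child_context(tree, 'phone', (1, 2))     # ['great', 'really', 'the']
```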
class bella.scikit_features.syntactic_context.SyntacticContext(parser=<function tweebo>, lower=False)¶
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
Converts a list of dictionaries containing text, targets, and target spans into text contexts defined by the target's dependency tree. Returns a list of a list of dictionaries containing text and span.
Attributes:
- parser - dependency parser to use.
- lower - whether or not the parser should process the text in lower case
Methods:
- fit - Does nothing as nothing is done at fit time.
- fit_transform - Performs the transform method.
- transform - Converts the list of dicts into a list of a list of dicts where each dict contains the target text and span, the text being the target's full dependency-tree word context.
__init__(parser=<function tweebo>, lower=False)¶
For more information on what the function does see the documentation of dependency_context.
Parameters:
- parser (function. Default tweebo) – Dependency parser to use.
- lower (bool. Default False) – Whether to lower case the words before going through the dependency parser.
transform(target_dicts)¶
Given a list of target dictionaries, returns the syntactic context of each target: a list of a list of dicts where each dict represents a target's syntactic context within the associated text. The syntactic context depends on the self.parser function.
Parameters: target_dicts (list) – list of dictionaries
Returns: A list of a list of dicts
Return type: list
bella.scikit_features.tokeniser module¶
Module contains a Class that is a scikit learn Transformer.
Classes:
1. ContextTokeniser - Converts a list of String lists into token lists. See the transform method of the class for more details.
class bella.scikit_features.tokeniser.ContextTokeniser(tokeniser=<function whitespace>, lower=False)¶
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
Scikit learn transformer class. Converts list of String lists into tokens.
Attributes:
1. self.tokeniser - tokeniser function. Given a String returns a list of Strings. Default whitespace tokeniser.
2. self.lower - whether to lower case the tokens. Default False.
See bella.tokenisers() for more tokeniser functions that can be used here, or create your own function.
transform(target_contexts)¶
Given a list of String lists, where each String represents a context per target span, returns those Strings as lists of Strings (tokens).
Parameters: target_contexts (list) – A list of String lists e.g. [['It was nice this morning', 'It was nice this morning but not yesterday morning'], ['another day']] where each String is a span context for a target.
Returns: A list of Strings (tokens) per span context, e.g. [[['It', 'was', 'nice', 'this', 'morning'], ['It', 'was', 'nice', 'this', 'morning', 'but', 'not', 'yesterday', 'morning']], [['another', 'day']]]
Return type: list
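The documented input/output shape can be reproduced with a plain whitespace tokeniser. This is an illustrative sketch of the transform contract (the helper tokenise_contexts is hypothetical; the real class takes its tokeniser from bella.tokenisers, whitespace by default):

```python
# Sketch of the documented ContextTokeniser.transform behaviour using a
# plain whitespace tokeniser: tokenise every context String, preserving
# the nested list-per-target structure.
def tokenise_contexts(target_contexts, tokeniser=str.split):
    return [[tokeniser(context) for context in contexts]
            for contexts in target_contexts]

tokens = tokenise_contexts([['It was nice this morning'], ['another day']])
# tokens == [[['It', 'was', 'nice', 'this', 'morning']], [['another', 'day']]]
```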
bella.scikit_features.word_vector module¶
class bella.scikit_features.word_vector.ContextWordVectors(vectors=None, zero_token='$$$ZERO_TOKEN$$$')¶
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
static list_to_matrix(word_vector_list)¶
Converts a list of numpy.ndarrays (vectors) into a numpy.ndarray (matrix).
Parameters: word_vector_list (list) – list of numpy.ndarray
Returns: a matrix formed from the list of vectors
Return type: numpy.ndarray
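A plausible sketch of this static method, assuming (per the transform documentation below it) that the resulting matrix has shape (length of word vector, number of tokens), i.e. each token becomes a column:

```python
import numpy as np

# Assumed sketch of list_to_matrix: stack per-token vectors into one
# matrix. The transpose gives the documented (vector length, num tokens)
# orientation, with one column per token.
def list_to_matrix(word_vector_list):
    return np.array(word_vector_list).T

mat = list_to_matrix([np.array([1.0, 2.0, 3.0]), np.array([4.0, 5.0, 6.0])])
# mat.shape == (3, 2): vector length 3, two tokens
```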
transform(contexts_tokens)¶
Given a list of contexts (either right, left or target), each made up of lists of tokens, returns the tokens as word vector matrices.
The word vector matrix holds a word vector for each token, but instead of storing them in a list it stores them in a numpy.ndarray of shape: (length of word vector, number of tokens).
Example of the input: [[['context', 'one'], ['context', 'two']], [['another context']]]
Parameters: contexts_tokens – A list of data items, each containing a list of contexts, which in turn contain lists of tokens.
Returns: The same list but with word vectors as numpy.ndarray instead of tokens which are Strings
Return type: list
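An end-to-end sketch of this contract, with a toy in-memory vectors dict standing in for bella's word-vector model (the function name and lookup table are illustrative, not bella's API):

```python
import numpy as np

# Hypothetical sketch of the documented transform contract: look up a
# word vector for every token and store each context as a matrix of
# shape (vector length, number of tokens).
vectors = {'context': np.array([1.0, 0.0]), 'one': np.array([0.0, 1.0])}

def contexts_to_matrices(contexts_tokens):
    return [[np.array([vectors[token] for token in context]).T
             for context in contexts]
            for contexts in contexts_tokens]

mats = contexts_to_matrices([[['context', 'one']]])
# mats[0][0].shape == (2, 2): vector length 2, two tokens
```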