automl.metalearning.database package¶
Submodules¶
automl.metalearning.database.configurations_parsing module¶
Provide functions, classes and private variables to build configurations.
We define a configuration as a set of classifiers / imputation-methods / rescalers / preprocessors for a given problem. The set of these configurations can be understood as The Search Space for a problem.
-
class
automl.metalearning.database.configurations_parsing.
ConfigurationBuilder
(model_row=None)¶ Bases:
object
Build a configuration: model/pre-processor/encoder/scaler/imputation.
A set of configurations can be used to feed TPOT.
Parameters: model_row (pandas.Series or pandas.DataFrame) – Represents a row coming from a configurations.csv file. Defaults to None.
-
model_row
¶ Represents a row coming from a configurations.csv file.
Type: pandas.Series or pandas.DataFrame
Raises: TypeError – If the argument passed is not a pandas Series or DataFrame.
-
build_configuration
()¶ Build an MLSuggestion from the row passed at instantiation time.
Returns: A suggestion with classifiers, pre-processors, scalers, encoders and imputation methods.
Return type: MLSuggestion
-
-
class
automl.metalearning.database.configurations_parsing.
MLSuggestion
(classifiers=None, rescalers=None, preprocessors=None, encoders=None, imputations=None)¶ Bases:
object
Class that represents an MLSuggestion.
A Machine Learning suggestion can, in the most general case we consider, contain imputation methods, encoders, pre-processors, rescalers and/or classifiers.
Please note that in principle we are restricted to the scikit-learn classes available and moreover to the auto-sklearn results.
Parameters: - classifiers (list) – A list of strings defining the full class path of the classifiers (e.g. [sklearn.subgroup.MyClass]).
- rescalers (list) – A list of strings defining the full class path of the rescalers (e.g. [sklearn.subgroup.MyClass]).
- preprocessors (list) – A list of strings defining the full class path of the preprocessors (e.g. [sklearn.subgroup.MyClass]).
- encoders (list) – A list of strings defining the full class path of the encoders (e.g. [sklearn.subgroup.MyClass]).
- imputations (list) – A list of strings defining the full class path of the imputations (e.g. [sklearn.subgroup.MyClass]).
-
None.
-
add_classifier
(classifier)¶ Add a new classifier or set of classifiers.
-
classifier
¶ If str, then the element is added. If list, then all elements in the list are added.
Type: str or list
-
-
add_encoder
(encoder)¶ Add a new encoder or set of encoders.
-
encoder
¶ If str, then the element is added. If list, then all elements in the list are added.
Type: str or list
-
-
add_imputation
(imputation)¶ Add a new imputation method or set of imputation methods.
-
imputation
¶ If str, then the element is added. If list, then all elements in the list are added.
Type: str or list
-
-
add_preprocessor
(preprocessor)¶ Add a new pre-processor or set of pre-processors.
-
preprocessor
¶ If str, then the element is added. If list, then all elements in the list are added.
Type: str or list
-
-
add_rescaler
(rescaler)¶ Add a new rescaler or set of rescalers.
-
rescaler
¶ If str, then the element is added. If list, then all elements in the list are added.
Type: str or list
-
-
get_all_elements
()¶ Return all elements.
Returns: A single list with all classifiers, rescalers, pre-processors, encoders and imputation methods.
Return type: list
-
classifiers
¶ Return the valid classifiers.
None values are considered invalid, and hence not included.
Returns: The classifiers. Return type: list
-
encoders
¶ Return the valid encoders.
None values are considered invalid, and hence not included.
Returns: The encoders. Return type: list
-
imputations
¶ Return the valid imputations.
None values are considered invalid, and hence not included.
Returns: The imputations. Return type: list
-
preprocessors
¶ Return the valid preprocessors.
None values are considered invalid, and hence not included.
Returns: The preprocessors. Return type: list
-
rescalers
¶ Return the valid rescalers.
None values are considered invalid, and hence not included.
Returns: The rescalers. Return type: list
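The behaviour documented above (a str is added as one element, a list adds all of its elements, and None values are dropped by the properties) can be sketched as follows. SuggestionSketch is a hypothetical, reduced stand-in written for illustration, not the package's MLSuggestion. It covers only classifiers and rescalers, and raising TypeError for other argument types is an assumption.

```python
class SuggestionSketch:
    """Hypothetical stand-in for MLSuggestion (illustration only)."""

    def __init__(self, classifiers=None, rescalers=None):
        self._classifiers = list(classifiers or [])
        self._rescalers = list(rescalers or [])

    @staticmethod
    def _extend(target, item):
        # A str is added as a single element; a list contributes all its elements.
        if isinstance(item, str):
            target.append(item)
        elif isinstance(item, list):
            target.extend(item)
        else:
            # Assumption: the real class may handle other types differently.
            raise TypeError("expected str or list")

    def add_classifier(self, classifier):
        self._extend(self._classifiers, classifier)

    def add_rescaler(self, rescaler):
        self._extend(self._rescalers, rescaler)

    @property
    def classifiers(self):
        # None values are considered invalid, and hence not included.
        return [c for c in self._classifiers if c is not None]

    @property
    def rescalers(self):
        return [r for r in self._rescalers if r is not None]

    def get_all_elements(self):
        # Single flat list with every valid element.
        return self.classifiers + self.rescalers


suggestion = SuggestionSketch(classifiers=["sklearn.svm.SVC", None])
suggestion.add_classifier("sklearn.tree.DecisionTreeClassifier")
suggestion.add_rescaler(
    ["sklearn.preprocessing.StandardScaler", "sklearn.preprocessing.MinMaxScaler"]
)
all_elements = suggestion.get_all_elements()
```

The class paths are plain strings here, as in the parameter descriptions above; nothing is imported from scikit-learn.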
-
automl.metalearning.database.configurations_parsing.
mix_suggestions
(suggestion_list)¶ Mix a list of MLSuggestion objects into a single one.
Parameters: suggestion_list (list) – The MLSuggestion instances to mix.
Raises: TypeError – If the argument is not a list or any of the elements in the list is not an instance of MLSuggestion.
Returns: The resulting MLSuggestion.
Return type: MLSuggestion
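One way such a merge could work is sketched below, under stated assumptions. Suggestion and mix_suggestions_sketch are hypothetical names, the stand-in has only two of the five element kinds, and dropping duplicates while preserving order is an assumption: the documentation does not say how repeated elements are handled.

```python
from dataclasses import dataclass, field


@dataclass
class Suggestion:
    """Hypothetical, reduced stand-in for MLSuggestion."""
    classifiers: list = field(default_factory=list)
    rescalers: list = field(default_factory=list)


def mix_suggestions_sketch(suggestion_list):
    # Mirror the documented TypeError conditions.
    if not isinstance(suggestion_list, list):
        raise TypeError("suggestion_list must be a list")
    if not all(isinstance(s, Suggestion) for s in suggestion_list):
        raise TypeError("every element must be a Suggestion")

    mixed = Suggestion()
    for suggestion in suggestion_list:
        for name in ("classifiers", "rescalers"):
            merged = getattr(mixed, name)
            for element in getattr(suggestion, name):
                # Assumption: duplicates are dropped, first occurrence wins.
                if element not in merged:
                    merged.append(element)
    return mixed


a = Suggestion(
    classifiers=["sklearn.svm.SVC"],
    rescalers=["sklearn.preprocessing.StandardScaler"],
)
b = Suggestion(classifiers=["sklearn.svm.SVC", "sklearn.tree.DecisionTreeClassifier"])
mixed = mix_suggestions_sketch([a, b])
```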
automl.metalearning.database.load_db module¶
File defining classes to load the meta db and interact with it.
The MetaDB holds the meta-learning information for the 140+ datasets defined in the auto-sklearn paper. It lets us build a space of datasets that we can query to find similar datasets and, consequently, candidate algorithms for a given dataset.
In this implementation, we provide methods to retrieve the simple space and its weighted version.
-
class
automl.metalearning.database.load_db.
AlgorithmRunsFile
(algruns_file)¶ Bases:
object
Abstract the algorithm_runs.arff file.
-
get_associated_configuration_id
(instance_id)¶ Get the associated configuration for a given instance id.
A configuration is a solution for the dataset (instance). This returns the id for that solution, not the solution itself.
-
instance_id
¶ The id of the dataset (instance) to search for.
Type: int
Returns: The id of the configuration solving the instance_id problem (dataset).
Return type: int
-
-
get_associated_configuration_ids
(instances_ids)¶ Get the associated configurations for a given set of instance ids.
A configuration is a solution for the dataset (instance). This returns the ids for the solutions, not the solutions themselves.
-
instances_ids
¶ The ids of the datasets (instances) to search for.
Type: list
Returns: The ids of the configurations solving the problems of the given instances (datasets).
Return type: list(int)
-
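The distinction between a configuration and its id can be illustrated with a minimal lookup sketch. The rows below are made-up values, not the contents of algorithm_runs.arff, and the two functions are hypothetical module-level stand-ins for the methods of AlgorithmRunsFile.

```python
# Hypothetical (instance_id, configuration_id) pairs; the values are invented.
RUNS = [(3, 17), (12, 4), (31, 17)]
CONFIG_BY_INSTANCE = dict(RUNS)


def get_associated_configuration_id(instance_id):
    # Returns the id of the solution, not the solution itself.
    return CONFIG_BY_INSTANCE[instance_id]


def get_associated_configuration_ids(instances_ids):
    # One configuration id per requested instance, in the same order.
    return [get_associated_configuration_id(i) for i in instances_ids]


config_ids = get_associated_configuration_ids([31, 3])
```

The returned ids would then be resolved against the configurations.csv file to obtain the actual configurations.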
-
-
class
automl.metalearning.database.load_db.
ConfigurationsFile
(configs_file)¶ Bases:
object
Abstract a configurations.csv file.
This class represents a configurations.csv file for our meta-knowledge. It also provides some useful methods to interact with the file.
-
class
automl.metalearning.database.load_db.
LandmarkModelParser
¶ Bases:
object
Class to interact with the models stored per instance (dataset).
-
static
metrics_available
()¶ Return the metrics that are available in the meta-knowledge.
Returns: Metrics available in the package’s local storage. Return type: list
-
static
models_by_metric
(instances_ids=None, dataset=None, metric='accuracy')¶ Return the models for a list of instances by the given metric.
-
instances_ids
¶ List of integers with the ids of the instances (datasets).
Type: list
-
metric
¶ One of the metrics returned by LandmarkModelParser.metrics_available().
Type: str
Returns: List of models. One element per instance.
Return type: list
-
-
static
-
class
automl.metalearning.database.load_db.
MKDatabaseClient
¶ Bases:
object
MKDatabase (Meta-Knowledge Database) to perform queries.
This class serves as a facade to interact with the Meta Knowledge. We would like to expose the following features:
- Ability to find the nearest datasets given a metric.
- Reload the database (in case of any change at running time in the arff files).
TODO // To consider (future implementation):
- Ability to add observations into the database at running time, e.g. MKDatabase().add_dataset_metaknowledge()
-
get_metaknowledge_space
(weighted=False)¶ Get the metaknowledge space.
Retrieving the metaknowledge space may be useful for plotting purposes.
Parameters: weighted (bool) – Whether to retrieve the weighted version. Defaults to False.
Returns: A 2-tuple where the first element is a list of ids and the second is the matrix representing the metafeatures of each of the ids.
Return type: tuple
-
meta_suggestions
(dataset=None, ids_list=None, metric='accuracy')¶ Retrieve the Model Suggestions for a set of ids, based on a dataset.
Using a given metric, retrieve the models suggested.
-
ids_list
¶ The list of ids to retrieve information about.
Type: list
-
metric
¶ A Metalearning metric.
Type: str
Returns: A list of MLSuggestions.
Return type: list
-
-
nearest_datasets
(dataset=None, k=5, weighted=False, distance_metric='minkowski')¶ Find the _k_ nearest datasets in the meta-knowledge DB.
This method finds the _k_ nearest neighbors for a given dataset, based on a given metric. This helps, for instance, to later relate them to the saved algorithms for each of the metrics in automl/metalearning/db/files.
-
dataset
The dataset to use. Default is None, which will cause the method to fail.
Type: automl.datahandler.dataloader.Dataset
-
weighted
¶ True if the costs should be used. Defaults to False.
Type: bool
-
k
¶ The number of neighbor datasets to retrieve.
Type: int
-
distance_metric
¶ The distance metric to use for the KNN algorithm.
Type: string or sklearn callable
Returns: (np.array, np.array) A tuple where the first element is a numpy array of the similarity metrics for the resulting datasets and the second element contains the ids of the similar datasets.
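The nearest-neighbor query described above can be sketched in plain Python. This is an illustration under stated assumptions, not the package's implementation: nearest_datasets_sketch and minkowski are hypothetical helpers, the metafeature vectors are invented, p=2 is assumed for the Minkowski distance, and plain lists are returned where the real method returns numpy arrays.

```python
def minkowski(u, v, p=2):
    # Minkowski distance of order p; p=2 gives the Euclidean distance.
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)


def nearest_datasets_sketch(query, ids, matrix, k=5, p=2):
    # Pair every stored dataset id with its distance to the query vector,
    # sort by distance and keep the k closest.
    ranked = sorted((minkowski(query, row, p), i) for i, row in zip(ids, matrix))
    top = ranked[:k]
    distances = [d for d, _ in top]
    nearest_ids = [i for _, i in top]
    # Same ordering as the documented return value: distances first, ids second.
    return distances, nearest_ids


# Invented metafeature vectors for three stored datasets.
ids = [10, 20, 30]
matrix = [[0.0, 0.0], [3.0, 4.0], [1.0, 1.0]]
dists, nearest = nearest_datasets_sketch([0.0, 0.0], ids, matrix, k=2)
```

The ids returned this way are what a caller would pass on as ids_list to meta_suggestions.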
-
-
reload
()¶ Reload the metaknowledge object.
This is helpful if any change is done at runtime in the metaknowledge files.
-
-
class
automl.metalearning.database.load_db.
MetaKnowledge
¶ Bases:
object
This class is a representation of the feature’s costs/values.
The information comes from the meta-knowledge acquired by auto-sklearn. It provides a way to load the information in the form of ARFFWrapper objects and retrieve pandas and numpy representations of this information to ease 1) interaction with the features and 2) matrix operations for more mathematical methods that are needed for machine learning.
-
load_datasets_info
()¶ Load both the costs and values file for the features in the meta db.
It will initialize the features/costs objects with ARFFWrapper objects.
Returns: Self object with the initialized features/costs. Return type: LoadMetaDB
-
simple_matrix
()¶ Return the feature values in the meta db - without the weights.
Returns: The ordered list of instance ids for the features. np.ndarray: The matrix with the feature values. Each row corresponds to the id at the same position in the first return value.
Return type: np.array
-
weighted_matrix
()¶ Return a matrix with the weighted (costs) features in the meta db.
Returns: The ordered list of instance ids for the features. np.ndarray: The matrix with the weighted feature values. Each row corresponds to the id at the same position in the first return value.
Return type: np.array
-
Module contents¶
Store the Meta Knowledge database and provide classes to interact with it.
The Meta Knowledge is understood as the results from the meta-learning component of auto-sklearn. Here we identify two submodules: one to interact with the results and the other to compute new meta-knowledge, either for a new dataset or to ease the inclusion of new results into the database.