automl.metalearning.database package

Submodules

automl.metalearning.database.configurations_parsing module

Provide functions, classes and private variables to build configurations.

We define a configuration as a set of classifiers / imputation methods / rescalers / preprocessors for a given problem. The set of these configurations can be understood as the search space for that problem.

class automl.metalearning.database.configurations_parsing.ConfigurationBuilder(model_row=None)

Bases: object

Build a configuration: model/pre-processor/encoder/scaler/imputation.

A set of configurations can be used to feed TPOT with.

Parameters:model_row (pandas.Series or pandas.DataFrame) – Represents a row coming from a configurations.csv file. Defaults to None.
model_row

Represents a row coming from a configurations.csv file.

Type:pandas.Series or pandas.DataFrame
Raises:TypeError – If no pandas Series or DataFrame is passed as argument.
build_configuration()

Build an MLSuggestion from the row passed at instantiation time.

Returns:
A suggestion with classifiers, pre-processors,
scalers, encoders and imputation methods.
Return type:MLSuggestion
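This page does not show how a configurations.csv row is turned into an MLSuggestion. The following is a minimal sketch of the documented contract (only pandas input is accepted, otherwise TypeError), assuming hypothetical column names such as `classifier:__choice__` and returning a plain dict of component names rather than an MLSuggestion with full class paths. `build_configuration_sketch` and `COLUMN_TO_CATEGORY` are illustrative names, not part of the package, and taking the first row of a DataFrame is an assumption.

```python
import pandas as pd

# HYPOTHETICAL column names: the real configurations.csv layout is not
# documented on this page, so these keys are illustrative assumptions.
COLUMN_TO_CATEGORY = {
    "classifier:__choice__": "classifiers",
    "rescaling:__choice__": "rescalers",
    "preprocessor:__choice__": "preprocessors",
}


def build_configuration_sketch(model_row):
    """Collect the component names found in one configurations.csv row."""
    # Mirror the documented contract: only pandas objects are accepted.
    if not isinstance(model_row, (pd.Series, pd.DataFrame)):
        raise TypeError("model_row must be a pandas Series or DataFrame")
    # Assumption: a DataFrame is reduced to its first row.
    if isinstance(model_row, pd.DataFrame):
        model_row = model_row.iloc[0]
    suggestion = {category: [] for category in COLUMN_TO_CATEGORY.values()}
    for column, category in COLUMN_TO_CATEGORY.items():
        value = model_row.get(column)
        # Absent columns (None) and empty cells (NaN) contribute nothing.
        if value is not None and not pd.isna(value):
            suggestion[category].append(value)
    return suggestion


row = pd.Series({"classifier:__choice__": "random_forest",
                 "rescaling:__choice__": "standardize"})
print(build_configuration_sketch(row))
# {'classifiers': ['random_forest'], 'rescalers': ['standardize'], 'preprocessors': []}
```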
class automl.metalearning.database.configurations_parsing.MLSuggestion(classifiers=None, rescalers=None, preprocessors=None, encoders=None, imputations=None)

Bases: object

Class that represents an MLSuggestion.

In the most general case we consider, a machine learning suggestion can contain imputation methods, encoders, pre-processors, rescalers and/or classifiers.

Note that, in principle, we are restricted to the available scikit-learn classes and, moreover, to the auto-sklearn results.

Parameters:
  • classifiers (list) – A list of strings defining the full class path of the classifiers (e.g. ['sklearn.subgroup.MyClass']).
  • rescalers (list) – A list of strings defining the full class path of the rescalers (e.g. ['sklearn.subgroup.MyClass']).
  • preprocessors (list) – A list of strings defining the full class path of the preprocessors (e.g. ['sklearn.subgroup.MyClass']).
  • encoders (list) – A list of strings defining the full class path of the encoders (e.g. ['sklearn.subgroup.MyClass']).
  • imputations (list) – A list of strings defining the full class path of the imputation methods (e.g. ['sklearn.subgroup.MyClass']).
All parameters default to None.
add_classifier(classifier)

Add a new classifier or set of classifiers.

classifier

If str, then the element is added. If list, then all elements in the list are added.

Type:str or list
add_encoder(encoder)

Add a new encoder or set of encoders.

encoder

If str, then the element is added. If list, then all elements in the list are added.

Type:str or list
add_imputation(imputation)

Add a new imputation or set of imputation methods.

imputation

If str, then the element is added. If list, then all elements in the list are added.

Type:str or list
add_preprocessor(preprocessor)

Add a new pre-processor or set of pre-processors.

preprocessor

If str, then the element is added. If list, then all elements in the list are added.

Type:str or list
add_rescaler(rescaler)

Add a new rescaler or set of rescalers.

rescaler

If str, then the element is added. If list, then all elements in the list are added.

Type:str or list
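All five add_* methods share one rule: a str is added as a single element, while a list contributes each of its elements. A minimal sketch of that rule over a plain Python list follows; `add_elements` is a hypothetical helper, not the package's code, and treating any non-list value (such as None) as a single element is an assumption, chosen because the MLSuggestion properties document that stored None values are filtered out.

```python
def add_elements(store, element):
    """Add ``element`` to ``store``: every item if it is a list, else itself."""
    if isinstance(element, list):
        # A list adds all of its elements, in order.
        store.extend(element)
    else:
        # A str (or any other non-list value, e.g. None) is one element.
        store.append(element)
    return store


classifiers = []
add_elements(classifiers, "sklearn.svm.SVC")
add_elements(classifiers, ["sklearn.tree.DecisionTreeClassifier",
                           "sklearn.ensemble.RandomForestClassifier"])
print(classifiers)
```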
get_all_elements()

Return all elements.

Returns:
A single list with all classifiers, rescalers,
pre-processors, encoders and imputation methods.
Return type:list
classifiers

Return the valid classifiers.

None values are considered invalid, and hence not included.

Returns:The classifiers.
Return type:list
encoders

Return the valid encoders.

None values are considered invalid, and hence not included.

Returns:The encoders.
Return type:list
imputations

Return the valid imputations.

None values are considered invalid, and hence not included.

Returns:The imputations.
Return type:list
preprocessors

Return the valid preprocessors.

None values are considered invalid, and hence not included.

Returns:The preprocessors.
Return type:list
rescalers

Return the valid rescalers.

None values are considered invalid, and hence not included.

Returns:The rescalers.
Return type:list
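The properties and get_all_elements() together describe a simple contract: each property drops None values, and get_all_elements() concatenates the valid elements in a fixed order. A minimal sketch of that contract, using a hypothetical `SuggestionSketch` dataclass whose `raw_*` fields stand in for however MLSuggestion stores its elements internally; whether the real get_all_elements() also filters None is an assumption here.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SuggestionSketch:
    """Hypothetical stand-in for MLSuggestion; not the package's class."""
    raw_classifiers: List[Optional[str]] = field(default_factory=list)
    raw_rescalers: List[Optional[str]] = field(default_factory=list)
    raw_preprocessors: List[Optional[str]] = field(default_factory=list)
    raw_encoders: List[Optional[str]] = field(default_factory=list)
    raw_imputations: List[Optional[str]] = field(default_factory=list)

    @staticmethod
    def _valid(items):
        # None values are invalid, so they never reach the caller.
        return [item for item in items if item is not None]

    @property
    def classifiers(self):
        return self._valid(self.raw_classifiers)

    @property
    def rescalers(self):
        return self._valid(self.raw_rescalers)

    @property
    def preprocessors(self):
        return self._valid(self.raw_preprocessors)

    @property
    def encoders(self):
        return self._valid(self.raw_encoders)

    @property
    def imputations(self):
        return self._valid(self.raw_imputations)

    def get_all_elements(self):
        # Documented order: classifiers, rescalers, pre-processors,
        # encoders and imputation methods, in one flat list.
        return (self.classifiers + self.rescalers + self.preprocessors
                + self.encoders + self.imputations)


s = SuggestionSketch(raw_classifiers=["sklearn.svm.SVC", None],
                     raw_imputations=["sklearn.impute.SimpleImputer"])
print(s.get_all_elements())  # ['sklearn.svm.SVC', 'sklearn.impute.SimpleImputer']
```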
automl.metalearning.database.configurations_parsing.mix_suggestions(suggestion_list)

Mix a list of MLSuggestion objects into a single one.

Parameters:suggestion_list (list) – The MLSuggestion instances to mix.
Raises:TypeError – If the argument is not a list or any of the elements in the list is not an instance of MLSuggestion.
Returns:The resulting MLSuggestion.
Return type:MLSuggestion
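A minimal sketch of what mixing could look like, assuming it concatenates each component list across the inputs in order; the page does not say whether duplicates are removed, and this sketch keeps them. `Suggestion` and `mix_suggestions_sketch` are hypothetical names, and the type check mirrors the documented TypeError conditions.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class Suggestion:
    """Hypothetical minimal stand-in for MLSuggestion."""
    classifiers: List[str] = field(default_factory=list)
    rescalers: List[str] = field(default_factory=list)
    preprocessors: List[str] = field(default_factory=list)
    encoders: List[str] = field(default_factory=list)
    imputations: List[str] = field(default_factory=list)


def mix_suggestions_sketch(suggestion_list):
    """Concatenate every component list across the given suggestions."""
    # Documented validation: a list whose items are all suggestions.
    if not isinstance(suggestion_list, list) or not all(
            isinstance(s, Suggestion) for s in suggestion_list):
        raise TypeError("expected a list of Suggestion instances")
    mixed = Suggestion()
    for suggestion in suggestion_list:
        for f in fields(Suggestion):
            # Extend the fresh lists of `mixed`; the inputs are not mutated.
            getattr(mixed, f.name).extend(getattr(suggestion, f.name))
    return mixed


a = Suggestion(classifiers=["sklearn.svm.SVC"])
b = Suggestion(classifiers=["sklearn.tree.DecisionTreeClassifier"],
               rescalers=["sklearn.preprocessing.StandardScaler"])
print(mix_suggestions_sketch([a, b]).classifiers)
```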

automl.metalearning.database.load_db module

File defining classes to load the meta db and interact with it.

The MetaDB holds the meta-learning information for the 140+ datasets used in the auto-sklearn paper. It lets us build a space of datasets that we can query to obtain datasets similar to a given one and, consequently, potential algorithms to work with on that dataset.

In this implementation, we provide methods to retrieve the simple space and its weighted version.

class automl.metalearning.database.load_db.AlgorithmRunsFile(algruns_file)

Bases: object

Abstract the algorithm_runs.arff file.

get_associated_configuration_id(instance_id)

Get the associated configuration for a given instance id.

A configuration is a solution for the dataset (instance). This returns the id for that solution, not the solution itself.

instance_id

The id of the dataset (instance) to search for.

Type:int
Returns:
The id of the configuration solving the instance_id problem
(dataset).
Return type:int
get_associated_configuration_ids(instances_ids)

Get the associated configurations for a given set of instance ids.

A configuration is a solution for the dataset (instance). This returns the ids for the solutions, not the solutions themselves.

instances_ids

The ids of the datasets (instances) to search for.

Type:list
Returns:
The ids of the configurations solving the problems (datasets) identified by instances_ids.
Return type:list(int)
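Both lookups reduce to an index from instance id to configuration id built from the rows of algorithm_runs.arff. A minimal sketch over hypothetical in-memory `(instance_id, algorithm)` pairs instead of a parsed ARFF file; the helper names are illustrative, and keeping only the last algorithm seen for a repeated instance (and raising KeyError for unknown ids) is this sketch's assumption, not documented behaviour.

```python
from typing import Dict, Iterable, List, Tuple


def build_instance_index(runs: Iterable[Tuple[int, int]]) -> Dict[int, int]:
    """Map each instance (dataset) id to its configuration id."""
    index: Dict[int, int] = {}
    for instance_id, algorithm_id in runs:
        # Assumes one configuration per instance: a repeated instance id
        # keeps the last algorithm id seen.
        index[instance_id] = algorithm_id
    return index


def associated_configuration_ids(index: Dict[int, int],
                                 instances_ids: List[int]) -> List[int]:
    """Look up configuration ids, preserving the order of the request."""
    # An unknown instance id raises KeyError in this sketch.
    return [index[instance_id] for instance_id in instances_ids]


# Hypothetical (instance_id, algorithm) pairs, not real meta-data.
runs = [(251, 12), (75, 3), (2117, 12)]
index = build_instance_index(runs)
print(associated_configuration_ids(index, [75, 2117]))  # [3, 12]
```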
class automl.metalearning.database.load_db.ConfigurationsFile(configs_file)

Bases: object

Abstract a configurations.csv file.

This class represents a configurations.csv file for our meta-knowledge. It also provides some useful methods to interact with the file.

get_configuration(algorithm_id)

Get the configuration for a given algorithm id.

algorithm_id

The id for the algorithm. This should come from the results in algorithm_runs.arff.

Type:int
get_configurations(algorithms_ids)

Get the configurations for a given set of algorithm ids.

algorithms_ids

The ids for the algorithms. These should come from the results in algorithm_runs.arff.

Type:list
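The algorithm ids returned by AlgorithmRunsFile are then used as keys into configurations.csv. A minimal sketch of that lookup with the standard `csv` module over a hypothetical in-memory excerpt; the `idx` id column and the component columns are assumptions, since the real file layout is not documented on this page.

```python
import csv
import io

# Hypothetical excerpt of a configurations.csv file; the "idx" id column
# and the component columns are assumptions for illustration.
CSV_TEXT = """idx,classifier:__choice__,rescaling:__choice__
3,random_forest,standardize
12,libsvm_svc,minmax
"""


def load_configurations(text):
    """Index every configuration row by its integer algorithm id."""
    reader = csv.DictReader(io.StringIO(text))
    return {int(row["idx"]): row for row in reader}


def get_configurations_sketch(configs, algorithms_ids):
    """Return the configuration rows for the requested ids, in order."""
    return [configs[algorithm_id] for algorithm_id in algorithms_ids]


configs = load_configurations(CSV_TEXT)
print(get_configurations_sketch(configs, [12])[0]["classifier:__choice__"])
```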
class automl.metalearning.database.load_db.LandmarkModelParser

Bases: object

Class to interact with the models stored per instance (dataset).

static metrics_available()

Return the metrics that are available in the meta-knowledge.

Returns:Metrics available in the package’s local storage.
Return type:list
static models_by_metric(instances_ids=None, dataset=None, metric='accuracy')

Return the models for a list of instances by the given metric.

instances_ids

List of integers with the ids of the instances (datasets).

Type:list
dataset

The dataset to work with.

Type:Dataset
metric

One of the metrics returned by LandmarkModelParser.metrics_available(). Defaults to 'accuracy'.

Type:str
Returns:List of models, one element per instance.
Return type:list
class automl.metalearning.database.load_db.MKDatabaseClient

Bases: object

MKDatabase (Meta-Knowledge Database) to perform queries.

This class serves as a facade to interact with the Meta Knowledge. We would like to expose the following features:
  • Ability to find the nearest datasets given a metric.
  • Reload the database (in case the arff files change at runtime).
TODO, to consider for a future implementation:
  • Ability to add observations into the database at runtime, e.g. MKDatabase().add_dataset_metaknowledge()

get_metaknowledge_space(weighted=False)

Get the metaknowledge space.

Retrieving the metaknowledge space may be useful for plotting purposes.

Parameters:weighted (bool) – Whether to retrieve the weighted version. Defaults to False.
Returns:
2-tuple where the first element is a list of ids and the
second is the matrix representing the metafeatures of each of the ids.
Return type:tuple
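Since the space is returned as (ids, matrix) with one row per id, a typical plotting step is to reduce the meta-features to two dimensions while keeping the id of each point. A minimal sketch using PCA via `numpy.linalg.svd`; PCA is this example's choice, not something the package prescribes, and the sketch assumes a dense, NaN-free matrix. `project_2d` and the toy data are hypothetical.

```python
import numpy as np


def project_2d(ids, matrix):
    """Map each id to its coordinates on the first two principal components."""
    X = np.asarray(matrix, dtype=float)
    # Centre every meta-feature column so the SVD gives principal axes.
    X = X - X.mean(axis=0)
    # Rows of vt are the principal axes, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    coords = X @ vt[:2].T
    # Row i of the matrix belongs to ids[i], so zip keeps the pairing.
    return dict(zip(ids, (tuple(point) for point in coords)))


# Hypothetical 4-dataset, 3-meta-feature space standing in for the real one.
ids = [3, 12, 75, 251]
matrix = [[1.0, 0.5, 20.0], [2.0, 0.1, 35.0],
          [0.5, 0.9, 10.0], [1.5, 0.4, 28.0]]
points = project_2d(ids, matrix)
```

The resulting dict can be fed to any plotting library, labelling each point with its dataset id.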
meta_suggestions(dataset=None, ids_list=None, metric='accuracy')

Retrieve the model suggestions for a set of ids, based on a dataset.

Using a given metric, retrieve the models suggested.

dataset

The dataset to use as a reference.

Type:Dataset
ids_list

The list of ids to retrieve information about.

Type:list
metric

A meta-learning metric. Defaults to 'accuracy'.

Type:str
Returns:A list of MLSuggestions.
Return type:list
nearest_datasets(dataset=None, k=5, weighted=False, distance_metric='minkowski')

Find the k nearest datasets in the meta-knowledge DB.

This method finds the k nearest neighbors of a given dataset, based on a given distance metric. This helps, for instance, to later relate them to the algorithms saved for each of the metrics in automl/metalearning/db/files.

dataset

The dataset to use. Defaults to None, which causes the method to fail.

Type:automl.datahandler.dataloader.Dataset
weighted

True if the weighted (cost) features should be used. Defaults to False.

Type:bool
k

The number of neighbor datasets to retrieve.

Type:int
distance_metric

The distance metric to use for the KNN algorithm.

Type:string or sklearn callable
Returns:
A tuple whose first element is a numpy array of the similarity metrics for the resulting datasets and whose second element contains the ids of the similar datasets.
Return type:tuple(np.array, np.array)
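The documented defaults (k=5, distance_metric='minkowski') match a standard k-nearest-neighbour query. A numpy-only sketch of that query, assuming Minkowski order p=2 (Euclidean), which is scikit-learn's default for 'minkowski'; it returns (distances, ids) in the documented tuple order, where a smaller distance means a more similar dataset. `nearest_datasets_sketch` and its arguments are hypothetical, and the sketch does not support the callable metrics the real method accepts.

```python
import numpy as np


def nearest_datasets_sketch(query, ids, matrix, k=5, p=2):
    """Return (distances, ids) of the k rows of ``matrix`` closest to ``query``."""
    X = np.asarray(matrix, dtype=float)
    q = np.asarray(query, dtype=float)
    # Minkowski distance of order p between the query and every row.
    distances = np.sum(np.abs(X - q) ** p, axis=1) ** (1.0 / p)
    # A stable sort keeps ties in their original row order.
    order = np.argsort(distances, kind="stable")[:k]
    return distances[order], np.asarray(ids)[order]


# Hypothetical meta-feature rows for three datasets.
dist, neighbours = nearest_datasets_sketch(
    [0.0, 0.0], ids=[10, 20, 30], matrix=[[0, 0], [3, 4], [1, 0]], k=2)
print(neighbours)  # [10 30]
```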
reload()

Reload the metaknowledge object.

This is helpful if the metaknowledge files change at runtime.

class automl.metalearning.database.load_db.MetaKnowledge

Bases: object

This class is a representation of the features’ costs and values.

The information comes from the meta-knowledge acquired by auto-sklearn. This class loads it as ARFFWrapper objects and exposes pandas and numpy representations of it, to ease 1) interaction with the features and 2) the matrix operations needed by machine learning methods.

load_datasets_info()

Load both the costs and values files for the features in the meta db.

It initializes the features/costs objects with ARFFWrapper objects.

Returns:Self object with the initialized features/costs.
Return type:MetaKnowledge
simple_matrix()

Return the feature values in the meta db - without the weights.

Returns:A 2-tuple whose first element is the ordered list of instance_id’s for the features and whose second element (np.ndarray) is the matrix with the feature values. Each row corresponds to the id at the same position in the first element.
Return type:tuple
weighted_matrix()

Return a matrix with the weighted (costs) features in the meta db.

Returns:A 2-tuple whose first element is the ordered list of instance_id’s for the features and whose second element (np.ndarray) is the matrix with the weighted feature values. Each row corresponds to the id at the same position in the first element.
Return type:tuple
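The two matrices share the same id ordering and differ only in the weighting. A minimal sketch of both, assuming hypothetical per-instance dicts of feature values and costs, sorted ids as the ordering, and element-wise multiplication as the weighting; the page does not state how the costs are applied, so that last choice in particular is an assumption.

```python
import numpy as np


def simple_matrix_sketch(values):
    """values: hypothetical dict instance_id -> list of feature values."""
    ids = sorted(values)
    # Row i of the matrix holds the features of ids[i].
    return ids, np.array([values[i] for i in ids], dtype=float)


def weighted_matrix_sketch(values, costs):
    """Weight each feature value by its cost, keeping the same id order."""
    ids, X = simple_matrix_sketch(values)
    C = np.array([costs[i] for i in ids], dtype=float)
    # ASSUMPTION: "weighted" means the element-wise product of values and costs.
    return ids, X * C


values = {2: [1.0, 2.0], 1: [3.0, 4.0]}
costs = {2: [0.5, 1.0], 1: [2.0, 0.0]}
ids, weighted = weighted_matrix_sketch(values, costs)
print(ids, weighted.tolist())  # [1, 2] [[6.0, 0.0], [0.5, 2.0]]
```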

Module contents

Store the Meta Knowledge database and provide classes to interact with it.

The Meta Knowledge is understood as the results from the meta-learning component of auto-sklearn. Here we identify two submodules: one to interact with the results and the other to compute new meta-knowledge either for a new dataset or to ease the inclusion of new results into the database.