Welcome to auto-ml’s documentation!
This is the official documentation for our Automated Machine Learning solution in Python.
This package was developed by students of the Eindhoven University of Technology (TU/e) for Achmea’s internal use as part of an internship program. It is also shared as open source on GitHub, as agreed by all parties.
The package is intended to help Data Scientists solve classification (and possibly regression) problems by automatically finding pipelines that include pre-processing, feature engineering and classification/regression models for a given dataset. The tool does not implement a new algorithm from scratch. Instead, it gathers the most relevant features of well-known approaches that have proved efficient and combines them into a usable framework for Data Scientists.
In more detail, our solution combines three components: the pre-learned meta-knowledge acquired by auto-sklearn, used to find candidate models for a given dataset; TPOT’s Genetic Programming approach, used to find pipelines in an automated way; and the Bayesian Optimization implemented in SMAC, used to fine-tune a given pipeline.
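The meta-learning step can be illustrated with a toy sketch: rank known datasets by how close their meta-features are to those of a new dataset, then collect the models that worked well on the closest ones. This is not the package's API. The `META_KNOWLEDGE` table, the function names and all numbers below are hypothetical, and the L1 distance is only an assumed choice of metric.

```python
# Hypothetical meta-knowledge: for each known dataset, a meta-feature vector
# and the models that performed well on it. All values are made up.
META_KNOWLEDGE = {
    "dataset_a": {"meta_features": [0.1, 0.9, 0.3],
                  "models": ["random_forest", "gradient_boosting"]},
    "dataset_b": {"meta_features": [0.8, 0.2, 0.5],
                  "models": ["linear_svm"]},
    "dataset_c": {"meta_features": [0.2, 0.8, 0.4],
                  "models": ["random_forest", "knn"]},
}


def l1_distance(u, v):
    """Sum of absolute differences between two meta-feature vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


def similar_datasets(query, knowledge, k=2, metric=l1_distance):
    """Return the names of the k known datasets closest to `query`."""
    ranked = sorted(
        knowledge,
        key=lambda name: metric(query, knowledge[name]["meta_features"]),
    )
    return ranked[:k]


def candidate_models(query, knowledge, k=2):
    """Collect models from the k most similar datasets, without duplicates."""
    seen = []
    for name in similar_datasets(query, knowledge, k):
        for model in knowledge[name]["models"]:
            if model not in seen:
                seen.append(model)
    return seen


if __name__ == "__main__":
    new_dataset = [0.1, 0.9, 0.3]  # hypothetical meta-features of a new dataset
    print(similar_datasets(new_dataset, META_KNOWLEDGE))
    print(candidate_models(new_dataset, META_KNOWLEDGE))
```

The resulting list of candidate models plays the role of a reduced search space: a pipeline search would then only need to consider these models rather than every available one.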
Authors:
- AutoML in a nutshell
- Overview of our solution
- Installing the tool
- Using the tool
- Build a Dataset object from an OpenML dataset
- Build a Dataset object from a pandas data frame
- Get similar datasets, based on a valid meta-learning metric
- Query the reduced search space, based on the similar datasets
- Discover a Pipeline using the reduced search space
- Discover a pipeline from scratch
- Optimize a pipeline with Bayesian Optimization
- Optimize any pipeline using Bayesian Optimization
- Results
- API
- Python basics to understand the tool
- Contribute