How to use the tool
===================

The tool supports several use cases. Here we present the ones we consider most
important and helpful to data scientists.

Build a Dataset object from an OpenML dataset
---------------------------------------------

.. code-block:: python
   :name: get-openml-dataset

   from automl.datahandler.dataloader import DataLoader

   # Download OpenML dataset 179 and treat it as a classification problem (0)
   dataset = DataLoader.get_openml_dataset(openml_id=179, problem_type=0)

Build a Dataset object from a pandas data frame
-----------------------------------------------

.. code-block:: python
   :name: build-dataset

   from automl.datahandler.dataloader import Dataset

   features_df, target_df = ...  # Some data frames

   data = Dataset(
       dataset_id="test-dataset",
       X=features_df,
       y=target_df,
       problem_type=0  # Problem type for classification
   )

Get similar datasets, based on a valid meta-learning metric
-----------------------------------------------------------

.. code-block:: python
   :name: metalearning-hints

   from automl.datahandler.dataloader import DataLoader
   from automl.discovery.assistant import Assistant

   dataset = DataLoader.get_openml_dataset(openml_id=179, problem_type=0)

   # Start the assistant
   assistant = Assistant(dataset=dataset, metalearning_metric='accuracy')

   # Compute similar datasets
   datasets, distances = assistant.compute_similar_datasets()

Query the reduced search space, based on the similar datasets
--------------------------------------------------------------

.. code-block:: python
   :name: reduced-searchspace

   from automl.datahandler.dataloader import DataLoader
   from automl.discovery.assistant import Assistant

   dataset = DataLoader.get_openml_dataset(openml_id=179, problem_type=0)

   # Start the assistant
   assistant = Assistant(dataset=dataset, metalearning_metric='accuracy')

   # Compute similar datasets
   assistant.compute_similar_datasets()

   # Get the reduced search space
   reduced_ss = assistant.reduced_search_space

   classifiers = reduced_ss.classifiers
   encoders = reduced_ss.encoders
   scalers = reduced_ss.rescalers
   preprocessors = reduced_ss.preprocessors
   imputations = reduced_ss.imputations

Discover a Pipeline using the reduced search space
--------------------------------------------------

.. code-block:: python
   :name: pipeline-reduced

   from automl.datahandler.dataloader import DataLoader
   from automl.discovery.assistant import Assistant

   dataset = DataLoader.get_openml_dataset(openml_id=179, problem_type=0)

   # Start the assistant
   assistant = Assistant(dataset=dataset,
                         metalearning_metric='accuracy',
                         evaluation_metric='accuracy')

   # Compute similar datasets
   assistant.compute_similar_datasets()

   pipeline_obj = assistant.generate_pipeline()
   pipeline_obj.save_pipeline(target_dir="results")

   # Get the scikit-learn pipeline object
   sklearn_pipeline = pipeline_obj.pipeline

Discover a pipeline from scratch
--------------------------------
.. code-block:: python
   :name: pipeline-scratch

   from automl.datahandler.dataloader import DataLoader
   from automl.discovery.assistant import Assistant

   dataset = DataLoader.get_openml_dataset(openml_id=179, problem_type=0)

   # Start the assistant
   assistant = Assistant(dataset=dataset,
                         metalearning_metric='accuracy',
                         evaluation_metric='accuracy')

   pipeline_obj = assistant.generate_pipeline()
   pipeline_obj.save_pipeline(target_dir="results")

   # Get the scikit-learn pipeline object
   sklearn_pipeline = pipeline_obj.pipeline

Optimize a pipeline with Bayesian Optimization
----------------------------------------------

.. code-block:: python
   :name: bayesian-full

   from automl.datahandler.dataloader import DataLoader
   from automl.discovery.assistant import Assistant

   dataset = DataLoader.get_openml_dataset(openml_id=179, problem_type=0)

   # Start the assistant
   assistant = Assistant(dataset=dataset,
                         metalearning_metric='accuracy',
                         evaluation_metric='accuracy')

   # Compute similar datasets
   assistant.compute_similar_datasets()

   pipeline_obj = assistant.generate_pipeline()
   pipeline_obj.save_pipeline(target_dir="results")

   # Get the scikit-learn pipeline object
   sklearn_pipeline = pipeline_obj.pipeline

   # Run the optimizer
   assistant.bayesian_optimize()

Optimize any pipeline using Bayesian Optimization
-------------------------------------------------

.. code-block:: python
   :name: bayesian-only

   from automl.datahandler.dataloader import DataLoader
   from automl.discovery.assistant import Assistant

   dataset = DataLoader.get_openml_dataset(openml_id=179, problem_type=0)

   # Start the assistant
   assistant = Assistant(dataset=dataset, evaluation_metric='accuracy')

   # Any scikit-learn pipeline object to optimize
   my_pipeline = ...  # A pipeline

   # Run the optimizer
   assistant.bayesian_optimize(my_pipeline)
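
Since the object exposed through ``pipeline_obj.pipeline`` is a regular
scikit-learn pipeline, it can be fitted and evaluated with the standard
scikit-learn API. The snippet below is a minimal sketch, not part of the
tool's own API: it assumes ``features_df`` and ``target_df`` are the pandas
data frames used earlier to build the ``Dataset``, and ``sklearn_pipeline`` is
the object obtained from ``pipeline_obj.pipeline`` in the examples above.

.. code-block:: python
   :name: use-sklearn-pipeline

   from sklearn.model_selection import train_test_split
   from sklearn.metrics import accuracy_score

   # features_df / target_df: the pandas data frames used to build the Dataset
   # sklearn_pipeline: the object obtained through pipeline_obj.pipeline
   X_train, X_test, y_train, y_test = train_test_split(
       features_df, target_df, test_size=0.2, random_state=42
   )

   # Fit and evaluate the pipeline like any other scikit-learn estimator
   # (ravel() flattens a single-column target data frame into a 1-D array)
   sklearn_pipeline.fit(X_train, y_train.values.ravel())
   predictions = sklearn_pipeline.predict(X_test)
   print("Test accuracy:", accuracy_score(y_test.values.ravel(), predictions))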