Automated machine learning

Automated machine learning (AutoML) is the automation of the end-to-end process of applying machine learning to real-world problems. In a typical machine learning application, practitioners must apply appropriate data preprocessing, feature engineering, feature extraction, and feature selection methods to make the dataset amenable to machine learning. After these preprocessing steps, they must perform algorithm selection and hyperparameter optimization to maximize the predictive performance of the final model. Because many of these steps are often beyond the abilities of non-experts, AutoML was proposed as an artificial intelligence-based solution to the ever-growing challenge of applying machine learning.^[1]^[2] Automating this process offers the advantages of producing simpler solutions, creating them faster, and yielding models that often outperform those designed by hand.

Targets of automation

Automated machine learning can target various stages of the machine learning process:^[2]

Automated data preparation and ingestion (from raw data and miscellaneous formats)
- Automated column type detection; e.g., boolean, discrete numerical, continuous numerical, or text
- Automated column intent detection; e.g., target/label, stratification field, numerical feature, categorical text feature, or free text feature
- Automated task detection; e.g., binary classification, regression, clustering, or ranking
Automated feature engineering
- Feature selection
- Feature extraction
- Meta learning and transfer learning
- Detection and handling of skewed data and/or missing values
Automated model selection
Hyperparameter optimization of the learning algorithm and featurization
Automated pipeline selection under time, memory, and complexity constraints
Automated selection of evaluation metrics and validation procedures
Automated problem checking
- Leakage detection
- Misconfiguration detection
Automated analysis of results obtained
User interfaces and visualizations for automated machine learning
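
As an illustration of the column type detection and task detection items in the list above, the following is a minimal sketch using only the Python standard library. The function names, the set of boolean tokens, and the decision rules are assumptions made for this example; they are not taken from any particular AutoML system, which would typically use more robust heuristics.

```python
from typing import Optional, Sequence

# Assumed vocabulary for boolean columns (illustrative only).
BOOLEAN_TOKENS = {"true", "false", "yes", "no"}


def detect_column_type(values: Sequence[Optional[str]]) -> str:
    """Guess a column's type from its raw string cells."""
    # Drop missing cells (None or blank) before inspecting the rest.
    cells = [v.strip() for v in values if v is not None and v.strip()]
    if not cells:
        return "unknown"  # nothing but missing values
    if all(c.lower() in BOOLEAN_TOKENS for c in cells):
        return "boolean"
    try:
        numbers = [float(c) for c in cells]
    except ValueError:
        return "text"  # at least one cell is not numeric
    if all(n.is_integer() for n in numbers):
        return "discrete numerical"
    return "continuous numerical"


def detect_task(target: Optional[Sequence[Optional[str]]]) -> str:
    """Guess the learning task from the target column (None means no target)."""
    if target is None:
        return "clustering"  # no label available, so assume unsupervised
    if detect_column_type(target) == "continuous numerical":
        return "regression"
    distinct = {v.strip().lower() for v in target if v is not None and v.strip()}
    if len(distinct) == 2:
        return "binary classification"
    return "undetermined"  # this sketch does not cover other tasks


# Hypothetical raw columns, as they might arrive from a CSV file.
columns = {
    "is_member": ["yes", "no", "yes", ""],
    "visits": ["3", "7", "10", "2"],
    "spend": ["12.5", "8", "", "40.25"],
    "comment": ["great", "ok", "bad", "fine"],
}
for name, cells in columns.items():
    print(name, "->", detect_column_type(cells))
```

Treating blank cells as missing, rather than as text, keeps a single empty field from turning a numeric column into a text column.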

Examples

Software tackling various stages of AutoML:

Hyperparameter optimization and model selection

H2O AutoML provides automated data preparation, hyperparameter tuning via random search, and stacked ensembles in a distributed machine learning platform.

mlr is an R package that contains several hyperparameter optimization techniques for machine learning problems.
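
Random search, one of the hyperparameter tuning strategies mentioned above, can be sketched in a few lines of plain Python. This is a generic illustration rather than the implementation used by any of the listed tools. The names `random_search`, `toy_validation_loss`, and `SEARCH_SPACE` are hypothetical, and the loss function is a stand-in for training a model and measuring its validation error.

```python
import math
import random
from typing import Callable, Dict, Tuple

Params = Dict[str, float]
Sampler = Callable[[random.Random], float]


def random_search(
    objective: Callable[[Params], float],
    space: Dict[str, Sampler],
    n_iter: int = 50,
    seed: int = 0,
) -> Tuple[Params, float]:
    """Sample n_iter random configurations and keep the one with the lowest loss."""
    rng = random.Random(seed)
    best_params: Params = {}
    best_loss = float("inf")
    for _ in range(n_iter):
        # Draw each hyperparameter independently from its own sampler.
        params = {name: sample(rng) for name, sample in space.items()}
        loss = objective(params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss


def toy_validation_loss(params: Params) -> float:
    """Hypothetical stand-in for 'train a model and return its validation loss'."""
    return (math.log10(params["learning_rate"]) + 2) ** 2 + (
        params["l2_penalty"] - 0.1
    ) ** 2


# Assumed search space: log-uniform learning rate, uniform L2 penalty.
SEARCH_SPACE: Dict[str, Sampler] = {
    "learning_rate": lambda rng: 10 ** rng.uniform(-4, 0),
    "l2_penalty": lambda rng: rng.uniform(0.0, 1.0),
}

best_params, best_loss = random_search(toy_validation_loss, SEARCH_SPACE, n_iter=200)
print(best_params, best_loss)
```

Sampling the learning rate on a logarithmic scale is a common choice for parameters that span several orders of magnitude; a uniform draw would spend most of the budget near the upper end of the range.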

Full pipeline optimization

Auto-WEKA^[3] is a Bayesian hyperparameter optimization layer on top of WEKA.
auto-sklearn^[4] is a Bayesian hyperparameter optimization layer on top of scikit-learn.
Firefly.ai is a cloud-based system for automatic generation of machine learning models.
TPOT^[5]^[6] is a Python library that automatically creates and optimizes full machine learning pipelines using genetic programming.
TransmogrifAI^[7]^[8] is a Scala/SparkML library created by Salesforce for automated data cleansing, feature engineering, model selection, and hyperparameter optimization.
RECIPE^[9] is a framework based on grammar-based genetic programming that builds customized scikit-learn classification pipelines.
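
The idea behind full pipeline optimization, choosing the preprocessing step and the model jointly rather than one at a time, can be shown with a small exhaustive search. The sketch below is an assumption-laden toy: it uses only the Python standard library, hand-written preprocessors and classifiers, and synthetic data, and it enumerates every combination instead of using Bayesian optimization or genetic programming as the tools above do. All names in it are hypothetical.

```python
import itertools
import random
from statistics import mean, pstdev


def identity(train, test):
    """Leave the features unchanged."""
    return train, test


def standardize(train, test):
    """Scale each feature to zero mean and unit variance using training statistics."""
    cols = list(zip(*train))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant features

    def scale(rows):
        return [[(x - m) / s for x, m, s in zip(r, mus, sds)] for r in rows]

    return scale(train), scale(test)


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_neighbor(X_train, y_train, X_test):
    """1-nearest-neighbor classifier."""
    return [
        y_train[min(range(len(X_train)), key=lambda i: sq_dist(X_train[i], x))]
        for x in X_test
    ]


def nearest_centroid(X_train, y_train, X_test):
    """Assign each point to the class whose mean is closest."""
    centroids = {}
    for label in set(y_train):
        rows = [x for x, y in zip(X_train, y_train) if y == label]
        centroids[label] = [mean(c) for c in zip(*rows)]
    return [min(centroids, key=lambda c: sq_dist(centroids[c], x)) for x in X_test]


PREPROCESSORS = {"identity": identity, "standardize": standardize}
MODELS = {"nearest_neighbor": nearest_neighbor, "nearest_centroid": nearest_centroid}


def search_pipelines(X_train, y_train, X_val, y_val):
    """Score every (preprocessor, model) pair on held-out data and return the best."""
    results = {}
    for (p_name, prep), (m_name, model) in itertools.product(
        PREPROCESSORS.items(), MODELS.items()
    ):
        Xt, Xv = prep(X_train, X_val)
        preds = model(Xt, y_train, Xv)
        correct = sum(p == t for p, t in zip(preds, y_val))
        results[(p_name, m_name)] = correct / len(y_val)
    best = max(results, key=results.get)
    return best, results


def make_data(n, seed):
    """Synthetic two-class data: one small informative feature, one large noisy one."""
    rng = random.Random(seed)
    X = [[rng.gauss(i % 2, 0.3), rng.gauss(0, 100)] for i in range(n)]
    y = [i % 2 for i in range(n)]
    return X, y


X_train, y_train = make_data(60, seed=0)
X_val, y_val = make_data(40, seed=1)
best, results = search_pipelines(X_train, y_train, X_val, y_val)
for pipeline, accuracy in results.items():
    print(pipeline, round(accuracy, 3))
print("best:", best)
```

Enumeration is only feasible because this search space has four pipelines. Real pipeline spaces grow combinatorially with the number of steps and hyperparameters, which is why the listed tools rely on guided search strategies.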

Deep neural network architecture search

devol is a Python package that performs deep neural network architecture search using genetic programming.
Google AutoML performs deep learning model architecture selection.
Auto Keras is an open-source Python package for neural architecture search.
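
To give a rough sense of how a search over network architectures can be organized, the following sketch runs a simple evolutionary loop (mutation plus elitist selection) over lists of hidden-layer widths. It is not genetic programming proper and does not reproduce the method of any package listed above. The scoring function `proxy_score` is a made-up stand-in for training each candidate network and measuring validation accuracy, and all other names and constants are likewise assumptions.

```python
import random
from typing import Callable, List

Architecture = List[int]  # hidden-layer widths, e.g. [32, 64]

WIDTHS = [8, 16, 32, 64, 128]  # assumed menu of layer widths
MAX_LAYERS = 4                 # assumed depth limit


def random_architecture(rng: random.Random) -> Architecture:
    return [rng.choice(WIDTHS) for _ in range(rng.randint(1, MAX_LAYERS))]


def mutate(arch: Architecture, rng: random.Random) -> Architecture:
    """Return a copy with one layer added, removed, or resized."""
    child = list(arch)
    op = rng.choice(["add", "remove", "resize"])
    if op == "add" and len(child) < MAX_LAYERS:
        child.insert(rng.randrange(len(child) + 1), rng.choice(WIDTHS))
    elif op == "remove" and len(child) > 1:
        child.pop(rng.randrange(len(child)))
    else:
        # Resize is also the fallback when add or remove is not allowed.
        child[rng.randrange(len(child))] = rng.choice(WIDTHS)
    return child


def proxy_score(arch: Architecture) -> float:
    """Hypothetical stand-in for validation accuracy after training."""
    depth_penalty = abs(len(arch) - 3)
    size_penalty = abs(sum(arch) - 96) / 96
    return -(depth_penalty + size_penalty)


def evolve(
    score: Callable[[Architecture], float],
    generations: int = 20,
    population_size: int = 10,
    seed: int = 0,
) -> Architecture:
    """Keep the better half each generation and refill with mutated copies."""
    rng = random.Random(seed)
    population = [random_architecture(rng) for _ in range(population_size)]
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        parents = population[: population_size // 2]
        children = [
            mutate(rng.choice(parents), rng)
            for _ in range(population_size - len(parents))
        ]
        population = parents + children
    return max(population, key=score)


best = evolve(proxy_score)
print(best, proxy_score(best))
```

In practice the expensive part is the scoring function, since each evaluation means training a network, so real systems put considerable effort into reducing the number or cost of evaluations.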

References

[1] Thornton C, Hutter F, Hoos HH, Leyton-Brown K (2013). "Auto-WEKA: Combined Selection and Hyperparameter Optimization of Classification Algorithms". KDD '13: Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. pp. 847–855.
[2] Hutter F, Caruana R, Bardenet R, Bilenko M, Guyon I, Kegl B, Larochelle H. "AutoML 2014 @ ICML". AutoML 2014 Workshop @ ICML. Retrieved 2018-03-28.
[3] Kotthoff L, Thornton C, Hoos HH, Hutter F, Leyton-Brown K (2017). "Auto-WEKA 2.0: Automatic model selection and hyperparameter optimization in WEKA". Journal of Machine Learning Research: 1–5.
[4] Feurer M, Klein A, Eggensperger K, Springenberg J, Blum M, Hutter F (2015). "Efficient and Robust Automated Machine Learning". Advances in Neural Information Processing Systems 28 (NIPS 2015): 2962–2970.
[5] Olson RS, Urbanowicz RJ, Andrews PC, Lavender NA, Kidd L, Moore JH (2016). "Automating biomedical data science through tree-based pipeline optimization". Proceedings of EvoStar 2016: 123–137. arXiv:1601.07925. doi:10.1007/978-3-319-31204-0_9.
[6] Olson RS, Bartley N, Urbanowicz RJ, Moore JH (2016). "Evaluation of a Tree-based Pipeline Optimization Tool for Automating Data Science". Proceedings of EvoBIO 2016: 485–492. arXiv:1603.06212. doi:10.1145/2908812.2908918.
[7] Nabar S (2018-08-16). "Open Sourcing TransmogrifAI – Automated Machine Learning for Structured Data". Salesforce Engineering. Retrieved 2018-08-16.
[8] Wiggers K (2018-08-16). "Salesforce open-sources TransmogrifAI, the machine learning library that powers Einstein". VentureBeat. Retrieved 2018-08-16. Quote: "Once TransmogrifAI has extracted features from the dataset, it’s primed to begin automated model training. At this stage, it runs a cadre of machine learning algorithms in parallel on the data, automatically selects the best-performing model, and samples and recalibrates predictions to avoid imbalanced data."
[9] de Sá AGC, Pinto WJGS, Oliveira LOVB, Pappa GL (2017). "RECIPE: A Grammar-Based Framework for Automatically Evolving Classification Pipelines". Lecture Notes in Computer Science. Springer International Publishing. pp. 246–261. doi:10.1007/978-3-319-55696-3_16. ISBN 9783319556956. Retrieved 2018-09-04.

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.
