Introducing AutoGluon
AutoGluon is a new open-source AutoML library that automates deep learning (DL) and machine learning (ML) for practical applications involving image, text, and tabular data sets. Whether you are a machine learning novice or an experienced practitioner, AutoGluon simplifies your workflow: with only a few lines of Python code, you can develop and refine deep learning models.
Main features
Historically, creating a machine learning model required substantial background knowledge, experience, and human effort. Data preparation, feature engineering, validation splitting, missing-value handling, and model selection are just some of the many tasks a machine learning application must address. A particularly difficult task is selecting hyperparameters.
Hyperparameters represent the many choices a user must make when building a model, such as the data processing steps, the neural network architecture, and the optimization procedure used during training. Each hyperparameter affects the predictive performance of the model in opaque ways, and the more powerful the model (such as a deep neural network), the more hyperparameters there are to tune. Small hyperparameter changes can significantly alter model quality. Since it is often unclear how to make these decisions, developers typically tune the various aspects of their ML pipeline by hand, which can take many iterations and a great deal of human effort.
AutoGluon automates all of the tasks mentioned above, delivering a truly hands-off experience: it uses the available computing resources to find the strongest ML method.
AutoGluon enables you to automatically tackle supervised learning tasks such as image classification, object detection, and text classification. The hyperparameters of each task are selected automatically through optimization algorithms such as Bayesian optimization, Hyperband, and reinforcement learning. With AutoGluon, you do not need to be familiar with the underlying models, because all hyperparameters are tuned automatically within default ranges that are known to work well for the given task and model.
For professional ML practitioners, AutoGluon makes this process easy to customize. For example, you can specify the range of values to consider for certain hyperparameters, or use AutoGluon to automatically tune particular aspects of your own custom model. If you have access to multiple machines, AutoGluon can distribute its computation across them to return trained models faster.
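As a rough illustration of such customization, the sketch below restricts the learning-rate range and the candidate networks that an image classification fit() call may explore. The dataset path and the keyword names (net, lr, search_strategy, time_limits) are illustrative assumptions, not a definitive listing of the API.
import autogluon as ag
from autogluon import ImageClassification as task
# Hypothetical image folder; replace with your own training data.
dataset = task.Dataset('path/to/train_images')
# Restrict the search to a log-scaled learning-rate range and two candidate
# backbones; the keyword names used here are illustrative assumptions.
classifier = task.fit(dataset,
                      net=ag.space.Categorical('resnet18_v1', 'resnet34_v1'),
                      lr=ag.space.Real(1e-4, 1e-2, log=True),
                      search_strategy='bayesopt',  # or 'hyperband', 'random'
                      time_limits=30 * 60)         # overall budget in seconds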
AutoGluon example
Install
A GPU with CUDA 10.0 is recommended for the object detection example. We also install MXNet so that AutoGluon can use deep learning models:
pip install --upgrade mxnet-cu100
pip install autogluon
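A quick sanity check after installation (assuming both packages expose the usual __version__ attribute):
import mxnet as mx
import autogluon as ag
print(mx.__version__)  # should report the CUDA 10.0 build if mxnet-cu100 was installed
print(ag.__version__)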
Object detection example
We take object detection as an example to demonstrate AutoGluon's simple interface. In object detection, the objects in an image must not only be identified but also localized with bounding boxes.
For demonstration purposes (and to ensure a fast runtime), we will use AutoGluon to train an object detector on a small data set built from the motorbike category of the VOC data set. In the Python code below, we first import AutoGluon, specify object detection as the task, download the data to our machine, and finally load it into Python:
import autogluon as ag
from autogluon import ObjectDetection as task
# Download and extract the tiny motorbike data set, then load it as an AutoGluon dataset
url = 'https://autogluon.s3.amazonaws.com/datasets/tiny_motorbike.zip'
data_dir = ag.unzip(ag.download(url))
dataset = task.Dataset(data_dir, classes=('motorbike',))
Next, we can train a detector model using AutoGluon by calling the fit() function:
detector = task.fit(dataset)
In this call to fit(), AutoGluon trains many models under different network configurations and hyperparameter settings, and selects the best of them as the final detector it returns. Without any user input, the call to fit() also automatically uses state-of-the-art deep learning techniques, such as transfer learning from a pre-trained YOLOv3 network. We can use the predict() method to test the trained detector on a new image:
# Download a test image and run the trained detector on it
url = 'https://autogluon.s3.amazonaws.com/images/object_detection_example.png'
filename = ag.download(url)
index, probabilities, locations = detector.predict(filename)
AutoGluon's predict() function automatically loads the test image and outputs the predicted object category, class probability, and bounding box location for each detected object. A visualization of the detections is generated automatically.
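If you prefer to draw the detections yourself, a minimal sketch is shown below. It assumes that locations holds one [xmin, ymin, xmax, ymax] box per detected object in pixel coordinates; the exact output format may differ between AutoGluon versions.
import matplotlib.pyplot as plt
import matplotlib.patches as patches
import matplotlib.image as mpimg
img = mpimg.imread(filename)  # the test image downloaded above
fig, ax = plt.subplots()
ax.imshow(img)
for box, prob in zip(locations, probabilities):
    xmin, ymin, xmax, ymax = box  # assumed pixel-coordinate corners
    ax.add_patch(patches.Rectangle((xmin, ymin), xmax - xmin, ymax - ymin,
                                   fill=False, edgecolor='red', linewidth=2))
    ax.text(xmin, ymin, '{:.2f}'.format(float(prob)), color='red')
plt.show()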
Tabular data example
The most common data format is the tabular data set: structured data usually stored in CSV files or databases. In a tabular data set, each column represents the measurements of some variable (also called a feature), and each row represents an individual data point. AutoGluon can train models that predict the values of a particular column from the other columns in the same row, and that generalize to previously unseen examples.
The data set we will train on is the Adult Income classification data set. It contains information on about 48,000 individuals, including numerical features (such as age) and categorical features (such as occupation), and is typically used to predict income. In this example, we will predict whether a person's annual income exceeds $50,000. We will use 80% of the data to train the AutoGluon predictor and 20% to test it. With AutoGluon, there is no need to specify validation data: AutoGluon will carve an optimally sized validation set out of the provided training data.
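For readers starting from a single CSV file rather than the pre-split files used below, an 80/20 split can be produced with plain pandas; the file name here is just a placeholder.
import pandas as pd
df = pd.read_csv('adult_income.csv')  # hypothetical local copy of the data
train_df = df.sample(frac=0.8, random_state=0)  # random 80% for training
test_df = df.drop(train_df.index)               # remaining 20% for testing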
In the Python code below, we first import AutoGluon and specify a task; this time we use TabularPrediction to handle tabular data. We then load the data set from a CSV file on S3. With a single call to fit(), AutoGluon processes the data and trains an ensemble of ML models, called a "predictor", that can predict the "class" variable in this data from the other columns, such as an individual's age, occupation, and education. The ensemble includes battle-tested ML algorithms such as LightGBM, CatBoost, and deep neural networks, which consistently outperform more traditional ML models such as logistic regression.
Note that we do not need to do any data processing or feature engineering, nor even declare the type of prediction problem: AutoGluon automatically prepares the data and infers whether our problem is regression or classification (and, if classification, whether it is binary or multiclass). The trained predictor model is saved to the location specified in the task.fit() call.
from autogluon import TabularPrediction as task
train_path = 'https://autogluon.s3.amazonaws.com/datasets/AdultIncomeBinaryClassification/train_data.csv'
train_data = task.Dataset(file_path=train_path)
predictor = task.fit(train_data, label='class', output_directory='ag-example-out/')
Now that our predictor model has been trained, we will make predictions on previously unseen test data. We can either use the returned predictor directly or load it from the specified output directory:
predictor = task.load('ag-example-out/')
test_path = 'https://autogluon.s3.amazonaws.com/datasets/AdultIncomeBinaryClassification/test_data.csv'
test_data = task.Dataset(file_path=test_path)
# Hold out the true labels before predicting
y_test = test_data['class']
test_data_nolabel = test_data.drop(labels=['class'], axis=1)
y_pred = predictor.predict(test_data_nolabel)
y_pred_proba = predictor.predict_proba(test_data_nolabel)
print(list(y_pred[:5]))
print(list(y_pred_proba[:5]))
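With the held-out labels in y_test we can also score these predictions. evaluate_predictions is part of the tabular predictor in this release of AutoGluon; if your version differs, the manual accuracy computation in the last line gives the same headline number.
performance = predictor.evaluate_predictions(y_true=y_test, y_pred=y_pred, auxiliary_metrics=True)
print(performance)
print('accuracy:', (y_pred == y_test).mean())  # fraction of rows predicted correctly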
Now let’s take a look at the model ranking:
leaderboard = predictor.leaderboard(test_data)
AutoGluon's model rankings
This leaderboard shows every model trained by AutoGluon, its scores on the test and validation data, and its training time in seconds. We can see that weighted_ensemble performs best on both the validation and test sets, reaching an accuracy of 87.76%.
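The leaderboard is returned as a pandas DataFrame, so it can also be inspected programmatically; the column names used below (model, score_test, score_val, fit_time) reflect this release and may vary.
top_models = leaderboard.sort_values('score_test', ascending=False)
print(top_models[['model', 'score_test', 'score_val', 'fit_time']].head())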
Conclusion
In this article, we introduced AutoGluon, which aims to give both ML experts and novices the best possible machine learning and deep learning experience.