27. Text Classification with AutoGluon

https://auto.gluon.ai/stable/index.html

AutoGluon is an open-source AutoML library from AWS that automates machine learning on tabular, text, image, and multimodal data. It relies on stack ensembling, and its authors report results that beat most entrants in the Kaggle competitions they benchmarked. See the papers listed in the GitHub repo: awslabs/autogluon

from google.colab import drive
drive.mount('/content/drive')  # Add My Drive/<>

import os
os.chdir('drive/My Drive')
os.chdir('Books_Writings/NLPBook/')
Mounted at /content/drive
%%capture
# %pylab inline
import pandas as pd
import os
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns

27.1. Use AutoGluon Tabular on News Dataset

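AutoGluon runs on top of Meta’s PyTorch framework, so PyTorch must be available before AutoGluon is installed. On Colab, PyTorch is already present (the log below reports torch 2.5.1+cu121 as already satisfied), so the explicit PyTorch install is left commented out. The AutoGluon install pulls in many dependencies and takes some time.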

AutoGluon installation instructions: https://auto.gluon.ai/stable/install.html
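Once the install cell below has finished, a quick check along the following lines can confirm which versions were actually picked up. This is a minimal sketch using only the standard library; the helper name `installed_versions` is hypothetical, and the package names are taken from the install cell and the pip log.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {package name: version string, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not installed in this environment.
            found[name] = None
    return found


# Distribution names as they appear in the pip output (an assumption for other setups).
report = installed_versions(["torch", "autogluon", "autogluon.tabular"])
for name, ver in report.items():
    print(f"{name:20s} {ver or 'not installed'}")
```

If any entry prints "not installed", rerunning the install cell (or restarting the runtime after installation) is usually the first thing to try.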

%%time
!pip install -U pip
!pip install -U setuptools wheel
# !pip install -U uv

# CPU version of pytorch has smaller footprint - see installation instructions in
# pytorch documentation - https://pytorch.org/get-started/locally/
# !uv pip install torch==2.3.1 torchvision==0.18.1 --index-url https://download.pytorch.org/whl/cpu --system

# !uv pip install autogluon --system
!pip install autogluon
Requirement already satisfied: pip in /usr/local/lib/python3.11/dist-packages (24.1.2)
Collecting pip
  Downloading pip-24.3.1-py3-none-any.whl.metadata (3.7 kB)
Downloading pip-24.3.1-py3-none-any.whl (1.8 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.8/1.8 MB 21.4 MB/s eta 0:00:00
Installing collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 24.1.2
    Uninstalling pip-24.1.2:
      Successfully uninstalled pip-24.1.2
Successfully installed pip-24.3.1
Requirement already satisfied: setuptools in /usr/local/lib/python3.11/dist-packages (75.1.0)
Collecting setuptools
  Downloading setuptools-75.8.0-py3-none-any.whl.metadata (6.7 kB)
Requirement already satisfied: wheel in /usr/local/lib/python3.11/dist-packages (0.45.1)
Downloading setuptools-75.8.0-py3-none-any.whl (1.2 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 20.8 MB/s eta 0:00:00
Installing collected packages: setuptools
  Attempting uninstall: setuptools
    Found existing installation: setuptools 75.1.0
    Uninstalling setuptools-75.1.0:
      Successfully uninstalled setuptools-75.1.0
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
ipython 7.34.0 requires jedi>=0.16, which is not installed.
Successfully installed setuptools-75.8.0
Collecting autogluon
  Downloading autogluon-1.2-py3-none-any.whl.metadata (11 kB)
Collecting autogluon.core==1.2 (from autogluon.core[all]==1.2->autogluon)
  Downloading autogluon.core-1.2-py3-none-any.whl.metadata (12 kB)
Collecting autogluon.features==1.2 (from autogluon)
  Downloading autogluon.features-1.2-py3-none-any.whl.metadata (11 kB)
Collecting autogluon.tabular==1.2 (from autogluon.tabular[all]==1.2->autogluon)
  Downloading autogluon.tabular-1.2-py3-none-any.whl.metadata (14 kB)
Collecting autogluon.multimodal==1.2 (from autogluon)
  Downloading autogluon.multimodal-1.2-py3-none-any.whl.metadata (12 kB)
Collecting autogluon.timeseries==1.2 (from autogluon.timeseries[all]==1.2->autogluon)
  Downloading autogluon.timeseries-1.2-py3-none-any.whl.metadata (12 kB)
Requirement already satisfied: numpy<2.1.4,>=1.25.0 in /usr/local/lib/python3.11/dist-packages (from autogluon.core==1.2->autogluon.core[all]==1.2->autogluon) (1.26.4)
Requirement already satisfied: scipy<1.16,>=1.5.4 in /usr/local/lib/python3.11/dist-packages (from autogluon.core==1.2->autogluon.core[all]==1.2->autogluon) (1.13.1)
Collecting scikit-learn<1.5.3,>=1.4.0 (from autogluon.core==1.2->autogluon.core[all]==1.2->autogluon)
  Downloading scikit_learn-1.5.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (13 kB)
Requirement already satisfied: networkx<4,>=3.0 in /usr/local/lib/python3.11/dist-packages (from autogluon.core==1.2->autogluon.core[all]==1.2->autogluon) (3.4.2)
Requirement already satisfied: pandas<2.3.0,>=2.0.0 in /usr/local/lib/python3.11/dist-packages (from autogluon.core==1.2->autogluon.core[all]==1.2->autogluon) (2.2.2)
Requirement already satisfied: tqdm<5,>=4.38 in /usr/local/lib/python3.11/dist-packages (from autogluon.core==1.2->autogluon.core[all]==1.2->autogluon) (4.67.1)
Requirement already satisfied: requests in /usr/local/lib/python3.11/dist-packages (from autogluon.core==1.2->autogluon.core[all]==1.2->autogluon) (2.32.3)
Requirement already satisfied: matplotlib<3.11,>=3.7.0 in /usr/local/lib/python3.11/dist-packages (from autogluon.core==1.2->autogluon.core[all]==1.2->autogluon) (3.10.0)
Collecting boto3<2,>=1.10 (from autogluon.core==1.2->autogluon.core[all]==1.2->autogluon)
  Downloading boto3-1.36.4-py3-none-any.whl.metadata (6.6 kB)
Collecting autogluon.common==1.2 (from autogluon.core==1.2->autogluon.core[all]==1.2->autogluon)
  Downloading autogluon.common-1.2-py3-none-any.whl.metadata (11 kB)
Collecting ray<2.40,>=2.10.0 (from ray[default]<2.40,>=2.10.0; extra == "all"->autogluon.core[all]==1.2->autogluon)
  Downloading ray-2.39.0-cp311-cp311-manylinux2014_x86_64.whl.metadata (17 kB)
Requirement already satisfied: pyarrow>=15.0.0 in /usr/local/lib/python3.11/dist-packages (from autogluon.core[all]==1.2->autogluon) (17.0.0)
Requirement already satisfied: hyperopt<0.2.8,>=0.2.7 in /usr/local/lib/python3.11/dist-packages (from autogluon.core[all]==1.2->autogluon) (0.2.7)
Requirement already satisfied: Pillow<12,>=10.0.1 in /usr/local/lib/python3.11/dist-packages (from autogluon.multimodal==1.2->autogluon) (11.1.0)
Requirement already satisfied: torch<2.6,>=2.2 in /usr/local/lib/python3.11/dist-packages (from autogluon.multimodal==1.2->autogluon) (2.5.1+cu121)
Collecting lightning<2.6,>=2.2 (from autogluon.multimodal==1.2->autogluon)
  Downloading lightning-2.5.0.post0-py3-none-any.whl.metadata (40 kB)
Requirement already satisfied: transformers<5,>=4.38.0 in /usr/local/lib/python3.11/dist-packages (from transformers[sentencepiece]<5,>=4.38.0->autogluon.multimodal==1.2->autogluon) (4.47.1)
Collecting accelerate<1.0,>=0.34.0 (from autogluon.multimodal==1.2->autogluon)
  Downloading accelerate-0.34.2-py3-none-any.whl.metadata (19 kB)
Collecting jsonschema<4.22,>=4.18 (from autogluon.multimodal==1.2->autogluon)
  Downloading jsonschema-4.21.1-py3-none-any.whl.metadata (7.8 kB)
Collecting seqeval<1.3.0,>=1.2.2 (from autogluon.multimodal==1.2->autogluon)
  Downloading seqeval-1.2.2.tar.gz (43 kB)
  Preparing metadata (setup.py) ... done
Collecting evaluate<0.5.0,>=0.4.0 (from autogluon.multimodal==1.2->autogluon)
  Downloading evaluate-0.4.3-py3-none-any.whl.metadata (9.2 kB)
Collecting timm<1.0.7,>=0.9.5 (from autogluon.multimodal==1.2->autogluon)
  Downloading timm-1.0.3-py3-none-any.whl.metadata (43 kB)
Requirement already satisfied: torchvision<0.21.0,>=0.16.0 in /usr/local/lib/python3.11/dist-packages (from autogluon.multimodal==1.2->autogluon) (0.20.1+cu121)
Collecting scikit-image<0.25.0,>=0.19.1 (from autogluon.multimodal==1.2->autogluon)
  Downloading scikit_image-0.24.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (14 kB)
Requirement already satisfied: text-unidecode<1.4,>=1.3 in /usr/local/lib/python3.11/dist-packages (from autogluon.multimodal==1.2->autogluon) (1.3)
Collecting torchmetrics<1.3.0,>=1.2.0 (from autogluon.multimodal==1.2->autogluon)
  Downloading torchmetrics-1.2.1-py3-none-any.whl.metadata (20 kB)
Collecting omegaconf<2.3.0,>=2.1.1 (from autogluon.multimodal==1.2->autogluon)
  Downloading omegaconf-2.2.3-py3-none-any.whl.metadata (3.9 kB)
Collecting pytorch-metric-learning<2.4,>=1.3.0 (from autogluon.multimodal==1.2->autogluon)
  Downloading pytorch_metric_learning-2.3.0-py3-none-any.whl.metadata (17 kB)
Collecting nlpaug<1.2.0,>=1.1.10 (from autogluon.multimodal==1.2->autogluon)
  Downloading nlpaug-1.1.11-py3-none-any.whl.metadata (14 kB)
Collecting nltk<3.9,>=3.4.5 (from autogluon.multimodal==1.2->autogluon)
  Downloading nltk-3.8.1-py3-none-any.whl.metadata (2.8 kB)
Collecting openmim<0.4.0,>=0.3.7 (from autogluon.multimodal==1.2->autogluon)
  Downloading openmim-0.3.9-py2.py3-none-any.whl.metadata (16 kB)
Requirement already satisfied: defusedxml<0.7.2,>=0.7.1 in /usr/local/lib/python3.11/dist-packages (from autogluon.multimodal==1.2->autogluon) (0.7.1)
Requirement already satisfied: jinja2<3.2,>=3.0.3 in /usr/local/lib/python3.11/dist-packages (from autogluon.multimodal==1.2->autogluon) (3.1.5)
Requirement already satisfied: tensorboard<3,>=2.9 in /usr/local/lib/python3.11/dist-packages (from autogluon.multimodal==1.2->autogluon) (2.17.1)
Collecting pytesseract<0.3.11,>=0.3.9 (from autogluon.multimodal==1.2->autogluon)
  Downloading pytesseract-0.3.10-py3-none-any.whl.metadata (11 kB)
Collecting nvidia-ml-py3==7.352.0 (from autogluon.multimodal==1.2->autogluon)
  Downloading nvidia-ml-py3-7.352.0.tar.gz (19 kB)
  Preparing metadata (setup.py) ... done
Collecting pdf2image<1.19,>=1.17.0 (from autogluon.multimodal==1.2->autogluon)
  Downloading pdf2image-1.17.0-py3-none-any.whl.metadata (6.2 kB)
Collecting catboost<1.3,>=1.2 (from autogluon.tabular[all]==1.2->autogluon)
  Downloading catboost-1.2.7-cp311-cp311-manylinux2014_x86_64.whl.metadata (1.2 kB)
Requirement already satisfied: spacy<3.8 in /usr/local/lib/python3.11/dist-packages (from autogluon.tabular[all]==1.2->autogluon) (3.7.5)
Requirement already satisfied: lightgbm<4.6,>=4.0 in /usr/local/lib/python3.11/dist-packages (from autogluon.tabular[all]==1.2->autogluon) (4.5.0)
Requirement already satisfied: einops<0.9,>=0.7 in /usr/local/lib/python3.11/dist-packages (from autogluon.tabular[all]==1.2->autogluon) (0.8.0)
Requirement already satisfied: xgboost<2.2,>=1.6 in /usr/local/lib/python3.11/dist-packages (from autogluon.tabular[all]==1.2->autogluon) (2.1.3)
Requirement already satisfied: fastai<2.8,>=2.3.1 in /usr/local/lib/python3.11/dist-packages (from autogluon.tabular[all]==1.2->autogluon) (2.7.18)
Requirement already satisfied: huggingface-hub[torch] in /usr/local/lib/python3.11/dist-packages (from autogluon.tabular[all]==1.2->autogluon) (0.27.1)
Requirement already satisfied: joblib<2,>=1.1 in /usr/local/lib/python3.11/dist-packages (from autogluon.timeseries==1.2->autogluon.timeseries[all]==1.2->autogluon) (1.4.2)
Collecting pytorch-lightning (from autogluon.timeseries==1.2->autogluon.timeseries[all]==1.2->autogluon)
  Downloading pytorch_lightning-2.5.0.post0-py3-none-any.whl.metadata (21 kB)
Collecting gluonts<0.17,>=0.15.0 (from autogluon.timeseries==1.2->autogluon.timeseries[all]==1.2->autogluon)
  Downloading gluonts-0.16.0-py3-none-any.whl.metadata (9.8 kB)
Collecting statsforecast<1.8,>=1.7.0 (from autogluon.timeseries==1.2->autogluon.timeseries[all]==1.2->autogluon)
  Downloading statsforecast-1.7.8-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (28 kB)
Collecting mlforecast==0.13.4 (from autogluon.timeseries==1.2->autogluon.timeseries[all]==1.2->autogluon)
  Downloading mlforecast-0.13.4-py3-none-any.whl.metadata (12 kB)
Collecting utilsforecast<0.2.5,>=0.2.3 (from autogluon.timeseries==1.2->autogluon.timeseries[all]==1.2->autogluon)
  Downloading utilsforecast-0.2.4-py3-none-any.whl.metadata (7.4 kB)
Collecting coreforecast==0.0.12 (from autogluon.timeseries==1.2->autogluon.timeseries[all]==1.2->autogluon)
  Downloading coreforecast-0.0.12-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.6 kB)
Collecting fugue>=0.9.0 (from autogluon.timeseries==1.2->autogluon.timeseries[all]==1.2->autogluon)
  Downloading fugue-0.9.1-py3-none-any.whl.metadata (18 kB)
Requirement already satisfied: orjson~=3.9 in /usr/local/lib/python3.11/dist-packages (from autogluon.timeseries==1.2->autogluon.timeseries[all]==1.2->autogluon) (3.10.14)
Requirement already satisfied: psutil<7.0.0,>=5.7.3 in /usr/local/lib/python3.11/dist-packages (from autogluon.common==1.2->autogluon.core==1.2->autogluon.core[all]==1.2->autogluon) (5.9.5)
Requirement already satisfied: cloudpickle in /usr/local/lib/python3.11/dist-packages (from mlforecast==0.13.4->autogluon.timeseries==1.2->autogluon.timeseries[all]==1.2->autogluon) (3.1.0)
Requirement already satisfied: fsspec in /usr/local/lib/python3.11/dist-packages (from mlforecast==0.13.4->autogluon.timeseries==1.2->autogluon.timeseries[all]==1.2->autogluon) (2024.10.0)
Requirement already satisfied: numba in /usr/local/lib/python3.11/dist-packages (from mlforecast==0.13.4->autogluon.timeseries==1.2->autogluon.timeseries[all]==1.2->autogluon) (0.60.0)
Collecting optuna (from mlforecast==0.13.4->autogluon.timeseries==1.2->autogluon.timeseries[all]==1.2->autogluon)
  Downloading optuna-4.2.0-py3-none-any.whl.metadata (17 kB)
Requirement already satisfied: packaging in /usr/local/lib/python3.11/dist-packages (from mlforecast==0.13.4->autogluon.timeseries==1.2->autogluon.timeseries[all]==1.2->autogluon) (24.2)
Collecting window-ops (from mlforecast==0.13.4->autogluon.timeseries==1.2->autogluon.timeseries[all]==1.2->autogluon)
  Downloading window_ops-0.0.15-py3-none-any.whl.metadata (6.8 kB)
Requirement already satisfied: pyyaml in /usr/local/lib/python3.11/dist-packages (from accelerate<1.0,>=0.34.0->autogluon.multimodal==1.2->autogluon) (6.0.2)
Requirement already satisfied: safetensors>=0.4.3 in /usr/local/lib/python3.11/dist-packages (from accelerate<1.0,>=0.34.0->autogluon.multimodal==1.2->autogluon) (0.5.2)
Collecting botocore<1.37.0,>=1.36.4 (from boto3<2,>=1.10->autogluon.core==1.2->autogluon.core[all]==1.2->autogluon)
  Downloading botocore-1.36.4-py3-none-any.whl.metadata (5.7 kB)
Collecting jmespath<2.0.0,>=0.7.1 (from boto3<2,>=1.10->autogluon.core==1.2->autogluon.core[all]==1.2->autogluon)
  Downloading jmespath-1.0.1-py3-none-any.whl.metadata (7.6 kB)
Collecting s3transfer<0.12.0,>=0.11.0 (from boto3<2,>=1.10->autogluon.core==1.2->autogluon.core[all]==1.2->autogluon)
  Downloading s3transfer-0.11.1-py3-none-any.whl.metadata (1.7 kB)
Requirement already satisfied: graphviz in /usr/local/lib/python3.11/dist-packages (from catboost<1.3,>=1.2->autogluon.tabular[all]==1.2->autogluon) (0.20.3)
Requirement already satisfied: plotly in /usr/local/lib/python3.11/dist-packages (from catboost<1.3,>=1.2->autogluon.tabular[all]==1.2->autogluon) (5.24.1)
Requirement already satisfied: six in /usr/local/lib/python3.11/dist-packages (from catboost<1.3,>=1.2->autogluon.tabular[all]==1.2->autogluon) (1.17.0)
Collecting datasets>=2.0.0 (from evaluate<0.5.0,>=0.4.0->autogluon.multimodal==1.2->autogluon)
  Downloading datasets-3.2.0-py3-none-any.whl.metadata (20 kB)
Collecting dill (from evaluate<0.5.0,>=0.4.0->autogluon.multimodal==1.2->autogluon)
  Downloading dill-0.3.9-py3-none-any.whl.metadata (10 kB)
Collecting xxhash (from evaluate<0.5.0,>=0.4.0->autogluon.multimodal==1.2->autogluon)
  Downloading xxhash-3.5.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (12 kB)
Collecting multiprocess (from evaluate<0.5.0,>=0.4.0->autogluon.multimodal==1.2->autogluon)
  Downloading multiprocess-0.70.17-py311-none-any.whl.metadata (7.2 kB)
Requirement already satisfied: pip in /usr/local/lib/python3.11/dist-packages (from fastai<2.8,>=2.3.1->autogluon.tabular[all]==1.2->autogluon) (24.3.1)
Requirement already satisfied: fastdownload<2,>=0.0.5 in /usr/local/lib/python3.11/dist-packages (from fastai<2.8,>=2.3.1->autogluon.tabular[all]==1.2->autogluon) (0.0.7)
Requirement already satisfied: fastcore<1.8,>=1.5.29 in /usr/local/lib/python3.11/dist-packages (from fastai<2.8,>=2.3.1->autogluon.tabular[all]==1.2->autogluon) (1.7.28)
Requirement already satisfied: fastprogress>=0.2.4 in /usr/local/lib/python3.11/dist-packages (from fastai<2.8,>=2.3.1->autogluon.tabular[all]==1.2->autogluon) (1.0.3)
Collecting triad>=0.9.7 (from fugue>=0.9.0->autogluon.timeseries==1.2->autogluon.timeseries[all]==1.2->autogluon)
  Downloading triad-0.9.8-py3-none-any.whl.metadata (6.3 kB)
Collecting adagio>=0.2.4 (from fugue>=0.9.0->autogluon.timeseries==1.2->autogluon.timeseries[all]==1.2->autogluon)
  Downloading adagio-0.2.6-py3-none-any.whl.metadata (1.8 kB)
Requirement already satisfied: pydantic<3,>=1.7 in /usr/local/lib/python3.11/dist-packages (from gluonts<0.17,>=0.15.0->autogluon.timeseries==1.2->autogluon.timeseries[all]==1.2->autogluon) (2.10.5)
Requirement already satisfied: toolz~=0.10 in /usr/local/lib/python3.11/dist-packages (from gluonts<0.17,>=0.15.0->autogluon.timeseries==1.2->autogluon.timeseries[all]==1.2->autogluon) (0.12.1)
Requirement already satisfied: typing-extensions~=4.0 in /usr/local/lib/python3.11/dist-packages (from gluonts<0.17,>=0.15.0->autogluon.timeseries==1.2->autogluon.timeseries[all]==1.2->autogluon) (4.12.2)
Requirement already satisfied: future in /usr/local/lib/python3.11/dist-packages (from hyperopt<0.2.8,>=0.2.7->autogluon.core[all]==1.2->autogluon) (1.0.0)
Requirement already satisfied: py4j in /usr/local/lib/python3.11/dist-packages (from hyperopt<0.2.8,>=0.2.7->autogluon.core[all]==1.2->autogluon) (0.10.9.7)
Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.11/dist-packages (from jinja2<3.2,>=3.0.3->autogluon.multimodal==1.2->autogluon) (3.0.2)
Requirement already satisfied: attrs>=22.2.0 in /usr/local/lib/python3.11/dist-packages (from jsonschema<4.22,>=4.18->autogluon.multimodal==1.2->autogluon) (24.3.0)
Requirement already satisfied: jsonschema-specifications>=2023.03.6 in /usr/local/lib/python3.11/dist-packages (from jsonschema<4.22,>=4.18->autogluon.multimodal==1.2->autogluon) (2024.10.1)
Requirement already satisfied: referencing>=0.28.4 in /usr/local/lib/python3.11/dist-packages (from jsonschema<4.22,>=4.18->autogluon.multimodal==1.2->autogluon) (0.35.1)
Requirement already satisfied: rpds-py>=0.7.1 in /usr/local/lib/python3.11/dist-packages (from jsonschema<4.22,>=4.18->autogluon.multimodal==1.2->autogluon) (0.22.3)
Collecting lightning-utilities<2.0,>=0.10.0 (from lightning<2.6,>=2.2->autogluon.multimodal==1.2->autogluon)
  Downloading lightning_utilities-0.11.9-py3-none-any.whl.metadata (5.2 kB)
Requirement already satisfied: contourpy>=1.0.1 in /usr/local/lib/python3.11/dist-packages (from matplotlib<3.11,>=3.7.0->autogluon.core==1.2->autogluon.core[all]==1.2->autogluon) (1.3.1)
Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.11/dist-packages (from matplotlib<3.11,>=3.7.0->autogluon.core==1.2->autogluon.core[all]==1.2->autogluon) (0.12.1)
Requirement already satisfied: fonttools>=4.22.0 in /usr/local/lib/python3.11/dist-packages (from matplotlib<3.11,>=3.7.0->autogluon.core==1.2->autogluon.core[all]==1.2->autogluon) (4.55.3)
Requirement already satisfied: kiwisolver>=1.3.1 in /usr/local/lib/python3.11/dist-packages (from matplotlib<3.11,>=3.7.0->autogluon.core==1.2->autogluon.core[all]==1.2->autogluon) (1.4.8)
Requirement already satisfied: pyparsing>=2.3.1 in /usr/local/lib/python3.11/dist-packages (from matplotlib<3.11,>=3.7.0->autogluon.core==1.2->autogluon.core[all]==1.2->autogluon) (3.2.1)
Requirement already satisfied: python-dateutil>=2.7 in /usr/local/lib/python3.11/dist-packages (from matplotlib<3.11,>=3.7.0->autogluon.core==1.2->autogluon.core[all]==1.2->autogluon) (2.8.2)
Requirement already satisfied: gdown>=4.0.0 in /usr/local/lib/python3.11/dist-packages (from nlpaug<1.2.0,>=1.1.10->autogluon.multimodal==1.2->autogluon) (5.2.0)
Requirement already satisfied: click in /usr/local/lib/python3.11/dist-packages (from nltk<3.9,>=3.4.5->autogluon.multimodal==1.2->autogluon) (8.1.8)
Requirement already satisfied: regex>=2021.8.3 in /usr/local/lib/python3.11/dist-packages (from nltk<3.9,>=3.4.5->autogluon.multimodal==1.2->autogluon) (2024.11.6)
Collecting antlr4-python3-runtime==4.9.* (from omegaconf<2.3.0,>=2.1.1->autogluon.multimodal==1.2->autogluon)
  Downloading antlr4-python3-runtime-4.9.3.tar.gz (117 kB)
  Preparing metadata (setup.py) ... done
Collecting colorama (from openmim<0.4.0,>=0.3.7->autogluon.multimodal==1.2->autogluon)
  Downloading colorama-0.4.6-py2.py3-none-any.whl.metadata (17 kB)
Collecting model-index (from openmim<0.4.0,>=0.3.7->autogluon.multimodal==1.2->autogluon)
  Downloading model_index-0.1.11-py3-none-any.whl.metadata (3.9 kB)
Collecting opendatalab (from openmim<0.4.0,>=0.3.7->autogluon.multimodal==1.2->autogluon)
  Downloading opendatalab-0.0.10-py3-none-any.whl.metadata (6.4 kB)
Requirement already satisfied: rich in /usr/local/lib/python3.11/dist-packages (from openmim<0.4.0,>=0.3.7->autogluon.multimodal==1.2->autogluon) (13.9.4)
Requirement already satisfied: tabulate in /usr/local/lib/python3.11/dist-packages (from openmim<0.4.0,>=0.3.7->autogluon.multimodal==1.2->autogluon) (0.9.0)
Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.11/dist-packages (from pandas<2.3.0,>=2.0.0->autogluon.core==1.2->autogluon.core[all]==1.2->autogluon) (2024.2)
Requirement already satisfied: tzdata>=2022.7 in /usr/local/lib/python3.11/dist-packages (from pandas<2.3.0,>=2.0.0->autogluon.core==1.2->autogluon.core[all]==1.2->autogluon) (2024.2)
Requirement already satisfied: filelock in /usr/local/lib/python3.11/dist-packages (from ray<2.40,>=2.10.0->ray[default]<2.40,>=2.10.0; extra == "all"->autogluon.core[all]==1.2->autogluon) (3.16.1)
Requirement already satisfied: msgpack<2.0.0,>=1.0.0 in /usr/local/lib/python3.11/dist-packages (from ray<2.40,>=2.10.0->ray[default]<2.40,>=2.10.0; extra == "all"->autogluon.core[all]==1.2->autogluon) (1.1.0)
Requirement already satisfied: protobuf!=3.19.5,>=3.15.3 in /usr/local/lib/python3.11/dist-packages (from ray<2.40,>=2.10.0->ray[default]<2.40,>=2.10.0; extra == "all"->autogluon.core[all]==1.2->autogluon) (4.25.5)
Requirement already satisfied: aiosignal in /usr/local/lib/python3.11/dist-packages (from ray<2.40,>=2.10.0->ray[default]<2.40,>=2.10.0; extra == "all"->autogluon.core[all]==1.2->autogluon) (1.3.2)
Requirement already satisfied: frozenlist in /usr/local/lib/python3.11/dist-packages (from ray<2.40,>=2.10.0->ray[default]<2.40,>=2.10.0; extra == "all"->autogluon.core[all]==1.2->autogluon) (1.5.0)
Requirement already satisfied: aiohttp>=3.7 in /usr/local/lib/python3.11/dist-packages (from ray[default,tune]<2.40,>=2.10.0; extra == "all"->autogluon.core[all]==1.2->autogluon) (3.11.11)
Collecting aiohttp-cors (from ray[default,tune]<2.40,>=2.10.0; extra == "all"->autogluon.core[all]==1.2->autogluon)
  Downloading aiohttp_cors-0.7.0-py3-none-any.whl.metadata (20 kB)
Collecting colorful (from ray[default,tune]<2.40,>=2.10.0; extra == "all"->autogluon.core[all]==1.2->autogluon)
  Downloading colorful-0.5.6-py2.py3-none-any.whl.metadata (16 kB)
Collecting py-spy>=0.2.0 (from ray[default,tune]<2.40,>=2.10.0; extra == "all"->autogluon.core[all]==1.2->autogluon)
  Downloading py_spy-0.4.0-py2.py3-none-manylinux_2_5_x86_64.manylinux1_x86_64.whl.metadata (16 kB)
Collecting opencensus (from ray[default,tune]<2.40,>=2.10.0; extra == "all"->autogluon.core[all]==1.2->autogluon)
  Downloading opencensus-0.11.4-py2.py3-none-any.whl.metadata (12 kB)
Requirement already satisfied: prometheus-client>=0.7.1 in /usr/local/lib/python3.11/dist-packages (from ray[default,tune]<2.40,>=2.10.0; extra == "all"->autogluon.core[all]==1.2->autogluon) (0.21.1)
Requirement already satisfied: smart-open in /usr/local/lib/python3.11/dist-packages (from ray[default,tune]<2.40,>=2.10.0; extra == "all"->autogluon.core[all]==1.2->autogluon) (7.1.0)
Collecting virtualenv!=20.21.1,>=20.0.24 (from ray[default,tune]<2.40,>=2.10.0; extra == "all"->autogluon.core[all]==1.2->autogluon)
  Downloading virtualenv-20.29.1-py3-none-any.whl.metadata (4.5 kB)
Requirement already satisfied: grpcio>=1.42.0 in /usr/local/lib/python3.11/dist-packages (from ray[default,tune]<2.40,>=2.10.0; extra == "all"->autogluon.core[all]==1.2->autogluon) (1.69.0)
Collecting memray (from ray[default,tune]<2.40,>=2.10.0; extra == "all"->autogluon.core[all]==1.2->autogluon)
  Downloading memray-1.15.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (19 kB)
Collecting tensorboardX>=1.9 (from ray[default,tune]<2.40,>=2.10.0; extra == "all"->autogluon.core[all]==1.2->autogluon)
  Downloading tensorboardX-2.6.2.2-py2.py3-none-any.whl.metadata (5.8 kB)
Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.11/dist-packages (from requests->autogluon.core==1.2->autogluon.core[all]==1.2->autogluon) (3.4.1)
Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.11/dist-packages (from requests->autogluon.core==1.2->autogluon.core[all]==1.2->autogluon) (3.10)
Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.11/dist-packages (from requests->autogluon.core==1.2->autogluon.core[all]==1.2->autogluon) (2.3.0)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.11/dist-packages (from requests->autogluon.core==1.2->autogluon.core[all]==1.2->autogluon) (2024.12.14)
Requirement already satisfied: imageio>=2.33 in /usr/local/lib/python3.11/dist-packages (from scikit-image<0.25.0,>=0.19.1->autogluon.multimodal==1.2->autogluon) (2.36.1)
Requirement already satisfied: tifffile>=2022.8.12 in /usr/local/lib/python3.11/dist-packages (from scikit-image<0.25.0,>=0.19.1->autogluon.multimodal==1.2->autogluon) (2024.12.12)
Requirement already satisfied: lazy-loader>=0.4 in /usr/local/lib/python3.11/dist-packages (from scikit-image<0.25.0,>=0.19.1->autogluon.multimodal==1.2->autogluon) (0.4)
Requirement already satisfied: threadpoolctl>=3.1.0 in /usr/local/lib/python3.11/dist-packages (from scikit-learn<1.5.3,>=1.4.0->autogluon.core==1.2->autogluon.core[all]==1.2->autogluon) (3.5.0)
Requirement already satisfied: spacy-legacy<3.1.0,>=3.0.11 in /usr/local/lib/python3.11/dist-packages (from spacy<3.8->autogluon.tabular[all]==1.2->autogluon) (3.0.12)
Requirement already satisfied: spacy-loggers<2.0.0,>=1.0.0 in /usr/local/lib/python3.11/dist-packages (from spacy<3.8->autogluon.tabular[all]==1.2->autogluon) (1.0.5)
Requirement already satisfied: murmurhash<1.1.0,>=0.28.0 in /usr/local/lib/python3.11/dist-packages (from spacy<3.8->autogluon.tabular[all]==1.2->autogluon) (1.0.11)
Requirement already satisfied: cymem<2.1.0,>=2.0.2 in /usr/local/lib/python3.11/dist-packages (from spacy<3.8->autogluon.tabular[all]==1.2->autogluon) (2.0.10)
Requirement already satisfied: preshed<3.1.0,>=3.0.2 in /usr/local/lib/python3.11/dist-packages (from spacy<3.8->autogluon.tabular[all]==1.2->autogluon) (3.0.9)
Requirement already satisfied: thinc<8.3.0,>=8.2.2 in /usr/local/lib/python3.11/dist-packages (from spacy<3.8->autogluon.tabular[all]==1.2->autogluon) (8.2.5)
Requirement already satisfied: wasabi<1.2.0,>=0.9.1 in /usr/local/lib/python3.11/dist-packages (from spacy<3.8->autogluon.tabular[all]==1.2->autogluon) (1.1.3)
Requirement already satisfied: srsly<3.0.0,>=2.4.3 in /usr/local/lib/python3.11/dist-packages (from spacy<3.8->autogluon.tabular[all]==1.2->autogluon) (2.5.0)
Requirement already satisfied: catalogue<2.1.0,>=2.0.6 in /usr/local/lib/python3.11/dist-packages (from spacy<3.8->autogluon.tabular[all]==1.2->autogluon) (2.0.10)
Requirement already satisfied: weasel<0.5.0,>=0.1.0 in /usr/local/lib/python3.11/dist-packages (from spacy<3.8->autogluon.tabular[all]==1.2->autogluon) (0.4.1)
Requirement already satisfied: typer<1.0.0,>=0.3.0 in /usr/local/lib/python3.11/dist-packages (from spacy<3.8->autogluon.tabular[all]==1.2->autogluon) (0.15.1)
Requirement already satisfied: setuptools in /usr/local/lib/python3.11/dist-packages (from spacy<3.8->autogluon.tabular[all]==1.2->autogluon) (75.8.0)
Requirement already satisfied: langcodes<4.0.0,>=3.2.0 in /usr/local/lib/python3.11/dist-packages (from spacy<3.8->autogluon.tabular[all]==1.2->autogluon) (3.5.0)
Requirement already satisfied: statsmodels>=0.13.2 in /usr/local/lib/python3.11/dist-packages (from statsforecast<1.8,>=1.7.0->autogluon.timeseries==1.2->autogluon.timeseries[all]==1.2->autogluon) (0.14.4)
Requirement already satisfied: absl-py>=0.4 in /usr/local/lib/python3.11/dist-packages (from tensorboard<3,>=2.9->autogluon.multimodal==1.2->autogluon) (1.4.0)
Requirement already satisfied: markdown>=2.6.8 in /usr/local/lib/python3.11/dist-packages (from tensorboard<3,>=2.9->autogluon.multimodal==1.2->autogluon) (3.7)
Requirement already satisfied: tensorboard-data-server<0.8.0,>=0.7.0 in /usr/local/lib/python3.11/dist-packages (from tensorboard<3,>=2.9->autogluon.multimodal==1.2->autogluon) (0.7.2)
Requirement already satisfied: werkzeug>=1.0.1 in /usr/local/lib/python3.11/dist-packages (from tensorboard<3,>=2.9->autogluon.multimodal==1.2->autogluon) (3.1.3)
Requirement already satisfied: nvidia-cuda-nvrtc-cu12==12.1.105 in /usr/local/lib/python3.11/dist-packages (from torch<2.6,>=2.2->autogluon.multimodal==1.2->autogluon) (12.1.105)
Requirement already satisfied: nvidia-cuda-runtime-cu12==12.1.105 in /usr/local/lib/python3.11/dist-packages (from torch<2.6,>=2.2->autogluon.multimodal==1.2->autogluon) (12.1.105)
Requirement already satisfied: nvidia-cuda-cupti-cu12==12.1.105 in /usr/local/lib/python3.11/dist-packages (from torch<2.6,>=2.2->autogluon.multimodal==1.2->autogluon) (12.1.105)
Requirement already satisfied: nvidia-cudnn-cu12==9.1.0.70 in /usr/local/lib/python3.11/dist-packages (from torch<2.6,>=2.2->autogluon.multimodal==1.2->autogluon) (9.1.0.70)
Requirement already satisfied: nvidia-cublas-cu12==12.1.3.1 in /usr/local/lib/python3.11/dist-packages (from torch<2.6,>=2.2->autogluon.multimodal==1.2->autogluon) (12.1.3.1)
Requirement already satisfied: nvidia-cufft-cu12==11.0.2.54 in /usr/local/lib/python3.11/dist-packages (from torch<2.6,>=2.2->autogluon.multimodal==1.2->autogluon) (11.0.2.54)
Requirement already satisfied: nvidia-curand-cu12==10.3.2.106 in /usr/local/lib/python3.11/dist-packages (from torch<2.6,>=2.2->autogluon.multimodal==1.2->autogluon) (10.3.2.106)
Requirement already satisfied: nvidia-cusolver-cu12==11.4.5.107 in /usr/local/lib/python3.11/dist-packages (from torch<2.6,>=2.2->autogluon.multimodal==1.2->autogluon) (11.4.5.107)
Requirement already satisfied: nvidia-cusparse-cu12==12.1.0.106 in /usr/local/lib/python3.11/dist-packages (from torch<2.6,>=2.2->autogluon.multimodal==1.2->autogluon) (12.1.0.106)
Requirement already satisfied: nvidia-nccl-cu12==2.21.5 in /usr/local/lib/python3.11/dist-packages (from torch<2.6,>=2.2->autogluon.multimodal==1.2->autogluon) (2.21.5)
Requirement already satisfied: nvidia-nvtx-cu12==12.1.105 in /usr/local/lib/python3.11/dist-packages (from torch<2.6,>=2.2->autogluon.multimodal==1.2->autogluon) (12.1.105)
Requirement already satisfied: triton==3.1.0 in /usr/local/lib/python3.11/dist-packages (from torch<2.6,>=2.2->autogluon.multimodal==1.2->autogluon) (3.1.0)
Requirement already satisfied: sympy==1.13.1 in /usr/local/lib/python3.11/dist-packages (from torch<2.6,>=2.2->autogluon.multimodal==1.2->autogluon) (1.13.1)
Requirement already satisfied: nvidia-nvjitlink-cu12 in /usr/local/lib/python3.11/dist-packages (from nvidia-cusolver-cu12==11.4.5.107->torch<2.6,>=2.2->autogluon.multimodal==1.2->autogluon) (12.6.85)
Requirement already satisfied: mpmath<1.4,>=1.1.0 in /usr/local/lib/python3.11/dist-packages (from sympy==1.13.1->torch<2.6,>=2.2->autogluon.multimodal==1.2->autogluon) (1.3.0)
Requirement already satisfied: tokenizers<0.22,>=0.21 in /usr/local/lib/python3.11/dist-packages (from transformers<5,>=4.38.0->transformers[sentencepiece]<5,>=4.38.0->autogluon.multimodal==1.2->autogluon) (0.21.0)
Requirement already satisfied: sentencepiece!=0.1.92,>=0.1.91 in /usr/local/lib/python3.11/dist-packages (from transformers[sentencepiece]<5,>=4.38.0->autogluon.multimodal==1.2->autogluon) (0.2.0)
Requirement already satisfied: aiohappyeyeballs>=2.3.0 in /usr/local/lib/python3.11/dist-packages (from aiohttp>=3.7->ray[default,tune]<2.40,>=2.10.0; extra == "all"->autogluon.core[all]==1.2->autogluon) (2.4.4)
Requirement already satisfied: multidict<7.0,>=4.5 in /usr/local/lib/python3.11/dist-packages (from aiohttp>=3.7->ray[default,tune]<2.40,>=2.10.0; extra == "all"->autogluon.core[all]==1.2->autogluon) (6.1.0)
Requirement already satisfied: propcache>=0.2.0 in /usr/local/lib/python3.11/dist-packages (from aiohttp>=3.7->ray[default,tune]<2.40,>=2.10.0; extra == "all"->autogluon.core[all]==1.2->autogluon) (0.2.1)
Requirement already satisfied: yarl<2.0,>=1.17.0 in /usr/local/lib/python3.11/dist-packages (from aiohttp>=3.7->ray[default,tune]<2.40,>=2.10.0; extra == "all"->autogluon.core[all]==1.2->autogluon) (1.18.3)
Collecting dill (from evaluate<0.5.0,>=0.4.0->autogluon.multimodal==1.2->autogluon)
  Downloading dill-0.3.8-py3-none-any.whl.metadata (10 kB)
Collecting multiprocess (from evaluate<0.5.0,>=0.4.0->autogluon.multimodal==1.2->autogluon)
  Downloading multiprocess-0.70.16-py311-none-any.whl.metadata (7.2 kB)
Collecting fsspec (from mlforecast==0.13.4->autogluon.timeseries==1.2->autogluon.timeseries[all]==1.2->autogluon)
  Downloading fsspec-2024.9.0-py3-none-any.whl.metadata (11 kB)
Requirement already satisfied: beautifulsoup4 in /usr/local/lib/python3.11/dist-packages (from gdown>=4.0.0->nlpaug<1.2.0,>=1.1.10->autogluon.multimodal==1.2->autogluon) (4.12.3)
Requirement already satisfied: language-data>=1.2 in /usr/local/lib/python3.11/dist-packages (from langcodes<4.0.0,>=3.2.0->spacy<3.8->autogluon.tabular[all]==1.2->autogluon) (1.3.0)
Requirement already satisfied: llvmlite<0.44,>=0.43.0dev0 in /usr/local/lib/python3.11/dist-packages (from numba->mlforecast==0.13.4->autogluon.timeseries==1.2->autogluon.timeseries[all]==1.2->autogluon) (0.43.0)
Requirement already satisfied: annotated-types>=0.6.0 in /usr/local/lib/python3.11/dist-packages (from pydantic<3,>=1.7->gluonts<0.17,>=0.15.0->autogluon.timeseries==1.2->autogluon.timeseries[all]==1.2->autogluon) (0.7.0)
Requirement already satisfied: pydantic-core==2.27.2 in /usr/local/lib/python3.11/dist-packages (from pydantic<3,>=1.7->gluonts<0.17,>=0.15.0->autogluon.timeseries==1.2->autogluon.timeseries[all]==1.2->autogluon) (2.27.2)
Requirement already satisfied: patsy>=0.5.6 in /usr/local/lib/python3.11/dist-packages (from statsmodels>=0.13.2->statsforecast<1.8,>=1.7.0->autogluon.timeseries==1.2->autogluon.timeseries[all]==1.2->autogluon) (1.0.1)
Requirement already satisfied: blis<0.8.0,>=0.7.8 in /usr/local/lib/python3.11/dist-packages (from thinc<8.3.0,>=8.2.2->spacy<3.8->autogluon.tabular[all]==1.2->autogluon) (0.7.11)
Requirement already satisfied: confection<1.0.0,>=0.0.1 in /usr/local/lib/python3.11/dist-packages (from thinc<8.3.0,>=8.2.2->spacy<3.8->autogluon.tabular[all]==1.2->autogluon) (0.1.5)
Collecting fs (from triad>=0.9.7->fugue>=0.9.0->autogluon.timeseries==1.2->autogluon.timeseries[all]==1.2->autogluon)
  Downloading fs-2.4.16-py2.py3-none-any.whl.metadata (6.3 kB)
Requirement already satisfied: shellingham>=1.3.0 in /usr/local/lib/python3.11/dist-packages (from typer<1.0.0,>=0.3.0->spacy<3.8->autogluon.tabular[all]==1.2->autogluon) (1.5.4)
Requirement already satisfied: markdown-it-py>=2.2.0 in /usr/local/lib/python3.11/dist-packages (from rich->openmim<0.4.0,>=0.3.7->autogluon.multimodal==1.2->autogluon) (3.0.0)
Requirement already satisfied: pygments<3.0.0,>=2.13.0 in /usr/local/lib/python3.11/dist-packages (from rich->openmim<0.4.0,>=0.3.7->autogluon.multimodal==1.2->autogluon) (2.18.0)
Collecting distlib<1,>=0.3.7 (from virtualenv!=20.21.1,>=20.0.24->ray[default,tune]<2.40,>=2.10.0; extra == "all"->autogluon.core[all]==1.2->autogluon)
  Downloading distlib-0.3.9-py2.py3-none-any.whl.metadata (5.2 kB)
Requirement already satisfied: platformdirs<5,>=3.9.1 in /usr/local/lib/python3.11/dist-packages (from virtualenv!=20.21.1,>=20.0.24->ray[default,tune]<2.40,>=2.10.0; extra == "all"->autogluon.core[all]==1.2->autogluon) (4.3.6)
Requirement already satisfied: cloudpathlib<1.0.0,>=0.7.0 in /usr/local/lib/python3.11/dist-packages (from weasel<0.5.0,>=0.1.0->spacy<3.8->autogluon.tabular[all]==1.2->autogluon) (0.20.0)
Requirement already satisfied: wrapt in /usr/local/lib/python3.11/dist-packages (from smart-open->ray[default,tune]<2.40,>=2.10.0; extra == "all"->autogluon.core[all]==1.2->autogluon) (1.17.0)
Collecting textual>=0.41.0 (from memray->ray[default,tune]<2.40,>=2.10.0; extra == "all"->autogluon.core[all]==1.2->autogluon)
  Downloading textual-1.0.0-py3-none-any.whl.metadata (9.0 kB)
Collecting ordered-set (from model-index->openmim<0.4.0,>=0.3.7->autogluon.multimodal==1.2->autogluon)
  Downloading ordered_set-4.1.0-py3-none-any.whl.metadata (5.3 kB)
Collecting opencensus-context>=0.1.3 (from opencensus->ray[default,tune]<2.40,>=2.10.0; extra == "all"->autogluon.core[all]==1.2->autogluon)
  Downloading opencensus_context-0.1.3-py2.py3-none-any.whl.metadata (3.3 kB)
Requirement already satisfied: google-api-core<3.0.0,>=1.0.0 in /usr/local/lib/python3.11/dist-packages (from opencensus->ray[default,tune]<2.40,>=2.10.0; extra == "all"->autogluon.core[all]==1.2->autogluon) (2.19.2)
Collecting pycryptodome (from opendatalab->openmim<0.4.0,>=0.3.7->autogluon.multimodal==1.2->autogluon)
  Downloading pycryptodome-3.21.0-cp36-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.4 kB)
Collecting openxlab (from opendatalab->openmim<0.4.0,>=0.3.7->autogluon.multimodal==1.2->autogluon)
  Downloading openxlab-0.1.2-py3-none-any.whl.metadata (3.8 kB)
Collecting alembic>=1.5.0 (from optuna->mlforecast==0.13.4->autogluon.timeseries==1.2->autogluon.timeseries[all]==1.2->autogluon)
  Downloading alembic-1.14.1-py3-none-any.whl.metadata (7.4 kB)
Collecting colorlog (from optuna->mlforecast==0.13.4->autogluon.timeseries==1.2->autogluon.timeseries[all]==1.2->autogluon)
  Downloading colorlog-6.9.0-py3-none-any.whl.metadata (10 kB)
Requirement already satisfied: sqlalchemy>=1.4.2 in /usr/local/lib/python3.11/dist-packages (from optuna->mlforecast==0.13.4->autogluon.timeseries==1.2->autogluon.timeseries[all]==1.2->autogluon) (2.0.37)
Requirement already satisfied: tenacity>=6.2.0 in /usr/local/lib/python3.11/dist-packages (from plotly->catboost<1.3,>=1.2->autogluon.tabular[all]==1.2->autogluon) (9.0.0)
Collecting Mako (from alembic>=1.5.0->optuna->mlforecast==0.13.4->autogluon.timeseries==1.2->autogluon.timeseries[all]==1.2->autogluon)
  Downloading Mako-1.3.8-py3-none-any.whl.metadata (2.9 kB)
Requirement already satisfied: googleapis-common-protos<2.0.dev0,>=1.56.2 in /usr/local/lib/python3.11/dist-packages (from google-api-core<3.0.0,>=1.0.0->opencensus->ray[default,tune]<2.40,>=2.10.0; extra == "all"->autogluon.core[all]==1.2->autogluon) (1.66.0)
Requirement already satisfied: proto-plus<2.0.0dev,>=1.22.3 in /usr/local/lib/python3.11/dist-packages (from google-api-core<3.0.0,>=1.0.0->opencensus->ray[default,tune]<2.40,>=2.10.0; extra == "all"->autogluon.core[all]==1.2->autogluon) (1.25.0)
Requirement already satisfied: google-auth<3.0.dev0,>=2.14.1 in /usr/local/lib/python3.11/dist-packages (from google-api-core<3.0.0,>=1.0.0->opencensus->ray[default,tune]<2.40,>=2.10.0; extra == "all"->autogluon.core[all]==1.2->autogluon) (2.27.0)
Requirement already satisfied: marisa-trie>=1.1.0 in /usr/local/lib/python3.11/dist-packages (from language-data>=1.2->langcodes<4.0.0,>=3.2.0->spacy<3.8->autogluon.tabular[all]==1.2->autogluon) (1.2.1)
Requirement already satisfied: mdurl~=0.1 in /usr/local/lib/python3.11/dist-packages (from markdown-it-py>=2.2.0->rich->openmim<0.4.0,>=0.3.7->autogluon.multimodal==1.2->autogluon) (0.1.2)
Requirement already satisfied: greenlet!=0.4.17 in /usr/local/lib/python3.11/dist-packages (from sqlalchemy>=1.4.2->optuna->mlforecast==0.13.4->autogluon.timeseries==1.2->autogluon.timeseries[all]==1.2->autogluon) (3.1.1)
Requirement already satisfied: soupsieve>1.2 in /usr/local/lib/python3.11/dist-packages (from beautifulsoup4->gdown>=4.0.0->nlpaug<1.2.0,>=1.1.10->autogluon.multimodal==1.2->autogluon) (2.6)
Collecting appdirs~=1.4.3 (from fs->triad>=0.9.7->fugue>=0.9.0->autogluon.timeseries==1.2->autogluon.timeseries[all]==1.2->autogluon)
  Downloading appdirs-1.4.4-py2.py3-none-any.whl.metadata (9.0 kB)
Collecting filelock (from ray<2.40,>=2.10.0->ray[default]<2.40,>=2.10.0; extra == "all"->autogluon.core[all]==1.2->autogluon)
  Downloading filelock-3.14.0-py3-none-any.whl.metadata (2.8 kB)
Collecting oss2~=2.17.0 (from openxlab->opendatalab->openmim<0.4.0,>=0.3.7->autogluon.multimodal==1.2->autogluon)
  Downloading oss2-2.17.0.tar.gz (259 kB)
  Preparing metadata (setup.py) ... done
Collecting pytz>=2020.1 (from pandas<2.3.0,>=2.0.0->autogluon.core==1.2->autogluon.core[all]==1.2->autogluon)
  Downloading pytz-2023.4-py2.py3-none-any.whl.metadata (22 kB)
INFO: pip is looking at multiple versions of openxlab to determine which version is compatible with other requirements. This could take a while.
Collecting openxlab (from opendatalab->openmim<0.4.0,>=0.3.7->autogluon.multimodal==1.2->autogluon)
  Downloading openxlab-0.1.1-py3-none-any.whl.metadata (3.8 kB)
  Downloading openxlab-0.1.0-py3-none-any.whl.metadata (3.8 kB)
  Downloading openxlab-0.0.38-py3-none-any.whl.metadata (3.8 kB)
  Downloading openxlab-0.0.37-py3-none-any.whl.metadata (3.8 kB)
  Downloading openxlab-0.0.36-py3-none-any.whl.metadata (3.8 kB)
  Downloading openxlab-0.0.35-py3-none-any.whl.metadata (3.8 kB)
  Downloading openxlab-0.0.34-py3-none-any.whl.metadata (3.8 kB)
INFO: pip is still looking at multiple versions of openxlab to determine which version is compatible with other requirements. This could take a while.
  Downloading openxlab-0.0.33-py3-none-any.whl.metadata (3.8 kB)
  Downloading openxlab-0.0.32-py3-none-any.whl.metadata (3.8 kB)
  Downloading openxlab-0.0.31-py3-none-any.whl.metadata (3.8 kB)
  Downloading openxlab-0.0.30-py3-none-any.whl.metadata (3.8 kB)
  Downloading openxlab-0.0.29-py3-none-any.whl.metadata (3.8 kB)
INFO: This is taking longer than usual. You might need to provide the dependency resolver with stricter constraints to reduce runtime. See https://pip.pypa.io/warnings/backtracking for guidance. If you want to abort this run, press Ctrl + C.
  Downloading openxlab-0.0.28-py3-none-any.whl.metadata (3.7 kB)
  Downloading openxlab-0.0.27-py3-none-any.whl.metadata (3.7 kB)
  Downloading openxlab-0.0.26-py3-none-any.whl.metadata (3.7 kB)
  Downloading openxlab-0.0.25-py3-none-any.whl.metadata (3.7 kB)
  Downloading openxlab-0.0.24-py3-none-any.whl.metadata (3.7 kB)
  Downloading openxlab-0.0.23-py3-none-any.whl.metadata (3.7 kB)
  Downloading openxlab-0.0.22-py3-none-any.whl.metadata (3.7 kB)
  Downloading openxlab-0.0.21-py3-none-any.whl.metadata (3.7 kB)
  Downloading openxlab-0.0.20-py3-none-any.whl.metadata (3.7 kB)
  Downloading openxlab-0.0.19-py3-none-any.whl.metadata (3.7 kB)
  Downloading openxlab-0.0.18-py3-none-any.whl.metadata (3.7 kB)
  Downloading openxlab-0.0.17-py3-none-any.whl.metadata (3.7 kB)
  Downloading openxlab-0.0.16-py3-none-any.whl.metadata (3.8 kB)
  Downloading openxlab-0.0.15-py3-none-any.whl.metadata (3.8 kB)
  Downloading openxlab-0.0.14-py3-none-any.whl.metadata (3.8 kB)
  Downloading openxlab-0.0.13-py3-none-any.whl.metadata (4.5 kB)
  Downloading openxlab-0.0.12-py3-none-any.whl.metadata (4.5 kB)
  Downloading openxlab-0.0.11-py3-none-any.whl.metadata (4.3 kB)
Requirement already satisfied: PySocks!=1.5.7,>=1.5.6 in /usr/local/lib/python3.11/dist-packages (from requests[socks]->gdown>=4.0.0->nlpaug<1.2.0,>=1.1.10->autogluon.multimodal==1.2->autogluon) (1.7.1)
Requirement already satisfied: cachetools<6.0,>=2.0.0 in /usr/local/lib/python3.11/dist-packages (from google-auth<3.0.dev0,>=2.14.1->google-api-core<3.0.0,>=1.0.0->opencensus->ray[default,tune]<2.40,>=2.10.0; extra == "all"->autogluon.core[all]==1.2->autogluon) (5.5.0)
Requirement already satisfied: pyasn1-modules>=0.2.1 in /usr/local/lib/python3.11/dist-packages (from google-auth<3.0.dev0,>=2.14.1->google-api-core<3.0.0,>=1.0.0->opencensus->ray[default,tune]<2.40,>=2.10.0; extra == "all"->autogluon.core[all]==1.2->autogluon) (0.4.1)
Requirement already satisfied: rsa<5,>=3.1.4 in /usr/local/lib/python3.11/dist-packages (from google-auth<3.0.dev0,>=2.14.1->google-api-core<3.0.0,>=1.0.0->opencensus->ray[default,tune]<2.40,>=2.10.0; extra == "all"->autogluon.core[all]==1.2->autogluon) (4.9)
Requirement already satisfied: linkify-it-py<3,>=1 in /usr/local/lib/python3.11/dist-packages (from markdown-it-py[linkify,plugins]>=2.1.0->textual>=0.41.0->memray->ray[default,tune]<2.40,>=2.10.0; extra == "all"->autogluon.core[all]==1.2->autogluon) (2.0.3)
Requirement already satisfied: mdit-py-plugins in /usr/local/lib/python3.11/dist-packages (from markdown-it-py[linkify,plugins]>=2.1.0->textual>=0.41.0->memray->ray[default,tune]<2.40,>=2.10.0; extra == "all"->autogluon.core[all]==1.2->autogluon) (0.4.2)
Requirement already satisfied: uc-micro-py in /usr/local/lib/python3.11/dist-packages (from linkify-it-py<3,>=1->markdown-it-py[linkify,plugins]>=2.1.0->textual>=0.41.0->memray->ray[default,tune]<2.40,>=2.10.0; extra == "all"->autogluon.core[all]==1.2->autogluon) (1.0.3)
Requirement already satisfied: pyasn1<0.7.0,>=0.4.6 in /usr/local/lib/python3.11/dist-packages (from pyasn1-modules>=0.2.1->google-auth<3.0.dev0,>=2.14.1->google-api-core<3.0.0,>=1.0.0->opencensus->ray[default,tune]<2.40,>=2.10.0; extra == "all"->autogluon.core[all]==1.2->autogluon) (0.6.1)
Downloading autogluon-1.2-py3-none-any.whl (9.6 kB)
Downloading autogluon.core-1.2-py3-none-any.whl (266 kB)
Downloading autogluon.features-1.2-py3-none-any.whl (64 kB)
Downloading autogluon.multimodal-1.2-py3-none-any.whl (429 kB)
Downloading autogluon.tabular-1.2-py3-none-any.whl (352 kB)
Downloading autogluon.timeseries-1.2-py3-none-any.whl (174 kB)
Downloading autogluon.common-1.2-py3-none-any.whl (68 kB)
Downloading coreforecast-0.0.12-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (196 kB)
Downloading mlforecast-0.13.4-py3-none-any.whl (70 kB)
Downloading accelerate-0.34.2-py3-none-any.whl (324 kB)
Downloading boto3-1.36.4-py3-none-any.whl (139 kB)
Downloading catboost-1.2.7-cp311-cp311-manylinux2014_x86_64.whl (98.7 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 98.7/98.7 MB 52.7 MB/s eta 0:00:00
Downloading evaluate-0.4.3-py3-none-any.whl (84 kB)
Downloading fugue-0.9.1-py3-none-any.whl (278 kB)
Downloading gluonts-0.16.0-py3-none-any.whl (1.5 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.5/1.5 MB 69.7 MB/s eta 0:00:00
Downloading jsonschema-4.21.1-py3-none-any.whl (85 kB)
Downloading lightning-2.5.0.post0-py3-none-any.whl (815 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 815.2/815.2 kB 39.1 MB/s eta 0:00:00
Downloading nlpaug-1.1.11-py3-none-any.whl (410 kB)
Downloading nltk-3.8.1-py3-none-any.whl (1.5 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.5/1.5 MB 5.8 MB/s eta 0:00:00
Downloading omegaconf-2.2.3-py3-none-any.whl (79 kB)
Downloading openmim-0.3.9-py2.py3-none-any.whl (52 kB)
Downloading pdf2image-1.17.0-py3-none-any.whl (11 kB)
Downloading pytesseract-0.3.10-py3-none-any.whl (14 kB)
Downloading pytorch_metric_learning-2.3.0-py3-none-any.whl (115 kB)
Downloading ray-2.39.0-cp311-cp311-manylinux2014_x86_64.whl (66.4 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 66.4/66.4 MB 46.6 MB/s eta 0:00:00
Downloading scikit_image-0.24.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (14.9 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 14.9/14.9 MB 108.5 MB/s eta 0:00:00
Downloading scikit_learn-1.5.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (13.3 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 13.3/13.3 MB 98.3 MB/s eta 0:00:00
Downloading statsforecast-1.7.8-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (315 kB)
Downloading timm-1.0.3-py3-none-any.whl (2.3 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.3/2.3 MB 81.3 MB/s eta 0:00:00
Downloading torchmetrics-1.2.1-py3-none-any.whl (806 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 806.1/806.1 kB 35.7 MB/s eta 0:00:00
Downloading utilsforecast-0.2.4-py3-none-any.whl (40 kB)
Downloading pytorch_lightning-2.5.0.post0-py3-none-any.whl (819 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 819.3/819.3 kB 41.1 MB/s eta 0:00:00
Downloading adagio-0.2.6-py3-none-any.whl (19 kB)
Downloading botocore-1.36.4-py3-none-any.whl (13.3 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 13.3/13.3 MB 163.4 MB/s eta 0:00:00
Downloading datasets-3.2.0-py3-none-any.whl (480 kB)
Downloading dill-0.3.8-py3-none-any.whl (116 kB)
Downloading fsspec-2024.9.0-py3-none-any.whl (179 kB)
Downloading jmespath-1.0.1-py3-none-any.whl (20 kB)
Downloading lightning_utilities-0.11.9-py3-none-any.whl (28 kB)
Downloading multiprocess-0.70.16-py311-none-any.whl (143 kB)
Downloading py_spy-0.4.0-py2.py3-none-manylinux_2_5_x86_64.manylinux1_x86_64.whl (2.7 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.7/2.7 MB 106.6 MB/s eta 0:00:00
Downloading s3transfer-0.11.1-py3-none-any.whl (84 kB)
Downloading tensorboardX-2.6.2.2-py2.py3-none-any.whl (101 kB)
Downloading triad-0.9.8-py3-none-any.whl (62 kB)
Downloading virtualenv-20.29.1-py3-none-any.whl (4.3 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 4.3/4.3 MB 118.4 MB/s eta 0:00:00
Downloading aiohttp_cors-0.7.0-py3-none-any.whl (27 kB)
Downloading colorama-0.4.6-py2.py3-none-any.whl (25 kB)
Downloading colorful-0.5.6-py2.py3-none-any.whl (201 kB)
Downloading memray-1.15.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (8.4 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 8.4/8.4 MB 105.0 MB/s eta 0:00:00
Downloading model_index-0.1.11-py3-none-any.whl (34 kB)
Downloading opencensus-0.11.4-py2.py3-none-any.whl (128 kB)
Downloading opendatalab-0.0.10-py3-none-any.whl (29 kB)
Downloading optuna-4.2.0-py3-none-any.whl (383 kB)
Downloading window_ops-0.0.15-py3-none-any.whl (15 kB)
Downloading xxhash-3.5.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (194 kB)
Downloading alembic-1.14.1-py3-none-any.whl (233 kB)
Downloading distlib-0.3.9-py2.py3-none-any.whl (468 kB)
Downloading opencensus_context-0.1.3-py2.py3-none-any.whl (5.1 kB)
Downloading textual-1.0.0-py3-none-any.whl (660 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 660.5/660.5 kB 32.4 MB/s eta 0:00:00
Downloading colorlog-6.9.0-py3-none-any.whl (11 kB)
Downloading fs-2.4.16-py2.py3-none-any.whl (135 kB)
Downloading openxlab-0.0.11-py3-none-any.whl (55 kB)
Downloading ordered_set-4.1.0-py3-none-any.whl (7.6 kB)
Downloading pycryptodome-3.21.0-cp36-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.3 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.3/2.3 MB 87.7 MB/s eta 0:00:00
Downloading appdirs-1.4.4-py2.py3-none-any.whl (9.6 kB)
Downloading Mako-1.3.8-py3-none-any.whl (78 kB)
Building wheels for collected packages: nvidia-ml-py3, antlr4-python3-runtime, seqeval
  Building wheel for nvidia-ml-py3 (setup.py) ... done
  Created wheel for nvidia-ml-py3: filename=nvidia_ml_py3-7.352.0-py3-none-any.whl size=19207 sha256=6a870e78e28999dbafc4492e4548d5ee6f4d9973570c41e921429858727cf28d
  Stored in directory: /root/.cache/pip/wheels/47/50/9e/29dc79037d74c3c1bb4a8661fb608e8674b7e4260d6a3f8f51
  Building wheel for antlr4-python3-runtime (setup.py) ... done
  Created wheel for antlr4-python3-runtime: filename=antlr4_python3_runtime-4.9.3-py3-none-any.whl size=144591 sha256=1094f5480fbe98d044e7ecea3136c65a4e23982b80842eb60451648cdddecf57
  Stored in directory: /root/.cache/pip/wheels/1a/97/32/461f837398029ad76911109f07047fde1d7b661a147c7c56d1
  Building wheel for seqeval (setup.py) ... done
  Created wheel for seqeval: filename=seqeval-1.2.2-py3-none-any.whl size=16217 sha256=6794a5d2c39f901ecd92514987fca3b151a66d7b943c20eca2deb3119f5ef23b
  Stored in directory: /root/.cache/pip/wheels/bc/92/f0/243288f899c2eacdfa8c5f9aede4c71a9bad0ee26a01dc5ead
Successfully built nvidia-ml-py3 antlr4-python3-runtime seqeval
Installing collected packages: py-spy, opencensus-context, nvidia-ml-py3, distlib, colorful, appdirs, antlr4-python3-runtime, xxhash, virtualenv, tensorboardX, pytesseract, pycryptodome, pdf2image, ordered-set, openxlab, omegaconf, nltk, Mako, lightning-utilities, jmespath, fsspec, fs, dill, coreforecast, colorlog, colorama, window-ops, scikit-learn, scikit-image, multiprocess, model-index, botocore, alembic, utilsforecast, triad, seqeval, s3transfer, optuna, opendatalab, jsonschema, gluonts, catboost, aiohttp-cors, torchmetrics, textual, ray, pytorch-metric-learning, openmim, opencensus, nlpaug, mlforecast, datasets, boto3, adagio, accelerate, timm, pytorch-lightning, memray, fugue, evaluate, autogluon.common, statsforecast, lightning, autogluon.features, autogluon.core, autogluon.tabular, autogluon.multimodal, autogluon.timeseries, autogluon
  Attempting uninstall: nltk
    Found existing installation: nltk 3.9.1
    Uninstalling nltk-3.9.1:
      Successfully uninstalled nltk-3.9.1
  Attempting uninstall: fsspec
    Found existing installation: fsspec 2024.10.0
    Uninstalling fsspec-2024.10.0:
      Successfully uninstalled fsspec-2024.10.0
  Attempting uninstall: scikit-learn
    Found existing installation: scikit-learn 1.6.0
    Uninstalling scikit-learn-1.6.0:
      Successfully uninstalled scikit-learn-1.6.0
  Attempting uninstall: scikit-image
    Found existing installation: scikit-image 0.25.0
    Uninstalling scikit-image-0.25.0:
      Successfully uninstalled scikit-image-0.25.0
  Attempting uninstall: jsonschema
    Found existing installation: jsonschema 4.23.0
    Uninstalling jsonschema-4.23.0:
      Successfully uninstalled jsonschema-4.23.0
  Attempting uninstall: accelerate
    Found existing installation: accelerate 1.2.1
    Uninstalling accelerate-1.2.1:
      Successfully uninstalled accelerate-1.2.1
  Attempting uninstall: timm
    Found existing installation: timm 1.0.13
    Uninstalling timm-1.0.13:
      Successfully uninstalled timm-1.0.13
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
gcsfs 2024.10.0 requires fsspec==2024.10.0, but you have fsspec 2024.9.0 which is incompatible.
Successfully installed Mako-1.3.8 accelerate-0.34.2 adagio-0.2.6 aiohttp-cors-0.7.0 alembic-1.14.1 antlr4-python3-runtime-4.9.3 appdirs-1.4.4 autogluon-1.2 autogluon.common-1.2 autogluon.core-1.2 autogluon.features-1.2 autogluon.multimodal-1.2 autogluon.tabular-1.2 autogluon.timeseries-1.2 boto3-1.36.4 botocore-1.36.4 catboost-1.2.7 colorama-0.4.6 colorful-0.5.6 colorlog-6.9.0 coreforecast-0.0.12 datasets-3.2.0 dill-0.3.8 distlib-0.3.9 evaluate-0.4.3 fs-2.4.16 fsspec-2024.9.0 fugue-0.9.1 gluonts-0.16.0 jmespath-1.0.1 jsonschema-4.21.1 lightning-2.5.0.post0 lightning-utilities-0.11.9 memray-1.15.0 mlforecast-0.13.4 model-index-0.1.11 multiprocess-0.70.16 nlpaug-1.1.11 nltk-3.8.1 nvidia-ml-py3-7.352.0 omegaconf-2.2.3 opencensus-0.11.4 opencensus-context-0.1.3 opendatalab-0.0.10 openmim-0.3.9 openxlab-0.0.11 optuna-4.2.0 ordered-set-4.1.0 pdf2image-1.17.0 py-spy-0.4.0 pycryptodome-3.21.0 pytesseract-0.3.10 pytorch-lightning-2.5.0.post0 pytorch-metric-learning-2.3.0 ray-2.39.0 s3transfer-0.11.1 scikit-image-0.24.0 scikit-learn-1.5.2 seqeval-1.2.2 statsforecast-1.7.8 tensorboardX-2.6.2.2 textual-1.0.0 timm-1.0.3 torchmetrics-1.2.1 triad-0.9.8 utilsforecast-0.2.4 virtualenv-20.29.1 window-ops-0.0.15 xxhash-3.5.0
CPU times: user 592 ms, sys: 176 ms, total: 769 ms
Wall time: 1min 18s
from autogluon.tabular import TabularPredictor
import pandas as pd
from sklearn.model_selection import train_test_split
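# Read data
# Note: with engine='python', a multi-character sep is interpreted as a regular expression,
# so ".@" matches any single character followed by "@" (the delimiter before each label).
df = pd.read_csv('NLP_data/Sentences_AllAgree.txt', sep=".@", header=None, engine='python', encoding="ISO-8859-1")  # Finbert data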
# df = pd.read_csv('NLP_data/Sentences_AllAgree.txt', sep=".@", header=None, engine='python', encoding = "utf-8")  # Finbert data
# tmp = pd.read_csv('NLP_data/Sentences_75Agree.txt', sep=".@", header=None, engine='python')
# df = pd.concat([df,tmp])
# tmp = pd.read_csv('NLP_data/Sentences_66Agree.txt', sep=".@", header=None, engine='python')
# df = pd.concat([df,tmp])
# tmp = pd.read_csv('NLP_data/Sentences_50Agree.txt', sep=".@", header=None, engine='python')
# df = pd.concat([df,tmp])
df.columns = ["Text","Label"]
print(df.shape)
df.head()
(2264, 2)
Text Label
0 According to Gran , the company has no plans t... neutral
1 For the last quarter of 2010 , Componenta 's n... positive
2 In the third quarter of 2010 , net sales incre... positive
3 Operating profit rose to EUR 13.1 mn from EUR ... positive
4 Operating profit totalled EUR 21.1 mn , up fro... positive
import seaborn as sns
import matplotlib.pyplot as plt
sns.countplot(x='Label', data=df)
plt.show()
[Figure: count plot of Label (x-axis: Label, y-axis: count)]

27.2. Fit the model#

The next few lines of code are all that are needed to train the model. It is remarkable in its parsimony!

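When vectorizing the text, AutoGluon fits a CountVectorizer over n-grams and limits the vocabulary size so that the resulting features fit in the available memory. For this dataset the training log reports a vocabulary of 186 n-grams.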
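
To make the idea of a capped vocabulary concrete, here is a minimal pure-Python sketch. It is a simplified stand-in, not AutoGluon's or scikit-learn's implementation: the helper names `build_capped_vocabulary` and `vectorize`, the whitespace tokenizer, and the toy sentences are all assumptions made for illustration. It keeps only the most frequent tokens and counts them per sentence, so the feature matrix has a fixed, bounded width.

```python
from collections import Counter


def build_capped_vocabulary(texts, max_features):
    """Map the max_features most frequent lowercase tokens to column indices (hypothetical helper)."""
    counts = Counter()
    for text in texts:
        counts.update(text.lower().split())
    # Rank by descending frequency; break ties alphabetically so the result is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    kept = sorted(token for token, _ in ranked[:max_features])
    return {token: idx for idx, token in enumerate(kept)}


def vectorize(text, vocabulary):
    """Count occurrences of each vocabulary token; tokens outside the vocabulary are dropped."""
    row = [0] * len(vocabulary)
    for token in text.lower().split():
        idx = vocabulary.get(token)
        if idx is not None:
            row[idx] += 1
    return row


# Toy sentences, invented for illustration only.
sentences = [
    "Operating profit rose to EUR 13.1 mn",
    "Operating profit totalled EUR 21.1 mn",
    "Net sales increased in the third quarter",
]

vocab = build_capped_vocabulary(sentences, max_features=4)
features = [vectorize(s, vocab) for s in sentences]

print(vocab)
for row in features:
    print(row)
```

A smaller cap yields fewer columns and less memory use, at the cost of discarding rare tokens; the third toy sentence, for instance, shares no token with the capped vocabulary and becomes an all-zero row.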

!pip install "dask[dataframe]"
Requirement already satisfied: dask[dataframe] in /usr/local/lib/python3.11/dist-packages (2024.10.0)
Requirement already satisfied: click>=8.1 in /usr/local/lib/python3.11/dist-packages (from dask[dataframe]) (8.1.8)
Requirement already satisfied: cloudpickle>=3.0.0 in /usr/local/lib/python3.11/dist-packages (from dask[dataframe]) (3.1.0)
Requirement already satisfied: fsspec>=2021.09.0 in /usr/local/lib/python3.11/dist-packages (from dask[dataframe]) (2024.9.0)
Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.11/dist-packages (from dask[dataframe]) (24.2)
Requirement already satisfied: partd>=1.4.0 in /usr/local/lib/python3.11/dist-packages (from dask[dataframe]) (1.4.2)
Requirement already satisfied: pyyaml>=5.3.1 in /usr/local/lib/python3.11/dist-packages (from dask[dataframe]) (6.0.2)
Requirement already satisfied: toolz>=0.10.0 in /usr/local/lib/python3.11/dist-packages (from dask[dataframe]) (0.12.1)
Requirement already satisfied: importlib-metadata>=4.13.0 in /usr/local/lib/python3.11/dist-packages (from dask[dataframe]) (8.5.0)
Requirement already satisfied: pandas>=2.0 in /usr/local/lib/python3.11/dist-packages (from dask[dataframe]) (2.2.2)
Collecting dask-expr<1.2,>=1.1 (from dask[dataframe])
  Downloading dask_expr-1.1.21-py3-none-any.whl.metadata (2.6 kB)
INFO: pip is looking at multiple versions of dask-expr to determine which version is compatible with other requirements. This could take a while.
  Downloading dask_expr-1.1.20-py3-none-any.whl.metadata (2.6 kB)
  Downloading dask_expr-1.1.19-py3-none-any.whl.metadata (2.6 kB)
  Downloading dask_expr-1.1.18-py3-none-any.whl.metadata (2.6 kB)
  Downloading dask_expr-1.1.16-py3-none-any.whl.metadata (2.5 kB)
Requirement already satisfied: pyarrow>=14.0.1 in /usr/local/lib/python3.11/dist-packages (from dask-expr<1.2,>=1.1->dask[dataframe]) (17.0.0)
Requirement already satisfied: zipp>=3.20 in /usr/local/lib/python3.11/dist-packages (from importlib-metadata>=4.13.0->dask[dataframe]) (3.21.0)
Requirement already satisfied: numpy>=1.23.2 in /usr/local/lib/python3.11/dist-packages (from pandas>=2.0->dask[dataframe]) (1.26.4)
Requirement already satisfied: python-dateutil>=2.8.2 in /usr/local/lib/python3.11/dist-packages (from pandas>=2.0->dask[dataframe]) (2.8.2)
Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.11/dist-packages (from pandas>=2.0->dask[dataframe]) (2024.2)
Requirement already satisfied: tzdata>=2022.7 in /usr/local/lib/python3.11/dist-packages (from pandas>=2.0->dask[dataframe]) (2024.2)
Requirement already satisfied: locket in /usr/local/lib/python3.11/dist-packages (from partd>=1.4.0->dask[dataframe]) (1.0.0)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.11/dist-packages (from python-dateutil>=2.8.2->pandas>=2.0->dask[dataframe]) (1.17.0)
Downloading dask_expr-1.1.16-py3-none-any.whl (243 kB)
Installing collected packages: dask-expr
Successfully installed dask-expr-1.1.16
%%time
#TRAIN THE MODEL

train_data, test_data = train_test_split(df, test_size=0.3, random_state=42)
print("Train size =",train_data.shape," | Test size =",test_data.shape)

predictor = TabularPredictor(label='Label').fit(train_data=train_data) #,    hyperparameters='multimodal')

# predictor = task.fit(train_data=train_data, label='Label')
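# Note: this scores the predictor on the training split (in-sample);
# pass test_data instead for a held-out estimate.
performance = predictor.evaluate(train_data)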
No path specified. Models will be saved in: "AutogluonModels/ag-20250123_044506"
Verbosity: 2 (Standard Logging)
=================== System Info ===================
AutoGluon Version:  1.2
Python Version:     3.11.11
Operating System:   Linux
Platform Machine:   x86_64
Platform Version:   #1 SMP PREEMPT_DYNAMIC Thu Jun 27 21:05:47 UTC 2024
CPU Count:          2
Memory Avail:       10.72 GB / 12.67 GB (84.6%)
Disk Space Avail:   76.49 GB / 112.64 GB (67.9%)
===================================================
No presets specified! To achieve strong results with AutoGluon, it is recommended to use the available presets. Defaulting to `'medium'`...
	Recommended Presets (For more details refer to https://auto.gluon.ai/stable/tutorials/tabular/tabular-essentials.html#presets):
	presets='experimental' : New in v1.2: Pre-trained foundation model + parallel fits. The absolute best accuracy without consideration for inference speed. Does not support GPU.
	presets='best'         : Maximize accuracy. Recommended for most users. Use in competitions and benchmarks.
	presets='high'         : Strong accuracy with fast inference speed.
	presets='good'         : Good accuracy with very fast inference speed.
	presets='medium'       : Fast training time, ideal for initial prototyping.
Train size = (1584, 2)  | Test size = (680, 2)
Beginning AutoGluon training ...
AutoGluon will save models to "/content/drive/MyDrive/Books_Writings/NLPBook/AutogluonModels/ag-20250123_044506"
Train Data Rows:    1584
Train Data Columns: 1
Label Column:       Label
AutoGluon infers your prediction problem is: 'multiclass' (because dtype of label-column == object).
	3 unique label values:  ['neutral', 'positive', 'negative']
	If 'multiclass' is not the correct problem_type, please manually specify the problem_type parameter during Predictor init (You may specify problem_type as one of: ['binary', 'multiclass', 'regression', 'quantile'])
Problem Type:       multiclass
Preprocessing data ...
Train Data Class Count: 3
Using Feature Generators to preprocess the data ...
Fitting AutoMLPipelineFeatureGenerator...
	Available Memory:                    10982.39 MB
	Train Data (Original)  Memory Usage: 0.27 MB (0.0% of available memory)
	Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features.
	Stage 1 Generators:
		Fitting AsTypeFeatureGenerator...
	Stage 2 Generators:
		Fitting FillNaFeatureGenerator...
	Stage 3 Generators:
		Fitting CategoryFeatureGenerator...
			Fitting CategoryMemoryMinimizeFeatureGenerator...
		Fitting TextSpecialFeatureGenerator...
			Fitting BinnedFeatureGenerator...
			Fitting DropDuplicatesFeatureGenerator...
		Fitting TextNgramFeatureGenerator...
			Fitting CountVectorizer for text features: ['Text']
			CountVectorizer fit with vocabulary size = 186
	Stage 4 Generators:
		Fitting DropUniqueFeatureGenerator...
	Stage 5 Generators:
		Fitting DropDuplicatesFeatureGenerator...
	Types of features in original data (raw dtype, special dtypes):
		('object', ['text']) : 1 | ['Text']
	Types of features in processed data (raw dtype, special dtypes):
		('category', ['text_as_category'])  :   1 | ['Text']
		('int', ['binned', 'text_special']) :  20 | ['Text.char_count', 'Text.word_count', 'Text.capital_ratio', 'Text.lower_ratio', 'Text.digit_ratio', ...]
		('int', ['text_ngram'])             : 180 | ['__nlp__.000', '__nlp__.10', '__nlp__.11', '__nlp__.12', '__nlp__.20', ...]
	2.9s = Fit runtime
	1 features in original data used to generate 201 features in processed data.
	Train Data (Processed) Memory Usage: 0.58 MB (0.0% of available memory)
Data preprocessing and feature engineering runtime = 2.98s ...
AutoGluon will gauge predictive performance using evaluation metric: 'accuracy'
	To change this, specify the eval_metric parameter of Predictor()
Automatically generating train/validation split with holdout_frac=0.2, Train Rows: 1267, Val Rows: 317
User-specified model hyperparameters to be fit:
{
	'NN_TORCH': [{}],
	'GBM': [{'extra_trees': True, 'ag_args': {'name_suffix': 'XT'}}, {}, {'learning_rate': 0.03, 'num_leaves': 128, 'feature_fraction': 0.9, 'min_data_in_leaf': 3, 'ag_args': {'name_suffix': 'Large', 'priority': 0, 'hyperparameter_tune_kwargs': None}}],
	'CAT': [{}],
	'XGB': [{}],
	'FASTAI': [{}],
	'RF': [{'criterion': 'gini', 'ag_args': {'name_suffix': 'Gini', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'entropy', 'ag_args': {'name_suffix': 'Entr', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'squared_error', 'ag_args': {'name_suffix': 'MSE', 'problem_types': ['regression', 'quantile']}}],
	'XT': [{'criterion': 'gini', 'ag_args': {'name_suffix': 'Gini', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'entropy', 'ag_args': {'name_suffix': 'Entr', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'squared_error', 'ag_args': {'name_suffix': 'MSE', 'problem_types': ['regression', 'quantile']}}],
	'KNN': [{'weights': 'uniform', 'ag_args': {'name_suffix': 'Unif'}}, {'weights': 'distance', 'ag_args': {'name_suffix': 'Dist'}}],
}
Fitting 13 L1 models, fit_strategy="sequential" ...
Fitting model: KNeighborsUnif ...
	0.7634	 = Validation score   (accuracy)
	0.05s	 = Training   runtime
	0.06s	 = Validation runtime
Fitting model: KNeighborsDist ...
	0.776	 = Validation score   (accuracy)
	0.04s	 = Training   runtime
	0.01s	 = Validation runtime
Fitting model: NeuralNetFastAI ...
	0.7382	 = Validation score   (accuracy)
	5.66s	 = Training   runtime
	0.02s	 = Validation runtime
Fitting model: LightGBMXT ...
	0.8801	 = Validation score   (accuracy)
	3.79s	 = Training   runtime
	0.01s	 = Validation runtime
Fitting model: LightGBM ...
	0.8549	 = Validation score   (accuracy)
	0.97s	 = Training   runtime
	0.01s	 = Validation runtime
Fitting model: RandomForestGini ...
	0.8612	 = Validation score   (accuracy)
	1.34s	 = Training   runtime
	0.06s	 = Validation runtime
Fitting model: RandomForestEntr ...
	0.8549	 = Validation score   (accuracy)
	1.55s	 = Training   runtime
	0.06s	 = Validation runtime
Fitting model: CatBoost ...
	0.858	 = Validation score   (accuracy)
	4.1s	 = Training   runtime
	0.01s	 = Validation runtime
Fitting model: ExtraTreesGini ...
	0.8549	 = Validation score   (accuracy)
	0.8s	 = Training   runtime
	0.07s	 = Validation runtime
Fitting model: ExtraTreesEntr ...
	0.8612	 = Validation score   (accuracy)
	0.91s	 = Training   runtime
	0.06s	 = Validation runtime
Fitting model: XGBoost ...
	0.858	 = Validation score   (accuracy)
	2.16s	 = Training   runtime
	0.02s	 = Validation runtime
Fitting model: NeuralNetTorch ...
	0.7224	 = Validation score   (accuracy)
	7.84s	 = Training   runtime
	0.01s	 = Validation runtime
Fitting model: LightGBMLarge ...
	0.8644	 = Validation score   (accuracy)
	2.17s	 = Training   runtime
	0.01s	 = Validation runtime
Fitting model: WeightedEnsemble_L2 ...
	Ensemble Weights: {'LightGBMXT': 1.0}
	0.8801	 = Validation score   (accuracy)
	0.13s	 = Training   runtime
	0.0s	 = Validation runtime
AutoGluon training complete, total runtime = 36.02s ... Best model: WeightedEnsemble_L2 | Estimated inference throughput: 29143.1 rows/s (317 batch size)
TabularPredictor saved. To load, use: predictor = TabularPredictor.load("/content/drive/MyDrive/Books_Writings/NLPBook/AutogluonModels/ag-20250123_044506")
CPU times: user 29.9 s, sys: 2.23 s, total: 32.1 s
Wall time: 36.5 s
# TEST OUT-OF-SAMPLE

y_test = test_data['Label']
test_data_nolabel = test_data.drop(labels=['Label'],axis=1)
y_pred = predictor.predict(test_data_nolabel)
y_prob = predictor.predict_proba(test_data_nolabel)  # class probabilities, not hard labels
perf = predictor.evaluate_predictions(y_true=y_test, y_pred=y_pred, auxiliary_metrics=True)
print(perf)
{'accuracy': 0.8720588235294118, 'balanced_accuracy': 0.7908290922121237, 'mcc': 0.7572923133729487}
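The gap between accuracy (0.872) and balanced accuracy (0.791) in the output above suggests that the three classes are not predicted equally well. The following is a minimal sketch of why the two numbers can differ, assuming balanced accuracy is the unweighted mean of per-class recall (the scikit-learn definition). The labels are made-up toy data, not the news dataset.

```python
from collections import Counter

# Toy labels (hypothetical, not the news dataset): 'neutral' dominates.
y_true = ['neutral'] * 6 + ['positive'] * 3 + ['negative'] * 1
y_pred = ['neutral'] * 6 + ['positive', 'positive', 'neutral'] + ['neutral']

# Plain accuracy: fraction of rows predicted correctly.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Per-class recall: correct predictions for a class / rows truly in that class.
support = Counter(y_true)
hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
recall = {label: hits[label] / n for label, n in support.items()}

# Balanced accuracy: unweighted mean of the per-class recalls.
balanced_accuracy = sum(recall.values()) / len(recall)

print(f"accuracy          = {accuracy:.3f}")
print(f"balanced accuracy = {balanced_accuracy:.3f}")
print(recall)
```

In this toy case the majority class is always right, so accuracy looks healthy while the rare class is never recovered, which pulls balanced accuracy down.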

27.3. Metrics

https://en.wikipedia.org/wiki/Receiver_operating_characteristic

https://srdas.github.io/MLBook2/3_MachineLearningOverview.html

https://srdas.github.io/MLBook2/3_MachineLearningOverview.html#ROC-and-AUC
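
For quick reference, the standard definitions behind the ROC curve are summarized here, written in terms of the binary confusion-matrix counts (true/false positives $TP$, $FP$ and true/false negatives $TN$, $FN$). This is a summary of textbook definitions rather than anything specific to AutoGluon.

```latex
\mathrm{TPR} = \text{recall} = \frac{TP}{TP + FN}, \qquad
\mathrm{FPR} = 1 - \text{specificity} = \frac{FP}{FP + TN}, \qquad
\text{precision} = \frac{TP}{TP + FP}

\mathrm{AUC} = \int_0^1 \mathrm{TPR} \; d(\mathrm{FPR})
```

The ROC curve plots TPR against FPR as the decision threshold is varied. Note that precision is a different quantity from FPR.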

27.4. Movie Reviews, one more time, with AG-Tabular

train_data = pd.read_csv("NLP_data/movie_review_train.txt", sep = " ", header=None)
test_data = pd.read_csv("NLP_data/movie_review_test.txt", sep = " ", header=None)
train_data.columns = ['Label','Text']
test_data.columns = ['Label','Text']
print(train_data.shape, test_data.shape)
train_data.head()
(4001, 2) (1000, 2)
Label Text
0 __label__0 Homelessness (or Houselessness as George Carlin stated) has been an issue for years but never a plan to help those on the street that were once considered human who did everything from going to school, work, or vote for the matter. Most people think of the homeless as just a lost cause while worrying about things such as racism, the war on Iraq, pressuring kids to succeed, technology, the elections, inflation, or worrying if they'll be next to end up on the streets.<br /><br />But what if you were given a bet to live on the streets for a month without the luxuries you once had from a home,...
1 __label__1 This film lacked something I couldn't put my finger on at first: charisma on the part of the leading actress. This inevitably translated to lack of chemistry when she shared the screen with her leading man. Even the romantic scenes came across as being merely the actors at play. It could very well have been the director who miscalculated what he needed from the actors. I just don't know.<br /><br />But could it have been the screenplay? Just exactly who was the chef in love with? He seemed more enamored of his culinary skills and restaurant, and ultimately of himself and his youthful explo...
2 __label__1 \"It appears that many critics find the idea of a Woody Allen drama unpalatable.\" And for good reason: they are unbearably wooden and pretentious imitations of Bergman. And let's not kid ourselves: critics were mostly supportive of Allen's Bergman pretensions, Allen's whining accusations to the contrary notwithstanding. What I don't get is this: why was Allen generally applauded for his originality in imitating Bergman, but the contemporaneous Brian DePalma was excoriated for \"ripping off\" Hitchcock in his suspense/horror films? In Robin Wood's view, it's a strange form of cultural snob...
3 __label__0 This isn't the comedic Robin Williams, nor is it the quirky/insane Robin Williams of recent thriller fame. This is a hybrid of the classic drama without over-dramatization, mixed with Robin's new love of the thriller. But this isn't a thriller, per se. This is more a mystery/suspense vehicle through which Williams attempts to locate a sick boy and his keeper.<br /><br />Also starring Sandra Oh and Rory Culkin, this Suspense Drama plays pretty much like a news report, until William's character gets close to achieving his goal.<br /><br />I must say that I was highly entertained, though this...
4 __label__1 I don't know who to blame, the timid writers or the clueless director. It seemed to be one of those movies where so much was paid to the stars (Angie, Charlie, Denise, Rosanna and Jon) that there wasn't enough left to really make a movie. This could have been very entertaining, but there was a veil of timidity, even cowardice, that hung over each scene. Since it got an R rating anyway why was the ubiquitous bubble bath scene shot with a 70-year-old woman and not Angie Harmon? Why does Sheen sleepwalk through potentially hot relationships WITH TWO OF THE MOST BEAUTIFUL AND SEXY ACTRESSES in...
%%time
#TRAIN THE MODEL

print("Train size =",train_data.shape," | Test size =",test_data.shape)

# Default tabular models; optionally pass hyperparameters='multimodal' to fit()
predictor = TabularPredictor(label='Label').fit(train_data=train_data)
performance = predictor.evaluate(train_data)  # in-sample (training data) performance
No path specified. Models will be saved in: "AutogluonModels/ag-20250120_220737"
Verbosity: 2 (Standard Logging)
=================== System Info ===================
AutoGluon Version:  1.2
Python Version:     3.11.11
Operating System:   Linux
Platform Machine:   x86_64
Platform Version:   #1 SMP PREEMPT_DYNAMIC Thu Jun 27 21:05:47 UTC 2024
CPU Count:          2
Memory Avail:       10.91 GB / 12.67 GB (86.1%)
Disk Space Avail:   76.38 GB / 112.64 GB (67.8%)
===================================================
No presets specified! To achieve strong results with AutoGluon, it is recommended to use the available presets. Defaulting to `'medium'`...
	Recommended Presets (For more details refer to https://auto.gluon.ai/stable/tutorials/tabular/tabular-essentials.html#presets):
	presets='experimental' : New in v1.2: Pre-trained foundation model + parallel fits. The absolute best accuracy without consideration for inference speed. Does not support GPU.
	presets='best'         : Maximize accuracy. Recommended for most users. Use in competitions and benchmarks.
	presets='high'         : Strong accuracy with fast inference speed.
	presets='good'         : Good accuracy with very fast inference speed.
	presets='medium'       : Fast training time, ideal for initial prototyping.
Beginning AutoGluon training ...
AutoGluon will save models to "/content/drive/My Drive/Books_Writings/NLPBook/AutogluonModels/ag-20250120_220737"
Train Data Rows:    4001
Train Data Columns: 1
Label Column:       Label
AutoGluon infers your prediction problem is: 'binary' (because only two unique label-values observed).
Train size = (4001, 2)  | Test size = (1000, 2)
	2 unique label values:  ['__label__0', '__label__1']
	If 'binary' is not the correct problem_type, please manually specify the problem_type parameter during Predictor init (You may specify problem_type as one of: ['binary', 'multiclass', 'regression', 'quantile'])
Problem Type:       binary
Preprocessing data ...
Selected class <--> label mapping:  class 1 = __label__1, class 0 = __label__0
	Note: For your binary classification, AutoGluon arbitrarily selected which label-value represents positive (__label__1) vs negative (__label__0) class.
	To explicitly set the positive_class, either rename classes to 1 and 0, or specify positive_class in Predictor init.
Using Feature Generators to preprocess the data ...
Fitting AutoMLPipelineFeatureGenerator...
	Available Memory:                    11179.74 MB
	Train Data (Original)  Memory Usage: 5.25 MB (0.0% of available memory)
	Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features.
	Stage 1 Generators:
		Fitting AsTypeFeatureGenerator...
	Stage 2 Generators:
		Fitting FillNaFeatureGenerator...
	Stage 3 Generators:
		Fitting CategoryFeatureGenerator...
			Fitting CategoryMemoryMinimizeFeatureGenerator...
		Fitting TextSpecialFeatureGenerator...
			Fitting BinnedFeatureGenerator...
			Fitting DropDuplicatesFeatureGenerator...
		Fitting TextNgramFeatureGenerator...
			Fitting CountVectorizer for text features: ['Text']
			CountVectorizer fit with vocabulary size = 5515
	Stage 4 Generators:
		Fitting DropUniqueFeatureGenerator...
	Stage 5 Generators:
		Fitting DropDuplicatesFeatureGenerator...
	Types of features in original data (raw dtype, special dtypes):
		('object', ['text']) : 1 | ['Text']
	Types of features in processed data (raw dtype, special dtypes):
		('category', ['text_as_category'])  :    1 | ['Text']
		('int', ['binned', 'text_special']) :   30 | ['Text.char_count', 'Text.word_count', 'Text.capital_ratio', 'Text.lower_ratio', 'Text.digit_ratio', ...]
		('int', ['text_ngram'])             : 5437 | ['__nlp__.000', '__nlp__.10', '__nlp__.10 10', '__nlp__.100', '__nlp__.11', ...]
	57.5s = Fit runtime
	1 features in original data used to generate 5468 features in processed data.
	Train Data (Processed) Memory Usage: 41.61 MB (0.4% of available memory)
Data preprocessing and feature engineering runtime = 58.39s ...
AutoGluon will gauge predictive performance using evaluation metric: 'accuracy'
	To change this, specify the eval_metric parameter of Predictor()
Automatically generating train/validation split with holdout_frac=0.12496875781054737, Train Rows: 3501, Val Rows: 500
User-specified model hyperparameters to be fit:
{
	'NN_TORCH': [{}],
	'GBM': [{'extra_trees': True, 'ag_args': {'name_suffix': 'XT'}}, {}, {'learning_rate': 0.03, 'num_leaves': 128, 'feature_fraction': 0.9, 'min_data_in_leaf': 3, 'ag_args': {'name_suffix': 'Large', 'priority': 0, 'hyperparameter_tune_kwargs': None}}],
	'CAT': [{}],
	'XGB': [{}],
	'FASTAI': [{}],
	'RF': [{'criterion': 'gini', 'ag_args': {'name_suffix': 'Gini', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'entropy', 'ag_args': {'name_suffix': 'Entr', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'squared_error', 'ag_args': {'name_suffix': 'MSE', 'problem_types': ['regression', 'quantile']}}],
	'XT': [{'criterion': 'gini', 'ag_args': {'name_suffix': 'Gini', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'entropy', 'ag_args': {'name_suffix': 'Entr', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'squared_error', 'ag_args': {'name_suffix': 'MSE', 'problem_types': ['regression', 'quantile']}}],
	'KNN': [{'weights': 'uniform', 'ag_args': {'name_suffix': 'Unif'}}, {'weights': 'distance', 'ag_args': {'name_suffix': 'Dist'}}],
}
Fitting 13 L1 models, fit_strategy="sequential" ...
Fitting model: KNeighborsUnif ...
	0.59	 = Validation score   (accuracy)
	1.67s	 = Training   runtime
	0.49s	 = Validation runtime
Fitting model: KNeighborsDist ...
	0.59	 = Validation score   (accuracy)
	2.31s	 = Training   runtime
	0.79s	 = Validation runtime
Fitting model: LightGBMXT ...
	0.88	 = Validation score   (accuracy)
	11.28s	 = Training   runtime
	0.05s	 = Validation runtime
Fitting model: LightGBM ...
	0.868	 = Validation score   (accuracy)
	13.74s	 = Training   runtime
	0.04s	 = Validation runtime
Fitting model: RandomForestGini ...
	0.856	 = Validation score   (accuracy)
	12.95s	 = Training   runtime
	0.08s	 = Validation runtime
Fitting model: RandomForestEntr ...
	0.846	 = Validation score   (accuracy)
	13.67s	 = Training   runtime
	0.09s	 = Validation runtime
Fitting model: CatBoost ...
	0.858	 = Validation score   (accuracy)
	43.61s	 = Training   runtime
	0.18s	 = Validation runtime
Fitting model: ExtraTreesGini ...
	0.842	 = Validation score   (accuracy)
	15.13s	 = Training   runtime
	0.08s	 = Validation runtime
Fitting model: ExtraTreesEntr ...
	0.858	 = Validation score   (accuracy)
	15.09s	 = Training   runtime
	0.08s	 = Validation runtime
Fitting model: NeuralNetFastAI ...
No improvement since epoch 7: early stopping
	0.62	 = Validation score   (accuracy)
	5.65s	 = Training   runtime
	0.02s	 = Validation runtime
Fitting model: XGBoost ...
	0.85	 = Validation score   (accuracy)
	115.04s	 = Training   runtime
	0.06s	 = Validation runtime
Fitting model: NeuralNetTorch ...
	0.628	 = Validation score   (accuracy)
	8.11s	 = Training   runtime
	0.02s	 = Validation runtime
Fitting model: LightGBMLarge ...
	0.842	 = Validation score   (accuracy)
	44.96s	 = Training   runtime
	0.06s	 = Validation runtime
Fitting model: WeightedEnsemble_L2 ...
	Ensemble Weights: {'LightGBMXT': 1.0}
	0.88	 = Validation score   (accuracy)
	0.28s	 = Training   runtime
	0.0s	 = Validation runtime
AutoGluon training complete, total runtime = 366.2s ... Best model: WeightedEnsemble_L2 | Estimated inference throughput: 10726.4 rows/s (500 batch size)
Disabling decision threshold calibration for metric `accuracy` due to having fewer than 10000 rows of validation data for calibration, to avoid overfitting (500 rows).
	`accuracy` is generally not improved through threshold calibration. Force calibration via specifying `calibrate_decision_threshold=True`.
TabularPredictor saved. To load, use: predictor = TabularPredictor.load("/content/drive/My Drive/Books_Writings/NLPBook/AutogluonModels/ag-20250120_220737")
CPU times: user 6min 13s, sys: 5.68 s, total: 6min 18s
Wall time: 6min 10s
%%time
# TEST OUT-OF-SAMPLE

y_test = test_data['Label']
test_data_nolabel = test_data.drop(labels=['Label'],axis=1)
y_pred = predictor.predict(test_data_nolabel)
y_prob = predictor.predict_proba(test_data_nolabel)
perf = predictor.evaluate_predictions(y_true=y_test, y_pred=y_pred, auxiliary_metrics=True)
print(perf)
{'accuracy': 0.842, 'balanced_accuracy': 0.8420313681254725, 'mcc': 0.6843694036793613, 'f1': 0.8397565922920892, 'precision': 0.8536082474226804, 'recall': 0.8263473053892215}
CPU times: user 2.94 s, sys: 17.6 ms, total: 2.96 s
Wall time: 3.14 s
# ROC, AUC
import numpy as np
from sklearn.metrics import roc_curve, auc

# Hard 0/1 predictions: 1 when P(__label__1) > P(__label__0).
# Positional .iloc avoids the deprecated integer-key lookup on a labeled Series.
# Note: thresholded labels yield a single operating point, so this AUC equals
# balanced accuracy; pass the positive-class probability for the full curve.
y_score = (y_prob.iloc[:, 1] > y_prob.iloc[:, 0]).astype(int).to_numpy()
y_true = np.array([1 if j=="__label__1" else 0 for j in y_test])
fpr, tpr, _ = roc_curve(y_true, y_score)

plt.title('ROC curve')
plt.xlabel('FPR (1 - Specificity)')
plt.ylabel('TPR (Recall)')

plt.plot(fpr,tpr)
plt.plot((0,1), ls='dashed',color='black')
plt.show()
print('Area under curve (AUC): ', auc(fpr,tpr))
_images/8c4bbd9ae98a32875fb88c176c39579a2ce4fe9886f599aceed4d36121695217.png
Area under curve (AUC):  0.8420313681254725
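The AUC printed above (0.8420313681254725) is identical to the balanced accuracy reported earlier. That is expected when the ROC is built from thresholded 0/1 labels: the curve then has a single interior point and its area reduces to (TPR + TNR) / 2. The following is a minimal pure-Python sketch contrasting hard labels with continuous scores. The toy labels and probabilities are invented for illustration, and `roc_points` and `trapezoid_auc` are hypothetical helper names, not library functions.

```python
def roc_points(y_true, scores):
    """Sweep a threshold over the distinct scores and return (fpr, tpr) lists."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    fpr, tpr = [0.0], [0.0]
    # Each distinct score, from high to low, acts as one decision threshold.
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        tpr.append(tp / pos)
        fpr.append(fp / neg)
    return fpr, tpr


def trapezoid_auc(fpr, tpr):
    """Area under the piecewise-linear ROC curve (trapezoidal rule)."""
    return sum((fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2
               for i in range(1, len(fpr)))


# Toy data (hypothetical): true labels and predicted P(class 1).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
p_pos = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]
hard = [1 if p > 0.5 else 0 for p in p_pos]

auc_soft = trapezoid_auc(*roc_points(y_true, p_pos))
auc_hard = trapezoid_auc(*roc_points(y_true, hard))

# Balanced accuracy of the hard predictions, for comparison with auc_hard.
tpr_hard = sum(1 for y, h in zip(y_true, hard) if y == 1 and h == 1) / sum(y_true)
tnr_hard = sum(1 for y, h in zip(y_true, hard) if y == 0 and h == 0) / (len(y_true) - sum(y_true))
balanced_acc = (tpr_hard + tnr_hard) / 2

print("AUC from probabilities:", auc_soft)
print("AUC from hard labels:  ", auc_hard)
print("Balanced accuracy:     ", balanced_acc)
```

In the notebook itself, the analogous change would be to pass the positive-class column of `y_prob` to `roc_curve` instead of the thresholded labels.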
predictor.leaderboard(test_data, silent=True)
model score_test score_val eval_metric pred_time_test pred_time_val fit_time pred_time_test_marginal pred_time_val_marginal fit_time_marginal stack_level can_infer fit_order
0 ExtraTreesGini 0.848 0.842 accuracy 0.190400 0.082387 15.128048 0.190400 0.082387 15.128048 1 True 8
1 LightGBMXT 0.842 0.880 accuracy 0.127215 0.045832 11.276500 0.127215 0.045832 11.276500 1 True 3
2 WeightedEnsemble_L2 0.842 0.880 accuracy 0.132827 0.046614 11.560580 0.005612 0.000782 0.284079 2 True 14
3 ExtraTreesEntr 0.841 0.858 accuracy 0.221819 0.081660 15.087257 0.221819 0.081660 15.087257 1 True 9
4 LightGBM 0.833 0.868 accuracy 0.128563 0.037030 13.740872 0.128563 0.037030 13.740872 1 True 4
5 RandomForestGini 0.828 0.856 accuracy 0.205604 0.081731 12.945693 0.205604 0.081731 12.945693 1 True 5
6 XGBoost 0.819 0.850 accuracy 0.129552 0.061628 115.041127 0.129552 0.061628 115.041127 1 True 11
7 CatBoost 0.819 0.858 accuracy 0.201900 0.182935 43.606833 0.201900 0.182935 43.606833 1 True 7
8 RandomForestEntr 0.815 0.846 accuracy 0.192870 0.093459 13.665560 0.192870 0.093459 13.665560 1 True 6
9 LightGBMLarge 0.805 0.842 accuracy 0.126158 0.061159 44.964239 0.126158 0.061159 44.964239 1 True 13
10 NeuralNetFastAI 0.567 0.620 accuracy 0.029828 0.019140 5.647088 0.029828 0.019140 5.647088 1 True 10
11 KNeighborsDist 0.567 0.590 accuracy 1.165190 0.787996 2.311469 1.165190 0.787996 2.311469 1 True 2
12 KNeighborsUnif 0.567 0.590 accuracy 1.173159 0.494338 1.674805 1.173159 0.494338 1.674805 1 True 1
13 NeuralNetTorch 0.541 0.628 accuracy 0.028997 0.015831 8.111764 0.028997 0.015831 8.111764 1 True 12
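
In the leaderboard, WeightedEnsemble_L2 has exactly the same scores as LightGBMXT because the training log reports ensemble weights of {'LightGBMXT': 1.0}. The ensemble is chosen on validation accuracy, which is why ExtraTreesGini can still edge it out on the test set (0.848 vs 0.842). Conceptually, a weighted ensemble averages the class probabilities of its base models. The sketch below illustrates that idea only; the model names, probabilities, and the `weighted_ensemble` helper are hypothetical and are not AutoGluon's implementation.

```python
def weighted_ensemble(model_probs, weights):
    """Combine per-model class probabilities as a normalized weighted average."""
    first = next(iter(model_probs.values()))
    n_rows, n_classes = len(first), len(first[0])
    total = sum(weights.values())
    combined = [[0.0] * n_classes for _ in range(n_rows)]
    for name, w in weights.items():
        for i, row in enumerate(model_probs[name]):
            for k, p in enumerate(row):
                combined[i][k] += (w / total) * p
    return combined


# Hypothetical class probabilities from two base models on two rows.
model_probs = {
    'model_a': [[0.2, 0.8], [0.6, 0.4]],
    'model_b': [[0.4, 0.6], [0.9, 0.1]],
}

# All weight on one model: the ensemble simply reproduces that model.
single = weighted_ensemble(model_probs, {'model_a': 1.0})

# Equal weights: a plain average of the two models.
blended = weighted_ensemble(model_probs, {'model_a': 0.5, 'model_b': 0.5})

print(single)
print(blended)
```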

The hyperparameters='multimodal' option below adds AutoGluon's multimodal (transformer-based) model to the set of tabular models, so it takes considerably longer to run; wait for it to finish. It may be best to skip this code segment.

%%time

# predictor = TabularPredictor(label='Label').fit(train_data=train_data, hyperparameters='multimodal')
# y_test = test_data['Label']
# test_data_nolabel = test_data.drop(labels=['Label'],axis=1)
# y_pred = predictor.predict(test_data_nolabel)
# perf = predictor.evaluate_predictions(y_true=y_test, y_pred=y_pred, auxiliary_metrics=True)
CPU times: user 4 µs, sys: 0 ns, total: 4 µs
Wall time: 6.91 µs

27.5. Multimodal Extension

AutoGluon can also handle images in addition to text. Here is an example from the AutoGluon tutorials: https://auto.gluon.ai/stable/tutorials/multimodal/multimodal_prediction/beginner_multimodal.html

import os
import numpy as np
import warnings
warnings.filterwarnings('ignore')
np.random.seed(123)
%%time
download_dir = './ag_automm_tutorial'
zip_file = 'https://automl-mm-bench.s3.amazonaws.com/petfinder_for_tutorial.zip'
from autogluon.core.utils.loaders import load_zip
load_zip.unzip(zip_file, unzip_dir=download_dir)
Unzipping ./ag_automm_tutorial/file.zip to ./ag_automm_tutorial
CPU times: user 1.66 s, sys: 322 ms, total: 1.98 s
Wall time: 6min 42s
import pandas as pd
dataset_path = download_dir + '/petfinder_for_tutorial'
train_data = pd.read_csv(f'{dataset_path}/train.csv', index_col=0)
test_data = pd.read_csv(f'{dataset_path}/test.csv', index_col=0)
label_col = 'AdoptionSpeed'
train_data.head()
Type Name Age Breed1 Breed2 Gender Color1 Color2 Color3 MaturitySize ... Quantity Fee State RescuerID VideoAmt Description PetID PhotoAmt AdoptionSpeed Images
0 2 Yumi Hamasaki 4 292 265 2 1 5 7 2 ... 1 0 41326 bcc4e1b9557a8b3aaf545ea8e6e86991 0 I rescued Yumi Hamasaki at a food stall far away in Kelantan. At that time i was on my way back to KL, she was suffer from stomach problem and looking very2 sick.. I send her to vet & get the treatment + vaccinated and right now she's very2 healthy.. About yumi : - love to sleep with ppl - she will keep on meowing if she's hugry - very2 active, always seeking for people to accompany her playing - well trained (poo+pee in her own potty) - easy to bathing - I only feed her with these brands : IAMS, Kittenbites, Pro-formance Reason why i need someone to adopt Yumi: I just married and need to ... 7d7a39d71 3.0 0 images/7d7a39d71-1.jpg
1 2 Nene/ Kimie 12 285 0 2 5 6 7 2 ... 1 0 41326 f0450bf0efe0fa3ff9321d0b827b1237 0 Has adopted by a friend with new pet name Kimie 0e107c82f 3.0 0 images/0e107c82f-1.jpg
2 2 Mattie 12 266 0 2 1 7 0 2 ... 1 0 41401 9b52af6d48a4521fd01d4028eb5879a3 0 I rescued Mattie with a broken leg. After surgery with pin inserted in her leg, she's made a full recovery. 1a8fd6707 5.0 0 images/1a8fd6707-1.jpg
3 1 NaN 1 189 307 2 1 2 0 2 ... 1 0 41401 88da1210e021a5cf43480b074778f3bc 0 She born on 30 September . I really hope the animal lovers can adopt her. bca8b44ae 3.0 0 images/bca8b44ae-1.jpg
4 2 Coco 6 276 285 2 2 4 7 2 ... 1 100 41326 227d7b1bcfaffb5f9882bf57b5ee8fab 0 Calico Tame and easy going Diet RC Kitten Supplement - brewer yeast + VCO *11.7.17 - Coco had found her new home. 2def67952 1.0 0 images/2def67952-1.jpg

5 rows × 25 columns

# Expand image paths for loading in training

image_col = 'Images'
train_data[image_col] = train_data[image_col].apply(lambda ele: ele.split(';')[0]) # Use the first image for a quick tutorial
test_data[image_col] = test_data[image_col].apply(lambda ele: ele.split(';')[0])


def path_expander(path, base_folder):
    path_l = path.split(';')
    return ';'.join([os.path.abspath(os.path.join(base_folder, path)) for path in path_l])

train_data[image_col] = train_data[image_col].apply(lambda ele: path_expander(ele, base_folder=dataset_path))
test_data[image_col] = test_data[image_col].apply(lambda ele: path_expander(ele, base_folder=dataset_path))

train_data[image_col].iloc[0]
'/content/drive/MyDrive/Books_Writings/NLPBook/ag_automm_tutorial/petfinder_for_tutorial/images/7d7a39d71-1.jpg'
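The helper above turns each relative image path into an absolute one, and it is written to cope with several semicolon-separated paths in one cell. The following is a small standalone sketch of that behavior; the folder and file names are made up for illustration and do not need to exist on disk.

```python
import os


def path_expander(path, base_folder):
    """Expand each ';'-separated relative path to an absolute path under base_folder."""
    parts = path.split(';')
    return ';'.join(os.path.abspath(os.path.join(base_folder, p)) for p in parts)


# Hypothetical cell value listing two images for one pet.
raw = 'images/abc-1.jpg;images/abc-2.jpg'

# Keep only the first image, as the tutorial does, then expand it.
first_only = path_expander(raw.split(';')[0], base_folder='demo_dataset')

# Or expand every listed image.
all_images = path_expander(raw, base_folder='demo_dataset')

print(first_only)
print(all_images.split(';'))
```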
example_row = train_data.iloc[0]

example_row
0
Type 2
Name Yumi Hamasaki
Age 4
Breed1 292
Breed2 265
Gender 2
Color1 1
Color2 5
Color3 7
MaturitySize 2
FurLength 2
Vaccinated 1
Dewormed 3
Sterilized 2
Health 1
Quantity 1
Fee 0
State 41326
RescuerID bcc4e1b9557a8b3aaf545ea8e6e86991
VideoAmt 0
Description I rescued Yumi Hamasaki at a food stall far away in Kelantan. At that time i was on my way back to KL, she was suffer from stomach problem and looking very2 sick.. I send her to vet & get the treatment + vaccinated and right now she's very2 healthy.. About yumi : - love to sleep with ppl - she will keep on meowing if she's hugry - very2 active, always seeking for people to accompany her playing - well trained (poo+pee in her own potty) - easy to bathing - I only feed her with these brands : IAMS, Kittenbites, Pro-formance Reason why i need someone to adopt Yumi: I just married and need to ...
PetID 7d7a39d71
PhotoAmt 3.0
AdoptionSpeed 0
Images /content/drive/MyDrive/Books_Writings/NLPBook/ag_automm_tutorial/petfinder_for_tutorial/images/7d7a39d71-1.jpg

example_image = example_row[image_col]

from IPython.display import Image, display
pil_img = Image(filename=example_image)
display(pil_img)
_images/2c51d770716edcf63f3988523cbc934d4fc245ed0f8ee91864e7d7da6f3f1c81.jpg
%%time

from autogluon.multimodal import MultiModalPredictor
predictor = MultiModalPredictor(label=label_col)
predictor.fit(
    train_data=train_data,
    time_limit=120, # seconds
)
No path specified. Models will be saved in: "AutogluonModels/ag-20250120_222048"
=================== System Info ===================
AutoGluon Version:  1.2
Python Version:     3.11.11
Operating System:   Linux
Platform Machine:   x86_64
Platform Version:   #1 SMP PREEMPT_DYNAMIC Thu Jun 27 21:05:47 UTC 2024
CPU Count:          2
Pytorch Version:    2.5.1+cu121
CUDA Version:       12.1
Memory Avail:       10.03 GB / 12.67 GB (79.1%)
Disk Space Avail:   76.02 GB / 112.64 GB (67.5%)
===================================================
AutoGluon infers your prediction problem is: 'binary' (because only two unique label-values observed).
	2 unique label values:  [0, 1]
	If 'binary' is not the correct problem_type, please manually specify the problem_type parameter during Predictor init (You may specify problem_type as one of: ['binary', 'multiclass', 'regression', 'quantile'])

AutoMM starts to create your model. ✨✨✨

To track the learning progress, you can open a terminal and launch Tensorboard:
    ```shell
    # Assume you have installed tensorboard
    tensorboard --logdir /content/drive/MyDrive/Books_Writings/NLPBook/AutogluonModels/ag-20250120_222048
    ```

INFO: Seed set to 0
GPU Count: 1
GPU Count to be Used: 1
GPU 0 Name: Tesla T4
GPU 0 Memory: 0.25GB/15.0GB (Used/Total)

INFO: Using 16bit Automatic Mixed Precision (AMP)
INFO: GPU available: True (cuda), used: True
INFO: TPU available: False, using: 0 TPU cores
INFO: HPU available: False, using: 0 HPUs
INFO: LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
INFO: 
  | Name              | Type                | Params | Mode 
------------------------------------------------------------------
0 | model             | MultimodalFusionMLP | 207 M  | train
1 | validation_metric | BinaryAUROC         | 0      | train
2 | loss_func         | CrossEntropyLoss    | 0      | train
------------------------------------------------------------------
207 M     Trainable params
0         Non-trainable params
207 M     Total params
828.307   Total estimated model params size (MB)
946       Modules in train mode
225       Modules in eval mode
INFO: Epoch 0, global step 1: 'val_roc_auc' reached 0.56250 (best 0.56250), saving model to '/content/drive/MyDrive/Books_Writings/NLPBook/AutogluonModels/ag-20250120_222048/epoch=0-step=1.ckpt' as top 3
INFO: Epoch 0, global step 4: 'val_roc_auc' reached 0.78194 (best 0.78194), saving model to '/content/drive/MyDrive/Books_Writings/NLPBook/AutogluonModels/ag-20250120_222048/epoch=0-step=4.ckpt' as top 3
INFO: Time limit reached. Elapsed time is 0:03:28. Signaling Trainer to stop.
Start to fuse 2 checkpoints via the greedy soup algorithm.
AutoMM has created your model. 🎉🎉🎉

To load the model, use the code below:
    ```python
    from autogluon.multimodal import MultiModalPredictor
    predictor = MultiModalPredictor.load("/content/drive/MyDrive/Books_Writings/NLPBook/AutogluonModels/ag-20250120_222048")
    ```

If you are not satisfied with the model, try to increase the training time, 
adjust the hyperparameters (https://auto.gluon.ai/stable/tutorials/multimodal/advanced_topics/customization.html),
or post issues on GitHub (https://github.com/autogluon/autogluon/issues).
CPU times: user 55.7 s, sys: 37.3 s, total: 1min 33s
Wall time: 6min 2s
<autogluon.multimodal.predictor.MultiModalPredictor at 0x788db21d6790>
scores = predictor.evaluate(test_data, metrics=["roc_auc"])
scores
{'roc_auc': 0.8608}
predictions = predictor.predict(test_data.drop(columns=label_col))
print(predictions[:5])

print(test_data[label_col][:5])
8     1
70    1
82    1
28    0
63    1
Name: AdoptionSpeed, dtype: int64
8     0
70    1
82    1
28    0
63    1
Name: AdoptionSpeed, dtype: int64
probas = predictor.predict_proba(test_data.drop(columns=label_col))
probas[:5]
0 1
8 0.471467 0.528533
70 0.407569 0.592431
82 0.010966 0.989034
28 0.585131 0.414869
63 0.093797 0.906203
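
The probabilities above line up with the hard predictions printed earlier: each predicted label is the class with the larger probability. As a quick check, this sketch copies the five rows shown above into a plain dictionary (`probas_head` is a name chosen for this illustration) and takes the argmax of each row.

```python
# First five rows of the predict_proba output: index -> (P(class 0), P(class 1)).
probas_head = {
    8: (0.471467, 0.528533),
    70: (0.407569, 0.592431),
    82: (0.010966, 0.989034),
    28: (0.585131, 0.414869),
    63: (0.093797, 0.906203),
}

# Pick the class with the larger probability for each row (argmax).
predicted = {idx: max(range(len(p)), key=lambda k: p[k])
             for idx, p in probas_head.items()}

print(predicted)
```

Row 8 is the one misclassified example among these five (true label 0), and it is also the row where the two probabilities are closest to 0.5.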