48. AWS SageMaker JumpStart
JumpStart is a collection of models that helps newcomers to machine learning get started on their ML journey. A blog post from the launch in December 2020 gives a brief introduction.
A year later, in Q3/Q4 2021, JumpStart was extended to financial modeling: https://aws.amazon.com/about-aws/whats-new/2021/09/amazon-sagemaker-jumpstart-multimodal-financial-analysis-tools/
JumpStart Finance now comprises a new SDK, example notebooks, blogs, solutions (end-to-end deployable models), and financial transformers, all with complete documentation: https://docs.aws.amazon.com/sagemaker/latest/dg/studio-jumpstart-industry.html
from google.colab import drive
drive.mount('/content/drive') # Add My Drive/<>
import os
os.chdir('drive/My Drive')
os.chdir('Books_Writings/NLPBook/')
Mounted at /content/drive
%%capture
%pylab inline
import pandas as pd
import os
!pip install ipypublish
from ipypublish import nb_setup
JumpStart offers pre-built models and machine learning solutions through notebooks that can be used as-is or modified by the user to adapt them to their own applications. These solutions are “low-code” in that very little programming is needed to adapt the code and no code needs to be written from scratch. Further, many of the models and solutions on JumpStart also offer easy deployment of the trained ML models to an endpoint so that they may be used in production.
48.1. SageMaker
SageMaker is AWS’s machine learning service.
Amazon SageMaker supports the leading machine learning frameworks, toolkits, and programming languages such as Python, R, Jupyter, MXNet, PyTorch, TensorFlow, Scikit-Learn, HuggingFace, etc.
SageMaker may be thought of as a service that fully supports the data preparation, model building, training and tuning, and deployment pipeline for machine learning, as shown by the various components of SageMaker.
nb_setup.images_hconcat(["NLP_images/prep_build_train_deploy.png"], width=900)

We will be using JumpStart, which is available through the SageMaker interface known as SageMaker Studio. Studio is essentially a JupyterLab IDE, and we will refer to it simply as Studio.
After logging into the AWS Console and navigating to SageMaker, you can choose Studio and navigate to JumpStart to see the collection of model and solution cards offered there, as shown here.
nb_setup.images_hconcat(["NLP_images/sagemaker_jumpstart.png"], width=900)

Several items are offered:
Solutions: End to end applications that may be deployed via an endpoint.
Text models: Various applications based on language models, such as embedding generators, text classifiers, question answering, and summarization, built on industry-standard models like BERT and RoBERTa. These may also be deployed using HuggingFace inside SageMaker. There are over a hundred text models. You may want to review these as they may be useful for your class projects.
Vision models: Over 200 models like ResNet, VGG, etc.
Tabular models: For classification and regression use cases, showing applications of boosting, gradient boosting, etc.
Algorithms: Optimized, industrial-strength ML algorithms that are deployed as APIs, such as linear learner, XGBoost, anomaly detection, multiclass classification, PCA, time-series forecasting, kNN, factorization machines, etc.
Notebooks: A collection of example notebooks that showcase using APIs for simple applications in SageMaker.
Blogs: Posts showcasing interesting applications.
Videos: Introductory tutorials for getting started with JumpStart; these are very useful.
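The model cards above can also be deployed from code rather than by clicking through Studio. The following is a minimal sketch, assuming a recent version of the `sagemaker` Python SDK that provides `sagemaker.jumpstart.model.JumpStartModel`, plus AWS credentials and an execution role with SageMaker permissions. The helper names `deploy_jumpstart_model` and `delete_endpoint` are hypothetical, and the model ID in the usage comment is only an example; check the model card in JumpStart for the ID and the request format the model expects. Deploying creates a billable endpoint, so a cleanup helper is included.

```python
def deploy_jumpstart_model(model_id, instance_type=None):
    """Deploy a JumpStart model to a real-time endpoint and return its predictor.

    Assumption: the `sagemaker` SDK is installed and AWS credentials are set up.
    """
    # Imported lazily so this file can be loaded without the SDK installed.
    from sagemaker.jumpstart.model import JumpStartModel

    model = JumpStartModel(model_id=model_id)
    deploy_kwargs = {}
    if instance_type is not None:
        # Otherwise JumpStart falls back to the default instance type for the model.
        deploy_kwargs["instance_type"] = instance_type
    return model.deploy(**deploy_kwargs)


def delete_endpoint(predictor):
    """Remove the model and the endpoint so they stop incurring charges."""
    predictor.delete_model()
    predictor.delete_endpoint()


# Example usage (requires an AWS account; not run here):
# predictor = deploy_jumpstart_model("huggingface-text2text-flan-t5-base")
# response = predictor.predict(payload)  # payload format depends on the model
# delete_endpoint(predictor)
```

Keeping deployment and cleanup in separate helpers makes it harder to forget the second step when experimenting in a notebook.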
48.2. Financial Toolkit within JumpStart
In 2021, AWS developed a collection of solutions and models for financial use cases, such as credit ratings, and also emphasized the usefulness of multimodal ML models in finance.
See this presentation of multimodal machine learning in finance.
You can use the new set of multimodal financial analysis tools within Amazon SageMaker JumpStart. With these new tools (https://aws.amazon.com/about-aws/whats-new/2021/09/amazon-sagemaker-jumpstart-multimodal-financial-analysis-tools/), you can enhance your tabular ML workflows with new insights from financial text documents and potentially save up to weeks of programming time. Using the new SageMaker JumpStart Industry SDK (https://sagemaker-jumpstart-industry-pack.readthedocs.io/en/latest/), you can easily retrieve common public financial documents, including SEC filings, and further process financial text documents with features such as summarization and scoring for sentiment, litigiousness, risk, readability, etc. In addition, you can access language models pre-trained on financial text for transfer learning, and use example notebooks for data retrieval, text feature engineering, and multimodal classification and regression models. Lastly, you can access a solution for corporate credit scoring, which is fully customizable and showcases the use of AWS CloudFormation templates and reference architectures so you can accelerate your machine learning journey.
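To make the idea of summarizing a financial text document concrete, here is a toy extractive summarizer in plain Python. It is a sketch of one simple technique (ranking sentences by their average Jaccard word overlap with the other sentences), not the Industry SDK's implementation, and all function names are hypothetical. The sentence splitter is deliberately naive and would need replacing for real filings.

```python
import re


def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence):
    """Return the set of lowercase words in a sentence."""
    return set(re.findall(r"[a-z']+", sentence.lower()))


def jaccard(a, b):
    """Jaccard similarity of two sets: |intersection| / |union|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def summarize(text, num_sentences=2):
    """Keep the sentences that overlap most, on average, with the rest."""
    sentences = split_sentences(text)
    tokens = [tokenize(s) for s in sentences]
    scores = []
    for i, ti in enumerate(tokens):
        others = [jaccard(ti, tj) for j, tj in enumerate(tokens) if j != i]
        scores.append(sum(others) / len(others) if others else 0.0)
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    keep = sorted(ranked[:num_sentences])  # restore original sentence order
    return " ".join(sentences[i] for i in keep)


sample = (
    "Revenue grew due to strong cloud demand. "
    "Cloud demand drove revenue growth this year. "
    "The company opened a new cafeteria."
)
print(summarize(sample, num_sentences=2))
```

The two sentences about cloud demand share vocabulary and are kept, while the unrelated sentence is dropped. Production summarizers typically add stop-word removal and better sentence segmentation.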
48.3. JumpStart Financial Blogs
To get started, you can peruse these blogs:
Use SEC text for ratings classification using multimodal ML in Amazon SageMaker JumpStart: https://aws.amazon.com/blogs/machine-learning/use-sec-text-for-ratings-classification-using-multimodal-ml-in-amazon-sagemaker-jumpstart/
Use pre-trained financial language models for transfer learning in Amazon SageMaker JumpStart: https://aws.amazon.com/blogs/machine-learning/use-pre-trained-financial-language-models-for-transfer-learning-in-amazon-sagemaker-jumpstart/
Create a dashboard with SEC text for financial NLP in Amazon SageMaker JumpStart: https://aws.amazon.com/blogs/machine-learning/create-a-dashboard-with-sec-text-for-financial-nlp-in-amazon-sagemaker-jumpstart/
Build a corporate credit ratings classifier using graph machine learning in Amazon SageMaker JumpStart: https://aws.amazon.com/blogs/machine-learning/build-a-corporate-credit-ratings-classifier-using-graph-machine-learning-in-amazon-sagemaker-jumpstart/
48.4. Technical Documentation
For the financial industry, the relevant documentation is provided here:
Smjsindustry SDK: https://pypi.org/project/smjsindustry/
ReadTheDocs: https://sagemaker-jumpstart-industry-pack.readthedocs.io/en/latest/notebooks/index.html
GitHub repo: aws/sagemaker-jumpstart-industry-pack
Official SageMaker doc: https://docs.aws.amazon.com/sagemaker/latest/dg/studio-jumpstart-industry.html
48.6. Teaching Examples from AWS SageMaker JumpStart
This section will demonstrate hands-on multimodal ML. It will require an AWS account with access to SageMaker. Related documentation and blogs are in the sections above.
We will work through the following models in AWS SageMaker JumpStart:
Financial TabText Data Construction. For downloading and parsing SEC filings, summarizing the text of the filings, and scoring of the text for sentiment, readability, positivity, litigiousness, fraud, etc. This creates a TabText, i.e., a data frame with columns of text and tabular fields, for multimodal machine learning.
NLP Score Dashboard for SEC Text. This extends the previous solution to parsing out sections of the 10-K, 10-Q, and 8-K SEC filings. This enables the creation of a dashboard for the main SEC filings.
Corporate Credit Rating Prediction. This solution showcases how standard credit rating models that are based on tabular data can be enhanced with SEC filings data through multimodal machine learning.
RoBERTa-SEC Wiki Base. Shows how to use a special pre-trained language model, trained on Wiki text and SEC filings, and fine-tune it for specific financial text classification tasks.
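The TabText construction described in the first item can be illustrated without any AWS resources. The sketch below scores each text by the fraction of its words that appear in a word list and appends those scores as numeric columns next to the original tabular fields and text. The word lists, record fields, and function names are invented for illustration; the JumpStart notebooks use the SDK's own scorers and much larger word lists.

```python
import re

# Toy word lists, invented for illustration only.
WORD_LISTS = {
    "positive": {"growth", "profit", "strong", "improved"},
    "negative": {"loss", "decline", "weak", "impairment"},
    "litigious": {"lawsuit", "litigation", "plaintiff", "court"},
}


def score_text(text, word_lists=WORD_LISTS):
    """Return, for each list, the fraction of words in `text` found in that list."""
    words = re.findall(r"[a-z]+", text.lower())
    total = len(words)
    scores = {}
    for name, vocab in word_lists.items():
        hits = sum(1 for w in words if w in vocab)
        scores[name] = hits / total if total else 0.0
    return scores


def build_tabtext(records):
    """Add one `score_<name>` column per word list to each record.

    Each record is a dict with tabular fields plus a "text" field.
    """
    rows = []
    for rec in records:
        row = dict(rec)  # copy so the input records are not modified
        for name, value in score_text(rec["text"]).items():
            row["score_" + name] = value
        rows.append(row)
    return rows


# Hypothetical records standing in for parsed filings.
records = [
    {"ticker": "AAA", "form_type": "10-K", "text": "Strong growth offset a loss."},
    {"ticker": "BBB", "form_type": "10-Q", "text": "The court dismissed the lawsuit."},
]
for row in build_tabtext(records):
    print(row)
```

The resulting list of dictionaries can be passed to `pd.DataFrame(rows)` to obtain a data frame that mixes text and numeric columns, which is the shape of input a multimodal model expects.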
48.7. Beyond Finance
JumpStart has hundreds of additional use cases beyond finance. You may want to try many of the text and image models by searching for them on JumpStart. More recently, generative AI models have become very popular. These are based on large language models (LLMs), a subset of a broader class of Foundation Models (FMs).
For examples of chatbots built on generative AI and tuned for dialog applications, see https://beta.character.ai, based on LaMDA (Language Models for Dialog Applications), https://arxiv.org/abs/2201.08239.
AI21 Studio (https://studio.ai21.com/overview) offers an interesting collection of NLU and NLG models.
Text generation: https://aws.amazon.com/blogs/machine-learning/run-text-generation-with-gpt-and-bloom-models-on-amazon-sagemaker-jumpstart/
APIs for models: https://aws.amazon.com/blogs/machine-learning/amazon-sagemaker-jumpstart-models-and-algorithms-now-available-via-api/
You can find additional useful resources on AWS by searching here: https://aiexplorer.aws.amazon.com/