23. Summarization

Can a machine summarize a document?

from google.colab import drive
drive.mount('/content/drive')  # Add My Drive/<>

import os
os.chdir('drive/My Drive')
os.chdir('Books_Writings/NLPBook/')
Mounted at /content/drive
%%capture
%pylab inline
import pandas as pd
import os
%load_ext rpy2.ipython
import textwrap

23.1. Types of Summarization

There are two broad types of text summarization:

  1. Extractive: provide the most meaningful extracted subsample from the text.

  2. Abstractive: generate new language that explains the document more briefly.

Several metrics exist for evaluating the quality of a summary; see: http://nlpprogress.com/english/summarization.html
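One widely used family of such metrics is ROUGE, which compares a candidate summary with a human-written reference. The sketch below computes ROUGE-1 (unigram overlap) only; the function name `rouge1`, the whitespace tokenization, and the two example strings are assumptions made for illustration, not part of any particular library.

```python
from collections import Counter


def rouge1(candidate: str, reference: str) -> dict:
    """Unigram-overlap precision, recall and F1 between two texts."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Clipped overlap: each word counts at most as often as it appears in both texts.
    overlap = sum((cand & ref).values())
    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(cand.values()) if cand else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical reference and candidate summaries.
reference = "data scientists unlock the value of big data"
candidate = "data scientists unlock value"
print(rouge1(candidate, reference))
```

Here the candidate recovers 4 of the 8 reference words (recall 0.5) and every candidate word appears in the reference (precision 1.0). Published evaluations typically rely on a standard ROUGE implementation with stemming and other options, so treat this only as an illustration of the idea.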

In addition, we now have “generative” summarization using LLMs. Ask yourself when this is better than the two approaches above and when it is worse.

23.2. Jaccard Summarizer

Here we present a simple approach to extractive summarization.

A document \(D\) consists of \(m\) sentences \(s_i, i=1,2,\ldots,m\), where each \(s_i\) is a set of words. We compute the pairwise overlap between sentences using the Jaccard similarity index:

\[ J_{ij} = J(s_i,s_j)=\frac{|s_i \cap s_j|}{|s_i \cup s_j|} = J_{ji} \]

The overlap is the size of the intersection of the word sets of sentences \(s_i\) and \(s_j\), divided by the size of their union. The similarity score of each sentence is the corresponding row sum of the Jaccard similarity matrix:

\[ S_i=\sum_{j=1}^m J_{ij} \]
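
As a small worked illustration (the three toy sentences are made up for this example and do not come from the text), let \(s_1=\{\text{big},\text{data},\text{is},\text{hype}\}\), \(s_2=\{\text{big},\text{data},\text{has},\text{value}\}\) and \(s_3=\{\text{data},\text{scientists},\text{add},\text{value}\}\). Sentences \(s_1\) and \(s_2\) share two words out of six distinct words, \(s_1\) and \(s_3\) share one out of seven, and \(s_2\) and \(s_3\) share two out of six:

\[ J_{12}=\frac{2}{6}=\frac{1}{3}, \quad J_{13}=\frac{1}{7}, \quad J_{23}=\frac{2}{6}=\frac{1}{3}, \quad J_{ii}=1 \]

\[ S_1=1+\frac{1}{3}+\frac{1}{7}=\frac{31}{21}\approx 1.48, \quad S_2=\frac{1}{3}+1+\frac{1}{3}=\frac{5}{3}\approx 1.67, \quad S_3=\frac{1}{7}+\frac{1}{3}+1=\frac{31}{21}\approx 1.48 \]

Sentence \(s_2\) overlaps most with the rest of the document, so it would be ranked first.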

23.2.1. Generating the summary

Once the row sums are obtained, they are sorted in decreasing order, and the summary consists of the \(n\) sentences with the highest \(S_i\) values.
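
For readers working in Python, the same ranking idea can be sketched in a few lines. This is a minimal sketch under stated assumptions, not a port of any library routine: the helper names `jaccard` and `text_summary` and the toy sentences are chosen for illustration, and words are obtained by splitting on whitespace.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard index of two word sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def text_summary(sentences: list, n: int) -> list:
    """Return the n sentences with the highest total Jaccard overlap."""
    word_sets = [set(s.split()) for s in sentences]
    # S_i: sum of the Jaccard index of sentence i with every sentence (itself included).
    scores = [sum(jaccard(a, b) for b in word_sets) for a in word_sets]
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in ranked[:n]]


# Toy sentences, invented for this example.
sentences = [
    "big data is hype",
    "big data has value",
    "data scientists add value",
    "cloud computing is cheap",
]
print(text_summary(sentences, 2))
# → ['big data has value', 'big data is hype']
```

The sentences come back in order of score, not in their original document order; re-sorting the selected indices would restore reading order if that is preferred.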

%%R
# FUNCTION TO RETURN n SENTENCE SUMMARY
# Input: array of sentences (text)
# Output: n most common intersecting sentences
text_summary = function(text, n) {
  m = length(text)  # No of sentences in input
  jaccard = matrix(0,m,m)  #Store match index
  for (i in 1:m) {
    for (j in i:m) {
      a = text[i]; aa = unlist(strsplit(a," "))
      b = text[j]; bb = unlist(strsplit(b," "))
      jaccard[i,j] = length(intersect(aa,bb))/
                          length(union(aa,bb))
      jaccard[j,i] = jaccard[i,j]
    }
  }
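  similarity_score = rowSums(jaccard)  # S_i: row sums of the Jaccard matrix
  res = sort(similarity_score, index.return=TRUE,
          decreasing=TRUE)
  idx = res$ix[1:min(n,m)]  # Indices of the top n sentences (capped at m)
  text[idx]  # Return the summary sentences
}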

23.3. One Function to Rule All Text in R

For a quick introduction to the tm package in R, see: https://cran.r-project.org/web/packages/tm/vignettes/tm.pdf

Install it if needed, either from the command line with conda install -c r r-tm or from within R as shown below.

%%R
install.packages("tm", quiet=TRUE)
# ! conda install -c conda-forge r-tm -y
# ! conda install -c r r-tm -y
%%R
library(tm)
library(stringr)
#READ IN TEXT FOR ANALYSIS, PUT IT IN A CORPUS, OR ARRAY, OR FLAT STRING
#cstem=1, if stemming needed
#cstop=1, if stopwords to be removed
#ccase=1 for lower case, ccase=2 for upper case
#cpunc=1, if punctuation to be removed
#cflat=1 for flat text wanted, cflat=2 if text array, else returns corpus
read_web_page = function(url,cstem=0,cstop=0,ccase=0,cpunc=0,cflat=0) {
    text = readLines(url)
    text = text[setdiff(seq(1,length(text)),grep("<",text))]
    text = text[setdiff(seq(1,length(text)),grep(">",text))]
    text = text[setdiff(seq(1,length(text)),grep("]",text))]
    text = text[setdiff(seq(1,length(text)),grep("}",text))]
    text = text[setdiff(seq(1,length(text)),grep("_",text))]
    text = text[setdiff(seq(1,length(text)),grep("\\/",text))]
    ctext = Corpus(VectorSource(text))
    if (cstem==1) { ctext = tm_map(ctext, stemDocument) }
    if (cstop==1) { ctext = tm_map(ctext, removeWords, stopwords("english"))}
    if (cpunc==1) { ctext = tm_map(ctext, removePunctuation) }
    if (ccase==1) { ctext = tm_map(ctext, content_transformer(tolower)) }
    if (ccase==2) { ctext = tm_map(ctext, content_transformer(toupper)) }
    text = ctext
    #CONVERT FROM CORPUS IF NEEDED
    if (cflat>0) {
        text = NULL
        for (j in 1:length(ctext)) {
            temp = ctext[[j]]$content
            if (temp!="") { text = c(text,temp) }
        }
        text = as.array(text)
    }
    if (cflat==1) {
        text = paste(text,collapse="\n")
        text = str_replace_all(text, "[\r\n]" , " ")
    }
    result = text
}

23.4. Example: Summarization

We will use a sample of text that I took from Bloomberg news. It is about the need for data scientists.

%%R
url = "NLP_data/dstext_sample.txt"   #You can put any text file or URL here
text = read_web_page(url,cstem=0,cstop=0,ccase=0,cpunc=0,cflat=1)
print(length(text[[1]]))
[1] 1
text = %Rget text
text = text[0]
print(textwrap.fill(text, width=80))
THERE HAVE BEEN murmurings that we are now in the “trough of disillusionment” of
big data, the hype around it having surpassed the reality of what it can
deliver.  Gartner suggested that the “gravitational pull of big data is now so
strong that even people who haven’t a clue as to what it’s all about report that
they’re running big data projects.”  Indeed, their research with business
decision makers suggests that organisations are struggling to get value from big
data. Data scientists were meant to be the answer to this issue. Indeed, Hal
Varian, Chief Economist at Google famously joked that “The sexy job in the next
10 years will be statisticians.” He was clearly right as we are now used to
hearing that data scientists are the key to unlocking the value of big data.
This has created a huge market for people with these skills. US recruitment
agency, Glassdoor, report that the average salary for a data scientist is
$118,709 versus $64,537 for a skilled programmer. And a McKinsey study predicts
that by 2018, the United States alone faces a shortage of 140,000 to 190,000
people with analytical expertise and a 1.5 million shortage of managers with the
skills to understand and make decisions based on analysis of big data.  It’s no
wonder that companies are keen to employ data scientists when, for example, a
retailer using big data can reportedly increase their margin by more than 60%.
However, is it really this simple? Can data scientists actually justify earning
their salaries when brands seem to be struggling to realize the promise of big
data? Perhaps we are expecting too much of data scientists. May be we are
investing too much in a relatively small number of individuals rather than
thinking about how we can design organisations to help us get the most from data
assets. The focus on the data scientist often implies a centralized approach to
analytics and decision making; we implicitly assume that a small team of highly
skilled individuals can meet the needs of the organisation as a whole. This
theme of centralized vs. decentralized decision-making is one that has long been
debated in the management literature.  For many organisations a centralized
structure helps maintain control over a vast international operation, plus
ensures consistency of customer experience. Others, meanwhile, may give managers
at a local level decision-making power particularly when it comes to tactical
needs.   But the issue urgently needs revisiting in the context of big data as
the way in which organisations manage themselves around data may well be a key
factor for brands in realizing the value of their data assets. Economist and
philosopher Friedrich Hayek took the view that organisations should consider the
purpose of the information itself. Centralized decision-making can be more cost-
effective and co-ordinated, he believed, but decentralization can add speed and
local information that proves more valuable, even if the bigger picture is less
clear.  He argued that organisations thought too highly of centralized
knowledge, while ignoring ‘knowledge of the particular circumstances of time and
place’. But it is only relatively recently that economists are starting to
accumulate data that allows them to gauge how successful organisations organize
themselves. One such exercise reported by Tim Harford was carried out by Harvard
Professor Julie Wulf and the former chief economist of the International
Monetary Fund, Raghuram Rajan. They reviewed the workings of large US
organisations over fifteen years from the mid-80s. What they found was
successful companies were often associated with a move towards decentralisation,
often driven by globalisation and the need to react promptly to a diverse and
swiftly-moving range of markets, particularly at a local level. Their research
indicated that decentralisation pays. And technological advancement often goes
hand-in-hand with decentralization. Data analytics is starting to filter down to
the department layer, where executives are increasingly eager to trawl through
the mass of information on offer. Cloud computing, meanwhile, means that line
managers no longer rely on IT teams to deploy computer resources. They can do it
themselves, in just minutes.  The decentralization trend is now impacting on
technology spending. According to Gartner, chief marketing officers have been
given the same purchasing power in this area as IT managers and, as their
spending rises, so that of data centre managers is falling. Tim Harford makes a
strong case for the way in which this decentralization is important given that
the environment in which we operate is so unpredictable. Innovation typically
comes, he argues from a “swirling mix of ideas not from isolated minds.” And he
cites Jane Jacobs, writer on urban planning– who suggested we find innovation in
cities rather than on the Pacific islands. But this approach is not necessarily
always adopted. For example, research by academics Donald Marchand and Joe
Peppard discovered that there was still a tendency for brands to approach big
data projects the same way they would existing IT projects: i.e. using
centralized IT specialists with a focus on building and deploying technology on
time, to plan, and within budget. The problem with a centralized ‘IT-style’
approach is that it ignores the human side of the process of considering how
people create and use information i.e. how do people actually deliver value from
data assets. Marchand and Peppard suggest (among other recommendations) that
those who need to be able to create meaning from data should be at the heart of
any initiative. As ever then, the real value from data comes from asking the
right questions of the data. And the right questions to ask only emerge if you
are close enough to the business to see them. Are data scientists earning their
salary? In my view they are a necessary but not sufficient part of the solution;
brands need to be making greater investment in working with a greater range of
users to help them ask questions of the data. Which probably means that data
scientists’ salaries will need to take a hit in the process.
%%R
text2 = strsplit(text,". ",fixed=TRUE)  # Split on the literal string ". " (not a regex); this also splits after "vs." and "i.e."
text2 = text2[[1]]
print(text2)
 [1] "THERE HAVE BEEN murmurings that we are now in the “trough of disillusionment” of big data, the hype around it having surpassed the reality of what it can deliver"                                                                                                                                                     
 [2] " Gartner suggested that the “gravitational pull of big data is now so strong that even people who haven’t a clue as to what it’s all about report that they’re running big data projects.”  Indeed, their research with business decision makers suggests that organisations are struggling to get value from big data"
 [3] "Data scientists were meant to be the answer to this issue"                                                                                                                                                                                                                                                             
 [4] "Indeed, Hal Varian, Chief Economist at Google famously joked that “The sexy job in the next 10 years will be statisticians.” He was clearly right as we are now used to hearing that data scientists are the key to unlocking the value of big data"                                                                   
 [5] "This has created a huge market for people with these skills"                                                                                                                                                                                                                                                           
 [6] "US recruitment agency, Glassdoor, report that the average salary for a data scientist is $118,709 versus $64,537 for a skilled programmer"                                                                                                                                                                             
 [7] "And a McKinsey study predicts that by 2018, the United States alone faces a shortage of 140,000 to 190,000 people with analytical expertise and a 1.5 million shortage of managers with the skills to understand and make decisions based on analysis of big data"                                                     
 [8] " It’s no wonder that companies are keen to employ data scientists when, for example, a retailer using big data can reportedly increase their margin by more than 60%"                                                                                                                                                  
 [9] " However, is it really this simple? Can data scientists actually justify earning their salaries when brands seem to be struggling to realize the promise of big data? Perhaps we are expecting too much of data scientists"                                                                                            
[10] "May be we are investing too much in a relatively small number of individuals rather than thinking about how we can design organisations to help us get the most from data assets"                                                                                                                                      
[11] "The focus on the data scientist often implies a centralized approach to analytics and decision making; we implicitly assume that a small team of highly skilled individuals can meet the needs of the organisation as a whole"                                                                                         
[12] "This theme of centralized vs"                                                                                                                                                                                                                                                                                          
[13] "decentralized decision-making is one that has long been debated in the management literature"                                                                                                                                                                                                                          
[14] " For many organisations a centralized structure helps maintain control over a vast international operation, plus ensures consistency of customer experience"                                                                                                                                                           
[15] "Others, meanwhile, may give managers at a local level decision-making power particularly when it comes to tactical needs"                                                                                                                                                                                              
[16] "  But the issue urgently needs revisiting in the context of big data as the way in which organisations manage themselves around data may well be a key factor for brands in realizing the value of their data assets"                                                                                                  
[17] "Economist and philosopher Friedrich Hayek took the view that organisations should consider the purpose of the information itself"                                                                                                                                                                                      
[18] "Centralized decision-making can be more cost-effective and co-ordinated, he believed, but decentralization can add speed and local information that proves more valuable, even if the bigger picture is less clear"                                                                                                    
[19] " He argued that organisations thought too highly of centralized knowledge, while ignoring ‘knowledge of the particular circumstances of time and place’"                                                                                                                                                               
[20] "But it is only relatively recently that economists are starting to accumulate data that allows them to gauge how successful organisations organize themselves"                                                                                                                                                         
[21] "One such exercise reported by Tim Harford was carried out by Harvard Professor Julie Wulf and the former chief economist of the International Monetary Fund, Raghuram Rajan"                                                                                                                                           
[22] "They reviewed the workings of large US organisations over fifteen years from the mid-80s"                                                                                                                                                                                                                              
[23] "What they found was successful companies were often associated with a move towards decentralisation, often driven by globalisation and the need to react promptly to a diverse and swiftly-moving range of markets, particularly at a local level"                                                                     
[24] "Their research indicated that decentralisation pays"                                                                                                                                                                                                                                                                   
[25] "And technological advancement often goes hand-in-hand with decentralization"                                                                                                                                                                                                                                           
[26] "Data analytics is starting to filter down to the department layer, where executives are increasingly eager to trawl through the mass of information on offer"                                                                                                                                                          
[27] "Cloud computing, meanwhile, means that line managers no longer rely on IT teams to deploy computer resources"                                                                                                                                                                                                          
[28] "They can do it themselves, in just minutes"                                                                                                                                                                                                                                                                            
[29] " The decentralization trend is now impacting on technology spending"                                                                                                                                                                                                                                                   
[30] "According to Gartner, chief marketing officers have been given the same purchasing power in this area as IT managers and, as their spending rises, so that of data centre managers is falling"                                                                                                                         
[31] "Tim Harford makes a strong case for the way in which this decentralization is important given that the environment in which we operate is so unpredictable"                                                                                                                                                            
[32] "Innovation typically comes, he argues from a “swirling mix of ideas not from isolated minds.” And he cites Jane Jacobs, writer on urban planning– who suggested we find innovation in cities rather than on the Pacific islands"                                                                                       
[33] "But this approach is not necessarily always adopted"                                                                                                                                                                                                                                                                   
[34] "For example, research by academics Donald Marchand and Joe Peppard discovered that there was still a tendency for brands to approach big data projects the same way they would existing IT projects: i.e"                                                                                                              
[35] "using centralized IT specialists with a focus on building and deploying technology on time, to plan, and within budget"                                                                                                                                                                                                
[36] "The problem with a centralized ‘IT-style’ approach is that it ignores the human side of the process of considering how people create and use information i.e"                                                                                                                                                          
[37] "how do people actually deliver value from data assets"                                                                                                                                                                                                                                                                 
[38] "Marchand and Peppard suggest (among other recommendations) that those who need to be able to create meaning from data should be at the heart of any initiative"                                                                                                                                                        
[39] "As ever then, the real value from data comes from asking the right questions of the data"                                                                                                                                                                                                                              
[40] "And the right questions to ask only emerge if you are close enough to the business to see them"                                                                                                                                                                                                                        
[41] "Are data scientists earning their salary? In my view they are a necessary but not sufficient part of the solution; brands need to be making greater investment in working with a greater range of users to help them ask questions of the data"                                                                        
[42] "Which probably means that data scientists’ salaries will need to take a hit in the process."                                                                                                                                                                                                                           
%%R
res = text_summary(text2,5)
print(res)
[1] " Gartner suggested that the “gravitational pull of big data is now so strong that even people who haven’t a clue as to what it’s all about report that they’re running big data projects.”  Indeed, their research with business decision makers suggests that organisations are struggling to get value from big data"
[2] "The focus on the data scientist often implies a centralized approach to analytics and decision making; we implicitly assume that a small team of highly skilled individuals can meet the needs of the organisation as a whole"                                                                                         
[3] "May be we are investing too much in a relatively small number of individuals rather than thinking about how we can design organisations to help us get the most from data assets"                                                                                                                                      
[4] "The problem with a centralized ‘IT-style’ approach is that it ignores the human side of the process of considering how people create and use information i.e"                                                                                                                                                          
[5] "Which probably means that data scientists’ salaries will need to take a hit in the process."                                                                                                                                                                                                                           

23.5. Text Summarization with Python

This is an approach that distills a document down to its most important sentences. The idea is simple: the algorithm keeps only the sentences that capture the essence of the document. The use case is a reader who faces too much material and would benefit from a shorter, pithier version.

When no article is at hand, the examples here download one from the web, using SelectorGadget to find the right CSS selector and Beautiful Soup to extract the article text. Note that the summarizer/compressor assumes the article is clean, flat text.

https://www.dataquest.io/blog/web-scraping-tutorial-python/
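
To make the extraction step concrete without any network access or third-party packages, the sketch below pulls the text of paragraphs with a given CSS class out of an HTML string using only the standard library's `html.parser`. It is a stand-in for illustration, not the lxml and cssselect route used in this section; the class `ParagraphText`, the HTML snippet, and the class name `story` are all made up.

```python
from html.parser import HTMLParser


class ParagraphText(HTMLParser):
    """Collect text found inside <p> tags that carry a given CSS class."""

    def __init__(self, css_class: str):
        super().__init__()
        self.css_class = css_class
        self.inside = False   # True while we are inside a matching <p>
        self.chunks = []      # Text fragments collected so far

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "p" and self.css_class in classes:
            self.inside = True

    def handle_endtag(self, tag):
        if tag == "p":
            self.inside = False

    def handle_data(self, data):
        if self.inside and data.strip():
            self.chunks.append(data.strip())


# Hypothetical page: navigation and an ad surround the article paragraphs.
html = """
<html><body>
  <nav class="menu">Home | Tech | Share</nav>
  <p class="story">Data scientists are in <b>high</b> demand.</p>
  <p class="ad">Subscribe now!</p>
  <p class="story">Salaries have risen sharply.</p>
</body></html>
"""

parser = ParagraphText("story")
parser.feed(html)
news = " ".join(parser.chunks)
print(news)
# → Data scientists are in high demand. Salaries have risen sharply.
```

The point of choosing a selector carefully is visible here: only the paragraphs tagged as article text survive, while the menu and the ad are dropped before the text reaches the summarizer.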

Install these if needed:

!pip install lxml
!pip install cssselect
!pip install nltk
# Read in the news article from the URL and extract only the title and text of the article.
# Some examples provided below.

import requests
from lxml.html import fromstring
url = "https://www.theverge.com/2023/10/4/23903986/sam-bankman-fried-opening-statements-trial-fraud"
# url = "https://www.nytimes.com/2023/10/03/us/politics/kevin-mccarthy-speaker.html"
# url = "https://www.theatlantic.com/technology/archive/2022/04/doxxing-meaning-libs-of-tiktok/629643/"
# url = 'https://economictimes.indiatimes.com/news/economy/policy/a-tax-cut-for-you-in-budget-wont-give-india-the-boost-it-needs/articleshow/73476138.cms?utm_source=Colombia&utm_medium=C1&utm_campaign=CTN_ET_hp&utm_content=18'
html = requests.get(url, timeout=10).text

#See: http://infohost.nmt.edu/~shipman/soft/pylxml/web/etree-fromstring.html
doc = fromstring(html)

#http://lxml.de/cssselect.html#the-cssselect-method
doc.cssselect(r".lg\:max-w-none")  # raw string: avoids Python's invalid-escape warning for "\:"
# doc.cssselect(".evys1bk0") # nytimes
# doc.cssselect(".Normal")  #economic times
# doc.cssselect(".ArticleParagraph_root__wy3UI")   #Atlantic
[<Element div at 0x7d5b68370b40>, <Element div at 0x7d5b68370cd0>]
#economic times
# x = doc.cssselect(".Normal")
# news = x[0].text_content()
# print(news)

# Verge
x = doc.cssselect(r".lg\:max-w-none")  # raw string keeps the CSS escape \: intact

#nytimes
# x = doc.cssselect(".StoryBodyCompanionColumn")

# Atlantic
# x = doc.cssselect(".ArticleParagraph_root__wy3UI")
news = " ".join(el.text_content() for el in x)

Make sure the extracted text is a string. BeautifulSoup's get_text() strips any remaining markup; the article is then split into individual sentences, which are stored in a list.
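
The sentence-splitting step can be sketched with a regular expression. This is a rough illustration only: the function name `split_sentences` and the sample string are assumptions, and the rule (split on whitespace that follows `.`, `!` or `?`) will break on abbreviations such as "vs." and will miss boundaries where scraped text has no space after the period.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Split text into sentences at whitespace that follows ., ! or ?."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


# Hypothetical sample text.
sample = "FTX is fine. Assets are fine! Is that so? Yes."
for sentence in split_sentences(sample):
    print(sentence)
```

Unlike a plain split on ". ", the lookbehind keeps the closing punctuation attached to each sentence. A trained tokenizer such as NLTK's `sent_tokenize` generally handles abbreviations and quotations better, at the cost of downloading its model data.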

from bs4 import BeautifulSoup
news = BeautifulSoup(news,'lxml').get_text()
print(textwrap.fill(news, width=80))
type(news)
TechIs Sam Bankman-Fried’s defense even trying to win?The prosecution came out
swinging. Oddly, Bankman-Fried’s defense didn’t.By  Elizabeth Lopatto, a
reporter who writes about tech, money, and human behavior. She joined The Verge
in 2014 as science editor. Previously, she was a reporter at Bloomberg.  Oct 4,
2023, 11:02 PM UTCShare this storyThreadsEven the defense’s opening statement
was a bad look for Sam Bankman-Fried Photo Illustration by Cath Virginia / The
Verge I have never seen Sam Bankman-Fried so still as he was during the
prosecution’s opening statement. The characteristic leg-jiggling was absent. He
barely moved as the prosecutor listed the evidence against him: internal company
files, what customers were told, the testimony of his co-conspirators and his
own words.His hair was shorn, the result of a haircut from a fellow prisoner,
the Wall Street Journal reported. He wore a suit bought at a discount at Macy’s,
per the Journal; it hung on him. He appeared to have lost some weight.“All of
that was built on lies.”RelatedFTX’s Sam Bankman-Fried is on trial for fraud and
conspiracyBankman-Fried, at this time last year, had a luxury lifestyle as the
CEO of crypto exchange FTX, said the assistant US attorney, Thane Rehn, in the
cadence of a high schooler delivering his lines in a student play. Bankman-Fried
hung out with Tom Brady. He was on magazine covers, lived in a $30 million
penthouse, and spent time with world politicians. “All of that was built on
lies,” Rehn said.In his opening statement, Rehn dodged explaining cryptocurrency
to the jury. Instead, he punched hard on Bankman-Fried lying and
stealing. Bankman-Fried sat almost motionless, occasionally glancing at Rehn, as
the prosecutor told the jury that Bankman-Fried sold stock in FTX and borrowed
millions from lenders by lying. The story Rehn told is familiar to anyone
following the news. In May and June of 2022, Alameda Research — the crypto
trading company ostensibly helmed by Caroline Ellison — didn’t have enough to
pay its bills, so it pulled customer money to repay loans. By September, the
hole in the FTX balance sheet was so big that customers could never be
repaid.FTX “didn’t have a chief risk officer, which became an issue when the
storm hit.”When CoinDesk published its article in November 2022, people realized
FTX was a house of cards, Rehn said. Meanwhile, Bankman-Fried tweeted. “FTX is
fine. Assets are fine” and “We don’t invest customer assets even in
treasuries.”Pointing at Bankman-Fried, Rehn said, “This man stole billions of
dollars from thousands of people.” So how was the defense going to follow it up?
I was very curious, having learned yesterday that Bankman-Fried had never been
offered a plea deal since he and his attorneys had told the government they
wouldn’t negotiate. Surely there would be some manner of evidence, some
something, that would have made him so confident.There was, instead, a
metaphor.Defense attorney Mark Cohen, with the energy of a patient father
telling his obnoxious children a bedtime story, assured us that working at a
startup was like building a plane while flying it, and that FTX the plane had
flown right into the perfect storm: the crypto crash. Except, uh, he also said
this: FTX “didn’t have a chief risk officer, which became an issue when the
storm hit.”I couldn’t stop thinking about the missing risk officerThe problem
with this metaphor is that if FTX was a plane, it was a plane flying with a key
component missing — namely, the risk officer, an executive whose job it is to,
well, manage risk. This is sort of an important thing, as risks can be anything
from reputational to regulatory to financial. FTX was named such as it was
because it was a futures exchange, which, to borrow a phrase from Bloomberg’s
Matt Levine, “sits between the winners and losers of bets.” That means FTX can’t
pay out what it owes the winners unless the losers pay up. Risk management is a
crucial part of the business; risk officers exist to identify business’
potential risks, monitor, and mitigate them. This is to say nothing of the
regulatory risks around crypto. As Cohen droned on about airplanes, I couldn’t
stop thinking about the missing risk officer. Bringing it up, I thought, was a
tremendous mistake. The prosecution hadn’t mentioned it. Either Bankman-Fried is
stupid — unlikely — or he deliberately didn’t hire a risk officer. Was he
worried about what one might find? Sure, as Cohen put it, Bankman-Fried was a
math nerd who didn’t party. That paints a picture of someone who’s pretty
deliberate, particularly since he immediately left MIT and went to work on Wall
Street. If he had been a party-hardy trainwreck, I could see overlooking a risk
officer in order to do another line, or a supermodel, or something else
important. Why was the defense bringing this up?But as Cohen tried to tell me
that FTX’s and Alameda’s business relationships were “reasonable under the
circumstances,” the lack of risk officer kept elbowing me in the ribs. “Sam
acted in good faith and took reasonable business measures” is a pretty hard pill
to swallow with that in mind. Man, it’s no good when your defense lawyer has
just made you sound worse than the prosecution already did. And while Cohen
tried to make the common white-collar defense argument that Bankman-Fried, as
CEO, was simply too busy to oversee what everyone did every day, he just made me
more suspicious. That’s why you hire a risk officer and delegate! That’s the
whole point! I could barely even hear Cohen blaming Caroline Ellison and
Changpeng “CZ” Zhao for the debacle over the “no risk officer” ringing in my
ears.Following the defense’s opening statements, things got still worse for
Bankman-Fried. The prosecution called its first witness, Marc-Antoine Julliard,
whose money got stuck on FTX. Juilliard, who was born in Paris and lives in
London, testified that he trusted FTX because Bankman-Fried came across as a
leading figure of the industry. When he was evaluating the exchange, he thought
the sheer volume of users was important, too — at the time, FTX was among the
top three biggest exchanges. Plus, major VC firms had invested, and “they don’t
commit hundreds of millions without doing due diligence, checking the books, the
accountancy of the firm, going through several compliance process[es], so that
was a vote of confidence for me,” Juilliard said. (Evidently he had not paid
attention to the Elizabeth Holmes trial.)He also noted FTX’s glossy ads —
featuring Gisele Bündchen, for instance —  suggested a very high budget. It
wouldn’t make sense to spend that much money unless FTX had very strong
financials, Juilliard figured. He opened an account, transferred in both regular
money and cryptocurrency, and used the exchange to execute his plan: buying
Bitcoin to sell back in five to ten years at higher prices.It is a thankless
task to cross-examine a customer whose money is gone In November 2022, things
went bad for Julliard. He followed Bankman-Fried on Twitter, and read aloud the
“FTX is fine. Assets are fine” tweets, along with “FTX has enough to cover all
client holdings. We don’t invest client assets” and a few others, which gave
Julliard the impression that his money was there — the problem might have been
technical (anti-spam measures) or regulatory. When he tried to get his money out
on November 8th, it was too late. We saw screenshots of his withdrawal attempts:
$20,000 USD and about 4 Bitcoin, which were worth about $20,000 at the time:
about $100,000 money, inaccessible.It is a thankless task to cross-examine a
customer whose money is gone, but Cohen tried anyway. He noted that Julliard was
a licensed commodities broker, who was trading in crypto because he didn’t have
to disclose it; that Julliard knew that crypto was new and risky, and that
Julliard didn’t review the terms of service agreement he’d assented to when
making his FTX account. Well, sure, but so what? The next witness called was
Bankman-Fried’s former college (and FTX) roommate, Adam Yedidia, about whom I
expect I will have much more to say tomorrow. When the jury was dismissed,
Bankman-Fried’s lawyers told the judge that he wasn’t getting his full Adderall
doses in prison. The defense appeared to be setting up the grounds for an appeal
— it’s previously argued that the prison withholding Adderall made it difficult
for Bankman-Fried to prepare his defense. Given what I saw today, setting up an
appeal seems wise. It is, at minimum, risk management.Most PopularMost
PopularNFL teams can’t use BlueskyGoogle’s Gemini is already winning the next-
gen assistant warsStar Trek: Section 31 is firing on all cylindersYouTube
Premium gets more experimental features that can now be tested all at onceNvidia
GeForce RTX 5090 review: a new king of 4K is here Verge Deals / Sign up for
Verge Deals to get deals on products we've tested sent to your inbox
weekly.Email (required)Sign upBy submitting your email, you agree to our Terms
and Privacy Notice. This site is protected by reCAPTCHA and the Google Privacy
Policy and Terms of Service apply.From our sponsorAdvertiser Content From
str
import nltk
nltk.download("punkt")
nltk.download("punkt_tab")
from nltk.tokenize import sent_tokenize   # To get separate sentences
sentences = sent_tokenize(news)
print("Number of sentences =", len(sentences))
for s in sentences:
    print(textwrap.fill(s, width=80), end="\n\n")
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt_tab.zip.
Number of sentences = 63
TechIs Sam Bankman-Fried’s defense even trying to win?The prosecution came out
swinging.

Oddly, Bankman-Fried’s defense didn’t.By  Elizabeth Lopatto, a reporter who
writes about tech, money, and human behavior.

She joined The Verge in 2014 as science editor.

Previously, she was a reporter at Bloomberg.

Oct 4, 2023, 11:02 PM UTCShare this storyThreadsEven the defense’s opening
statement was a bad look for Sam Bankman-Fried Photo Illustration by Cath
Virginia / The Verge I have never seen Sam Bankman-Fried so still as he was
during the prosecution’s opening statement.

The characteristic leg-jiggling was absent.

He barely moved as the prosecutor listed the evidence against him: internal
company files, what customers were told, the testimony of his co-conspirators
and his own words.His hair was shorn, the result of a haircut from a fellow
prisoner, the Wall Street Journal reported.

He wore a suit bought at a discount at Macy’s, per the Journal; it hung on him.

He appeared to have lost some weight.“All of that was built on
lies.”RelatedFTX’s Sam Bankman-Fried is on trial for fraud and
conspiracyBankman-Fried, at this time last year, had a luxury lifestyle as the
CEO of crypto exchange FTX, said the assistant US attorney, Thane Rehn, in the
cadence of a high schooler delivering his lines in a student play.

Bankman-Fried hung out with Tom Brady.

He was on magazine covers, lived in a $30 million penthouse, and spent time with
world politicians.

“All of that was built on lies,” Rehn said.In his opening statement, Rehn dodged
explaining cryptocurrency to the jury.

Instead, he punched hard on Bankman-Fried lying and stealing.

Bankman-Fried sat almost motionless, occasionally glancing at Rehn, as the
prosecutor told the jury that Bankman-Fried sold stock in FTX and borrowed
millions from lenders by lying.

The story Rehn told is familiar to anyone following the news.

In May and June of 2022, Alameda Research — the crypto trading company
ostensibly helmed by Caroline Ellison — didn’t have enough to pay its bills, so
it pulled customer money to repay loans.

By September, the hole in the FTX balance sheet was so big that customers could
never be repaid.FTX “didn’t have a chief risk officer, which became an issue
when the storm hit.”When CoinDesk published its article in November 2022, people
realized FTX was a house of cards, Rehn said.

Meanwhile, Bankman-Fried tweeted.

“FTX is fine.

Assets are fine” and “We don’t invest customer assets even in
treasuries.”Pointing at Bankman-Fried, Rehn said, “This man stole billions of
dollars from thousands of people.” So how was the defense going to follow it up?

I was very curious, having learned yesterday that Bankman-Fried had never been
offered a plea deal since he and his attorneys had told the government they
wouldn’t negotiate.

Surely there would be some manner of evidence, some something, that would have
made him so confident.There was, instead, a metaphor.Defense attorney Mark
Cohen, with the energy of a patient father telling his obnoxious children a
bedtime story, assured us that working at a startup was like building a plane
while flying it, and that FTX the plane had flown right into the perfect storm:
the crypto crash.

Except, uh, he also said this: FTX “didn’t have a chief risk officer, which
became an issue when the storm hit.”I couldn’t stop thinking about the missing
risk officerThe problem with this metaphor is that if FTX was a plane, it was a
plane flying with a key component missing — namely, the risk officer, an
executive whose job it is to, well, manage risk.

This is sort of an important thing, as risks can be anything from reputational
to regulatory to financial.

FTX was named such as it was because it was a futures exchange, which, to borrow
a phrase from Bloomberg’s Matt Levine, “sits between the winners and losers of
bets.” That means FTX can’t pay out what it owes the winners unless the losers
pay up.

Risk management is a crucial part of the business; risk officers exist to
identify business’ potential risks, monitor, and mitigate them.

This is to say nothing of the regulatory risks around crypto.

As Cohen droned on about airplanes, I couldn’t stop thinking about the missing
risk officer.

Bringing it up, I thought, was a tremendous mistake.

The prosecution hadn’t mentioned it.

Either Bankman-Fried is stupid — unlikely — or he deliberately didn’t hire a
risk officer.

Was he worried about what one might find?

Sure, as Cohen put it, Bankman-Fried was a math nerd who didn’t party.

That paints a picture of someone who’s pretty deliberate, particularly since he
immediately left MIT and went to work on Wall Street.

If he had been a party-hardy trainwreck, I could see overlooking a risk officer
in order to do another line, or a supermodel, or something else important.

Why was the defense bringing this up?But as Cohen tried to tell me that FTX’s
and Alameda’s business relationships were “reasonable under the circumstances,”
the lack of risk officer kept elbowing me in the ribs.

“Sam acted in good faith and took reasonable business measures” is a pretty hard
pill to swallow with that in mind.

Man, it’s no good when your defense lawyer has just made you sound worse than
the prosecution already did.

And while Cohen tried to make the common white-collar defense argument that
Bankman-Fried, as CEO, was simply too busy to oversee what everyone did every
day, he just made me more suspicious.

That’s why you hire a risk officer and delegate!

That’s the whole point!

I could barely even hear Cohen blaming Caroline Ellison and Changpeng “CZ” Zhao
for the debacle over the “no risk officer” ringing in my ears.Following the
defense’s opening statements, things got still worse for Bankman-Fried.

The prosecution called its first witness, Marc-Antoine Julliard, whose money got
stuck on FTX.

Juilliard, who was born in Paris and lives in London, testified that he trusted
FTX because Bankman-Fried came across as a leading figure of the industry.

When he was evaluating the exchange, he thought the sheer volume of users was
important, too — at the time, FTX was among the top three biggest exchanges.

Plus, major VC firms had invested, and “they don’t commit hundreds of millions
without doing due diligence, checking the books, the accountancy of the firm,
going through several compliance process[es], so that was a vote of confidence
for me,” Juilliard said.

(Evidently he had not paid attention to the Elizabeth Holmes trial.

)He also noted FTX’s glossy ads — featuring Gisele Bündchen, for instance — 
suggested a very high budget.

It wouldn’t make sense to spend that much money unless FTX had very strong
financials, Juilliard figured.

He opened an account, transferred in both regular money and cryptocurrency, and
used the exchange to execute his plan: buying Bitcoin to sell back in five to
ten years at higher prices.It is a thankless task to cross-examine a customer
whose money is gone In November 2022, things went bad for Julliard.

He followed Bankman-Fried on Twitter, and read aloud the “FTX is fine.

Assets are fine” tweets, along with “FTX has enough to cover all client
holdings.

We don’t invest client assets” and a few others, which gave Julliard the
impression that his money was there — the problem might have been technical
(anti-spam measures) or regulatory.

When he tried to get his money out on November 8th, it was too late.

We saw screenshots of his withdrawal attempts: $20,000 USD and about 4 Bitcoin,
which were worth about $20,000 at the time: about $100,000 money,
inaccessible.It is a thankless task to cross-examine a customer whose money is
gone, but Cohen tried anyway.

He noted that Julliard was a licensed commodities broker, who was trading in
crypto because he didn’t have to disclose it; that Julliard knew that crypto was
new and risky, and that Julliard didn’t review the terms of service agreement
he’d assented to when making his FTX account.

Well, sure, but so what?

The next witness called was Bankman-Fried’s former college (and FTX) roommate,
Adam Yedidia, about whom I expect I will have much more to say tomorrow.

When the jury was dismissed, Bankman-Fried’s lawyers told the judge that he
wasn’t getting his full Adderall doses in prison.

The defense appeared to be setting up the grounds for an appeal — it’s
previously argued that the prison withholding Adderall made it difficult for
Bankman-Fried to prepare his defense.

Given what I saw today, setting up an appeal seems wise.

It is, at minimum, risk management.Most PopularMost PopularNFL teams can’t use
BlueskyGoogle’s Gemini is already winning the next-gen assistant warsStar Trek:
Section 31 is firing on all cylindersYouTube Premium gets more experimental
features that can now be tested all at onceNvidia GeForce RTX 5090 review: a new
king of 4K is here Verge Deals / Sign up for Verge Deals to get deals on
products we've tested sent to your inbox weekly.Email (required)Sign upBy
submitting your email, you agree to our Terms and Privacy Notice.

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of
Service apply.From our sponsorAdvertiser Content From
# Python Summarizer
import re
import numpy as np

# Pass in a list of sentences; returns an n-sentence summary and the n least similar sentences
def text_summarizer(sentences, n_summary):
    n = len(sentences)
    # Represent each sentence as a set of tokens (split on space, comma, or period)
    x = [set(re.split('[ ,.]', s)) for s in sentences]
    # Symmetric matrix of pairwise Jaccard similarities
    jaccsim = np.zeros((n, n))
    for i in range(n):
        for j in range(i, n):
            jaccsim[i, j] = len(x[i] & x[j]) / len(x[i] | x[j])
            jaccsim[j, i] = jaccsim[i, j]
    # Sentence indices ordered by total similarity score, lowest first
    order = np.argsort(jaccsim.sum(axis=0))
    # Summary: the n_summary highest-scoring sentences
    summary = [sentences[j] for j in order[::-1][:n_summary]]
    # Anomalies: the n_summary lowest-scoring sentences
    anomalies = [sentences[j] for j in order[:n_summary]]
    return summary, anomalies
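A quick check of the tokenizer used in `text_summarizer` is worthwhile. Splitting on `[ ,.]` produces an empty string wherever two delimiters are adjacent (a comma followed by a space) and after a final period, so almost all sentences share the token `''`. The sketch below, which is not part of the original code, compares that split with one possible alternative: lowercased runs of word characters. The names `jaccard`, `tokens_split`, and `tokens_words` are illustrative.

```python
import re

def jaccard(a, b):
    """Jaccard index |a ∩ b| / |a ∪ b| of two token collections."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 0.0

def tokens_split(sentence):
    # Same rule as text_summarizer: split on space, comma, or period
    return re.split('[ ,.]', sentence)

def tokens_words(sentence):
    # Assumed alternative: lowercased runs of word characters, no empty tokens
    return re.findall(r"\w+", sentence.lower())

s1 = "FTX is fine, he said."
s2 = "Assets are safe, too."

print(tokens_split(s1))                             # ['FTX', 'is', 'fine', '', 'he', 'said', '']
print(jaccard(tokens_split(s1), tokens_split(s2)))  # 0.1 (only '' is shared)
print(jaccard(tokens_words(s1), tokens_words(s2)))  # 0.0 (no shared words)
```

Under the original split the two sentences score 0.1 although they have no word in common. Whether to switch tokenizers is a judgment call, and doing so would change the summary produced by the function.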
# Get the summary and the anomaly sentences
summary, anomalies = text_summarizer(sentences, int(len(sentences)/4))
summ = "  ".join(summary)
print(textwrap.fill(summ, width=80))
Juilliard, who was born in Paris and lives in London, testified that he trusted
FTX because Bankman-Fried came across as a leading figure of the industry.  He
noted that Julliard was a licensed commodities broker, who was trading in crypto
because he didn’t have to disclose it; that Julliard knew that crypto was new
and risky, and that Julliard didn’t review the terms of service agreement he’d
assented to when making his FTX account.  “All of that was built on lies,” Rehn
said.In his opening statement, Rehn dodged explaining cryptocurrency to the
jury.  He appeared to have lost some weight.“All of that was built on
lies.”RelatedFTX’s Sam Bankman-Fried is on trial for fraud and
conspiracyBankman-Fried, at this time last year, had a luxury lifestyle as the
CEO of crypto exchange FTX, said the assistant US attorney, Thane Rehn, in the
cadence of a high schooler delivering his lines in a student play.  I was very
curious, having learned yesterday that Bankman-Fried had never been offered a
plea deal since he and his attorneys had told the government they wouldn’t
negotiate.  He followed Bankman-Fried on Twitter, and read aloud the “FTX is
fine.  This is to say nothing of the regulatory risks around crypto.  Risk
management is a crucial part of the business; risk officers exist to identify
business’ potential risks, monitor, and mitigate them.  Why was the defense
bringing this up?But as Cohen tried to tell me that FTX’s and Alameda’s business
relationships were “reasonable under the circumstances,” the lack of risk
officer kept elbowing me in the ribs.  When he tried to get his money out on
November 8th, it was too late.  FTX was named such as it was because it was a
futures exchange, which, to borrow a phrase from Bloomberg’s Matt Levine, “sits
between the winners and losers of bets.” That means FTX can’t pay out what it
owes the winners unless the losers pay up.  The story Rehn told is familiar to
anyone following the news.  Sure, as Cohen put it, Bankman-Fried was a math nerd
who didn’t party.  Assets are fine” and “We don’t invest customer assets even in
treasuries.”Pointing at Bankman-Fried, Rehn said, “This man stole billions of
dollars from thousands of people.” So how was the defense going to follow it up?
We don’t invest client assets” and a few others, which gave Julliard the
impression that his money was there — the problem might have been technical
(anti-spam measures) or regulatory.
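The summary above lists sentences in order of their similarity score, not in the order they appear in the article, which is why the text jumps between the witness and the opening statements. A minimal sketch of one refinement, assuming we keep the same top-n selection and only re-sort the chosen indices, is shown here on toy data. The function `top_n_in_document_order` and the toy scores are hypothetical.

```python
def top_n_in_document_order(sentences, scores, n):
    """Pick the n highest-scoring sentences, then return them in original order."""
    # Indices sorted by score, highest first
    by_score = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    top = by_score[:n]
    # Re-sort the chosen indices so the summary reads in document order
    return [sentences[j] for j in sorted(top)]

toy_sentences = ["A", "B", "C", "D"]
toy_scores = [0.2, 0.7, 0.1, 0.9]

# Score order would give ['D', 'B']; document order gives ['B', 'D']
print(top_n_in_document_order(toy_sentences, toy_scores, 2))
```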
for a in anomalies:
    print(a)
Was he worried about what one might find?
That’s the whole point!
That’s why you hire a risk officer and delegate!
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.From our sponsorAdvertiser Content From
)He also noted FTX’s glossy ads — featuring Gisele Bündchen, for instance —  suggested a very high budget.
Given what I saw today, setting up an appeal seems wise.
Well, sure, but so what?
Assets are fine” tweets, along with “FTX has enough to cover all client holdings.
Man, it’s no good when your defense lawyer has just made you sound worse than the prosecution already did.
It is, at minimum, risk management.Most PopularMost PopularNFL teams can’t use BlueskyGoogle’s Gemini is already winning the next-gen assistant warsStar Trek: Section 31 is firing on all cylindersYouTube Premium gets more experimental features that can now be tested all at onceNvidia GeForce RTX 5090 review: a new king of 4K is here Verge Deals / Sign up for Verge Deals to get deals on products we've tested sent to your inbox weekly.Email (required)Sign upBy submitting your email, you agree to our Terms and Privacy Notice.
The prosecution called its first witness, Marc-Antoine Julliard, whose money got stuck on FTX.
She joined The Verge in 2014 as science editor.
The next witness called was Bankman-Fried’s former college (and FTX) roommate, Adam Yedidia, about whom I expect I will have much more to say tomorrow.
TechIs Sam Bankman-Fried’s defense even trying to win?The prosecution came out swinging.
In May and June of 2022, Alameda Research — the crypto trading company ostensibly helmed by Caroline Ellison — didn’t have enough to pay its bills, so it pulled customer money to repay loans.

23.6. Modern Methods#

!pip install transformers
from transformers import pipeline
# No model is named here, so the pipeline falls back to its default summarization model
summarizer = pipeline("summarization")
No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cpu
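# All in one example
import requests
from lxml.html import fromstring
from bs4 import BeautifulSoup

html = requests.get(url, timeout=10).text   # assumes url was set earlier in the chapter
doc = fromstring(html)
# x = doc.cssselect(".ArticleParagraph_root__wy3UI")
x = doc.cssselect(r".lg\:max-w-none")   # raw string keeps the CSS escape "\:" intact
news = " ".join(e.text_content() for e in x)
news = BeautifulSoup(news,'lxml').get_text()
print(len(news))
if len(news)>1024:   # crude cap in characters; the model's max seq length is counted in tokens
    news = news[:1024]
# max_length and min_length are measured in tokens, not characters
summ = summarizer(news, max_length=int(len(news)/4), min_length=25)
print(summ)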
Your max_length is set to 256, but your input_length is only 254. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=127)
9131
[{'summary_text': " Sam Bankman-Fried's hair was shorn, the result of a haircut from a fellow prisoner . He wore a suit bought at a discount at Macy's, it hung on him ."}]
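The warning above arises because `max_length` and `min_length` count tokens, whereas `len(news)` counts characters: 1,024 characters divided by 4 gives 256, which exceeds the 254 tokens in the input. One way to avoid this, sketched here as an assumption and not as the library's recommended recipe, is to derive both bounds from the token count. The helper `summary_length_bounds` and its `ratio` and `min_floor` parameters are hypothetical; the commented line shows one way the token count might be obtained from the pipeline's tokenizer.

```python
def summary_length_bounds(n_input_tokens, ratio=0.5, min_floor=25):
    """Return (min_length, max_length) in tokens for an input of n_input_tokens tokens."""
    # Cap the summary at a fraction of the input length
    max_length = max(2, int(n_input_tokens * ratio))
    # Keep min_length strictly below max_length for short inputs
    min_length = min(min_floor, max_length - 1)
    return min_length, max_length

# Assumed way to count tokens with the pipeline's own tokenizer:
# n_tokens = len(summarizer.tokenizer(news)["input_ids"])
n_tokens = 254   # value reported in the warning above

print(summary_length_bounds(n_tokens))   # (25, 127)
```

With an input of 254 tokens this yields `max_length=127`, the same value the warning suggests.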

Try this additional blog post for more on the T5 (text to text transfer transformer) summarizer.

https://towardsdatascience.com/simple-abstractive-text-summarization-with-pretrained-t5-text-to-text-transfer-transformer-10f6d602c426

This is a nice website explaining Hugging Face transformers: https://zenodo.org/record/3733180#.X40RxEJKjlx

And the T5 paper: https://arxiv.org/pdf/1910.10683.pdf

And here is a nice application of the same: https://towardsdatascience.com/summarization-has-gotten-commoditized-thanks-to-bert-9bb73f2d6922

23.7. Long document summarization#

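The summarizer accepts only a limited input length, so a long document cannot be summarized in a single call. Instead, we break the text into chunks of the maximum size (1,024 characters in the code below) and summarize it piecemeal.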

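html = requests.get(url, timeout=10).text
doc = fromstring(html)
# x = doc.cssselect(".ArticleParagraph_root__wy3UI")
x = doc.cssselect(r".lg\:max-w-none")   # raw string keeps the CSS escape "\:" intact
news = " ".join(e.text_content() for e in x)
news = BeautifulSoup(news,'lxml').get_text()
# int() rounds down, so the chunk count printed here omits the final partial chunk
print("Size of article =",len(news)," | #Chunks =",int(len(news)/1024))
# Summarize each 1024-character slice in turn
for j in range(0,len(news),1024):
    # max_length is in tokens; len(news)/4 exceeds every chunk's length, hence the warnings in the output
    print(summarizer(news[j:j+1024], max_length=int(len(news)/4), min_length=25))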
Your max_length is set to 2282, but your input_length is only 254. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=127)
Size of article = 9131  | #Chunks = 8
Your max_length is set to 2282, but your input_length is only 254. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=127)
[{'summary_text': " Sam Bankman-Fried's hair was shorn, the result of a haircut from a fellow prisoner . He wore a suit bought at a discount at Macy's, it hung on him ."}]
Your max_length is set to 2282, but your input_length is only 249. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=124)
[{'summary_text': ' FTX CEO Sam Bankman-Fried is on trial for fraud and conspiracy . Assistant US attorney Thane Rehn told the jury the FTX exec sold stock in FTX and borrowed millions from lenders by lying . Rehn dodged explaining cryptocurrency to the jury .'}]
Your max_length is set to 2282, but your input_length is only 246. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=123)
[{'summary_text': ' Rehn: FTX "didn’t have a chief risk officer, which became an issue when the storm hit" Bankman-Fried tweeted, "FTX is fine.  customer money to repay loans"'}]
Your max_length is set to 2282, but your input_length is only 247. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=123)
[{'summary_text': ' FTX was named such as it was because it was a futures exchange, which sits between the winners and losers of bets . That means FTX can’t pay out what it owes the winners unless the losers pay up . Risk officers exist to identify business’ potential risks, monitor, and mitigate them .'}]
Your max_length is set to 2282, but your input_length is only 246. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=123)
[{'summary_text': " Bankman-Fried was a math nerd who didn’t party, Cohen said . He said the defense brought up the missing risk officer, but the prosecution hadn't mentioned it . If he had been a party-hardy trainwreck, he could see overlooking a risk officer to do another line ."}]
Your max_length is set to 2282, but your input_length is only 250. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=125)
[{'summary_text': ' The prosecution called its first witness, Marc-Antoine Julliard, whose money got stuck on FTX . Juilliard testified that he trusted FTX because Bankman-Fried came across as a leading figure of the industry .'}]
Your max_length is set to 2282, but your input_length is only 261. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=130)
[{'summary_text': ' Julliard followed Bankman-Fried on Twitter, and read aloud the “FTX is fine” tweets, along with ‘FTX has enough to cover all client holdings” and “We don’t invest client assets” In November 2022, things went bad .'}]
Your max_length is set to 2282, but your input_length is only 210. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=105)
[{'summary_text': ' Bankman-Fried’s lawyers told the judge that he wasn’t getting  his money back . The jury was dismissed . The next witness called was the former college (and FTX) roommate Adam Yedidia .'}]
[{'summary_text': ' The defense appeared to be setting up the grounds for an appeal . Given what I saw today, setting up an appeal seems wise . It is, at minimum, risk management .'}]
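Slicing every 1,024 characters can cut a sentence in half at a chunk boundary. A minimal sketch of an alternative, assuming the `sentences` list from `sent_tokenize` is available, is to pack whole sentences greedily into chunks under a character budget. The function `chunk_sentences` and the parameter `max_chars` are illustrative names, not part of the original notebook.

```python
def chunk_sentences(sentences, max_chars=1024):
    """Greedily pack whole sentences into chunks of at most max_chars characters."""
    chunks, current = [], ""
    for s in sentences:
        candidate = s if not current else current + " " + s
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                chunks.append(current)
            # A single sentence longer than max_chars becomes its own oversized chunk
            current = s
    if current:
        chunks.append(current)
    return chunks

demo = ["One two three.", "Four five.", "Six seven eight nine.", "Ten."]
for c in chunk_sentences(demo, max_chars=30):
    print(len(c), c)
# 25 One two three. Four five.
# 26 Six seven eight nine. Ten.
```

Each chunk could then be passed to `summarizer` in place of `news[j:j+1024]`.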