After some thought I agree with you that this is the wrong problem to solve.
I took a narrative detour I wanted to share:
Suppose we draw an analogy between a scientific paper and a piece of mineral ore (in terms of raw content, and ignoring the written symbols for the sake of the analogy) extracted from some mine or quarry. This ore is somehow useful to someone, even if its value is only structural: the shingles on an academic roof or a heavyweight desk. What a summarizer attempts to do is apply a generic refinement process that grinds up the ore and then separates out the components of interest, such as iron, uranium, or gold.
Anyone who thinks that all of metallurgy reduces to throwing the slab into a machine and having it spew out the precious metals will find more complexity than they bargained for, and more questions about machines and methods to resolve. Gold, iron, and uranium all have different extraction processes.
I believe this analogy may give some insight into which problems to solve with AI instead: focus on the kinds of discoveries that advanced "metallurgy", namely discovering and understanding the structure of the mineral ore and its contents (scientific papers) and their relation to the technologies of the time, rather than on the philosopher's stone of a 'summarizing' process, which is more akin to a hammer that makes everything look like a nail.
Highly intelligent human beings have a natural ability to summarize big ideas into TLDRs. Are humans basically a bunch of "summarizers"? Probably not. Is this ability to summarize or compress big ideas into smaller, more condensed pieces of information important to the human race? Yes, I would say that it is. So to me, this is certainly one of those problems that we correctly attempt to solve.
Speaking of abstracts, did you read the abstract of the paper you're commenting on? I don't think you did, because it outlines how this approach is different, and maybe if there was something better than the abstract, you would have read it and not assumed it's the same as what we're using abstracts for today.
In short, here are the major differences:
> SciTLDR contains both author-written and expert-derived TLDRs
> CATTS improves upon strong baselines under both automated metrics and human evaluations
Abstracts are important (and clearly key in generating these TLDRs), but when it comes to ranking and recommending other papers (not to mention noting whether a new paper has content that can actually push a field forward) an abstract just isn't enough.
Everybody * scans papers for the piece of info they happen to be searching for, before reading it in any detail. The abstract should contain that, but might not. And nobody * reads the entire abstract anyway.
A clinician scanning a medical paper is looking for patient relevance: should they use the approach described? The statistical details are too intimidating, the preamble is irrelevant, they know the scope of the problem already.
This is not what "should" happen, but it is what actually happens.
The gap between published findings and clinical practice is several years. The peer review and publication processes are way out of touch with clinical reality.
On top of this, people find articles using Google and read them on their phones. (In reality, they read summarised opinion pieces found via Google.)
A systematic reviewer may read papers in full. But even they scan papers for inclusion/exclusion criteria first. The deeper the information is buried, the greater the risk of misclassification. I'm not suggesting that TLDRs will fix this; it's just another data point on why we're seeing TLDRs being created.
* "Everybody" and "nobody" here excludes researchers :)
In some situations abstracts serve as bibliographic metadata rather than a summary of the content. Examples include cases where the content is hidden behind a paywall or, in defence, when a paper's content is classified in some way but the existence of the paper itself is not. In both cases, the abstract may help you decide whether it is worth accessing the full paper, but on its own won't give you an answer. E.g., "we studied X" but not "and concluded Y".
Obviously abstracts can include a content summary as well as bibliographic metadata, but not all do.
This kind of effort serves the function of helping people to _approximately_ "know what is known", but it's really not very useful to the more important part of research efforts, which is to know what is not known.
A large part of research is spent on the understanding of what is known; parsing papers is part and parcel for professors, grad students, and corporate R+D alike.
No idea if their approach is useful, but they are tackling a worthwhile problem.
Consider the paper's abstract: "We introduce TLDR generation, a new form of extreme summarization, for scientific papers. TLDR generation involves high source compression and requires expert background knowledge and understanding of complex domain-specific language. To facilitate study on this task, we introduce SciTLDR, a new multi-target dataset of 5.4K TLDRs over 3.2K papers. SciTLDR contains both author-written and expert-derived TLDRs, where the latter are collected using a novel annotation protocol that produces high-quality summaries while minimizing annotation burden. We propose CATTS, a simple yet effective learning strategy for generating TLDRs that exploits titles as an auxiliary training signal. CATTS improves upon strong baselines under both automated metrics and human evaluations. Data and code are publicly available at this https URL."
The algorithm summarizes it as:
“We introduce TLDR generation, a new form of extreme summarization, for scientific papers that produces high-quality summaries while minimizing annotation burden.”
This is very neat work, and I would never have predicted that abstractive summarization would end up advancing so much more quickly in general than extractive summarization did from transformers being introduced. Makes me wish that simple highlighting of a document at the word-level was actually a sorta "solved" (gives compelling output more often than not) problem like condensed abstractive summarization is...
Yeah. I happen to have been looking at this problem in my spare time recently. I tried a bunch of abstractive AIs and approaches, and none produce consistently usable results.
I'm sticking with extractive approaches plus a bunch of hard-coded general and domain-specific rules for now.
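For anyone unfamiliar with the term, here is a minimal sketch of what a purely extractive baseline can look like: score each sentence by the average frequency of its non-stopword terms and return the top-scoring sentence verbatim. This is an illustration only, not the setup described above (those rules aren't given) and not the paper's method. The names `extractive_tldr`, `split_sentences`, `tokenize`, the tiny `STOPWORDS` list and the scoring rule are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real one would be much longer.
STOPWORDS = {"a", "an", "the", "of", "and", "to", "in", "for", "is", "are",
             "we", "that", "this", "on", "with", "as", "it"}


def split_sentences(text):
    # Naive splitter: break on whitespace that follows ., ! or ?
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(text):
    # Lowercased alphanumeric tokens with stopwords removed.
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS]


def extractive_tldr(text, n=1):
    """Return the n highest-scoring sentences, copied verbatim, in document order."""
    sentences = split_sentences(text)
    freq = Counter(tokenize(text))

    def score(sentence):
        words = tokenize(sentence)
        # Average term frequency, so long sentences are not favoured automatically.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:n])  # restore original order
    return " ".join(sentences[i] for i in chosen)
```

Because the output is always copied from the input, hard-coded general or domain-specific rules (for example, boosting sentences that contain "we propose") could be added inside `score` without any risk of inventing text.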
Not sure what I was expecting. It gave me back the first line of the abstract as response. (For anyone wondering, the paper I tried was: "A Heterarchy of values determined by the topology of nervous nets" by Warren S. McCulloch.)
(hey peter!) I did get very excited about it when I heard about it, but cooled significantly when I learned that it was trained only on cs papers and required the full abstract plus paper text. nice proof of concept but going to need significant work to generalize for, say, a newsletter business like yours
This is potentially a godsend for me - I was faced with having to write a 1 paragraph summary of 70 student dissertations for an accreditation process next week. Abstracts are too long. Fingers crossed it works as advertised!
I played around with this on the demo page and found that while the generated "TLDRs" are pretty good, they tend to be sentences composed of fragments of existing sentences. Basically, it seems vaguely extractive in nature. Never did I see it summarize a concept in new words, or try to dumb down a complicated concept further than the original paper did. Given the results of GPT3 I would think that it should be possible to do much better by now, at least with enough data and compute time.
I think you’re underestimating how hard what you’re describing is. GPT-3 can mimic the language of reasoning but that doesn’t mean it’s capable of higher order reasoning.
It’s impressive but doesn’t the need for a “good” prompt kind of show it doesn’t have the strong reasoning required to do the task you’re describing? Also there’s some interesting critique here https://www.lesswrong.com/posts/ZHrpjDc3CepSeeBuE/gpt-3-a-di...
Being very efficient at mostly extractive summarization and abstaining from abstractive summarization does seem a better bet though, because fewer things can go wrong and it is easier to check the summaries against the full text.
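One rough way to do that kind of check is to measure what fraction of the summary's word n-grams also occur in the full text. The sketch below is only an illustration under my own assumptions: the function names (`tokens`, `ngrams`, `copied_fraction`) and the choice of trigrams are hypothetical, and this is not a metric taken from the paper.

```python
import re


def tokens(text):
    # Lowercased alphanumeric tokens; punctuation is ignored.
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(words, n):
    # All contiguous runs of n words, as tuples.
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def copied_fraction(summary, source, n=3):
    """Fraction of the summary's n-grams that also occur in the source.

    Close to 1.0 means the summary is mostly stitched from source fragments
    (extractive); close to 0.0 means it is mostly new wording (abstractive).
    """
    summary_grams = ngrams(tokens(summary), n)
    if not summary_grams:
        return 0.0  # summary shorter than n words: nothing to compare
    source_grams = set(ngrams(tokens(source), n))
    hits = sum(1 for g in summary_grams if g in source_grams)
    return hits / len(summary_grams)
```

A high score does not prove the summary is faithful, since fragments can be recombined misleadingly, but a low score is a cheap signal that a human should read the summary against the paper.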
While there are lots of TLDR websites out there, I want to know how this one is different from them. I get it; many scientific papers are to some extent bs, and many are just wrong. For PhDs, it's a hassle to go through all of that bs to find something that is actually true. I feel like PhDs basically have to spend hundreds of hours reading papers that don't really benefit them. Tools like this could probably help with that, but as long as scientific success is measured by how many papers you've published and/or how long your papers are, I don't see any hope of actually doing science in the coming years, when academia will be essentially "saturated" with papers.
In this particular case it's excusable as English is the Lingua Franca of science today. Used to be Latin, then French and German, now it's English. No big deal IMO.
In fact I kind of like the way this is going since it represents a fantastic opportunity for NL researchers to stand out simply by publishing research and corpora focused exclusively on low-resource languages and non-English/Mandarin in general.
It is also important to note that most of the ML research in the field is pretty much language agnostic and is concerned with general concepts such as efficient en-/decoding [1], training methods [2], and even stealing pre-trained weights from APIs (like GPT-2 or even 3) without paying for training [3] :)
It's just easier to get your hands on and verify English corpora, results and pre-trained models for reproducibility than, say, Mongolian or Gaelic, so that's a factor, too.
I am afraid we will still need humans actually going through papers to get nuances, original contributions, if any, and wider narratives? (plug: the content project in my profile)