Giant, free index to world’s research papers released online

Catalogue of billions of phrases from 107 million papers could ease computerized searching of the literature.

Catalogue at: The General Index (Public Resource, Internet Archive)

The General Index consists of three tables derived from 107,233,728 journal articles. The first is a table of n-grams, from unigrams up to 5-grams, extracted using spaCy; each of its 355,279,820,087 rows pairs an n-gram with a journal article id. The second table is built with YAKE and has 19,740,906,314 rows, each pairing a keyword with an article id. The third table associates each article id with its metadata.
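To make the n-gram table layout concrete, here is a minimal Python sketch of how (n-gram, article id) rows for 1- to 5-grams could be produced and turned into a phrase lookup joined against a metadata table. It is not the General Index's actual pipeline: a simple lowercase regex tokenizer stands in for spaCy, emitting one row per distinct n-gram per article is an assumption, and `ngram_rows`, `build_index` and `search` are hypothetical names chosen for illustration.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Assumption: a crude lowercase word tokenizer standing in for spaCy.
    return re.findall(r"[a-z0-9]+", text.lower())


def ngram_rows(article_id: str, text: str, max_n: int = 5) -> list[tuple[str, str]]:
    """Return one (n-gram, article_id) row per distinct 1- to max_n-gram."""
    tokens = tokenize(text)
    grams = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    # Assumption: deduplicate within an article, keeping first-seen order.
    return [(gram, article_id) for gram in dict.fromkeys(grams)]


def build_index(rows: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Invert the row table: n-gram -> set of article ids containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for gram, article_id in rows:
        index[gram].add(article_id)
    return index


def search(index: dict[str, set[str]], metadata: dict[str, str], phrase: str) -> list[str]:
    """Look up a phrase (normalized like the text) and join ids to metadata."""
    key = " ".join(tokenize(phrase))
    return [metadata[aid] for aid in sorted(index.get(key, set()))]


if __name__ == "__main__":
    # Hypothetical toy corpus; the metadata table here is just a title per id.
    metadata = {
        "a1": "Deep learning for protein folding",
        "a2": "Protein folding kinetics",
    }
    rows = [row for aid, text in metadata.items() for row in ngram_rows(aid, text)]
    index = build_index(rows)
    print(search(index, metadata, "Protein Folding"))
    # prints ['Deep learning for protein folding', 'Protein folding kinetics']
```

Because a document of k tokens yields up to 5k n-grams, the row count grows several times faster than the token count, which is consistent with the n-gram table being far larger than the keyword table.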



This is pretty cool. Thanks for posting.