We've trained a large-scale unsupervised language model which generates coherent paragraphs of text, achieves state-of-the-art performance on many language modeling benchmarks, and performs rudimentary reading comprehension, machine translation, question answering, and summarization, all without task-specific training. Our model, called GPT-2 (a successor to GPT), was trained simply to predict the next word in 40GB of Internet text. Due to our concerns about malicious applications of the technology, we are not releasing the trained model. As an experiment in responsible disclosure, we are instead releasing a much smaller model for researchers to experiment with, as well as a technical paper.

GPT-2 is a large transformer-based language model with 1.5 billion parameters, trained on a dataset of 8 million web pages. GPT-2 is trained with a simple objective: predict the next word, given all of the previous words within some text. The diversity of the dataset causes this simple goal to contain naturally occurring demonstrations of many tasks across diverse domains. GPT-2 is a direct scale-up of GPT, with more than 10X the parameters and trained on more than 10X the amount of data.
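To make the objective concrete, here is a minimal sketch of next-word prediction with the publicly released small GPT-2 checkpoint. It uses the community `transformers` library rather than the original release code, and the model name `"gpt2"` and the example prompt are assumptions for illustration only.

```python
# Minimal sketch (not the original release code): next-word prediction with
# the small GPT-2 checkpoint via the community "transformers" library.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")   # assumed checkpoint name
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

text = "GPT-2 is a large transformer-based language model"  # hypothetical prompt
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # With labels equal to the inputs, the model returns the average
    # cross-entropy of predicting each next token from the previous ones --
    # exactly the "predict the next word" training objective.
    outputs = model(**inputs, labels=inputs["input_ids"])

print(f"next-token loss: {outputs.loss.item():.3f}")

# The same objective drives generation: repeatedly sample the next token
# given everything produced so far.
generated = model.generate(inputs["input_ids"], max_length=30,
                           do_sample=True, top_k=40)
print(tokenizer.decode(generated[0]))
```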