- data
- text
- document
- bin
- idx
- pile
- ArXivDataset
- BookCorpusDataset
- Books3Dataset
- DMMathDataset
- EnronEmailsDataset
- EuroParlDataset
- ExPorterDataset
- FreeLawDataset
- GithubDataset
- GutenbergDataset
- HackerNewsDataset
- OpenWebText2Dataset
- OpensubtitlesDataset
- PhilPapersDataset
- PubMedCentralDataset
- PubMedDataset
- StackExchangeDataset
- USPTODataset
- UbuntuIRCDataset