High-quality training datasets, made available in one place.
A place to easily discover and discuss open datasets.

We are adding new datasets daily! To contribute, click here.


Open Source Biometric Recog...

A communal biometrics framework supporting the development of open algorithms and reproducible evaluations. OpenBR is a framework for investigating new...

face detection, biometric, age estimation, gender estimation

Netflix Prize

Netflix released an anonymized version of their movie rating dataset; it consists of 100 million ratings, done by 480,000 users who have rated between 1...

ranking, movie

Uber 2B trip data

Uber Movement provides anonymized data from over two billion trips to help urban planning around the world. You need to sign up to download this data.

uber, urban planning, trips

MNIST handwritten digits

MNIST: handwritten digits: The most commonly used sanity check. Dataset of 28x28, centered, B&W handwritten digits. It is an easy task, just because some...


Google Audioset

AudioSet consists of an expanding ontology of 632 audio event classes and a collection of 2,084,320 human-labeled 10-second sound clips drawn from YouTu...

google, vehicle, music, speech


The WikiText language modeling dataset is a collection of over 100 million tokens extracted from the set of verified Good and Featured articles on Wikip...

language modeling, wiki

Searchable Machine Learning Datasets

It is hard to find relevant datasets for the machine learning problem you are working on. We believe researchers should focus on improving models and innovating in AI. We are here to help with the time-consuming piece: finding datasets.


1000+ Datasets

Launching with 1000+ datasets across multiple fields, with the goal of continuously increasing this number.


A community for AI researchers & developers to learn and share their knowledge of datasets and models.

Better Search

We are focussed on increasing the searchability of datasets, both by better underlying metadata and better UI.


Coming Soon! We are exploring a dataset marketplace to further simplify the process of acquiring datasets.

If you have any feedback or any features you would like us to build, please send an email to


Oxford Audiovisual Segmenta...

This dataset consists of RGB-D videos in indoor scenes, and has dense, per-frame segmentation labels for both object and material categories. Moreover, ...

Depth, Objects, Semantic Segmentation, Materials, RGB-D, Scene Understanding, Audio-visual, Audio, Places, Scenes

Mapillary Vistas

Mapillary Vistas is currently the largest publicly available street view image dataset. Stats: 25,000 Images | 100 Categories | 60 Instance-wise C...




DukeMTMC-reID is a subset of the DukeMTMC for image-based re-identification, in the format of the Market-1501 dataset. There are 16,522 training images ...

person re-ID, person re-identification, person search, pedestrian retrieval

Gastrointestinal Lesions in...

This dataset contains the features extracted from a database of colonoscopic videos showing gastrointestinal lesions. It also contains the ground truth ...

multivariate, classification

TTC-3600: Benchmark dataset...

The dataset consists of a total of 3600 documents including 600 news/texts from six categories economy, culture-arts, health, politics, sports and tech...

text, classification, clustering


Sign up to start collaborating, contributing, and participating in dataset discussions! You can always browse the datasets without signing up.