MACHINE LEARNING DATASETS

High-quality training datasets, made available in one place.
A place to easily discover and discuss open datasets.

We are adding new datasets daily! To contribute, click here.

POPULAR DATASETS

Open Source Biometric Recog...

A communal biometrics framework supporting the development of open algorithms and reproducible evaluations. OpenBR is a framework for investigating new...

face detection, biometric, age estimation, gender estimation

Netflix Prize

Netflix released an anonymized version of its movie rating dataset; it consists of 100 million ratings, made by 480,000 users who have rated between 1...

ranking, movie

Uber 2B trip data

Uber Movement provides anonymized data from over two billion trips to help urban planning around the world. You need to sign up to download this data.

uber, urban planning, trips

MNIST handwritten digits

MNIST handwritten digits: the most commonly used sanity check. A dataset of 28x28, centered, black-and-white handwritten digits. It is an easy task just because some...

natural-image

Google AudioSet

AudioSet consists of an expanding ontology of 632 audio event classes and a collection of 2,084,320 human-labeled 10-second sound clips drawn from YouTu...

google, vehicle, music, speech

WikiText

The WikiText language modeling dataset is a collection of over 100 million tokens extracted from the set of verified Good and Featured articles on Wikip...

language modeling, wiki

Searchable Machine Learning Datasets

It is hard to find relevant datasets for the machine learning problem you are working on. We believe researchers should focus on improving models and innovating in AI. We are here to help with the time-consuming task of finding datasets.


1000+ Datasets

Launching with 1000+ datasets across multiple fields, with the goal of continuously increasing this number.

Collaborate

A community for AI researchers & developers to learn and share their knowledge of datasets and models.

Better Search

We are focused on making datasets easier to search, through both better underlying metadata and a better UI.

Marketplace

Coming Soon! We are exploring the option of creating a dataset marketplace to further simplify the process of acquiring datasets.

If you have any feedback or features you would like us to build, please send an email to hello@classif.ai.

NEWEST DATASETS

UNIMIB2016 Food Database

This database can be used for food recognition and segmentation. The database is composed of 1,027 tray images with multiple foods and containing 73 foo...

food segmentation, Food recognition

RawFooT DB: Raw Food Textur...

The Raw Food Texture database (RawFooT) has been specially designed to investigate the robustness of descriptors and classification methods with respect...

texture, food

Oxford Audiovisual Segmenta...

This dataset consists of RGB-D videos in indoor scenes, and has dense, per-frame segmentation labels for both object and material categories. Moreover, ...

Depth, Objects, Semantic Segmentation, Materials, RGB-D, Scene Understanding, Audio-visual, Audio, Places, Scenes

Mapillary Vistas

Mapillary Vistas is currently the largest publicly available street-view image dataset. Stats: 25,000 Images | 100 Categories | 60 Instance-wise C...


DukeMTMC-reID

DukeMTMC-reID is a subset of the DukeMTMC for image-based re-identification, in the format of the Market-1501 dataset. There are 16,522 training images ...

person re-ID, person re-identification, person search, pedestrian retrieval

GET STARTED!

Sign up to start collaborating, contributing, and participating in the dataset discussions! You can always browse the datasets without signing up.