Lists of Datasets
- DATASET Create with AI
- List of image and video datasets - Qiita
- arXivTimes/datasets at master · arXivTimes/arXivTimes · GitHub
Web access
- UCI Machine Learning Repository: Amazon Access Samples Data Set
- Publicly available access.log datasets · GitHub
- Common Crawl
- ICWSM Weblog Dataset
Type of data
- user behavioral data
- page view log
- click log
- system log
- logs generated by the middleware or machines that make up the system, such as syslog
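As a rough illustration of what a page view log record contains, here is a minimal sketch that parses one line of a web server access log. It assumes the lines follow the Apache "combined" log format; the datasets listed above may use other formats. The names `COMBINED_LOG` and `parse_access_log_line` and the sample line are hypothetical and not taken from any of the listed datasets.

```python
import re
from datetime import datetime

# Assumption: the log uses the Apache "combined" format. The trailing
# referer and user-agent fields are optional so that "common" format
# lines are accepted too.
COMBINED_LOG = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)


def parse_access_log_line(line):
    """Return a dict describing one request, or None if the line does not match."""
    m = COMBINED_LOG.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    # Timestamp looks like 10/Oct/2000:13:55:36 -0700 (English month names assumed).
    rec["time"] = datetime.strptime(rec["time"], "%d/%b/%Y:%H:%M:%S %z")
    rec["status"] = int(rec["status"])
    # A size of "-" means no response body was sent.
    rec["size"] = 0 if rec["size"] == "-" else int(rec["size"])
    # The request field is usually "METHOD PATH PROTOCOL".
    parts = rec["request"].split()
    if len(parts) >= 2:
        rec["method"], rec["path"] = parts[0], parts[1]
    else:
        rec["method"], rec["path"] = None, None
    return rec


# Hypothetical sample line, for illustration only.
sample = (
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
    '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
    '"http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"'
)
record = parse_access_log_line(sample)
```

Each parsed record can then be treated as one page view: `record["path"]` is the page requested and `record["time"]` is when it was requested.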
Images
- GitHub - openimages/dataset: The Open Images dataset
- COCO - Common Objects in Context
- SUN Database
- Scene recognition dataset
- MIT Places Database for Scene Recognition
- Creative Commons
- License text differs from that of Places2
- A large-scale scene recognition dataset
- Places Database
- Creative Commons
- Places2: A Large-Scale Database for Scene Understanding
- Creative Commons
- License text differs from that of Places
- scene recognition
- An improved version of Places
- A dataset of 10 million images
- Creative Commons
Annotation tool
Food
- im2recipe
- registration required
- no commercial usage
- Commercial use is prohibited; educational and research purposes only
- redistribution is not permitted
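- Redistribution is generally prohibited; colleagues who receive the data for research purposes must agree to the terms and conditions
- Only an email address issued by an organization can be used to register
- email addresses that do not belong to an organization (e.g. gmail, live.com) are not allowed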
- Food-101 – Mining Discriminative Components with Random Forests
- Food Image Dataset MMSPG
Geographical
Weather
Advertisement
- UCI Machine Learning Repository: Internet Advertisements Data Set
- The features encode the geometry of the image (if available) as well as phrases occurring in:
- the URL
- the image’s URL and alt text
- the anchor text
- words occurring near the anchor text
- The task is to predict whether an image is an advertisement (“ad”) or not (“nonad”).
- Webscope | Yahoo Labs
- Outreach - Criteo AI Lab
- Kaggle Display Advertising Challenge Dataset - Criteo Labs
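For the Internet Advertisements data set described above, a minimal sketch of turning one row into a feature vector and an ad / nonad label might look like the following. It assumes each row is comma-separated, that missing values are written as `?`, and that the last field is the label `ad.` or `nonad.`; check these against the data set's own documentation. The function name `parse_ad_row` and the shortened sample rows are hypothetical (the real rows are much longer).

```python
def parse_ad_row(line):
    """Split one comma-separated row into (features, is_ad).

    Assumptions: missing values appear as "?", and the final field is
    the class label, either "ad." or "nonad.".
    """
    fields = [f.strip() for f in line.strip().split(",")]
    *raw_features, label = fields
    # Keep missing values as None so the caller can decide how to impute them.
    features = [None if v == "?" else float(v) for v in raw_features]
    label = label.rstrip(".")
    if label not in ("ad", "nonad"):
        raise ValueError(f"unexpected label: {label!r}")
    return features, label == "ad"


# Hypothetical, shortened rows for illustration only.
ad_features, ad_label = parse_ad_row("125, 125, 1.0, 1, 0, ad.")
nonad_features, nonad_label = parse_ad_row("?, ?, ?, 0, 1, nonad.")
```

Leaving missing geometry values as `None` keeps the parsing step separate from whatever imputation strategy the classifier needs.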
Reference
- List of open data usable for machine learning (updated as needed) - Beginning AI
- Quandl Financial, Economic and Alternative Data
- DataSet - "朱鷺の杜Wiki", a wiki on machine learning
- GitHub - caesar0301/awesome-public-datasets: An awesome list of high-quality open datasets in public domains (on-going). By everyone, for everyone!
- Discover Published Data — Google Genomics v1 documentation
- Datasets for Data Mining and Data Science
- Open Datasets | Skymind