Top 10 Open-Source Datasets for Computer Vision Projects

NOV 03, 2021

These are some of the widely used open-source datasets for computer vision.

Computer vision is currently one of the most exciting domains in the tech world. A major component of many AI- and ML-driven applications and platforms, it is transforming almost every industry by upgrading the traditional ways in which businesses and machines work. Its applications range from autonomous object detection and image captioning to video frame analytics and medical image analysis. In this article, we discuss the most widely used open-source datasets for computer vision.

• ImageNet: ImageNet is an ongoing research effort that aims to provide researchers with an easily accessible image database, and it remains one of the best-known image datasets among researchers and learners alike. It is organized according to the WordNet hierarchy, and it aims to provide an average of 1,000 images to illustrate each synset (WordNet concept).

• CIFAR-10 and CIFAR-100: CIFAR-10 and CIFAR-100 are collections of small (32x32) colour images widely used by beginners to train machine learning and computer vision algorithms; CIFAR-10 has 10 classes and CIFAR-100 has 100. They are also among the most popular datasets for quick comparison of algorithms, since they expose a model's strengths and weaknesses without putting much burden on parameter tuning.
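
The distributed CIFAR files store each image as a flat vector of 3,072 byte values: the first 1,024 are the red channel, then green, then blue, each a 32x32 grid in row-major order. The sketch below, using a synthetic vector rather than a real batch file, shows how such a row maps back to pixels:

```python
def unflatten_cifar_image(flat):
    """Convert a flat 3072-value CIFAR row into a 32x32 grid of (R, G, B) tuples."""
    assert len(flat) == 3072
    # Channel-planar layout: red plane, then green plane, then blue plane.
    r, g, b = flat[0:1024], flat[1024:2048], flat[2048:3072]
    return [
        [(r[row * 32 + col], g[row * 32 + col], b[row * 32 + col]) for col in range(32)]
        for row in range(32)
    ]

# Demo on a synthetic row where each value encodes its channel index.
flat = [0] * 1024 + [1] * 1024 + [2] * 1024
image = unflatten_cifar_image(flat)
print(image[0][0])  # (0, 1, 2)
```

In practice, libraries such as torchvision or Keras handle this unpacking for you; the point here is just the underlying record layout.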

• MS COCO: The MS COCO (Microsoft Common Objects in Context) dataset consists of 328K images. It provides annotations for object detection, keypoint detection, panoptic segmentation, captioning, and dense human pose estimation.
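
COCO ships its annotations as a single JSON file whose `images`, `annotations`, and `categories` lists are linked by integer ids, with bounding boxes as `[x, y, width, height]`. The sketch below builds a minimal, hypothetical file in that layout (the file name and category are invented) and resolves the links with only the standard library:

```python
import json

# A minimal, made-up annotation file following COCO's JSON layout.
coco = {
    "images": [{"id": 1, "file_name": "000001.jpg", "width": 640, "height": 480}],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 3,
         "bbox": [120.0, 80.0, 60.0, 40.0],  # [x, y, width, height]
         "area": 2400.0, "iscrowd": 0}
    ],
    "categories": [{"id": 3, "name": "car", "supercategory": "vehicle"}],
}

data = json.loads(json.dumps(coco))  # round-trip, as if read from disk
cats = {c["id"]: c["name"] for c in data["categories"]}
for ann in data["annotations"]:
    x, y, w, h = ann["bbox"]
    print(f"image {ann['image_id']}: {cats[ann['category_id']]} at ({x}, {y}), {w}x{h}")
    # image 1: car at (120.0, 80.0), 60.0x40.0
```

The official `pycocotools` package wraps exactly this id-joining logic behind helpers such as index lookups by image or category.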

• MPII Human Pose: This dataset is used to evaluate articulated human pose estimation. It consists of around 25K images containing over 40K people with annotated body joints. The images were extracted from YouTube videos and are provided with preceding and following unannotated frames. Overall, the dataset covers around 410 human activities, and each image is labelled with an activity.

• Berkeley DeepDrive: Berkeley DeepDrive (BDD100K) is a dataset mostly used for training autonomous-vehicle models. It comprises over 100K video sequences with diverse annotations such as object bounding boxes, drivable areas, image-level tags, and lane markings. The dataset also represents a wide variety of geographic, environmental, and weather conditions.

• CityScapes: CityScapes is a database of diverse stereo video sequences recorded in street scenes from 50 different cities. It includes semantic, instance-wise, dense pixel annotations for 30 classes grouped into 8 categories, providing pixel-level annotations for 5,000 frames along with 20,000 coarsely annotated frames.

• LabelMe: LabelMe is a project created to provide a dataset of annotated digital images. The platform is dynamic, free to use, and open to public contribution. The tool can be used anonymously or via a free account; users only need a web browser with JavaScript support.

• CheXpert: CheXpert is a large dataset of chest X-rays and a competition for automated chest X-ray interpretation. It features uncertainty labels in the training set and radiologist-labelled reference standards for the evaluation sets. Since chest radiography is the most common imaging examination for the management of life-threatening diseases, reliable automated interpretation of it could provide substantial clinical benefit.
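
CheXpert's training labels are distributed as a CSV where, for each observation, `1.0` means positive, `0.0` negative, `-1.0` uncertain, and a blank means unmentioned; the accompanying paper studies policies such as "U-Ones" and "U-Zeros" that map uncertain labels to positive or negative. The sketch below applies such a policy to a hypothetical two-row excerpt (the paths and column subset are invented for illustration):

```python
import csv
import io

# Hypothetical excerpt in the CheXpert CSV layout:
# blank = unmentioned, 1.0 = positive, 0.0 = negative, -1.0 = uncertain.
sample = """Path,Cardiomegaly,Edema
patient00001/study1/view1_frontal.jpg,1.0,-1.0
patient00002/study1/view1_frontal.jpg,,0.0
"""

def read_labels(text, policy="U-Ones"):
    """Map uncertain (-1.0) labels to 1 ("U-Ones") or 0 ("U-Zeros"); blanks become None."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        labels = {}
        for col, val in row.items():
            if col == "Path":
                continue
            if val == "":
                labels[col] = None  # observation not mentioned in the report
            elif float(val) == -1.0:
                labels[col] = 1 if policy == "U-Ones" else 0
            else:
                labels[col] = int(float(val))
        rows.append((row["Path"], labels))
    return rows

print(read_labels(sample)[0])
# ('patient00001/study1/view1_frontal.jpg', {'Cardiomegaly': 1, 'Edema': 1})
```

Which uncertainty policy works best varies per pathology, so treating it as a per-column choice rather than a global constant is a reasonable refinement.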

• Flickr-30K: Flickr-30K has become a standard benchmark for image captioning. Its annotations enable the localization of textual entities mentioned in an image. The dataset contains 31,000 images collected from Flickr, each paired with 5 reference sentences written by human annotators.

• LSUN: LSUN (Large-Scale Scene Understanding) contains close to one million labelled images and aims to provide a new benchmark for large-scale scene classification and understanding. It covers 10 scene categories such as dining room, bedroom, and outdoor church.