A Benchmark and Large Dataset for Trademark Retrieval
METU Dataset

v1 (a v2 of this dataset page is also available)

Introduction

The METU Dataset is a large trademark dataset (the largest publicly available logo dataset as of 2014) composed of more than 900K real logos belonging to real companies worldwide. It also includes query sets of varying difficulty, allowing trademark retrieval researchers to benchmark their methods against each other and thereby advance the field.

Trademark Dataset

The dataset includes 930,053 logo images of different types: text-only logos, figure-only logos and combined text-and-figure logos. See Figure 1 for samples.

Figure 1: Trademark samples from the main dataset

Query Set

Our dataset includes very challenging queries that existing Computer Vision, Pattern Recognition and Image Retrieval methods have difficulty with (this study is ongoing). See Figure 2 or query_set for sample queries and the similarities that are expected to be discovered.
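This page does not specify how a method's results on the query set should be scored. As a minimal sketch, assuming each query comes with a set of logos that are expected to be retrieved, one common retrieval measure is the normalized average rank (NAR): 0.0 means all expected logos are ranked first, and about 0.5 corresponds to a random ordering. The function name `normalized_average_rank` and the logo identifiers below are hypothetical, and NAR is offered only as an illustration, not as the benchmark's official protocol.

```python
from typing import Sequence, Set


def normalized_average_rank(ranking: Sequence[str], relevant: Set[str]) -> float:
    """Hypothetical scorer: normalized average rank of the expected logos.

    `ranking` is the full list of logo IDs returned for one query, best first.
    `relevant` is the set of logo IDs expected to be retrieved for that query.
    Returns 0.0 for a perfect ranking; values near 0.5 indicate a random order.
    """
    n = len(ranking)
    n_rel = len(relevant)
    if n == 0 or n_rel == 0:
        raise ValueError("ranking and relevant set must be non-empty")
    # 1-based positions at which the expected logos were retrieved
    ranks = [i + 1 for i, logo_id in enumerate(ranking) if logo_id in relevant]
    if len(ranks) != n_rel:
        raise ValueError("every relevant logo must appear in the ranking")
    # Best achievable rank sum: 1 + 2 + ... + n_rel
    ideal = n_rel * (n_rel + 1) / 2
    return (sum(ranks) - ideal) / (n * n_rel)


# Usage with made-up logo IDs: the two expected logos land at ranks 1 and 4.
print(normalized_average_rank(["q1", "a", "b", "q2", "c"], {"q1", "q2"}))
```

Subtracting the ideal rank sum makes the score independent of how many logos are expected per query, so scores from queries with different numbers of similar logos can be averaged over the whole query set.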

Table 1: Details of METU dataset

Aspect Value
Trademarks 930,372
Unique registering firms 409,834
Unique trademarks 691,149
Trademarks containing text only 589,562
Trademarks containing figure only 19,394
Trademarks containing figure and text 312,154
Other trademarks 8,942
Image format JPEG
Max resolution 1800x1800 pixels
Min resolution 30x30 pixels

Why Another Dataset?

The literature already contains a few logo datasets: the MPEG 7 Shape Matching Dataset, the UMD Logo Dataset, the BelgaLogos Dataset and the Tobacco800 Document Image Database. Although these datasets have been very useful in logo retrieval and matching studies, they are limited in the number of images (see Table 2) and in the types of queries that can be performed. Therefore, to advance the logo retrieval field, a large and challenging dataset is required, and we hope that the METU Dataset will fill this gap.

Table 2: Comparison of existing logo datasets.

Logo dataset Number of images
MPEG 7 Shape Matching Dataset 1,400
UMD Logo Dataset 106
BelgaLogos Dataset 10,000
Tobacco800 Document Image Database 1,290
METU Dataset (this set) 930,053

Figure 2: Samples of similar trademark sets from the query set

Download Instructions

For the time being, the dataset is available on request only. If you are interested in the dataset for research purposes, please contact Sinan Kalkan, stating your intended use.

Copyright Notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.

Citation

Please cite the following paper if you use this dataset:

O. Tursun, S. Kalkan, "A Challenging Big Dataset for Benchmarking Trademark Retrieval", 14th IAPR Conference on Machine Vision and Applications, 2015.

Acknowledgements

This work is partially funded by the Ministry of Science, Turkey, under the project SANTEZ-0029.STZ.2013-1. We would like to thank Usta Bilgi Sistemleri A.Ş. and Grup Ofis Marka Patent A.Ş. for their contributions to the project, especially for making the dataset available to us and to the whole community.

Contacts