The Caltech pedestrian dataset (http://www.vision.caltech.edu/Image_Datasets/CaltechPedestrians/) is now often used as a benchmark in computer vision. However, the dataset is stored in an unusual format, so it is not easy to handle. It is easier to handle in Matlab, but converting it for use from Python, for deep learning for example, is troublesome. I developed conversion tools, so I publish them here.
Pedestrian detection, and this dataset in particular, is known as a difficult problem and benchmark. This dataset is much larger than other pedestrian datasets, so it suits cases that require a large amount of data, such as deep learning.
Video conversion
The Caltech video is in the so-called "seq" format. A program that converts it to a format readable by Python programs is available on the page titled "reading .seq files from caltech pedestrian dataset".
I used this program and found that it cannot read the last frame of each file correctly, so an error occurs. The other frames are read correctly, however, so I modeled my own file handling on this program.
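As a rough illustration of frame extraction that tolerates a broken last frame, the sketch below splits a byte buffer into JPEG images by searching for the JFIF start marker and skips any chunk that has no end-of-image marker. It assumes that the frames inside a seq file are stored as ordinary JPEG images, which is not stated above. The function name `split_jpeg_frames` and the marker constants are illustrative and are not taken from the referenced program.

```python
# Assumption: each frame is an embedded JPEG that starts with a JFIF header.
JPEG_START = b"\xff\xd8\xff\xe0\x00\x10JFIF"
JPEG_END = b"\xff\xd9"  # JPEG end-of-image marker


def split_jpeg_frames(data: bytes) -> list:
    """Return the JPEG images found in data, skipping incomplete ones."""
    # Collect the offset of every JFIF header in the buffer.
    starts = []
    pos = data.find(JPEG_START)
    while pos != -1:
        starts.append(pos)
        pos = data.find(JPEG_START, pos + 1)

    frames = []
    for i, start in enumerate(starts):
        # A chunk runs up to the next header, or to the end of the buffer.
        end = starts[i + 1] if i + 1 < len(starts) else len(data)
        chunk = data[start:end]
        eoi = chunk.rfind(JPEG_END)
        if eoi == -1:
            # No end-of-image marker: treat the frame as truncated and skip it.
            continue
        frames.append(chunk[:eoi + len(JPEG_END)])
    return frames
```

With a real file, the buffer would come from something like `open(path, "rb").read()`, and each returned chunk could be written out as a `.jpg` file. Bytes between frames that happen to contain the end-of-image pattern would confuse this simple scan, so it is only a starting point.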
Conversion of bounding boxes
The Caltech dataset includes annotations. They contain bounding box information, that is, rectangles that enclose pedestrians. They are in the so-called "vbb" format, which is a binary Matlab format. A binary format is difficult to handle, so I converted the files into text format. The Matlab code called code3, which is linked from the Caltech dataset page, can be used for the conversion: the file named "vbb.m" in its directory provides two functions, one that reads a binary vbb file (A = vbbLoad(file)) and one that writes a text-format vbb file (vbbSaveTxt(A, textFileName, timeStamp)).
The files converted to text format can be processed further by my programs. Because pattern matching in Python is complicated, I used Perl to convert them to Python format. The following Perl program generates a Python program.
### Bounding box extractor for textual VBB file ###
#
# Public domain program
# coded by Yasusi Kanada
# 2015-6-22

open(input, "annotations/${ARGV[0]}/${ARGV[1]}vbb.txt");

print "${ARGV[0]}_${ARGV[1]}=[\n";
while (<input>) {
    if (/^lbl='(person(-fa|\?)?|people)'\s+str=(\d+)\s+end=(\d+)\s+hide=(\d+)/) {
        $type = $1;
        $str = $3;
        $end = $4;
        $hide = $5;
        $pos = '';
        $posv = '';
        $occl = '';
        $lock = '';
    }
    if (/^(pos|posv)\s*=\s*\[(([-\d\w\.\;\s])*)\]/) {
        $name = $1;
        $text = $2;
        $text =~ s/;\s+/\] \[/g;
        $text =~ s/\s+/, /g;
        $text = "[[${text}]]";
        $text =~ s/, \[\]//;
        if ($name eq 'pos') {
            $pos = $text;
        } else {
            $posv = $text;
        }
    } elsif (/^(occl|lock)\s*=\s*\[(([-\d\w\.\s])*)\]/) {
        $name = $1;
        $text = $2;
        $text =~ s/\s+/, /g;
        $text = "[${text}]";
        if ($name eq 'occl') {
            $occl = $text;
        } else {
            $lock = $text;
            print " \{'type':'$type', 'firstFrame':$str, 'lastFrame':$end, 'hide':$hide,\n";
            print " 'pos':$pos,\n 'posv':$posv,\n";
            print " 'occluded':$occl,\n 'lock':$lock\},\n";
        }
    }
}
print "]\n";
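For comparison, the rewriting of the pos and posv fields can also be sketched directly in Python. This is only an illustration. It assumes that a field line looks like `pos =[x y w h; x y w h; ]`, which is inferred from the regular expressions in the Perl program rather than from a format specification. The names `parse_matrix` and `parse_field` are hypothetical.

```python
import re

# Assumed line shape, inferred from the Perl regular expressions:
#   pos =[x y w h; x y w h; ]
FIELD = re.compile(r"^(pos|posv)\s*=\s*\[([^\]]*)\]")


def parse_matrix(text):
    """Turn 'a b c d; e f g h; ' into [[a, b, c, d], [e, f, g, h]]."""
    rows = []
    for row in text.split(";"):
        fields = row.split()
        if fields:  # skip the empty piece after the trailing ';'
            rows.append([float(value) for value in fields])
    return rows


def parse_field(line):
    """Return (name, rows) for a pos/posv line, or None for other lines."""
    match = FIELD.match(line)
    if match is None:
        return None
    return match.group(1), parse_matrix(match.group(2))
```

Parsing into Python lists directly avoids generating Python source text, at the cost of reading the annotation file every time it is needed.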
The generated program defines a list with one entry per pedestrian. The original file contains data for individual pedestrians (person), and also data for "people" and "person-fa". For each pedestrian, the file contains the first frame number, the last frame number, and the bounding box for each frame in between: the location (x and y) and the size (width and height). However, this program still has a bug: it cannot read some of the files.
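As an illustration of how the generated list might be used, the sketch below looks up the bounding boxes that are present in a given frame. The dictionary keys are those printed by the Perl program. The sample data, the variable name `sample_tracks`, and the function name `boxes_at_frame` are made up for this example, and it assumes that the boxes in 'pos' are stored in frame order starting at 'firstFrame'.

```python
# Hypothetical data in the shape the Perl program prints (values are made up).
sample_tracks = [
    {'type': 'person', 'firstFrame': 10, 'lastFrame': 12, 'hide': 0,
     'pos': [[100.0, 50.0, 20.0, 40.0],
             [102.0, 50.0, 20.0, 40.0],
             [104.0, 51.0, 20.0, 41.0]],
     'posv': [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]],
     'occluded': [0, 0, 0], 'lock': [0, 0, 0]},
    {'type': 'people', 'firstFrame': 12, 'lastFrame': 12, 'hide': 0,
     'pos': [[200.0, 60.0, 80.0, 90.0]],
     'posv': [[0, 0, 0, 0]],
     'occluded': [0], 'lock': [0]},
]


def boxes_at_frame(tracks, frame):
    """Return (type, x, y, width, height) for every entry visible in frame."""
    found = []
    for track in tracks:
        if track['firstFrame'] <= frame <= track['lastFrame']:
            # Assumption: pos[0] belongs to firstFrame, pos[1] to the next one, etc.
            index = frame - track['firstFrame']
            if index < len(track['pos']):
                x, y, w, h = track['pos'][index]
                found.append((track['type'], x, y, w, h))
    return found


print(boxes_at_frame(sample_tracks, 11))
```

The frame number passed in has to use the same numbering as 'firstFrame' and 'lastFrame'. Because the lookup works on the offset from 'firstFrame', it does not depend on whether that numbering starts at 0 or 1.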