TY - GEN
T1 - Matching Ottoman words
AU - Ataer, Esra
AU - Duygulu, Pinar
PY - 2007
Y1 - 2007
N2 - Large archives of Ottoman documents pose a challenge to historians all over the world. These archives remain inaccessible because manual transcription of such a huge volume is difficult. Automatic transcription is required, but due to the characteristics of Ottoman documents, character-recognition-based systems may not yield satisfactory results. It is also desirable to store the documents in image form, since they may contain important drawings, especially signatures. For these reasons, in this study we treat the problem as an image retrieval problem, viewing Ottoman words as images, and we propose a solution based on image matching techniques. The bag-of-visterms approach, which has been shown to be successful in classifying objects and scenes, is adapted for matching word images. Each word image is represented by a set of visual terms obtained by vector quantization of SIFT descriptors extracted from salient points. Similar words are then matched based on the similarity of the distributions of their visual terms. The experiments are carried out on printed and handwritten documents containing over 10,000 words. The results show that the proposed system is able to retrieve words with high accuracy and capture the semantic similarities between words.
AB - Large archives of Ottoman documents pose a challenge to historians all over the world. These archives remain inaccessible because manual transcription of such a huge volume is difficult. Automatic transcription is required, but due to the characteristics of Ottoman documents, character-recognition-based systems may not yield satisfactory results. It is also desirable to store the documents in image form, since they may contain important drawings, especially signatures. For these reasons, in this study we treat the problem as an image retrieval problem, viewing Ottoman words as images, and we propose a solution based on image matching techniques. The bag-of-visterms approach, which has been shown to be successful in classifying objects and scenes, is adapted for matching word images. Each word image is represented by a set of visual terms obtained by vector quantization of SIFT descriptors extracted from salient points. Similar words are then matched based on the similarity of the distributions of their visual terms. The experiments are carried out on printed and handwritten documents containing over 10,000 words. The results show that the proposed system is able to retrieve words with high accuracy and capture the semantic similarities between words.
UR - https://www.scopus.com/pages/publications/50249128754
U2 - 10.1109/SIU.2007.4298650
DO - 10.1109/SIU.2007.4298650
M3 - Conference contribution
AN - SCOPUS:50249128754
SN - 1424407192
SN - 9781424407194
T3 - 2007 IEEE 15th Signal Processing and Communications Applications, SIU
BT - 2007 IEEE 15th Signal Processing and Communications Applications, SIU
T2 - 2007 IEEE 15th Signal Processing and Communications Applications, SIU
Y2 - 11 June 2007 through 13 June 2007
ER -