EVALUATION OF THE USE OF A SIMILARITY THESAURUS FOR QUERY EXPANSION IN INDONESIAN-LANGUAGE INFORMATION RETRIEVAL SYSTEMS
EVALUASI PENGGUNAAN SIMILARITY THESAURUS TERHADAP EKSPANSI KUERI DALAM SISTEM TEMU KEMBALI INFORMASI BERBAHASA INDONESIA
Keywords:
query expansion, similarity thesaurus
Abstract
Terms, or tokens, are the main component of an information retrieval system. Their use as index and
query terms affects the performance of the system. This research observes how a similarity thesaurus
improves the performance of an Indonesian-language information retrieval system through query expansion. Using 30
queries and 1,000 documents, a series of tests was conducted with different query term weights to
measure the performance of the system before and after query expansion. The terms used for query expansion
are determined using cosine similarity and the weights of the query terms. Two
treatments were used: taking the 5 (TH5) or the 10 (TH10) terms with the highest similarity to
the query. Overall, query expansion improves the performance of the system compared to the
system without query expansion (NoTH). However, the improvement also depends on the weights of the terms in the query. In three
experiments combining NoTH, TH5, and TH10, the results show that idf proved to be the better choice of weight
for the query terms, improving the performance of the system both with and without
query expansion.
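
The expansion step described above can be illustrated with a minimal sketch. It is not the author's implementation. It assumes that each term is represented as a vector of raw term frequencies over the documents, that a candidate term's score is the sum of the cosine similarities to each query term multiplied by that query term's idf weight, and that the top k candidates are appended to the query (k = 5 for TH5, k = 10 for TH10). The function names and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def term_vectors(docs):
    """Map each term to a sparse {doc_id: term frequency} vector (assumed representation)."""
    vectors = {}
    for doc_id, text in enumerate(docs):
        for term, tf in Counter(text.lower().split()).items():
            vectors.setdefault(term, {})[doc_id] = tf
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[d] for d, w in u.items() if d in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def idf_weights(vectors, n_docs):
    """idf = log(N / df), where df is the number of documents containing the term."""
    return {term: math.log(n_docs / len(vec)) for term, vec in vectors.items()}


def expand_query(query_terms, vectors, weights, k):
    """Append the k candidate terms with the highest weighted similarity to the query."""
    scores = {}
    for cand, cand_vec in vectors.items():
        if cand in query_terms:
            continue
        # Assumed scoring: sum over query terms of weight * cosine(query term, candidate).
        scores[cand] = sum(
            weights.get(q, 0.0) * cosine(vectors[q], cand_vec)
            for q in query_terms
            if q in vectors
        )
    # Sort by descending score; break ties alphabetically for a deterministic order.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return list(query_terms) + ranked[:k]


# Toy corpus (hypothetical), only to show the call pattern.
docs = [
    "sistem temu kembali informasi",
    "temu kembali dokumen",
    "harga beras naik",
    "harga pasar",
]
vectors = term_vectors(docs)
weights = idf_weights(vectors, len(docs))
expanded = expand_query(["informasi"], vectors, weights, k=1)
print(expanded)
```

Replacing `idf_weights` with a different weighting function is the only change needed to compare query term weights under the same expansion procedure.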
License
Copyright (c) 2023 Fridolin Febrianto Paiki
This work is licensed under a Creative Commons Attribution 4.0 International License.