Classification and similarity detection of Indonesian scientific journal articles

Nyimas Sabilina Cahyani, Deris Stiawan, Abdiansah Abdiansah, Nurul Afifah, Dendi Renaldo Permana

Abstract


The development of technology is accelerating in finding references to scientific articles or journals related to research topics. One of the sources of national aggregator services to find references is Garba Rujukan Digital (GARUDA), developed by the Ministry of Education, Culture, Research, and Technology (Kemendikbudristek) of the Republic of Indonesia. The naïve Bayes method classifies articles into several categories based on titles and abstracts. The system achieves an F1-score of 98%, which indicates high classification accuracy, and the classification process takes less than 60 minutes. Article similarity detection is done using the cosine similarity method, and a similarity score of 0.071 reflects the degree of similarity between the title and the abstract that has been concatenated, while a score close to 1 indicates a higher similarity. Searching for similar scientific articles based on title and abstract, sort articles based on the results of the highest similarity score are the most similar articles, and generating article categories. The results of the research show that the proposed method significantly improves the classification and search processes in GARUDA, as well as accurate and efficient similarity detection.

Keywords


Classification; Cosine similarity; GARUDA; Naïve Bayes; Similarity

Full Text:

PDF


DOI: https://doi.org/10.11591/csit.v6i2.p147-158

Refbacks

  • There are currently no refbacks.


Computer Science and Information Technologies
p-ISSN: 2722-323X, e-ISSN: 2722-3221
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Universitas Ahmad Dahlan (UAD).

CSIT Visitor Stats

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.