Detecting Context Similarity Over Multiple Documents using Linguistic Features
We present a method for comparing multiple documents and determining whether they share the same context.
Context is defined by the setting of the events, statements, and ideas through which each document can be understood and assessed.
This study handles documents written in the Indonesian language. Three different approaches are used to determine
context similarity. Two of the approaches are adapted from the TF*IDF method, while the third extracts information
by evaluating the forms of the Indonesian language. With these methods, keywords are generated automatically by the
algorithm, requiring minimal human participation to obtain the desired result. Evaluation of the algorithm shows
results that distinguish matching from non-matching documents.
Index Terms: Context Similarity, Linguistics, Similarity of Documents
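
To make the TF*IDF idea mentioned in the abstract concrete, the following is a minimal sketch of how two documents might be compared by weighting their terms and measuring the cosine of the resulting vectors. It is an illustration under stated assumptions, not the variants used in this study: the tokenizer, the smoothed IDF formula, the helper names (tokenize, tf_idf_vectors, cosine), and the three sample Indonesian sentences are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase alphabetic tokens only; no stemming or stop-word removal.
    return re.findall(r"[a-z]+", text.lower())


def tf_idf_vectors(docs):
    """Return one {term: weight} dictionary per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: number of documents in which each term appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        vectors.append({
            # Relative term frequency times a smoothed IDF (an assumed variant,
            # chosen so that shared terms keep a positive weight in tiny corpora).
            term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical sample documents: two about rice prices, one about football.
docs = [
    "harga beras naik di pasar jakarta",
    "harga beras di pasar jakarta turun",
    "tim sepak bola menang di stadion",
]
vecs = tf_idf_vectors(docs)
print("same topic:     ", cosine(vecs[0], vecs[1]))
print("different topic:", cosine(vecs[0], vecs[2]))
```

In this sketch, the two documents about rice prices should score noticeably higher than the unrelated pair. A threshold on the score could then separate matching from non-matching documents, although how such a threshold is chosen is outside what the abstract states.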