Statistical Analysis Approach to Determine Language Quality of Students’ Responses Using WorldNet Similarity Techniques


  • Farhan Ullah COMSATS Institute of Information Technology, Sahiwal
  • M. Farhan COMSATS Institute of Information Technology, Sahiwal
  • K. R. Malik COMSATS Institute of Information Technology, Sahiwal
  • M. M. Iqbal University of Engineering and Technology, Taxila
  • M. Ibrar COMSATS Institute of Information Technology, Sahiwal
  • Z. Rahman Sichuan University, 610064, Chengdu, China


This study aims to electronically assess (e-assessment) students’ replies to teachers’ questions. It can help systematize the question-answering context by matching text semantically through WordNet semantic similarity techniques. WordNet is a lexical database of words’ synonyms; it uses groups of synonyms called synsets for semantic operations on English text. For this purpose, a new methodology is proposed to automate e-assessment in the field of education. The collected dataset contains 210 pairs of words extracted from different undergraduate students’ replies in response to the teacher’s question statement. WordNet similarity measures, namely Path Length, Lin, Wu & Palmer, and Hirst & St-Onge, are then used to compute the semantic relatedness score. In the pilot study, 42 pairs of words were extracted from 8 students’ replies, marked using the semantic similarity measures, and compared with the teacher’s marks. Teachers are provided with four boxes of marks, while the developed method provides a precise measure of marks. The experiment is presented on the comprehensive dataset, reporting word frequencies for each similarity measure.
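
As a rough illustration of how taxonomy-based measures such as Path Length and Wu & Palmer score a word pair, the sketch below uses a tiny hand-made is-a hierarchy rather than WordNet itself. The hierarchy, the word pairs, and all function names are hypothetical and are not taken from the study; scores from the real WordNet would differ. Lin and Hirst & St-Onge are left out because they need corpus-based information content and richer relation types, respectively.

```python
# Toy is-a hierarchy (child -> parent). Hypothetical data, NOT real WordNet.
PARENT = {
    "entity": None,
    "animal": "entity",
    "mammal": "animal",
    "dog": "mammal",
    "cat": "mammal",
    "bird": "animal",
    "artifact": "entity",
    "vehicle": "artifact",
    "car": "vehicle",
}


def ancestors(word):
    """Return the chain from `word` up to the root, with `word` first."""
    chain = []
    while word is not None:
        chain.append(word)
        word = PARENT[word]
    return chain


def depth(word):
    """Depth counted in nodes, so the root has depth 1."""
    return len(ancestors(word))


def lowest_common_subsumer(a, b):
    """Deepest node that is an ancestor of (or equal to) both words."""
    seen = set(ancestors(b))
    for node in ancestors(a):  # walks upward, so the first hit is the deepest
        if node in seen:
            return node
    raise ValueError("words do not share a root")


def path_similarity(a, b):
    """Inverse of the shortest path length, counted in nodes (edges + 1)."""
    lcs = lowest_common_subsumer(a, b)
    edges = (depth(a) - depth(lcs)) + (depth(b) - depth(lcs))
    return 1.0 / (edges + 1)


def wu_palmer_similarity(a, b):
    """2 * depth(LCS) / (depth(a) + depth(b))."""
    lcs = lowest_common_subsumer(a, b)
    return 2.0 * depth(lcs) / (depth(a) + depth(b))


if __name__ == "__main__":
    for a, b in [("dog", "cat"), ("dog", "bird"), ("dog", "car")]:
        print(a, b,
              round(path_similarity(a, b), 3),
              round(wu_palmer_similarity(a, b), 3))
```

In this toy hierarchy the closely related pair dog/cat scores 1/3 on Path Length and 0.75 on Wu & Palmer, while the unrelated pair dog/car scores 1/7 and 0.25. That ordering is what a marking scheme built on such measures relies on.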

Author Biographies

Farhan Ullah, COMSATS Institute of Information Technology, Sahiwal

Department of Computer Science

M. Farhan, COMSATS Institute of Information Technology, Sahiwal

Department of Computer Science

K. R. Malik, COMSATS Institute of Information Technology, Sahiwal

Department of Computer Science

M. M. Iqbal, University of Engineering and Technology, Taxila

Department of Computer Science and Engineering

M. Ibrar, COMSATS Institute of Information Technology, Sahiwal

Department of Computer Science

Z. Rahman, Sichuan University, 610064, Chengdu, China

College of Computer Science







How to Cite

F. Ullah, M. Farhan, K. R. Malik, M. M. Iqbal, M. Ibrar, and Z. Rahman, “Statistical Analysis Approach to Determine Language Quality of Students’ Responses Using WorldNet Similarity Techniques”, The Nucleus, vol. 54, no. 4, pp. 258–265, Jan. 2018.