NLP-based approaches and tools have been proposed to improve the efficiency of software engineers, processes, and products, by automatically processing natural language artifacts (issues, emails, commits, etc.).
We believe that accurate tools are becoming increasingly necessary to improve Software Engineering (SE) processes. Two important processes are (i) issue management and prioritization and (ii) code comment classification, in which developers have to understand, classify, prioritize, and assign (among other tasks) incoming issues and code comments reported by end-users and developers.
We are pleased to announce the third edition of the NLBSE'25 tool competition on code comment classification, an important task in code comment management and prioritization.
The tool competition consists of building and testing a set of three multi-label classification models for class comments, one for each of the target programming languages (Java, Python, and Pharo).
We provide a dataset of 14,875 code comment sentences belonging to 19 categories (7 for Java, 5 for Python, and 7 for Pharo), and three baseline classifiers based on Sentence Transformers (SetFit).
You must train, tune, and evaluate your models on the provided data.
We look forward to solutions that outperform our baseline models.
Detailed instructions about the competition (data, rules, baseline, results, etc.) can be found in our GitHub repository and a Google Colab notebook.
Compared to the 2024 competition, we have moved from binary classification per comment category to multi-label classification per programming language.
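To illustrate what multi-label classification per programming language means in practice, here is a minimal sketch in plain Python. It encodes the categories assigned to each comment sentence as a 0/1 vector and computes an F1 score per category. The category names, the helper functions `encode` and `per_category_f1`, and the toy data are all hypothetical and chosen only for illustration; the real label sets come from the provided dataset, and the official ranking formula is the one in the Google Colab notebook, not this sketch.

```python
from typing import Dict, List, Sequence

# Hypothetical category names for one language (assumption for illustration);
# the real label set for each language comes from the provided dataset.
CATEGORIES = ["summary", "usage", "deprecation"]


def encode(labels: Sequence[str], categories: Sequence[str] = CATEGORIES) -> List[int]:
    """Turn the categories assigned to one sentence into a 0/1 indicator vector."""
    present = set(labels)
    return [1 if c in present else 0 for c in categories]


def per_category_f1(
    y_true: Sequence[Sequence[int]],
    y_pred: Sequence[Sequence[int]],
    categories: Sequence[str] = CATEGORIES,
) -> Dict[str, float]:
    """Compute F1 separately for each category column of the indicator vectors."""
    scores: Dict[str, float] = {}
    for j, cat in enumerate(categories):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        denom = 2 * tp + fp + fn
        # Convention used here: F1 is 0.0 when a category has no positives at all.
        scores[cat] = 2 * tp / denom if denom else 0.0
    return scores


# Toy data: a sentence may carry zero, one, or several categories at once.
y_true = [encode(["summary"]), encode(["summary", "usage"]),
          encode(["deprecation"]), encode([])]
y_pred = [encode(["summary"]), encode(["usage"]),
          encode(["deprecation", "usage"]), encode([])]

scores = per_category_f1(y_true, y_pred)
for name, value in scores.items():
    print(f"{name}: F1 = {value:.3f}")
```

Under this framing, one model per language predicts the whole indicator vector for a sentence, whereas a binary setup would train a separate yes/no classifier for every category.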
The code comment classification competition is organized by: Pooja Rani (rani@ifi.uzh.ch), Ali Al-Kaswan (a.al-kaswan@tudelft.nl), Nataliia Stulova (nata.stulova@macpaw.com), and Giuseppe Colavito (giuseppe.colavito@uniba.it).
To participate in the competition, you must train, tune and evaluate your models using the provided training and test sets.
Additionally, you must write a paper (2-4 pages) describing:
Submit the paper by the deadline using our submission form.
All submissions must conform to the ICSE'25 formatting and submission instructions; they do not need to be double-blind.
Submissions will be evaluated and accepted based on correctness and reproducibility, defined by the following criteria:
We will use a formula to rank the competition submissions and determine a winner; see the Google Colab notebook for details.
The accepted submissions will be published in the workshop proceedings.
If you participate, please cite the following:
@article{rani2021,
title={How to identify class comment types? A multi-language approach for class comment classification},
author={Rani, Pooja and Panichella, Sebastiano and Leuenberger, Manuel and Di Sorbo, Andrea and Nierstrasz, Oscar},
journal={Journal of systems and software},
volume={181},
pages={111047},
year={2021},
publisher={Elsevier}
}
@inproceedings{AlKaswan2023,
author={Al-Kaswan, Ali and Izadi, Maliheh and Van Deursen, Arie},
booktitle={2023 IEEE/ACM 2nd International Workshop on Natural Language-Based Software Engineering (NLBSE)},
title={STACC: Code Comment Classification using SentenceTransformers},
year={2023},
pages={28-31}
}
@inproceedings{pascarella2017,
title={Classifying code comments in Java open-source software systems},
author={Pascarella, Luca and Bacchelli, Alberto},
booktitle={2017 IEEE/ACM 14th International Conference on Mining Software Repositories (MSR)},
year={2017},
organization={IEEE}
}
December 12, 2024
January 09, 2025
January 30, 2025
All dates are Anywhere on Earth (AoE).