Natural Language-Based Software Engineering

Tool Competition

Introduction

NLP-based approaches and tools have been proposed to improve the efficiency of software engineers, processes, and products by automatically processing natural language artifacts (issues, emails, commits, etc.).

We believe that accurate tools are becoming increasingly necessary to improve Software Engineering (SE) processes. Two important processes are (i) issue management and prioritization and (ii) code comment classification, in which developers have to understand, classify, prioritize, and assign (among other tasks) incoming issues and code comments reported by end-users and developers.

We are pleased to announce the third edition of the NLBSE'25 tool competition on code comment classification, an important task in code comment management and prioritization.

Code Comment Classification

The tool competition consists of building and testing a set of three multi-label classification models for class comments, one for each of the target programming languages (Java, Python, and Pharo).

We provide a dataset of 14,875 code comment sentences belonging to 19 categories (7 for Java, 5 for Python, and 7 for Pharo), and three baseline classifiers based on Sentence Transformers (SetFit).
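To make the multi-label setup concrete, here is a minimal sketch of how the categories of one comment sentence could be encoded as a multi-hot target vector for its language's model. The dictionary `CATEGORIES` and the function `to_multi_hot` are hypothetical; the category names are illustrative, assumed from the taxonomy of Rani et al. (2021), and the exact names and their order should be taken from the GitHub repository. Only the counts (7 Java, 5 Python, 7 Pharo, 19 in total) come from this call.

```python
from typing import Dict, List, Sequence

# Illustrative category names (assumed, not authoritative); the repository
# defines the official names and order. Counts match this call: 7 + 5 + 7 = 19.
CATEGORIES: Dict[str, List[str]] = {
    "java": ["summary", "ownership", "expand", "usage", "pointer", "deprecation", "rational"],
    "python": ["usage", "parameters", "developmentnotes", "expand", "summary"],
    "pharo": ["keyimplementationpoints", "example", "responsibilities",
              "classreferences", "intent", "keymessages", "collaborators"],
}

def to_multi_hot(language: str, labels: Sequence[str]) -> List[int]:
    """Encode the categories of one comment sentence as a 0/1 vector whose
    positions follow CATEGORIES[language] (hypothetical helper)."""
    categories = CATEGORIES[language]
    unknown = set(labels) - set(categories)
    if unknown:
        raise ValueError(f"unknown {language} categories: {sorted(unknown)}")
    return [1 if c in labels else 0 for c in categories]

# One Python sentence labelled with two categories at once.
print(to_multi_hot("python", ["summary", "usage"]))  # [1, 0, 0, 0, 1]
```

Because a sentence can carry several categories at once, each language's model predicts a vector of this length rather than a single class.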

You must train, tune, and evaluate your models on the provided data.

We look forward to solutions that outperform our baseline models.

Detailed instructions about the competition (data, rules, baseline, results, etc.) can be found in our GitHub repository and a Google Colab notebook.

Updates

Compared to the 2024 competition, we have moved from binary classification per comment category to multi-label classification per programming language.
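The difference between the two setups can be illustrated by converting per-category binary labels (one column per former binary task) into the multi-hot rows that a single per-language model is trained on. This is a minimal sketch; the function name and the column-per-category input layout are hypothetical and are not the repository's data format.

```python
from typing import Dict, List

def binary_columns_to_multi_label(binary: Dict[str, List[int]],
                                  categories: List[str]) -> List[List[int]]:
    """Stack per-category binary label columns into one multi-hot row per
    sentence, in the order given by `categories` (hypothetical helper)."""
    lengths = {len(binary[c]) for c in categories}
    if len(lengths) != 1:
        raise ValueError("all category columns must cover the same sentences")
    n_sentences = lengths.pop()
    return [[binary[c][i] for c in categories] for i in range(n_sentences)]

# Three sentences, three illustrative categories, formerly three binary tasks.
columns = {"summary": [1, 0, 1], "usage": [0, 1, 1], "expand": [0, 0, 1]}
print(binary_columns_to_multi_label(columns, ["summary", "usage", "expand"]))
# [[1, 0, 0], [0, 1, 0], [1, 1, 1]]
```

With the 19 categories of this year's dataset, a per-category binary setup would need 19 classifiers, whereas the multi-label setup needs three, one per language.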

The code comment classification competition is organized by: Pooja Rani (rani@ifi.uzh.ch), Ali Al-Kaswan (a.al-kaswan@tudelft.nl), Nataliia Stulova (nata.stulova@macpaw.com), and Giuseppe Colavito (giuseppe.colavito@uniba.it).

Participation

To participate in the competition, you must train, tune, and evaluate your models using the provided training and test sets.

Additionally, you must write a paper (2-4 pages) describing:

The architecture and details of the classification models;
The procedure used to pre-process the data;
The procedure used to tune the classifiers on the training sets;
The results of your classifiers on the test sets;
A link to the code/tool with proper documentation on how to run it and replicate the results.
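Because tuning must happen on the training sets while the test sets are used to report results, one common approach is to hold out a validation subset of each language's training data. The sketch below assumes a plain seeded random split; `train_validation_split` and its parameters are hypothetical, and it does not attempt multi-label stratification (an iterative stratification scheme would keep rare categories better represented in both parts).

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")

def train_validation_split(train_rows: Sequence[T], val_fraction: float = 0.2,
                           seed: int = 42) -> Tuple[List[T], List[T]]:
    """Hold out part of the *training* data for tuning. The provided test set
    is never passed in, so it stays untouched until the final evaluation."""
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be in (0, 1)")
    indices = list(range(len(train_rows)))
    random.Random(seed).shuffle(indices)  # fixed seed -> reproducible split
    n_val = max(1, int(len(indices) * val_fraction))
    val_idx = set(indices[:n_val])
    train = [r for i, r in enumerate(train_rows) if i not in val_idx]
    val = [r for i, r in enumerate(train_rows) if i in val_idx]
    return train, val

train, val = train_validation_split(list(range(10)))
print(len(train), len(val))  # 8 2
```

Fixing the seed keeps the split, and therefore the tuning results, reproducible, which matters for the replication link requested above.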

Submit the paper by the deadline using our submission form.

All submissions must conform to the ICSE'25 formatting and submission instructions; they do not need to be double-blind.

Submission acceptance

Submissions will be evaluated and accepted based on correctness and reproducibility, defined by the following criteria:

Clarity and detail of the paper content;
Availability of the code/tool, including the training/tuning/evaluation pipeline, released as open-source;
Correct training/tuning/evaluation of your code/tool on the provided data;
Correct report of the metrics and results;
Clarity of the code documentation.
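The official metrics and ranking formula are defined in the Google Colab notebook and are not reproduced here. As a hedged illustration of what correctly reporting multi-label results can involve, the sketch below thresholds per-category scores into 0/1 predictions and computes precision, recall, and F1 for each category. The function names and the 0.5 cutoff are assumptions, not the notebook's code.

```python
from typing import List, Sequence, Tuple

def threshold(scores: Sequence[Sequence[float]], cutoff: float = 0.5) -> List[List[int]]:
    """Turn per-category scores into 0/1 predictions (assumed cutoff of 0.5)."""
    return [[1 if s >= cutoff else 0 for s in row] for row in scores]

def per_category_prf(y_true: Sequence[Sequence[int]],
                     y_pred: Sequence[Sequence[int]]) -> List[Tuple[float, float, float]]:
    """Precision, recall, and F1 for each category (column) of multi-hot matrices.
    A category with no predicted (or no true) positives gets precision (or recall) 0.0."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same number of rows")
    n_categories = len(y_true[0]) if y_true else 0
    results = []
    for c in range(n_categories):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[c] == 1 and p[c] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[c] == 0 and p[c] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[c] == 1 and p[c] == 0)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        results.append((precision, recall, f1))
    return results

y_true = [[1, 0], [1, 1], [0, 1]]
y_pred = threshold([[0.9, 0.1], [0.4, 0.7], [0.6, 0.8]])  # [[1, 0], [0, 1], [1, 1]]
print(per_category_prf(y_true, y_pred))  # [(0.5, 0.5, 0.5), (1.0, 1.0, 1.0)]
```

Reporting scores per category, rather than only an aggregate, makes it easier for reviewers to check results against the provided test sets.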

We will use a formula to rank the competition submissions and determine a winner; see the Google Colab notebook for details.

Accepted submissions will be published in the workshop proceedings.

Citing relevant work

If you participate, please cite the following works:

@article{rani2021,
  title={How to identify class comment types? A multi-language approach for class comment classification},
  author={Rani, Pooja and Panichella, Sebastiano and Leuenberger, Manuel and Di Sorbo, Andrea and Nierstrasz, Oscar},
  journal={Journal of Systems and Software},
  volume={181},
  pages={111047},
  year={2021},
  publisher={Elsevier}
}

@inproceedings{AlKaswan2023,
  author={Al-Kaswan, Ali and Izadi, Maliheh and Van Deursen, Arie},
  booktitle={2023 IEEE/ACM 2nd International Workshop on Natural Language-Based Software Engineering (NLBSE)},
  title={STACC: Code Comment Classification using SentenceTransformers},
  year={2023},
  pages={28--31}
}

@inproceedings{pascarella2017,
  title={Classifying code comments in Java open-source software systems},
  author={Pascarella, Luca and Bacchelli, Alberto},
  booktitle={2017 IEEE/ACM 14th International Conference on Mining Software Repositories (MSR)},
  year={2017},
  organization={IEEE}
}

Important Dates

Paper/tool submission

December 12, 2024

Acceptance notification

January 09, 2025

Camera-ready paper submission

January 30, 2025

All dates are Anywhere on Earth (AoE).
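Anywhere on Earth is the fixed offset UTC-12, so a deadline passes only once the stated date has ended everywhere on the planet. The small sketch below converts a deadline to UTC, assuming the deadline means the end of the stated day (23:59:59 AoE); the helper name `aoe_deadline_in_utc` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Anywhere on Earth (AoE) is the fixed offset UTC-12.
AOE = timezone(timedelta(hours=-12), "AoE")

def aoe_deadline_in_utc(year: int, month: int, day: int) -> datetime:
    """Return the end of the given AoE calendar day (23:59:59 AoE) in UTC."""
    end_of_day = datetime(year, month, day, 23, 59, 59, tzinfo=AOE)
    return end_of_day.astimezone(timezone.utc)

print(aoe_deadline_in_utc(2024, 12, 12).isoformat())  # 2024-12-13T11:59:59+00:00
```

For example, the paper/tool submission deadline of December 12, 2024 AoE corresponds to 11:59:59 UTC on December 13, 2024.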

Important Links

Submission form

Google Colab notebook

GitHub repository

NLBSE 2025

The 4th Intl. Workshop on NL-based Software Engineering

Sun 27 April 2025, Ottawa, Ontario, Canada