Siamese neural networks on the trail of similarity in bugs in 5G mobile network base stations
13 May 2022
To improve the R&D process by reducing duplicate bug tickets, we used a BERT encoder in a Siamese network configuration to build a system that finds similar existing tickets. Only 2% of the entire dataset consists of known duplicates or similar reports that the fault coordinators have flagged as such. We therefore proposed several methods of generating artificial ticket pairs to augment the training set. To overcome the problem of false similarity, we applied additional filters: a similar pair was accepted only if both tickets had the same categorical values (the same software version, the same product and the same hardware parts). Training was conducted in two phases. The first, standard phase showed that only approximately 9% of pairs were correctly identified as certainly similar, and only 48% of the test samples were found to be pairs of similar tickets. The problem is difficult and concerns a unique domain, so we do not expect results comparable to those reported in the original works by other researchers. However, fine-tuning raised that result to 81% of ticket pairs positively recognised as similar. In the end, only 17% of similar pairs were misidentified, which makes the proof of concept viable for further work and improvement.
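The categorical filter and the Siamese scoring step can be illustrated with a short sketch. This is a minimal illustration under stated assumptions, not the authors' implementation. The `Ticket` record, its field names, the `threshold` value and the toy embedding vectors are all hypothetical. In a real Siamese setup, each ticket's text would pass through the same BERT encoder with shared weights, and the resulting vectors would take the place of the hard-coded `embeddings` used here. Cosine similarity is one common scoring choice for such networks. The abstract does not state which similarity measure was actually used.

```python
import math
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Ticket:
    # Hypothetical ticket record; field names are illustrative assumptions.
    ticket_id: str
    software_version: str
    product: str
    hardware_parts: frozenset


def same_categories(a: Ticket, b: Ticket) -> bool:
    """False-similarity filter: accept a pair only if all categorical values match."""
    return (
        a.software_version == b.software_version
        and a.product == b.product
        and a.hardware_parts == b.hardware_parts
    )


def cosine(u, v) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similar_pairs(tickets, embeddings, threshold=0.8):
    """Score every categorically compatible pair of tickets.

    `embeddings` maps ticket_id -> vector. It stands in for the output of a
    shared (Siamese) encoder. `threshold` is an assumed cut-off value.
    """
    found = []
    for a, b in combinations(tickets, 2):
        if not same_categories(a, b):
            continue  # rejected by the categorical filter before any scoring
        score = cosine(embeddings[a.ticket_id], embeddings[b.ticket_id])
        if score >= threshold:
            found.append((a.ticket_id, b.ticket_id, round(score, 3)))
    return found


if __name__ == "__main__":
    t1 = Ticket("T1", "SW22", "gNB", frozenset({"RU"}))
    t2 = Ticket("T2", "SW22", "gNB", frozenset({"RU"}))
    t3 = Ticket("T3", "SW21", "gNB", frozenset({"RU"}))  # different software version
    toy_embeddings = {"T1": [1.0, 0.0], "T2": [0.9, 0.1], "T3": [1.0, 0.0]}
    print(similar_pairs([t1, t2, t3], toy_embeddings))  # [('T1', 'T2', 0.994)]
```

Note that T3 has the same toy embedding as T1, yet the filter discards it because its software version differs. The categorical check runs before scoring so that textually similar tickets from different releases or products are not reported as duplicates.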