Journal article
What's new on the market? Combining internet traces and pretrained language models to recognize emerging drug names (2026)
Author(s):
GRENIER, G.; CHAREST, M.; ESSEIVA, P.; ROSSY, Q.
Year:
2026
Page(s):
art. 112958
Language(s):
English
Bibliographic references:
61
Domain:
Illicit drugs
Discipline:
MAR (Markets)
Thesaurus keywords:
SYNTHETIC DRUGS; EMERGING PHENOMENON; INTERNET; DRUG MARKET; MODEL; TECHNOLOGY; EPIDEMIOLOGICAL SURVEILLANCE; LANGUAGE; DISCUSSION FORUM; EFFECTIVENESS
Abstract:
Posts and comments published by users in online forum discussions provide valuable insights and may contain the earliest traces of new substances emerging on the market. However, the systematic recognition of emerging new psychoactive substances (NPS) remains an important challenge for both public health agencies and law enforcement authorities. The large volume of messages published by users, combined with the unstructured nature of text, complicates the retrieval of relevant information such as drug name mentions. Common approaches based on keyword matching (e.g., regular expressions) limit current monitoring systems, as they can only detect known terms. Consequently, new or previously unseen drug names may go undetected, and novel NPS under active discussion may be overlooked. To address this challenge, we introduce DrugRecon, a RoBERTa-based pretrained language model specifically fine-tuned for drug name recognition. The model was trained and evaluated on a manually annotated corpus of posts and comments collected from drug-related sections of three online forums (Drugs-Forum, Dread, and Reddit). A data augmentation strategy was applied during fine-tuning to improve generalization to previously unseen drug names. To demonstrate its applicability in real-world settings, DrugRecon was applied to posts and comments published between April and June 2025 across the three forums. The model successfully recognized drug names absent from existing lexicons, highlighting its capacity to detect emerging terminology. By combining automatic recognition with expert validation, 12 names were classified as denoting potential novel NPS. This proactive monitoring approach not only guides further investigations but also strengthens preparedness for when these substances eventually appear in drug-checking services, police seizures, or toxicological reports. [Authors' abstract]
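The abstract contrasts lexicon-based keyword matching, which can only return known terms, with model-based recognition whose output is then compared against existing lexicons to surface candidate new names for expert validation. The Python sketch below illustrates that contrast under stated assumptions: the lexicon, the example post, and the list of "recognized" names standing in for model output are all hypothetical, and the helpers `keyword_matches` and `novel_candidates` are illustrative names, not part of DrugRecon.

```python
import re

# Hypothetical lexicon of known drug names (illustrative, not the authors' lexicon).
KNOWN_LEXICON = {"mdma", "ketamine", "2c-b"}


def keyword_matches(text: str, lexicon: set[str]) -> list[str]:
    """Regex keyword matching: by construction it can only return lexicon terms."""
    # Try longer terms first so overlapping names prefer the longer match.
    terms = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(
        r"(?<![\w-])(" + "|".join(re.escape(t) for t in terms) + r")(?![\w-])",
        re.IGNORECASE,
    )
    return [m.group(1).lower() for m in pattern.finditer(text)]


def novel_candidates(recognized: list[str], lexicon: set[str]) -> list[str]:
    """Keep recognized names absent from the lexicon (deduplicated, case-insensitive).

    `recognized` stands in for spans returned by a drug-name recognition model;
    the result is a shortlist for expert validation, not a confirmed NPS list.
    """
    seen: set[str] = set()
    candidates: list[str] = []
    for name in recognized:
        key = name.lower()
        if key not in lexicon and key not in seen:
            seen.add(key)
            candidates.append(name)
    return candidates
```

For example, on the hypothetical post "Tried 2C-B last week, but has anyone tested compound-x yet?", `keyword_matches` returns only `["2c-b"]`, whereas passing a model's output such as `["2C-B", "compound-x"]` to `novel_candidates` keeps `"compound-x"` as a name to review.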
Highlights:
Introducing a pretrained language model fine-tuned for drug name recognition.
Data augmentation improves recognition of unseen drug names.
Applying the model in a monitoring framework successfully detects novel drug names.
Expands existing lexicons and supports proactive actions.
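The record does not specify which data augmentation strategy was used. One common option for named entity recognition, assumed here purely for illustration, is mention replacement: each annotated drug-name span is swapped for another name so the model cannot rely on memorized surface forms. A minimal Python sketch over BIO-tagged tokens, with a hypothetical `DRUG` label and a hypothetical replacement pool:

```python
import random


def replace_mentions(
    tokens: list[str],
    tags: list[str],
    replacement_pool: list[str],
    rng: random.Random,
) -> tuple[list[str], list[str]]:
    """Swap each B-DRUG/I-DRUG mention for a name drawn from `replacement_pool`.

    Assumption: mention replacement is an illustrative augmentation, not
    necessarily the strategy used to fine-tune DrugRecon.
    """
    new_tokens: list[str] = []
    new_tags: list[str] = []
    i = 0
    while i < len(tokens):
        if tags[i] == "B-DRUG":
            # Skip the whole original mention (B-DRUG followed by I-DRUG tokens).
            j = i + 1
            while j < len(tokens) and tags[j] == "I-DRUG":
                j += 1
            # Insert a (possibly multi-token) replacement with consistent BIO tags.
            replacement = rng.choice(replacement_pool).split()
            new_tokens.extend(replacement)
            new_tags.extend(["B-DRUG"] + ["I-DRUG"] * (len(replacement) - 1))
            i = j
        else:
            # Non-drug tokens are copied unchanged.
            new_tokens.append(tokens[i])
            new_tags.append(tags[i])
            i += 1
    return new_tokens, new_tags
```

With a pool containing only the hypothetical name "compound x", the sentence `["I", "took", "2C-B", "yesterday"]` tagged `["O", "O", "B-DRUG", "O"]` becomes `["I", "took", "compound", "x", "yesterday"]` with tags `["O", "O", "B-DRUG", "I-DRUG", "O"]`. Drawing replacements from synthetic or held-out names is what would push the model toward context cues rather than memorized names.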
Affiliation:
Ecole des Sciences Criminelles, University of Lausanne, Lausanne, Switzerland