
Cybercrimes

LLM-guided classification and extension of the MISP taxonomy for dark web Q&A forums.

 

Open Access: Yes.

Abstract

Dark web question-and-answer (Q&A) forums hosted on the Tor network serve as critical hubs for cyber threats and social interactions, yet their diverse content challenges traditional classification methods. This study leverages large language models (LLMs) to classify and extend the MISP dark web taxonomy for analyzing 2,055 substantive posts from three high-traffic dark web Q&A forums, scraped between July and November 2024. A multi-stage pipeline was employed: initial classification with the Mistral 7B model identified dominant topics (e.g., “finance-crypto,” 10.80%) and motivations (e.g., “forum,” 20.54%), revealing high ambiguity (40.34% for topics, 10.37% for motivations). HDBSCAN clustering of ambiguous posts uncovered novel themes, including “mental-health” and “confessions-and-personal-secrets,” prompting the extension of the MISP taxonomy with seven new topic categories and one motivation category. Reclassification reduced ambiguity to 2.95% for topics and 1.51% for motivations, enhancing the taxonomy’s coverage of both cybersecurity threats and non-traditional content. Compared to traditional topic modeling, the LLM-guided approach provided superior contextual accuracy, offering a scalable framework for threat intelligence. Despite limitations, such as potential LLM biases and a focus on three forums, this study advances dark web analysis by refining the MISP taxonomy and revealing social dynamics alongside illicit activities.
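
Illustrative sketch (Culture Reframed note: this is not the authors' code). The abstract reports an "ambiguity" rate after each classification pass and says that ambiguous posts were sent on for clustering. The short Python example below shows one way such a triage step might look. The label name `ambiguous`, the helper functions `summarize` and `split_ambiguous`, and the toy posts are all assumptions made up for illustration; none of them come from the study.

```python
from collections import Counter

# Assumed placeholder label; the study's actual label for unclear posts may differ.
AMBIGUOUS = "ambiguous"


def summarize(labels):
    """Return (percentage share per label, ambiguity rate in percent)."""
    if not labels:
        raise ValueError("no labels to summarize")
    counts = Counter(labels)
    total = len(labels)
    shares = {label: 100.0 * n / total for label, n in counts.items()}
    return shares, shares.get(AMBIGUOUS, 0.0)


def split_ambiguous(posts, labels):
    """Separate already-classified posts from those that still need clustering."""
    resolved, unresolved = [], []
    for post, label in zip(posts, labels):
        (unresolved if label == AMBIGUOUS else resolved).append((post, label))
    return resolved, unresolved


# Toy data invented purely for illustration (not taken from the study).
posts = ["post a", "post b", "post c", "post d"]
labels = ["finance-crypto", "ambiguous", "hacking", "ambiguous"]

shares, ambiguity = summarize(labels)
resolved, unresolved = split_ambiguous(posts, labels)
print(f"ambiguity rate: {ambiguity:.2f}%")
print(f"{len(unresolved)} posts routed to clustering")
```

In a pipeline like the one the abstract describes, the unresolved posts would then be clustered, new categories proposed from the clusters, and the same summary recomputed after reclassification to see how far the ambiguity rate fell.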

Relevance

The study scraped data from three dark web Q&A forums hosted on the Tor network. The category the authors defined as “Pornography-illicit-or-illegal” was the 4th most common topic (behind “Finance-crypto,” “Hacking,” and “Mental health”). The 13th most common topic was “Pornography-child-exploitation.” (Culture Reframed note: If the two pornography categories are combined, they become the 3rd most common topic of conversation, behind only “Finance-crypto” and “Hacking.”)

Citation

de-Marcos, L., Ferrer Oliva, M., & Ruiz-Zambrano, A. (2026). LLM-guided classification and extension of the MISP taxonomy for dark web Q&A forums. IEEE Access. https://doi.org/10.1109/ACCESS.2026.3682868