Examination of the evolution of language among Dark Web users

Description

This project applies natural language processing (NLP) to Dark Web discussion board data to examine how language surrounding crime and offending changes over time.

Duration

Total project length: 175 hours

Task ideas

Extract and preprocess textual data from online forum posts for NLP analysis.
Develop and train NLP models, such as LSTM or BERT, to understand the context, sentiment, and thematic elements of forum discussions.
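
As a rough illustration of the preprocessing task, the sketch below cleans a raw forum post and splits it into lowercase tokens. It is only a starting point under stated assumptions: the function name `preprocess`, the regular expressions, and the assumption that posts arrive as HTML-flavoured strings are all hypothetical and not part of the project's actual pipeline or data format.

```python
import html
import re

# Hypothetical patterns; real forum dumps may need different rules.
TAG_RE = re.compile(r"<[^>]+>")                      # leftover HTML tags
URL_RE = re.compile(r"https?://\S+|\S+\.onion\S*")   # web and .onion addresses
TOKEN_RE = re.compile(r"[a-z0-9']+")                 # simple word tokens


def preprocess(post: str) -> list[str]:
    """Return lowercase tokens from one raw post (assumed to be a string)."""
    text = html.unescape(post)    # turn entities such as &amp; into characters
    text = TAG_RE.sub(" ", text)  # drop markup
    text = URL_RE.sub(" ", text)  # drop links, which carry little linguistic signal
    return TOKEN_RE.findall(text.lower())


if __name__ == "__main__":
    sample = "<p>Selling &amp; buying at http://abc.onion/x NOW</p>"
    print(preprocess(sample))
```

A token list like this could feed a classical bag-of-words model directly; a BERT-style model would instead use its own subword tokenizer, so in that case only the cleaning steps would likely be reused.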

Expected results

Create a processed dataset of textual content from online forum posts, ready for NLP tasks.
Train an NLP model capable of identifying key themes, sentiments, or user engagement patterns within forum discussions, based on the linguistic features of the posts.
If time allows: Analyze the linguistic relationships and communication patterns between forum participants to uncover insights into community dynamics and discourse trends.
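
Because the project's overall aim is to track how language changes over time, one simple baseline to compare a trained model against is the share of all tokens that a given term accounts for in each time period. The sketch below assumes posts have already been tokenized and labeled with a period such as a year; the function name `term_share_by_period` and the toy data are hypothetical and do not come from the project's dataset.

```python
from collections import Counter


def term_share_by_period(posts, term):
    """Map each period to the fraction of its tokens equal to `term`.

    `posts` is an iterable of (period, tokens) pairs, an assumed input shape.
    """
    hits = Counter()
    totals = Counter()
    for period, tokens in posts:
        totals[period] += len(tokens)
        hits[period] += tokens.count(term)
    # Skip periods with no tokens to avoid dividing by zero.
    return {p: hits[p] / totals[p] for p in sorted(totals) if totals[p]}


if __name__ == "__main__":
    # Invented toy data for illustration only.
    toy_posts = [
        ("2019", ["a", "b", "a", "c"]),
        ("2020", ["a", "b"]),
        ("2019", ["b", "b"]),
    ]
    print(term_share_by_period(toy_posts, "a"))
```

Normalizing by the total token count per period keeps busy periods from looking like shifts in vocabulary simply because more posts were written.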

Requirements

Ability to code in R or Python; understanding of machine learning and/or natural language processing.

Project difficulty level

Intermediate

Mentors

Jane Daquin (University of Alabama)

Please DO NOT contact mentors directly by email. Instead, email human-ai@cern.ch with the project title and include your CV and test results. The mentors will then get in touch with you.

Corresponding Project

ISSR

Participating Organizations

Alabama