Description
This project uses data from Dark Web discussion boards to examine how the language surrounding crime and offending changes over time, using natural language processing (NLP).
Duration
Total project length: 175 hours
Task ideas
- Extract and preprocess textual data from online forum posts for NLP analysis.
- Develop and train NLP models, such as LSTM or BERT, to understand the context, sentiment, and thematic elements of forum discussions.
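As a rough illustration of the extraction and preprocessing task, the sketch below cleans a single raw post and splits it into lowercase tokens. It assumes posts arrive as HTML-flavoured strings. The function name `preprocess_post` and the specific cleaning steps (unescaping entities, dropping tags and URLs, a simple regex tokenizer) are illustrative choices, not a prescribed pipeline for this project.

```python
import html
import re

# Hypothetical patterns for this sketch; a real pipeline may need more cases.
TAG_RE = re.compile(r"<[^>]+>")                   # HTML tags such as <p> or <br/>
URL_RE = re.compile(r"https?://\S+|www\.\S+")     # bare links, including .onion URLs
TOKEN_RE = re.compile(r"[a-z0-9']+")              # lowercase words, digits, apostrophes


def preprocess_post(raw: str) -> list[str]:
    """Turn one raw forum post into a list of lowercase tokens."""
    text = html.unescape(raw)       # "&amp;" -> "&", "&lt;" -> "<", etc.
    text = TAG_RE.sub(" ", text)    # drop markup, keep the visible text
    text = URL_RE.sub(" ", text)    # links rarely carry linguistic signal
    text = text.lower()
    return TOKEN_RE.findall(text)


if __name__ == "__main__":
    sample = "<p>Check THIS out: http://example.onion/x &amp; more</p>"
    print(preprocess_post(sample))
```

Token lists like these can then be fed to a model-specific tokenizer or vectorizer. For models such as BERT, lighter cleaning (keeping case and punctuation) may be preferable, so the steps above should be treated as adjustable.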
Expected results
- Create a processed dataset of textual content from online forum posts, ready for NLP tasks.
- Train an NLP model capable of identifying key themes, sentiments, or user engagement patterns within forum discussions, based on the linguistic features of the posts.
- If time allows: Analyze the linguistic relationships and communication patterns between forum participants to uncover insights into community dynamics and discourse trends.
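Before training a model, a simple frequency baseline can already show whether a term becomes more or less common over time. The sketch below assumes each post has already been reduced to a `(year, tokens)` pair. The function name `term_share_by_year` and the input layout are hypothetical, chosen only for illustration.

```python
from collections import Counter


def term_share_by_year(posts, term):
    """Return {year: fraction of all tokens in that year equal to `term`}.

    `posts` is an iterable of (year, tokens) pairs, where `tokens` is a
    list of already-normalised strings. This layout is an assumption of
    the sketch, not a format defined by the project.
    """
    totals = Counter()  # total token count per year
    hits = Counter()    # occurrences of `term` per year
    for year, tokens in posts:
        totals[year] += len(tokens)
        hits[year] += sum(1 for tok in tokens if tok == term)
    # Skip years with no tokens to avoid division by zero.
    return {y: hits[y] / totals[y] for y in sorted(totals) if totals[y] > 0}


if __name__ == "__main__":
    toy_posts = [
        (2019, ["a", "b", "a", "c"]),
        (2020, ["a", "b"]),
        (2020, ["b", "b"]),
    ]
    print(term_share_by_year(toy_posts, "a"))
```

Relative shares are used instead of raw counts so that years with more forum activity do not dominate the comparison. A trained model's theme or sentiment labels could be aggregated by year in the same way.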
Requirements
Ability to code in R or Python; understanding of machine learning and/or natural language processing.
Project difficulty level
Intermediate
Mentors
Please DO NOT contact mentors directly by email. Instead, please email human-ai@cern.ch with the project title, and include your CV and test results. The mentors will then get in touch with you.
Corresponding Project
Participating Organizations