Tools
Below are services and tools (including datasets and code) offered by the lab. Please reach out to sharathg at seas.upenn.edu if you’re interested in learning more about our services. Researchers may download datasets and code from the links below.
Services
Social media data from a variety of sources (including but not limited to Facebook, Twitter, Google search history, and Reddit) is collected and analyzed along with survey data.
Cell phone data collection services for digital phenotyping and personal sensing research projects using the open-source AWARE framework.
Datasets
Mental health metrics and the number of symptom mentions on Twitter, measured daily using pre-trained machine learning models applied to a random 1% sample of Twitter data.
Twitter data of users with self-disclosed ADHD/ADD.
Domain-related dataset with self-reported age and gender, and text-predicted depression and anxiety scores for users.
Twitter user IDs with their text-predicted Big-Five personality scores.
A dataset for studying the influence of personality (Five-Factor Model) and cultural traits (Hofstede Model) on multimedia-evoked positive and negative affect, perceived quality, and enjoyment.
A set of 40 COVID-related topics (per week) derived from Twitter, starting on March 12, 2020.
Wbbyyr (WeiBo-BY-YeaR) is a set of 70 word embedding models trained on Sina Weibo posts.
Code
Ongoing list of useful lexica in Computational Social Science
Matlab implementation of Convolutional Deep Belief Networks
Downloads public photos (tagged favorite) of Flickr users whose usernames are given in a text document