|Title:||NUS MULTI-SOURCE Social DATASET (NUS-MSS)|
|Authors:||CHUA TAT SENG|
|NUS Contact:||CHUA TAT SENG|
|External Contact:||Aleksandr Farseev|
|Keywords:||Multiple source integration; User profile learning|
With the rapid growth of multi-source social media resources, comprehensive user profile learning serves as a backbone for applications in many domains. User profile components such as user mobility and user demography describe social media users from different views. However, little research has been done on multi-source, multimodal user profile learning. Moreover, no benchmark dataset has been released for user mobility and demographic profiling.
Here we introduce a multi-source dataset created by the Lab for Media Search at the National University of Singapore. The dataset includes six types of features extracted from the collected social media data (location semantics features, location semantics LDA-based features, text LDA-based features, text LIWC features, sentiment and writing style features, and ImageNet image concept features), together with ground-truth data from three geographical regions: Singapore, New York, and London. To cover the most popular data modalities (visual, textual, and location data), we incorporate the following social media sources: Foursquare (the largest location-based social network) as the location data source; Twitter (the microblogging service with the largest English-speaking user base) as the textual data source; Instagram (the most popular photo-sharing service) as the visual data source; and Facebook as the ground-truth source. We also provide baseline results for user demographic profiling, obtained by learning from the text, image, and location data with an ensemble model. The benchmark results show that models learned from these data can improve user profile learning. More details about user profile learning and the feature descriptions are available in the slides.
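The snippet below is a minimal sketch of how the six feature types could be grouped by their source network and combined by an ensemble for demographic prediction. It is not the authors' baseline. Three things are assumptions made for illustration: the feature names in `FEATURE_SOURCES`, the `UserRecord` layout, and the weighted late-fusion rule, which averages per-source class probabilities. None of these are the dataset's actual file format or the ensemble model used in the reported baselines. The class labels `group_a`/`group_b` and the weights are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence

# Hypothetical names for the six feature types, mapped to the source they come from.
# Illustrative only; not the dataset's actual file or column names.
FEATURE_SOURCES: Dict[str, str] = {
    "location_semantics": "foursquare",
    "location_semantics_lda": "foursquare",
    "text_lda": "twitter",
    "text_liwc": "twitter",
    "sentiment_writing_style": "twitter",
    "imagenet_concepts": "instagram",
}


@dataclass
class UserRecord:
    """Hypothetical per-user record: feature vectors plus a Facebook-derived label."""
    user_id: str
    region: str  # "Singapore", "New York" or "London"
    features: Dict[str, List[float]] = field(default_factory=dict)
    label: Optional[str] = None  # ground-truth demographic attribute, if known


def features_by_source(record: UserRecord) -> Dict[str, List[float]]:
    """Concatenate a user's feature vectors per source, in FEATURE_SOURCES order."""
    grouped: Dict[str, List[float]] = {}
    for name, source in FEATURE_SOURCES.items():
        if name in record.features:
            grouped.setdefault(source, []).extend(record.features[name])
    return grouped


def late_fusion(prob_by_source: Dict[str, Sequence[float]],
                weights: Dict[str, float]) -> List[float]:
    """Weighted average of per-source class-probability vectors (assumed fusion rule)."""
    sources = [s for s in prob_by_source if weights.get(s, 0.0) > 0]
    if not sources:
        raise ValueError("no source has a positive weight")
    n_classes = len(prob_by_source[sources[0]])
    total_weight = sum(weights[s] for s in sources)
    fused = [0.0] * n_classes
    for s in sources:
        probs = prob_by_source[s]
        if len(probs) != n_classes:
            raise ValueError("probability vectors must have equal length")
        for i, p in enumerate(probs):
            fused[i] += weights[s] * p / total_weight
    return fused


def predict(prob_by_source: Dict[str, Sequence[float]],
            weights: Dict[str, float],
            classes: Sequence[str]) -> str:
    """Return the class with the highest fused probability."""
    fused = late_fusion(prob_by_source, weights)
    best = max(range(len(classes)), key=lambda i: fused[i])
    return classes[best]


if __name__ == "__main__":
    # Made-up per-source probabilities from three hypothetical single-source classifiers.
    probs = {"twitter": [0.7, 0.3], "instagram": [0.4, 0.6], "foursquare": [0.6, 0.4]}
    weights = {"twitter": 0.5, "instagram": 0.25, "foursquare": 0.25}
    print(late_fusion(probs, weights), predict(probs, weights, ["group_a", "group_b"]))
```

Late fusion keeps one model per source, so a user who is missing one network (for example, no Instagram account) can still be scored by giving that source zero weight. Concatenating every feature into a single vector (early fusion) would be a reasonable alternative.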
Our dataset can be used for both descriptive and prescriptive research. That is, we do not intend to constrain future research to user profile learning, since the available ground truth makes it possible to tackle other contemporary problems. We list some potential research topics that can be pursued with the released dataset:
For more details about this dataset, and to reuse it, please visit http://nusmultisource.azurewebsites.net/
|Citation:||When using this data, please cite the original publication and also the dataset.|
|Appears in Collections:||Staff Dataset|
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.