TG-OUT: Temporal outlier patterns detection in Twitter attribute induced graphs

Abstract: Given a node-attributed network of Twitter users, can we capture their posting behavior over time and identify patterns that describe, model, or predict their activity? Assuming that such posts are topic-specific, can we identify temporal connectivity patterns that emerge from the use of specific attributes? More challengingly, are there particular attribute usage patterns that indicate an inherent anomaly for either users or attributes? This work provides answers to all of the above questions, extending previous work carried out on other social networks and attribute types. We propose a pipeline of methods that:
(a) model the temporal evolution of attribute-induced graphs to detect peculiar attributes, (b) identify temporal patterns in individual attribute distributions, (c) investigate differences between patterns emerging from bot and non-bot accounts to spot outliers, and (d) extract tailored sets of features that machine learning models can exploit. More specifically, we model the attribute distributions using the log-odds ratio, provide evidence of varying attribute-induced subgraph patterns, identify key differences in the patterns of categorical attributes, and build machine learning models for attributes used by bots from a small set of features. Experiments on a real dataset covering the activities and attributes of multiple Twitter users show that our method, TG-OUT, effectively identifies temporal evolution patterns, irregular outlier behavior, and the roles of specific features for attribute-induced subgraphs. The experimental results show that: most individual attribute distributions remain stable over time and mostly follow power laws; the temporal evolution of attribute-induced graphs obeys certain laws, and deviations from them are outliers; patterns exhibit deviations that depend on the type of accounts using each attribute; and, finally, a simple machine learning algorithm trained on only two carefully selected features produces a model that efficiently identifies attributes used mainly by bots.
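
As an illustration of the log-odds ratio mentioned above, the following minimal sketch compares how often an attribute (for example, a hashtag) is used in two groups of posts, such as two time windows or bot and non-bot accounts. The abstract does not specify the exact formulation used by TG-OUT, so the function name `log_odds_ratio`, its parameters, and the additive smoothing of 0.5 per cell are assumptions made for this example only.

```python
import math


def log_odds_ratio(count_a, total_a, count_b, total_b, smoothing=0.5):
    """Log-odds ratio of an attribute's usage in group A versus group B.

    count_x: number of posts in group x that use the attribute.
    total_x: total number of posts in group x.
    smoothing: additive correction per cell (an assumption of this sketch,
               not taken from the paper) that avoids division by zero
               and log(0) when a count is 0 or equals the total.
    """
    odds_a = (count_a + smoothing) / (total_a - count_a + smoothing)
    odds_b = (count_b + smoothing) / (total_b - count_b + smoothing)
    return math.log(odds_a / odds_b)


# Hypothetical counts: the attribute appears in 50 of 100 posts in group A
# and in 10 of 100 posts in group B.
lor = log_odds_ratio(50, 100, 10, 100)
print(f"log-odds ratio: {lor:.3f}")
```

A positive value indicates that the attribute is relatively more frequent in group A, a negative value that it is more frequent in group B, and a value near zero that usage is similar in both groups.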