Tuesday, December 3, 2013

Profiling based on Social Media Behavior

The rise of social media and networks is one of the hot topics of the decade. People have been flocking to social media sites like never before. A large proportion of the traffic flowing over the internet carries updates from social networks. These networks have become a platform for everything from showcasing your lunch to protesting against barbaric governments. In essence, these digital platforms are growing deeper roots in society.

If you look at them from an analyst's perspective, you will find a golden pot of data on social sentiment: a huge collection of user-generated opinions, likes, expressions and patterns waiting to be explored and analysed. There is an entire field called social media analytics, or simply data analytics, that seeks to tap this golden pot. Data analytics is instrumental for advertising and marketing. The data collected from a user's behavior on social networking or e-commerce sites is processed to throw targeted advertising at them, that is, highly customized adverts tailored to the customer's interests, age, sex, geographical location and browsing history. This field has been gaining a lot of attention these days, but its focus has been on only one thing - targeted advertising. Clearly, applying social media analytics to marketing has a lot of benefits.

I believe there is another application of analyzing social media behavior - behavioral profiling. And I am not talking in terms of profiling-for-targeted-advertising or profiling-for-product-suggestions. This behavioral profiling can be thought of as a unique 'digital footprint' of a user's behavior. I will explain exactly what it stands for in the next paragraph. Please note that this is a relatively new idea and I have found little to no material available about it.

I think every regular user of social media has their own unique way of using it and communicating through it. Each regular user can have different interests, which can be understood by analyzing -
1. the kind of pages they visit
2. the kind of content they like
3. the kinds of groups they are part of
4. the profiles they visit
5. the kind of content they comment on
6. the people they interact with
Each user will generate a different data set for the parameters mentioned above. This largely requires observing the user's activity and extracting these traits from it.
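
To make the idea of a per-user data set a little more concrete, here is a minimal sketch of how such activity could be summarized. It assumes the activity is available as simple (parameter, value) pairs; the function name build_profile and the parameter names such as "page_visited" are hypothetical and used only for illustration.

```python
from collections import Counter, defaultdict


def build_profile(events):
    """Summarize (parameter, value) pairs as relative frequencies per parameter.

    The event format is an assumption made for this sketch, for example
    ("page_visited", "cricket") or ("group", "photography").
    """
    counts = defaultdict(Counter)
    for parameter, value in events:
        counts[parameter][value] += 1

    profile = {}
    for parameter, counter in counts.items():
        total = sum(counter.values())
        # Store each value's share of the activity seen for this parameter.
        profile[parameter] = {value: n / total for value, n in counter.items()}
    return profile


if __name__ == "__main__":
    sample = [
        ("page_visited", "cricket"),
        ("page_visited", "cricket"),
        ("page_visited", "music"),
        ("group", "photography"),
    ]
    print(build_profile(sample))
```

Using shares rather than raw counts means a very active user and an occasional user with the same interests end up with similar profiles.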

I also think users can be distinguished by analyzing their chat logs or sent messages. 'Chatting styles' vary from person to person. Some characteristics on which chats or messages can differ are -
1. Capitalization used
2. Use of trailing dots after a sentence
3. Short-forms used
4. Use of punctuation and places at which it is used
5. Use of smileys, their type and frequency
6. The frequency of splitting sentences
7. Frequency of common words
8. Spelling mistakes
For example, some users may religiously follow capitalization rules and some may not use capitalization at all. Sometimes the first letter of every sentence in a chat log may be capitalized, indicating that the user was probably chatting on a smartphone, because most smartphone keyboards automatically capitalize the first letter of a sentence. Some novice users may use a lot of trailing dots, while others may put just two or three after their sentences. Some people may use a lot of short forms while chatting; some may use them for only a few words. A few may always use one particular short form for a word, which almost becomes a trademark of their chats. People can also be distinguished by the amount of punctuation and the smileys they use. Some may put an exclamation mark after many of their messages; others may heavily use their favorite smiley. Some people may type a long message and then hit 'send', while others may hit 'send' after every few words. Some users may involuntarily repeat a word or phrase many times. Someone may consistently misspell a word, and the mistake will show up in many of their chats. Some may type very fast and some may be very slow. If you are a regular user, I am sure you will find at least one of these traits in one of your friends.
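
As a rough illustration of how a few of these traits could be measured, here is a minimal sketch that works on a plain list of message strings. The function name style_features, the smiley pattern and the list of short forms are all assumptions made for this example; real chats would need far richer patterns, and spelling mistakes and typing speed are left out because they need a dictionary and timestamps.

```python
import re
from collections import Counter

# Hypothetical smiley pattern; real chats use many more variants.
SMILEY_RE = re.compile(r"[:;=][-^]?[)(DPpOo/\\|]")
# Hypothetical list of short forms, chosen only for illustration.
SHORT_FORMS = {"u", "ur", "r", "gr8", "thx", "pls", "plz", "lol", "brb", "btw"}


def style_features(messages):
    """Return simple chat-style measurements for a list of message strings."""
    n = len(messages)
    if n == 0:
        return {}
    words = [w for m in messages for w in re.findall(r"[A-Za-z0-9']+", m)]
    lowered = [w.lower() for w in words]
    total_words = max(len(words), 1)
    return {
        # Share of messages whose first character is an upper-case letter.
        "capitalized_start": sum(1 for m in messages if m.lstrip()[:1].isupper()) / n,
        # Share of messages that end in two or more dots.
        "trailing_dots": sum(1 for m in messages if re.search(r"\.{2,}\s*$", m)) / n,
        # Share of words that appear in the short-form list.
        "short_forms": sum(1 for w in lowered if w in SHORT_FORMS) / total_words,
        # Average number of exclamation marks per message.
        "exclamations": sum(m.count("!") for m in messages) / n,
        # Average number of smileys per message.
        "smileys": sum(len(SMILEY_RE.findall(m)) for m in messages) / n,
        # Average words per message; low values suggest frequent splitting.
        "words_per_message": len(words) / n,
        # The most frequent words, lower-cased.
        "top_words": [w for w, _ in Counter(lowered).most_common(3)],
    }


if __name__ == "__main__":
    print(style_features(["Hello there!", "u r gr8 :)", "ok..."]))
```

Each value is a rate or an average rather than a raw count, so chat logs of different lengths can still be compared.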

The point I am trying to get across is that all these chatting traits can be recorded to form a unique identity for each user. It is very easy to fake the display identity (i.e. the name) on social networking sites, but these chatting styles still remain unique to the person. A user is unlikely to change these traits unless they are aware of this type of monitoring. This brings us to the applications of such analysis...

This type of analysis is of very little use to marketers and advertisers, but it can be very useful for intelligence agencies. As I said, identities are easy to fake, and criminals and terrorists who spoof theirs may be unaware of the subtle details that go unnoticed in their chats. Intelligence and government agencies can exploit this to identify suspects. They can try matching the behavioral traits of a known criminal against those of a suspect who is being monitored.
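
One simple way to picture this matching step is to treat each set of traits as a list of numbers and measure how closely two lists point in the same direction. The sketch below uses cosine similarity for that purpose; this is my choice of measure for illustration, not an established method for this problem, and the function names and trait names are hypothetical.

```python
import math


def cosine_similarity(profile_a, profile_b):
    """Cosine similarity between two dicts mapping trait name to a number."""
    keys = sorted(set(profile_a) | set(profile_b))
    a = [profile_a.get(k, 0.0) for k in keys]
    b = [profile_b.get(k, 0.0) for k in keys]
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # An empty profile cannot be compared meaningfully.
    return dot / (norm_a * norm_b)


def best_match(known, candidates):
    """Return the name of the candidate profile most similar to the known one."""
    return max(candidates, key=lambda name: cosine_similarity(known, candidates[name]))


if __name__ == "__main__":
    known = {"smileys": 0.8, "exclamations": 0.1, "trailing_dots": 0.6}
    candidates = {
        "account_1": {"smileys": 0.7, "exclamations": 0.2, "trailing_dots": 0.5},
        "account_2": {"smileys": 0.0, "exclamations": 0.9, "trailing_dots": 0.0},
    }
    print(best_match(known, candidates))
```

In practice the traits would need to be put on comparable scales first, and a high score would only be a lead worth investigating, not an identification.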

Again, the application is limited to intelligence agencies detecting fraud and criminals, but I believe it can be immensely useful to them. This kind of intelligence gathering also requires access to private data (i.e. chat logs, user behavior information, etc.) which is not revealed to the public but which a federal agency may be able to access.

Please note that this is a relatively new idea on which I found no documentation. I have tried to explain it as best I can. Feel free to contact me if you have anything to say. I will also be glad to receive pointers if any work is being done on this.

Thank you :)