Monday, November 17, 2014

A Review of Recommender Systems

When you step into SASA and ask a salesperson for a foundation, it is quite likely that he or she will also suggest another make-up product, for instance a blusher or a BB cream. This is a very typical example of recommendation in our daily life.

With the development of internet services, e-commerce platforms of all kinds have been eager to introduce recommender systems into their online shops. On the one hand, many statistical reports have shown that recommenders can really help raise sales volume. On the other hand, for an average customer like me, who is faced with hundreds or even thousands of choices, a recommender system is a time-saving shopping tool.

Since many recommendation algorithms were introduced in Lectures 9 and 10 of Social Media Analytics, in my fourth post (the last post for this course) I will compare user-based collaborative filtering (UserCF) with item-based collaborative filtering (ItemCF).

The main idea of UserCF is first to find the users who have similar opinions of, or behaviors towards, the same items as the target user, and then to recommend the items liked by that group of similar users. For example, suppose user A is interested in products X1, X2 and Y1, while user B is interested in X1, X3 and Y2. Because both of them are interested in X1, the UserCF algorithm regards users A and B as having similar preferences, so the system may recommend Y1 to user B and Y2 to user A. The recommendation shown in Figure-1 is quite likely an application of UserCF.


Figure-1
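To make the UserCF idea more concrete, here is a minimal Python sketch using users A and B from the example above. It assumes simple "interested / not interested" data and measures user similarity with cosine similarity over item sets; both the data layout and the similarity measure are my own choices for illustration, not necessarily what a real system or the lecture used.

```python
import math
from collections import defaultdict

# Hypothetical data taken from the example above:
# each user maps to the set of items he or she is interested in.
likes = {
    "A": {"X1", "X2", "Y1"},
    "B": {"X1", "X3", "Y2"},
}


def user_similarity(items_u, items_v):
    """Cosine similarity between two users' binary item sets (assumed measure)."""
    if not items_u or not items_v:
        return 0.0
    overlap = len(items_u & items_v)
    return overlap / math.sqrt(len(items_u) * len(items_v))


def user_cf_recommend(likes, target, k=10):
    """Score unseen items by the summed similarity of the users who like them."""
    # Step 1: find the k users whose behavior is most similar to the target's.
    neighbours = sorted(
        ((user_similarity(likes[target], items), user)
         for user, items in likes.items() if user != target),
        reverse=True,
    )[:k]
    # Step 2: recommend items those users like that the target has not seen.
    scores = defaultdict(float)
    for sim, user in neighbours:
        if sim <= 0:
            continue
        for item in likes[user] - likes[target]:
            scores[item] += sim
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


print(user_cf_recommend(likes, "A"))
print(user_cf_recommend(likes, "B"))
```

With only two users, the sketch also suggests X3 to A and X2 to B alongside Y2 and Y1, because UserCF recommends everything a similar user likes that the target user has not seen yet.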

By contrast, the idea of ItemCF is to find similar items first; then, when a user selects one item, similar items can be recommended. The similarity of two items is judged mainly by the number of users who hold the same attitude towards both of them. For instance, if items C and D are both liked by a large number of the same users, the system will regard them as similar items, and when a user is looking at item C, item D will be recommended to him or her. From my understanding, the "related recommendations" feature could be implemented with ItemCF.

Figure-2
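Similarly, here is a minimal ItemCF sketch in Python. The users u1 to u4 and the items E and F are hypothetical; two items are treated as similar when many of the same users like both, measured here with cosine similarity over co-occurrence counts, which is an assumed measure and only one common choice among several.

```python
import math
from collections import defaultdict
from itertools import combinations

# Hypothetical data: user -> set of liked items.
likes = {
    "u1": {"C", "D"},
    "u2": {"C", "D", "E"},
    "u3": {"C", "D"},
    "u4": {"E", "F"},
}


def item_similarities(likes):
    """Cosine similarity between items, based on how often users like both."""
    item_count = defaultdict(int)   # number of users who like each item
    co_count = defaultdict(int)     # number of users who like both items
    for items in likes.values():
        for item in items:
            item_count[item] += 1
        for i, j in combinations(sorted(items), 2):
            co_count[(i, j)] += 1
            co_count[(j, i)] += 1
    sims = defaultdict(dict)
    for (i, j), c in co_count.items():
        sims[i][j] = c / math.sqrt(item_count[i] * item_count[j])
    return sims


def related_items(sims, item, n=3):
    """The n items most similar to the one the user is currently looking at."""
    ranked = sorted(sims.get(item, {}).items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:n]


sims = item_similarities(likes)
print(related_items(sims, "C"))
```

Here item D comes first for item C because all three users who like C also like D, while E shares only one of them.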

So, do you agree or disagree with me?


Friday, October 17, 2014

Let's Start Analyzing with NodeXL

In the sixth lecture of Social Media Analysis, we learnt that a social network can be viewed as a graph that describes the relations among a group of people. In such a graph, each vertex represents an individual, while edges represent the direct connections between vertices.

During the class, Prof. Chan introduced a useful tool called NodeXL, which is a free, open-source template for Microsoft Excel 2007, 2010 and 2013 that makes it easy to explore network graphs.

After the class, I used NodeXL to find the Twitter users in Hong Kong who had sent tweets containing the keyword "CUHK" during the past week, together with their network information.

Here is the first graph I produced with NodeXL. There are so many nodes (vertices) and lines (edges) on it that it is difficult to observe any information.



Then I used the Groups function to group the vertices by cluster and lay out the graph cluster by cluster.


Now, can you read the centrality measures, such as in-degree/out-degree and betweenness/closeness, from it?
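For readers who want to see what those measures mean, here is a small Python sketch that computes in-degree, out-degree, closeness and betweenness on a tiny hypothetical directed graph (vertices a to e). It is not how NodeXL computes them: closeness here only counts the vertices reachable from each vertex, and betweenness uses Brandes' algorithm without normalization, so NodeXL's numbers may be defined or scaled differently.

```python
from collections import deque

# Hypothetical directed edges (source -> target), e.g. "user a replied to user b".
edges = [
    ("a", "b"), ("a", "c"), ("b", "c"),
    ("c", "d"), ("d", "e"), ("e", "c"),
]


def build_graph(edges):
    """Adjacency lists for a directed graph; every vertex gets an entry."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)
        graph.setdefault(dst, [])
    return graph


def degrees(graph):
    """In-degree and out-degree of every vertex."""
    out_deg = {v: len(nbrs) for v, nbrs in graph.items()}
    in_deg = {v: 0 for v in graph}
    for nbrs in graph.values():
        for w in nbrs:
            in_deg[w] += 1
    return in_deg, out_deg


def bfs_distances(graph, source):
    """Shortest-path hop counts from source, following edge directions."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in graph[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def closeness(graph):
    """Reachable vertices divided by total distance to them (0 if none reachable)."""
    result = {}
    for v in graph:
        dist = bfs_distances(graph, v)
        total = sum(dist.values())
        result[v] = (len(dist) - 1) / total if total > 0 else 0.0
    return result


def betweenness(graph):
    """Unnormalized betweenness for an unweighted directed graph (Brandes)."""
    bc = {v: 0.0 for v in graph}
    for s in graph:
        stack, preds = [], {v: [] for v in graph}
        sigma = {v: 0 for v in graph}  # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:  # BFS phase: count shortest paths
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in graph}
        while stack:  # accumulation phase, farthest vertices first
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc


graph = build_graph(edges)
print(degrees(graph))
print(closeness(graph))
print(betweenness(graph))
```

In this toy graph, c has the highest in-degree (3) and the highest betweenness (5), because most shortest paths have to pass through it.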

As a matter of fact, this was my first time using NodeXL. What surprised me is that NodeXL already integrates with the APIs of several popular social networking websites.

If you find it interesting, you can watch this tutorial video online to learn more about NodeXL (http://www.youtube.com/watch?v=PC-PgkhpsNc) :)


Monday, October 6, 2014

A Sentiment Analysis Case Study

Sentiment analysis (also known as opinion mining) refers to the use of natural language processing, text analysis and computational linguistics to identify and extract subjective information in source materials [1].


In the fourth lecture of Social Media Analysis, Professor Chan introduced the theory of sentiment analysis along with several cases. Actually, sentiment analysis is not entirely new to us: it is already applied on consumer websites that we use almost every day. When we purchase goods on taobao.com or amazon.com, we can see ratings of the goods from many aspects, and when we choose a restaurant online, we can make our decision based on how many stars the restaurant has. The next question, then, is how to conduct sentiment analysis.


Here is an interesting case I found that analyzes smartphone-related Twitter reviews using opinion mining techniques. Why not take a brief look at it and learn how a real sentiment analysis, or opinion mining, is conducted?

Figure-1

Figure-1 illustrates the opinion mining procedure applied to these smartphone-related Twitter reviews. It takes four steps altogether [3].

The first step is to retrieve the tweets that contain information about the Galaxy S4, iPhone 5 and BlackBerry Q10 within a specified period of time. The tweets are retrieved through the Twitter Open API and then saved to a local database.

The second step is to classify those tweets into six categories (Display, Network, AP, Size, Camera and Audio), which are pre-defined as six important attributes of a smartphone.
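As one possible (and very naive) way to do step 2, here is a keyword-matching classifier in Python. The case study does not describe its classifier, so the keyword lists are hypothetical, and the words under "AP" assume it refers to the phone's application processor. A tweet may fall into several categories or into none.

```python
import re

# Hypothetical keyword lists: the case study does not say how its classifier
# works, so these words are illustrative assumptions only ("AP" is assumed
# to mean the application processor).
ATTRIBUTE_KEYWORDS = {
    "Display": {"screen", "display", "resolution"},
    "Network": {"lte", "4g", "wifi", "signal"},
    "AP": {"processor", "cpu", "lag"},
    "Size": {"size", "big", "small", "weight"},
    "Camera": {"camera", "photo", "picture"},
    "Audio": {"speaker", "sound", "audio"},
}


def tokenize(text):
    """Lower-case alphanumeric tokens; a deliberately crude tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def classify_tweet(text):
    """Return every attribute whose keywords appear in the tweet (possibly none)."""
    words = set(tokenize(text))
    return sorted(attr for attr, keys in ATTRIBUTE_KEYWORDS.items() if words & keys)


print(classify_tweet("The Galaxy S4 screen is gorgeous but the camera is slow"))
```

Keyword matching is easy to understand but misses synonyms and context; a trained text classifier would be the natural next step.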

The third step is to determine the polarity of the opinions towards each attribute. To keep it simple, only positive and negative values are counted. In this case, a text analysis program called LIWC (Linguistic Inquiry and Word Count) [2] is used to normalize the degree of polarity.
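Step 3 can be imitated with a simple lexicon-based word count, which is roughly the idea behind LIWC: count the words that appear in positive and negative word lists and express them relative to the length of the text. The two tiny word lists below are hypothetical stand-ins (LIWC's own dictionaries are not reproduced here), and the (pos - neg) / (pos + neg) normalization is my own assumption rather than LIWC's method.

```python
import re

# Tiny hypothetical sentiment word lists, NOT LIWC's dictionaries;
# they only mimic the word-counting idea.
POSITIVE = {"good", "great", "love", "gorgeous", "fast", "clear"}
NEGATIVE = {"bad", "poor", "hate", "slow", "blurry", "lag"}


def polarity(text):
    """Return (positive %, negative %, normalized score in [-1, 1])."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0, 0.0, 0.0
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    # Express each count as a percentage of all words in the text.
    pos_pct = 100.0 * pos / len(words)
    neg_pct = 100.0 * neg / len(words)
    # Assumed normalization: balance of positive vs. negative words.
    score = (pos - neg) / (pos + neg) if pos + neg else 0.0
    return pos_pct, neg_pct, score


print(polarity("I love the gorgeous screen"))
print(polarity("Blurry photos, bad lag"))
```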

The fourth step is to display and analyze the opinion mining results. From the output shown in Figure-2, we can find out which smartphone has the highest rating over a certain period, while Figure-3 shows more detailed opinions towards each attribute.

Figure-2
Figure-3
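Step 4 is mostly aggregation. The Python sketch below assumes that steps 1 to 3 produce one record per opinion (phone, attribute, polarity score, period); the records and the period labels P1/P2 are made up for illustration and are not results from the case study. It averages the scores per phone and attribute, similar in spirit to the detail in Figure-3, and picks the best-rated phone for a period, similar in spirit to the comparison in Figure-2.

```python
from collections import defaultdict
from statistics import mean

# Made-up records, NOT data from the case study:
# (phone, attribute, polarity score in [-1, 1], period label).
records = [
    ("Galaxy S4", "Display", 1.0, "P1"),
    ("Galaxy S4", "Camera", -0.5, "P1"),
    ("iPhone 5", "Camera", 1.0, "P1"),
    ("iPhone 5", "Camera", 0.5, "P1"),
    ("BlackBerry Q10", "Display", -1.0, "P1"),
    ("BlackBerry Q10", "Audio", 0.5, "P2"),
]


def attribute_scores(records):
    """Average polarity per (phone, attribute) pair."""
    groups = defaultdict(list)
    for phone, attr, score, _period in records:
        groups[(phone, attr)].append(score)
    return {key: mean(vals) for key, vals in groups.items()}


def best_phone(records, period):
    """Phone with the highest average polarity in one period (None if no data)."""
    groups = defaultdict(list)
    for phone, _attr, score, p in records:
        if p == period:
            groups[phone].append(score)
    if not groups:
        return None
    return max(groups, key=lambda phone: mean(groups[phone]))


print(attribute_scores(records))
print(best_phone(records, "P1"))
```

A plotting library could then turn these dictionaries into charts, which touches on the last question below.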

Well, this is only a very simple opinion mining case, which I found on the Springer website (link.springer.com) [3]. Based on this case and what we've learnt from the lectures, there are still some questions we can think through one or more steps further. Here I list some questions that I think deserve a second thought, and visitors to my homepage are welcome to share their own opinions.

In the second step of the above case, what kind of method or program could be adopted to classify the Twitter reviews?

In the third step, instead of using the paid program LIWC, are there any other methods that can be used to normalize the opinions?

When displaying the results of the analysis, are there any tools or programs that can help show the results more directly, or perhaps more beautifully?

[References]
[1] http://en.wikipedia.org/wiki/Sentiment_analysis
[2] http://www.liwc.net
[3] http://link.springer.com/chapter/10.1007/978-3-319-05503-9_20

Sunday, September 21, 2014

A Glimpse at the Application of Social Media Analysis

After enrolling in the Social Media Analysis (SMA) course, the first question I wanted to answer for myself was: why analyze social media networks?

According to Wikipedia [1], a social networking service is a platform to build social networks or social relations among people who share interests, activities, backgrounds or real-life connections. Social network platforms have blossomed everywhere since the emergence of the internet. According to a statistical report released by tech.qq.com on July 25th, 2014, the total number of subscribers on Facebook had reached as many as 2.2 billion, nearly 30% of the global population. The huge volume of conversations, comments and replies on these platforms could be of great value to businesses, provided that all this content is processed properly.


Figure 1 Various Social Media Networks

With the further question of how SMA can help social media network platforms in mind, I did a little research on the internet and found a business case that applies SMA [2]. In this scenario, MyMadrid is a mobile app providing a unique interface to the online and mobile fan community of the top football club Real Madrid C.F. An SMA & SNA (Social Network Analysis) company was hired to give MyMadrid an in-depth statistical view of its user behavior. Eventually, the following information was produced through the analysis:
- Who is talking to whom?
- What do they say?
- The top influencers surrounding the social object "MyMadrid". The top 4 influencers can be seen as the brand's ambassadors, as the combined network of these 4 people has an immediate reach of 5420 Twitter accounts.
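The MyMadrid report does not explain how its influencers or their reach were calculated, so the Python sketch below only illustrates one plausible approach under stated assumptions: a hypothetical "who is talking to whom" list of (speaker, addressee) pairs, influencers ranked by how many distinct accounts they interact with, and "immediate reach" taken as the number of distinct followers of the chosen influencers, with overlapping followers counted once. All account names are hypothetical.

```python
from collections import defaultdict

# Hypothetical "who is talking to whom" data: (speaker, addressee) pairs,
# e.g. as might be extracted from @mentions in tweets.
mentions = [
    ("fan1", "fan2"), ("fan1", "fan3"), ("fan2", "fan3"),
    ("fan4", "fan1"), ("fan5", "fan1"), ("fan3", "fan6"),
]

# Hypothetical follower lists: account -> set of accounts following it.
followers = {
    "fan1": {"f1", "f2", "f3"},
    "fan2": {"f3", "f4"},
    "fan3": {"f5"},
}


def top_influencers(mentions, k):
    """Assumed ranking: number of distinct accounts each account interacts with."""
    neighbours = defaultdict(set)
    for src, dst in mentions:
        neighbours[src].add(dst)
        neighbours[dst].add(src)
    ranked = sorted(neighbours, key=lambda a: (-len(neighbours[a]), a))
    return ranked[:k]


def immediate_reach(accounts, followers):
    """Distinct followers of the given accounts (overlaps counted once)."""
    reached = set()
    for account in accounts:
        reached |= followers.get(account, set())
    return len(reached)


top = top_influencers(mentions, 2)
print(top, immediate_reach(top, followers))
```

Counting the union rather than summing follower counts matters: fan1 and fan2 share follower f3, who would otherwise be counted twice.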

Clearly, with the above statistical information, the app can understand its users much better in many dimensions. Different marketing campaigns could then be deployed based on this deeper understanding in order to attract more users.

Thus, from my point of view, social network analysis provides a practical approach that enables us to leverage all this content and extract value from social networks. In Lectures 2 and 3, Professor Chan explained various useful methods and principles for doing the analysis, and I believe this course will lead us on a fantastic journey into the world of SMA.

References
[1] Wikipedia, http://en.wikipedia.org/wiki/Social_network
[2] SNA focus to MyMadrid - Real Madrid official Mobile Community for its Fans, www.social-3.com