<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>Highlighting Part of the News Feed in Social Networks</title>
	<atom:link href="http://zhenhua2012.wordpress.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://zhenhua2012.wordpress.com</link>
	<description>Master Thesis Blog of Zhenhua</description>
	<lastBuildDate>Mon, 20 Feb 2012 17:13:25 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='zhenhua2012.wordpress.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://s2.wp.com/i/buttonw-com.png</url>
		<title>Highlighting Part of the News Feed in Social Networks</title>
		<link>http://zhenhua2012.wordpress.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://zhenhua2012.wordpress.com/osd.xml" title="Highlighting Part of the News Feed in Social Networks" />
	<atom:link rel='hub' href='http://zhenhua2012.wordpress.com/?pushpress=hub'/>
		<item>
		<title>Problems with Opening Bank Accounts in Sweden</title>
		<link>http://zhenhua2012.wordpress.com/2012/02/20/problems-with-opening-bank-accounts-in-sweden/</link>
		<comments>http://zhenhua2012.wordpress.com/2012/02/20/problems-with-opening-bank-accounts-in-sweden/#comments</comments>
		<pubDate>Mon, 20 Feb 2012 14:52:14 +0000</pubDate>
		<dc:creator>zhenhua2012</dc:creator>
				<category><![CDATA[Progress]]></category>

		<guid isPermaLink="false">http://zhenhua2012.wordpress.com/?p=81</guid>
		<description><![CDATA[It is really hard to open a bank account in Sweden. SEB, probably the biggest national bank, does not provide short-term accounts, which means one is not eligible for an account if he/she is staying for less than 6 months &#8230; <a href="http://zhenhua2012.wordpress.com/2012/02/20/problems-with-opening-bank-accounts-in-sweden/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>It is really hard to open a bank account in Sweden. SEB, probably the biggest national bank, does not provide short-term accounts, which means one is not eligible for an account if he/she is staying for less than 6 months in Sweden. My only hope is that Nordea, a Scandinavian bank headquartered in Sweden, will kindly let me open one after I get a tax number from the tax office. According to the tax office, I should receive the number within two weeks.</p>
]]></content:encoded>
			<wfw:commentRss>http://zhenhua2012.wordpress.com/2012/02/20/problems-with-opening-bank-accounts-in-sweden/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/96eed133c977eaea7cfab08fafb56d7d?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">zhenhua2012</media:title>
		</media:content>
	</item>
		<item>
		<title>Writing the Draft Proposal</title>
		<link>http://zhenhua2012.wordpress.com/2012/02/20/draftproposal/</link>
		<comments>http://zhenhua2012.wordpress.com/2012/02/20/draftproposal/#comments</comments>
		<pubDate>Mon, 20 Feb 2012 14:46:26 +0000</pubDate>
		<dc:creator>zhenhua2012</dc:creator>
				<category><![CDATA[Progress]]></category>

		<guid isPermaLink="false">http://zhenhua2012.wordpress.com/?p=79</guid>
		<description><![CDATA[Writing the draft proposal is a mandatory step for KTH thesis students. It is a good opportunity to weave papers and thoughts together after 4 weeks of literature study. This How to Write a Proposal provides some useful guidelines. I hope &#8230; <a href="http://zhenhua2012.wordpress.com/2012/02/20/draftproposal/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Writing the draft proposal is a mandatory step for KTH thesis students. It is a good opportunity to weave papers and thoughts together after 4 weeks of literature study. This <a title="How to Write a Proposal" href="http://filebox.vt.edu/users/nussbaum/subpages/ProposalHowTo.pdf" target="_blank">How to Write a Proposal</a> provides some useful guidelines. I hope I can finish it by Wednesday.</p>
]]></content:encoded>
			<wfw:commentRss>http://zhenhua2012.wordpress.com/2012/02/20/draftproposal/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/96eed133c977eaea7cfab08fafb56d7d?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">zhenhua2012</media:title>
		</media:content>
	</item>
		<item>
		<title>Review of &#8220;The Role of Social Networks in Information Diffusion&#8221;</title>
		<link>http://zhenhua2012.wordpress.com/2012/02/02/review-of-the-role-of-social-networks-in-information-diffusion/</link>
		<comments>http://zhenhua2012.wordpress.com/2012/02/02/review-of-the-role-of-social-networks-in-information-diffusion/#comments</comments>
		<pubDate>Thu, 02 Feb 2012 07:47:04 +0000</pubDate>
		<dc:creator>zhenhua2012</dc:creator>
				<category><![CDATA[Paper Reviews]]></category>
		<category><![CDATA[Sociology]]></category>

		<guid isPermaLink="false">http://zhenhua2012.wordpress.com/?p=56</guid>
		<description><![CDATA[(This paper is from a completely different field. Its topic is about information dissemination, which I think belongs to the field of information theory and sociology. The conclusions are really interesting and would be very useful for my thesis. However, &#8230; <a href="http://zhenhua2012.wordpress.com/2012/02/02/review-of-the-role-of-social-networks-in-information-diffusion/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>(This paper is from a completely different field. Its topic is information dissemination, which I think belongs to the fields of information theory and sociology. The conclusions are really interesting and would be very useful for my thesis. However, the one thing I do not like about the paper is that the authors do not include enough detail on how the probabilities are computed, which makes the paper harder to read than it could be. Because I lack the related background knowledge, I will write this review in an informal way to make it easier for my future self to read.)</p>
<p><strong>Motivation</strong></p>
<p>We all know how to share a link on Facebook: just go to our walls and post it there. Whenever we do this, we are potentially affecting our friends. They are exposed to this link and may be affected by the content. The influence can be far more than that, because the link can be re-shared by those friends who think it is interesting/important, which exposes the link to a broader audience.</p>
<p>This kind of information dissemination happens not only on-line on Facebook, Twitter, etc., but also off-line when we meet others in person. It plays an important role not only in social life[1], but also in other aspects such as psychological development[2] and adoption of political opinions.</p>
<p>Due to various difficulties in measuring social influence in real life, past studies either rely on observational data, which is prone to bias, or are limited to small populations, which limits the generality of their conclusions. The authors of this paper[3] are fortunate to have access to Facebook data and to be able to tweak some Facebook features to conduct well-controlled experiments on a large population, thus overcoming the difficulties mentioned above.</p>
<p><strong>The Idea Behind the Controlled Experiment</strong></p>
<p>The focus of this paper is &#8220;sharing&#8221;, a crucial step in information dissemination. The authors present a fairly good analysis of the sources of information we share in the context of Facebook: It is either what we read on other websites (then we share it by a direct post on our walls), or what we read from our News Feed on Facebook (then we share it by a click on the re-share button).</p>
<p>It can be imagined that people with similar characteristics tend to associate with each other (what is called &#8220;homophily&#8221; in sociology). Based on homophily, the authors infer that two close friends are very likely to have overlapping information sources. Thus the following could happen: Suppose A and B are close friends, and they are both potentially interested in a link L. A first reads L and afterwards shares it on Facebook, so B sees L on Facebook and then re-shares it. In this case, B helps disseminate L under the influence of A. But what if A&#8217;s sharing of L is not seen by B? B could just as well read L somewhere else and share it independently on Facebook. In this second case, B disseminates L without any influence from A. It seems clear that exposure to A&#8217;s sharing action will increase the probability that B shares too, but by how much? The authors manage to quantify this increase by conducting a controlled experiment on Facebook.</p>
<p><strong>The Controlled Experiment</strong></p>
<p>The experiment focuses on URL sharing on Facebook. Normally, a URL shared by our friends will appear in our News Feed (denoted as the Feed condition). In the experiment, the authors tweak Facebook a bit so that a small portion of those URLs will not appear there (denoted as the No Feed condition). Then they measure two things: 1) how many times a URL is re-shared by friends; 2) how many times a URL is independently shared by friends even though it is blocked from their News Feeds. By comparing these two measurements, we can tell how much seeing a friend share a URL increases the probability that we ourselves share the same URL.</p>
<p>Obviously there are many factors that can affect data quality, so the authors spend some effort on ensuring the validity of the data. For details (which may be useful for my thesis), refer to sections 3.2 and 3.4 of the paper.</p>
<p><strong>Analysis and Conclusions</strong></p>
<p>The experiment seems simple, but by digging into the data, the authors manage to reach many interesting conclusions. The following ones are quite understandable:</p>
<ol>
<li>The probability of sharing in the Feed condition is 0.191%, compared with 0.025% in the No Feed condition. This shows that exposure to the sharing actions of our friends significantly increases the probability of our own sharing actions (7.37 times).</li>
<li>By clustering on time, the authors find that shares of the same link typically happen within a short time frame in both the Feed and No Feed conditions, although sharing is faster in the Feed condition (a mean of 6 hours) than in the No Feed condition (a mean of 20 hours).</li>
<li>The more friends share a link, the more likely one is to share it too.</li>
</ol>
<div>The next conclusions, which relate to tie strength, are my favourite. We all have many friends on Facebook, but not all of them are close friends, and a big portion of them may actually be just acquaintances. A strong tie refers to a relationship with a closer friend, for example one whose posts you comment on or with whom you appear in the same photo.</div>
<div>
<ol>
<li>A link shared by a strong tie is more likely to be shared by us: 2.83 times as likely in the Feed condition and 3.84 times as likely in the No Feed condition. The bigger factor in the No Feed condition deserves further investigation, which leads to the most interesting conclusion of the paper:</li>
<li><strong>Information from a weak tie is more novel</strong>. This is quite hard to prove, but intuitively we can think that weak ties are weak because they are interested in different things from us (the opposite of homophily), and that&#8217;s why the information from weak ties is more novel. Proving it from the data is more complicated, but let me try. Note that if one shares a link in the No Feed condition, this is because he/she has seen the link somewhere other than Facebook. Thus the increase in the probability of sharing from No Feed to Feed has another meaning: if the increase is small, that is because one can easily access the same URL elsewhere; if the increase is large, one is less likely to have that access, and thus the URL is more novel to him/her. From the data, we can observe that URLs from weak ties enjoy a much higher relative increase from No Feed to Feed. Thus those URLs are less likely to be seen elsewhere and are more novel.</li>
<li>Due to the abundance of weak ties, their collective power (rather than that of the strong ties) is responsible for most of the information dissemination.</li>
</ol>
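<p>The novelty argument above can be checked with a little arithmetic. What follows is only a minimal sketch, under the assumption that the 2.83 and 3.84 figures compare links from strong ties against links from weak ties within each condition. The baseline probabilities are hypothetical and are not taken from the paper; the function names are mine.</p>

```python
# Multipliers quoted in the list above (assumed: strong-tie links vs. weak-tie links).
STRONG_VS_WEAK_FEED = 2.83
STRONG_VS_WEAK_NO_FEED = 3.84


def relative_increase(p_feed, p_no_feed):
    """How many times more likely sharing is with exposure than without."""
    return p_feed / p_no_feed


def weak_to_strong_increase_ratio(p_weak_feed, p_weak_no_feed):
    """Relative increase for weak ties divided by that for strong ties."""
    p_strong_feed = STRONG_VS_WEAK_FEED * p_weak_feed
    p_strong_no_feed = STRONG_VS_WEAK_NO_FEED * p_weak_no_feed
    weak = relative_increase(p_weak_feed, p_weak_no_feed)
    strong = relative_increase(p_strong_feed, p_strong_no_feed)
    return weak / strong


# Hypothetical weak-tie baselines; the resulting ratio does not depend on them.
ratio_a = weak_to_strong_increase_ratio(0.001, 0.0001)
ratio_b = weak_to_strong_increase_ratio(0.02, 0.005)
print(round(ratio_a, 2), round(ratio_b, 2))
```

<p>Whatever baselines are plugged in, the ratio reduces to 3.84 / 2.83, so under this reading the relative increase from No Feed to Feed is about a third larger for weak ties than for strong ties.</p>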
<div><strong>Strong Points</strong></div>
<div>
<ol>
<li>The experiment is well-controlled and collects a large quantity of data (about 250 million users).</li>
<li>The authors apply clustering in the analysis and obtain convincing results that would otherwise not be easily seen.</li>
<li>Introducing tie strength as a parameter in the analysis is a very good decision, and the conclusions are enlightening for understanding information dissemination.</li>
</ol>
<div><strong>Weak Points</strong></div>
<div>
<ol>
<li>As noted at the beginning, the paper does not show how the probabilities are computed, which makes it harder to read.</li>
<li>Tweaking Facebook may lead to some legal issues such as privacy violations.</li>
</ol>
<div><strong>Related Articles</strong></div>
<div><a title="The End of the Echo Chamber" href="http://www.slate.com/articles/technology/technology/2012/01/online_echo_chambers_a_study_of_250_million_facebook_users_reveals_the_web_isn_t_as_polarized_as_we_thought_.html" target="_blank">This article</a> argues that Facebook is not an Echo Chamber, based on the results of this paper. I have read it and I do not quite understand how the author reaches this radical conclusion.</div>
<div><a title="Rethinking Information Diversity in Networks" href="http://www.facebook.com/notes/facebook-data-team/rethinking-information-diversity-in-networks/10150503499618859" target="_blank">This article</a> from the first author of the paper (a researcher in the Facebook Data Team) presents the main results and is an easier and shorter version of the paper. It also includes nicer graphs about strong ties and weak ties.</div>
</div>
</div>
<div></div>
</div>
<p>[1] T. Pempek, Y. Yermolayeva, and S. Calvert, “College students’ social networking experiences on Facebook,” Journal of Applied Developmental Psychology, vol. 30, no. 3, pp. 227-238, May 2009.<br />
[2] N. B. Ellison, C. Steinfield, and C. Lampe, “The Benefits of Facebook ‘Friends:’ Social Capital and College Students’ Use of Online Social Network Sites,” Journal of Computer‐Mediated Communication, vol. 12, no. 4, pp. 1143-1168, Jul. 2007.<br />
[3] E. Bakshy, I. Rosenn, C. Marlow, and L. Adamic, “The Role of Social Networks in Information Diffusion,” arXiv:1201.4145, Jan. 2012.</p>
]]></content:encoded>
			<wfw:commentRss>http://zhenhua2012.wordpress.com/2012/02/02/review-of-the-role-of-social-networks-in-information-diffusion/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/96eed133c977eaea7cfab08fafb56d7d?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">zhenhua2012</media:title>
		</media:content>
	</item>
		<item>
		<title>Learning &#8220;Machine Learning&#8221;</title>
		<link>http://zhenhua2012.wordpress.com/2012/01/29/learning-machine-learning/</link>
		<comments>http://zhenhua2012.wordpress.com/2012/01/29/learning-machine-learning/#comments</comments>
		<pubDate>Sun, 29 Jan 2012 15:06:29 +0000</pubDate>
		<dc:creator>zhenhua2012</dc:creator>
				<category><![CDATA[Progress]]></category>

		<guid isPermaLink="false">http://zhenhua2012.wordpress.com/?p=52</guid>
		<description><![CDATA[Since my topic is closely related to machine learning, Šarūnas recommended to me a free on-line course by Stanford professor Andrew Ng. I have watched a few chapters and it is a really good learning experience. The two things I like most are: &#8230; <a href="http://zhenhua2012.wordpress.com/2012/01/29/learning-machine-learning/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Since my topic is closely related to machine learning, Šarūnas recommended to me a free on-line <a title="Machine Learning Stanford" href="http://www.ml-class.org/course/class/index" target="_blank">course</a> by Stanford professor <a title="Andrew Ng's Home Page" href="http://www.cs.stanford.edu/people/ang/" target="_blank">Andrew Ng</a>. I have watched a few chapters and it is a really good learning experience. The two things I like most are:</p>
<ol>
<li>Immediate Feedback. All the exercises are graded immediately and automatically after submission, and all the answers come with good explanations that help clear up questions and doubts.</li>
<li>Well-tailored Video Clips. Each video clip focuses on one topic. The length of a video clip depends only on the amount of content. Some are only 5 minutes long while others can be 15 minutes. This is much better than other on-line courses that are just recordings of normal 45-minute off-line classes.</li>
</ol>
<p>Another interesting thing is that you can choose to play the videos at 1.2x or 1.5x speed (imagine what the lips would look like) to save time, though it is not very useful for me.</p>
<p>I hope I can finish the whole course in February. Several parts look quite interesting. <a title="GNU Octave" href="http://www.gnu.org/software/octave/" target="_blank">Octave</a> seems useful for trying out new ideas rapidly. The logistic regression part would be very helpful for evaluation.</p>
]]></content:encoded>
			<wfw:commentRss>http://zhenhua2012.wordpress.com/2012/01/29/learning-machine-learning/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/96eed133c977eaea7cfab08fafb56d7d?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">zhenhua2012</media:title>
		</media:content>
	</item>
		<item>
		<title>Finally, Official Start of My Thesis at Salesforce.com</title>
		<link>http://zhenhua2012.wordpress.com/2012/01/21/finally-official-start-of-my-thesis-at-salesforce-com/</link>
		<comments>http://zhenhua2012.wordpress.com/2012/01/21/finally-official-start-of-my-thesis-at-salesforce-com/#comments</comments>
		<pubDate>Sat, 21 Jan 2012 02:18:54 +0000</pubDate>
		<dc:creator>zhenhua2012</dc:creator>
				<category><![CDATA[Progress]]></category>

		<guid isPermaLink="false">http://zhenhua2012.wordpress.com/?p=43</guid>
		<description><![CDATA[Finally, after several weeks of waiting, today I signed the contract with Salesforce.com (BTW, it is a bit strange to me that the &#8220;.com&#8221; is included in the name, just like the &#8220;!&#8221; in Yahoo!. Maybe it is just for marketing.) Beginning &#8230; <a href="http://zhenhua2012.wordpress.com/2012/01/21/finally-official-start-of-my-thesis-at-salesforce-com/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Finally, after several weeks of waiting, today I signed the contract with <em>Salesforce.com</em> (BTW, it is a bit strange to me that the &#8220;<em>.com</em>&#8221; is included in the name, just like the &#8220;<em>!</em>&#8221; in <em>Yahoo!</em>. Maybe it is just for marketing.) Starting next week, I will be working on my master&#8217;s thesis at the Swedish office of <em>Salesforce.com</em>. The topic is related to social networks. It is called &#8220;Highlighting News Feeds&#8221; for now, but it will be refined in the near future because this topic is actually very big. The basic problem to solve can be described as follows:</p>
<p><em>In today&#8217;s social networks such as Facebook and Twitter, a user can receive an overwhelming amount of news every day. For instance, on Facebook, hundreds of status updates and new posts from all of one&#8217;s friends flood one&#8217;s screen, making it hard to read through them. One possible solution to this problem is to highlight the most interesting pieces of information for each user, based on the user&#8217;s profile and his/her relationships. Then, the question is how to effectively and efficiently find the most interesting information.</em></p>
<p>(Facebook is actually already applying an <em><a title="EdgeRank: The Secret Sauce That Makes Facebook's News Feed Tick" href="http://techcrunch.com/2010/04/22/facebook-edgerank/" target="_blank">EdgeRank</a></em> algorithm to help filter news. When I search on-line for the EdgeRank algorithm, most of what I find is about how it affects media marketing on Facebook. The details of the algorithm have not yet been fully revealed.)</p>
<p>After reading several papers, I now see that this problem is closely related to <em>Recommender Systems</em>. According to paper[1], recommender systems can be categorized into two main genres: <em>collaborative</em> and <em>content-based</em>. I have read several papers about content-based recommender systems, and that approach seems to fit my topic better. The problem with the collaborative approach is that it requires users to rate objects. In social networks, ratings of a piece of news are rare. The only examples I can think of are the <em>like</em> on Facebook and possibly the <em>retweet</em> on Twitter.</p>
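<p>To make the content-based idea concrete for myself, here is a minimal sketch. It is only an illustration under my own assumptions, not a method from paper[1]: a user profile and each post are treated as bags of words and posts are ranked by cosine similarity to the profile. The function names, the profile text and the posts are all hypothetical.</p>

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    shared_terms = set(a).intersection(b)
    dot = sum(a[term] * b[term] for term in shared_terms)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_posts(profile_text, posts):
    """Return post ids ordered from most to least similar to the profile."""
    profile = Counter(profile_text.lower().split())
    scores = {
        post_id: cosine_similarity(profile, Counter(text.lower().split()))
        for post_id, text in posts.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical data, invented for illustration only.
profile_text = "machine learning recommender systems social networks"
posts = {
    "p1": "new recommender systems paper on social networks",
    "p2": "photos from my holiday in the mountains",
    "p3": "machine learning course starts next week",
}
ranking = rank_posts(profile_text, posts)
print(ranking)
```

<p>No ratings from other users are needed here, which is the property that makes the content-based approach attractive for news feeds. A real system would of course need better text features than raw word counts.</p>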
<p>This recommendation part is just half of the story. Given the amount of data in today&#8217;s social networks, a distributed and scalable design is just as important. A publish/subscribe system seems promising, but how can I combine it with a recommender system? <a title="Neo4j vs. MySQL" href="http://www.rene-pickhardt.de/neo4j-graph-database-vs-mysql/" target="_blank">Graph databases are good for querying the social graph</a>, but key-value stores scale better and may be better for storing feeds. Maybe both can be used to build a hybrid system.</p>
<p>There is much to explore and much to decide. I am a bit nervous and a bit excited. Next week I will meet <a title="Šarūnas" href="http://www.sics.se/~sarunas/" target="_blank">Šarūnas</a>, my possible supervisor at SICS (Swedish Institute of Computer Science). I am looking forward to getting some ideas from him.</p>
<p>[1] P. Resnick and H. R. Varian, “Recommender systems,” Commun. ACM, vol. 40, no. 3, pp. 56–58, Mar. 1997.</p>
]]></content:encoded>
			<wfw:commentRss>http://zhenhua2012.wordpress.com/2012/01/21/finally-official-start-of-my-thesis-at-salesforce-com/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/96eed133c977eaea7cfab08fafb56d7d?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">zhenhua2012</media:title>
		</media:content>
	</item>
		<item>
		<title>Review of &#8220;Solving The Apparent Diversity-Accuracy Dilemma Of Recommender Systems&#8221;</title>
		<link>http://zhenhua2012.wordpress.com/2012/01/19/review-of-solving-the-apparent-diversity-accuracy-dilemma-of-recommender-systems/</link>
		<comments>http://zhenhua2012.wordpress.com/2012/01/19/review-of-solving-the-apparent-diversity-accuracy-dilemma-of-recommender-systems/#comments</comments>
		<pubDate>Thu, 19 Jan 2012 07:02:25 +0000</pubDate>
		<dc:creator>zhenhua2012</dc:creator>
				<category><![CDATA[Paper Reviews]]></category>

		<guid isPermaLink="false">http://zhenhua2012.wordpress.com/?p=24</guid>
		<description><![CDATA[Motivation Existing recommendation systems overwhelmingly focus on accuracy, i.e., the recommended objects should be interesting for the target user. While accuracy is important, diversity is often overlooked, resulting in a narrow band of popular objects. Accuracy is often measured by similarity, &#8230; <a href="http://zhenhua2012.wordpress.com/2012/01/19/review-of-solving-the-apparent-diversity-accuracy-dilemma-of-recommender-systems/">Continue reading <span class="meta-nav">&#8594;</span></a><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=zhenhua2012.wordpress.com&amp;blog=31681972&amp;post=24&amp;subd=zhenhua2012&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p><strong>Motivation</strong></p>
<p>Existing recommendation systems overwhelmingly focus on accuracy, i.e., the recommended objects should be interesting for the target user. While accuracy is important, diversity is often overlooked, resulting in a narrow band of popular objects. Because accuracy is often measured by similarity, strong diversity may hurt accuracy. This paper aims to solve this dilemma.</p>
<p><strong>Assumptions</strong></p>
<p>This paper addresses the above problem in the context of <em>bipartite graphs</em>. On websites such as Amazon, Netflix, RYM, and Delicious, users interact with objects only. Each user has a collection of objects, while each object is collected by multiple users. These links form a bipartite user-object graph.</p>
<p><strong>Solution</strong></p>
<p>To solve the dilemma, the authors adopt a hybrid method. Two algorithms, <em>HeatS </em>and <em>ProbS</em>, are combined using weighted linear aggregation. <em>HeatS </em>is a new algorithm designed for addressing the diversity challenge. <em>ProbS </em>has proven performance for accuracy. Both of them are diffusive algorithms, and thus can be combined efficiently.</p>
<p><em>HeatS</em> and <em>ProbS</em> are simple and effective algorithms. Figure 1 below, taken from the original paper, shows an example of each.</p>
<p><a href="http://zhenhua2012.files.wordpress.com/2012/01/heats_probs.png"><img class="alignnone size-full wp-image-31" title="HeatS_ProbS" src="http://zhenhua2012.files.wordpress.com/2012/01/heats_probs.png?w=800" alt=""   /></a></p>
<p>As shown in Figure 1, at the start of <em>HeatS</em>, the objects linked to the target user (those in <em>u</em>&#8217;s collection) are assigned an initial score of 1, and objects not in the collection are assigned 0. All users are also assigned 0. The object scores are then diffused to the user side, with each user receiving a score equal to the average score of the objects in its collection. After that, the scores are diffused back to the object side, with each object receiving the average score of the users who have collected it.</p>
<p>In <em>ProbS</em>, the process is quite similar. The only difference is that, instead of averaging, each node distributes its score evenly among all of its links, so the total score is conserved.</p>
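<p>The two diffusion steps can be sketched in a few lines of Python. This is only a minimal illustration of the description above, not the authors' code: the function name <em>diffuse</em>, the dictionary layout and the toy graph are my own assumptions.</p>

```python
from collections import defaultdict


def diffuse(user_items, target, mode):
    """Score objects for `target` with one object-to-user-to-object diffusion.

    mode="heats": each node receives the average score of its neighbours.
    mode="probs": each node splits its score equally among its links.
    `user_items` maps each user to the set of objects in its collection
    (an assumed, illustrative representation of the bipartite graph).
    """
    item_users = defaultdict(set)
    for user, items in user_items.items():
        for item in items:
            item_users[item].add(user)

    # Step 0: objects in the target's collection start at 1, all others at 0.
    item_score = {item: 1.0 if item in user_items[target] else 0.0
                  for item in item_users}

    # Step 1: diffuse from the object side to the user side.
    user_score = {}
    for user, items in user_items.items():
        if not items:
            user_score[user] = 0.0  # a user without links receives nothing
        elif mode == "heats":
            user_score[user] = sum(item_score[i] for i in items) / len(items)
        else:
            user_score[user] = sum(item_score[i] / len(item_users[i]) for i in items)

    # Step 2: diffuse back from the user side to the object side.
    final = {}
    for item, users in item_users.items():
        if mode == "heats":
            final[item] = sum(user_score[u] for u in users) / len(users)
        else:
            final[item] = sum(user_score[u] / len(user_items[u]) for u in users)
    return final


# Hypothetical toy graph: three users, three objects.
graph = {"u1": {"a", "b"}, "u2": {"b", "c"}, "u3": {"c"}}
heats = diffuse(graph, "u1", "heats")
probs = diffuse(graph, "u1", "probs")
print(sorted(heats.items()))
print(sorted(probs.items()))
```

<p>In this toy graph the only object that <em>u1</em> has not collected is <em>c</em>, so it is the only candidate for recommendation. With <em>ProbS</em> the final scores still add up to the initial total of 2, which is what distributing instead of averaging buys.</p>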
<p><strong>Evaluation</strong></p>
<p>Three datasets (Netflix, RYM and Delicious) are used for evaluation. 10% of the links are randomly deleted. Then the algorithms are applied to the remainder to generate a recommendation list for each user. The results are evaluated off-line. No field study is conducted. The metrics are defined as follows:</p>
<p>Accuracy:</p>
<ol>
<li>Recovery of deleted links: Good accuracy corresponds to a high rank for a deleted link. The relative rank is defined as <em>r=p/(o-k)</em>, where <em>p</em> is the position of the deleted (uncollected) object in the user&#8217;s recommendation list, <em>o</em> is the total number of objects, and <em>k</em> is the number of objects the user has collected. The metric is the average of <em>r</em> over all deleted links, so smaller values are better.</li>
<li>Precision and recall enhancement: Most users consider only the top <em>L</em> places in the recommendation list. Precision is therefore defined as <em>d/L</em>, where <em>d</em> is the number of deleted links that appear in the top <em>L</em> places, and recall as <em>d/D</em>, where <em>D</em> is the user&#8217;s total number of deleted links. To make the values relative, they are divided by their counterparts from a random recommendation algorithm. Averaging the resulting values over all users gives the precision and recall metrics.</li>
</ol>
<p>Diversity:</p>
<ol>
<li>Personalization: This measures the uniqueness of recommendation lists. For two users <em>i</em> and <em>j</em>, the difference between their lists is measured by <em>h=1-q/L</em>, where <em>q</em> is the number of common items in the top <em>L</em> places. Averaging over all pairs of users with at least one deleted link gives the personalization metric.</li>
<li>Surprisal/novelty: The &#8220;surprisal&#8221; value of an object is defined as <em>I=lg(u/k)</em>, where <em>u</em> is the total number of users and <em>k</em> is the number of users who have collected the object. Averaging the surprisal values of each user&#8217;s top <em>L</em> objects, and then averaging over all users with at least one deleted link, gives the surprisal metric.</li>
</ol>
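<p>To make the four definitions concrete, here is a small Python sketch of the per-user and per-object quantities. The function names are my own, the averaging over users and the division by the random baseline are left out, and I assume that <em>lg</em> means the base-2 logarithm.</p>

```python
import math


def relative_rank(position, total_objects, collected):
    """r = p / (o - k): position of a deleted link among the uncollected objects."""
    return position / (total_objects - collected)


def precision_recall(top_l, deleted):
    """Precision d/L and recall d/D for one user's top-L list."""
    hits = len(set(top_l).intersection(deleted))
    return hits / len(top_l), hits / len(deleted)


def personalization(top_l_i, top_l_j):
    """h = 1 - q/L for two top-L lists of the same length L."""
    common = len(set(top_l_i).intersection(top_l_j))
    return 1 - common / len(top_l_i)


def surprisal(total_users, collectors):
    """I = log2(u / k): rarely collected objects are more surprising.

    Base 2 is an assumption about what "lg" means in the text.
    """
    return math.log2(total_users / collectors)


# Hypothetical numbers, only to show how the functions are called.
print(relative_rank(position=5, total_objects=110, collected=10))
print(precision_recall(["a", "b", "c", "d"], ["b", "x"]))
print(personalization(["a", "b", "c", "d"], ["c", "d", "e", "f"]))
print(surprisal(total_users=1024, collectors=4))
```

<p>The first metric rewards small values, while the other three reward large values.</p>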
<p><strong>Results &amp; Conclusions</strong></p>
<ol>
<li><em>HeatS</em> is an effective algorithm for generating diverse recommendations. It has very high personalization and surprisal values compared to the other algorithms, but it performs quite poorly in terms of accuracy.</li>
<li><em>ProbS</em> is an effective algorithm for generating accurate recommendations. It consistently outperforms <em>USim</em>, the similarity-based algorithm, though <em>USim</em> comes quite close.</li>
<li>The hybrid algorithm of <em>HeatS </em>and <em>ProbS </em>can be tuned to improve both accuracy and diversity.</li>
</ol>
<p><strong>Strong Points</strong></p>
<ol>
<li>The evaluation metrics are well defined and reasonable. The random deletion of 10% of the links is an interesting method for evaluation.</li>
<li><em>HeatS </em>successfully diversifies the recommendations, according to the metrics defined in the paper.</li>
<li>The hybrid algorithm performs better than any single algorithm, which is really a surprising and beautiful result.</li>
<li>Datasets from Netflix, RYM and Delicious are quite extensive.</li>
</ol>
<p><strong>Weak Points</strong></p>
<ol>
<li>Both <em>HeatS </em>and <em>ProbS </em>work on bipartite graphs, so they are not directly applicable to social graphs.</li>
<li>A field study is the final and probably most important evaluation for a recommendation algorithm. In this paper, however, all evaluations are based on hard metrics computed off-line.</li>
</ol>
<p>[1]</p>
<div>
<p>T. Zhou, Z. Kuscsik, J.-G. Liu, M. Medo, J. R. Wakeling, and Y.-C. Zhang, “Solving the apparent diversity-accuracy dilemma of recommender systems,” <em>Proceedings of the National Academy of Sciences</em>, Feb. 2010.</p>
</div>
]]></content:encoded>
			<wfw:commentRss>http://zhenhua2012.wordpress.com/2012/01/19/review-of-solving-the-apparent-diversity-accuracy-dilemma-of-recommender-systems/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/96eed133c977eaea7cfab08fafb56d7d?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">zhenhua2012</media:title>
		</media:content>

		<media:content url="http://zhenhua2012.files.wordpress.com/2012/01/heats_probs.png" medium="image">
			<media:title type="html">HeatS_ProbS</media:title>
		</media:content>
	</item>
		<item>
		<title>Review of &#8220;Recommending Twitter Users to Follow Using Content and Collaborative Filtering Approaches&#8221;</title>
		<link>http://zhenhua2012.wordpress.com/2012/01/19/review-of-recommending-twitter-users-to-follow-using-content-and-collaborative-filtering-approaches/</link>
		<comments>http://zhenhua2012.wordpress.com/2012/01/19/review-of-recommending-twitter-users-to-follow-using-content-and-collaborative-filtering-approaches/#comments</comments>
		<pubDate>Thu, 19 Jan 2012 03:40:03 +0000</pubDate>
		<dc:creator>zhenhua2012</dc:creator>
				<category><![CDATA[Paper Reviews]]></category>

		<guid isPermaLink="false">http://zhenhua2012.wordpress.com/?p=18</guid>
		<description><![CDATA[Motivation This paper is quite similar to the previous paper I just reviewed yesterday, so I will keep this review short. The biggest difference is that the previous paper aims to recommend interesting URLs, while this paper aims to recommend interesting users. &#8230; <a href="http://zhenhua2012.wordpress.com/2012/01/19/review-of-recommending-twitter-users-to-follow-using-content-and-collaborative-filtering-approaches/">Continue reading <span class="meta-nav">&#8594;</span></a><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=zhenhua2012.wordpress.com&amp;blog=31681972&amp;post=18&amp;subd=zhenhua2012&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p><strong>Motivation</strong></p>
<p>This paper is quite similar to <a title="Short and Tweet: Experiments on Recommending Content from Information Streams" href="http://wp.me/p28VVy-7" target="_blank">the previous paper I just reviewed yesterday</a>, so I will keep this review short. The biggest difference is that the previous paper aims to recommend interesting URLs, while this paper aims to recommend interesting users. Additionally, one aim of this paper is to check whether Twitter data can be effectively used for profiling a user.</p>
<p><strong>Design</strong></p>
<p>The basic idea is to search for profiles that match <em>u</em>&#8217;s profile. How do we build user profiles? The <em>term-vector</em> representation seems to be a very popular approach. The authors explored five strategies for building a user profile:</p>
<ol>
<li>User&#8217;s own tweets</li>
<li>Followers&#8217; tweets (not explored in the previous paper)</li>
<li>Followees&#8217; tweets</li>
<li>Followers&#8217; ids (not explored in the previous paper)</li>
<li>Followees&#8217; ids (not explored in the previous paper)</li>
</ol>
<p>Strategies 1–3 are content-based, while 4–5 are collaborative (using the social graph only). Hybrid strategies are also included in the experiments.</p>
<p>Another critical issue is how to rank the search results. Ideally the best-matching result should rank first. Since this paper uses <a title="Apache Lucene" href="http://lucene.apache.org/java/docs/index.html" target="_blank">Lucene</a> as the underlying infrastructure, it takes advantage of Lucene&#8217;s built-in TF-IDF weighting algorithm (also used in the previous paper) for ranking the results.</p>
<p>Lucene is used for both query and recommendation. In the case of query, a user enters several terms and Lucene returns a list of profiles that match these terms. In the case of recommendation, the user&#8217;s profile is used as the input of a query.</p>
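<p>The idea of using a profile as a query can be illustrated with a plain TF-IDF and cosine-similarity sketch in Python. This is not Lucene&#8217;s scoring formula and not the authors&#8217; code. The function names and the toy profiles are assumptions made only for illustration.</p>

```python
import math
from collections import Counter


def tfidf_vectors(profiles):
    """Turn {user: list of terms} into {user: {term: tf-idf weight}}."""
    n = len(profiles)
    doc_freq = Counter()
    for terms in profiles.values():
        doc_freq.update(set(terms))  # count each term once per profile
    vectors = {}
    for user, terms in profiles.items():
        counts = Counter(terms)
        vectors[user] = {t: c * math.log(n / doc_freq[t]) for t, c in counts.items()}
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def recommend(profiles, target, top_n=3):
    """Use the target's own profile as the query and rank all other users."""
    vectors = tfidf_vectors(profiles)
    scores = {u: cosine(vectors[target], v) for u, v in vectors.items() if u != target}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Hypothetical term profiles, e.g. built from each user's own tweets.
profiles = {
    "alice": ["python", "ml", "ml"],
    "bob": ["ml", "python", "data"],
    "carol": ["football", "goal"],
    "dave": ["python", "cooking"],
}
print(recommend(profiles, "alice", top_n=2))
```

<p>For the collaborative strategies the same code would apply if the &#8220;terms&#8221; were follower or followee ids instead of words.</p>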
<p><strong>Off-line Evaluation</strong></p>
<p>This paper evaluates its recommendation system by dividing its dataset into a <em>training-set</em> (19,000 users) and a <em>test-set</em> (1,000 users). Precision and ranking effectiveness are evaluated: the more true followees appear in the recommendation list, the higher the precision; the higher the true followees are ranked in the list, the better the ranking effectiveness.</p>
<p>The results show that collaborative strategies have better precision than content-based strategies. All strategies have similar ranking effectiveness.</p>
<p><strong>Live-user Trial</strong></p>
<p>The field study uses a hybrid strategy that combines the recommendations from all the basic recommendation algorithms. The hybrid ranking is based on the ranks assigned by the basic algorithms. 34 participants choose to follow an average of 6.9 users from a recommendation list of 30 users, with most new followees ranked towards the top of each list.</p>
<p>The query service is also tested. 31 participants choose to follow an average of 4.9 users from search results that again list 30 users. Each search query consists of an average of 3.7 terms.</p>
<p><strong>Conclusion</strong></p>
<ol>
<li>Twitter data is effective for profiling a user.</li>
<li>Collaborative (or social-relation based) recommendation is significantly more precise for Twitter.</li>
</ol>
<p><strong>Strong Points</strong></p>
<ol>
<li>The content-based strategies for building profiles have good coverage.</li>
<li>The off-line evaluation for recommendation precision and ranking effectiveness is very reasonable.</li>
<li>Using Lucene for ranking and searching is convenient. Lucene&#8217;s built-in TF-IDF algorithm is important for ranking.</li>
</ol>
<p><strong>Weak Points</strong></p>
<ol>
<li>Users in the recommendation list are selected from the global space. Some local space, such as followees of followees and followers of followers, could also be interesting.</li>
<li>There is no mention of whether the content-based strategies apply stemming or remove stop words.</li>
<li>The dataset used in the evaluation could be problematic. Leaf nodes do not have all their followers and followees in Lucene. If a leaf node is in the test-set, the precision of the recommendations would be low for it, no matter what algorithm is used. For better evaluation, leaf nodes should be excluded from the test-set.</li>
</ol>
<p>[1]<br />
J. Hannon, M. Bennett, and B. Smyth, “Recommending twitter users to follow using content and collaborative filtering approaches,” in <em>Proceedings of the fourth ACM conference on Recommender systems</em>, New York, NY, USA, 2010, pp. 199–206.</p>
]]></content:encoded>
			<wfw:commentRss>http://zhenhua2012.wordpress.com/2012/01/19/review-of-recommending-twitter-users-to-follow-using-content-and-collaborative-filtering-approaches/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/96eed133c977eaea7cfab08fafb56d7d?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">zhenhua2012</media:title>
		</media:content>
	</item>
		<item>
		<title>Review of &#8220;Short and Tweet: Experiments on Recommending Content from Information Streams&#8221;</title>
		<link>http://zhenhua2012.wordpress.com/2012/01/18/short-and-tweet-experiments-on-recommending-content-from-information-streams/</link>
		<comments>http://zhenhua2012.wordpress.com/2012/01/18/short-and-tweet-experiments-on-recommending-content-from-information-streams/#comments</comments>
		<pubDate>Wed, 18 Jan 2012 07:57:53 +0000</pubDate>
		<dc:creator>zhenhua2012</dc:creator>
				<category><![CDATA[Paper Reviews]]></category>

		<guid isPermaLink="false">http://zhenhua2012.wordpress.com/?p=7</guid>
		<description><![CDATA[Motivation: For web users, it is becoming harder and harder to deal with the huge amount of news from information streams such as Twitter. Recommendation systems have been created to help users to filter down the stream as well as &#8230; <a href="http://zhenhua2012.wordpress.com/2012/01/18/short-and-tweet-experiments-on-recommending-content-from-information-streams/">Continue reading <span class="meta-nav">&#8594;</span></a><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=zhenhua2012.wordpress.com&amp;blog=31681972&amp;post=7&amp;subd=zhenhua2012&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p><strong>Motivation:</strong></p>
<p>For web users, it is becoming harder and harder to deal with the huge amount of news from information streams such as Twitter. Recommendation systems have been created to help users to <em>filter down the stream</em> as well as <em>discover new useful content</em>. However, so far there has been little discussion or evaluation of these systems. The authors aim to evaluate the effectiveness of recommendation systems, and identify key elements for good recommendations.</p>
<p><strong>Design:</strong></p>
<p>A recommendation system is built on top of Twitter. Three dimensions of the design space are explored:</p>
<ol>
<li>The candidate content set<br />
Two options for the content set:<br />
a) FoF: URLs from followees and followees-of-followees, within 7 days<br />
b) Popular: Most popular URLs in the last week</li>
<li>Ranking by topic relevance<br />
Relevance is computed between the content and the user profile, using <em>cosine similarity</em>. The user profile can be modelled in two ways:<br />
a) Self-profile: A bag-of-words profile is built for each user with a TF-IDF scheme, based on <em>u</em>&#8217;s own tweets.<br />
b) Followee-profile: A profile built from the profiles of <em>u</em>&#8217;s followees. It may better capture <em>u</em>&#8217;s interest as an information seeker.</li>
<li>Ranking by social voting<br />
Every piece of content gets a score from each user who mentioned it. The vote power of a user <em>f</em> is inversely proportional to <em>f</em>&#8217;s tweet frequency and proportional to the number of <em>u</em>&#8217;s followees who follow <em>f</em>.</li>
</ol>
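<p>The social voting idea can be sketched as follows. This is only my reading of the description above: the exact functional form in the paper may differ, and the names <em>vote_power</em>, <em>rank_urls</em> and the toy data are hypothetical. Tweet frequencies are assumed to be positive.</p>

```python
def vote_power(f, u, followees, tweets_per_day):
    """Weight of f's vote from u's point of view.

    Grows with the number of u's followees who follow f and shrinks
    with f's tweet frequency (a simple ratio, chosen as an assumption).
    """
    trusted_by = sum(1 for g in followees[u] if f in followees.get(g, set()))
    return trusted_by / tweets_per_day[f]


def rank_urls(u, mentions, followees, tweets_per_day):
    """Sum the vote power of every user who mentioned a URL, then sort."""
    scores = {}
    for url, users in mentions.items():
        scores[url] = sum(vote_power(f, u, followees, tweets_per_day) for f in users)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical data: u follows a and b; f1 is followed by both, f2 only by a.
followees = {"u": {"a", "b"}, "a": {"f1", "f2"}, "b": {"f1"}}
tweets_per_day = {"f1": 2.0, "f2": 10.0}
mentions = {"url1": {"f1"}, "url2": {"f2"}}
print(rank_urls("u", mentions, followees, tweets_per_day))
```

<p>Here <em>f1</em> tweets rarely and is followed by both of <em>u</em>&#8217;s followees, so the URL it mentions ranks above the one mentioned by the chattier <em>f2</em>.</p>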
<p>A total of 12 algorithms are built from different combinations of the above components.</p>
<p><strong>Evaluation:</strong></p>
<p>A controlled field study is conducted. The five highest-ranked URLs from each of the 12 algorithms are combined and randomized. Subjects then mark each URL as interesting or not. An algorithm scores one point for each URL it recommends that is deemed interesting. The best algorithm is FoF-Self-Vote.<br />
Logistic regression is used for analysis of the results. It&#8217;s a good tool for analysing effects of individual parameters, and interaction between parameters.</p>
<p><strong>Conclusions:</strong></p>
<ol>
<li>Topic Relevance is helpful. Self-Topic (relevance to one&#8217;s own tweets) is significantly better than Followee-Topic (relevance to followees&#8217; tweets).</li>
<li>Social voting process is helpful.</li>
<li>Regarding candidate content set, FoF seems to be a bit better than Popular, but not significantly.</li>
<li>Social voting and topic relevance ranking are not independent factors. Their combined effect is less than the sum of their individual effects, but social voting contributes the most.</li>
</ol>
<div><strong>Strong Points:</strong></div>
<div>
<ol>
<li>Self-Profile captures <em>u</em>&#8217;s interest as an information producer, and Followee-Profile as an information seeker. This distinction is reasonable and sound.</li>
<li>The social voting algorithm in this paper is well adapted. Tweet frequency and trust propagation are included in the voting algorithm, which strengthens its effectiveness.</li>
<li>It is a good idea to randomize the combined URLs from all the algorithms before subjects mark each URL, because the order in which the URLs are presented to a subject may affect the evaluation.</li>
<li>The logistic regression part is a very good analysis.</li>
</ol>
<div><strong>Weak Points:</strong></div>
<div>
<ol>
<li>Given that Followee-Profile is tested, Follower-Profile, a profile built from followers&#8217; content, may also be an interesting design choice. But Follower-Profile is not mentioned in this paper.</li>
<li>Recency is not included as a design dimension. A user may have some long-standing interests that last for years but do not appear in recent tweets, and some temporary interests that have only just developed.</li>
<li>There is no information about the 44 subjects. Different background/occupation of subjects may affect the results.</li>
</ol>
</div>
</div>
<p><strong>Best Sentence of the Paper</strong></p>
<p>&#8220;With an abundance of information comes the scarcity of attention. Two user needs arise from attention scarcity: <em>filtering</em> and <em>discovery</em>.&#8221;</p>
<div>[1]</div>
<div>J. Chen, R. Nairn, L. Nelson, M. Bernstein, and E. Chi, “Short and tweet: experiments on recommending content from information streams,” in <em>Proceedings of the 28th international conference on Human factors in computing systems</em>, New York, NY, USA, 2010, pp. 1185–1194.</div>
]]></content:encoded>
			<wfw:commentRss>http://zhenhua2012.wordpress.com/2012/01/18/short-and-tweet-experiments-on-recommending-content-from-information-streams/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/96eed133c977eaea7cfab08fafb56d7d?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">zhenhua2012</media:title>
		</media:content>
	</item>
	</channel>
</rss>
