Editor's Note: This post is a continuation of the conversation started in “Taming Big Data with Smart Contexts” by Edouard Lambelet.
Last week, Ed gave you a glimpse of what’s coming and how we are approaching data overload at Paper.li.
The responses were thought-provoking and, just as we hoped, your input fueled a few discussions both online and offline. There were lots of ideas about data and providing real value that I thought would be interesting to share.
Let’s go through some of them.
Joining the data discussion
To begin with, let me highlight Jeremy Silver’s response, as he touched on various points including analytics, ethical data sales, connecting brands and users, and semantic analysis. All are interesting areas for unlocking the value in data, some just down the road for us, others certainly a bit further off. But I will come back specifically to the point on semantics a little later in the post.
Paulo Caldeira’s reference to McLuhan was totally on point. Studying the medium and the context around it can give us some new directions in piecing the puzzle together. Paulo also had a very simple statement on how data could be given back:
“Well, to give the data back to people we don’t need more “underwares”, all we need are apps and projects that really get all connected giving context and localization.”
I agree with Paulo. There are many ways data can be given back to users, and his suggestion is a great one. This is certainly where the “web-ecosystem” is headed, and in a way already is. Realizing the full potential of such an approach may require new industry standards to manage the discovery and interaction of these services. That means time… but we’ll be sure to keep this in mind, and hopefully participate, in our small way, in this global undertaking.
‘JW or What just changed’ touched on many of the questions we have been asking ourselves recently:
“What is the type of content readers like to see? what is the point of difference that appeals to them’ and ‘what of all the sources you index for us to use is the most current – i.e. content sources that are always refreshing their collections not leaving them to languish and are quality assured in some way”
This brings up something we have worked on over the past two months: finding the best way to profile source quality and the average engagement potential of articles published by a source. This is certainly another great way to bring data back to users, and we are looking to integrate just that into our new project… stay tuned.
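As a rough illustration of the idea (this is not Paper.li’s actual scoring method; the field names and weights below are purely hypothetical), a source’s average engagement potential could be summarised by averaging a simple per-article engagement signal:

```python
from statistics import mean

def source_engagement_score(articles):
    """Average per-article engagement for a single source.

    `articles` is a list of dicts with hypothetical fields
    'shares' and 'clicks'; the 2x weight on shares is
    illustrative only, not a real Paper.li parameter.
    """
    if not articles:
        return 0.0
    return mean(2 * a["shares"] + a["clicks"] for a in articles)

# Invented sample data for two articles from one source
feed = [{"shares": 10, "clicks": 40}, {"shares": 2, "clicks": 8}]
print(source_engagement_score(feed))  # prints 36.0
```

In practice a profile like this would also need to account for recency, so that a source whose collection has been left to languish scores lower than one that keeps refreshing.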
Peter Young also made a great point regarding the “value” of the data which made us really think:
“What does all of this “collection” equal in money and “time saved” and a variety of other tangible metric points.”
While we would have to dig deeper into these questions to provide meaningful answers, we can say that we are currently processing upwards of 1,000 articles daily per paper – just imagine the time needed to find, browse and read that many articles… every day. That’s a lot of time saved!
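To make that “time saved” claim concrete, here is a quick back-of-the-envelope calculation; the two-minutes-per-article figure is an assumption chosen purely for illustration:

```python
# Back-of-the-envelope: time to manually triage 1,000 articles,
# assuming (hypothetically) ~2 minutes to find and skim each one.
articles_per_day = 1000
minutes_per_article = 2
hours_saved = articles_per_day * minutes_per_article / 60
print(round(hours_saved, 1))  # prints 33.3 (hours per paper, per day)
```

Even at a fraction of that skim rate, manually covering a single paper’s daily input would be a full-time job several times over.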
Boyink raised a valid question about framing the discussion: what exactly are we thinking of?
Our thoughts at this moment are open… we know we can do many things with the data we have. Rather than just thinking about it, we believe it is best to try out new ideas. We are humble enough not to believe we have all the answers, but nimble enough to iterate through quick trials – to learn by doing. We are looking to regularly develop simple products that bring value outside the publishing context and, in the long run, new features to the Paper.li service itself.
Food for thought
So, as Boyink asked, what exactly are we thinking of?
As Ed said, we have built a platform that processes a LOT of content and social signals from various sources, including Twitter, Facebook, Google+ and RSS. For us, Twitter remains the largest provider of that data, so to get started we decided to look at the Twitter data ecosystem from new angles.
Up until now, we have focused on solving the problem of “social overload” for our users by analysing the never-ending, ever-increasing stream of tweets, articles, photos and videos. Once the stream is analysed, we pick the most important and relevant items and publish them in a newspaper format.
But now it’s time to ask the question:
“What else should, and can, be done to improve the quality and relevancy of the chosen content?”
In the Twitter ecosystem it’s all about followers… followers interact with the content you share. Interesting content keeps them listening… irrelevant content leads to unfollows.
How well does the average Twitter user know their community? Just take a moment to think about that yourself. How well do you know your followers? Are you aware of their interests, what they want to read, retweet and what inspires them to share? This important part of the equation has been somewhat overlooked, until now.
Matching quality and relevant content to what audiences want to read, based on analysis and not speculation, is something that would bring great value to the content world.
The final layer in improving the shareability of content has to be the source. Where does an article come from? How relevant and engaging are articles from that source over a given period of time? Factoring in both audience and sources would certainly bring extra value to those looking for content to share.
We believe this is something we can do. More to the point, it is something we have actually started working on. Probably the biggest part of solving this is finding a way to capture the interests of a user’s followers in such a way that we can find matching content. This is where semantic analysis comes in. For the past few months, we have been working on adding much more fine-grained topical analysis of articles, starting with English content. This should allow us to create more precise topical interest profiles for any and all users. And from there, well, you will have to wait just a bit longer to see what we have in store.
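As a rough sketch of what matching follower interests to content could look like (the topic names and weights below are invented for illustration and are not Paper.li’s actual semantic model), articles can be ranked by the cosine similarity between a follower-interest profile and each article’s topic profile:

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse topic-weight dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Hypothetical aggregate interest profile of a user's followers
followers = {"data": 0.8, "startups": 0.5, "music": 0.1}

# Hypothetical topic profiles produced by topical analysis of articles
articles = {
    "big-data-post": {"data": 0.9, "analytics": 0.4},
    "guitar-review": {"music": 0.9, "gear": 0.3},
}

# Rank candidate articles by how well they match the followers' interests
ranked = sorted(articles, key=lambda a: cosine(followers, articles[a]),
                reverse=True)
print(ranked)  # prints ['big-data-post', 'guitar-review']
```

The sparse-dict representation is just one convenient choice here; the point is that once both followers and articles live in the same topical space, “what should I share?” becomes a ranking problem.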
To get back to Jeremy’s initial mention of semantics, I believe that this is where the fun starts…
What do you think about knowing your followers? How important are the sources for you?
With a little help from my friends
Thanks again to all of you for your great input on Ed’s post, and I hope more of you will be encouraged to share your thoughts and ideas in the weeks ahead. We are committed to making your data work better and harder for you, and we are keen to continue the discussions. I’d like to invite you to join me and some of the team at a Google Hangout on Feb 11th, 2 PM ET (20:00 CET) to do just that. Take a moment to meet some of the team, hear the ideas we have about what we could do with data and, importantly, share your own ideas with us. If you are interested in joining the hangout, just add a comment to this post or drop us a mail at firstname.lastname@example.org
If you haven’t done so yet, please sign up to Backstage at Paper.li to keep the discussions going and help shape the future together.