Paper.li has grown into a real publishing powerhouse: 80 million articles are published every day in our users' papers. Such volume requires both a robust processing pipeline, which fetches articles from everywhere on the web and semantically analyzes them in 7 languages, and an inherently scalable storage engine.
Back in March we implemented the first phase of a two-phase platform upgrade to keep up with our publishers' growing needs and to scale for the future. For the technically curious, and pending a more detailed walk-through of the technical stack that supports our presses, we thought we would share a few details about what has been happening in the machine room.
- We transitioned to a new distributed storage engine called Cassandra and a new distributed message broker called Kafka.
- We significantly changed the way data is gathered, stored and organised, so that we can move to the next level of service and deliver even better content to your editions.
- For those of you who are into numbers, we can now process over 10,000 articles per second, so we look ahead to future growth with a certain degree of optimism!
- simplified paper creation
- improved content search across networks, keywords, queries and RSS
- auto-populated content for viewing before publishing
- complete editions of papers available
- access to video and images
- ability to share older editions via Twitter
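
For those who like to check the numbers, here is a rough back-of-the-envelope sketch that puts the two figures quoted in this post side by side: 80 million articles per day and a processing capacity of over 10,000 articles per second. It assumes, purely for illustration, that articles arrive evenly throughout the day (real traffic is certainly burstier) and that "published" and "processed" volumes are comparable. The function names are ours and do not come from the Paper.li codebase.

```python
# Back-of-the-envelope comparison of the two figures quoted in the post.
# Assumption (ours, for illustration): load is spread evenly over 24 hours.

ARTICLES_PER_DAY = 80_000_000   # daily volume quoted in the post
CAPACITY_PER_SECOND = 10_000    # processing capacity quoted in the post
SECONDS_PER_DAY = 24 * 60 * 60


def average_rate_per_second(articles_per_day: int) -> float:
    """Average articles per second if the daily volume were perfectly even."""
    return articles_per_day / SECONDS_PER_DAY


def headroom_factor(capacity_per_second: int, articles_per_day: int) -> float:
    """How many times the average rate fits into the quoted capacity."""
    return capacity_per_second / average_rate_per_second(articles_per_day)


if __name__ == "__main__":
    avg = average_rate_per_second(ARTICLES_PER_DAY)
    factor = headroom_factor(CAPACITY_PER_SECOND, ARTICLES_PER_DAY)
    print(f"average rate: about {avg:.0f} articles per second")
    print(f"headroom: about {factor:.1f}x the average rate")
```

Under these assumptions the average works out to roughly 926 articles per second, so the quoted capacity leaves a bit more than ten times the average load for peaks and future growth.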