How Social Media Giants Leverage Big Data And ML To Serve Users Better



Find out how giants like Instagram, Twitter, and Reddit are taking this advanced tech up another notch.



Social networks keep growing in popularity. As of 2018, the number of social media users exceeded 3 billion, and that trend is unlikely to reverse anytime soon.

To keep people hooked and deliver outstanding user experiences, Facebook, YouTube, LinkedIn, and other big players apply cutting-edge technology, with big data solutions being the go-to option. Underpinned by artificial intelligence (AI) and machine learning (ML), these solutions let social media platforms analyze large amounts of user data, derive actionable insights, and, in turn, deliver hyper-personalized offerings.

Personalization is just one example of how machine learning can be applied in the social network environment. Read on to find out how giants like Instagram, Twitter, and Reddit are taking this advanced tech up another notch.

Instagram: In a fight against trolling

Ranked sixth on the list of the most popular social networks worldwide, Instagram aims to keep the platform as civil as possible. For this purpose, it relies on DeepText, Facebook's "learning-based text understanding engine that can comprehend, with near-human accuracy, the textual content of several thousand posts per second."

Before going live, the system was trained on at least two million comments that had been sorted into categories like "bullying, racism, or sexual harassment." Now, users just have to turn on the automatic and manual filters in their account settings to activate offensive-comment filtering.


Instagram's AI also analyzes the contextual meaning of surrounding words to determine tone and intent. This context lets it interpret a word or phrase correctly and tell abusive language apart from constructive criticism, across cultures and languages.
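To make the idea of "train on labeled comments, then let users switch on a filter" concrete, here is a minimal sketch. It is not DeepText, which is a deep-learning engine; it swaps in a toy multinomial naive Bayes classifier, and the training comments, the labels "offensive"/"ok", and the function names (`features`, `NaiveBayesCommentClassifier`, `visible_comments`) are all hypothetical. Bigram features stand in, very crudely, for the surrounding-word context described above.

```python
import math
from collections import Counter, defaultdict

# Hypothetical labeled comments; a real system would use millions of human-labeled examples.
TRAINING_COMMENTS = [
    ("you are an idiot", "offensive"),
    ("nobody likes you loser", "offensive"),
    ("what a stupid idiot", "offensive"),
    ("great photo love it", "ok"),
    ("the lighting could be better", "ok"),
    ("love the colors here", "ok"),
]

def features(text):
    """Lowercased unigrams plus bigrams; bigrams add a little neighboring-word context."""
    words = text.lower().split()
    bigrams = [f"{a}_{b}" for a, b in zip(words, words[1:])]
    return words + bigrams

class NaiveBayesCommentClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.class_counts = Counter()
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()

    def train(self, labeled_comments):
        for text, label in labeled_comments:
            self.class_counts[label] += 1
            for f in features(text):
                self.feature_counts[label][f] += 1
                self.vocab.add(f)

    def predict(self, text):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # Log prior plus the smoothed log likelihood of each feature under this label.
            score = math.log(count / total)
            denom = sum(self.feature_counts[label].values()) + len(self.vocab)
            for f in features(text):
                score += math.log((self.feature_counts[label][f] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

def visible_comments(comments, classifier, filter_enabled):
    """Mimics a per-account filter toggle: when on, hide comments not classified as "ok"."""
    if not filter_enabled:
        return list(comments)
    return [c for c in comments if classifier.predict(c) == "ok"]

if __name__ == "__main__":
    clf = NaiveBayesCommentClassifier()
    clf.train(TRAINING_COMMENTS)
    print(visible_comments(["love this photo", "you idiot"], clf, filter_enabled=True))
```

With the filter on, only "love this photo" survives. Two-word windows capture very little context; telling criticism from abuse across languages needs far richer models and far more data.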

DeepText also helps Instagram detect spam. Drawing on large data sets and human input, the system identifies fake accounts and removes their spam comments from posts and live videos. This feature is currently available in nine languages, and the company is working to expand that list.
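One common signal of spam accounts is the same comment pasted again and again. The sketch below flags accounts that post many near-duplicate comments, using word shingles and Jaccard similarity. This is a generic heuristic chosen for illustration, not Instagram's method; the threshold values and the function names (`shingles`, `jaccard`, `flag_spam_accounts`) are hypothetical.

```python
from itertools import combinations

def shingles(text, k=3):
    """Set of k-word shingles; texts shorter than k words become a single shingle."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Overlap between two shingle sets (1.0 means identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def flag_spam_accounts(comments, threshold=0.8, min_duplicate_pairs=2):
    """comments: list of (account, text) pairs.

    Flags accounts with at least `min_duplicate_pairs` pairs of comments
    whose shingle sets have Jaccard similarity >= `threshold`."""
    by_account = {}
    for account, text in comments:
        by_account.setdefault(account, []).append(shingles(text))
    flagged = set()
    for account, shingle_sets in by_account.items():
        duplicate_pairs = sum(
            1 for a, b in combinations(shingle_sets, 2) if jaccard(a, b) >= threshold
        )
        if duplicate_pairs >= min_duplicate_pairs:
            flagged.add(account)
    return flagged

if __name__ == "__main__":
    sample = [
        ("promo_bot", "Get free followers now at my page"),
        ("real_user", "Great shot"),
        ("promo_bot", "get free followers now at my page"),
        ("real_user", "Where was this taken?"),
        ("promo_bot", "Get FREE followers now at my page"),
    ]
    print(flag_spam_accounts(sample))
```

Comparing every pair of an account's comments is quadratic; at platform scale you would reach for approximate methods such as MinHash, and combine this with many other account-level signals.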

To improve its AI system's accuracy without becoming an over-sanitized platform, Instagram continues to gather and analyze new data sets.

Twitter: A step toward engaging users

Twitter, another social media giant, relies on ML to improve image cropping. Using data from eye trackers, Twitter trains neural networks to predict the areas of an image users are likely to look at, which are usually faces, text, animals, and other salient regions.
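Once a network has produced a saliency map (a score per pixel for how likely people are to look there), a crop still has to be chosen. As a minimal sketch, assuming the crop is simply the fixed-size window with the highest total saliency, an integral image lets you score every window at once. The map values and `best_crop` are hypothetical; Twitter's actual cropping rule may differ.

```python
import numpy as np

def best_crop(saliency, crop_h, crop_w):
    """Return (top, left) of the crop_h x crop_w window with maximum total saliency."""
    h, w = saliency.shape
    if not (1 <= crop_h <= h and 1 <= crop_w <= w):
        raise ValueError("crop size must fit inside the image")
    # Integral image padded with a zero row and column, so any window sum is four lookups.
    integral = np.zeros((h + 1, w + 1))
    integral[1:, 1:] = saliency.cumsum(axis=0).cumsum(axis=1)
    # sums[t, l] = total saliency of saliency[t:t+crop_h, l:l+crop_w], for every valid (t, l).
    sums = (integral[crop_h:, crop_w:] - integral[:-crop_h, crop_w:]
            - integral[crop_h:, :-crop_w] + integral[:-crop_h, :-crop_w])
    top, left = np.unravel_index(np.argmax(sums), sums.shape)
    return int(top), int(left)

if __name__ == "__main__":
    # Toy 6x8 saliency map with one bright spot (say, a face) near the top-right corner.
    saliency_map = np.zeros((6, 8))
    saliency_map[1, 6] = 1.0
    print(best_crop(saliency_map, 3, 3))
```

The window sums cost O(H·W) regardless of crop size, which matters when every uploaded photo needs a crop.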

Neural networks for saliency prediction tend to be too slow and cumbersome for smart auto-cropping in real time, so Twitter combines two techniques to speed them up. The first, knowledge distillation, trains a smaller network to imitate a more powerful one: the larger network generates predictions on a set of images, and those predictions, together with third-party saliency data, are used to train the smaller network. The second, Fisher pruning, removes features that are computationally expensive but contribute little to the network's predictions, lowering the computational cost.

This combination gives Twitter far more runtime-efficient architectures for saliency prediction, so it can crop images as soon as they're uploaded, 10x faster than a vanilla implementation. The resulting crops focus on the parts of a photo people actually look at, which makes uploaded photos more engaging and improves the overall user experience.
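The two techniques can be sketched with generic formulations, assuming a saliency network whose output is a probability distribution over pixel positions. Distillation here minimizes the KL divergence from the teacher's predicted distribution to the student's. The pruning score is an assumed form of the Fisher pruning estimate: for each feature map, the squared sum of activation times loss gradient, averaged over examples. The `beta` trade-off against computation saved is likewise an assumption, and all function names are hypothetical.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax over the last axis (flattened pixel positions)."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def distillation_loss(student_logits, teacher_logits):
    """Mean KL(teacher || student) between predicted saliency distributions.

    Both arrays have shape (batch, num_pixels)."""
    log_p_teacher = log_softmax(teacher_logits)
    log_p_student = log_softmax(student_logits)
    p_teacher = np.exp(log_p_teacher)
    kl = np.sum(p_teacher * (log_p_teacher - log_p_student), axis=-1)
    return float(kl.mean())

def fisher_pruning_scores(activations, gradients):
    """Estimated loss increase from removing each feature map of one layer.

    activations, gradients: shape (N, K, H, W), the layer's feature maps and the
    gradient of the loss with respect to them. Returns shape (K,):
    1/(2N) * sum_n (sum_ij a[n,k,i,j] * g[n,k,i,j])**2."""
    n = activations.shape[0]
    per_example = (activations * gradients).sum(axis=(2, 3))  # shape (N, K)
    return (per_example ** 2).sum(axis=0) / (2 * n)

def feature_map_to_prune(scores, flops_saved, beta=0.0):
    """Index of the feature map with the lowest penalized score.

    beta > 0 favors removing maps that save more computation (assumed trade-off)."""
    return int(np.argmin(scores - beta * flops_saved))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    teacher = rng.normal(size=(2, 16))
    student = rng.normal(size=(2, 16))
    print(distillation_loss(student, teacher))
    acts = rng.normal(size=(4, 3, 5, 5))
    grads = rng.normal(size=(4, 3, 5, 5))
    print(feature_map_to_prune(fisher_pruning_scores(acts, grads), np.ones(3)))
```

In practice pruning is iterative: remove one feature map, fine-tune briefly, recompute the scores, and repeat until the speed target is met.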

Below is an example of how Twitter's shift from face detection to saliency prediction redefined image cropping.

[Image: cropping based on face detection vs. saliency prediction]

Reddit: In a bid to improve website search

For Reddit, a vibrant hub of internet news, pictures, stories, memes, and videos, advanced search is a top priority. So it stands to reason that the company applies the best available tech to improve its search capabilities and give users a tailored stream of high-quality content.

To make its search relevant, fast, and easy to scale with the platform's growth, Reddit uses Fusion, an AI-based search platform from Lucidworks. Fusion helps the company tackle the challenge of updating its indexing pipeline by pulling data from several sources into one cohesive canonical view. Reddit not only indexes new posts as they are created but also updates their relevance signals, such as votes and comments, in real time.
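The two ideas in that paragraph, joining per-source records into one canonical document and updating relevance signals in place as votes and comments arrive, can be sketched as follows. This is not Fusion's API; the schema, the source record shapes, the event names, and the engagement-based boost are all hypothetical, and the "search" is a plain substring match.

```python
import math
from dataclasses import dataclass

@dataclass
class SearchDocument:
    """Canonical, search-ready view of one post (hypothetical schema)."""
    post_id: str
    title: str
    subreddit: str
    votes: int = 0
    num_comments: int = 0

    def relevance_boost(self) -> float:
        # Hypothetical engagement signal: log-damped votes plus comments.
        return math.log1p(max(self.votes, 0) + self.num_comments)

class SearchIndex:
    def __init__(self):
        self.docs = {}  # post_id -> SearchDocument

    def upsert_from_sources(self, post_record, vote_record, comment_record):
        """Join records from three separate sources into one canonical document."""
        doc = SearchDocument(
            post_id=post_record["id"],
            title=post_record["title"],
            subreddit=post_record["subreddit"],
            votes=vote_record.get("score", 0),
            num_comments=comment_record.get("count", 0),
        )
        self.docs[doc.post_id] = doc

    def apply_event(self, post_id, event):
        """Update one relevance signal in place, without rebuilding the document."""
        doc = self.docs[post_id]
        if event == "upvote":
            doc.votes += 1
        elif event == "downvote":
            doc.votes -= 1
        elif event == "comment":
            doc.num_comments += 1
        else:
            raise ValueError(f"unknown event: {event}")

    def search(self, term):
        """Post IDs whose titles contain `term`, most engaging first."""
        hits = [d for d in self.docs.values() if term.lower() in d.title.lower()]
        hits.sort(key=lambda d: d.relevance_boost(), reverse=True)
        return [d.post_id for d in hits]

if __name__ == "__main__":
    index = SearchIndex()
    index.upsert_from_sources({"id": "p1", "title": "Cat photos", "subreddit": "aww"}, {"score": 10}, {"count": 0})
    index.upsert_from_sources({"id": "p2", "title": "Cat memes", "subreddit": "memes"}, {"score": 5}, {})
    print(index.search("cat"))
    for _ in range(10):
        index.apply_event("p2", "upvote")
    print(index.search("cat"))
```

Keeping signals as small, separately updatable fields is what makes real-time updates cheap: a vote touches one number rather than forcing a full reindex of the post.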

The partnership with Lucidworks has given Reddit impressive results:
1. The number of posts indexed increased by 33%.
2. Reindexing all site content went from 11 hours to 5.
3. The error rate dropped by two orders of magnitude, and 99% of search results are served in under 500 ms.
4. The number of machines needed to run search dropped from 200 to 30.
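Put as relative changes, the figures in the list above work out as follows. The inputs are exactly the numbers reported; the 33% indexing increase cannot be turned into absolute numbers because no baseline is given.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) * 100 / before

reported = {
    "full reindex time (hours)": (11, 5),
    "machines running search": (200, 30),
}

for metric, (before, after) in reported.items():
    print(f"{metric}: {before} -> {after} ({percent_change(before, after):.1f}%)")
# full reindex time (hours): 11 -> 5 (-54.5%)
# machines running search: 200 -> 30 (-85.0%)

# "Two orders of magnitude" means the error rate fell by a factor of about 10**2 = 100.
```

In other words, reindexing became about 2.2x faster, and search ran on 85% fewer machines.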

Overall, Reddit improved the user experience while keeping operational costs down. Here's what the tech stack of the revitalized search platform looks like now:

[Image: tech stack of Reddit's search platform]

A final word

From crafting personalized offers to fighting spam to enhancing search, machine learning delivers business value to an array of social media platforms. Facebook, Instagram, Twitter, and others have already found ML-enabled solutions that reap these benefits. Have you?
