Companies That Provide Data to AI Firms
Follow-up post to: thru Gnip the Tumblr @staff sold our posts to the highest bidders
Key players in social media and web data provision. Many partner with or sell to AI developers, supplying data for training datasets, sentiment analysis, or real-time intelligence:
DataSift (acquired by Meltwater in 2018): A major Gnip competitor that aggregated and filtered social data from multiple sources for enterprise use. firstmonday.org
Brandwatch (acquired by Cision): Social listening and analytics platform with strong data access; previously integrated directly with Gnip for full Twitter firehose access. prnewswire.com
Dataminr: Real-time social/media intelligence, often using public data streams for alerts and AI-powered insights. cbinsights.com
Bright Data (formerly Luminati): Provides large-scale web and social media data scraping and collection services, including structured datasets from platforms such as Facebook and Instagram; popular for AI training. brightdata.com
Other social listening/monitoring firms that supply or analyze data usable for AI:
Sprout Social
Hootsuite
Meltwater
Talkwalker
Crimson Hexagon (merged with Brandwatch in 2018)
Broader AI Training Data Providers
Beyond pure social media firehoses, many companies supply curated, annotated, or scraped data (including social/web content) specifically for AI model training:
Scale AI: High-quality labeled data for model training, drawn from a variety of sources.
Appen, Labelbox, iMerit: Data annotation and collection services.
Defined.ai, Nexdata, and others specializing in datasets. datarade.ai
Note: Many social platforms now restrict bulk data access because of AI scraping concerns (e.g., Twitter/X, Reddit's licensing deals, Meta's changes). AI companies often rely on licensed partnerships, rate-limited public APIs, or compliant providers rather than open firehoses, and some platforms (like Reddit or X) license data directly to AI firms. techpolicy.press
Always check compliance with platform terms and privacy laws (GDPR, etc.).
see also:
Tumblr @Staff Finally Breaks Silence: “Yes, We Sold Your Unhinged Blogs… And We’d Do It Again”
hellsite tumblr is to blame for chatgpt dysfunctionality