
Traitlytics

Predicting Big-Five personality traits from LinkedIn text with fine-tuned BERT.

0.476–0.556
BERT R² across the five Big-Five traits

MSE
Fine-tuning objective on weakly supervised labels

Python · BERT (bert-base-uncased) · TF-IDF · scikit-learn · Jupyter

Overview

Traitlytics is a research project that infers Big-Five personality traits (Openness, Conscientiousness, Extraversion, Agreeableness, Neuroticism) from LinkedIn profile text, bypassing the self-report questionnaires that dominate the field. The main empirical result is that a fine-tuned BERT model substantially outperforms classical regression baselines on all five traits, reaching R² values between 0.476 and 0.556.

Background

The Big-Five model is the dominant framework in personality psychology, but its standard instruments are self-report inventories. Self-reported personality suffers from social desirability bias, limited scale, and respondent fatigue, and it is impossible to administer retroactively over historical text.

Text-based personality inference sidesteps these problems by treating natural language as behavioural evidence. The challenge is supervision: true Big-Five scores rarely accompany the text corpora one wants to model. Weak supervision from trait-defining sentences offers one way in.

Methodology

Labels are generated by computing cosine similarity between each LinkedIn profile (TF-IDF vectorised) and a bank of trait-defining sentences drawn from psychometric literature. The result is a continuous 1-to-5 score per trait per profile, suitable for regression.
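The label-generation step above can be sketched with scikit-learn. This is a minimal illustration, not the project's code: the trait sentences and profiles below are invented placeholders, and only two of the five traits are shown.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical trait-defining sentences (illustrative, not the project's bank).
TRAIT_SENTENCES = {
    "openness": "I enjoy exploring new ideas, abstract concepts, and creative work.",
    "conscientiousness": "I am organised, disciplined, and careful to meet deadlines.",
}

profiles = [
    "Creative strategist exploring novel ideas across design and research.",
    "Detail-oriented project manager who delivers organised, on-time work.",
]

# Fit TF-IDF on profiles and trait sentences together so they share a vocabulary.
corpus = profiles + list(TRAIT_SENTENCES.values())
tfidf = TfidfVectorizer().fit(corpus)
profile_vecs = tfidf.transform(profiles)
trait_vecs = tfidf.transform(list(TRAIT_SENTENCES.values()))

# Cosine similarity is in [0, 1] for non-negative TF-IDF vectors;
# rescale linearly to a continuous 1-to-5 score per trait per profile.
sims = cosine_similarity(profile_vecs, trait_vecs)
scores = 1.0 + 4.0 * sims  # shape: (n_profiles, n_traits)
print(np.round(scores, 2))
```

The linear rescaling from similarity to a 1–5 range is one plausible choice; any monotone mapping would preserve the ranking that the regression models are trained to reproduce.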

Baselines include linear regression, support-vector regression, and a random forest with 100 estimators. The best model is bert-base-uncased fine-tuned with mean-squared-error loss on the weak labels. All models are evaluated with RMSE, MAE, and R² on a held-out portion of the corpus.
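The baseline-and-evaluation loop can be sketched as follows for a single trait. Synthetic features stand in for the TF-IDF matrix and weak labels; the data and split are illustrative assumptions, but the models and metrics match those named above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# Synthetic stand-in: TF-IDF-like features and weak 1-5 scores for one trait.
rng = np.random.default_rng(0)
X = rng.random((200, 50))
y = np.clip(1.0 + 4.0 * X[:, :5].mean(axis=1) + rng.normal(0, 0.1, 200), 1.0, 5.0)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

baselines = {
    "linear": LinearRegression(),
    "svr": SVR(),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

# Fit each baseline and report RMSE, MAE, and R² on the held-out split.
results = {}
for name, model in baselines.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "rmse": float(np.sqrt(mean_squared_error(y_test, pred))),
        "mae": float(mean_absolute_error(y_test, pred)),
        "r2": float(r2_score(y_test, pred)),
    }
print(results)
```

The BERT model is evaluated with the same three metrics on the same split; only the feature extractor and regression head change, which is what makes the cross-model comparison fair.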

Results

BERT wins on every trait. R² lands between 0.476 (Neuroticism) and 0.556 (Openness). Classical baselines cluster 0.15 to 0.20 R² lower across the board. The gap tracks the intuitions one might have from the literature: contextual representations help most on the traits (Openness, Conscientiousness) whose text signals depend on phrasing rather than keywords.

Scope and limitations

The labels are weak by construction. They are derived from similarity to curated sentences rather than validated human ratings, so the absolute R² values should be read as an upper bound on agreement with the label generation process, not on ground-truth personality. The comparison across models, however, is fair because every model sees the same labels.

Tech stack

Python 3.9+
Primary language.
BERT (bert-base-uncased)
Fine-tuned with MSE loss for continuous trait regression.
TF-IDF
Feature extraction for classical baselines and label generation.
scikit-learn
Linear regression, SVR, and random forest baselines.
Jupyter
Exploratory analysis, ablations, and evaluation notebooks.

References

  1. Goldberg, L. R. (1993). The structure of phenotypic personality traits. American Psychologist.
  2. Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. NAACL.