Method

The WFH Map uses a large language model (LLM) built on the DistilBERT architecture to identify offers of remote work in job postings. The model is developed in three phases.


Phase 1 — Pre-training Foundation

The LLM is pre-trained on the entire English-language Wikipedia corpus. This gives the model a broad foundation for understanding the semantic meaning of text.

Phase 2 — Domain Specialisation

The model undergoes additional training on roughly one million text sequences sourced from our job posting database. This step ensures the model is familiar with the language conventions of employment advertisements.
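To make the idea of a "text sequence" concrete, the sketch below splits one posting into consecutive word-bounded chunks. The function name `split_into_sequences` and the word limit are hypothetical choices made for illustration; they are not details of the WFH Map pipeline.

```python
def split_into_sequences(text: str, max_words: int = 50) -> list[str]:
    """Split a posting into consecutive chunks of at most max_words words.

    Illustrative only: the chunking rule and the default limit are assumptions.
    """
    if max_words < 1:
        raise ValueError("max_words must be positive")
    words = text.split()
    # Step through the word list in strides of max_words.
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


# Invented example posting, not taken from the job posting database.
posting = (
    "We are hiring a data analyst. "
    "This role can be performed remotely two days per week."
)
sequences = split_into_sequences(posting, max_words=8)
```

Each element of `sequences` is one short passage that a model could be trained on during domain specialisation.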

Phase 3 — Remote Work Classification

Finally, the model is fine-tuned on 30,000 human-coded text extracts from job advertisements. Human auditors flagged the extracts that indicate an offer of remote work, and the model learns from these labels to classify new postings.
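One way extract-level predictions could be turned into a posting-level result is sketched below. It assumes a posting counts as offering remote work if any of its extracts is flagged; that rule is an assumption for illustration, not a confirmed detail of the WFH Map. The `keyword_stub` function is a hypothetical stand-in for the fine-tuned model, used only so that the example runs.

```python
from typing import Callable


def keyword_stub(extract: str) -> bool:
    """Hypothetical placeholder for the fine-tuned classifier.

    A real model reads context; this stub only looks for a few phrases.
    """
    lowered = extract.lower()
    return any(k in lowered for k in ("remote", "work from home", "telework"))


def posting_offers_remote(
    extracts: list[str],
    classify: Callable[[str], bool],
) -> bool:
    """Flag a posting if at least one extract is classified as remote.

    Assumption: the "any extract" aggregation rule is illustrative.
    """
    return any(classify(extract) for extract in extracts)


# Invented extracts for demonstration.
hybrid_posting = ["Competitive salary.", "You may work from home on Fridays."]
onsite_posting = ["On-site role in Leeds."]

hybrid_flag = posting_offers_remote(hybrid_posting, keyword_stub)
onsite_flag = posting_offers_remote(onsite_posting, keyword_stub)
```

Passing the classifier in as an argument keeps the aggregation step separate from the model, so the stub can be swapped for a trained classifier without changing the rest of the code.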


Performance

The final model achieves 99% accuracy when its classifications are compared with those of human evaluators.
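Accuracy measured against human evaluation is the share of items on which the model's label matches the human label. A minimal sketch of that calculation follows; the labels in the example are invented and do not come from the WFH Map evaluation data.

```python
def accuracy(predicted: list[bool], human: list[bool]) -> float:
    """Return the fraction of items where the model agrees with the human label."""
    if len(predicted) != len(human):
        raise ValueError("label lists must have the same length")
    if not human:
        raise ValueError("at least one labelled item is required")
    matches = sum(p == h for p, h in zip(predicted, human))
    return matches / len(human)


# Invented labels for illustration: True means "offers remote work".
human_labels = [True, False, False, True, False]
model_labels = [True, False, True, True, False]

score = accuracy(model_labels, human_labels)  # 4 of 5 labels agree
```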

Our peer-reviewed paper, "Remote Work across Jobs, Companies, and Space" (2023), includes detailed methodological comparisons with alternative text classification algorithms, including recent generative AI approaches.


Code Access

Researchers and non-commercial users can request access to the underlying code and training materials used to develop the LLM. To get in touch with the team, please visit our Contact page.