Measuring remote work across space and time, using job ads.


Our Large Language Model (LLM) is built on DistilBERT, an attention-based transformer model. Transformers are a deep-learning architecture in which every output element is connected to every input element of a text sequence, with the weight on each connection calculated dynamically as the text is processed.
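To make the idea of dynamically calculated weights concrete, here is a minimal pure-Python sketch of scaled dot-product attention, the core operation inside transformer models such as DistilBERT. It is an illustration only, not the production model code; the toy vectors at the bottom are invented for the example.

```python
import math

def softmax(xs):
    # Subtract the max for numerical stability, then normalise.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for a single query vector.

    Every output is a weighted mix of ALL input values, and the
    weights are computed on the fly from the content of the
    sequence -- this is the "dynamic" connectivity described above.
    """
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)
    output = [sum(w * v[i] for w, v in zip(weights, values))
              for i in range(len(values[0]))]
    return output, weights

# Toy 2-dimensional example: three input elements, one query.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
out, weights = attention([1.0, 0.0], keys, values)
```

Inputs whose keys align with the query receive larger weights, so the output leans toward their values; the weights always sum to one.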

This LLM is then pre-trained on the entire English-language Wikipedia corpus, which helps the model interpret the intended meaning of a given document or passage.

We further pre-train this model on roughly one million text sequences drawn from our corpus of new online vacancy postings. This ensures the language model is familiar with the distinctive language of job-ad text.
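DistilBERT-style models are pre-trained with a masked-language-modelling objective: tokens are hidden at random and the model learns to predict them from context. The sketch below shows only that masking step, in plain Python, as a schematic illustration; the 15% masking rate and the `[MASK]` token follow common BERT-family conventions and are not a description of our actual training pipeline.

```python
import random

def mask_tokens(tokens, mask_prob=0.15, mask_token="[MASK]", seed=0):
    """Randomly mask tokens for a masked-language-modelling objective.

    Returns the masked sequence plus a parallel list of labels:
    the original token where a position was masked, None elsewhere.
    During continued pre-training the model learns to recover the
    masked tokens from their surrounding context.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(mask_token)
            labels.append(tok)
        else:
            masked.append(tok)
            labels.append(None)
    return masked, labels

# Hypothetical job-ad snippet, invented for illustration.
ad = "hybrid role with two days per week working from home".split()
masked, labels = mask_tokens(ad)
```

Training on sequences drawn from vacancy postings, rather than only Wikipedia, exposes the model to phrases like "hybrid role" or "work from home" in their job-ad context.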

For further information about our method, including a comparison of its performance relative to other text algorithms (including recent Generative AI models), see our paper: “Remote Work across Jobs, Companies, and Space” (2023).

Researchers and other non-commercial users can contact us to gain access to the underlying code and information used to construct the WHAM model.