Our algorithm is built on the DistilBERT language model, a compact, distilled descendant of BERT (Bidirectional Encoder Representations from Transformers), which is widely used in industry. Transformers are a deep-learning architecture in which every output element is connected to every input element of a text sequence, with the weight placed on each element calculated dynamically as the text is processed.
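The dynamically calculated weights described above are the self-attention weights at the heart of a transformer. The sketch below is a minimal, illustrative scaled dot-product self-attention computation in NumPy; it is not the authors' code, and the matrix names (`Wq`, `Wk`, `Wv`) are generic placeholders for learned projection parameters.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # Project each token vector into queries, keys, and values.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    # Weights are computed from the input itself, so every output
    # position attends to every input position in the sequence.
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ V, weights

rng = np.random.default_rng(0)
seq_len, d = 4, 8                      # toy sequence of 4 token vectors
X = rng.normal(size=(seq_len, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out, weights = self_attention(X, Wq, Wk, Wv)
# Each row of `weights` sums to 1: every output token is a weighted
# mixture of all input tokens, with weights depending on the text itself.
```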
This language model is pre-trained on a corpus of thousands of books and the English-language Wikipedia, which helps it interpret the intended meaning of a given document or passage.
We further pre-train the model on roughly one million text sequences drawn from our corpus of new online vacancy postings, ensuring the language model is familiar with the distinctive language of job-ad text.
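Continued pre-training of a BERT-family model is typically done with the masked-language-model objective: tokens are hidden at random and the model learns to recover them from context. The snippet below is an illustrative sketch of that masking step in plain Python, not the authors' actual pipeline; the masking rate and example sentence are assumptions chosen for demonstration.

```python
import random

def mask_tokens(tokens, mask_prob=0.15, mask_token="[MASK]", seed=None):
    """Randomly mask tokens, BERT-style, for masked-language-model training.

    Returns the masked sequence and, for each position, the original
    token the model must predict (None where no loss is computed).
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(mask_token)
            labels.append(tok)   # the model is trained to recover this token
        else:
            masked.append(tok)
            labels.append(None)  # unmasked positions contribute no loss
    return masked, labels

# Hypothetical job-ad sentence for illustration.
tokens = "remote work available for this software engineering role".split()
masked, labels = mask_tokens(tokens, mask_prob=0.3, seed=1)
```

Training on masked job-ad text in this way shifts the model's internal representations toward vacancy-posting vocabulary without changing its architecture.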
For further information about our algorithm, including a comparison of its performance relative to other text algorithms widely used in economics, see our paper: “Remote Work across Jobs, Companies, and Space” (2023).
Researchers and other non-commercial users can contact us to gain access to the underlying code and information used to construct the WHAM model.