Making sense of Cape Town using NLP

Speaker: Gordon Inggs

Track: Data Science

Type: Talk

Room: Ballroom

Time: Oct 11 (Fri), 11:00

Duration: 0:45

In this talk, I will describe how Natural Language Processing helped the City of Cape Town understand itself better. In doing so, I hope to illustrate how Machine Learning can be applied within a large organisation with pre-existing formal structures.

Several months ago, I was asked to help identify City employees who perform "data-intensive" work. After several fruitless keyword searches across the City's formal job description data, we turned to the excellent spaCy NLP library to help make sense of the data. spaCy quickly yielded useful results: an understanding of human resource gaps, audience segmentation for internal communication purposes, identification of potential beta testers for new tools, and more.

I will first describe how we embedded the semi-structured formal job descriptions into a vector space using spaCy's large English language model, and then how we embedded phrases of interest, such as "data", into that same vector space. Once everything was in the same vector space, we used various distance measures to assess the "relevance" of each search phrase to each job description. It is these relevance measures that we used to understand the dynamics within the HR dataset.
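The embed-then-compare idea described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code used at the City: the three-dimensional word vectors, the job titles and descriptions, and the helper names `embed`, `cosine_similarity` and `rank_by_relevance` are all invented here. Averaging word vectors mirrors what spaCy's `Doc.vector` does by default, and cosine similarity stands in for the "various distance measures" mentioned in the abstract.

```python
import math

# Toy 3-dimensional word vectors, invented purely for illustration.
# A real pipeline would take vectors from a trained language model instead.
TOY_VECTORS = {
    "data": [0.9, 0.1, 0.0],
    "analyst": [0.8, 0.2, 0.1],
    "statistics": [0.7, 0.3, 0.0],
    "reports": [0.6, 0.2, 0.2],
    "road": [0.0, 0.1, 0.9],
    "maintenance": [0.1, 0.2, 0.8],
    "crew": [0.0, 0.3, 0.7],
}


def embed(text):
    """Embed a text as the average of the vectors of its known words."""
    vectors = [TOY_VECTORS[w] for w in text.lower().split() if w in TOY_VECTORS]
    if not vectors:
        raise ValueError(f"no known words in {text!r}")
    dims = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dims)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_relevance(phrase, descriptions):
    """Return (score, title) pairs, most relevant description first."""
    phrase_vec = embed(phrase)
    scored = [
        (cosine_similarity(phrase_vec, embed(text)), title)
        for title, text in descriptions.items()
    ]
    return sorted(scored, reverse=True)


# Hypothetical job descriptions; note that neither contains the word "data".
JOBS = {
    "Analyst": "analyst statistics reports",
    "Roads": "road maintenance crew",
}

ranking = rank_by_relevance("data", JOBS)
for score, title in ranking:
    print(f"{title}: {score:.2f}")
```

Neither toy description contains the literal word "data", so a keyword search would match nothing, yet the analyst role still ranks well above the roads role because its words sit close to "data" in the vector space.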

This talk will appeal to anyone interested in how data science can find a place in a large public organisation. It also has practical value for anyone who is interested in doing a bit of NLP but is unsure where to start.