Using web data to enrich (not replace) the official industrial taxonomies.

The Data Analytics team at Nesta recently developed a prototype industrial taxonomy based on business website data from glass.ai. The paper published on the Economic Statistics Centre of Excellence (ESCoE) website is worth a read for anyone interested in how web data and AI can help improve our understanding of the economy. The Nesta team used business website data to explore the limitations of the Standard Industrial Classification (SIC) taxonomy and developed a prototype for a bottom-up industrial taxonomy based on semantic similarities between company descriptions. The prototype made it possible to decompose uninformative SIC codes into granular industries and build user-driven industry groups which might be of interest to policymakers (e.g. ‘green economy’).
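
To make the idea of a bottom-up taxonomy more concrete, here is a minimal sketch of grouping companies by the similarity of their descriptions. It is not the Nesta team's pipeline, which the paper itself describes. It only illustrates the general idea using word counts and cosine similarity. The company names, descriptions, stop-word list, 0.3 threshold and function names are all made up for illustration. A real system would work from far richer text representations and far more data.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"and", "for", "the", "of", "we", "a", "to", "in"}


def tokenize(text):
    """Lower-case the text and keep alphabetic words that are not stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def group_by_similarity(descriptions, threshold=0.3):
    """Put two companies in the same group when a chain of pairs with
    similarity >= threshold links them (single-link grouping)."""
    names = list(descriptions)
    vectors = {n: Counter(tokenize(descriptions[n])) for n in names}
    parent = {n: n for n in names}

    def find(n):
        # Follow parent links to the representative of n's group.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cosine(vectors[a], vectors[b]) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


# Made-up companies that we assume share one uninformative SIC code.
COMPANIES = {
    "Alpha": "solar panel installation and solar energy storage",
    "Beta": "wind and solar energy consultancy",
    "Gamma": "mobile app development for retail",
    "Delta": "retail mobile app design and development",
}

print(group_by_similarity(COMPANIES))
```

On this toy data the two energy firms end up in one group and the two app developers in another, which is the kind of decomposition of a catch-all code that the paragraph above describes.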

There has been a long debate on whether the SIC code system should be updated or replaced entirely. SIC codes categorise businesses by their activity, and policymakers and analysts use this official taxonomy to measure sectors, identify stakeholders to engage with, develop policies, and measure the impact of those policies. However, SIC codes have important limitations, as we previously outlined in this article.

At glass.ai we believe the SIC code system needs updating (quite urgently) but it should not be replaced. Over the past two years, we have completed 100+ projects where we’ve used our AI to read the web and research many sectors of the economy (including emerging sectors). One of the things we have learned is that different audiences can have different opinions about the boundaries of sectors, which is understandable as sectors can be defined in many ways and for different purposes. For example, a mapping project of the AI sector may need to focus only on companies developing AI, whereas another study may need to use a broader definition of AI and also include firms across sectors that are adopting AI technologies.

We believe policy analysts and economists will increasingly combine SIC codes and novel sources like web data to gain a better understanding of the economy. Official industrial classification systems do not need to be replaced with alternative taxonomies, as this would simply replicate the mistakes of the rigid SIC codes. At glass.ai we are convinced that a flexible approach that combines an updated industrial classification system with the richness of the web’s data is a much better fit for purpose. As the Nesta paper suggests, there are several potential avenues to combine official and bottom-up taxonomies in order to improve the understanding of the economy and inform economic policy. The debate is not over.

Sergi Martorell