How glass.ai works

Frontiers of AI

One of the frontiers in AI is machine language “understanding”, which aims to give machines the power to understand not just words but entire sentences and eventually entire paragraphs. Following several years of R&D, at glass.ai we have invented AI technology that can read and interpret text at scale. We decided to apply this innovation to reading the web, the biggest research resource that has ever existed.


Multiple data sources
90% of the world's data is unstructured, and it is spread across many platforms. Our AI makes sense of vast quantities of written text, whether from company websites, news, social media, government or other sources. Web data is unstructured, fast-moving and hard to query at scale. Our intelligent crawler tracks hundreds of thousands of topics, signals and other indicators of interest across billions of web pages, monitoring more than 40 million organisations with active websites globally, across many countries and languages. The result is a new research capability that reads the web at scale and can understand and monitor the activities of millions of companies.

Below is the typical process we follow to build bespoke data on companies, sectors and markets for our clients:


Step 1:
Data Exploration

  • Define search requirements

  • AI reads business websites and social media

  • Discovery of other sources

  • Initial sample for client review

Step 2:
Intelligent Crawling

  • Refine search based on feedback

  • Deeper reading of websites, social media and news

  • Other sources to augment results

  • Matching to official sources and business registers

Step 3:
Analysis & QA

  • Automatic QA process

  • Detect and remove anomalies

Step 4:
Data Delivery

  • Agree delivery format and method

  • Packaging of data

  • Single or regular delivery of results

  • Summary stats, visualisations
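The page does not say which checks the automatic QA in Step 3 runs. As a minimal sketch, assuming each result is a Python dict with a hypothetical `domain` field and a hypothetical numeric `employees` field, one plausible version drops duplicate domains and then removes anomalous values with a median-absolute-deviation (modified z-score) test. This is an illustrative technique, not glass.ai's actual QA process.

```python
from statistics import median


def normalise_domain(domain: str) -> str:
    """Lower-case a domain and strip a leading 'www.' (hypothetical normalisation)."""
    d = domain.strip().lower()
    return d[4:] if d.startswith("www.") else d


def qa_filter(records, field="employees", threshold=3.5):
    """Drop duplicate domains, then drop records whose numeric `field`
    is a robust outlier (modified z-score above `threshold`)."""
    # 1. Deduplicate: keep the first record seen for each normalised domain.
    seen, unique = set(), []
    for rec in records:
        key = normalise_domain(rec["domain"])
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    if not unique:
        return unique

    # 2. Outlier removal using the median absolute deviation (MAD).
    values = [rec[field] for rec in unique]
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return unique  # no spread in the data, so nothing to flag
    # 0.6745 rescales MAD so the score is comparable to a standard z-score.
    return [
        rec for rec in unique
        if abs(0.6745 * (rec[field] - med) / mad) <= threshold
    ]
```

A MAD-based score is used here because a single extreme value barely moves the median, whereas it would inflate a mean and standard deviation and partly hide itself; 3.5 is a commonly used cut-off for the modified z-score.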


Making sense of web content at scale
glass.ai runs an ongoing discovery process that reads millions of websites globally. It classifies a site as a company website if its content meets certain criteria (e.g. the site is active and its content is in English or another language) and, where enough content is available on the web, predicts the sector and geography of the business. Our system currently reads more than 35 million business websites globally, including companies, partnerships, sole traders, academic institutions, government bodies and non-profits. We estimate this covers 80% of all businesses globally that have useful web content. The crawler also reads sources such as news, social media, and academic and sector-specific websites. Where possible, we match the web data about businesses with data from official business registers.
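As an illustration only, the classification step described above could start from the two example criteria the text gives (the site is active, and its language is one the system handles), with a crude keyword-overlap guess at sector when enough text is available. The sector vocabularies, the `SUPPORTED_LANGS` set, the 50-word threshold and the assumption that language is detected upstream are all hypothetical; geography prediction is left out.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical sector vocabularies; a real system would learn these from data.
SECTOR_KEYWORDS = {
    "software": {"software", "saas", "cloud", "platform", "api"},
    "construction": {"construction", "builders", "renovation", "contractor"},
    "hospitality": {"hotel", "restaurant", "bookings", "rooms"},
}
SUPPORTED_LANGS = {"en", "fr", "de", "es"}  # assumption
MIN_WORDS_FOR_SECTOR = 50  # assumption: threshold for "enough content"


@dataclass
class SiteResult:
    is_company: bool
    sector: Optional[str] = None


def classify_site(http_status: int, lang: str, text: str) -> SiteResult:
    # Criterion 1: the site is active (served successfully).
    # Criterion 2: its content is in a handled language (detected upstream).
    if http_status != 200 or lang not in SUPPORTED_LANGS:
        return SiteResult(is_company=False)
    words = [w.strip(".,;:!?").lower() for w in text.split()]
    # Only guess a sector when there is enough text to go on.
    if len(words) < MIN_WORDS_FOR_SECTOR:
        return SiteResult(is_company=True)
    # Count how many words fall in each sector's vocabulary; pick the best.
    scores = {s: sum(w in kw for w in words) for s, kw in SECTOR_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return SiteResult(is_company=True, sector=best if scores[best] > 0 else None)
```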

“Our proprietary AI technology uses various approaches in NLP, machine learning and computational linguistics. It combines language understanding through semantic analysis with resource crawling at scale and the maintenance of a deep topic ontology”.


Semantic analysis
glass.ai detects entities and classifies content in text (e.g. companies, people, products, news) with state-of-the-art precision. So when our AI visits a website or reads any other web source, it decides for itself what the text is about.
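glass.ai's entity detection is not described in detail, so the following is only a toy stand-in: a regular-expression heuristic that spots company names by their legal suffix. The pattern and the `find_companies` helper are hypothetical and far cruder than the precision claimed above; for instance, it would also capture a capitalised word that starts a sentence just before a company name.

```python
import re

# Hypothetical heuristic: one or more capitalised words followed by a legal suffix.
COMPANY_PATTERN = re.compile(
    r"\b((?:[A-Z][\w&-]*\s+)+(?:Ltd|Limited|plc|Inc|LLC|GmbH))\b"
)


def find_companies(text: str) -> list[str]:
    """Return candidate company names found in `text`, in order of appearance."""
    return [m.group(1) for m in COMPANY_PATTERN.finditer(text)]
```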


Resource crawling 
glass.ai is an intelligent crawler. It filters pages and follows the links most likely to lead to the data that is most relevant to the results, simulating how a human would efficiently scan a website. This lets us extract large-scale datasets efficiently.
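A common way to build a crawler that follows the links most likely to lead to relevant data is best-first (focused) crawling: keep a priority queue of unvisited links ordered by an estimated relevance score. The sketch below assumes a hypothetical score (the number of topic terms in a link's anchor text) and an in-memory `links` map in place of real HTTP fetching; it is not glass.ai's crawler.

```python
import heapq


def score_link(anchor_text: str, topic_terms: set[str]) -> int:
    """Hypothetical relevance score: how many topic terms appear in the anchor text."""
    text = anchor_text.lower()
    return sum(term in text for term in topic_terms)


def focused_crawl(start, links, topic_terms, max_pages=10):
    """Best-first crawl: always visit the most promising unvisited link next.

    `links` maps a URL to a list of (anchor_text, target_url) pairs and
    stands in for fetching and parsing real pages.
    """
    frontier = [(0, 0, start)]  # (negative score, insertion order, url)
    counter = 1                 # tie-breaker keeps equal scores in FIFO order
    seen = {start}
    visited = []
    while frontier and len(visited) < max_pages:
        _, _, url = heapq.heappop(frontier)
        visited.append(url)
        for anchor, target in links.get(url, []):
            score = score_link(anchor, topic_terms)
            if score == 0:
                continue  # filtering: skip links with no sign of relevance
            if target in seen:
                continue
            seen.add(target)
            heapq.heappush(frontier, (-score, counter, target))
            counter += 1
    return visited


links = {
    "/": [("About us", "/about"), ("Careers", "/jobs"),
          ("Our products and services", "/products"), ("Privacy policy", "/privacy")],
    "/about": [("Leadership team", "/team"), ("Our services", "/services")],
    "/products": [("Product pricing", "/pricing")],
}
order = focused_crawl("/", links, {"about", "product", "service", "team"})
```

Links whose anchor text scores zero are filtered out entirely, so in this example `/jobs` and `/privacy` are never visited, while `/products` (two matching terms) is read before `/about` (one).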


Topics ontology
glass.ai builds language models and large topic maps that help it understand web content and open the data up for further investigation. Our current ontology contains around 300k related topics and themes and is continuously updated.
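A topic ontology can be represented as a graph in which each topic links to related topics. Assuming a small hypothetical directed graph (the topics and edges below are invented for illustration), finding the topics related to a given theme is a breadth-first search bounded by a hop limit:

```python
from collections import deque

# A tiny, hypothetical topic graph; each edge points to a related topic.
ONTOLOGY = {
    "renewable energy": ["solar power", "wind power", "energy storage"],
    "solar power": ["photovoltaics"],
    "energy storage": ["batteries"],
    "batteries": ["electric vehicles"],
}


def related_topics(topic, graph, max_depth=2):
    """Return topics reachable from `topic` within `max_depth` hops, nearest first."""
    seen = {topic}
    result = []
    queue = deque([(topic, 0)])
    while queue:
        current, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(current, []):
            if neighbour not in seen:
                seen.add(neighbour)
                result.append(neighbour)
                queue.append((neighbour, depth + 1))
    return result
```

With the default `max_depth=2`, `related_topics("renewable energy", ONTOLOGY)` returns the three direct neighbours followed by `photovoltaics` and `batteries`; `electric vehicles` is three hops away and is excluded.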


Entity onboarding
Everything in glass.ai is fully automated. For example, when detecting businesses, glass.ai automatically recognises the type of site, the name of the business and its sector, and then reads the site in depth to understand the firm's activities.
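One small piece of that onboarding, finding the business name on a homepage, can be sketched with Python's standard `html.parser` module. The heuristic (prefer an `og:site_name` meta tag, otherwise take the page title up to a ` | ` or ` - ` separator) and the `business_name` helper are assumptions for illustration, not glass.ai's method.

```python
from html.parser import HTMLParser


class NameExtractor(HTMLParser):
    """Collects the og:site_name meta tag and the <title> text from a page."""

    def __init__(self):
        super().__init__()
        self.site_name = None
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("property") == "og:site_name":
            self.site_name = attrs.get("content")
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def business_name(html: str):
    """Hypothetical helper: guess a business name from homepage HTML."""
    parser = NameExtractor()
    parser.feed(html)
    if parser.site_name:
        return parser.site_name.strip()
    # Fall back to the title, dropping a common " | tagline" or " - tagline" suffix.
    for sep in (" | ", " - "):
        if sep in parser.title:
            return parser.title.split(sep)[0].strip()
    return parser.title.strip() or None
```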


You can find out more about how glass.ai differs from other approaches to understanding language in this blog post.
