How our AI research capability works

Frontiers of AI

One of the frontiers in AI is natural language understanding, which aims to give machines the power to comprehend and interpret human language. Unlike basic text processing, which is limited to recognizing keywords, natural language understanding aims to grasp the meaning behind sentences, paragraphs, and larger bodies of text, taking into account context, grammar, and semantics. In contrast to “generative” Large Language Models (LLMs), which are trained to predict the next word in a sentence and produce fluent, creative responses that may not be grounded in real-world facts, natural language understanding focuses on accurate interpretation, leading to fact-based, verifiable insights. Following several years of R&D at Glass.AI, we have invented Transparent AI technology that can read text at scale to interpret the evidence that underlies business activities. We decided to apply this innovation to reading the web, the biggest research resource that has ever existed.

What is Transparent AI?

Large language models (LLMs) can write, summarise, and converse with remarkable fluency, but they also highlight a growing problem: trust. These systems can produce convincing answers without being grounded in verifiable facts. Their internal reasoning is hidden, sources undisclosed, and outputs often hallucinated. The response to this challenge has been a call for greater transparency: to open up the black box, share their training data, and explain how complex models reach their conclusions. But such explanations are often partial, interpretive, or after the fact. They may clarify model behaviour, yet they cannot make opaque systems truly verifiable.

Transparent AI, as developed by Glass.AI, takes a fundamentally different approach. Where most AI transparency efforts focus on explaining model outputs, Transparent AI is designed to make those explanations unnecessary. Instead of relying on statistical models trained on historical data, Glass.AI builds its intelligence from traceable, real-world evidence. Every data point is sourced, validated, and documented. Each connection between entities can be traced back to the underlying evidence that supports it. The result is a system that is auditable by design, not one that needs to be reverse-engineered to justify its decisions. Transparent AI doesn’t aim to make black boxes visible; it avoids building them in the first place.

Evidence-based AI

Generative AI and LLMs are model-driven: they learn from examples and generate outputs based on probabilistic predictions. This can create surface-level coherence without underlying truth. Transparent AI is evidence-led: it builds knowledge directly from verifiable sources rather than inferred patterns. Building on this principle, Glass.AI enables large-scale discovery and analysis across trusted open and public sources. Our systems automatically extract, structure, and link evidence to create verified intelligence that can be traced and audited at every stage, not inferred from hidden training data. Where generative models estimate what might be true, Transparent AI shows what is true and where it came from.
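
To make the idea of traceable evidence concrete, here is a minimal sketch of a fact that carries its own sources. It is an illustration only, not Glass.AI's actual data model: the class names (Evidence, Fact), the field names, and the example company and URL are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Evidence:
    """A single piece of supporting text and where it was found."""
    source_url: str  # hypothetical URL of the page that was read
    snippet: str     # the text the claim was extracted from


@dataclass
class Fact:
    """A claim about an entity, stored together with its evidence."""
    subject: str
    predicate: str
    value: str
    evidence: list[Evidence] = field(default_factory=list)

    def is_traceable(self) -> bool:
        # A fact counts as traceable only if at least one source backs it.
        return len(self.evidence) > 0

    def audit_trail(self) -> list[str]:
        # One human-readable line per supporting source.
        return [f'{e.source_url}: "{e.snippet}"' for e in self.evidence]


# Hypothetical example data.
fact = Fact(
    subject="Example Robotics Ltd",
    predicate="sector",
    value="industrial automation",
    evidence=[
        Evidence(
            source_url="https://example.com/about",
            snippet="We build industrial automation systems.",
        )
    ],
)
unsupported = Fact("Example Robotics Ltd", "headcount", "250")

print(fact.is_traceable())         # the sector claim has a source
print(unsupported.is_traceable())  # the headcount claim has none
```

Keeping the evidence on the fact itself means an audit needs no separate lookup: a claim with an empty evidence list can simply be rejected before it reaches a dataset.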

Crucially, Transparent AI does not rely on any single source of information. Each insight is cross-validated across multiple independent sources, from company websites and regulatory filings to official registers, media coverage, and academic publications. By comparing signals across these diverse datasets, claims can be corroborated and inconsistencies or anomalies identified and flagged for review. This process transforms raw data into substantiated evidence, giving users confidence that every output has been independently verified from multiple perspectives. Cross-validation ensures that Glass.AI’s intelligence is not only transparent in origin, but resilient against bias, misinformation, and single-source dependency. It turns the open web into a network of verifiable references rather than a field of untested claims.
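
The cross-validation step described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the production logic: the function name cross_validate, the status labels, the min_sources threshold, and the sample claims are all hypothetical.

```python
from collections import Counter, defaultdict


def cross_validate(claims, min_sources=2):
    """Group claims by (entity, attribute) and compare values across sources.

    `claims` is an iterable of (entity, attribute, value, source) tuples.
    Returns a dict mapping (entity, attribute) to a status, the agreed
    value (None when sources disagree), and the contributing sources.
    """
    by_key = defaultdict(dict)
    for entity, attribute, value, source in claims:
        # Keep one value per source so a source cannot vote twice.
        by_key[(entity, attribute)][source] = value

    results = {}
    for key, value_by_source in by_key.items():
        counts = Counter(value_by_source.values())
        if len(counts) > 1:
            status, value = "conflict", None  # sources disagree: flag for review
        else:
            value, agreeing = counts.most_common(1)[0]
            # Corroborated only when enough independent sources agree.
            status = "corroborated" if agreeing >= min_sources else "single_source"
        results[key] = {
            "status": status,
            "value": value,
            "sources": sorted(value_by_source),
        }
    return results


# Hypothetical claims gathered from different source types.
claims = [
    ("Acme Ltd", "city", "Leeds", "company_website"),
    ("Acme Ltd", "city", "Leeds", "official_register"),
    ("Acme Ltd", "sector", "logistics", "company_website"),
    ("Acme Ltd", "sector", "retail", "news_article"),
    ("Acme Ltd", "founded", "2011", "company_website"),
]
report = cross_validate(claims)
for key, outcome in report.items():
    print(key, outcome["status"], outcome["value"])
```

In this sketch a disagreement is never resolved automatically: the value is withheld and the item is marked for review, which mirrors the idea that anomalies are flagged rather than silently averaged away.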

Below is the typical process we follow to build bespoke datasets on companies, sectors and markets for our clients:

Data Exploration

  • Define search requirements

  • AI reads business websites & social

  • Discovery of other sources

  • Initial sample for client review

Intelligent Crawling

  • Refine search based on feedback

  • Deeper read of websites, social, news

  • Other sources to augment results

  • Match to official sources, registers

Analysis & QA

  • Automatic QA process

  • Detect and remove anomalies

Data Delivery

  • Format and delivery method agreed

  • Packaging of data

  • Single or regular delivery of results

  • Summary stats, visualisations

Making Sense of Web Content at Scale

Our proprietary evidence-led, transparent AI technology uses various approaches in NLP, machine learning and computational linguistics. It combines language understanding through semantic analysis with resource crawling at scale and the maintenance of a deep topic ontology.

Glass.AI runs an ongoing discovery process that reads millions of websites globally. It classifies a site as a company website when the content meets certain criteria (e.g. the site is active and written in English or other languages). Where enough content is available on the web, it also predicts the sector and geography of the business. Currently, our system reads 40 million business websites globally, including those of companies, partnerships, sole traders, academic institutions, government and non-profits. We estimate that 80% of all businesses globally have useful web content. The crawler also reads sources such as news, social media, and academic and sector-specific websites. Wherever possible, we cross-validate web data with official business registers and across other sources, ensuring that all insights are grounded in transparent, multi-source evidence.

Semantic analysis

Glass.AI detects entities and classifies content from text (e.g. companies, people, products, news) with state-of-the-art precision. So when our AI goes to a website or reads a web source, it makes its own decisions on what is being talked about.

Resource crawling

Glass.AI is an intelligent crawler. Its smart filtering prioritises the links most likely to lead to the data that is most relevant to the results, simulating how a human would efficiently scan a website. This allows us to extract large-scale datasets efficiently.
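
One common way to follow the most promising links first is a best-first crawl driven by a priority queue. The sketch below shows that general technique on a small in-memory site. It is not a description of Glass.AI's crawler: the SITE map, the keyword weights in RELEVANT, the IGNORED prefixes, and the function names are assumptions chosen for illustration, and no network requests are made.

```python
import heapq

# Hypothetical in-memory site: page path -> list of (anchor text, target path).
SITE = {
    "/": [("About us", "/about"), ("Privacy policy", "/privacy"),
          ("Products", "/products"), ("Login", "/login")],
    "/about": [("Our team", "/team"), ("Cookie settings", "/cookies")],
    "/products": [("Case studies", "/case-studies")],
    "/privacy": [], "/login": [], "/team": [], "/cookies": [],
    "/case-studies": [],
}

# Assumed relevance hints; a real system would learn or tune these.
RELEVANT = {"about": 3, "products": 3, "services": 3, "team": 2, "case": 2}
IGNORED = ("privacy", "cookie", "login", "terms")


def score_link(anchor_text):
    """Score a link by its anchor text; None means 'do not follow'."""
    words = anchor_text.lower().split()
    if any(word.startswith(IGNORED) for word in words):
        return None
    return sum(RELEVANT.get(word, 0) for word in words)


def crawl(start, max_pages):
    """Best-first crawl: always visit the highest-scoring unseen link next."""
    frontier = [(0, 0, start)]  # (negated score, insertion order, path)
    seen, visited, order = {start}, [], 1
    while frontier and len(visited) < max_pages:
        _, _, path = heapq.heappop(frontier)
        visited.append(path)
        for anchor, target in SITE.get(path, []):
            score = score_link(anchor)
            if score is None or target in seen:
                continue
            seen.add(target)
            # heapq is a min-heap, so the score is negated to pop the best first.
            heapq.heappush(frontier, (-score, order, target))
            order += 1
    return visited


pages = crawl("/", max_pages=4)
print(pages)
```

With a page budget of four, the crawl spends its visits on the about, products and team pages and never touches the privacy, login or cookie pages, which is the behaviour a person skimming the site would aim for.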

Topics ontology

Glass.AI builds language models and large topic maps to help understand the web content and open the data for further investigation. Our ontology adapts to emerging themes and trends to ensure comprehensive coverage of the constantly evolving digital landscape.
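
As a rough illustration of how a topic map can grow with new themes, here is a toy ontology stored as a tree of topics with keywords. It is a deliberately simple sketch and not Glass.AI's ontology: the TopicOntology class, its methods, the single-word keyword matching, and the example topics are all assumptions.

```python
class TopicOntology:
    """A tiny topic tree: each topic has a parent and a set of keywords."""

    def __init__(self):
        self.parent = {}    # topic -> parent topic (None for roots)
        self.keywords = {}  # topic -> set of lowercase single-word keywords

    def add_topic(self, topic, parent=None, keywords=()):
        if parent is not None and parent not in self.parent:
            raise KeyError(f"unknown parent topic: {parent}")
        self.parent[topic] = parent
        self.keywords[topic] = {k.lower() for k in keywords}

    def ancestors(self, topic):
        """Return the chain of parents from `topic` up to its root."""
        chain = []
        current = self.parent[topic]
        while current is not None:
            chain.append(current)
            current = self.parent[current]
        return chain

    def tag(self, text):
        """Return matched topics plus their ancestors, sorted by name."""
        # Whitespace tokenisation only: multi-word phrases are not handled.
        words = set(text.lower().split())
        matched = set()
        for topic, keys in self.keywords.items():
            if keys & words:
                matched.add(topic)
                matched.update(self.ancestors(topic))
        return sorted(matched)


ontology = TopicOntology()
ontology.add_topic("energy")
ontology.add_topic("renewables", parent="energy", keywords=["solar", "wind"])

text = "We install heat pumps and solar panels"
before = ontology.tag(text)

# A newly observed theme is added without rebuilding the rest of the tree.
ontology.add_topic("heat pumps", parent="renewables", keywords=["pumps"])
after = ontology.tag(text)

print(before)
print(after)
```

Because each topic only points to its parent, adding an emerging theme is a local change, and previously read text can be re-tagged to pick up the new topic.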

Entity onboarding

Everything in Glass.AI is fully automated. For example, when detecting businesses, Glass.AI automatically recognises the type of site, the name of the business and its sector, and then deep-reads the site to understand the activities of the firm.

Want Insights No One Else Can See?

Discover how Transparent AI can transform your research process and unlock intelligence hidden across the web. Let's discuss your specific needs.
