AI tools that cannot show their work are guessing. This sounds harsh, but it is accurate. Without sentence-level source attribution, an AI answer is a synthesis the model produced from its training data and whatever context it was given. You cannot tell what is grounded in your team’s actual content and what is invented.
This is the single most important technical question to ask when evaluating an AI tool: does every claim in every answer link to a specific source?
This article gives you five test questions that reveal whether an AI tool is grounded or guessing.
Why sentence-level attribution matters
When an AI tool gives you an answer that combines five facts, three of those facts might be grounded in your team’s actual content and two might be hallucinated. If the tool shows you one citation at the bottom of the answer, you cannot tell which is which. The hallucinated facts are indistinguishable from the cited ones.
Sentence-level attribution solves this. Each claim in the answer is marked as either grounded (with a specific source link) or inferred (the AI’s synthesis, with no single source). Users always know what is verifiable.
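One minimal way to represent this distinction in code (the `Claim` and `Answer` shapes here are illustrative, not any particular tool’s schema):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Claim:
    text: str
    status: str                         # "grounded" or "inferred"
    source_label: Optional[str] = None  # e.g. "Slack #engineering, March 12"
    source_url: Optional[str] = None    # link a reader can click to verify

@dataclass
class Answer:
    claims: list[Claim]

    def verifiable_ratio(self) -> float:
        """Fraction of claims a reader can click through and check."""
        if not self.claims:
            return 0.0
        grounded = sum(1 for c in self.claims if c.status == "grounded")
        return grounded / len(self.claims)

answer = Answer(claims=[
    Claim("The team chose Stripe over Adyen in Q1.",
          status="grounded",
          source_label="Slack #engineering, March 12",
          source_url="https://example.com/slack/C123/p456"),
    Claim("The choice appears to have worked out well.",
          status="inferred"),
])
print(answer.verifiable_ratio())  # 0.5
```

The point is that verifiability is a property of each claim, not of the answer as a whole.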
For factual lookup questions (“what is our refund policy?”) attribution matters because the AI might be citing an outdated version. For coordination questions (“who owns the launch?”) attribution matters because the AI might be referencing stale information. For decision questions (“why did we choose Stripe?”) attribution matters because the reasoning needs to be traceable.
In every category, sentence-level attribution converts an AI tool from a source of plausible answers into a source of verifiable answers.
The five test questions
When evaluating an AI tool, ask each of these questions and observe how the tool responds.
Test question 1: “What did we decide about [a specific architectural choice]?” A grounded tool returns the decision with citations to the specific Slack thread, meeting transcript, or document where the decision was made. The user can click through and verify. A guessing tool returns a confident-sounding summary with no citation, or with a citation that turns out to be unrelated when you click it.
Test question 2: “Who is responsible for [a specific project]?” A grounded tool returns the owner with a citation to where the assignment was made (a Slack message, a Linear ticket, a meeting transcript). You can verify. A guessing tool returns a name with no source, or attributes the ownership based on weak signals (someone who mentioned the project once, for example).
Test question 3: “What is the latest status of [a specific work item]?” A grounded tool returns the most recent update with timestamp and source. You see when it was last updated and where. A guessing tool returns a general status that might be from a week ago, with no indication of recency.
Test question 4: “When did we ship [a specific feature]?” A grounded tool returns the date with citation to the GitHub PR, Linear ticket close, or announcement Slack message. A guessing tool returns a date that sounds plausible but is not sourced.
Test question 5: “Why did we change [a specific decision] last quarter?” A grounded tool returns the reasoning with citations to the decision-update Slack thread or meeting transcript. A guessing tool synthesizes a plausible-sounding reason from general context.
| Question | Grounded tool | Guessing tool |
|---|---|---|
| When did we ship X? | March 18 (Linear ticket, PR #842, retro notes) | “Recently” or “earlier this year” |
| Who owns the auth service? | L. Park (CODEOWNERS, on-call rota) | “Likely the platform team” |
| Why did we pick Postgres? | Schema pain plus low migration cost (ADR-031) | “Performance and reliability” |
| What broke last week? | Postgres replica lag (incident #214, Feb 8) | “A database issue” |
| What is blocking checkout v2? | Payments retry policy (Linear #1721) | “A few open issues” |
What grounded answers look like in practice
A well-attributed answer to “what did we decide about Stripe?” might look like:
The team decided to use Stripe over Adyen in Q1. [Source: Slack #engineering thread, March 12] The primary reasons were API quality, developer experience, and existing team familiarity. [Source: Slack #engineering, same thread] The decision was reviewed in the architecture meeting on April 3 and confirmed. [Source: Meeting transcript, April 3]
Three claims, three sources. The user can click through each source to verify the underlying content. No guessing.
A guessing answer to the same question might look like:
The team chose Stripe over Adyen due to API quality, developer experience, and team familiarity. This decision was made in Q1 and has been the standard since.
Same general information, but the user cannot verify any of it. If any part is wrong (maybe Adyen was the original choice and they switched to Stripe later, or maybe the reasoning included additional factors), the user has no way to detect the error.
What this means architecturally
Sentence-level attribution is technically harder than answer-level attribution. The model needs to track which source contributed to each sentence as it generates the answer, then surface that mapping in the response.
This requires a retrieval architecture that supports per-claim tracking. Vector search alone does not provide it; retrieval has to be designed to preserve sources from the beginning.
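A sketch of what source preservation means in the pipeline, with a toy keyword matcher standing in for a real vector store and pass-through text standing in for generation (all names and the `slack://` source format are hypothetical):

```python
def retrieve(query, index):
    """Return (text, source) pairs instead of bare text, so provenance
    survives into the generation step. A real system would use vector
    search here; keyword overlap keeps the sketch self-contained."""
    words = query.lower().split()
    return [(chunk["text"], chunk["source"])
            for chunk in index
            if any(word in chunk["text"].lower() for word in words)]

def answer_with_citations(query, index):
    """Build an answer as a list of claims, each carrying the source of
    the chunk it drew from. A real system would synthesize prose; the
    point here is that the claim-to-source mapping is never discarded."""
    return [{"claim": text, "source": source}
            for text, source in retrieve(query, index)]

index = [
    {"text": "Stripe was chosen over Adyen for API quality.",
     "source": "slack://engineering/2024-03-12"},
    {"text": "The launch retro is scheduled for Friday.",
     "source": "calendar://retro"},
]
print(answer_with_citations("why Stripe", index))
```

If retrieval returned bare strings instead of (text, source) pairs, the mapping would be lost before generation ever ran, and the best the system could do afterward is answer-level citation.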
This is one of the architectural reasons we covered in the process graph cornerstone: process graph systems can support better attribution than document graph systems. When the underlying data is structured with explicit source linkage, per-claim attribution emerges naturally.
For an AI tool built without this architectural foundation, adding sentence-level attribution as a feature is difficult; the retrieval and generation pipelines need to be redesigned. This is why most AI tools currently provide answer-level citations (one bundle of citations at the bottom) rather than sentence-level ones (one citation per claim).
How to evaluate
When demoing an AI tool, ask the five test questions. Look at the answers. Three signals tell you the tool is grounded:
- Every meaningful claim has a citation marker
- Clicking the citation goes to a specific, relevant source
- The AI marks inferred content as inferred (different visual treatment from cited content)
If any of these are missing, the tool is guessing, regardless of how confident the answers sound.
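When a tool exposes its answers as structured claims, the first and third signals can be checked mechanically; the second (is the cited source actually relevant?) still takes a human click-through. A sketch, assuming a simple claim-dictionary shape that is not any specific tool’s API:

```python
def check_answer(claims):
    """Flag claims that fail the machine-checkable signals: every claim
    must be marked cited or inferred, and cited claims must carry a
    source link."""
    problems = []
    for i, claim in enumerate(claims):
        status = claim.get("status")
        if status not in ("cited", "inferred"):
            problems.append(f"claim {i}: not marked as cited or inferred")
        elif status == "cited" and not claim.get("source"):
            problems.append(f"claim {i}: cited but missing a source link")
    return problems

good = [
    {"text": "We shipped on March 18.", "status": "cited",
     "source": "linear://T-842"},
    {"text": "The rollout seems stable.", "status": "inferred"},
]
bad = [
    {"text": "We use Stripe.", "status": "cited"},   # cited, but no source
    {"text": "It was decided last quarter."},        # not marked at all
]
print(check_answer(good))  # []
print(check_answer(bad))
```

An empty problem list is necessary but not sufficient: it tells you the answer is well-formed, not that the sources say what the answer claims.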
Pulse is built around sentence-level attribution from the architecture up. Every claim in every answer is marked as either cited (grounded in a specific source) or inferred (synthesized across sources). The live demo at pulsehq.tech shows this end to end. Walk through it and see what grounded answers actually look like.