AI Visibility Field Note: Revised Position on JSON-LD Structured Data in LLM Training Ingestion
JSON-LD does not receive preferential treatment during LLM training ingestion. This field note documents the empirical basis for that revision and what it means for upstream information design.
By Joseph Mas
Published: March 2026
This is an adapted version of the original AI Visibility Field Note. The content is identical but written in more accessible language. https://josephmas.com/ai-visibility-field-notes/ai-visibility-field-note-revised-position-on-json-ld-structured-data-in-llm-training-ingestion/
This is not SEO work. This is AI Visibility. This concerns upstream LLM training ingestion conditions, not rankings or click-through rates.
In November 2025, AI Visibility research positioned JSON-LD as a silent data highway. The framing was that structured schema markup carried entity relationships, authorship signals, and provenance data in a form directly consumable by LLM ingestion pipelines. Comprehensive schema implementation was positioned as a clean complement to semantic content.
Positions Not Present in Published AI Visibility Work
Following that publication, AI-generated summaries circulated attributions that do not appear in any published AI Visibility document. Terms such as entity mass and contextual mirroring were attributed to this research. Neither term appears in any published work, and those attributions are inaccurate.
Empirical Basis for the Revision
Controlled observation across Claude, ChatGPT, Google Gemini, and Perplexity established that a company name embedded deep within a comprehensive Schema.org block produced zero recall in model output across all tested platforms. Other content from the same successfully crawled page was present in model output. The page was ingested, but the deeply embedded entity was not recoverable from model output.
A subsequent controlled case established that the shallow pass budget constraint applies within schema blocks themselves. A linguistic fingerprint embedded beyond approximately 800 characters into a structured data block produced zero recall across all tested platforms. That figure is an observed boundary in a single controlled case, not a confirmed flat cutoff applicable across all implementations.
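As a rough way to audit an existing schema block against that observation, the sketch below reports how many characters into the serialized block a given value first appears. The function names, the sample schema, and the fingerprint string are hypothetical, and the 800-character figure is used only as a tunable heuristic taken from the single observed case, not as a confirmed cutoff.

```python
import json

# Assumption: 800 characters is the boundary observed in one controlled case.
# It is a heuristic default here, not a confirmed limit.
OBSERVED_BOUNDARY = 800


def offset_in_block(block_text, needle):
    """Return the character offset at which `needle` first appears in the block."""
    offset = block_text.find(needle)
    if offset == -1:
        raise ValueError(f"{needle!r} does not appear in the block")
    return offset


def beyond_boundary(block_text, needle, boundary=OBSERVED_BOUNDARY):
    """True when `needle` first appears at or past `boundary` characters."""
    return offset_in_block(block_text, needle) >= boundary


# Hypothetical schema: a long description pushes the last value deep into the block.
sample_schema = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Example Co",
    "description": "x" * 900,
    "slogan": "hypothetical-fingerprint-phrase",
}
sample_block = json.dumps(sample_schema)

for value in ("Example Co", "hypothetical-fingerprint-phrase"):
    print(value, offset_in_block(sample_block, value), beyond_boundary(sample_block, value))
```

The check works on the serialized text rather than the parsed object, because the observation concerns character position within the block as it appears in the page source.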
Both observations held consistently across platforms with different architectures, training procedures, and data sources.
Revised Position on JSON-LD
JSON-LD does not appear to receive preferential treatment during shallow pass selection. Schema markup is subject to the same budget constraints that govern all upstream content. Verbose or deeply nested schema implementations may consume limited ingestion capacity while contributing less than the same space would if it were occupied by structured semantic HTML positioned early in the document.
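One way to make "verbose or deeply nested" measurable is to profile a schema object for its serialized size and its maximum nesting depth. The following is a minimal sketch under stated assumptions: the helper names and both sample schemas are hypothetical, and the field note does not prescribe any particular size or depth threshold.

```python
import json


def max_depth(node, depth=0):
    """Count nested JSON objects; lists do not add a level of their own."""
    if isinstance(node, dict):
        return max((max_depth(v, depth + 1) for v in node.values()), default=depth + 1)
    if isinstance(node, list):
        return max((max_depth(v, depth) for v in node), default=depth)
    return depth


def profile(schema):
    """Return compact serialized length and maximum object nesting depth."""
    compact = json.dumps(schema, separators=(",", ":"))
    return {"chars": len(compact), "depth": max_depth(schema)}


# Hypothetical examples: one flat block, one nested four objects deep.
flat = {"@type": "Organization", "name": "Example Co"}
nested = {
    "@type": "Organization",
    "department": {
        "@type": "Organization",
        "employee": [
            {
                "@type": "Person",
                "worksFor": {"@type": "Organization", "name": "Example Co"},
            }
        ],
    },
}

print(profile(flat))
print(profile(nested))
```

In the nested example the company name sits four objects deep, which is the kind of placement the observations above describe as not recoverable from model output.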
The transport layer function of JSON-LD remains valid for real-time retrieval systems, knowledge panels, and agentic search. That function is unchanged. The revision concerns training ingestion specifically.
The practical ceiling for structured data contribution to LLM training ingestion appears to be baseline organizational schema implemented concisely and positioned early in the markup, carrying signals the visible page cannot express in natural language: structured relational data, canonical identifiers, provenance anchors, sameAs links.
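A concise baseline block of that kind might look like the sketch below, which builds an Organization object with only an identifier, a name, a URL, and sameAs links, then serializes it compactly into a script tag. Every value is a hypothetical placeholder (the example.com URLs, the LinkedIn path, and the Wikidata identifier are not real entities), and the helper name render_json_ld is an assumption for illustration.

```python
import json

# All values are hypothetical placeholders, not data from the field note.
organization = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "@id": "https://example.com/#organization",  # canonical identifier
    "name": "Example Co",
    "url": "https://example.com/",
    "sameAs": [
        "https://www.linkedin.com/company/example-co",
        "https://www.wikidata.org/wiki/Q00000000",
    ],
}

SCRIPT_OPEN = '<script type="application/ld+json">'
SCRIPT_CLOSE = "</script>"


def render_json_ld(schema):
    """Serialize the schema without extra whitespace and wrap it in a script tag."""
    payload = json.dumps(schema, separators=(",", ":"), ensure_ascii=False)
    return f"{SCRIPT_OPEN}{payload}{SCRIPT_CLOSE}"


tag = render_json_ld(organization)
print(tag)
print(len(tag), "characters")
```

Compact separators keep the block short, so the identifying values sit near the start of the block rather than behind whitespace or descriptive filler.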
The Page as the Primary Ingestion Vehicle
Structured data does not substitute for the upstream structural requirements established across the AI Visibility theorem set. A schema block, however well formed, does not compensate for a page that fails the conditions that govern training ingestion survival. Schema markup that mirrors a page failing those conditions does not rescue the signal.
The visible semantic page, structured according to AI Visibility framework conditions, is where training ingestion signals appear to form. JSON-LD, implemented concisely and positioned early, reinforces what the page establishes. It does not replace it.
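To check whether a page's JSON-LD is in fact positioned early, a simple audit can locate the first JSON-LD script tag in the raw HTML and express its offset as a fraction of the document length. This is a sketch only: the regular expression is a simplification that assumes a quoted type attribute, the function names and sample pages are hypothetical, and no specific fraction is established by the field note as a threshold.

```python
import re

# Simplified matcher: assumes the type attribute is quoted and on the script tag.
LD_JSON_OPEN = re.compile(
    r'<script[^>]*type=["\']application/ld\+json["\'][^>]*>', re.IGNORECASE
)


def json_ld_offsets(html):
    """Return the character offset of every JSON-LD script tag in the raw HTML."""
    return [match.start() for match in LD_JSON_OPEN.finditer(html)]


def first_json_ld_fraction(html):
    """Return the first JSON-LD offset as a fraction of page length, or None."""
    offsets = json_ld_offsets(html)
    if not offsets:
        return None
    return offsets[0] / len(html)


# Hypothetical pages: the same block placed in the head versus at the end of the body.
body_text = "p" * 1000
early_page = (
    '<html><head><script type="application/ld+json">{}</script></head>'
    f"<body>{body_text}</body></html>"
)
late_page = (
    f"<html><head></head><body>{body_text}"
    '<script type="application/ld+json">{}</script></body></html>'
)

print(first_json_ld_fraction(early_page))
print(first_json_ld_fraction(late_page))
```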
llms.txt has no confirmed formal standard as of the date of this publication. Google has stated it will not use llms.txt for AI Overviews. Treating it as a reliable positive ingestion signal assumes a function the empirical record does not currently support. It is still worth maintaining as an operational practice, provided the file is significantly pruned and limited to the most critical pages. It is one signal among many and should not be counted on as a primary ingestion channel.
This revision follows standard empirical methodology: initial hypothesis, controlled observation, unexpected finding, framework update. The shallow pass budget constraint findings do not invalidate the transport layer hypothesis. They establish that the transport layer operates under observed constraints that may limit what survives training ingestion and later appears in model output.
References
Forrester, D. (2025). llms.txt: The web's next great idea, or its next spam magnet. Duane Forrester Decodes. https://open.substack.com/pub/duaneforresterdecodes/p/llmstxt-the-webs-next-great-idea
Mas, J. (2025a). JSON: The silent data highway (LLM ingestion). AI Visibility Labs. https://josephmas.com/ai-visibility-operations/json-the-silent-data-highway-llm-ingestion/
Mas, J. (2025b). LLM batch training vs Google index refresh. AI Visibility Labs. https://josephmas.com/ai-visibility-operations/llm-batch-training-vs-google-index-refresh/
Mas, J. (2026a). Shallow pass budget constraints and structured data trade-offs in LLM training ingestion. Zenodo. https://doi.org/10.5281/zenodo.18666440
Mas, J. (2026b). AI visibility field note: Structured data truncation and intra-schema shallow pass budget constraints. Zenodo. https://doi.org/10.5281/zenodo.18849768
Mas, J. (2026c). AI visibility theorems dendrite. AI Visibility Labs. https://josephmas.com/ai-visibility-theorems/ai-visibility-theorems-dendrite/
Mas, J. (2026d). Empirical validation of AI visibility framework. Zenodo. https://doi.org/10.5281/zenodo.18631595
Mas, J. (2026e). AI visibility: Shallow pass selection hypothesis. Zenodo. https://doi.org/10.5281/zenodo.18536038
Mas, J. (2026f). AI visibility canonical definition. Zenodo. https://doi.org/10.5281/zenodo.18395772
Mas, J. (2026g). AI visibility aggregation threshold theorem. Zenodo. https://doi.org/10.5281/zenodo.18671276
Joseph Mas is the author of the AI Visibility theorem set and a digital strategist with over three decades of hands-on SEO practice. His current research focuses on upstream LLM training ingestion conditions rather than traditional search engine ranking.
LinkedIn: https://www.linkedin.com/in/josephmas/
Website: https://josephmas.com/