<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Alexandros’s Substack]]></title><description><![CDATA[Artificial Intelligence and more]]></description><link>https://alexandroszenonos.substack.com</link><image><url>https://substackcdn.com/image/fetch/$s_!QuEW!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17484f46-a9a1-48f9-8bf8-5c5667b7320e_1280x1280.png</url><title>Alexandros’s Substack</title><link>https://alexandroszenonos.substack.com</link></image><generator>Substack</generator><lastBuildDate>Wed, 08 Apr 2026 09:43:32 GMT</lastBuildDate><atom:link href="https://alexandroszenonos.substack.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Alexandros Zenonos]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[alexandroszenonos@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[alexandroszenonos@substack.com]]></itunes:email><itunes:name><![CDATA[Alexandros Zenonos]]></itunes:name></itunes:owner><itunes:author><![CDATA[Alexandros Zenonos]]></itunes:author><googleplay:owner><![CDATA[alexandroszenonos@substack.com]]></googleplay:owner><googleplay:email><![CDATA[alexandroszenonos@substack.com]]></googleplay:email><googleplay:author><![CDATA[Alexandros Zenonos]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Global RAG Chatbot for internal compliance teams]]></title><description><![CDATA[Developing intelligent chatbots that are actually useful]]></description><link>https://alexandroszenonos.substack.com/p/global-rag-chatbot-for-internal-compliance</link><guid 
isPermaLink="false">https://alexandroszenonos.substack.com/p/global-rag-chatbot-for-internal-compliance</guid><dc:creator><![CDATA[Alexandros Zenonos]]></dc:creator><pubDate>Tue, 24 Mar 2026 15:32:24 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!QuEW!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17484f46-a9a1-48f9-8bf8-5c5667b7320e_1280x1280.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>A compliance assistant is not &#8220;an LLM with documents&#8221;. It&#8217;s a controlled information system sitting inside a risk function.</p><p>If you treat it like a demo, it will behave like one, and the first serious user will break it.</p><p>What matters is not how fluent it sounds. What matters is whether it can be held accountable.</p><h2><strong>Start with boundaries, not prompts</strong></h2><p>Before building anything, you need to answer:</p><ul><li><p>What is this allowed to do?</p></li><li><p>What is it explicitly not allowed to do?</p></li><li><p>What happens when it cannot answer safely?</p></li></ul><p>A useful baseline:</p><ul><li><p>it can quote and explain policy with citations</p></li><li><p>it can compare two internal rules if both are retrievable</p></li><li><p>it cannot give personal legal advice</p></li><li><p>it cannot &#8220;interpret intent&#8221; beyond the policy text</p></li></ul><p>This is where most projects die, because nobody wants to be the person who says &#8220;no&#8221;.</p><h2><strong>The core requirement: an auditable trail</strong></h2><p>Compliance users do not want magic. 
They want traceability.</p><p>At minimum, the system needs:</p><ul><li><p>retrieval trace: which sources were used (IDs + versions)</p></li><li><p>answer trace: which parts of the response map to which sources</p></li><li><p>system trace: prompt/model/version used</p></li><li><p>refusal trace: why it refused, and what the escalation path is</p></li></ul><p>If you cannot show the trail, you are not building a compliance tool. You are building a liability.</p><h2><strong>Failure modes to design for (because users will find them)</strong></h2><h3><strong>1) Judgement-shaped questions</strong></h3><p>Typical user prompts include:</p><ul><li><p>&#8220;Can I do X?&#8221;</p></li><li><p>&#8220;Is this allowed?&#8221;</p></li><li><p>&#8220;Is this risky?&#8221;</p></li></ul><p>You need explicit handling:</p><ul><li><p>classify risk tier</p></li><li><p>answer only what the policy says</p></li><li><p>route judgement to humans</p></li></ul><h3><strong>2) Policy conflict</strong></h3><p>Two documents say different things. The system must surface conflict and cite both, not &#8220;average them&#8221;.</p><h3><strong>3) Missing context</strong></h3><p>If retrieval is weak, answers become fiction. The correct output is refusal plus the next step.</p><h2><strong>Decisions that mattered</strong></h2><ul><li><p>Short answers, structured by default.</p></li><li><p>Citations are not a feature. They are the product.</p></li><li><p>Retrieval quality is the key metric, not &#8220;LLM accuracy&#8221;.</p></li><li><p>Change control: policy updates must be versioned and testable.</p></li></ul><h2><strong>What broke (and what we changed)</strong></h2><ul><li><p>Users tried to get the bot to &#8220;approve&#8221; decisions. We tightened refusal modes and added explicit escalation prompts.</p></li><li><p>Policy updates changed answers unexpectedly. We moved to strict versioning and regression checks.</p></li><li><p>Early outputs were too smooth. 
We forced &#8220;evidence-first&#8221; behaviour even if it looked less impressive.</p></li></ul><h2><strong>Takeaways</strong></h2><p>If you want a compliance assistant, you are building governance plus engineering. The model is the easy part.</p><p>If your project is stuck at &#8220;PoC but nobody will sign off&#8221;, the fix is usually boundaries, auditability, and test gates, not a better prompt.</p>]]></content:encoded></item><item><title><![CDATA[Months to minutes: the boring engineering behind genomics NLP]]></title><description><![CDATA[The problem was never &#8220;PDF extraction&#8221;]]></description><link>https://alexandroszenonos.substack.com/p/months-to-minutes-the-boring-engineering</link><guid isPermaLink="false">https://alexandroszenonos.substack.com/p/months-to-minutes-the-boring-engineering</guid><dc:creator><![CDATA[Alexandros Zenonos]]></dc:creator><pubDate>Sun, 08 Mar 2026 13:51:44 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!QuEW!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17484f46-a9a1-48f9-8bf8-5c5667b7320e_1280x1280.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Genomics is full of &#8220;AI potential&#8221; and short on operational reality.</p><p>The bottleneck is rarely the model. It&#8217;s the plumbing: unstructured reports, inconsistent formats, brittle hand-offs, and a review process that does not scale.</p>
<p>In one NHS-facing workflow, we took manual extraction that was effectively measured in weeks or months and pushed it down to minutes for first-pass structured output. Not by doing something clever. By doing the unglamorous work properly.</p><h2>The problem was never &#8220;PDF extraction&#8221;</h2><p>The actual problem was end-to-end reliability:</p><ul><li><p>Reports arrive in different templates.</p></li><li><p>Key entities are missing or phrased differently.</p></li><li><p>Clinical users need traceability, not &#8220;best effort.&#8221;</p></li><li><p>The system has to fit into a hospital data boundary.</p></li></ul><p>If you only solve text extraction, you still fail at adoption.</p><h2>What we built (bounded version)</h2><p>A pipeline with four hard constraints:</p><ol><li><p>Deterministic ingestion (no silent failures).</p></li><li><p>Traceable extraction (what was found, where, and how).</p></li><li><p>Standardised output (so it can actually be used).</p></li><li><p>Review loop (so clinicians can correct and improve).</p></li></ol><p>The NLP is a component. The product is the workflow.</p><h2>Decisions that mattered</h2><h3>1) Standardisation beats bespoke &#8220;perfect extraction&#8221;</h3><p>We treated structured output as the target, not raw text. 
Instead of just scraping &#8220;BRCA1 positive,&#8221; we had to map complex phrases like <em>&#8220;Pathogenic variant identified in BRCA1 c.68_69delAG&#8221;</em> into a standardised schema linking the gene, variant, and clinical significance.</p><p>That means:</p><ul><li><p>Mapping into a consistent schema.</p></li><li><p>Keeping document provenance per extracted field.</p></li><li><p>Supporting partial results instead of all-or-nothing.</p></li></ul><h3>2) Build an audit trail early</h3><p>If you cannot answer &#8220;why did the system say this?&#8221;, you will lose trust fast.</p><p>So every extracted element needed:</p><ul><li><p>Source reference (document + location).</p></li><li><p>Confidence / extraction method metadata.</p></li><li><p>Versioning for pipeline changes.</p></li></ul><h3>3) Make the review loop part of the design</h3><p>Clinicians do not want another dashboard that&#8217;s &#8220;interesting&#8221;.</p><p>They want:</p><ul><li><p>A queue they can work through.</p></li><li><p>Fast correction.</p></li><li><p>Clear uncertainty flags.</p></li><li><p>A feedback mechanism that improves the system.</p></li></ul><h3>4) Treat data boundaries as design constraints</h3><p>Healthcare data is not a playground, especially when navigating NHS data governance. You can&#8217;t just pass personal data to a public API.</p><p>So the default posture was:</p><ul><li><p>Minimise what leaves the boundary (operating within secure, approved environments).</p></li><li><p>Store only what is necessary.</p></li><li><p>Document exactly who can access what, and why.</p></li></ul><h2>What broke (and what we changed)</h2><ul><li><p><strong>Template drift:</strong> Upstream report formats changed. We moved to robust pattern handling and added detection when a template looks &#8220;new&#8221;.</p></li><li><p><strong>Edge cases:</strong> Rare variants and unusual phrasing. 
We stopped trying to be perfect and focused on triage: surface uncertainty and push to review.</p></li><li><p><strong>Overconfidence:</strong> Early outputs looked clean but hid ambiguity. We forced the system to show uncertainty explicitly.</p></li></ul><h2>Takeaways</h2><ul><li><p><strong>Start with the workflow and data boundary</strong> if you want &#8220;AI in genomics&#8221; to actually deploy.</p></li><li><p><strong>Standardisation and auditability are your fastest wins</strong>, not a fancier model.</p></li><li><p><strong>Clinician review is not a fallback.</strong> It&#8217;s a core feature of the system.</p></li></ul><p>If you&#8217;re sitting on unstructured clinical reports and calling it a data science problem, you&#8217;re already late. It&#8217;s a delivery problem.</p><div><hr></div><p><em>Have you hit similar walls deploying NLP or AI in healthcare? Let me know in the comments below.</em></p>
]]></content:encoded></item><item><title><![CDATA[Coming soon]]></title><description><![CDATA[This is Alexandros&#8217;s Substack, a newsletter about Artificial Intelligence and more.]]></description><link>https://alexandroszenonos.substack.com/p/coming-soon</link><guid isPermaLink="false">https://alexandroszenonos.substack.com/p/coming-soon</guid><dc:creator><![CDATA[Alexandros Zenonos]]></dc:creator><pubDate>Sat, 31 Dec 2022 00:07:05 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!QuEW!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17484f46-a9a1-48f9-8bf8-5c5667b7320e_1280x1280.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong>This is Alexandros&#8217;s Substack</strong>, a newsletter about Artificial Intelligence and more.</p>]]></content:encoded></item></channel></rss>