Manuel Acosta – Senior Cloud & Data Engineer | AWS, Data Pipelines, Serverless

Data Glossaries: The Semantic Layer That Decides Whether AI on Your Data Actually Works

May 2026 · 24 min read

A follow-up to Data Catalog Core Concepts Explained.

In the previous post I argued that a weak glossary is the single biggest reason data catalogs become ghost towns. I want to take that claim seriously, because I keep watching teams nod at it, agree that glossaries matter, and then ship a catalog where the glossary is a folder of thirty terms named after database columns with the descriptions left blank.

There's a deeper reason this keeps happening. The data engineering profession learned how to model schemas, how to wire pipelines, how to write tests, how to draw lineage. It did not, as a discipline, learn how to model meaning. That work belongs to a different field entirely (information science), and most data teams have never been exposed to it.

This post tries to close that gap. It's a deep dive on data glossaries: what they actually are (and aren't), the three structural types you can build them as, what each one buys you, how to start without drowning, and what a realistic 12-month roadmap looks like. I'll keep referring back to the catalog post where the concepts connect.

If you're standing up a catalog in 2026 and you're serious about AI agents using it, glossary work is no longer optional. It's the layer the agents will lean on hardest, and the layer that's hardest to fake.

Read the full post...

Data Catalog Core Concepts Explained — With an Honest Look at OpenMetadata

May 2026 · 25 min read

There's a particular kind of pain that every data team eventually hits. A data scientist spends three days hunting for the "right" customer table. An analyst builds a dashboard on a column that was deprecated six months ago. A new hire asks where the revenue data lives, and four people give four different answers. Everyone has the data. Nobody can find it.

This is the problem data catalogs are built to solve, and in 2026, with AI agents now reading from the same warehouses humans do, solving it has gone from "nice to have" to "you cannot ship AI safely without it."

Read the full post...

Amazon QuickSight: Implementing Row-Level Security

March 2025 · 5 min read

Row-Level Security (RLS) in QuickSight ensures users see only the data relevant to their roles, which supports data confidentiality and compliance. It enables secure, multi-tenant analytics by restricting visibility at the row level, so teams can deliver personalized insights without exposing sensitive information.

Read the full post...

Amazon QuickSight User Roles: A Comprehensive Guide

March 2025 · 4 min read

QuickSight is a powerful business intelligence service that helps organizations visualize data, build interactive dashboards, and gain actionable insights. Understanding user roles is essential to maximizing the platform’s capabilities and keeping costs under control.

Read the full post...

Manuel Acosta — Senior Cloud & Data Engineer specializing in AWS data pipelines, serverless architectures, and cloud cost optimization.


My Latest Projects

Serverless Laravel on AWS: This Site, for Under $1/Month

Embedded Amazon QuickSight Dashboards
