
This project digs into the 5,036-page Office Open XML documentation to uncover the details behind PowerPoint's file format and to show how to build a custom slide generator that rivals AI tools.
Making PowerPoint presentations can be deceptively challenging, especially when you need them to look polished and professional. While there are several AI-powered slide generators available, they often fall short of producing truly satisfying results. I decided to build a better one, diving deep into the 5,036-page Office Open XML documentation to understand how PowerPoint files work under the hood.
To generate custom PowerPoint presentations, I needed to understand their internal structure. Microsoft’s Office Open XML standard (ECMA-376) is the key here, but it’s a behemoth at 5,036 pages. For those who haven’t delved into this yet, here’s a quick overview:
A .pptx file is essentially a zip archive containing multiple XML files and media assets. You can explore its contents by simply unzipping the file:
unzip pres.pptx
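The same inspection can be done programmatically. Below is a minimal sketch using Python's standard `zipfile` module; the helper names `list_parts` and `slide_parts` are hypothetical, and the `ppt/slides/` prefix assumes the conventional layout that PowerPoint itself writes.

```python
import zipfile


def list_parts(pptx):
    """Return the sorted names of all parts inside a .pptx archive.

    `pptx` may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(pptx) as archive:
        return sorted(archive.namelist())


def slide_parts(pptx):
    """Return only the slide XML parts, e.g. 'ppt/slides/slide1.xml'.

    Assumes the conventional 'ppt/slides/' location for slide parts.
    """
    return [
        name for name in list_parts(pptx)
        if name.startswith("ppt/slides/slide") and name.endswith(".xml")
    ]
```

Calling `list_parts("pres.pptx")` on a real deck should show the slides alongside the layouts, masters, themes, and the `ppt/media` assets. Note that the sort is lexicographic, so `slide10.xml` is listed before `slide2.xml`.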
Each slide, chart, speaker note, theme, master slide, and layout has its own XML file. These files reference each other in a well-defined but complex manner. The complexity is such that PowerPoint can be Turing-complete just through shapes and animations (no scripting required)!
Let’s break down the structure of a typical slide:
Each slide is stored in its own slideX.xml file, where X is the slide number. This file contains all the elements on the slide.
Images and other media are stored in the media directory. The XML files reference these assets using relationship IDs (rIds).
For example, here’s the relationship entry for a picture:
<Relationship Id="rId5" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image"
Target="../media/image41.png" />
In the slide XML, the <p:pic> tag represents the image, with child tags like <p:nvPicPr> for non-visual properties and <a:xfrm> for position and size transforms. Its <a:blip> child points back to that entry in slideX.xml.rels, using the relationship ID:
<a:blip r:embed="rId5" />
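To make the lookup concrete, here is a minimal sketch that resolves a relationship ID to the archive path of its target. The function name `resolve_rid` and the `base_dir` default are assumptions for illustration; the sketch handles only internal, relative targets like the one shown above and ignores external links.

```python
import posixpath
import xml.etree.ElementTree as ET

# Namespace used by package relationship (.rels) parts.
RELS_NS = "http://schemas.openxmlformats.org/package/2006/relationships"


def resolve_rid(rels_xml, rid, base_dir="ppt/slides"):
    """Map a relationship ID to the archive path of its target.

    `rels_xml` is the text of a slide's .rels part. `base_dir` is the
    directory of the slide that owns it, since relative targets are
    resolved against the source part. Returns None for an unknown ID.
    """
    root = ET.fromstring(rels_xml)
    for rel in root.iter(f"{{{RELS_NS}}}Relationship"):
        if rel.get("Id") == rid:
            target = rel.get("Target")
            # "../media/image41.png" relative to ppt/slides -> ppt/media/image41.png
            return posixpath.normpath(posixpath.join(base_dir, target))
    return None
```

With the relationship entry shown earlier wrapped in a `<Relationships>` root element, `resolve_rid(rels_xml, "rId5")` gives the path of the image inside the archive, which can then be read with `zipfile`.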
Once you understand where to place the right XML tags, generating a custom slide deck becomes more manageable. Here’s a simplified example of how to set the geometry and add adjustments for a rounded rectangle shape:
<a:prstGeom prst="roundRect">
  <a:avLst>
    <a:gd name="adj" fmla="val 2430" />
  </a:avLst>
</a:prstGeom>
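A generator would typically emit this fragment from code rather than from string templates. The following is a minimal sketch with the standard library's `xml.etree.ElementTree`; the function name `rounded_rect_geometry` is hypothetical, and the 0..50000 bound on `adj` reflects my understanding of how the `roundRect` preset pins its adjustment, so treat it as an assumption.

```python
import xml.etree.ElementTree as ET

# DrawingML main namespace, conventionally bound to the "a" prefix.
A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"
ET.register_namespace("a", A_NS)


def rounded_rect_geometry(adj):
    """Build an <a:prstGeom prst="roundRect"> element with one adjustment."""
    # Assumption: the roundRect preset pins adj to 0..50000, so values
    # outside that range are rejected here instead of silently clamped.
    if not 0 <= adj <= 50000:
        raise ValueError("adj is expected to lie in 0..50000")
    geom = ET.Element(f"{{{A_NS}}}prstGeom", {"prst": "roundRect"})
    av_lst = ET.SubElement(geom, f"{{{A_NS}}}avLst")
    ET.SubElement(av_lst, f"{{{A_NS}}}gd", {"name": "adj", "fmla": f"val {adj}"})
    return geom
```

`ET.tostring(rounded_rect_geometry(2430), encoding="unicode")` serializes the element with the `a:` prefix. Building elements this way keeps namespaces and escaping consistent, which matters once the fragment is nested inside a shape's `<p:spPr>`.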
Building the slide generator wasn’t without its challenges.
Despite these hurdles, the process was rewarding. By the end of three weeks, I had a functional slide generator that could produce high-quality presentations tailored to our needs at Listen, where we conduct user interviews and need to present customer research insights effectively.
Reverse engineering PowerPoint’s XML structure allowed me to build a custom slide generator that meets our specific requirements. While it was a complex journey, the result is a tool that produces polished, professional slides with minimal effort.
About the author
Kai built ML infrastructure at a Bay Area startup before developing an obsession with transformer architectures and inference optimisation that eventually pulled him out of product work entirely. A stint at a compute research lab sharpened his instinct for what actually matters in a model release versus what is marketing. He writes from the inside — from the perspective of someone who has debugged the systems he is describing at three in the morning. He is allergic to hype and instinctively drawn to the unglamorous plumbing questions that everyone else skips over.