Parsing Site Configurations with a Custom YAML Engine | AI Systems Design From Scratch

Connect with Amin Boulouma Official


Parsing Site Configurations with a Custom YAML Engine

Amin Boulouma, Software Engineer

We have successfully covered site orchestration and markdown translation. The final crucial pillar of our minimalist Jekyll engine is Configuration Mapping.

In a traditional setup, you would grab PyYAML or a similar heavy dependency. But remember our engineering constraint: zero external dependencies. To handle site configurations like title, author, or theme variables, we can build a lightweight key-value string mapper using the Fluent Builder interface we established earlier.

Let's dissect how YAMLBuilder works to parse global configurations.


Symmetry in System Architecture

One mark of a clean codebase is architectural consistency. Because the YAML processor follows the same structural lifecycle we used for the Markdown parser, anyone who has read one component can navigate the other with little extra effort.

Our YAML configuration processor is split into the same three components, each with a distinct responsibility:

  1. YAMLBuilder: Manages initialization parameters (fluent method chaining).
  2. YAML: Holds instance data and exposes structural endpoints.
  3. YAMLParser: A stateless engine dedicated solely to lexical transformation.

This design parity keeps components modular and highly predictable.


Analyzing the Configuration Source Code

from ai_systems_design.utils import FileOperationsUtility

class FileOperations:
    @staticmethod
    def read_yaml_file(yaml_file_path): 
        return FileOperationsUtility.read_decoded(yaml_file_path)

class YAMLBuilder:
    def __init__(self):
        self.yaml_file_path = None
        self.yaml_text = None

    def set_yaml_file_path(self, yaml_file_path):
        self.yaml_file_path = yaml_file_path
        return self

    def set_yaml_text(self, yaml_text):
        self.yaml_text = yaml_text
        return self
    
    def build(self):
        return YAML(self.yaml_file_path, self.yaml_text)

    @staticmethod
    def create_default():
        return YAMLBuilder().build()
    
    @staticmethod
    def create_from_file(yaml_file_path):
        return YAMLBuilder().set_yaml_file_path(yaml_file_path).build()
    
    @staticmethod
    def create_from_text(yaml_text):
        return YAMLBuilder().set_yaml_text(yaml_text).build()

class YAMLParser:
    @staticmethod
    def parse(yaml_content):
        mapping = {}
        for line in yaml_content.split("\n"):
            line = line.strip()
            # Skip blank lines and any line without the ': ' delimiter
            if ': ' not in line:
                continue
            # Split on the first ': ' only, so values may contain the delimiter
            key, value = line.split(': ', 1)
            mapping[key.strip()] = value.strip()
        return mapping

class YAML:
    def __init__(self, yaml_file_path=None, yaml_text=None):
        self.yaml_file_path = yaml_file_path 
        self.yaml_text = yaml_text
        self.mapping = {}

    def get_mapping_from_file(self):
        self.mapping = YAMLParser.parse(FileOperations.read_yaml_file(self.yaml_file_path))
        return self.mapping
    
    def get_mapping_from_text(self):
        self.mapping = YAMLParser.parse(self.yaml_text)
        return self.mapping
                
    def get(self, key):
        return self.mapping[key]
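
A usage sketch may help tie the three classes together. The listing above depends on FileOperationsUtility from the project package, so the block below re-declares trimmed-down, text-only versions of YAMLParser, YAML, and YAMLBuilder so that it runs on its own. Treat it as an illustration under those assumptions rather than the project's actual code.

```python
class YAMLParser:
    @staticmethod
    def parse(yaml_content):
        mapping = {}
        for line in yaml_content.split("\n"):
            line = line.strip()
            if not line:
                continue
            key, value = line.split(": ", 1)
            mapping[key] = value
        return mapping


class YAML:
    def __init__(self, yaml_text=None):
        self.yaml_text = yaml_text
        self.mapping = {}  # stays empty until a get_mapping_* method runs

    def get_mapping_from_text(self):
        self.mapping = YAMLParser.parse(self.yaml_text)
        return self.mapping

    def get(self, key):
        return self.mapping[key]


class YAMLBuilder:
    def __init__(self):
        self.yaml_text = None

    def set_yaml_text(self, yaml_text):
        self.yaml_text = yaml_text
        return self  # returning self is what enables method chaining

    def build(self):
        return YAML(self.yaml_text)


config = YAMLBuilder().set_yaml_text("title: My Blog\ntheme: minimal").build()
config.get_mapping_from_text()  # parse first; get() reads the cached mapping
print(config.get("title"))      # My Blog
print(config.get("theme"))      # minimal
```

Note that building the object does not parse anything. Calling get() before one of the get_mapping_* methods raises a KeyError, because the mapping is still empty at that point.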


Deconstructing the Lexical Splitting Loop

Because this implementation targets flat key-value metadata (that is, configurations without nested mappings or lists), the parser can get by with a simple line-by-line tokenizer.

for line in yaml_content.split("\n"):
    line = line.strip()
    if ': ' not in line:
        continue
    key, value = line.split(': ', 1)
    mapping[key.strip()] = value.strip()
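
Two inputs are worth checking against any splitter of this kind: a value that itself contains ': ', and a line with no delimiter at all. The sketch below compares indexing into an unbounded split(': ') with str.partition. The function names naive_pair and tolerant_pair are illustrative and not part of the project.

```python
def naive_pair(line):
    # Indexing the result of an unbounded split truncates values that
    # contain ': ' and raises IndexError when the delimiter is missing.
    parts = line.split(": ")
    return parts[0], parts[1]


def tolerant_pair(line):
    # partition splits on the first ': ' only and never raises.
    key, separator, value = line.partition(": ")
    if not separator:
        return None  # not a 'key: value' line (for example, a comment)
    return key.strip(), value.strip()


print(naive_pair("tagline: Build: Ship: Repeat"))     # ('tagline', 'Build')
print(tolerant_pair("tagline: Build: Ship: Repeat"))  # ('tagline', 'Build: Ship: Repeat')
print(tolerant_pair("# just a comment"))              # None
```

Whether a malformed line should be skipped silently or reported is a design choice. For a site generator, raising an error that names the offending line may be friendlier than dropping a setting without notice.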

When given a file containing standard site variable blocks:

title: My Engineering Blog
author: Sarah Dev
url: localhost:4000

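The parser splits each line on the ': ' delimiter (a colon followed by a space) and produces a flat dictionary with the keys title, author, and url, each mapped to a string value. The value localhost:4000 stays intact because its colon is not followed by a space.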
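
For the three sample lines, the resulting mapping can be checked with a short standalone sketch. The function parse_flat is a hypothetical stand-in for YAMLParser.parse, written here so the example runs without the project package.

```python
def parse_flat(text):
    """Hypothetical stand-in for YAMLParser.parse: flat 'key: value' lines only."""
    mapping = {}
    for line in text.split("\n"):
        line = line.strip()
        if not line:
            continue
        key, value = line.split(": ", 1)
        mapping[key] = value
    return mapping


sample = "title: My Engineering Blog\nauthor: Sarah Dev\nurl: localhost:4000\n"
config = parse_flat(sample)
print(config)
# {'title': 'My Engineering Blog', 'author': 'Sarah Dev', 'url': 'localhost:4000'}
```

Every value is a plain string. A parser this small does no type conversion, so a line such as port: 4000 would yield the string '4000' rather than an integer.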

This dictionary is then returned directly to our HTMLRenderer token replacer, seamlessly matching the variable keys inside our template layouts (AI Systems Design From Scratch).
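
The HTMLRenderer itself is not shown in this post, so the following is only a guess at the kind of token replacement it might perform. It assumes Liquid-style {{ key }} placeholders and a hypothetical render_template function; neither is confirmed by the source.

```python
import re


def render_template(template, mapping):
    """Replace each {{ key }} placeholder with mapping[key].

    Hypothetical helper: placeholders whose key is not in the mapping
    are left untouched so that missing settings stay visible.
    """
    def substitute(match):
        key = match.group(1)
        return mapping.get(key, match.group(0))

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


config = {"title": "My Engineering Blog", "author": "Sarah Dev"}
html = render_template("<h1>{{ title }}</h1><p>By {{ author }}</p>{{ missing }}", config)
print(html)
# <h1>My Engineering Blog</h1><p>By Sarah Dev</p>{{ missing }}
```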


Why This Works So Efficiently
