Skip to content

pgEdge Document Loader

The pgEdge Document Loader is a command-line tool written in Go that loads documents from various formats into a Postgres database. The tool automatically converts documents to Markdown format and extracts metadata before storing them in the database.

pgEdge Document Loader supports the following document formats:

  • HTML (.html, .htm) - Extracts the document title from <title> tags.
  • Markdown (.md) - Extracts the title from first # headings.
  • reStructuredText (.rst) - Extracts the title from underlined headings.
  • DocBook SGML/XML (.sqml, .xml ) - Extracts the title from <title> or <refentrytitle> tags (PostgreSQL-style reference pages use <refentrytitle>).

Key Features

The Document Loader supports:

  • automatic document format detection.
  • conversion to Markdown format.
  • metadata extraction (title, filename, timestamps).
  • flexible column mapping.
  • importing from single files, directories, and user-specified glob patterns.
  • documentation updates (upsert functionality) in update or insert mode.
  • transaction-based processing with automatic rollback on errors.
  • configuration file storage of execution details for reusable setups.
  • secure password handling (environment variable, .pgpass, or interactive).

License

This project is licensed under the PostgreSQL License.