Configuration
You can configure the pgEdge Document Loader with a YAML configuration file, with command-line flags, or with a combination of both.
Note
Command-line flags always take precedence over configuration file settings.
Column Data Types
The tool expects the following Postgres data type for each column:
| Column | Type | Notes |
|---|---|---|
| doc_title | TEXT or VARCHAR | — |
| doc_content | TEXT or VARCHAR | — |
| source_content | BYTEA | Stores original source (binary) |
| file_name | TEXT or VARCHAR | Recommend UNIQUE constraint for update mode |
| file_created | TIMESTAMP or TIMESTAMPTZ | — |
| file_modified | TIMESTAMP or TIMESTAMPTZ | — |
| row_created | TIMESTAMP or TIMESTAMPTZ | Recommend DEFAULT CURRENT_TIMESTAMP |
| row_updated | TIMESTAMP or TIMESTAMPTZ | Recommend DEFAULT CURRENT_TIMESTAMP |
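As a rough sketch of a table that satisfies these types, the following snippet writes an illustrative schema to a file. The file name schema.sql, the table name documents, the id column, and every column name are assumptions chosen to match the sample configuration in this section; the loader does not require these names.

```shell
# Write an illustrative schema to schema.sql.
# File, table, and column names are assumptions for this example.
cat > schema.sql <<'SQL'
CREATE TABLE documents (
    id BIGSERIAL PRIMARY KEY,                         -- assumed surrogate key, not required by the loader
    title TEXT,                                       -- doc_title
    content TEXT,                                     -- doc_content
    source BYTEA,                                     -- source_content (original binary)
    filename TEXT UNIQUE,                             -- file_name; UNIQUE is recommended for update mode
    created TIMESTAMPTZ,                              -- file_created
    modified TIMESTAMPTZ,                             -- file_modified
    created_at TIMESTAMPTZ DEFAULT CURRENT_TIMESTAMP, -- row_created
    updated_at TIMESTAMPTZ DEFAULT CURRENT_TIMESTAMP  -- row_updated
);
SQL

# Apply the schema with psql (connection values are placeholders):
# psql -h localhost -U myuser -d mydb -f schema.sql
```

Only the columns you map with a col-* option need to exist, so a real table may be smaller than this one.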
Specifying Options in a Configuration File
To save your deployment preferences in a file, create a YAML-formatted configuration file (for example, config.yml):
# Source documents
source: "./docs"
strip-path: false
# Database connection
db-host: localhost
db-port: 5432
db-name: mydb
db-user: myuser
db-sslmode: prefer
db-table: documents
# SSL/TLS certificates (optional)
db-sslcert: /path/to/client-cert.pem
db-sslkey: /path/to/client-key.pem
db-sslrootcert: /path/to/ca-cert.pem
# Column mappings
col-doc-title: title
col-doc-content: content
col-source-content: source
col-file-name: filename
col-file-created: created
col-file-modified: modified
col-row-created: created_at
col-row-updated: updated_at
# Operation mode
update: true
Then, when you invoke pgedge-docloader, include the --config flag and the configuration file name:
pgedge-docloader --config config.yml
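Because command-line flags take precedence over the configuration file, you can combine the two to override a single setting for one run. The sketch below assumes that config.yml sets db-host to localhost and uses db.example.com as a placeholder hostname; the PATH check is only there so the snippet exits cleanly on a machine without the tool installed.

```shell
# Assumption: config.yml sets db-host: localhost; db.example.com is a placeholder.
override_host="db.example.com"

if command -v pgedge-docloader >/dev/null 2>&1; then
    # The flag value replaces db-host from the file;
    # every other setting still comes from config.yml.
    pgedge-docloader --config config.yml --db-host "$override_host"
else
    echo "pgedge-docloader is not on PATH; skipping the run" >&2
fi
```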
Specifying Options on the Command Line
All configuration options have corresponding command-line flags. Use --help to see all available flags:
pgedge-docloader --help
The following command specifies options on the command line. Each --col-* flag is followed by the name of the table column in which that content will be stored; the remaining flags take the source path and database connection values:
pgedge-docloader \
--source ./docs \
--db-host localhost \
--db-name mydb \
--db-user myuser \
--db-table documents \
--col-doc-title title \
--col-doc-content content \
--col-source-content original \
--col-file-name filename \
--col-file-modified modified_at \
--col-row-created created_at \
--col-row-updated updated_at
Reference - Configuration Options
You can include the following options on the command line or in a configuration file when invoking pgedge-docloader. Command-line flags override configuration file values.
Use the following options to specify details about the source document:
| Option | Required | Description | Default |
|---|---|---|---|
| source | Yes | Path to file, directory, or glob pattern | — |
| strip-path | No | Remove directory path from filenames | false |
Use the following options to specify details about the database connection:
| Option | Required | Description | Default |
|---|---|---|---|
| db-host | No | Database hostname | localhost |
| db-port | No | Database port | 5432 |
| db-name | Yes | Database name | — |
| db-user | Yes | Database username | — |
| db-sslmode | No | SSL mode (disable, allow, prefer, require, verify-ca, verify-full) | prefer |
| db-table | Yes | Target table name | — |
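The options marked Required in the two tables above are source, db-name, db-user, and db-table. As a hedged pre-flight sketch, the hypothetical helper below (missing_required is not part of pgedge-docloader) lists any of those keys that a configuration file does not define at the top level. It uses a simple text match rather than a YAML parser, and it cannot see values that you plan to supply as command-line flags.

```shell
# Print the name of each required option that the given YAML file
# does not define as a top-level key (one name per line).
# missing_required is a hypothetical helper, not part of the tool.
missing_required() {
    config_file="$1"
    for key in source db-name db-user db-table; do
        # Match "key:" at the start of a line, allowing spaces before the colon.
        grep -Eq "^${key}[[:space:]]*:" "$config_file" || printf '%s\n' "$key"
    done
}
```

For example, running missing_required config.yml against a file that lacks db-user prints db-user; an empty result means all four keys are present in the file.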
Use the following options to specify details about the SSL/TLS configuration:
| Option | Required | Description | Default |
|---|---|---|---|
| db-sslcert | No | Path to client SSL certificate | — |
| db-sslkey | No | Path to client SSL key | — |
| db-sslrootcert | No | Path to SSL root certificate | — |
Use the following options to specify details about column mappings:
| Option | Required | Description | Default |
|---|---|---|---|
| col-doc-title | No | Column for document title (TEXT) | — |
| col-doc-content | No | Column for converted Markdown content (TEXT) | — |
| col-source-content | No | Column for original source (BYTEA) | — |
| col-file-name | No | Column for filename (TEXT) | — |
| col-file-created | No | Column for file creation timestamp (TIMESTAMP) | — |
| col-file-modified | No | Column for file modification timestamp (TIMESTAMP) | — |
| col-row-created | No | Column for row creation timestamp (TIMESTAMP) | — |
| col-row-updated | No | Column for row update timestamp (TIMESTAMP) | — |
To review the list of options from the command line, use the command:
pgedge-docloader help
Examples
The following options specify the minimal configuration required by Document Loader:
source: "./docs/*.md"
db-host: localhost
db-name: mydb
db-user: myuser
db-table: documents
col-doc-content: content
col-file-name: filename
The following options specify a complete configuration:
source: "./documentation"
strip-path: true
db-host: db.example.com
db-port: 5432
db-name: production_db
db-user: doc_loader
db-sslmode: verify-full
db-sslcert: ./certs/client.pem
db-sslkey: ./certs/client-key.pem
db-sslrootcert: ./certs/ca.pem
db-table: knowledge_base
col-doc-title: title
col-doc-content: content_markdown
col-source-content: content_original
col-file-name: source_file
col-file-modified: file_modified_at
col-row-created: created_at
col-row-updated: updated_at
custom-columns:
  product: "pgAdmin 4"
  version: "v9.9"
update: true