Configuration

You can configure the pgEdge Document Loader with a YAML configuration file, with command-line flags, or with a combination of both.

Note

Command-line flags always take precedence over configuration file settings.

Column Data Types

The tool expects the following Postgres data types for each column type:

| Column | Type | Notes |
| --- | --- | --- |
| doc_title | TEXT or VARCHAR | |
| doc_content | TEXT or VARCHAR | |
| source_content | BYTEA | Stores original source (binary) |
| file_name | TEXT or VARCHAR | Recommend UNIQUE constraint for update mode |
| file_created | TIMESTAMP or TIMESTAMPTZ | |
| file_modified | TIMESTAMP or TIMESTAMPTZ | |
| row_created | TIMESTAMP or TIMESTAMPTZ | Recommend DEFAULT CURRENT_TIMESTAMP |
| row_updated | TIMESTAMP or TIMESTAMPTZ | Recommend DEFAULT CURRENT_TIMESTAMP |
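
As an illustration, a table that satisfies these expectations could be created as sketched below. The table name documents, the column names, and the file name create_documents.sql are assumptions chosen for this example; only the data types and the recommended constraints come from the table above.

```shell
# Write an example schema to a file. Table and column names are
# illustrative assumptions; the data types follow the table above.
cat > create_documents.sql <<'SQL'
CREATE TABLE documents (
    title      TEXT,                                   -- doc_title
    content    TEXT,                                   -- doc_content
    source     BYTEA,                                  -- source_content (original binary)
    filename   TEXT UNIQUE,                            -- file_name; UNIQUE recommended for update mode
    created    TIMESTAMPTZ,                            -- file_created
    modified   TIMESTAMPTZ,                            -- file_modified
    created_at TIMESTAMPTZ DEFAULT CURRENT_TIMESTAMP,  -- row_created
    updated_at TIMESTAMPTZ DEFAULT CURRENT_TIMESTAMP   -- row_updated
);
SQL

# Apply the file with psql (connection details are assumptions for this sketch):
# psql -h localhost -d mydb -U myuser -f create_documents.sql
```

You can use different column names, provided that each col-* option points at the matching column.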

Specifying Options in a Configuration File

To save your deployment preferences in a file, create a YAML-formatted configuration file (for example, config.yml):

# Source documents
source: "./docs"
strip-path: false

# Database connection
db-host: localhost
db-port: 5432
db-name: mydb
db-user: myuser
db-sslmode: prefer
db-table: documents

# SSL/TLS certificates (optional)
db-sslcert: /path/to/client-cert.pem
db-sslkey: /path/to/client-key.pem
db-sslrootcert: /path/to/ca-cert.pem

# Column mappings
col-doc-title: title
col-doc-content: content
col-source-content: source
col-file-name: filename
col-file-created: created
col-file-modified: modified
col-row-created: created_at
col-row-updated: updated_at

# Operation mode
update: true

Then, when you invoke pgedge-docloader, include the --config flag and the configuration file name:

pgedge-docloader --config config.yml
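
Because command-line flags take precedence over configuration file settings, a saved configuration can be reused with one-off overrides. The following sketch assumes the config.yml file shown above and a hypothetical table named documents_staging; it invokes the loader only if pgedge-docloader is found on the PATH.

```shell
# Override the db-table value from config.yml for a single run.
# "documents_staging" is a hypothetical table name used for illustration.
override_table="documents_staging"

if command -v pgedge-docloader >/dev/null 2>&1; then
  # A flag given here wins over the same key in config.yml.
  pgedge-docloader --config config.yml --db-table "$override_table"
else
  echo "pgedge-docloader not found on PATH; skipping run" >&2
fi
```

All other settings, such as the connection details and column mappings, are still read from config.yml.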

Specifying Options on the Command Line

All configuration options have corresponding command-line flags. Use --help to see all available flags:

pgedge-docloader --help

The following command specifies options on the command line. Each option is followed by its value; for the --col-* options, the value is the name of the column in which that content will be stored:

pgedge-docloader \
  --source ./docs \
  --db-host localhost \
  --db-name mydb \
  --db-user myuser \
  --db-table documents \
  --col-doc-title title \
  --col-doc-content content \
  --col-source-content original \
  --col-file-name filename \
  --col-file-modified modified_at \
  --col-row-created created_at \
  --col-row-updated updated_at

Reference - Configuration Options

You can include the following options on the command line or in a configuration file when invoking pgedge-docloader. On the command line, prefix each option name with two hyphens (for example, --db-host). Command-line flags override configuration file values.

Use the following options to specify details about the source documents:

| Option | Required | Description | Default |
| --- | --- | --- | --- |
| source | Yes | Path to file, directory, or glob pattern | |
| strip-path | No | Remove directory path from filenames | false |

Use the following options to specify details about the database connection:

| Option | Required | Description | Default |
| --- | --- | --- | --- |
| db-host | No | Database hostname | localhost |
| db-port | No | Database port | 5432 |
| db-name | Yes | Database name | |
| db-user | Yes | Database username | |
| db-sslmode | No | SSL mode (disable, allow, prefer, require, verify-ca, verify-full) | prefer |
| db-table | Yes | Target table name | |

Use the following options to specify details about the SSL/TLS configuration:

| Option | Required | Description | Default |
| --- | --- | --- | --- |
| db-sslcert | No | Path to client SSL certificate | |
| db-sslkey | No | Path to client SSL key | |
| db-sslrootcert | No | Path to SSL root certificate | |

Use the following options to specify details about column mappings:

| Option | Required | Description | Default |
| --- | --- | --- | --- |
| col-doc-title | No | Column for document title (TEXT) | |
| col-doc-content | No | Column for converted Markdown content (TEXT) | |
| col-source-content | No | Column for original source (BYTEA) | |
| col-file-name | No | Column for filename (TEXT) | |
| col-file-created | No | Column for file creation timestamp (TIMESTAMP) | |
| col-file-modified | No | Column for file modification timestamp (TIMESTAMP) | |
| col-row-created | No | Column for row creation timestamp (TIMESTAMP) | |
| col-row-updated | No | Column for row update timestamp (TIMESTAMP) | |

To display the list of options from the command line, use the command:

pgedge-docloader help

Examples

The following options specify the minimal configuration required by Document Loader:

source: "./docs/*.md"
db-host: localhost
db-name: mydb
db-user: myuser
db-table: documents
col-doc-content: content
col-file-name: filename
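
Since every configuration option has a corresponding command-line flag, the same minimal setup can also be sketched as a single command. The glob is quoted so that the shell passes the pattern through unchanged; this assumes the loader expands the pattern itself, as the description of the source option suggests.

```shell
# Build the flag list equivalent to the minimal YAML configuration.
# The quotes keep the shell from expanding ./docs/*.md before the loader sees it.
set -- \
  --source './docs/*.md' \
  --db-host localhost \
  --db-name mydb \
  --db-user myuser \
  --db-table documents \
  --col-doc-content content \
  --col-file-name filename

# Show the command, then run it only if the loader is installed.
echo "pgedge-docloader $*"
if command -v pgedge-docloader >/dev/null 2>&1; then
  pgedge-docloader "$@"
fi
```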

The following options specify a complete configuration:

source: "./documentation"
strip-path: true

db-host: db.example.com
db-port: 5432
db-name: production_db
db-user: doc_loader
db-sslmode: verify-full
db-sslcert: ./certs/client.pem
db-sslkey: ./certs/client-key.pem
db-sslrootcert: ./certs/ca.pem
db-table: knowledge_base

col-doc-title: title
col-doc-content: content_markdown
col-source-content: content_original
col-file-name: source_file
col-file-modified: file_modified_at
col-row-created: created_at
col-row-updated: updated_at

custom-columns:
  product: "pgAdmin 4"
  version: "v9.9"

update: true