From da2d59f73b4a9491285fd51f2a014cd1913d070b Mon Sep 17 00:00:00 2001
From: rzmk <30333942+rzmk@users.noreply.github.com>
Date: Mon, 13 Apr 2026 17:26:33 -0400
Subject: [PATCH 1/2] docs: update README
---
README.md | 245 +++++++++++++++++++++++++++++++++++-------------------
1 file changed, 158 insertions(+), 87 deletions(-)
diff --git a/README.md b/README.md
index f6e875a..650ead0 100644
--- a/README.md
+++ b/README.md
@@ -195,130 +195,201 @@ DRUF is completely optional and disabled by default. When disabled:
Datapusher+ from version 1.0.0 onwards will be installed as a extension of CKAN, and will be available as a CKAN plugin. This will allow for easier integration with CKAN and other CKAN extensions.
-1. Install the required packages.
+1. Install the required packages. These instructions assume an Ubuntu-based Linux distribution, such as Ubuntu 24.04.
- ```bash
- sudo apt install python3-virtualenv python3-dev python3-pip python3-wheel build-essential libxslt1-dev libxml2-dev zlib1g-dev git libffi-dev libpq-dev uchardet
- ```
+```bash
+sudo apt install python3-virtualenv python3-dev python3-pip python3-wheel build-essential libxslt1-dev libxml2-dev zlib1g-dev git libffi-dev libpq-dev uchardet -y
+```
2. Activate the CKAN virtual environment using at least python 3.10.
- ```bash
- . /usr/lib/ckan/default/bin/activate
- ```
+```bash
+. /usr/lib/ckan/default/bin/activate
+```
3. Install the extension using following commands:
- ```bash
- pip install -e "git+https://github.com/dathere/datapusher-plus.git@2.0.0#egg=datapusher-plus"
- ```
+```bash
+cd /usr/lib/ckan/default/src
+pip install -e "datapusher-plus@git+https://github.com/dathere/datapusher-plus.git@3.0.0"
+```
4. Install the dependencies.
- ```bash
- pip install -r requirements.txt
- ```
+```bash
+cd datapusher-plus
+pip install -r requirements.txt
+pip install -r requirements-dev.txt
+```
+
+5. Install [qsv](https://github.com/dathere/qsv). DP+ uses the `qsvdp` binary, so move it to `/usr/local/bin/qsvdp` to make it accessible through the `PATH` environment variable.
+
+
+qsv installation options (click here for more info)
-5. Install [qsv](https://github.com/dathere/qsv).
+### Option 1: Install prebuilt qsv binaries
- ## Option 1: Debian Package Installation (Easiest)
+[Download the appropriate prebuilt binaries](https://github.com/dathere/qsv/releases/latest) for your platform and copy
+it to the appropriate directory, e.g. for Linux:
+
+```bash
+wget https://github.com/dathere/qsv/releases/download/19.1.0/qsv-19.1.0-x86_64-unknown-linux-gnu.zip
+unzip qsv-19.1.0-x86_64-unknown-linux-gnu.zip
+rm qsv-19.1.0-x86_64-unknown-linux-gnu.zip
+sudo mv qsv* /usr/local/bin
+```
- [Download the appropriate precompiled binaries](https://github.com/dathere/qsv/releases/latest) for your platform and copy
- it to the appropriate directory, e.g. for Linux:
+If you get glibc errors when starting qsv, your Linux distro may not have the required version of the GNU C Library. If so, use the binaries ending with `unknown-linux-musl` instead, as they are statically linked with the MUSL C Library.
- ```bash
- wget https://github.com/dathere/qsv/releases/download/4.0.0/qsv-4.0.0-x86_64-unknown-linux-gnu.zip
- unzip qsv-4.0.0-x86_64-unknown-linux-gnu.zip
- rm qsv-4.0.0-x86_64-unknown-linux-gnu.zip
- sudo mv qsv* /usr/local/bin
- ```
+> ℹ️ **NOTE:** qsv's prebuilt binaries have the ability to self-update to the latest version. Just run qsv with the `--update` option and it will check for the latest version and update itself as required.
+> ```
+> sudo qsvdp --update
+> ```
- Alternatively, if you want to install qsv from source, follow
- the instructions [here](https://github.com/dathere/qsv#installation). Note that when compiling from source,
- you may want to look into the [Performance Tuning](https://github.com/dathere/qsv#performance-tuning)
- section to squeeze even more performance from qsv.
+### Option 2: Install qsv from source
- Also, if you get glibc errors when starting qsv, your Linux distro may not have the required version of the GNU C Library
- (This will be the case when running Ubuntu 18.04 or older).
- If so, use the `unknown-linux-musl.zip` archive as it is statically linked with the MUSL C Library.
+Alternatively, if you want to install qsv from source, follow
+the instructions [here](https://github.com/dathere/qsv#installation). Note that when compiling from source,
+you may want to look into the [Performance Tuning](https://github.com/dathere/qsv#performance-tuning)
+section to squeeze even more performance from qsv.
- If you already have qsv, update it to the latest release by using the --update option.
+Also, if you get glibc errors when starting qsv, your Linux distro may not have the required version of the GNU C Library
+(This will be the case when running Ubuntu 18.04 or older).
+If so, use the `unknown-linux-musl.zip` archive as it is statically linked with the MUSL C Library.
- `qsvdp --update`
+If you already have qsv, update it to the latest release by using the --update option.
- > ℹ️ **NOTE:** qsv is a general purpose CSV data-wrangling toolkit that gets regular updates. To update to the latest version, just run
- qsv with the `--update` option and it will check for the latest version and update as required.
+`qsvdp --update`
- ### Linux Installation
+> ℹ️ **NOTE:** qsv is a general purpose CSV data-wrangling toolkit that gets regular updates. To update to the latest version, just run
+qsv with the `--update` option and it will check for the latest version and update as required.
- If you are running Debian based distribution, you can install qsv using the following command:
- If you are running Debian based Linux distribution on x86_64, you can quickly install qsv using the following commands:
+### Option 3: Install qsv from the Debian package
- Add the qsv repository to your sources list:
+If you are running a Debian-based Linux distribution on x86_64, you can quickly install qsv using the following commands:
- ```bash
- echo "deb [signed-by=/etc/apt/trusted.gpg.d/qsv-deb.gpg] https://dathere.github.io/qsv-deb-releases ./" > qsv.list
- ```
+Add the qsv repository to your sources list:
- Import trusted GPG key:
+```bash
+echo "deb [signed-by=/etc/apt/trusted.gpg.d/qsv-deb.gpg] https://dathere.github.io/qsv-deb-releases ./" > qsv.list
+```
- ```bash
- wget -O - https://dathere.github.io/qsv-deb-releases/qsv-deb.gpg | sudo apt-key add -
- ```
+Import trusted GPG key:
- Install qsv:
+```bash
+wget -O - https://dathere.github.io/qsv-deb-releases/qsv-deb.gpg | sudo apt-key add -
+```
- ```bash
- sudo apt update
- sudo apt install qsv
- ```
+Install qsv:
- ## Option 2: Install Prebuilt qsv Binaries (Easy)
- [Download the appropriate precompiled binaries](https://github.com/dathere/qsv/releases/latest) for your platform and copy it to the appropriate directory, e.g. for Ubuntu LTS 22.04 or 24.04:
+```bash
+sudo apt update
+sudo apt install qsv
+```
- ```bash
- wget https://github.com/dathere/qsv/releases/download/4.0.0/qsv-4.0.0-x86_64-unknown-linux-gnu.zip
- unzip qsv-4.0.0-x86_64-unknown-linux-gnu.zip
- rm qsv-4.0.0-x86_64-unknown-linux-gnu.zip
- sudo mv qsv* /usr/local/bin
- ```
+## Option 3: Build qsv from source
- If you get glibc errors when starting qsv, your Linux distro may not have the required version of the GNU C Library. If so, use the `qsv-4.0.0-unknown-linux-musl.zip` archive as it is statically linked with the MUSL C Library.
+Finally, you can build `qsvdp` from source. It has the additional benefit that the resulting binary will take advantage of all the machine's CPU features, making qsv and DP+ even faster, but may take up to 30 minutes to compile.
+```bash
+git clone https://github.com/dathere/qsv.git
+cd qsv
- > ℹ️ **NOTE:** qsv's prebuilt binaries have the ability to self-update to the latest version. Just run qsv with the `--update` option and it will check for the latest version and update itself as required.
- > ```
- > sudo qsvdp --update
- > ```
+# install Rust, if it's not installed
+curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
- ## Option 3: Build qsv from source
- Finally, you can build qsvdp from source. It has the additional benefit that the resulting binary will take advantage of all the machine's CPU features, making qsv and DP+ even faster, but may take up to 30 minutes to compile.
-
- ```bash
- git clone https://github.com/dathere/qsv.git
- cd qsv
+# build qsvdp
+CARGO_BUILD_RUSTFLAGS='-C target-cpu=native' cargo build --release --locked --bin qsvdp -F datapusher_plus
+sudo cp target/release/qsvdp /usr/local/bin
+cargo clean
+```
- # install Rust, if it's not installed
- curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
+
- # build qsvdp
- CARGO_BUILD_RUSTFLAGS='-C target-cpu=native' cargo build --release --locked --bin qsvdp -F datapusher_plus
- sudo cp target/release/qsvdp /usr/local/bin
- cargo clean
- ```
+6. **Make sure CKAN is running** (e.g. by running `ckan -c /etc/ckan/default/ckan.ini run` after activating your virtual environment). While CKAN is running, create an API token for the DP+ Service account using the following command, **which also adds the relevant `ckanext.datapusher_plus.api_token` line to your CKAN config file `/etc/ckan/default/ckan.ini`**. **Replace `CKAN_ADMIN` in the command with an existing CKAN user that has sysadmin privileges**.
-6. Create an API token for the DP+ Service account.
- Replace `CKAN_ADMIN` with an existing CKAN user with sysadmin privileges.
+```
+ckan config-tool /etc/ckan/default/ckan.ini "ckanext.datapusher_plus.api_token=$(ckan -c /etc/ckan/default/ckan.ini user token add CKAN_ADMIN dpplus | tail -n 1 | tr -d '\t')"
+```
- ```
- ckan config-tool /etc/ckan/default/ckan.ini "ckanext.datapusher_plus.api_token=$(ckan -c /etc/ckan/default/ckan.ini user token add CKAN_ADMIN dpplus | tail -n 1 | tr -d '\t')"
- ```
+7. Add the rest of the DP+ config to your CKAN config (e.g. `/etc/ckan/default/ckan.ini`):
-7. DataPusher+ Database Setup
+```ini
+# datapusher-plus settings
+ckanext.datapusher_plus.use_proxy = false
+ckanext.datapusher_plus.download_proxy =
+ckanext.datapusher_plus.ssl_verify = false
+# supports INFO, DEBUG, TRACE - use DEBUG or TRACE when debugging scheming Formulas
+ckanext.datapusher_plus.upload_log_level = INFO
+ckanext.datapusher_plus.formats = csv tsv tab ssv xls xlsx xlsxb xlsm ods geojson shp qgis zip
+ckanext.datapusher_plus.pii_screening = false
+ckanext.datapusher_plus.pii_found_abort = false
+ckanext.datapusher_plus.pii_regex_resource_id_or_alias =
+ckanext.datapusher_plus.pii_show_candidates = false
+ckanext.datapusher_plus.pii_quick_screen = false
+ckanext.datapusher_plus.qsv_bin = /usr/local/bin/qsvdp
+ckanext.datapusher_plus.preview_rows = 100
+ckanext.datapusher_plus.download_timeout = 300
+ckanext.datapusher_plus.max_content_length = 1256000000000
+ckanext.datapusher_plus.chunk_size = 16384
+ckanext.datapusher_plus.default_excel_sheet = 0
+ckanext.datapusher_plus.sort_and_dupe_check = true
+ckanext.datapusher_plus.dedup = false
+ckanext.datapusher_plus.unsafe_prefix = unsafe_
+ckanext.datapusher_plus.reserved_colnames = _id
+ckanext.datapusher_plus.prefer_dmy = false
+ckanext.datapusher_plus.ignore_file_hash = true
+ckanext.datapusher_plus.auto_index_threshold = 3
+ckanext.datapusher_plus.auto_index_dates = true
+ckanext.datapusher_plus.auto_unique_index = true
+ckanext.datapusher_plus.summary_stats_options =
+ckanext.datapusher_plus.add_summary_stats_resource = false
+ckanext.datapusher_plus.summary_stats_with_preview = false
+ckanext.datapusher_plus.qsv_stats_string_max_length = 32767
+ckanext.datapusher_plus.qsv_dates_whitelist = date,time,due,open,close,created
+ckanext.datapusher_plus.qsv_freq_limit = 10
+ckanext.datapusher_plus.auto_alias = true
+ckanext.datapusher_plus.auto_alias_unique = false
+ckanext.datapusher_plus.copy_readbuffer_size = 1048576
+ckanext.datapusher_plus.type_mapping = {"String": "text", "Integer": "numeric","Float": "numeric","DateTime": "timestamp","Date": "date","NULL": "text"}
+ckanext.datapusher_plus.auto_spatial_simplication = true
+ckanext.datapusher_plus.spatial_simplication_relative_tolerance = 0.1
+ckanext.datapusher_plus.latitude_fields = latitude,lat
+ckanext.datapusher_plus.longitude_fields = longitude,long,lon
+ckanext.datapusher_plus.jinja2_bytecode_cache_dir = /tmp/jinja2_butecode_cache
+ckanext.datapusher_plus.auto_unzip_one_file = true
+```
- ```
- ckan -c /etc/ckan/default/ckan.ini db upgrade -p datapusher_plus
- ```
+See the configuration section below for more information.
+
+8. **Optionally** add DRUF mode to your CKAN config:
+
+```ini
+# Enable DRUF (Dataset Resource Upload First) workflow for the DataPusher+ CKAN extension
+ckanext.datapusher_plus.enable_druf = true
+ckanext.datapusher_plus.enable_form_redirect = true
+```
+
+9. Set up the database for `datapusher_plus`:
+
+```bash
+ckan -c /etc/ckan/default/ckan.ini db upgrade -p datapusher_plus
+```
+
+10. If you get a `ckan.logic.ValidationError` reporting `Missing value` for multiple fields, you can temporarily add `validators: ignore_missing` to those fields in the YAML schema file used by [ckanext-scheming](https://github.com/ckan/ckanext-scheming). You may also need to set `required: False`.
+11. Make sure you enable the [FileStore](https://docs.ckan.org/en/2.11/maintaining/filestore.html) to allow file uploads (the `ckan.uploads_enabled` setting is already present in your CKAN config and should be set to `true`). You'll also need to update the FileStore storage permissions as per the docs. For example, replace the Linux username `rzmk` with your own username in the following commands:
+
+```bash
+sudo chown rzmk /var/lib/ckan/default
+sudo chmod -R u+rwx /var/lib/ckan/default
+```
+
+12. Make sure you enable the [Datastore](https://docs.ckan.org/en/2.11/maintaining/datastore.html) plugin.
+13. In a separate terminal, start the job queue:
+
+```bash
+ckan -c /etc/ckan/default/ckan.ini jobs worker
+```
## Configuring
@@ -405,13 +476,13 @@ You can also manually trigger resources to be resubmitted. When editing a resour
Run the following command to submit all resources to datapusher, although it will skip files whose hash of the data file has not changed:
``` bash
- ckan -c /etc/ckan/default/ckan.ini datapusher_plus resubmit
+ckan -c /etc/ckan/default/ckan.ini datapusher_plus resubmit
```
To Resubmit a specific resource, whether or not the hash of the data file has changed:
``` bash
- ckan -c /etc/ckan/default/ckan.ini datapusher_plus submit {dataset_id}
+ckan -c /etc/ckan/default/ckan.ini datapusher_plus submit {dataset_id}
```
## License
From 0e46e10fa321e83b8969e22026798ddfc4bf30d2 Mon Sep 17 00:00:00 2001
From: rzmk <30333942+rzmk@users.noreply.github.com>
Date: Wed, 15 Apr 2026 13:50:34 -0400
Subject: [PATCH 2/2] docs: update README with Debian installation first
---
README.md | 52 ++++++++++++++++++++++++----------------------------
1 file changed, 24 insertions(+), 28 deletions(-)
diff --git a/README.md b/README.md
index 650ead0..c249c10 100644
--- a/README.md
+++ b/README.md
@@ -227,10 +227,25 @@ pip install -r requirements-dev.txt
qsv installation options (click here for more info)
-### Option 1: Install prebuilt qsv binaries
+
+### Option 1: Install qsv from the Debian package
+
+If you are running a Debian-based Linux distribution on x86_64, you can quickly install qsv binaries including `qsvdp` using the following commands:
+
+```bash
+# Add the qsv repository to your sources list:
+echo "deb [signed-by=/etc/apt/trusted.gpg.d/qsv-deb.gpg] https://dathere.github.io/qsv-deb-releases ./" | sudo tee /etc/apt/sources.list.d/qsv.list
+# Import trusted GPG key:
+wget -O - https://dathere.github.io/qsv-deb-releases/qsv-deb.gpg | sudo apt-key add -
+# Install qsv:
+sudo apt update -y
+sudo apt install qsv -y
+```
+
+### Option 2: Install prebuilt qsv binaries
[Download the appropriate prebuilt binaries](https://github.com/dathere/qsv/releases/latest) for your platform and copy
-it to the appropriate directory, e.g. for Linux:
+them to the appropriate directory. For example, you can use the following commands for qsv v19.1.0 on x86_64 Linux (replace the version `19.1.0` with the latest version available on the [releases page](https://github.com/dathere/qsv/releases)):
```bash
wget https://github.com/dathere/qsv/releases/download/19.1.0/qsv-19.1.0-x86_64-unknown-linux-gnu.zip
@@ -246,7 +261,7 @@ If you get glibc errors when starting qsv, your Linux distro may not have the re
> sudo qsvdp --update
> ```
-### Option 2: Install qsv from source
+### Option 3: Install qsv from source
Alternatively, if you want to install qsv from source, follow
the instructions [here](https://github.com/dathere/qsv#installation). Note that when compiling from source,
@@ -264,31 +279,6 @@ If you already have qsv, update it to the latest release by using the --update o
> ℹ️ **NOTE:** qsv is a general purpose CSV data-wrangling toolkit that gets regular updates. To update to the latest version, just run
qsv with the `--update` option and it will check for the latest version and update as required.
-### Option 3: Install qsv from the Debian package
-
-If you are running a Debian-based Linux distribution on x86_64, you can quickly install qsv using the following commands:
-
-Add the qsv repository to your sources list:
-
-```bash
-echo "deb [signed-by=/etc/apt/trusted.gpg.d/qsv-deb.gpg] https://dathere.github.io/qsv-deb-releases ./" > qsv.list
-```
-
-Import trusted GPG key:
-
-```bash
-wget -O - https://dathere.github.io/qsv-deb-releases/qsv-deb.gpg | sudo apt-key add -
-```
-
-Install qsv:
-
-```bash
-sudo apt update
-sudo apt install qsv
-```
-
-## Option 3: Build qsv from source
-
Finally, you can build `qsvdp` from source. It has the additional benefit that the resulting binary will take advantage of all the machine's CPU features, making qsv and DP+ even faster, but may take up to 30 minutes to compile.
```bash
@@ -360,6 +350,12 @@ ckanext.datapusher_plus.jinja2_bytecode_cache_dir = /tmp/jinja2_butecode_cache
ckanext.datapusher_plus.auto_unzip_one_file = true
```
+Also add the following entry to your CKAN's `resource_formats.json` file so that `ckanext.datapusher_plus.formats` works as expected with `tab` files:
+
+```
+["TAB", "Tab Separated Values File", "text/tab-separated-values", []],
+```
+
See the configuration section below for more information.
8. **Optionally** add DRUF mode to your CKAN config: