⚠️ Important Note: This workflow is not a scientifically meaningful pipeline. It is a test demonstration created to showcase the capabilities of the SciWIn Client (s4n).
It uses a sequence of steps (e.g., merging soil and weather data, training a model, and predicting yields) to illustrate how s4n can be used to:
- Create CommandLineTools from Python scripts
- Connect tools into a workflow
- Visualize the pipeline
- Execute workflows locally and remotely
It is not intended for real-world agricultural analysis or decision-making. The data, scripts, and logic are simplified for demonstration purposes only.
Install the latest version of s4n:
curl --proto '=https' --tlsv1.2 -LsSf https://fairagro.github.io/m4.4_sciwin_client/get_s4n.sh | sh
Verify the installation:
s4n -V
To create tools based on the Python scripts in the code/ directory, first create a virtual environment and install the dependencies:
python3 -m venv .venv
source .venv/bin/activate
pip install pandas==2.3.2 geopandas==1.1.1 shapely==2.1.1 scikit-learn==1.7.2 joblib==1.5.2 matplotlib==3.10.6 requests==2.32.5
Initialize the s4n project:
s4n init
For each script in the code/ directory, we create a CWL CommandLineTool using s4n create. These tools wrap the Python scripts and define how they are executed with inputs, outputs, and dependencies.
Note: Most of the soil data has already been downloaded, because downloading it takes time.
Note: There are two options for creating the tools. Option 1 uses the Dockerfile in this repository; in that case, Docker must be running for both local and remote execution. Option 2 uses an image from Docker Hub; in that case, Docker does not need to be running for remote execution.
The first step is to fetch soil data from SoilGrids for the coordinates of the Iowa counties.
Option 1 (local Dockerfile):
s4n create -c Dockerfile --container-tag pyplot --enable-network \
  python code/get_soil.py --geojson data/iowa_counties.geojson --soil_cache data/soil_data.csv
This creates a new directory workflows/get_soil containing the CWL CommandLineTool file get_soil.cwl:
#!/usr/bin/env cwl-runner
cwlVersion: v1.2
class: CommandLineTool
requirements:
- class: InitialWorkDirRequirement
  listing:
  - entryname: code/get_soil.py
    entry:
      $include: ../../code/get_soil.py
- class: DockerRequirement
  dockerFile:
    $include: ../../Dockerfile
  dockerImageId: pyplot
- class: NetworkAccess
  networkAccess: true
inputs:
- id: geojson
  type: File
  default:
    class: File
    location: ../../data/iowa_counties.geojson
  inputBinding:
    prefix: --geojson
- id: soil_cache
  type: File
  default:
    class: File
    location: ../../data/soil_data.csv
  inputBinding:
    prefix: --soil_cache
outputs:
- id: soil
  type: File
  outputBinding:
    glob: soil.csv
baseCommand:
- python
- code/get_soil.py
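To make the link between the CWL file and the wrapped script concrete, the following Python sketch rebuilds the command line from baseCommand and the inputBinding prefixes. It is a simplified illustration only: the names BASE_COMMAND, INPUT_BINDINGS, and build_command are hypothetical, and an actual CWL runner additionally sorts bindings by position, stages the input files, and runs the command inside the container.

```python
# Simplified, hypothetical model of how a CWL runner turns the tool
# description into a command line. This is NOT how s4n or a CWL runner is
# implemented; file staging, binding positions and containers are ignored.

BASE_COMMAND = ["python", "code/get_soil.py"]

# (input id, prefix) pairs copied from the inputs section of get_soil.cwl
INPUT_BINDINGS = [
    ("geojson", "--geojson"),
    ("soil_cache", "--soil_cache"),
]


def build_command(values):
    """Return the argv list for the given input values (input id -> path)."""
    argv = list(BASE_COMMAND)
    for input_id, prefix in INPUT_BINDINGS:
        argv += [prefix, values[input_id]]
    return argv


# Default locations from the CWL file, written relative to the project root
defaults = {
    "geojson": "data/iowa_counties.geojson",
    "soil_cache": "data/soil_data.csv",
}
print(" ".join(build_command(defaults)))
```

With the default inputs, the printed command is the same python invocation that was passed to s4n create.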
Option 2 (image from Docker Hub):
s4n create -c user12398/corn_demo:v1.0.0 --enable-network \
  python code/get_soil.py --geojson data/iowa_counties.geojson --soil_cache data/soil_data.csv
This creates the file get_soil.cwl:
#!/usr/bin/env cwl-runner
cwlVersion: v1.2
class: CommandLineTool
requirements:
- class: InitialWorkDirRequirement
  listing:
  - entryname: code/get_soil.py
    entry:
      $include: ../../code/get_soil.py
- class: DockerRequirement
  dockerPull: user12398/corn_demo:v1.0.0
- class: NetworkAccess
  networkAccess: true
inputs:
- id: geojson
  type: File
  default:
    class: File
    location: ../../data/iowa_counties.geojson
  inputBinding:
    prefix: --geojson
- id: soil_cache
  type: File
  default:
    class: File
    location: ../../data/soil_data.csv
  inputBinding:
    prefix: --soil_cache
outputs:
- id: soil
  type: File
  outputBinding:
    glob: soil.csv
baseCommand:
- python
- code/get_soil.py
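In both variants, the outputs section tells the runner how to collect results: once the command has finished, files in the job's working directory that match the glob pattern (here soil.csv) become the tool's output. The Python sketch below imitates that collection step in a temporary directory. The helper collect_outputs and the extra file debug.log are hypothetical and only serve the illustration; a real runner does considerably more.

```python
import glob
import os
import tempfile


def collect_outputs(workdir, pattern):
    """Return the sorted paths in workdir that match the glob pattern."""
    return sorted(glob.glob(os.path.join(workdir, pattern)))


with tempfile.TemporaryDirectory() as workdir:
    # Stand-ins for files a script might leave behind (contents are dummies)
    for name in ("soil.csv", "debug.log"):
        with open(os.path.join(workdir, name), "w") as handle:
            handle.write("placeholder\n")

    # Only files matching the declared pattern are picked up as outputs
    found = collect_outputs(workdir, "soil.csv")
    found_names = [os.path.basename(path) for path in found]
    print(found_names)
```

A file that the script writes but that no output declares, such as the hypothetical debug.log here, is not collected.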
Next, we fetch weather data for each county for the year used in the prediction.
Option 1 (local Dockerfile):
s4n create -c Dockerfile --container-tag pyplot --enable-network \
  python code/get_weather.py --geojson data/iowa_counties.geojson
Option 2 (image from Docker Hub):
s4n create -c user12398/corn_demo:v1.0.0 --enable-network \
  python code/get_weather.py --geojson data/iowa_counties.geojson
Now we combine soil and weather data into a single feature set.
Option 1 (local Dockerfile):
s4n create -c Dockerfile --container-tag pyplot --enable-network \
  python code/merge_features.py --geojson data/iowa_counties.geojson \
  --weather weather.csv \
  --soil soil.csv
Option 2 (image from Docker Hub):
s4n create -c user12398/corn_demo:v1.0.0 --enable-network \
  python code/merge_features.py --geojson data/iowa_counties.geojson \
  --weather weather.csv \
  --soil soil.csv
We train a simple model using historical yield data.
Option 1 (local Dockerfile):
s4n create -c Dockerfile --container-tag pyplot --enable-network \
  python code/train_model.py --features county_features.csv \
  --yield data/iowa_yield.csv
Option 2 (image from Docker Hub):
s4n create -c user12398/corn_demo:v1.0.0 --enable-network \
  python code/train_model.py --features county_features.csv \
  --yield data/iowa_yield.csv
Now we use the trained model to predict yields for each county.
Option 1 (local Dockerfile):
s4n create -c Dockerfile --container-tag pyplot --enable-network \
  python code/predict_yields.py --features county_features.csv \
  --model model.pkl \
  --scaler scaler.pkl
Option 2 (image from Docker Hub):
s4n create -c user12398/corn_demo:v1.0.0 --enable-network \
  python code/predict_yields.py --features county_features.csv \
  --model model.pkl \
  --scaler scaler.pkl
Finally, we visualize the predictions on a map.
Option 1 (local Dockerfile):
s4n create -c Dockerfile --container-tag pyplot --enable-network \
  python code/plot_yields.py --predictions county_predictions.csv \
  --geojson data/iowa_counties.geojson
Option 2 (image from Docker Hub):
s4n create -c user12398/corn_demo:v1.0.0 --enable-network \
  python code/plot_yields.py --predictions county_predictions.csv \
  --geojson data/iowa_counties.geojson
Now we'll build the workflow in two phases: connect and save.
First, connect the tools in the correct order. Use s4n list -a to inspect the available inputs and outputs.
s4n connect demo --from @inputs/geojson --to get_soil/geojson
s4n connect demo --from @inputs/soil --to get_soil/soil_cache
s4n connect demo --from @inputs/geojson --to get_weather/geojson
s4n connect demo --from @inputs/geojson --to merge_features/geojson
s4n connect demo --from @inputs/yield --to train_model/yield
s4n connect demo --from @inputs/geojson --to plot_yields/geojson
s4n connect demo --from get_soil/soil --to merge_features/soil
s4n connect demo --from get_weather/weather --to merge_features/weather
s4n connect demo --from merge_features/county_features --to train_model/features
s4n connect demo --from train_model/model --to predict_yields/model
s4n connect demo --from train_model/scaler --to predict_yields/scaler
s4n connect demo --from merge_features/county_features --to predict_yields/features
s4n connect demo --from predict_yields/county_predictions --to plot_yields/predictions
s4n connect demo --from plot_yields/iowa_county_yields --to @outputs/iowa_county_yields
Once all connections are made, save the workflow so that it is committed to version control:
s4n save demo
Generate a visual representation of the pipeline:
s4n visualize --renderer dot workflows/demo/demo.cwl > workflow.dot
dot -Tsvg workflow.dot -o workflow.svg
Generate a template for the workflow inputs:
s4n execute make-template workflows/demo/demo.cwl > inputs.yml
Edit inputs.yml with real data paths:
geojson:
  class: File
  location: data/iowa_counties.geojson
soil:
  class: File
  location: data/soil_data.csv
yield:
  class: File
  location: data/iowa_yield.csv
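Before executing the workflow, it can help to confirm that every path listed in inputs.yml actually exists. The Python sketch below is one possible check, not part of s4n: the mapping is hard-coded to mirror the file above (so no YAML library is needed), and the names INPUTS and missing_inputs are hypothetical.

```python
import os
import tempfile

# Mirrors inputs.yml; hard-coded here to avoid a YAML parser dependency
INPUTS = {
    "geojson": "data/iowa_counties.geojson",
    "soil": "data/soil_data.csv",
    "yield": "data/iowa_yield.csv",
}


def missing_inputs(inputs, base_dir="."):
    """Return the names of inputs whose file does not exist under base_dir."""
    return [
        name
        for name, location in inputs.items()
        if not os.path.isfile(os.path.join(base_dir, location))
    ]


if __name__ == "__main__":
    # Run from the project root so the relative paths resolve
    missing = missing_inputs(INPUTS)
    if missing:
        print("Missing input files for:", ", ".join(missing))
    else:
        print("All input files found.")
```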
⚠️ Important Note: Docker must be running to execute the workflow locally.
s4n execute local workflows/demo/demo.cwl inputs.yml
⚠️ Important Note: If you used the local Dockerfile (Option 1), Docker must be running to execute the workflow remotely, because the Docker image is built locally and published to ttl.sh for one hour. If you used the existing Docker Hub image (Option 2), Docker does not need to be running.
s4n execute remote start workflows/demo/demo.cwl inputs.yml
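For intuition about the order in which the steps run, the connections made with s4n connect can be read as a dependency graph: a step can only start once every step feeding it has finished. The Python sketch below derives one valid order with the standard library's graphlib (Python 3.9 or newer). It illustrates the idea only and makes no claim about how s4n or a CWL runner schedules steps; the DEPENDENCIES mapping was transcribed by hand from the connect commands.

```python
from graphlib import TopologicalSorter

# step -> set of steps it depends on, read off the s4n connect commands
DEPENDENCIES = {
    "get_soil": set(),
    "get_weather": set(),
    "merge_features": {"get_soil", "get_weather"},
    "train_model": {"merge_features"},
    "predict_yields": {"merge_features", "train_model"},
    "plot_yields": {"predict_yields"},
}

# static_order() yields every step after all of its dependencies
order = list(TopologicalSorter(DEPENDENCIES).static_order())
print(order)
```

get_soil and get_weather do not depend on each other, so either may come first; the remaining four steps always follow in the sequence merge_features, train_model, predict_yields, plot_yields.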