fairagro/m4.4_demo_corn_prediction


SciWIn Client Demo: Crop Yield Prediction Pipeline

⚠️ Important Note: This workflow is not a scientifically meaningful pipeline. It is a test demonstration created to showcase the capabilities of the SciWIn Client (s4n).

It uses a sequence of steps (e.g., merging soil and weather data, training a model, and predicting yields) to illustrate how s4n can be used to:

  • Create CommandLineTools from Python scripts
  • Connect tools into a workflow
  • Visualize the pipeline
  • Execute workflows locally and remotely

It is not intended for real-world agricultural analysis or decision-making. The data, scripts, and logic are simplified for demonstration purposes only.


🔧 Installation

GitHub Release

Install the latest version of s4n:

curl --proto '=https' --tlsv1.2 -LsSf https://fairagro.github.io/m4.4_sciwin_client/get_s4n.sh | sh

Verify installation:

s4n -V

To create tools from the Python scripts in the code/ directory, create a virtual environment and install the dependencies:

python3 -m venv .venv
source .venv/bin/activate
pip install pandas==2.3.2 geopandas==1.1.1 shapely==2.1.1 scikit-learn==1.7.2 joblib==1.5.2 matplotlib==3.10.6 requests==2.32.5
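If tool creation later fails with an import error, first confirm that the activated .venv contains the pinned versions. The sketch below uses only the standard library's importlib.metadata and copies the version pins from the pip command above. The helper name find_mismatches is hypothetical and is not part of s4n or this repository.

```python
"""Check that the pinned demo dependencies are installed in the active environment."""
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version

# Version pins copied from the pip install command above.
PINS = {
    "pandas": "2.3.2",
    "geopandas": "1.1.1",
    "shapely": "2.1.1",
    "scikit-learn": "1.7.2",
    "joblib": "1.5.2",
    "matplotlib": "3.10.6",
    "requests": "2.32.5",
}


def find_mismatches(pins: dict[str, str]) -> dict[str, tuple[str | None, str]]:
    """Return {package: (installed_version_or_None, expected_version)} for unmet pins."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # the distribution is not installed at all
        if installed != expected:
            mismatches[name] = (installed, expected)
    return mismatches


if __name__ == "__main__":
    problems = find_mismatches(PINS)
    for name, (installed, expected) in problems.items():
        print(f"{name}: installed={installed}, expected={expected}")
    print("all pins satisfied" if not problems else f"{len(problems)} pin(s) not satisfied")
```

Run it with the virtual environment activated. Otherwise it inspects whichever Python interpreter happens to be on the PATH.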

🛠️ Step 1: Initialize Project

s4n init

🧪 Step 2: Create CommandLineTools

For each script in the code/ directory, we create a CWL CommandLineTool using s4n create. These tools wrap the Python scripts and define how they are executed with inputs, outputs, and dependencies.

🔹 1. Get Soil Data

📌 Note: Most of the soil data has already been downloaded, because downloading it takes time.

📌 Note: There are two ways to create the tool. You can use the Dockerfile in this repository, in which case Docker must be running for both local and remote execution. Alternatively, you can use an existing image from Docker Hub, in which case Docker does not need to be running for remote execution.

The first step is to get soil data from SoilGrids for the coordinates of the Iowa counties.

Option 1: Dockerfile

s4n create -c Dockerfile --container-tag pyplot --enable-network \
  python code/get_soil.py --geojson data/iowa_counties.geojson --soil_cache data/soil_data.csv

This creates a new directory workflows/get_soil with a CWL CommandLineTool file get_soil.cwl:

#!/usr/bin/env cwl-runner

cwlVersion: v1.2
class: CommandLineTool

requirements:
- class: InitialWorkDirRequirement
  listing:
  - entryname: code/get_soil.py
    entry:
      $include: ../../code/get_soil.py
- class: DockerRequirement
  dockerFile:
    $include: ../../Dockerfile
  dockerImageId: pyplot
- class: NetworkAccess
  networkAccess: true

inputs:
- id: geojson
  type: File
  default:
    class: File
    location: ../../data/iowa_counties.geojson
  inputBinding:
    prefix: --geojson
- id: soil_cache
  type: File
  default:
    class: File
    location: ../../data/soil_data.csv
  inputBinding:
    prefix: --soil_cache

outputs:
- id: soil
  type: File
  outputBinding:
    glob: soil.csv

baseCommand:
- python
- code/get_soil.py

Option 2: existing Docker image

s4n create -c user12398/corn_demo:v1.0.0 --enable-network \
  python code/get_soil.py --geojson data/iowa_counties.geojson --soil_cache data/soil_data.csv

This creates the same file, workflows/get_soil/get_soil.cwl, but with a dockerPull requirement instead of the Dockerfile:

#!/usr/bin/env cwl-runner

cwlVersion: v1.2
class: CommandLineTool

requirements:
- class: InitialWorkDirRequirement
  listing:
  - entryname: code/get_soil.py
    entry:
      $include: ../../code/get_soil.py
- class: DockerRequirement
  dockerPull: user12398/corn_demo:v1.0.0
- class: NetworkAccess
  networkAccess: true

inputs:
- id: geojson
  type: File
  default:
    class: File
    location: ../../data/iowa_counties.geojson
  inputBinding:
    prefix: --geojson
- id: soil_cache
  type: File
  default:
    class: File
    location: ../../data/soil_data.csv
  inputBinding:
    prefix: --soil_cache

outputs:
- id: soil
  type: File
  outputBinding:
    glob: soil.csv

baseCommand:
- python
- code/get_soil.py
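It can help to reconstruct by hand the command line that either generated tool runs. The sketch below is a simplified model of the CWL v1.2 inputBinding rules:

- the prefix and the value become separate arguments;
- position defaults to 0;
- ties are broken by input id.

It ignores arguments, arrays, records and file staging, so a real runner such as cwltool passes staged container paths rather than the repository paths shown here. The function build_command is a hypothetical helper, not part of s4n.

```python
"""Simplified sketch of how a CWL runner turns get_soil.cwl into a command line."""
from __future__ import annotations

# Values mirror get_soil.cwl: baseCommand plus each input's inputBinding prefix and
# default location (repository paths as written, not the staged container paths).
BASE_COMMAND = ["python", "code/get_soil.py"]
INPUTS = {
    "soil_cache": {"prefix": "--soil_cache", "value": "data/soil_data.csv"},
    "geojson": {"prefix": "--geojson", "value": "data/iowa_counties.geojson"},
}


def build_command(base_command: list[str], inputs: dict[str, dict]) -> list[str]:
    """Append '<prefix> <value>' for each bound input, ordered by (position, input id)."""
    args = list(base_command)
    # Position defaults to 0; inputs with equal position are ordered by their id.
    for name in sorted(inputs, key=lambda n: (inputs[n].get("position", 0), n)):
        binding = inputs[name]
        # CWL's default separate: true puts prefix and value in two arguments.
        args += [binding["prefix"], binding["value"]]
    return args


if __name__ == "__main__":
    print(" ".join(build_command(BASE_COMMAND, INPUTS)))
```

Running it prints python code/get_soil.py --geojson data/iowa_counties.geojson --soil_cache data/soil_data.csv. This matches the command passed to s4n create, which confirms that both prefixes were captured.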

🔹 2. Get Weather Data

Next, we fetch weather data for each county for the year used for prediction.

Option 1: Dockerfile

s4n create -c Dockerfile --container-tag pyplot --enable-network \
  python code/get_weather.py --geojson data/iowa_counties.geojson

Option 2: existing Docker image

s4n create -c user12398/corn_demo:v1.0.0 --enable-network \
  python code/get_weather.py --geojson data/iowa_counties.geojson

🔹 3. Merge Features

Now we combine soil and weather data into a single feature set.

Option 1: Dockerfile

s4n create -c Dockerfile --container-tag pyplot --enable-network \
  python code/merge_features.py --geojson data/iowa_counties.geojson \
                                --weather weather.csv \
                                --soil soil.csv

Option 2: existing Docker image

s4n create -c user12398/corn_demo:v1.0.0 --enable-network \
  python code/merge_features.py --geojson data/iowa_counties.geojson \
                                --weather weather.csv \
                                --soil soil.csv

🔹 4. Train Yield Prediction Model

We train a simple model using historical yield data.

Option 1: Dockerfile

s4n create -c Dockerfile --container-tag pyplot --enable-network \
  python code/train_model.py --features county_features.csv \
                             --yield data/iowa_yield.csv

Option 2: existing Docker image

s4n create -c user12398/corn_demo:v1.0.0 --enable-network \
  python code/train_model.py --features county_features.csv \
                             --yield data/iowa_yield.csv

🔹 5. Predict Yields

Now we use the trained model to predict yields for each county.

Option 1: Dockerfile

s4n create -c Dockerfile --container-tag pyplot --enable-network \
  python code/predict_yields.py --features county_features.csv \
                                --model model.pkl \
                                --scaler scaler.pkl

Option 2: existing Docker image

s4n create -c user12398/corn_demo:v1.0.0 --enable-network \
  python code/predict_yields.py --features county_features.csv \
                                --model model.pkl \
                                --scaler scaler.pkl

🔹 6. Plot Predictions

Finally, we visualize the predictions on a map.

Option 1: Dockerfile

s4n create -c Dockerfile --container-tag pyplot --enable-network \
  python code/plot_yields.py --predictions county_predictions.csv \
                             --geojson data/iowa_counties.geojson

Option 2: existing Docker image

s4n create -c user12398/corn_demo:v1.0.0 --enable-network \
  python code/plot_yields.py --predictions county_predictions.csv \
                             --geojson data/iowa_counties.geojson

⚙ Step 3: Build the Workflow

We build the workflow in two phases: connect, then save.

🔗 Phase 1: Connect Inputs and Outputs

Connect the tools in the correct order. Use s4n list -a to inspect the available inputs and outputs.

🔹 Connect Inputs

s4n connect demo --from @inputs/geojson --to get_soil/geojson
s4n connect demo --from @inputs/soil --to get_soil/soil_cache
s4n connect demo --from @inputs/geojson --to get_weather/geojson
s4n connect demo --from @inputs/geojson --to merge_features/geojson
s4n connect demo --from @inputs/yield --to train_model/yield
s4n connect demo --from @inputs/geojson --to plot_yields/geojson

🔹 Connect Intermediate Steps

s4n connect demo --from get_soil/soil --to merge_features/soil
s4n connect demo --from get_weather/weather --to merge_features/weather
s4n connect demo --from merge_features/county_features --to train_model/features
s4n connect demo --from train_model/model --to predict_yields/model
s4n connect demo --from train_model/scaler --to predict_yields/scaler
s4n connect demo --from merge_features/county_features --to predict_yields/features
s4n connect demo --from predict_yields/county_predictions --to plot_yields/predictions

🔹 Connect Final Output

s4n connect demo --from plot_yields/iowa_county_yields --to @outputs/iowa_county_yields
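s4n and the CWL runner work out the execution order themselves. The sketch below only shows why the step-to-step connections above define a runnable pipeline: they form a directed acyclic graph, so a valid order exists. It uses Python's standard graphlib module (Python 3.9+). The helper execution_order is hypothetical, and the edge list is copied from the s4n connect commands. The two connections from train_model to predict_yields (model and scaler) collapse into a single dependency.

```python
"""Derive a valid execution order for the demo workflow from its connections."""
from graphlib import TopologicalSorter

# Step-to-step connections from the `s4n connect` commands above; workflow-level
# @inputs/@outputs are left out because they impose no ordering between steps.
EDGES = [
    ("get_soil", "merge_features"),
    ("get_weather", "merge_features"),
    ("merge_features", "train_model"),
    ("train_model", "predict_yields"),
    ("merge_features", "predict_yields"),
    ("predict_yields", "plot_yields"),
]


def execution_order(edges: list[tuple[str, str]]) -> list[str]:
    """Return one order in which every step runs after all steps it depends on."""
    graph: dict[str, set[str]] = {}
    for upstream, downstream in edges:
        graph.setdefault(downstream, set()).add(upstream)  # downstream depends on upstream
        graph.setdefault(upstream, set())
    # static_order() raises graphlib.CycleError if the connections contain a cycle.
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(" -> ".join(execution_order(EDGES)))
```

get_soil and get_weather do not depend on each other, so a runner may execute them in either order or in parallel.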

✅ Phase 2: Save the Workflow

Once all connections are made, save the workflow to ensure it’s committed to version control:

s4n save demo

📊 Step 4: Visualize the Workflow

Generate a visual representation of the pipeline:

s4n visualize --renderer dot workflows/demo/demo.cwl > workflow.dot
dot -Tsvg workflow.dot -o workflow.svg

Figure: the resulting workflow


🚀 Step 5: Execute the Workflow

🔹 Generate Input Template

s4n execute make-template workflows/demo/demo.cwl > inputs.yml

Edit inputs.yml with real data paths:

geojson:
  class: File
  location: data/iowa_counties.geojson
soil:
  class: File
  location: data/soil_data.csv
yield:
  class: File
  location: data/iowa_yield.csv
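As an alternative to editing the template by hand, the sketch below writes the same inputs.yml and first checks that every referenced file exists. Run it from the repository root: CWL resolves relative locations against the directory of the input file, so the repository root must be where inputs.yml is written. The helpers render_inputs and missing_files are hypothetical and not part of s4n. The sketch uses plain string formatting instead of a YAML library, so it assumes the paths need no quoting.

```python
"""Sketch: write inputs.yml for the demo workflow after checking each file exists."""
from __future__ import annotations

from pathlib import Path

# Workflow input ids and data paths, as in the inputs.yml example above.
INPUT_FILES = {
    "geojson": "data/iowa_counties.geojson",
    "soil": "data/soil_data.csv",
    "yield": "data/iowa_yield.csv",
}


def render_inputs(files: dict[str, str]) -> str:
    """Render each input as a CWL File object (class/location) in YAML text."""
    lines = []
    for input_id, location in files.items():
        lines += [f"{input_id}:", "  class: File", f"  location: {location}"]
    return "\n".join(lines) + "\n"


def missing_files(files: dict[str, str], root: Path = Path(".")) -> list[str]:
    """Return the input ids whose location is not an existing file under `root`."""
    return [input_id for input_id, location in files.items() if not (root / location).is_file()]


if __name__ == "__main__":
    missing = missing_files(INPUT_FILES)
    if missing:
        print("missing input files for:", ", ".join(missing))
    else:
        Path("inputs.yml").write_text(render_inputs(INPUT_FILES))
        print("wrote inputs.yml")
```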

🔹 Run the Full Pipeline

⚠️ Important Note: Docker should be running to execute the workflow locally.

s4n execute local workflows/demo/demo.cwl inputs.yml

🔹 Run Remotely (Optional)

⚠️ Important Note: If you used the local Dockerfile (Option 1), Docker needs to be running to execute the workflow remotely, because the Docker image is built locally and published to ttl.sh for one hour. If you used the existing Docker image (Option 2), Docker does not need to be running.

s4n execute remote start workflows/demo/demo.cwl inputs.yml

📚 Learn More
