NUST MS-AI thesis | Machine Vision and Intelligent Systems Lab

Pakistan-localized, cost-aware reinforcement learning for simulated crop resource allocation.

CyclesGym-PK extends the original CyclesGym workflow with Pakistan weather, soil, calendar and price signals, NPK-aware fertilization, hierarchical crop planning and a frozen 155-run evidence corpus.

Candidate: Haider Masood
Supervisor: Dr. Zuhair Zafar
Co-supervisors: Dr. Muhammad Moazam Fraz, Dr. Fahad Ahmed Satti, Dr. Syed Imran Ali
Figure: Graphical abstract showing localized inputs, the CyclesGym-PK simulator loop, two decision scales and audited evidence.
155 audited final runs
113 final-matrix runs
42 ablation runs
NPK fertilization action space
Research Position

A systems thesis, not a black-box demo.

The work is framed as a controlled research platform for learning and evaluating agricultural decisions inside a crop simulator. It does not claim to be a field-ready agronomic advisory product.

01

Problem

Crop decisions in Pakistan are shaped by weather uncertainty, rising input prices and delayed yield feedback. Existing RL environments rarely model the local economics and scope boundaries needed for this context.

02

Platform

CyclesGym-PK connects Stable-Baselines3 policies to a CYCLES-backed environment where actions modify simulator inputs, the executable reruns, and observations are rebuilt from output files.

03

Decision Scales

The stack supports weekly fertilization, annual crop planning and a hierarchical path that combines crop choice with within-season NPK dosing.
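
The two decision scales can be pictured as nested loops: one crop choice per year, then a weekly NPK dose inside that season. The sketch below is only an illustration of that nesting, not the thesis implementation. The crop menu, the four dose levels, the 20-week season and both policy functions are hypothetical placeholders.

```python
import random

CROPS = ["maize", "soybean"]   # hypothetical crop menu
DOSE_LEVELS = 4                # hypothetical: levels 0..3 per nutrient
WEEKS_PER_SEASON = 20          # hypothetical season length


def choose_crop(year, rng):
    """Annual decision: stand-in for a learned crop-planning policy."""
    return rng.choice(CROPS)


def choose_npk(crop, week, rng):
    """Weekly decision: stand-in for a learned fertilization policy."""
    return tuple(rng.randrange(DOSE_LEVELS) for _ in range(3))  # (N, P, K)


def run_hierarchy(years, seed=0):
    """Return one (year, crop, weekly NPK actions) record per simulated year."""
    rng = random.Random(seed)
    plan = []
    for year in range(years):
        crop = choose_crop(year, rng)              # slow, annual scale
        doses = [choose_npk(crop, week, rng)       # fast, weekly scale
                 for week in range(WEEKS_PER_SEASON)]
        plan.append((year, crop, doses))
    return plan


plan = run_hierarchy(years=2)
```

In a flat crop-planning setting only the outer loop would be learned; the hierarchical path is the case where both loops carry decisions.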

04

Evidence Discipline

Final claims are tied to frozen artifacts, generated tables, verification JSON and a canonical 113+42 run corpus rather than proposal-era promises.

System Architecture

Policies do not act on an in-memory toy model.

Each step writes management changes into simulator files, reruns CYCLES when needed, parses generated outputs and then reconstructs observations and rewards. That file-based loop is slower than a pure Python environment, but it keeps the agent tied to crop-model dynamics.
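
The file-based loop described above can be sketched with a stand-in simulator. Everything here is an assumption made for illustration: the JSON file names, the toy yield response and the `FileLoopEnv` class are hypothetical, and the real CYCLES executable uses its own input and output formats. The step method is also simplified and returns only an observation and a done flag.

```python
import json
import tempfile
from pathlib import Path


def fake_simulator(workdir: Path) -> None:
    """Stand-in for the CYCLES executable: reads operations, writes outputs."""
    ops = json.loads((workdir / "operations.json").read_text())
    total_n = sum(op["n_kg_ha"] for op in ops)
    # Toy saturating yield response, purely illustrative.
    yield_t_ha = 2.0 + 4.0 * total_n / (total_n + 100.0)
    (workdir / "season.json").write_text(json.dumps({"yield_t_ha": yield_t_ha}))


class FileLoopEnv:
    """Minimal reset/step loop that talks to the simulator through files."""

    def __init__(self, workdir: Path, horizon: int = 3):
        self.workdir = workdir
        self.horizon = horizon
        self.t = 0
        self.ops = []

    def reset(self):
        self.t = 0
        self.ops = []
        return self._rerun_and_observe()

    def step(self, n_kg_ha: float):
        # 1. Record the management change chosen by the policy.
        self.ops.append({"week": self.t, "n_kg_ha": n_kg_ha})
        self.t += 1
        # 2-4. Write inputs, rerun the simulator, parse its outputs.
        obs = self._rerun_and_observe()
        return obs, self.t >= self.horizon

    def _rerun_and_observe(self):
        (self.workdir / "operations.json").write_text(json.dumps(self.ops))
        fake_simulator(self.workdir)
        out = json.loads((self.workdir / "season.json").read_text())
        return {"week": self.t, "yield_t_ha": out["yield_t_ha"]}


with tempfile.TemporaryDirectory() as tmp:
    env = FileLoopEnv(Path(tmp))
    first_obs = env.reset()
    last_obs, done = first_obs, False
    while not done:
        last_obs, done = env.step(50.0)
```

The cost of this design is visible even in the sketch: every step pays for file writes, a simulator run and a parse, which is why the loop is slower than an in-memory environment.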

Environment contract: Gym/Gymnasium-compatible envs expose reset and step interfaces over CYCLES simulator runs.
Observation path: Weather, crop, season, soil and operation outputs are parsed before being converted into policy inputs.
Reward logic: Economic returns are expressed in PKR terms using crop revenue minus nutrient cost and compliance signals.
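
As a rough illustration of that reward logic, the function below computes revenue minus nutrient cost minus a compliance penalty in PKR per hectare. The functional form, the parameter names (`cost_weight`, `blocked_penalty_pkr`) and all numbers are assumptions for this sketch; they are not taken from the thesis code or its price series.

```python
def pkr_reward(yield_t_ha, crop_price_pkr_per_t, doses_kg_ha,
               nutrient_prices_pkr_per_kg, blocked_actions=0,
               cost_weight=1.0, blocked_penalty_pkr=0.0):
    """Revenue minus weighted nutrient cost minus a compliance penalty (PKR/ha)."""
    revenue = yield_t_ha * crop_price_pkr_per_t
    nutrient_cost = sum(doses_kg_ha[n] * nutrient_prices_pkr_per_kg[n]
                        for n in ("N", "P", "K"))
    penalty = blocked_actions * blocked_penalty_pkr
    return revenue - cost_weight * nutrient_cost - penalty


# Illustrative numbers only; not thesis data.
reward = pkr_reward(
    yield_t_ha=5.0,
    crop_price_pkr_per_t=60_000.0,
    doses_kg_ha={"N": 100.0, "P": 40.0, "K": 20.0},
    nutrient_prices_pkr_per_kg={"N": 200.0, "P": 500.0, "K": 400.0},
    blocked_actions=1,
    blocked_penalty_pkr=1_000.0,
)
# revenue 300000 - nutrient cost 48000 - penalty 1000 = 251000 PKR/ha
```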
Figure: Component-level architecture of the active thesis stack, showing the SB3 policy, CyclesGym environment, implementers, CYCLES executable, output parsers, observers and rewarders.

Implementation layers

  • Pakistan weather file and soil profile wired into environment setup.
  • Crop and nutrient yearly price series used by the reward pipeline.
  • Fertilization logic expanded beyond nitrogen-only control into NPK-aware decisions.
  • Hierarchical crop-planning and fertilization flow with reporting hooks.

Training stack

  • Stable-Baselines3 PPO and A2C as primary learned-policy families.
  • DQN retained as descriptive comparator where action-space wrapping is required.
  • Vector monitoring and normalization used in training and evaluation flows.
  • JSON, CSV and figure outputs packaged for thesis-grade traceability.
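
Stable-Baselines3's DQN accepts only a single discrete action space, so a multi-nutrient action has to be wrapped before DQN can use it. One common approach, sketched here under assumptions, is to flatten the (N, P, K) dose levels into one index with a mixed-radix encoding. The level counts and function names are hypothetical, and the thesis wrapper may differ.

```python
from itertools import product

LEVELS = (4, 3, 3)  # hypothetical number of dose levels for N, P, K


def flatten(action, levels=LEVELS):
    """Map an (N, P, K) level tuple to one discrete index (mixed radix)."""
    index = 0
    for a, n in zip(action, levels):
        if not 0 <= a < n:
            raise ValueError(f"level {a} outside 0..{n - 1}")
        index = index * n + a
    return index


def unflatten(index, levels=LEVELS):
    """Inverse of flatten: recover the (N, P, K) level tuple."""
    action = []
    for n in reversed(levels):
        index, a = divmod(index, n)
        action.append(a)
    return tuple(reversed(action))


# Size of the flattened action space: 4 * 3 * 3 = 36.
n_actions = 1
for n in LEVELS:
    n_actions *= n

# Every combination should survive a round trip through the encoding.
round_trip_ok = all(
    unflatten(flatten(a)) == a
    for a in product(*(range(n) for n in LEVELS))
)
```

The flattened space grows multiplicatively with the number of levels per nutrient, which is one reason a value-based comparator can be harder to scale here than PPO or A2C.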
Localized Data

Economic localization changes what the reward means.

The thesis avoids importing generic price assumptions into a Pakistan-facing study. Crop prices, nutrient prices, weather coverage and seasonal structure are made visible so the reward signal can be defended.

Figure: Crop price series used by the thesis stack (Pakistan maize producer price, soybean producer price and maize silage proxy).
Figure: Pakistan nutrient price series used for NPK cost accounting in the reward.
Figure: Pakistan weather coverage used in training and evaluation, the local simulation backbone.
Data role in the implemented stack

Signal | Purpose | Thesis interpretation
Weather | Controls simulator climate conditions and fixed/random weather comparisons. | Supports robustness testing, but does not replace field validation.
Soil | Initializes local soil conditions for repeated simulation episodes. | Anchors the system to a Pakistan-relevant profile rather than a generic default.
Crop and nutrient prices | Convert yield and fertilizer use into economic reward components. | Turns the objective from yield-only optimization into cost-aware resource allocation.
Crop calendar | Constrains planting windows and yearly planning behavior. | Connects RL actions to recognizable management timing.
Final Evidence

The main claims come from a frozen 155-run corpus.

The final matrix identifies stable method-and-environment settings. The companion ablation suite isolates targeted design choices such as entropy pressure, blocked-nutrient shaping and nutrient-cost weighting.

Final matrix: 113 runs

Primary evidence layer for grouped fertilization, flat crop-planning and hierarchical guarded-rerun results.

Ablation suite: 42 runs

Controlled contrasts for entropy, blocked-nutrient penalty and nutrient cost weight choices.

Best single fertilization seed: 7.91e5 PKR/ha

PPO single-run reference used as an exemplar, not as a grouped universal winner.

Hierarchical guarded rerun: 1.86e6 PKR/ha

PPO fixed-weather grouped reference for the joint crop-planning and fertilization setting.

Figure: Grand summary chart comparing the fertilization, flat crop-planning and hierarchical experimental domains.

What the final chapter emphasizes

The strongest claims are grouped and corpus-level, not selected from isolated outliers. DQN evidence is kept descriptive. Hierarchical guarded reruns are discussed separately because their decision process is materially different from flat crop-planning-only settings.
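
The difference between a grouped claim and a best-seed claim is easy to show on toy data. The per-seed returns below are made up for illustration and are not thesis results; the method names are placeholders.

```python
from statistics import mean, stdev

# Made-up per-seed returns (PKR/ha) for illustration; not thesis results.
runs = {
    "method_a": [5.0e5, 5.2e5, 4.8e5, 5.0e5],
    "method_b": [3.0e5, 9.0e5, 3.2e5, 2.8e5],   # one outlier seed
}


def summarize(returns):
    """Grouped view (mean, sample std) next to the best single seed."""
    return {"mean": mean(returns), "std": stdev(returns), "best": max(returns)}


summary = {name: summarize(r) for name, r in runs.items()}

# Ranking by the best seed rewards the outlier; ranking by group mean does not.
best_by_seed = max(summary, key=lambda m: summary[m]["best"])
best_by_group = max(summary, key=lambda m: summary[m]["mean"])
```

Here `method_b` wins on its single best seed while `method_a` wins on the grouped mean, which is the kind of disagreement a corpus-level claim is meant to guard against.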

What remains bounded

Simulation performance is not presented as direct farm advice. The work is evidence for a localized research platform, a decision workflow and a reproducible evaluation process that can support later agronomic validation.

Live Demos and Walkthrough

Open the farmer demo, then step through the RL training loop.

The farmer-facing demo shows how saved simulation artifacts can become clear field actions. The walkthrough below explains one PPO training step inside CyclesGym-PK, from environment reset through rollout collection and policy update.
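
For readers who want the update step in code, here is a dependency-free sketch of the two quantities a PPO step revolves around: returns computed from a collected rollout and the clipped surrogate objective. It is a simplification under stated assumptions. Stable-Baselines3 uses a learned value function with generalized advantage estimation, whereas this sketch uses a mean-return baseline, and the function names are hypothetical.

```python
import math


def discounted_returns(rewards, gamma=0.99):
    """Reward-to-go for one finished episode."""
    out, running = [], 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        out.append(running)
    return out[::-1]


def ppo_clip_objective(old_logp, new_logp, advantages, clip=0.2):
    """Mean clipped surrogate objective that a PPO update maximizes."""
    total = 0.0
    for lo, ln, adv in zip(old_logp, new_logp, advantages):
        ratio = math.exp(ln - lo)                        # pi_new / pi_old
        clipped = max(1.0 - clip, min(1.0 + clip, ratio))
        total += min(ratio * adv, clipped * adv)         # pessimistic bound
    return total / len(advantages)


# Rollout of three rewards, then advantages against a mean baseline.
returns = discounted_returns([1.0, 0.0, 2.0], gamma=0.5)
baseline = sum(returns) / len(returns)
advantages = [g - baseline for g in returns]

# With an unchanged policy the ratio is 1 and the objective is the mean advantage.
unchanged = ppo_clip_objective([-1.0] * 3, [-1.0] * 3, advantages)
```

The clipping matters only once the new policy drifts from the old one: a positive-advantage action whose probability ratio exceeds `1 + clip` stops contributing extra gain.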

The walkthrough is served from this static site. The farmer demo opens the hosted Cloud Run prototype.

Embedded Walkthrough

One step of RL training

Figure: Screenshot of the soybean seasonal planning demo, the prototype interface used to explain seasonal planning and farmer-facing inputs.
Machine-readable QA: Verification JSON files track expected rows, matched histories and artifact integrity.
Generated thesis figures: Figures are derived from frozen reporting CSVs and cached training data.
Source-bound writing: The final narrative separates implemented work, partial work and future scope.
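
A machine-readable check of this kind can be sketched as a manifest that records an expected row count and a SHA-256 hash per artifact. The manifest layout, the file names and the `verify` function are hypothetical and only show the idea; the thesis verification files may be structured differently.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def verify(manifest: dict, root: Path) -> list:
    """Return a list of problems; an empty list means the check passed."""
    problems = []
    for entry in manifest["artifacts"]:
        path = root / entry["file"]
        if not path.exists():
            problems.append(f"missing: {entry['file']}")
            continue
        rows = len(path.read_text().splitlines()) - 1    # minus CSV header
        if rows != entry["expected_rows"]:
            problems.append(f"row count: {entry['file']}")
        if sha256_of(path) != entry["sha256"]:
            problems.append(f"hash mismatch: {entry['file']}")
    return problems


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    csv_path = root / "final_matrix.csv"                 # hypothetical file name
    body = "\n".join(f"{i},0" for i in range(113))
    csv_path.write_text("run_id,return\n" + body + "\n")

    # Freeze the artifact: write the manifest, then read it back as JSON.
    manifest_path = root / "verification.json"           # hypothetical file name
    manifest_path.write_text(json.dumps({"artifacts": [{
        "file": "final_matrix.csv",
        "expected_rows": 113,
        "sha256": sha256_of(csv_path),
    }]}))
    manifest = json.loads(manifest_path.read_text())

    clean = verify(manifest, root)
    csv_path.write_text(csv_path.read_text() + "113,0\n")  # simulate an edit
    tampered = verify(manifest, root)
```

An untouched artifact yields an empty problem list, while the appended row trips both the row-count and the hash check.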
Scope Control

The thesis is strongest because its boundaries are explicit.

The completed system is a localized, cost-aware, simulation-based RL research platform. The page keeps the same discipline as the thesis: implemented capability is separated from post-thesis ambition.

Implemented and evidenced

  • Pakistan weather, soil and price localization in the active stack.
  • NPK-aware fertilization logic and cost decomposition.
  • Crop-planning and hierarchical crop-planning plus fertilization environment paths.
  • Frozen 113+42 run corpus with verification and generated reporting artifacts.

Reserved for future work

  • Irrigation as a learned action in active training flows.
  • Rice-specific or rice-wheat localized experiments.
  • Formal field validation with agronomists and farm observations.
  • Broader market frictions, labor costs and full farm-level economics.
Figure: Future work roadmap for extending the thesis, implied by its current limitations.