Problem
Crop decisions in Pakistan are shaped by weather uncertainty, rising input prices and delayed yield feedback. Existing RL environments rarely model the local economics and scope boundaries needed for this context.
NUST MS-AI thesis | Machine Vision and Intelligent Systems Lab
CyclesGym-PK extends the original CyclesGym workflow with Pakistan weather, soil, calendar and price signals, NPK-aware fertilization, hierarchical crop planning and a frozen 155-run evidence corpus.
The work is framed as a controlled research platform for learning and evaluating agricultural decisions inside a crop simulator. It does not claim to be a field-ready agronomic advisory product.
CyclesGym-PK connects Stable-Baselines3 policies to a CYCLES-backed environment in which each action modifies simulator inputs, the executable is rerun, and observations are rebuilt from the output files.
The stack supports weekly fertilization, annual crop planning and a hierarchical path that combines crop choice with within-season NPK dosing.
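To illustrate how a crop choice can be combined with within-season NPK dosing, the following is a minimal sketch and not the thesis implementation. The crop names, the dose levels and the helpers `decode_npk` and `run_season` are hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical values for illustration only; not the thesis configuration.
CROPS = ("wheat", "maize")
DOSE_LEVELS_KG = (0, 20, 40)

# Every (N, P, K) combination of dose levels, indexed as a flat discrete action.
NPK_ACTIONS = tuple(product(DOSE_LEVELS_KG, repeat=3))


def decode_npk(action_index):
    """Map a flat discrete action index to per-nutrient doses in kg/ha."""
    n, p, k = NPK_ACTIONS[action_index]
    return {"N": n, "P": p, "K": k}


def run_season(choose_crop, choose_dose, weeks):
    """Upper level picks the crop once; lower level picks an NPK dose each week."""
    crop = choose_crop()
    plan = [decode_npk(choose_dose(crop, week)) for week in range(weeks)]
    return crop, plan
```

In this sketch `choose_crop` and `choose_dose` are plain callables, so a trained policy or a fixed rule could be swapped in at either level.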
Final claims are tied to frozen artifacts, generated tables, verification JSON and the canonical 155-run corpus (113 + 42 runs) rather than proposal-era promises.
Each step writes management changes into simulator files, reruns CYCLES when needed, parses generated outputs and then reconstructs observations and rewards. That file-based loop is slower than a pure Python environment, but it keeps the agent tied to crop-model dynamics.
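The write, rerun, parse and rebuild loop described above can be sketched as follows. This is an assumption-laden illustration: `fake_simulator` is a toy stand-in for the CYCLES executable, the JSON file names are invented, and `FileLoopEnv` is a hypothetical class, not the CyclesGym-PK environment.

```python
import json
import tempfile
from pathlib import Path


def fake_simulator(workdir: Path) -> None:
    """Toy stand-in for the simulator executable: read inputs, write outputs."""
    management = json.loads((workdir / "management.json").read_text())
    n_total = sum(management["n_kg_per_ha"])
    # Saturating toy response; this is NOT the CYCLES crop model.
    yield_t_ha = 2.0 + 3.0 * n_total / (n_total + 100.0)
    (workdir / "output.json").write_text(json.dumps({"yield_t_ha": yield_t_ha}))


class FileLoopEnv:
    """Hypothetical environment whose state lives in simulator files."""

    def __init__(self, workdir: Path, weeks: int = 4):
        self.workdir = workdir
        self.weeks = weeks
        self.reset()

    def _write_management(self):
        payload = {"n_kg_per_ha": self.doses}
        (self.workdir / "management.json").write_text(json.dumps(payload))

    def _observe(self):
        # Observations are rebuilt from the output file, not from memory.
        out = json.loads((self.workdir / "output.json").read_text())
        return {"week": self.week, "yield_t_ha": out["yield_t_ha"]}

    def reset(self):
        self.week = 0
        self.doses = []
        self._write_management()
        fake_simulator(self.workdir)
        return self._observe()

    def step(self, n_dose: float):
        previous = self._observe()["yield_t_ha"]
        self.doses.append(n_dose)          # 1. write the management change
        self._write_management()
        fake_simulator(self.workdir)       # 2. rerun the simulator
        self.week += 1
        obs = self._observe()              # 3. parse outputs into an observation
        reward = obs["yield_t_ha"] - previous  # 4. toy reward: yield gain
        done = self.week >= self.weeks
        return obs, reward, done
```

A usage example under the same assumptions: `env = FileLoopEnv(Path(tempfile.mkdtemp()))` followed by repeated `env.step(dose)` calls until `done` is true. Every step pays the cost of file I/O and a simulator run, which is the trade-off the paragraph above describes.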
The thesis avoids importing generic price assumptions into a Pakistan-facing study. Crop prices, nutrient prices, weather coverage and seasonal structure are made visible so the reward signal can be defended.
| Signal | Purpose | Thesis interpretation |
|---|---|---|
| Weather | Controls simulator climate conditions and fixed/random weather comparisons. | Supports robustness testing, but does not replace field validation. |
| Soil | Initializes local soil conditions for repeated simulation episodes. | Anchors the system to a Pakistan-relevant profile rather than a generic default. |
| Crop and nutrient prices | Convert yield and fertilizer use into economic reward components. | Turns the objective from yield-only optimization into cost-aware resource allocation. |
| Crop calendar | Constrains planting windows and yearly planning behavior. | Connects RL actions to recognizable management timing. |
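As a rough illustration of how crop and nutrient prices can turn yield and fertilizer use into a cost-aware reward, consider the sketch below. The `Prices` container, the `economic_reward` function, the `cost_weight` parameter and the linear revenue-minus-cost form are all assumptions made for this example; the thesis reward may be structured differently, and no real Pakistan prices are used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prices:
    """Hypothetical price container; units are arbitrary currency."""
    crop_per_tonne: float
    n_per_kg: float
    p_per_kg: float
    k_per_kg: float


def economic_reward(yield_t_ha, n_kg, p_kg, k_kg, prices, cost_weight=1.0):
    """Revenue from yield minus weighted nutrient cost (assumed linear form)."""
    revenue = yield_t_ha * prices.crop_per_tonne
    nutrient_cost = (
        n_kg * prices.n_per_kg
        + p_kg * prices.p_per_kg
        + k_kg * prices.k_per_kg
    )
    return revenue - cost_weight * nutrient_cost


# Placeholder numbers only; they are not taken from the thesis data.
placeholder_prices = Prices(crop_per_tonne=100.0, n_per_kg=1.0, p_per_kg=2.0, k_per_kg=0.5)
```

Raising `cost_weight` penalizes fertilizer use more heavily, which is one way a nutrient-cost weighting could be varied in a controlled comparison.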
The final matrix identifies stable method-and-environment settings. The companion ablation suite isolates targeted design choices such as entropy pressure, blocked-nutrient shaping and nutrient-cost weighting.
Primary evidence layer for grouped fertilization, flat crop-planning and hierarchical guarded-rerun results.
Controlled contrasts for entropy, blocked-nutrient penalty and nutrient cost weight choices.
PPO single-run reference used as an exemplar, not as a grouped universal winner.
PPO fixed-weather grouped reference for the joint crop-planning and fertilization setting.
The strongest claims are grouped and corpus-level, not selected from isolated outliers. DQN evidence is kept descriptive. Hierarchical guarded reruns are discussed separately because their decision process is materially different from flat crop-planning-only settings.
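The difference between a grouped claim and a selected outlier can be shown with a small aggregation sketch. The `summarize_groups` helper and the record fields (`method`, `env`, `return`) are hypothetical, and any numbers passed to it in this example are placeholders rather than thesis results.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarize_groups(runs):
    """Group runs by (method, env) and report group statistics next to the best run."""
    grouped = defaultdict(list)
    for run in runs:
        grouped[(run["method"], run["env"])].append(run["return"])

    summary = {}
    for key, values in grouped.items():
        summary[key] = {
            "n": len(values),
            "mean": mean(values),
            # Sample standard deviation needs at least two runs.
            "std": stdev(values) if len(values) > 1 else 0.0,
            # Kept only to contrast with the mean; not a basis for claims.
            "best": max(values),
        }
    return summary
```

Reporting `n`, `mean` and `std` per group keeps a single strong run from standing in for the whole setting, and a group with `n` of 1 is visibly a single-run reference.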
Simulation performance is not presented as direct farm advice. The work is evidence for a localized research platform, a decision workflow and a reproducible evaluation process that can support later agronomic validation.
The farmer-facing demo shows how saved simulation artifacts can become clear field actions. The walkthrough below explains one PPO training step inside CyclesGym-PK, from environment reset through rollout collection and policy update.
The walkthrough is served from this static site. The farmer demo opens the hosted Cloud Run prototype.
The completed system is a localized, cost-aware, simulation-based RL research platform. The page keeps the same discipline as the thesis: implemented capability is separated from post-thesis ambition.