Avocado Dry Matter Quality Prediction

Executive Summary
Weekly dry matter (DM) predictions by field and variety help schedule each harvest for when fruit reaches its DM threshold. The system consolidates weather and field data, enriches it with agronomy context such as terrain and seasonality, and publishes versioned reports and dashboards with natural-language Q&A.
The solution achieves a mean absolute error (MAE) of 1.2% on holdout seasons against a 2% requirement. It provides weekly coverage across all varieties and fields, with data freshness under 24 hours. The interactive dashboard and chat assistant enable faster decision-making and reduce out-of-spec harvests.
Business Challenge
Harvest Scheduling Inefficiencies
Dry matter uncertainty led to suboptimal harvest timing, resulting in quality issues and missed revenue opportunities. Without accurate DM predictions, harvests occurred too early or too late.
Limited Supply Visibility
Week-by-week supply uncertainty made it difficult to optimize labor allocation and logistics planning. Manual estimation couldn't provide the granularity needed for efficient operations.
Complex Data Integration
Manual and brittle ingestion across weather, field, and lab sources created data quality issues and delays. Lack of automation meant predictions were often based on outdated information.
Analysis Quality Risks
Historical analyses risked data leakage or bias because of improper train/test splitting and a lack of versioning. Manual processes made it difficult to ensure consistent, reliable predictions.
Industry Context
- Avocado dry matter content is the primary indicator of fruit maturity and eating quality
- Optimal DM thresholds vary by variety and market destination, requiring precise timing
- Weather variability and microclimates create significant heterogeneity within and between fields
- Legacy approaches using rules of thumb miss plot-level variations and inter-annual dynamics
What We Built
Data and Signals
Weather Data
- Historical and forecast weather patterns
- Temperature accumulation and heat units
- Rainfall and humidity metrics
- Solar radiation and evapotranspiration
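Heat units are commonly expressed as growing degree days (GDD), meaning the amount by which the daily mean temperature exceeds a base temperature, summed over the season. The sketch below is minimal and rests on several assumptions. It takes daily minimum and maximum temperatures in °C, uses the simple averaging method, and applies a hypothetical 10 °C base. The document does not state which base or method the system uses.

```python
from itertools import accumulate


def daily_gdd(t_min: float, t_max: float, t_base: float = 10.0) -> float:
    """Growing degree days for one day (simple averaging method).

    t_base = 10.0 is an illustrative assumption, not a value from the source.
    """
    mean_temp = (t_min + t_max) / 2.0
    # Days colder than the base contribute zero, never a negative amount.
    return max(0.0, mean_temp - t_base)


def cumulative_gdd(days: list[tuple[float, float]], t_base: float = 10.0) -> list[float]:
    """Running heat-unit total across a season, one value per day."""
    return list(accumulate(daily_gdd(lo, hi, t_base) for lo, hi in days))


# Three days of (min, max) temperatures in °C; the second day stays below base.
season = [(12.0, 24.0), (6.0, 12.0), (15.0, 29.0)]
print(cumulative_gdd(season))  # [8.0, 8.0, 20.0]
```

A cumulative series like this gives the model a single number for "how much warmth the fruit has experienced so far", which tracks maturation better than calendar date alone.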
Field Data
- Field boundaries and identifiers
- Variety and planting information
- Terrain features and elevation
- Historical dry matter measurements
Lab Measurements
- Weekly DM sampling results
- Historical DM progression curves
- Quality metrics and correlations
- Sampling location metadata
Agronomy Features
- Seasonal position indicators
- Terrain-based microclimate effects
- Historical yield and quality patterns
- Variety-specific maturation profiles
Modeling Approach
Automated Data Pipeline
Weekly refresh pipelines automatically consolidate weather, field, and lab data with consistent identifiers and quality checks that flag anomalies like field size shifts.
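A field-size-shift check like the one described can be sketched as a comparison of consecutive weekly snapshots. This is a minimal sketch. It assumes each snapshot maps a field ID to its area in hectares and treats a relative change above a hypothetical 5% tolerance as an anomaly. The function name, record layout, and tolerance are illustrative, not the system's actual schema.

```python
def flag_area_shifts(
    previous: dict[str, float],
    current: dict[str, float],
    tolerance: float = 0.05,  # hypothetical 5% relative tolerance
) -> list[str]:
    """Flag fields whose area changed beyond tolerance, appeared, or disappeared."""
    flags = []
    # Union of both weeks' field IDs, sorted for a stable report order.
    for field_id in sorted(previous.keys() | current.keys()):
        before = previous.get(field_id)
        after = current.get(field_id)
        if before is None:
            flags.append(f"{field_id}: new field ({after} ha)")
        elif after is None:
            flags.append(f"{field_id}: missing from this week's data")
        elif before > 0 and abs(after - before) / before > tolerance:
            # The before > 0 guard avoids dividing by zero for empty records.
            flags.append(f"{field_id}: area changed {before} -> {after} ha")
    return flags


last_week = {"F-01": 12.0, "F-02": 8.5, "F-03": 4.0}
this_week = {"F-01": 12.1, "F-02": 6.0, "F-04": 3.2}
flags = flag_area_shifts(last_week, this_week)
for flag in flags:
    print(flag)
# F-02: area changed 8.5 -> 6.0 ha
# F-03: missing from this week's data
# F-04: new field (3.2 ha)
```

Running the check before features are built means a re-surveyed or renamed field is surfaced for review instead of silently shifting that field's predictions.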
Prediction Engine
A machine learning model generates weekly DM predictions at the field and variety level. It is trained on historical data and evaluated with strict train-on-past, test-on-future splits, so that accuracy estimates reflect how the model performs on seasons it has not seen.
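Train-on-past, test-on-future evaluation can be expressed as forward-chaining splits by season: each fold trains only on seasons that precede the test season. This is a minimal sketch. It assumes season labels sort chronologically as strings (for example "2021" or "2021/22"), and the `min_train` default of two seasons is a hypothetical choice.

```python
from collections.abc import Iterable, Iterator


def forward_chaining_splits(
    seasons: Iterable[str], min_train: int = 2
) -> Iterator[tuple[list[str], str]]:
    """Yield (train_seasons, test_season); training data always precedes the test season."""
    ordered = sorted(set(seasons))  # deduplicate, then order chronologically
    for i in range(min_train, len(ordered)):
        yield ordered[:i], ordered[i]


for train, test in forward_chaining_splits(["2019", "2020", "2021", "2022", "2023"]):
    print(train, "->", test)
# ['2019', '2020'] -> 2021
# ['2019', '2020', '2021'] -> 2022
# ['2019', '2020', '2021', '2022'] -> 2023
```

Because no fold ever trains on a later season than it tests on, weather and DM patterns from the future cannot leak into the evaluation. A random row-level split does not give this guarantee.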
Feature Enrichment
Agronomy-aware features capture terrain context, seasonal position, and historical patterns. The system automatically derives complex interactions between weather, terrain, and variety.
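Seasonal position and weather-terrain interactions can be sketched as simple derived features. The week of the year is encoded on a circle so that week 52 and week 1 sit next to each other. Interactions are formed as pairwise products of weather and terrain values. This is a minimal sketch, and the feature names (`gdd_cum`, `elevation_m`, `slope_deg`) are hypothetical. The document does not describe how the system actually derives its interactions.

```python
import math


def seasonal_features(iso_week: int, weeks_per_year: int = 52) -> dict[str, float]:
    """Cyclical encoding of week-of-year so the year-end wraps smoothly to week 1."""
    angle = 2 * math.pi * (iso_week - 1) / weeks_per_year
    return {"week_sin": math.sin(angle), "week_cos": math.cos(angle)}


def interaction_features(
    weather: dict[str, float], terrain: dict[str, float]
) -> dict[str, float]:
    """Pairwise products of weather and terrain features (names are hypothetical)."""
    return {
        f"{w}_x_{t}": weather[w] * terrain[t]
        for w in weather
        for t in terrain
    }


features = {
    **seasonal_features(iso_week=20),
    **interaction_features({"gdd_cum": 850.0}, {"elevation_m": 320.0, "slope_deg": 8.0}),
}
print(sorted(features))
# ['gdd_cum_x_elevation_m', 'gdd_cum_x_slope_deg', 'week_cos', 'week_sin']
```

Explicit products like these matter most for linear models. Tree-based models can learn many such interactions on their own.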
Trust by Design
Holdout-season testing with MAE tracking checks that predictions generalize to seasons the model has not seen. Versioned outputs allow results to be compared over time and support continuous improvement.
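Per-season MAE tracking against the stated requirement can be sketched as follows. The sketch assumes that predictions and lab measurements are DM percentages paired per field-week, and that the 2% requirement is expressed in DM percentage points. The sample values are made up purely for illustration.

```python
from collections import defaultdict
from collections.abc import Iterable

MAE_REQUIREMENT = 2.0  # assumed to be DM percentage points


def mae_by_season(rows: Iterable[tuple[str, float, float]]) -> dict[str, float]:
    """rows: (season, predicted_dm, measured_dm). Returns the MAE for each season."""
    errors: dict[str, list[float]] = defaultdict(list)
    for season, predicted, measured in rows:
        errors[season].append(abs(predicted - measured))
    return {season: sum(errs) / len(errs) for season, errs in errors.items()}


# Illustrative holdout pairs, not real results.
holdout = [
    ("2022", 21.5, 22.0),
    ("2022", 23.0, 24.5),
    ("2023", 20.0, 19.0),
    ("2023", 25.0, 25.5),
]
for season, mae in sorted(mae_by_season(holdout).items()):
    status = "PASS" if mae <= MAE_REQUIREMENT else "FAIL"
    print(f"{season}: MAE {mae:.2f} ({status})")
# 2022: MAE 1.00 (PASS)
# 2023: MAE 0.75 (PASS)
```

Reporting MAE separately per holdout season, rather than pooling all seasons, shows whether one unusual year is hiding behind a good average.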
Planning and Simulation Tool
An interactive DM dashboard displays predictions with charts, filters, and monitors. An integrated chat assistant answers natural-language queries for custom analysis. Versioned reports ensure traceability and comparability.
Unified Data Foundation
Consistent week-by-week organization across all data sources with automated intake
Quality Controls
Automated checks surface data gaps or anomalies before they impact predictions
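A data-gap check can be sketched by aligning observations to ISO weeks and listing the weeks with no record. This is a minimal sketch. It assumes each source should deliver at least one record per field per week, and the function names are hypothetical.

```python
import datetime as dt


def iso_week(day: dt.date) -> tuple[int, int]:
    """(ISO year, ISO week) key used to align sources week by week."""
    year, week, _ = day.isocalendar()
    return (year, week)


def missing_weeks(
    observed: list[dt.date], start: dt.date, end: dt.date
) -> list[tuple[int, int]]:
    """ISO weeks between start and end (inclusive) that have no observation."""
    seen = {iso_week(d) for d in observed}
    # Align to the Monday of the start week so the last partial week is not skipped.
    monday = start - dt.timedelta(days=start.weekday())
    gaps = []
    while monday <= end:
        key = iso_week(monday)
        if key not in seen:
            gaps.append(key)
        monday += dt.timedelta(days=7)
    return gaps


samples = [dt.date(2024, 1, 3), dt.date(2024, 1, 10), dt.date(2024, 1, 24)]
print(missing_weeks(samples, dt.date(2024, 1, 1), dt.date(2024, 1, 28)))
# [(2024, 3)]
```

Keying on the ISO year and week together, rather than the week number alone, keeps weeks that straddle New Year from colliding.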
Versioned Artifacts
All predictions and reports versioned for audit trail and performance tracking
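One way to version outputs is to derive the version ID from the content itself. Identical reports then map to the same ID, and any change produces a new one. The sketch below is minimal: it hashes a canonical JSON serialization with SHA-256. The report fields, including the `model_version` label, are hypothetical, and the document does not say how its versioning is implemented.

```python
import hashlib
import json


def version_id(report: dict) -> str:
    """Deterministic short ID: same content gives the same ID, any change gives a new one."""
    # sort_keys and fixed separators make the serialization independent of key order.
    canonical = json.dumps(report, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


report = {
    "model_version": "dm-model-v3",  # hypothetical label
    "week": "2024-W20",
    "predictions": [{"field": "F-01", "variety": "V-01", "dm_pct": 22.4}],
}
stamped = {**report, "version_id": version_id(report)}
print(stamped["version_id"])
```

Because the ID depends only on content, re-running an unchanged week reproduces the same version. That makes it easy to tell which reports actually differ when comparing performance over time.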
Change Management
Phased rollout starting with high-value varieties and expanding coverage
Side-by-side comparison with manual estimates for confidence building
Regular accuracy reviews using holdout data and actual harvest results
Training sessions for planning and agronomy teams on dashboard and chat features
Results and Impact
Operational Outcomes
- MAE of 1.2% achieved against a 2% requirement
- Weekly predictions covering all fields and varieties
- Under 24 hours from data collection to predictions
- Dashboard and chat assistant in active daily use
- Average error, in days, in predicting when fields reach their DM threshold significantly reduced
- Data quality defects caught through automated validations
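The days-error metric in the list above can be read as the gap between the predicted and the measured day on which a field first crosses its DM threshold. This is a minimal sketch under several assumptions. It uses weekly DM values per field, linear interpolation between samples, and an illustrative 23% threshold; the document notes that thresholds vary by variety and market but gives no values. The function names are hypothetical.

```python
from typing import Optional


def crossing_day(days: list[int], dm_values: list[float], threshold: float) -> Optional[float]:
    """First day a DM curve reaches the threshold, linearly interpolated; None if never."""
    points = list(zip(days, dm_values))
    if points and points[0][1] >= threshold:
        return float(points[0][0])  # already at or above threshold at the first sample
    for (d0, v0), (d1, v1) in zip(points, points[1:]):
        if v0 < threshold <= v1:
            return d0 + (threshold - v0) * (d1 - d0) / (v1 - v0)
    return None


def mean_days_error(pairs: list[tuple[Optional[float], Optional[float]]]) -> Optional[float]:
    """Mean |predicted - actual| crossing day, skipping fields where either is missing."""
    errors = [abs(p - a) for p, a in pairs if p is not None and a is not None]
    return sum(errors) / len(errors) if errors else None


threshold = 23.0  # illustrative value only
days = [0, 7, 14, 21]
predicted = crossing_day(days, [20.0, 21.5, 22.5, 23.5], threshold)
actual = crossing_day(days, [20.0, 22.0, 23.0, 24.0], threshold)
print(predicted, actual, mean_days_error([(predicted, actual)]))
# 17.5 14.0 3.5
```

Expressing error in days rather than DM percentage points ties the accuracy metric directly to the harvest-scheduling decision it supports.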
Financial View
- Reduced out-of-spec risk through better harvest timing
- Improved price realization from optimal maturity
- Labor cost savings from efficient scheduling
- Logistics optimization through accurate supply forecasts
- Faster decision-making reducing planning overhead
- Fewer quality claims from properly timed harvests