Skill Demand Forecasting and Salary Prediction: A Multi-Granularity Analysis Using XGBoost

1 May

Authors: Md Zahidul Islam Sany, Wubo Zhang

Abstract: The rapid evolution of the labour market makes it difficult for job seekers, employers, and policymakers to anticipate which skills will be in demand and what salaries to expect. Traditional forecasting methods often struggle with large-scale, sparse, and non-linear job advertisement data. In this paper, we address two interconnected problems: forecasting monthly skill demand at multiple granularities (company, region, and occupation levels) and predicting salaries from job attributes. Using real job postings collected between 2021 and 2023, we construct a dataset containing millions of rows of skill demand time series. We apply XGBoost with engineered features (12 lagged values, a rolling 3-month average, and month indicators) to predict future demand. Because many months have zero demand, we evaluate performance separately on non-zero months. Our model achieves a Symmetric Mean Absolute Percentage Error (SMAPE) of 10.01% on active demand, indicating strong predictive accuracy in the months when a skill is actually in demand. For salary prediction, we use job titles, locations, experience levels, and vacancy volume, obtaining an R² of 0.164, which is modest but better than a baseline that always predicts the mean salary. Beyond forecasting, we provide feature importance analysis (the rolling average is the strongest predictor), granularity comparisons (occupation-level forecasts are most accurate), clustering of jobs into four distinct market segments, and correlation analysis (experience correlates most strongly with salary). All code and processed data are publicly available to ensure full reproducibility.
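As an illustration of the feature set and metric named in the abstract, the following is a minimal sketch rather than the authors' implementation. It assumes a single monthly demand series, a rolling average taken over the three months preceding the target month (to avoid leaking the target), one-hot month indicators, and the SMAPE variant that divides by the mean of the absolute actual and predicted values. The function names `build_features` and `smape_nonzero` are hypothetical.

```python
from typing import List, Sequence, Tuple

N_LAGS = 12       # number of lagged monthly values, as stated in the abstract
ROLL_WINDOW = 3   # rolling-average window in months, as stated in the abstract


def build_features(
    demand: Sequence[float], months: Sequence[int]
) -> Tuple[List[List[float]], List[float]]:
    """Build (X, y) rows for every month that has a full 12-month history.

    demand[t] is the demand count in month t; months[t] is its calendar
    month (1..12). Each row holds 12 lags, one rolling mean, and 12 month
    indicators (25 features in total).
    """
    if len(demand) != len(months):
        raise ValueError("demand and months must have the same length")
    X: List[List[float]] = []
    y: List[float] = []
    for t in range(N_LAGS, len(demand)):
        # lag_1 .. lag_12: the twelve months immediately before month t
        lags = [float(demand[t - k]) for k in range(1, N_LAGS + 1)]
        # Assumption: the rolling mean uses only past months (lag_1..lag_3)
        rolling = sum(lags[:ROLL_WINDOW]) / ROLL_WINDOW
        # Assumption: month indicators are encoded one-hot
        month_onehot = [1.0 if months[t] == m else 0.0 for m in range(1, 13)]
        X.append(lags + [rolling] + month_onehot)
        y.append(float(demand[t]))
    return X, y


def smape_nonzero(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """SMAPE in percent, computed only over months with non-zero actual demand."""
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    if not pairs:
        raise ValueError("no months with non-zero actual demand")
    # Denominator is never zero here because abs(a) > 0 for every kept pair
    total = sum(2.0 * abs(p - a) / (abs(a) + abs(p)) for a, p in pairs)
    return 100.0 * total / len(pairs)
```

Under these assumptions, the rows returned by `build_features` could be passed to a gradient-boosted regressor such as `xgboost.XGBRegressor`, and `smape_nonzero` applied to its held-out predictions. Filtering out zero-demand months before scoring keeps months with no demand from dominating the error, since SMAPE takes its maximum value of 200% whenever the actual value is zero and the prediction is not.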

DOI: https://zenodo.org/records/19950109