Authors: Amit Kumar, Mohmmad Darvesh, Rohit Kumar Singh
Abstract: This study investigates how training dataset size affects the performance of logistic regression models on three standard datasets: Iris, Breast Cancer, and Titanic. We train each model on progressively larger proportions of the data and evaluate the resulting accuracy trends. The results show that accuracy improves as more data is added, but the marginal benefit diminishes past a threshold, particularly for simpler datasets. These findings inform data collection practices and highlight dataset complexity as a crucial determinant of model performance.
DOI:
International Journal of Science, Engineering and Technology
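
The procedure summarized in the abstract (fit logistic regression on increasing proportions of the training data and record the accuracy) can be sketched as follows. This is a minimal illustration and not the authors' code. It assumes scikit-learn, a fixed 30% held-out test set, stratified subsampling, feature standardization, and the fractions listed in `FRACTIONS`; none of these settings are stated in the source. The helper name `accuracy_by_train_fraction` is hypothetical. Titanic is left out because it is not bundled with scikit-learn and needs separate loading and preprocessing.

```python
from sklearn.datasets import load_breast_cancer, load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumed training proportions; the paper's actual grid is not given here.
FRACTIONS = [0.1, 0.25, 0.5, 0.75, 1.0]


def accuracy_by_train_fraction(X, y, fractions, seed=0):
    """Return {fraction: test accuracy} for logistic regression.

    Hypothetical helper: a fixed test set is held out once, then the model
    is refit on growing stratified subsets of the remaining training pool.
    """
    # Hold out 30% once so every fraction is scored on the same test data.
    X_pool, X_test, y_pool, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed
    )
    results = {}
    for frac in fractions:
        if frac < 1.0:
            # Keep a stratified subset containing `frac` of the training pool.
            X_train, _, y_train, _ = train_test_split(
                X_pool, y_pool, train_size=frac, stratify=y_pool, random_state=seed
            )
        else:
            X_train, y_train = X_pool, y_pool
        # Standardize features, then fit a fresh model for this fraction.
        model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
        model.fit(X_train, y_train)
        results[frac] = accuracy_score(y_test, model.predict(X_test))
    return results


curves = {}
for name, loader in [("iris", load_iris), ("breast_cancer", load_breast_cancer)]:
    X, y = loader(return_X_y=True)
    curves[name] = accuracy_by_train_fraction(X, y, FRACTIONS)
    for frac, acc in curves[name].items():
        print(f"{name:14s} train fraction {frac:.2f} -> accuracy {acc:.3f}")
```

Scoring every fraction against the same held-out test set keeps the accuracies comparable, so any flattening of the curve reflects the training size and not a change in the evaluation data. Averaging over several seeds would give a smoother trend than this single split.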