Designing an efficient gradient descent based heuristic for clusterwise linear regression for large datasets | Kütüphane.osmanlica.com


Title Designing an efficient gradient descent based heuristic for clusterwise linear regression for large datasets
Author Kayış, Enis
Publication Date: 2021
Place of Publication - Springer
Subject Clusterwise linear regression, Gradient descent, Heuristics
Type Document
Language English
Digital Yes
Manuscript No
Library: Özyeğin Üniversitesi
Accession Number 978-303083013-7
Record Number 82be1ea6-f47c-4fb6-a638-2e69adf99993
Location Industrial Engineering
Date 2021
Abstract Multiple linear regression is a method for quantifying the effects of a set of independent variables on a dependent variable. In clusterwise linear regression problems, data points with similar regression estimates are grouped into the same cluster, either due to a business need or to increase the statistical significance of the resulting regression estimates. In this paper, we consider an extension of this problem in which data points belonging to the same category must belong to the same partition. For large datasets, finding the exact solution is not possible, and many heuristics require an amount of time that grows exponentially in the number of categories. We propose variants of a gradient descent based heuristic to provide high-quality solutions within a reasonable time. The performance of our heuristics is evaluated across 1014 simulated datasets. We find that the comparative performance of the base gradient descent based heuristic is quite good, with an average percentage gap of 0.17% when the number of categories is less than 60. However, starting with a fixed initial partition and restricting cluster assignment changes to be one-directional speeds up the heuristic dramatically with only a moderate decrease in solution quality, especially for datasets with multiple predictors and a large number of categories. For example, one could generate solutions with an average percentage gap of 2.81% in one-tenth of the time for datasets with 400 categories.
DOI 10.1007/978-3-030-83014-4_8
Volume 1446
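The problem setting described in the abstract — clusterwise linear regression where all points sharing a category must land in the same cluster — can be illustrated with a small sketch. This is a generic greedy refit-and-reassign heuristic under a fixed round-robin initial partition, not the authors' gradient descent method; the function names and the reassignment rule are illustrative assumptions.

```python
import numpy as np

def fit_cluster(X, y):
    """Ordinary least squares with an intercept; returns coefficients."""
    A = np.c_[np.ones(len(X)), X]
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

def sse(X, y, beta):
    """Sum of squared errors of (X, y) under the model beta."""
    resid = y - np.c_[np.ones(len(X)), X] @ beta
    return float(resid @ resid)

def clusterwise_regression(X, y, categories, k, iters=50):
    """Greedy heuristic (illustrative, not the paper's method).
    Whole categories move between clusters together, enforcing the
    same-category constraint from the abstract."""
    cats = np.unique(categories)
    # Fixed round-robin initial partition of categories over k clusters.
    assign = {c: i % k for i, c in enumerate(cats)}
    for _ in range(iters):
        # Refit each cluster's regression on its current member points;
        # clusters with too few points to fit are skipped this round.
        betas = {}
        for j in range(k):
            mask = np.isin(categories, [c for c in cats if assign[c] == j])
            if mask.sum() > X.shape[1] + 1:
                betas[j] = fit_cluster(X[mask], y[mask])
        # Move each category to the cluster whose model explains it best.
        changed = False
        for c in cats:
            m = categories == c
            best = min(betas, key=lambda j: sse(X[m], y[m], betas[j]))
            if best != assign[c]:
                assign[c], changed = best, True
        if not changed:
            break
    return assign
```

On two categories generated from clearly different linear models, the loop converges in one pass and keeps the categories in separate clusters; the paper's contribution is doing this efficiently when the number of categories is large.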
