I want to build a regression with multiple variables. In my data, I have n = 23
variables and m = 13000
training examples. Here is the schedule of my training data (apartment area vs price):
Here, the graph shows 13,000 training data. As you can see, this is quite noisy data. My question is: which regression algorithm is more suitable and justified for use in my case. I mean it is logical to use simple linear regression or is it better to use some non-linear regression algorithm.
For clarity, I will give examples. Here is an abstract example of linear regression:
As well as an abstract example of non-linear regression:
Here are some examples with hypothetical regression lines for my data:
As far as I understand, the primitive linear regression for my data will produce a large total error (error cost), since this data is noisy and scattered. On the other hand, there is also no clear non-linear dependence (for example, sinusoidal). Which regression algorithm is more rational to use in my case (apartment prices) in order to get a more accurate prediction of prices. And why is this algorithm (linear or non-linear) more rational?
Addition:
Here is my graph of linear dependence of price on all 23 parameters:
I do not know what the NONLINEAR DEPENDENCE would look like in this case. And it would be more rational than linear.