00:01
Once again, welcome to a new problem.
00:03
This time we're dealing with regression analysis.
00:06
For the most part, you have two options when it comes to regression.
00:11
We have simple linear regression, and then we also have multiple linear regression.
00:27
When it comes to simple linear regression, you have a single independent variable, which we call x.
00:35
The x variable is sometimes called the explanatory variable, or we could also call it the predictor variable. And then we have the y, which is the dependent variable, or sometimes you could call it the response variable or the predicted variable.
01:16
So in terms of regression we have the x and the y and then we'll have a bunch of scatter points.
01:25
And then on top of that we estimate a line of best fit, which we call the regression equation.
01:32
So the purpose of that line is to build a model off of these points, each one of which pairs, say, a quantitative x value with the quantitative y value it is used to predict.
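As a rough sketch of how a line of best fit can be estimated from scatter points, the least-squares slope and intercept can be computed directly. The function name `fit_simple_ols` and the toy data are assumptions made up for illustration; they are not from the housing problem.

```python
import numpy as np

def fit_simple_ols(x, y):
    """Least-squares intercept and slope for one x variable (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_bar, y_bar = x.mean(), y.mean()
    # Slope: how x and y move together, divided by the spread of x
    slope = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    # Intercept: forces the line through the point of means
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Toy points that lie exactly on y = 2 + 3x (made-up data)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 + 3.0 * x
b0, b1 = fit_simple_ols(x, y)
```

With real scatter points the line will not pass through every point, and the leftover gaps are the prediction errors.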
01:52
So in certain instances, the model is nonlinear.
01:57
So if you have data which don't follow a linear pattern, you want to log the dependent variable so that you can end up having a linear relationship.
02:24
Your goal is to find a linear relationship.
02:28
So you log the dependent variable.
02:32
So in this particular problem we're using housing data from the state of Massachusetts, which is sometimes called the Commonwealth of Massachusetts.
02:47
And in that sense, we have a log relationship for the dependent variable, price.
02:56
So we're logging the price variable, which is on the y-axis.
03:03
And the reason why you log the price variable is so that you can build a linear relationship out of a nonlinear relationship.
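Here is a minimal sketch of why logging can turn a curved relationship into a straight one. It assumes a made-up power-law relationship, price = 5 * dist^0.3, and logs both variables; the numbers are hypothetical and only there to show the idea.

```python
import numpy as np

# Hypothetical curved relationship: price = 5 * dist ** 0.3
dist = np.linspace(1.0, 100.0, 50)
price = 5.0 * dist ** 0.3

# After taking logs: log(price) = log(5) + 0.3 * log(dist), a straight line
log_dist = np.log(dist)
log_price = np.log(price)

# np.polyfit with degree 1 returns [slope, intercept]
slope, intercept = np.polyfit(log_dist, log_price, 1)
```

The fitted slope recovers the exponent 0.3 and the intercept recovers log(5), which is what makes the logged version easy to handle with linear regression.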
03:13
And so price relates to the distance, that is, how close a house happens to be to an incinerator.
03:31
So an incinerator, if you think about it, is like a garbage dump where you burn stuff, or something of that sort.
03:41
So there's a relationship between price and the distance from an incinerator.
03:47
I'm assuming that if you have a house and it's close to the incinerator, then the smell or the toxicity will affect whether or not people want to buy the house.
04:06
So house prices will change depending on the distance from an incinerator.
04:13
So in this particular case, our goal is to interpret.
04:17
Our goal is to explain the coefficient on the distance from the incinerator and by coefficient we mean this value right there.
04:41
And then we are asking: is the sign of the coefficient appropriate? That's what we're asking.
05:00
And then the other part of the problem, part two, is asking whether the estimator is unbiased for the simple linear regression.
05:25
We want to see if the estimator is unbiased for the elasticity of price with respect to the distance from the incinerator.
05:51
So pretty much we're asking how much the distance from the incinerator will affect the price. And then in part three we are asking: are there possible factors that influence price other than the distance?
06:24
So potentially you could have other factors.
06:31
And is there a relationship between these factors and the distance? So in terms of simple linear regression, we're going to have y = β0 + β1·x + u, where β0 (beta naught) is the y-intercept, β1 (beta 1) is the slope coefficient, y is the dependent variable, x is the independent variable, and u is the error in prediction.
07:49
Whenever you make predictions by building models, it's possible to have errors.
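To illustrate why it matters whether the other factors sitting in the error term are related to distance, here is a small simulation sketch. Everything in it is an assumption for illustration: the true slope of 0.3, the made-up `other` factor, and its 0.8 link to distance are not taken from the housing data.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable
n = 10_000

log_dist = rng.normal(size=n)
# Hypothetical omitted factor that is correlated with distance
other = 0.8 * log_dist + rng.normal(size=n)
u = rng.normal(size=n)

# Assumed "true" model: distance has slope 0.3, the other factor has slope 0.5
log_price = 1.0 + 0.3 * log_dist + 0.5 * other + u

# Simple regression that leaves the other factor out
slope_simple = np.polyfit(log_dist, log_price, 1)[0]

# The simple slope picks up the omitted factor: roughly 0.3 + 0.5 * 0.8
expected_slope = 0.3 + 0.5 * 0.8
```

In this sketch the simple regression slope lands near 0.7 rather than 0.3, so the estimator is biased. If the omitted factor were unrelated to distance, the simple slope would stay close to 0.3.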
07:57
So looking at the equation, we can interpret the outcome.
08:03
So log price equals 9.40 + 0.312 log distance.
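As a sketch of how to read the slope, assume the estimated equation is log(price) = 9.40 + 0.312 log(dist), with both variables in logs. Under that assumption the slope is an elasticity: a 1% increase in distance goes with roughly a 0.312% increase in predicted price. The function names below are made up for illustration, and the units of distance are not stated in the transcript.

```python
import math

B0 = 9.40   # intercept, assuming log(price) = 9.40 + 0.312 * log(dist)
B1 = 0.312  # slope on log(dist), read as an elasticity

def predicted_log_price(dist):
    """Predicted log(price) at a given distance (hypothetical helper)."""
    return B0 + B1 * math.log(dist)

def pct_change_in_price(pct_change_in_dist):
    """Exact percent change in predicted price for a percent change in distance."""
    ratio = (1.0 + pct_change_in_dist / 100.0) ** B1
    return (ratio - 1.0) * 100.0

effect_of_1pct = pct_change_in_price(1.0)    # close to 0.312
effect_of_10pct = pct_change_in_price(10.0)  # close to, but a bit below, 3.12
```

The positive slope says predicted price rises as a house gets farther from the incinerator. The elasticity reading is an approximation that works best for small percent changes.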
08:14
The sample size is 135, and the R squared is 0.162. R squared is the coefficient of determination, which simply means the share of the variation in price accounted for by distance. We usually state it as a percentage, so here distance accounts for 16.2% of the variation.
08:55
So that means potentially other factors contribute to the differences in price other than the distance.
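For reference, here is a minimal sketch of how R squared is computed from actual and predicted values, followed by the share left unexplained when R squared is 0.162. The helper name `r_squared` and the toy numbers are assumptions for illustration only.

```python
import numpy as np

def r_squared(y, y_hat):
    """Coefficient of determination: 1 - (residual variation / total variation)."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)      # variation the model misses
    ss_tot = np.sum((y - y.mean()) ** 2)   # total variation in y
    return 1.0 - ss_res / ss_tot

# Toy checks with made-up data
y = np.array([1.0, 2.0, 3.0, 4.0])
perfect = r_squared(y, y)                              # predictions match exactly
mean_only = r_squared(y, np.full_like(y, y.mean()))    # always predict the mean

# With the reported R squared of 0.162, the share not accounted for by distance
reported_r2 = 0.162
unexplained_share = 1.0 - reported_r2
```

A perfect fit gives an R squared of 1 and predicting the mean every time gives 0, so 0.162 sits near the low end, leaving about 83.8% of the variation to other factors and noise.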
09:03
Of course, the 135 is the number of homes sold, or houses sold.
09:11
So that's what you're looking at.
09:13
The coefficient of determination is the variation in price accounted for by distance.
09:27
Also, it estimates how close the fitted regression line is to the data points.
09:56
So we're looking at these data points right here.
09:59
And then beta 1, which is the slope coefficient, explains the relationship between price and distance from the incinerator, assuming other independent factors are constant...