CC BY-NC-ND 3.0
Explain / predict temperature as a function of the variables available in the climate data.
## 'data.frame': 465540 obs. of 13 variables:
## $ year : int 2018 2018 2018 2018 2018 2018 2018 2018 2018 2018 ...
## $ month : int 10 10 10 10 10 10 10 10 10 10 ...
## $ day : int 22 22 22 22 22 22 22 22 22 22 ...
## $ hour : int 17 17 18 18 18 18 18 18 18 18 ...
## $ minute : int 58 59 0 1 2 3 4 5 6 7 ...
## $ second : int 1 26 29 31 33 36 38 41 43 45 ...
## $ temperature : num 10.8 10.8 10.8 10.8 10.7 ...
## $ gas : int 11811196 11682953 11569892 11582347 11582347 11582347 11508021 11582347 11508021 11594828 ...
## $ humidity : num 80.3 80 79.7 79.8 80.6 ...
## $ pressure : num 1040 1040 1040 1040 1040 ...
## $ lightVisible: int 264 266 263 267 266 266 264 264 262 264 ...
## $ lightIR : int 318 313 306 307 308 307 298 290 285 283 ...
## $ lightUV : int 4 5 3 4 5 5 4 4 3 4 ...
## 'data.frame': 465540 obs. of 8 variables:
## $ temperature : num 10.8 10.8 10.8 10.8 10.7 ...
## $ gas : int 11811196 11682953 11569892 11582347 11582347 11582347 11508021 11582347 11508021 11594828 ...
## $ humidity : num 80.3 80 79.7 79.8 80.6 ...
## $ pressure : num 1040 1040 1040 1040 1040 ...
## $ lightVisible: int 264 266 263 267 266 266 264 264 262 264 ...
## $ lightIR : int 318 313 306 307 308 307 298 290 285 283 ...
## $ lightUV : int 4 5 3 4 5 5 4 4 3 4 ...
## $ date : POSIXct, format: "2018-10-22 17:58:01" "2018-10-22 17:59:26" ...
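The second structure above replaces the six time components with a single `date` column. The code that did this is not shown. A minimal sketch, assuming the components are combined with base R's `ISOdatetime()` and that the time zone is UTC (both are assumptions, as is the name `bddToy`), could look like this on a two-row toy data frame:

```r
# Toy data mimicking the first two rows of the raw table
# (values copied from the str() output above)
raw <- data.frame(
  year = c(2018L, 2018L), month = c(10L, 10L), day = c(22L, 22L),
  hour = c(17L, 17L), minute = c(58L, 59L), second = c(1L, 26L),
  temperature = c(10.8, 10.8)
)

# Assumption: the timestamp is built with ISOdatetime() in UTC
raw$date <- ISOdatetime(raw$year, raw$month, raw$day,
                        raw$hour, raw$minute, raw$second, tz = "UTC")

# Drop the six component columns, keep the measurements and the new date
timeCols <- c("year", "month", "day", "hour", "minute", "second")
bddToy <- raw[, !(names(raw) %in% timeCols)]
str(bddToy)
```

Storing the timestamp as `POSIXct` means that `lm()` treats it as a number of seconds, which is why the `date` coefficient in the models is so small.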
## corrplot 0.84 loaded
##
## Call:
## lm(formula = temperature ~ . - lightIR - lightUV, data = bdd)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11.620 -1.795 0.056 1.797 16.271
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.285e+01 1.211e+00 51.91 <2e-16 ***
## gas -3.320e-07 1.787e-09 -185.80 <2e-16 ***
## humidity -1.514e-01 3.290e-04 -460.06 <2e-16 ***
## pressure -2.274e-01 2.863e-04 -794.22 <2e-16 ***
## lightVisible 1.551e-02 1.242e-04 124.90 <2e-16 ***
## date 1.223e-07 6.743e-10 181.33 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.828 on 452600 degrees of freedom
## (12934 observations deleted due to missingness)
## Multiple R-squared: 0.8901, Adjusted R-squared: 0.8901
## F-statistic: 7.33e+05 on 5 and 452600 DF, p-value: < 2.2e-16
## Start: AIC=940946.4
## temperature ~ (gas + humidity + pressure + lightVisible + lightIR +
## lightUV + date) - lightIR - lightUV
##
## Df Sum of Sq RSS AIC
## <none> 3618982 940946
## - lightVisible 1 124731 3743713 956281
## - date 1 262907 3881888 972685
## - gas 1 276027 3895008 974212
## - humidity 1 1692417 5311399 1114592
## - pressure 1 5043798 8662780 1335998
##
## Call:
## lm(formula = temperature ~ (gas + humidity + pressure + lightVisible +
## lightIR + lightUV + date) - lightIR - lightUV, data = bdd)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11.620 -1.795 0.056 1.797 16.271
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.285e+01 1.211e+00 51.91 <2e-16 ***
## gas -3.320e-07 1.787e-09 -185.80 <2e-16 ***
## humidity -1.514e-01 3.290e-04 -460.06 <2e-16 ***
## pressure -2.274e-01 2.863e-04 -794.22 <2e-16 ***
## lightVisible 1.551e-02 1.242e-04 124.90 <2e-16 ***
## date 1.223e-07 6.743e-10 181.33 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.828 on 452600 degrees of freedom
## (12934 observations deleted due to missingness)
## Multiple R-squared: 0.8901, Adjusted R-squared: 0.8901
## F-statistic: 7.33e+05 on 5 and 452600 DF, p-value: < 2.2e-16
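The trace above has the shape of a backward selection by AIC as produced by `step()`: each row gives the AIC obtained by dropping one term, and `<none>` has the lowest value, so no variable is removed. A minimal sketch on synthetic data (the data frame `toy` and its coefficients are invented for illustration) shows the same mechanics:

```r
set.seed(1)
n <- 200

# Synthetic data: temperature depends on humidity and pressure, not on noise
toy <- data.frame(
  humidity = runif(n, 20, 90),
  pressure = rnorm(n, 1010, 10),
  noise    = rnorm(n)
)
toy$temperature <- 60 - 0.15 * toy$humidity - 0.04 * toy$pressure + rnorm(n)

# Full model, then AIC-based selection (trace = 0 silences the step table)
full <- lm(temperature ~ ., data = toy)
sel  <- step(full, trace = 0)

formula(sel)
extractAIC(full)
extractAIC(sel)
```

A term is dropped only when its removal lowers the AIC, so the selected model never has a higher AIC than the starting one.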
## Warning: package 'car' was built under R version 3.6.1
## Loading required package: carData
## gas humidity pressure lightVisible date
## 1.092252 2.156110 2.033450 1.295525 1.919477
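The values above are variance inflation factors. For a predictor j, the VIF is 1 / (1 - R²_j), where R²_j comes from regressing that predictor on all the others. The sketch below recomputes this by hand in base R on synthetic predictors (the names `x1`, `x2`, `x3` and `vifManual` are illustrative), rather than calling the `car` package:

```r
set.seed(2)
n <- 500

# Synthetic predictors: x2 is deliberately correlated with x1
x1 <- rnorm(n)
x2 <- 0.7 * x1 + rnorm(n)
x3 <- rnorm(n)
X <- data.frame(x1, x2, x3)

# VIF_j = 1 / (1 - R^2_j), with R^2_j from regressing predictor j on the others
vifManual <- sapply(names(X), function(v) {
  f <- reformulate(setdiff(names(X), v), response = v)
  r2 <- summary(lm(f, data = X))$r.squared
  1 / (1 - r2)
})
round(vifManual, 2)
```

A VIF of 1 means no collinearity. Common rules of thumb only raise concern above 5 or 10, so the values reported above (all below 2.2) suggest that collinearity is not a problem here.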
Residuals vs Fitted
There appears to be a pattern in the data: the residuals are not centred on zero for cold and hot temperatures. There may be a non-linear relationship between some explanatory variables and the response variable.
Normal Q-Q
The residuals are not normally distributed for a few points that seem to correspond to hot temperatures (observations 368431 and 369705).
Scale-Location
Homogeneity of variance does not appear to hold for hot temperature values (and, to a lesser extent, for cold ones).
Residuals vs Leverage
There do not appear to be any extreme points significantly influencing the regression.
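The four panels discussed above are the default diagnostic plots that base R draws for an `lm` object. A minimal sketch on synthetic data (the model `modToy` is illustrative, not the model of this report) shows how they are typically produced:

```r
set.seed(5)
n <- 300

# Synthetic data with a purely linear relationship
toy <- data.frame(humidity = runif(n, 20, 90))
toy$temperature <- 35 - 0.3 * toy$humidity + rnorm(n, sd = 3)
modToy <- lm(temperature ~ humidity, data = toy)

# 2 x 2 layout: Residuals vs Fitted, Normal Q-Q, Scale-Location,
# Residuals vs Leverage (the default panels of plot() for an lm object)
oldPar <- par(mfrow = c(2, 2))
plot(modToy)
par(oldPar)
```

With data that satisfy the model assumptions, as here, the residual panels show no trend. That gives a baseline against which the patterns described above can be compared.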
##
## Shapiro-Wilk normality test
##
## data: resid(mod0X)
## W = 0.99051, p-value = 6.412e-06
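`shapiro.test()` only accepts between 3 and 5000 values, so it cannot be run on the residuals of a model fitted to all 465540 observations. A plausible workflow, sketched here under the assumption that the tested model was fitted on a random subsample of 1000 rows (the names `toy`, `sampleToy` and `modToy` are illustrative), is:

```r
set.seed(3)
n <- 20000

# Synthetic stand-in for the full data set
toy <- data.frame(humidity = runif(n, 20, 90))
toy$temperature <- 35 - 0.3 * toy$humidity + rnorm(n, sd = 3)

# Assumption: draw 1000 random rows so that shapiro.test() can be applied
sampleToy <- sample(nrow(toy), 1000)
modToy <- lm(temperature ~ humidity, data = toy[sampleToy, ])

sw <- shapiro.test(resid(modToy))
sw$statistic
sw$p.value
```

With about a thousand residuals, even a W statistic close to 1 can come with a very small p-value, so the test is best read alongside the Q-Q plot rather than on its own.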
## Warning: package 'lmtest' was built under R version 3.6.1
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
##
## Durbin-Watson test
##
## data: mod0X
## DW = 1.9013, p-value = 0.06184
## alternative hypothesis: true autocorrelation is greater than 0
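The Durbin-Watson statistic is the sum of squared differences between consecutive residuals divided by the sum of squared residuals. It is close to 2 when there is no first-order autocorrelation and falls towards 0 under positive autocorrelation. The sketch below computes it by hand in base R on synthetic data with autocorrelated errors (all names are illustrative); it reproduces the statistic only, not the p-value reported by `lmtest`:

```r
set.seed(4)
n <- 300

# Synthetic regression with AR(1) errors (positive autocorrelation)
x <- rnorm(n)
e <- as.numeric(arima.sim(list(ar = 0.6), n = n))
y <- 1 + 2 * x + e
mod <- lm(y ~ x)

# Durbin-Watson statistic: sum of squared successive differences of the
# residuals divided by the residual sum of squares
res <- resid(mod)
dw <- sum(diff(res)^2) / sum(res^2)

# Lag-1 autocorrelation of the residuals; dw is approximately 2 * (1 - r1)
r1 <- sum(res[-1] * res[-n]) / sum(res^2)
c(dw = dw, approx = 2 * (1 - r1))
```

A value of 1.90 with a p-value of 0.06, as reported above, therefore points to little or no positive autocorrelation in the order in which the residuals were tested.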
## temperature gas humidity pressure lightVisible lightIR lightUV
## 368431 46.08605 2898707 19.30132 984.5260 386 1877 69
## 369705 41.45754 9149660 24.65980 983.9841 364 1782 57
## 220672 13.62492 5797414 35.88843 1005.4435 442 2483 99
## date
## 368431 1563961520
## 369705 1564041769
## 220672 1554459246
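In the table above the `date` column is printed as a raw number of seconds since 1970-01-01. To see when the flagged observations were recorded, the values can be converted back to date-times. The sketch assumes UTC, which the source does not state:

```r
# Numeric dates of the three flagged observations, copied from the table above
flagged <- c("368431" = 1563961520, "369705" = 1564041769, "220672" = 1554459246)

# Assumption: timestamps are seconds since the Unix epoch, interpreted in UTC
flaggedDates <- as.POSIXct(flagged, origin = "1970-01-01", tz = "UTC")
format(flaggedDates, "%Y-%m-%d %H:%M:%S")
```

The first two observations fall in late July 2019, which is consistent with their being the hot-temperature points singled out in the Q-Q plot.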
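# Scatter plot of myY against myX with the fitted regression line.
# Points are coloured by squared residual (green = small, red = large).
getNiceLmPlot <- function(myX, myY, myYlim = c(-10, 50)){
  # Keep only the rows where both variables are observed, as lm() does
  ok <- complete.cases(myX, myY)
  x <- myX[ok]
  y <- myY[ok]
  mod041 <- lm(y ~ x)
  myCol <- colorRampPalette(c("green", "blue", "red"))(101)
  # Squared residuals rescaled to integer indices 1..101 into the palette
  colRank <- (y - predict(mod041))^2
  colRank <- round((colRank - min(colRank)) /
    (max(colRank) - min(colRank)) * 100) + 1
  plot(x = x, y = y,
    col = myCol[colRank], pch = 16,
    axes = FALSE, ylim = myYlim, panel.first = {
      grid()
      axis(1)
      axis(2)
    })
  # Overlay the fitted line, drawn in increasing order of x
  points(
    x = x[order(x)],
    y = predict(mod041)[order(x)],
    type = 'l', lwd = 2)
}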
##
## Shapiro-Wilk normality test
##
## data: resid(mod0X3)
## W = 0.99315, p-value = 0.0001889
##
## Call:
## lm(formula = temperature ~ . - lightIR - lightUV - date - pressure,
## data = bdd[sampleP, ])
##
## Residuals:
## Min 1Q Median 3Q Max
## -14.5021 -3.5643 0.2238 3.7132 15.7982
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.724e+01 1.761e+00 21.151 <2e-16 ***
## gas -7.563e-07 6.314e-08 -11.979 <2e-16 ***
## humidity -3.508e-01 9.241e-03 -37.964 <2e-16 ***
## lightVisible 1.097e-02 4.616e-03 2.376 0.0177 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.892 on 968 degrees of freedom
## (28 observations deleted due to missingness)
## Multiple R-squared: 0.6893, Adjusted R-squared: 0.6883
## F-statistic: 715.8 on 3 and 968 DF, p-value: < 2.2e-16
##
## Call:
## lm(formula = temperature ~ . - lightIR - lightUV - date - pressure,
## data = bdd)
##
## Residuals:
## Min 1Q Median 3Q Max
## -16.6116 -3.6318 0.3265 3.4646 17.9590
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.653e+01 8.336e-02 438.24 <2e-16 ***
## gas -7.753e-07 3.019e-09 -256.81 <2e-16 ***
## humidity -3.419e-01 4.393e-04 -778.29 <2e-16 ***
## lightVisible 1.124e-02 2.153e-04 52.18 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.955 on 452602 degrees of freedom
## (12934 observations deleted due to missingness)
## Multiple R-squared: 0.6626, Adjusted R-squared: 0.6626
## F-statistic: 2.962e+05 on 3 and 452602 DF, p-value: < 2.2e-16
Temperature appears to be linearly related to relative humidity, volatile organic compounds and the amount of light, but non-linearly related to atmospheric pressure and to the date (for the latter, this was to be expected).
Not everything can be explained with a linear model!