First, what do people mean when they use the term “January Effect” regarding a stock investment? As defined by a legitimate source (Wikipedia…), “The January effect is a hypothesis that there is a seasonal anomaly in the financial market where securities’ prices increase in the month of January more than in any other month.” Believers support this hypothesis by arguing that investors sell in December for tax-loss harvesting reasons and then buy the stocks back in January. I don’t believe it, and neither should you. At one point it may actually have been true; however, once it became public information, any “effect” would have been eroded.

I performed my own January experiment, though I’m sure that somewhere out there in the myriad of financial articles someone has already beaten me to it. This is a behavioral finance study: if stocks (and by stocks I mean the S&P 500) are down for the month of January, do investors “flee” from stocks, giving us a bad following eleven months?

Let’s start with a simple linear regression. We will have two variables. The dependent variable (Y) will be the cumulative return of the eleven months *after* January. The explanatory variable (X) will be the return for January. We will use data from Yahoo Finance and analyze only the price movement, which means we ignore dividends. The time period will be from 1980 to 2015. Below are the results.

```
Residuals:
     Min       1Q   Median       3Q      Max
-0.43070 -0.06902  0.01152  0.09488  0.28261

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.08493    0.02558   3.320  0.00216 **
janreturn    0.14626    0.51329   0.285  0.77741
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1518 on 34 degrees of freedom
Multiple R-squared:  0.002383,  Adjusted R-squared:  -0.02696
F-statistic: 0.0812 on 1 and 34 DF,  p-value: 0.7774
```

What do the statistics output and graph tell us? First, let’s review the assumptions that must hold true when conducting a regression test. There are four of them:

- Linearity between variables – For example, if X increases by 1 percentage point, then Y increases by 2 percentage points. A graph is a good way to quickly tell if there is a linear relationship.
- Normal distribution of errors – The errors follow a normal (bell curve) distribution centered on zero.
- Errors are independent – The errors do not exhibit a pattern. This can be shown through a graph.
- Equal variance – The errors must have the same variance.

From the above graph, we see there is minimal linear relationship between the X and Y variables. If you were to draw a “least-squares” line, it would be very flat and would cross the Y-axis near zero. Therefore, one assumption is already violated. The other three assumptions involve the errors of the regression. There are actually multiple ways to display the errors. Below are the graphs.

There is the residual (aka the error; this is calculated as the actual observation of Y minus the predicted value of Y). This value is on the Y-axis of three of the four graphs. The other value is called the “studentized” residual. Simply stated, this helps flag outlier observations: if a point lies outside of ±3, we may be concerned, especially since we have a relatively small sample size of 36. Now, what we are looking for is a “cloud” of points, and in three of the four graphs that is exactly what we see. However, the first graph shows nearly a straight line: when the residuals/errors are plotted against the Y variable, a pattern is detected. This violates another assumption – equal variance. You can tell that as the Y value increases, so does the residual/error.
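The “studentized” residual mentioned above can be computed from the hat matrix of the regression. A minimal NumPy sketch, using synthetic return series as stand-ins for the real data (the original Yahoo Finance series isn’t shown):

```python
import numpy as np

# Hypothetical stand-ins for the real data.
rng = np.random.default_rng(1)
x = rng.normal(0.01, 0.05, 36)   # January returns
y = rng.normal(0.08, 0.15, 36)   # cumulative Feb-Dec returns

X = np.column_stack([np.ones_like(x), x])
H = X @ np.linalg.inv(X.T @ X) @ X.T    # hat (projection) matrix
resid = y - H @ y                       # ordinary residuals
s2 = resid @ resid / (len(y) - 2)       # residual variance on 34 df

# Internally studentized residual: e_i / (s * sqrt(1 - h_ii)).
studentized = resid / np.sqrt(s2 * (1 - np.diag(H)))

# Flag anything outside the +/- 3 rule of thumb mentioned above.
print(int((np.abs(studentized) > 3).sum()), "possible outliers")
```

Dividing each residual by its own estimated standard deviation puts them all on a common scale, which is what makes the ±3 cutoff meaningful.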

We now have at least two violations of the four assumptions, which means these variables are not well suited to a linear regression. In other words, the return of January does not predict the rest of the year’s return. We can see this statistically in the output: for the janreturn (X) coefficient, the t stat is below 2 in absolute value. This means that we are not 95% confident that January’s return predicts the cumulative returns for the other months. In other words, this regression is not statistically significant – it does not explain anything!
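For concreteness, the whole simple-regression test can be sketched in Python with NumPy. Synthetic returns stand in for the Yahoo Finance data, so the numbers will not match the output above; the variable names are illustrative:

```python
import numpy as np

# Synthetic stand-ins: 36 years of January returns (X) and
# cumulative February-December returns (Y).
rng = np.random.default_rng(0)
jan = rng.normal(0.01, 0.05, 36)
feb_dec = rng.normal(0.08, 0.15, 36)

# Fit Y = a + b*X by ordinary least squares.
X = np.column_stack([np.ones_like(jan), jan])
beta, *_ = np.linalg.lstsq(X, feb_dec, rcond=None)
resid = feb_dec - X @ beta

# t statistic for the slope: estimate divided by its standard error.
dof = len(feb_dec) - 2                  # 36 observations - 2 parameters
s2 = resid @ resid / dof
se_slope = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
print("slope =", beta[1], "t =", beta[1] / se_slope)
```

The 34 degrees of freedom fall out of the same arithmetic as in the R output: 36 yearly observations minus the two estimated parameters.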

Let’s not stop there. We tried a simple regression; now let’s try a logistic regression. This is used when we want our Y variable to be binary – 0 or 1. We will classify 0 to mean that the cumulative return between February and December is negative or zero, and 1 to mean that the cumulative return is positive. Our X variable will remain the same. Below is the output.

```
Deviance Residuals:
   Min      1Q  Median      3Q     Max
-1.333   1.068   1.144   1.187   1.335

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept)  1.254262   0.405750   3.091  0.00199 **
janconvert  -0.001984   0.081380  -0.024  0.98055
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 38.139  on 35  degrees of freedom
Residual deviance: 38.138  on 34  degrees of freedom
AIC: 42.138

Number of Fisher Scoring iterations: 4
```

The result is the same as before: the coefficient of our X variable is not statistically significant. (For this study, we can read the z stat the same way we read the t stat.)
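Under the hood, R’s `glm()` fits this model by the Newton-Raphson iteration it reports as “Fisher Scoring.” A minimal NumPy sketch of the same procedure, run on synthetic data (the names `jan` and `up` are illustrative, not from the original study):

```python
import numpy as np

def logistic_fit(x, y, iters=25):
    """Fit P(y = 1) = 1/(1 + exp(-(a + b*x))) by Newton-Raphson,
    the iteration R's glm() reports as "Fisher Scoring"."""
    X = np.column_stack([np.ones_like(x), x])
    beta = np.zeros(2)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))        # predicted probabilities
        grad = X.T @ (y - p)                       # score vector
        hess = X.T @ (X * (p * (1 - p))[:, None])  # Fisher information
        beta = beta + np.linalg.solve(hess, grad)
    se = np.sqrt(np.diag(np.linalg.inv(hess)))     # Wald standard errors
    return beta, beta / se                         # estimates and z values

# Synthetic stand-ins for the real series.
rng = np.random.default_rng(2)
jan = rng.normal(0.01, 0.05, 36)              # January returns
up = (rng.random(36) < 0.75).astype(float)    # 1 = Feb-Dec positive
beta, z = logistic_fit(jan, up)
print("coefficients:", beta, "z values:", z)
```

The z value in the R output is exactly this ratio of a coefficient to its Wald standard error.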

Okay, one last shot, still using logistic regression. Let’s convert our X variable into a binary variable, similar to the Y variable in our second regression. Instead of caring about the actual return in January, we just care whether it’s positive or negative. Any luck?

```
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.8465   0.6335   0.6335   0.8203   0.8203

Coefficients:
               Estimate Std. Error z value Pr(>|z|)
(Intercept)      0.9163     0.5916   1.549    0.121
januaryyesorno   0.5878     0.8097   0.726    0.468

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 38.139  on 35  degrees of freedom
Residual deviance: 37.614  on 34  degrees of freedom
AIC: 41.614

Number of Fisher Scoring iterations: 4
```

Unfortunately, again there is no statistical evidence that the return of January accurately predicts the return for the rest of the year.
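With both variables binary, this logistic regression is a 2x2 table in disguise: the intercept is the log-odds of a positive Feb–Dec period when January was down, and the slope is the log odds ratio between January-up and January-down years. A sketch with hypothetical counts (chosen so that they reproduce the coefficients in the output above; the actual yearly tallies are not shown):

```python
import numpy as np

# Hypothetical 2x2 counts:
#                  Feb-Dec up   Feb-Dec down
#   January down       10             4
#   January up         18             4
up0, down0 = 10, 4
up1, down1 = 18, 4

intercept = np.log(up0 / down0)          # log-odds when X = 0
slope = np.log(up1 / down1) - intercept  # log odds ratio

# Wald standard error of a log odds ratio: sqrt of summed reciprocal counts.
se_slope = np.sqrt(1/up0 + 1/down0 + 1/up1 + 1/down1)
z = slope / se_slope
print(round(intercept, 4), round(slope, 4), round(z, 3))  # 0.9163 0.5878 0.726
```

Seen this way, the insignificant z value just says the up/down split after a positive January is not convincingly different from the split after a negative one.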

I know I said that was our last try, but I want to pull one last trick to see if we can find some significance: I will assume no intercept for this final regression. Below is the output.

```
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.8465   0.6335   0.6335   1.1774   1.1774

Coefficients:
               Estimate Std. Error z value Pr(>|z|)
januaryyesorno   1.5041     0.5528   2.721  0.00651 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 49.907  on 36  degrees of freedom
Residual deviance: 40.270  on 35  degrees of freedom
AIC: 42.27

Number of Fisher Scoring iterations: 4
```

Look at that!! We discovered the key to the market! This equation is technically statistically significant, which means the January return explains the rest of the year. We will make millions together!!!

But wait a minute.

Actually, all this proves is that you can tinker with data until you find something you like, aka data mining. Would you trust hundreds of thousands or millions of dollars to the above equation? Neither would I.
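One way to see why the no-intercept “significance” is hollow: dropping the intercept pins the negative-January baseline at a 50/50 coin flip, so the coefficient’s z test really asks whether positive-January years are followed by up markets more often than chance. Since most years are up regardless of January, that bar is almost guaranteed to be cleared. A sketch, using hypothetical counts consistent with the reported coefficient:

```python
import numpy as np

# Hypothetical counts among positive-January years (chosen so they
# reproduce the 1.5041 estimate in the output above): 18 up, 4 down.
up1, down1 = 18, 4

# With no intercept, negative-January years are fixed at P(up) = 0.5,
# so the MLE is just the log-odds within the January-up group and the
# z test asks whether that group beats a 50/50 coin flip.
beta = np.log(up1 / down1)
se = np.sqrt(1/up1 + 1/down1)
print(round(beta, 4), round(se, 4), round(beta / se, 3))  # 1.5041 0.5528 2.721
```

In other words, the “significant” coefficient measures the market’s general tendency to go up, not anything January tells us.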

This leads into my next article, which will talk about how 2+2 may not always equal 4, meaning numbers can lie. Or rather, numbers can be deceitful.

As always, this is for informational and educational purposes. Nothing contained herein constitutes tax, legal, insurance or investment advice, or the recommendation of or an offer to sell, or the solicitation of an offer to buy or invest in any investment product, vehicle, service or instrument. What has worked in the past may not work in the future. Past returns do not guarantee future results.