Tuesday, June 19, 2012

Head First Data Analysis - Regression: Prediction using Clojure

I’ve been reading an excellent book from the Head First series: Head First Data Analysis. The chapter on regression contains a problem that the book solves using R. I have translated the solution to Clojure using Incanter.

The problem is pretty simple. We have a dataset that records the raise each employee received, the raise they requested, whether they negotiated the raise, their gender and the year. People want to know what to ask for, and what they’ll get given what they’ve asked for. The regression line predicts the raise a person will receive from the raise they request.

Here is the plot with the data points. Each data point is a person who negotiated their raise. The x-axis shows how much they requested as a raise and the y-axis shows how much they received. The regression line is drawn in blue.

Here is the code:
(use '(incanter core stats charts io))

(def salaries
  (read-dataset "http://www.headfirstlabs.com/books/hfda/hfda_ch10_employees.csv"
                :header true))

;; Keep only the rows where the negotiated column (index 3) is TRUE
(def negotiated (sel salaries :filter #(= (nth % 3) "TRUE")))

;; Raise requested and raise received for those rows
(def requested (sel negotiated :cols :requested))
(def received  (sel negotiated :cols :received))

;; Scatter plot: requested on the x-axis, received on the y-axis
(def salary-plot (scatter-plot requested received))
(view salary-plot)

;;; Calculate the correlation coefficient
(correlation requested received)
;;; => 0.6656481025557299

;;; Calculate the linear model.
;;; linear-model takes the response (y) first and the predictor (x) second,
;;; so this fits received as a function of requested.
(def salary-lm (linear-model received requested))

;;; The coefficients, as (intercept slope)
(:coefs salary-lm)
;;; With the arguments the other way round, (linear-model requested received),
;;; the fit is requested as a function of received and the coefficients are
;;; (3.0297198624654484 0.6110990511023138)

;; Add the regression line to the scatter plot:
;; y = intercept + slope * x, using the fitted coefficients
(defn reg-line [x]
  (let [[intercept slope] (:coefs salary-lm)]
    (+ intercept (* slope x))))
(add-function salary-plot reg-line 0 22)
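
To make it clearer what the linear model is doing, here is a rough sketch of a one-variable least-squares fit in plain Clojure, with no Incanter required. The names `fit-line`, `predict`, `xs` and `ys` are my own, and the four data points are made up for illustration; they are not taken from the employees CSV. Like Incanter's `linear-model`, `fit-line` takes the response first and the predictor second.

```clojure
;; Illustrative data only (hypothetical, not from the employees CSV).
(def xs [2.0 4.0 6.0 8.0])   ; predictor, e.g. raise requested
(def ys [3.0 5.0 5.0 7.0])   ; response, e.g. raise received

(defn mean [coll]
  (/ (reduce + coll) (count coll)))

(defn fit-line
  "Least-squares fit of y = intercept + slope * x.
  Takes the response ys first and the predictor xs second.
  Returns [intercept slope]."
  [ys xs]
  (let [mx    (mean xs)
        my    (mean ys)
        ;; sum of products of deviations from the means
        sxy   (reduce + (map #(* (- %1 mx) (- %2 my)) xs ys))
        ;; sum of squared deviations of the predictor
        sxx   (reduce + (map #(* (- % mx) (- % mx)) xs))
        slope (/ sxy sxx)]
    [(- my (* slope mx)) slope]))

(defn predict
  "Value of the fitted line at x."
  [[intercept slope] x]
  (+ intercept (* slope x)))

(def line (fit-line ys xs))
(println "coefficients:" line)
(println "prediction at x = 10:" (predict line 10.0))
```

Swapping the two arguments to `fit-line` fits a different line (x as a function of y), so the order matters whenever the goal is to predict one specific column from the other.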