incanter
Clojure-based, R-like statistical computing and graphics environment for the JVM
Incanter: Statistical Computing and Graphics Environment for Clojure
I'm trying to import a CSV file with rows of many different lengths into Incanter using the read-dataset function. Unfortunately, it appears to truncate the rows down to the length of the first row. Short of reordering the dataset, or searching for the largest row and adding a row at the top of that width, is there a way to solve this problem? The documentation doesn't seem to offer any optional parameters to read-dataset.
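The second workaround mentioned above (making every row as wide as the widest one) can be done as a preprocessing pass before the file ever reaches read-dataset. Here is a minimal sketch in Python using only the standard csv module. The helper names `pad_rows` and `pad_csv_text` and the empty-string fill value are assumptions for illustration, and the sketch says nothing about what read-dataset itself supports.

```python
import csv
import io


def pad_rows(rows, fill=""):
    """Pad every row with `fill` so all rows match the widest row."""
    rows = [list(r) for r in rows]
    width = max((len(r) for r in rows), default=0)
    return [r + [fill] * (width - len(r)) for r in rows]


def pad_csv_text(text, fill=""):
    """Turn ragged CSV text into rectangular CSV text."""
    padded = pad_rows(csv.reader(io.StringIO(text)), fill)
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(padded)
    return out.getvalue()


# Hypothetical ragged input: the first row is the shortest but one.
ragged = "a,b\n1,2,3,4\n5\n"
print(pad_csv_text(ragged))
```

After this pass the first row has the full width, so a reader that sizes columns from the first row no longer has anything to truncate.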
Source: (StackOverflow)
I am learning Clojure - it's a lot of fun! I am trying to use Incanter and Clojure Soup in the same file:
(require '[jsoup.soup :as soup])
(use '(incanter core stats io charts datasets))
And I get the following error:
CompilerException java.lang.IllegalStateException: $ already refers to: #'jsoup.soup/$ in namespace: user, compiling
I think I understand why, but how can I solve this problem? Appreciate this website and all the gurus on it!
Thanks.
Source: (StackOverflow)
I'd like to be able to transform an individual column in an Incanter dataset and save the resulting dataset to a new (CSV) file. What is the simplest way to do that?
Essentially, I'd like to be able to map a function over a column in the data set, and replace the original column with this result.
Source: (StackOverflow)
I'm currently looking into Clojure and Incanter as an alternative to R. (Not that I dislike R, but it is just interesting to try out new languages.) I like Incanter and find the syntax appealing, but vectorized operations are quite slow compared to, e.g., R or Python.
As an example I wanted to get the first order difference of a vector using Incanter vector operations, Clojure map and R. Below is the code and timing for all versions. As you can see R is clearly faster.
Incanter and Clojure:
(use '(incanter core stats))
(def x (doall (sample-normal 1e7)))
(time (def y (doall (minus (rest x) (butlast x)))))
"Elapsed time: 16481.337 msecs"
(time (def y (doall (map - (rest x) (butlast x)))))
"Elapsed time: 16457.850 msecs"
R:
rdiff <- function(x){
  n = length(x)
  x[2:n] - x[1:(n-1)]
}
x = rnorm(1e7)
system.time(rdiff(x))
user system elapsed
1.504 0.900 2.561
So I was wondering: is there a way to speed up the vector operations in Incanter/Clojure? Solutions involving the use of loops, Java arrays and/or libraries from Clojure are also welcome.
I have also posted this question to the Incanter Google group, with no responses so far.
UPDATE: I have marked Jouni's answer as accepted; see below for my own answer, where I have cleaned up his code a bit and added some benchmarks.
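For reference, the operation being timed is just "each element minus its predecessor", computed by index over a packed array of doubles. A minimal sketch in Python with the standard array module follows. The name `first_difference` is made up for illustration, and this only shows the shape a primitive-array loop would take. It is not a benchmark and not the accepted answer's code.

```python
from array import array
import random


def first_difference(x):
    """Return x[i+1] - x[i] for each adjacent pair, packed into a double array."""
    return array("d", (x[i + 1] - x[i] for i in range(len(x) - 1)))


# Small demonstration on normally distributed samples (size kept small on purpose).
random.seed(0)
samples = array("d", (random.gauss(0.0, 1.0) for _ in range(10)))
diffs = first_difference(samples)
print(len(diffs))  # one element fewer than the input
```

Indexing into one contiguous array avoids building the two intermediate sequences that (rest x) and (butlast x) produce.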
Source: (StackOverflow)
I'm looking to use Clojure and Incanter for processing of a large scientific dataset; specifically, the 0.5 degree version of this dataset (only available in binary format).
My question is, what recommendations do you have for elegant ways to deal with this problem in Java/Clojure? Is there a simple way to get this dataset into Incanter, or some other java matrix package?
I managed to read the binary data into a java.nio.ByteBuffer using the following code:
(defn to-float-array [^String str]
  (-> (io/to-byte-array (io/to-file str))
      java.nio.ByteBuffer/wrap
      (.order java.nio.ByteOrder/LITTLE_ENDIAN)))
Now, I'm really struggling with how I can begin to manipulate this ByteBuffer as an array. I've been using Python's NumPy, which makes it very easy to manipulate these huge datasets. Here's the Python code for what I'm looking to do:
import numpy as np

# reshape row vector into (time, lat_slices, lon_slices)
# then cut out every other row
rain_data = np.fromfile("path/to/file", dtype="f")
rain_data = rain_data.reshape(24, 360, 720)
rain_data = rain_data[0:23:2, :, :]
After this slicing, I want to return a vector of these twelve arrays. (I need to manipulate them each separately as future function inputs.)
So, any advice on how to get this dataset into Incanter would be much appreciated.
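Since a ByteBuffer port has to do the index arithmetic by hand, it may help to see the reshape and slice spelled out without NumPy. The sketch below uses only Python's struct module and assumes the layout implied above: little-endian 32-bit floats in row-major (time, lat, lon) order. The function names are hypothetical, and the demo uses a tiny 4 x 2 x 3 grid rather than the real 24 x 360 x 720 one.

```python
import struct


def read_floats_le(raw):
    """Decode a bytes object as little-endian 32-bit floats."""
    count = len(raw) // 4
    return struct.unpack("<%df" % count, raw[:count * 4])


def time_slices(flat, n_time, n_lat, n_lon, step=2):
    """Split a flat row-major (time, lat, lon) vector into one grid per kept time index.

    Every `step`-th time index is kept, starting at 0, which matches the
    NumPy slice [0:23:2, :, :] when n_time is 24 and step is 2.
    """
    plane = n_lat * n_lon
    if len(flat) != n_time * plane:
        raise ValueError("flat vector does not match the requested shape")
    grids = []
    for t in range(0, n_time, step):
        start = t * plane  # offset of the first value of time step t
        grids.append([flat[start + r * n_lon:start + (r + 1) * n_lon]
                      for r in range(n_lat)])
    return grids


# Tiny stand-in for the real file contents: 4 time steps of a 2 x 3 grid.
raw = struct.pack("<24f", *[float(i) for i in range(24)])
grids = time_slices(read_floats_le(raw), n_time=4, n_lat=2, n_lon=3)
print(len(grids))
```

The same offsets (t * n_lat * n_lon for a time step, plus r * n_lon for a row) carry over to a FloatBuffer view obtained with ByteBuffer.asFloatBuffer().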
Source: (StackOverflow)
After finding this enormously helpful guide in R, it got me wondering how I might do something similar in Incanter. Being relatively new to Incanter, it would be lovely if someone could reproduce this answer.
In addition to illustrating a nested model, that answer also included a good discussion of how to iteratively generate a list of un-nested models. I'd be curious as to what the most idiomatic way of doing that in Clojure/Incanter is.
Source: (StackOverflow)
I would like to use Clojure's Incanter, but I'd like to mix in calls to Python's extensive Numpy/Scipy numerical libraries. Is there an interoperability bridge between Incanter and Numpy that allows an embedded runtime of CPython to be run from Clojure and that interconverts Numpy's and Incanter's matrix data structures?
Jython isn't sufficient since Numpy requires CPython.
I am aware of (but have never used) http://jepp.sourceforge.net/, which allows Java programs to control an embedded CPython runtime -- but Numpy/Incanter matrix interconversion is still needed.
I'm looking for something similar to https://github.com/jolby/rincanter (which I have also not yet used) but for CPython/Numpy instead of R.
Source: (StackOverflow)
I am following the linear regression example here
(use '(incanter core stats datasets))
(def plant-growth (to-matrix (get-dataset :plant-growth) :dummies true))
(def y (sel plant-growth :cols 0))
(def x (sel plant-growth :cols [1 2]))
(def lm (linear-model y x))
However I get this error:
=> (def lm (linear-model y x))
ClassCastException clojure.lang.LazySeq cannot be cast to java.lang.Number clojure.lang.Numbers.lt (Numbers.java:219)
What is going on here?
Update: This example from the latest 1.4.1 (Stable) docs does not work either:
(use '(incanter core stats datasets charts))
(def iris (to-matrix (get-dataset :iris) :dummies true))
(def y (sel iris :cols 0))
(def x (sel iris :cols (range 1 6)))
(def iris-lm (linear-model y x)) ; with intercept term
Output:
=> (def iris-lm (linear-model y x))
ClassCastException clojure.lang.LazySeq cannot be cast to java.lang.Number clojure.lang.Numbers.lt (Numbers.java:219)
I'm using Clojure 1.5.1 and Incanter 1.4.1. Is this a bug that needs fixing? Where can I find authoritative, working examples?
Source: (StackOverflow)
I'm trying to implement a simple logistic regression example in Clojure using the Incanter data analysis library. I've successfully coded the Sigmoid and Cost functions, but Incanter's BFGS minimization function seems to be causing me quite some trouble.
(ns ml-clj.logistic
  (:require [incanter.core :refer :all]
            [incanter.optimize :refer :all]))

(defn sigmoid
  "compute the inverse logit function, large positive numbers should be
  close to 1, large negative numbers near 0,
  z can be a scalar, vector or matrix.
  sanity check: (sigmoid 0) should always evaluate to 0.5"
  [z]
  (div 1 (plus 1 (exp (minus z)))))

(defn cost-func
  "computes the cost function (J) that will be minimized
  inputs:params theta X matrix and Y vector"
  [X y]
  (let
    [m (nrow X)
     init-vals (matrix (take (ncol X) (repeat 0)))
     z (mmult X init-vals)
     h (sigmoid z)
     f-half (mult (matrix (map - y)) (log (sigmoid (mmult X init-vals))))
     s-half (mult (minus 1 y) (log (minus 1 (sigmoid (mmult X init-vals)))))
     sub-tmp (minus f-half s-half)
     J (mmult (/ 1 m) (reduce + sub-tmp))]
    J))
When I try (minimize (cost-func X y) (matrix [0 0])), giving minimize a function and starting params, the REPL throws an error.
ArityException Wrong number of args (2) passed to: optimize$minimize clojure.lang.AFn.throwArity (AFn.java:437)
I'm very confused as to what exactly the minimize function is expecting.
For reference, I rewrote it all in Python, and all of the code runs as expected, using the same minimization algorithm.
import numpy as np
import scipy as sp
import scipy.optimize  # make sure the optimize submodule is loaded

data = np.loadtxt('testSet.txt', delimiter='\t')
X = data[:,0:2]
y = data[:, 2]

def sigmoid(X):
    return 1.0 / (1.0 + np.e**(-1.0 * X))

def compute_cost(theta, X, y):
    m = y.shape[0]
    h = sigmoid(X.dot(theta.T))
    J = y.T.dot(np.log(h)) + (1.0 - y.T).dot(np.log(1.0 - h))
    cost = (-1.0 / m) * J.sum()
    return cost

def fit_logistic(X,y):
    initial_thetas = np.zeros((len(X[0]), 1))
    myargs = (X, y)
    theta = sp.optimize.fmin_bfgs(compute_cost, x0=initial_thetas,
                                  args=myargs)
    return theta
outputting
Current function value: 0.594902
Iterations: 6
Function evaluations: 36
Gradient evaluations: 9
array([ 0.08108673, -0.12334958])
I don't understand why the Python code can run successfully, but my Clojure implementation fails. Any suggestions?
Update: rereading the docstring for minimize, I've been trying to calculate the derivative of cost-func, which throws a new error.
(def grad (gradient cost-func (matrix [0 0])))
(minimize cost-func (matrix [0 0]) (grad (matrix [0 0]) X))
ExceptionInfo throw+: {:exception "Matrices of different sizes cannot be differenced.", :asize [2 1], :bsize [1 2]} clatrix.core/- (core.clj:950)
Using trans to convert the 1xn row matrix to an nx1 column matrix just yields the same error with the sizes swapped:
:asize [1 2], :bsize [2 1]}
I'm pretty lost here.
Source: (StackOverflow)
I'm trying to include a legend in an Incanter chart, but I'm having some trouble getting what I want:
I want to be able to instantiate a chart with no data first (using [] [] as my x y arguments), then add the data points in a separate step. However, the only way to add a legend is to specify :legend true after the initial x y points are given in the constructor. I cannot specify :legend true without x y arguments, and I have not found any add-legend function.
The legend option captures the code I use when adding the chart data, which means that if I don't want ugly code to appear in the legend, I have to create nice-looking vars for the X and Y points rather than just calling a function inline.
Therefore the legend that is created includes the [] [] used when creating the blank plot, the function calls used when getting the data for the points, and the name-mangled anonymous function (fn*[p1__3813#](second p1__3813#)), which is non-communicative to consumers of my chart.
I just want to be able to associate a string with each group of points in the legend, as in MATLAB, Excel, etc.
Here is my current code:
(def lux-ratios-plot
  (doto (scatter-plot [] [] :legend true
                      :title "Lux/CH0 vs. CH1/CH0"
                      :x-label "CH1/CH0"
                      :y-label "Lux/CH0")
    (view)))

(doseq [dut [incs hals cfls leds]]
  (add-points lux-ratios-plot (get-vals :CH1/CH0 dut) (get-vals :Lux/CH0 dut) :points true))

; Show the trend line for each bulb
(doseq [fit [inc-fit hal-fit cfl-fit led-fit]]
  (add-lines lux-ratios-plot (map #(second %) (:x fit)) (:fitted fit)))
Therefore is there any way in Incanter plots to specify a legend string with each (add-lines ...) or (add-points ...) call?
Thanks a lot
Michael
Source: (StackOverflow)
I'm using Incanter and Parallel Colt for a project, and need to have a function that returns the modified Bessel function of an order n for a value v.
The Colt library has two methods for the modified Bessel function, for order 0 and order 1, but beyond that only a method that returns the ordinary Bessel function of order n for a value v (cern.jet.math.tdouble.Bessel/jn).
I'm trying to build the R function dskellam(x, lambda1, lambda2) for the Skellam distribution in Clojure/Java.
Is there something I can do with the return value of the Bessel method to convert it to a modified Bessel?
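For context, the ordinary and modified functions are related by I_n(x) = i^(-n) J_n(ix), but that identity needs an imaginary argument, so a real-valued jn is of no direct help as far as I can tell. One alternative is to sum the power series for I_n directly. The Python sketch below does that and plugs it into the Skellam mass function, assuming the usual parameterisation (the difference of two independent Poisson variables with means lambda1 and lambda2). The names `bessel_i` and `skellam_pmf` and the fixed term count are illustrative choices, not part of Colt or of R's dskellam.

```python
import math


def bessel_i(n, x, terms=60):
    """Modified Bessel function of the first kind, I_n(x), for integer n >= 0.

    Sums the series sum_m (x/2)^(2m+n) / (m! (m+n)!), updating each term
    from the previous one to avoid huge factorials.
    """
    half = x / 2.0
    term = 1.0
    for k in range(1, n + 1):  # first term: (x/2)^n / n!
        term *= half / k
    total = term
    for m in range(1, terms):
        term *= half * half / (m * (m + n))
        total += term
    return total


def skellam_pmf(k, lambda1, lambda2):
    """P(N1 - N2 = k) for independent Poisson N1, N2 with means lambda1, lambda2."""
    scale = math.exp(-(lambda1 + lambda2)) * (lambda1 / lambda2) ** (k / 2.0)
    return scale * bessel_i(abs(k), 2.0 * math.sqrt(lambda1 * lambda2))


print(skellam_pmf(1, 2.2, 1.5))
```

The fixed number of terms is adequate for small arguments like these. Large means would call for a scaled or asymptotic evaluation instead.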
Source: (StackOverflow)
I'm new to Incanter and was wondering what a vectorized solution for creating a matrix from the pairwise products of two lists would look like. To be clearer, I have two lists that I create with:
(def x (pdf-poisson (range 4) :lambda 2.2))
(def y (pdf-poisson (range 4) :lambda 1.5))
I would now like a 4x4 matrix M such that M(1,1) is the product of x(1) and y(1), M(1,2) is the product of x(1) and y(2) etc.
Taking the outer product in Octave is easy, so I was hoping that Incanter supports this as well.
I can easily hand code this by mapping a function across the vectors, but wanted an idiomatic or vectorized approach, if that is possible.
Thanks,
JT
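To pin down the target: in matrix terms the 4x4 result is the n x 1 column of x values multiplied by the 1 x n row of y values, so any matrix multiply can produce it. The Python sketch below shows both the hand-mapped version and the column-times-row version agreeing. `poisson_pmf`, `outer` and `matmul` are hypothetical helpers written for this illustration, not Incanter functions.

```python
import math


def poisson_pmf(k, lam):
    """Poisson probability mass at k for mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def outer(x, y):
    """Hand-mapped outer product: M[i][j] = x[i] * y[j]."""
    return [[xi * yj for yj in y] for xi in x]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a_ik * b_kj for a_ik, b_kj in zip(row, col))
             for col in zip(*b)]
            for row in a]


x = [poisson_pmf(k, 2.2) for k in range(4)]
y = [poisson_pmf(k, 1.5) for k in range(4)]

M = outer(x, y)
column = [[xi] for xi in x]  # 4 x 1
row = [y]                    # 1 x 4
print(matmul(column, row) == M)
```

The second form is the one that maps onto a matrix library, since it needs nothing beyond a transpose and a multiply.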
Source: (StackOverflow)
Does anyone know how to display an Incanter chart in a JPanel without reverting to JFreeChart?
Source: (StackOverflow)
How do I use the random number generators in Parallel Colt from Incanter?
I've listed these dependencies in my project.clj file:
:dependencies [[org.clojure/clojure "1.2.0"]
               [org.clojure/clojure-contrib "1.2.0"]
               [incanter/core "1.2.3"]
               [incanter/parallelcolt "0.9.4"]]
And then I tried (import cern.jet.random.tdouble Normal) and I get a java.lang.ClassNotFoundException.
What am I doing wrong here?
Source: (StackOverflow)