EzDevInfo.com

SpookyJS

Drive CasperJS from Node.js

How to inject script in SpookyJS?

I am trying to inject the punycode script into my SpookyJS program, but it is not working.

try {
    var Spooky = require('spooky');
} catch (e) {
    var Spooky = require('../lib/spooky');
}

var spooky = new Spooky({
    child: {
        transport: 'http'
    },
    casper: {
        logLevel: 'debug',
        verbose: true,
        options: {
            clientScripts: ["punycode.js"]
        }
    }
}, function (err) {
    if (err) {
        e = new Error('Failed to initialize SpookyJS');
        e.details = err;
        throw e;
    }

    spooky.start("http://www.google.com");

    spooky.then(function(){
        this.evaluate(function() {
            console.log("testing");
            var x = punycode.encode("hi");
            console.log("x: "+x);
        });
    });

    spooky.run();
});

spooky.on('error', function (e, stack) {
    console.error(e);
    if (stack) {
        console.log(stack);
    }
});

spooky.on('console', function (line) {
    console.log(line);
});

spooky.on('remote.message', function(message) {
    console.log('[Inside Evaluate] ' + message);
});

I don't see the value of x in the console output.

$ node test.js 
[info] [phantom] Starting...
[info] [phantom] Running suite: 3 steps
[debug] [phantom] opening url: http://www.google.com/, HTTP GET
[debug] [phantom] Navigation requested: url=http://www.google.com/, type=Other, willNavigate=true, isMainFrame=true
[debug] [phantom] Navigation requested: url=http://www.google.co.kr/?gfe_rd=cr&ei=y1EKVPjzK6eL8Qfl4oDoBA, type=Other, willNavigate=true, isMainFrame=true
[debug] [phantom] url changed to "http://www.google.co.kr/?gfe_rd=cr&ei=y1EKVPjzK6eL8Qfl4oDoBA"
[debug] [phantom] Successfully injected Casper client-side utilities
[debug] [phantom] start page is loaded
[info] [phantom] Step anonymous 3/3 http://www.google.co.kr/?gfe_rd=cr&ei=y1EKVPjzK6eL8Qfl4oDoBA (HTTP 200)
[Inside Evaluate] testing
[info] [phantom] Step anonymous 3/3: done in 1293ms.
[info] [phantom] Done 3 steps in 1311ms

Any idea why it isn't working?


Source: (StackOverflow)

How to call a function inside SpookyJS?

I have a function called clickMore:

function clickMore(max, i){
   i = i || 0;
   if ((max == null || i < max) && this.visible(moreButton)) { // synchronous
      // asynchronous steps...
      this.thenClick(moreButton);  // sometimes the click is not properly dispatched
      this.echo('click');
      this.waitUntilVisible(loadingButton);
      this.waitUntilVisible(moreButton, null, function onTimeout(){
         // only placeholder so that the script doesn't "die" here if the end is reached
      });
      this.then(function(){

         //this.capture("business_"+i+".png");   //captures a screenshot of the page
         clickMore.call(this, max, i+1); // recursion
      });
   }
}

I would like to call that function from spooky here:

spooky.then(function () {
    clickMore.call(spooky);
});

I've looked through the Spooky docs and know that I'll probably need to use a function tuple, but I'm not sure how to implement it. How can I go about doing this?

UPDATE:

I tried using a function tuple, as described in the SpookyJS documentation, with no luck:

spooky.then([{
   clickMore: clickMore
}, function(){
    clickMore.call(spooky);
}]);

Source: (StackOverflow)

How to load a webpage after spooky.js is initialized

Is it possible to chain further steps onto SpookyJS after initialization? Something like this:

Spooky = require('spooky');
var spooky = new Spooky({
    child: {
        transport: 'http'
    },
    casper: {
        logLevel: 'debug',
        verbose: true
    }
}, function (err) {
    if (err) {
        e = new Error('Failed to initialize SpookyJS');
        e.details = err;
        throw e;
    }

    spooky.start(
        'http://en.wikipedia.org/wiki/Spooky_the_Tuff_Little_Ghost');
    spooky.run();
});

spooky.thenOpen('http://google.com', function () {
    console.log('bla');
});

spooky.on('console', function (line) {
    console.log(line);
});

But when I do this I get this error:

TypeError: Object # has no method 'thenOpen'

Source: (StackOverflow)

Express & Casper: Can't set headers after they are sent

I'm getting a "Can't set headers after they are sent" error when trying to run a SpookyJS (a driver for CasperJS) script after a form is posted to a URL. I've found several other posts from people running into this issue with Express, and it apparently has to do with the headers being sent more than once. I'm just not sure how that relates to what I'm doing here. I have to call res.send to send the status of the request, right? If I leave it out, the form doesn't post.

Any ideas what I'm doing wrong here?

app.post('/submit', function (req, res) {
    // Send callback
    res.send(req.status);

    var spooky = require('spooky').create({   
        verbose: true,
        logLevel: 'debug',
        userAgent: 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/537.22 (KHTML, like Gecko) Chrome/25.0.1364.172 Safari/537.22',
        pageSettings: {
            loadImages:  false,
            loadPlugins: false
        }
    });

    spooky.start('http://google.com', function () {
        this.echo(this.getTitle());
    });

    // Run Spooky
    spooky.run();
});

Source: (StackOverflow)

spookyjs stops running without error

I'm building a tool that logs into a website and visits a large number of pages listed in an array. Every time I run it, CasperJS seems to hang when visiting the 36th link. I tried removing the 36th link, but then it just hangs at the next one.

Could it be a memory problem? When CasperJS hangs, there is no error in the debug log. When I run top on the server, I no longer see any PhantomJS processes running.

spooky.then([{user: account.user, pass: account.pass, urls: urls}, function () {
    this.wait(2000, function () {
        this.fill(".signin-form", {
            email: user,
            password: pass
        }, true);
        var i = 0;
        var spookyObj = this;
        function visitPages() {
            spookyObj.wait(5000, function () {
                spookyObj.thenOpen(urls[i], function (url) {
                    spookyObj.emit('visitedURL', url[i]);
                    i++;
                    if (i < urls.length) visitPages();
                });
            });
        }
        visitPages();
    });
}]);

Debug log

[debug] [phantom] url changed to "<URL>"                                                          
[debug] [phantom] Navigation requested: url=about:blank, type=Other,    willNavigate=true, isMainFrame=false                                         
[debug] [phantom] Navigation requested: url=<URL>, type=Other, willNavigate=true, isMainFrame=false    

Source: (StackOverflow)

ReferenceError: Can't find variable with SpookyJS

I am trying to call an external function in SpookyJS, following the same approach as the wiki: https://github.com/WaterfallEngineering/SpookyJS/wiki/Introduction

But when I try the following code, I get this error:

ReferenceError: Can't find variable: test

try {
    var Spooky = require('spooky');
} catch (e) {
    var Spooky = require('../lib/spooky');
}

var urls = ["http://www.google.fr",
            "http://www.yahoo.com"
          ];

exports.clicker = function(req, res)
{
  console.log("FIRST: " + visitUrl + " \n\n\n END FIRST");


  var visitUrl = function(urlIndex, nbClicked)
  {
      console.log("HELLO");
  };

  var spooky = new Spooky(
    {
      child: {
        // transport: 'http'
      },
      casper: {
        logLevel: 'debug',
        verbose: true
      }
    }, function (err)
    {
      if (err)
      {
        e = new Error('Failed to initialize SpookyJS');
        e.details = err;
        throw e;
      }

      spooky.start(urls[0]);

      console.log("SECOND: " + visitUrl + " \n\n\n END SECOND");

      spooky.then([{
        test: visitUrl
      }, function(){

        console.log("THIRD: " + test + " \n\n\n END THIRD");
      }]);

      spooky.run();
    });

    // Uncomment this block to see all of the things Casper has to say.
    // There are a lot.
    // He has opinions.
    spooky.on('console', function (line) {
      console.log(line);
    });

    spooky.on('hello', function (greeting) {
      console.log(greeting);
    });

    spooky.on('log', function (log) {
      if (log.space === 'remote') {
        console.log(log.message.replace(/ \- .*/, ''));
      }
    });
}

The following two log statements work:

console.log("FIRST: " + visitUrl + " \n\n\n END FIRST");
console.log("SECOND: " + visitUrl + " \n\n\n END SECOND");

But the third one is responsible for the error message:

console.log("THIRD: " + test + " \n\n\n END THIRD");

Any suggestion?
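
One possible explanation can be checked in plain Node.js, without SpookyJS. The sketch below assumes that the hash in a function tuple is serialized as JSON on its way to the CasperJS process, which is how the SpookyJS wiki describes it; the variable names `hash`, `received`, and `testSource` are made up for illustration. JSON has no representation for functions, so a property holding one is dropped, while a string holding the function's source text survives.

```javascript
// Stand-in for the hash of a function tuple. The assumption (not verified
// against SpookyJS internals here) is that it crosses the process boundary as JSON.
var visitUrl = function (urlIndex, nbClicked) {
    console.log('HELLO');
};

var hash = { test: visitUrl, label: 'first' };

// Round-trip through JSON, as a serialized hash would be.
var received = JSON.parse(JSON.stringify(hash));

console.log('test' in received); // false: the function-valued property is gone
console.log(received.label);     // first: plain data survives

// A string containing the function's source text is plain data and survives.
var receivedSource = JSON.parse(JSON.stringify({ testSource: visitUrl.toString() }));
console.log(typeof receivedSource.testSource); // string
```

If that assumption holds, `test` never reaches the step function, which would match the ReferenceError above.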


Source: (StackOverflow)

SpookyJS hangs if there are any non-English characters in spooky.then()

SpookyJS hangs if there are any international characters inside spooky.then function.

try {
    var Spooky = require('spooky');
} catch (e) {
    var Spooky = require('../lib/spooky');
}

var spooky = new Spooky({
    child: {
        transport: 'http'
    },
    casper: {
        logLevel: 'debug',
        verbose: true
    }
}, function (err) {
    if (err) {
        e = new Error('Failed to initialize SpookyJS');
        e.details = err;
        throw e;
    }

    spooky.start('http://en.wikipedia.org/wiki/Spooky_the_Tuff_Little_Ghost');
    spooky.then(function () {
        // 안녕
        this.emit('hello', 'Hello World');
    });
    spooky.run();
});

spooky.on('hello', function (greeting) {
    console.log(greeting);
});

spooky.on('console', function (line) {
    console.log(line);
});

Is there any workaround for this issue?


Source: (StackOverflow)

How to respond to requests on completion of a SpookyJS script?

I need to periodically login and scrape some data from a particular site. I wrote a CasperJS script to run on Heroku in order to take care of it.

Here is what I want to be able to do:

app.get('/test', function(request, response) {
  scrapeStuff(function(data) {
    response.send(data);
  });
});

Then, at the final step of the spooky script:

spooky.then(function() {
  callback(this.getHTML());
});

Unfortunately it doesn't seem to be possible for some reason as the function passed to scrapeStuff doesn't make it inside the .then(). (can't find variable: callback) Instead I have to use this.emit() and monitor it with spooky.on - you can see an example of how this is done here.

The problem with using emit is that I want to receive the HTML of the scraped page upon request. So I want to access /scrape, then wait 10 seconds while it's working and receive the page, not call it, assume it succeeded and request another URL to finally get the HTML.

Can this be done with SpookyJS? Maybe there is a better way using CasperJS directly.
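
A rough sketch of how emit and a callback could be combined so that the response is sent only once the scrape has finished. It uses a plain Node.js EventEmitter as a stand-in for the Spooky instance, so it runs without SpookyJS; the event name 'scraped', the `fakeSpooky` object, and the two-argument `scrapeStuff` signature are assumptions made for illustration, not part of any library.

```javascript
var EventEmitter = require('events').EventEmitter;

// Hypothetical stand-in for a Spooky instance: only its event-emitting
// behaviour is modelled, not the CasperJS child process.
var fakeSpooky = new EventEmitter();

// The listener is registered in the Node.js process, where `callback`
// is still in scope, so nothing has to be passed into the step itself.
function scrapeStuff(spooky, callback) {
    spooky.once('scraped', function (html) {
        callback(html);
    });

    // In a real script the final step would emit from the CasperJS side,
    // e.g. this.emit('scraped', this.getHTML()); here it is simulated.
    spooky.emit('scraped', '<html><body>done</body></html>');
}

var result = null;
scrapeStuff(fakeSpooky, function (data) {
    // In an Express route, response.send(data) would go here.
    result = data;
});

console.log(result);
```

Using `once` rather than `on` keeps a listener from a previous request from firing again on the next one.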


Source: (StackOverflow)

All XPaths return a non-existent error in CasperJS?

Just to be clear, I am using SpookyJS, a library that drives CasperJS from Node.js.

I am able to click and select XPaths just fine on all other pages. The problem occurs only on one particular page: the page loads perfectly, but all of the XPaths return this error.

Cannot dispatch mousedown event on nonexistent selector

I have a screenshot taken before the function attempts to click the XPath, and it shows that the page is loaded perfectly.

If I try using the waitForSelector function, I get a timeout error. I've tried different XPaths on different pages, and none of them work.

Here is my code in CoffeeScript. Don't mind the spooky.then; just think of it as casper.then:

# 3 steps occur before this and they work perfectly
spooky.then([{x:selectXPath}, () ->
  @wait(3000, () ->
    eval(x) # This loads the xPath function
    @capture('server/components/spooky/img.png')
    @click(xPath('//*[@id="wp-page-header-middle"]/table/tbody/tr/td[1]/a'))
  )
])

The table I'm interested in is inside of an iframe.


Source: (StackOverflow)

SpookyJS has no Start Method while using it in Meteor

I have a weird error and haven't been able to find the cause of it for the last few hours.

I have a Meteor app that scrapes some webpages for information. Everything works fine as long as I use request and cheerio for static pages, but now I have a dynamic site and wanted to use PhantomJS, CasperJS and SpookyJS for it, and here I hit a bug. My code is as follows. I import the npm modules at the start:

if (Meteor.isServer) {
    var cheerio = Meteor.npmRequire('cheerio');
    var request = Meteor.npmRequire('request');
    var phantomJS = Meteor.npmRequire('phantomjs');
    var spooky = Meteor.npmRequire('spooky');

And some time later I want to use spooky to scrape a webpage:

spooky.start("https://www.coursera.org/");

spooky.then(function () {
    this.fill("form", {email: user, password: pass}, true);
});

But as soon as I call the method I get the following error message:

    20150224-21:16:39.100(-5)? Exception while invoking method 'getLecturesCoursera' TypeError: Object function Spooky(options, callback) {
    ....
    I20150224-21:16:39.281(-5)? } has no method 'start'
    I20150224-21:16:39.281(-5)?     at [object         Object].Meteor.methods.getLecturesCoursera (app/moocis.js:72:14)

I am doing something completely wrong, and I have no clue why it isn't working. I tried to verify that SpookyJS and PhantomJS are installed correctly in my app, but that isn't as easy as it sounds for someone using them for the first time.


Source: (StackOverflow)

SpookyJS: Console.log Doesn't Work Inside Then

try {
  var Spooky = require("spooky");
} catch (e) {
  console.log(e);
}

var spooky = new Spooky({
  capser: {
    logLevel: "debug",
    verbose: true
  },
  child: {
    command: "./casperjs/bin/casperjs",
    port: 8081,
    spooky_lib: "./node_modules/spooky/"
  }
}, function (err) {
  if(err) {
    console.log(err);
  }
  spooky.start("http://www.google.com");
  spooky.then(function () {
    console.log("7331");
    this.emit("printmsg", "1337");
  });
  spooky.run();
});

spooky.on("printmsg", function (msg) {
  console.log(msg);
});

spooky.on("error", function (e) {
  console.error(e);
});

When run, 1337 will be displayed, but 7331 will not. Why is this? The reason I'm asking is because it makes it difficult to debug when you want to log the values of certain variables.

Also, if you want to change the then function like so:

spooky.then(function () {
  var self = this;
  this.evaluate(function () {
    self.emit("printmsg", "Hello World!");
  });
});

This won't work because evaluate doesn't have access to the self variable. In PhantomJS you can make it page.evaluate(function (self) { but that isn't working when I try it with Spooky. So it's very difficult to log data when you want to.

Is there a way around this?
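
The behaviour described above can be reproduced in plain Node.js. The sketch below assumes that a function handed to another context is re-created there from its source text, which is the usual explanation for why closed-over variables such as self are missing; `runElsewhere` is a hypothetical helper written for this illustration, not a SpookyJS or CasperJS function. It shows that a closure variable is lost, while a value passed as an explicit argument arrives intact (CasperJS's evaluate accepts such extra arguments after the function).

```javascript
// Hypothetical helper: simulates running `fn` in another context by
// rebuilding it from its source text, which drops everything it closed over.
function runElsewhere(fn, args) {
    var rebuilt = new Function('return (' + fn.toString() + ');')();
    return rebuilt.apply(null, args || []);
}

function demo() {
    var greeting = 'Hello World!';
    var closureError = null;

    try {
        // `greeting` is only reachable through the closure, so it is lost.
        runElsewhere(function () { return greeting; });
    } catch (e) {
        closureError = e.name;
    }

    // Passing the value as an argument does not depend on the closure.
    var passed = runElsewhere(function (text) { return text; }, [greeting]);

    return { closureError: closureError, passed: passed };
}

var outcome = demo();
console.log(outcome.closureError); // ReferenceError
console.log(outcome.passed);       // Hello World!
```

The same reasoning suggests passing plain values into evaluate as arguments and returning plain values from it, then emitting them from the surrounding step.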


Source: (StackOverflow)

How to perfectly isolate and clear environments between each test?

I'm trying to connect to SoundCloud using CasperJS. Interestingly, once you have signed in and rerun the login feature later, the previous login is still active. Before going any further, here is the code:

casper.thenOpen('https://soundcloud.com/', function() {
  casper.click('.header__login');

  popup = /soundcloud\.com\/connect/;

  casper.waitForPopup(popup, function() {
    casper.withPopup(popup, function() {
      selectors = {
        '#username': username,
        '#password': password
      };

      casper.fillSelectors('form.log-in', selectors, false);

      casper.click('#authorize');
    });
  });
});

If you run this code at least twice, you should see the following error appear:

CasperError: Cannot dispatch mousedown event on nonexistent selector: .header__login

If you analyse the logs you will see that the second time, you were redirected to https://soundcloud.com/stream meaning that you were already logged in.

I did some research to clear environments between each test but it seems that the following lines don't solve the problem.

phantom.clearCookies()
casper.clear()
localStorage.clear()
sessionStorage.clear()

Technically, I'm really interested in understanding what is happening here. Maybe SoundCloud built a system that also stores some variables server-side. In that case, I would have to log out before logging in. But my question is: how can I perfectly isolate and clear everything between tests? Does anyone know how to make the environment signed out between tests?


Source: (StackOverflow)

Store values from spookyjs environment into mongoDB

I am trying to scrape data from a site with SpookyJS and store it in MongoDB. I am able to get the data from the website, but I am not able to save the scraped data from the SpookyJS environment to MongoDB. To save the scraped data, I passed my database model instance to SpookyJS. I referred to the link below for this.

https://github.com/SpookyJS/SpookyJS/wiki/Introduction

Below is my code, where I extract the data into the prod_link_info variable and pass its values to MongoDB:

var product_model = require('./product').product_model;

// get results
spooky.then([{product_model: product_model}, function () {
    this.waitForSelector('li[id^="product_"]', function () {
        // Get info on all elements matching this CSS selector
        var prod_link_info = this.evaluate(function () {
            var nodes = document.querySelectorAll('li[id^="product_"]');

            return [].map.call(nodes, function (node) { // Alternatively: return Array.prototype.map.call(...
                return node.querySelector('a').getAttribute('href') + "\n";
            });
        });

        // insert values in mongodb
        for (var i = 0; i < prod_link_info.length; i++) {
            product_model.create(
                {
                    prod_link_info: prod_link_info[i],
                }, function (err, product) {
                    if (err) console.log(err);
                    else console.log(product);
                });
        }
    });
}]);

Below is the database schema and model used in the code above.

var mongoose=require('mongoose');
var Schema = mongoose.Schema;
// create a schema
var productSchema = new Schema({
    prod_link_info: String,

});

var product_model= mongoose.model('product_model', productSchema);

module.exports = {
    product_model: product_model
}

But when I run the code above, it gives me the following error: ReferenceError: Can't find variable: product_model.

I want to store the data extracted by SpookyJS in MongoDB. Please suggest what I am doing wrong.


Source: (StackOverflow)

Get an array from SpookyJs to Meteor

After a lot of hard work, my SpookyJS script works as it should and I have my spoils of war: an array of values I want to use to query my Collection in my Meteor app. But I have a huge problem.

I can't find a way to call any Meteor specific methods from spooky...

So my code is like this for the spooky.on function:

spooky.on('fun', function (courses) {
  console.log(courses);
  // Meteor.call('edxResult', courses); // doesn't work...
});

The console.log gives me the result I want:

[ 'course-v1:MITx+6.00.2x_3+1T2015',
'HarvardX/CS50x3/2015',
'course-v1:LinuxFoundationX+LFS101x.2+1T2015',
'MITx/6.00.1x_5/1T2015' ]

What I need is a way to call a Meteor.method with courses as my argument, or a way to get access to the array in the current Meteor.method after SpookyJS has finished its work. (Sadly, I have no idea how to check whether spooky is finished.)

My last idea would be to give the Meteor.method a callback function and store the array in the session or something, but that sounds like extremely bad design. There has to be a better way, I hope.

I am extremely proud of my little ghost, so any help getting the last few pieces over the finish line would be greatly appreciated.


Source: (StackOverflow)

Error running sample code using Spooky.js

I am new to the whole stack of node.js, phantom.js, casper.js and spooky.js. I have everything installed (in Windows), with PATH updated and followed this example:

https://github.com/WaterfallEngineering/SpookyJS

I got this error:

C:\node_modules\spooky>node examples/hello.js

events.js:68
        throw arguments[1]; // Unhandled 'error' event
                       ^
Error: Child terminated with non-zero exit code 127
    at Spooky._spawnChild.Spooky._instances.(anonymous function) (C:\node_modules\spooky\lib\spooky.js:82:17)
    at ChildProcess.EventEmitter.emit (events.js:96:17)
    at Process._handle.onexit (child_process.js:678:10)

Does anyone have any clue why this happens and how to fix it? Running casperjs googlelinks.js works just fine, but with node.js and spooky.js I run into trouble.


Source: (StackOverflow)