PhantomJS interview questions
Top PhantomJS frequently asked interview questions
I'm looking for an example of requesting a webpage, waiting for the JavaScript to render (JavaScript modifies the DOM), and then grabbing the HTML of the page.
This should be a simple example with an obvious use case for PhantomJS. I can't find a decent example; the documentation seems to be all about command-line use.
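A minimal sketch of the usual pattern, written as a plain function so it can be exercised outside PhantomJS. It assumes the standard PhantomJS webpage API (page.open, page.evaluate, page.content); the readiness check `isReady`, the 100 ms poll interval and the '#results' selector in the usage comment are illustrative assumptions, not something the question specifies. Polling for a condition that signals "the JavaScript has finished rendering" tends to be more reliable than sleeping for a fixed delay.

```javascript
// Open `url`, poll `isReady` inside the page until it returns true (or `timeoutMs`
// elapses), then pass the serialized DOM (page.content) to `done(err, html)`.
// `page` is assumed to be a PhantomJS webpage object; `isReady` runs in the page
// sandbox via page.evaluate, so it cannot see variables from this script.
function grabRenderedHtml(page, url, isReady, timeoutMs, done) {
  page.open(url, function (status) {
    if (status !== 'success') {
      done(new Error('failed to load ' + url + ': ' + status));
      return;
    }
    var started = Date.now();
    var timer = setInterval(function () {
      if (page.evaluate(isReady)) {
        clearInterval(timer);
        done(null, page.content);
      } else if (Date.now() - started > timeoutMs) {
        clearInterval(timer);
        done(new Error('timed out waiting for ' + url));
      }
    }, 100); // poll every 100 ms (arbitrary choice)
  });
}

// Usage inside a PhantomJS script (the '#results' selector is hypothetical):
// var page = require('webpage').create();
// grabRenderedHtml(page, 'http://example.com/', function () {
//   return document.querySelector('#results') !== null;
// }, 10000, function (err, html) {
//   console.log(err ? err.message : html);
//   phantom.exit(err ? 1 : 0);
// });
```

The readiness function should test for whatever DOM change the page's own scripts make; a generic "document loaded" check would not capture content injected later by JavaScript.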
Source: (StackOverflow)
I am using Chutzpah to execute my JavaScript unit tests.
My test file references the paths to my source files, followed by a series of tests. Test Explorer in Visual Studio lists my tests and I can execute them directly from the IDE, so everything seems to be working correctly.
However, I would like to step into the source code that is being executed when my tests are run.
Is this possible?
Source: (StackOverflow)
I'm using PhantomJS v1.4.1 to load some web pages. I don't have access to their server side; I just get links pointing to them. I'm using an obsolete version of Phantom because I need to support Adobe Flash on those web pages.
The problem is that many websites load secondary content asynchronously, so Phantom's onLoadFinished callback (the analogue of onload in HTML) fires too early, before everything has loaded. Can anyone suggest how I can wait for a web page to load fully so that I can, for example, take a screenshot with all dynamic content such as ads?
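One common workaround, sketched here under stated assumptions, is to count in-flight resource requests with the page's onResourceRequested and onResourceReceived callbacks and to treat the page as loaded once nothing has been outstanding for a short quiet period. The quiet period and the hard cap are arbitrary tuning values, and whether these callbacks behave identically in PhantomJS 1.4.1 should be checked against that version's documentation. Requests that never report a final stage (for example, failed ones) are only covered by the hard cap.

```javascript
// Call `done(timedOut)` once no tracked request has been in flight for `quietMs`
// milliseconds, or after `maxMs` at the latest (timedOut === true in that case).
// Assumes PhantomJS's page.onResourceRequested(requestData) and
// page.onResourceReceived(response) callbacks, where response.stage === 'end'
// marks a completed response. Install this before calling page.open().
function waitForNetworkIdle(page, quietMs, maxMs, done) {
  var inFlight = {};   // request id -> true while the request is outstanding
  var count = 0;
  var idleTimer = null;
  var finished = false;
  var capTimer = setTimeout(function () { finish(true); }, maxMs);

  function finish(timedOut) {
    if (finished) return;
    finished = true;
    clearTimeout(idleTimer);
    clearTimeout(capTimer);
    done(timedOut);
  }

  page.onResourceRequested = function (requestData) {
    if (finished || inFlight[requestData.id]) return;
    inFlight[requestData.id] = true;
    count += 1;
    clearTimeout(idleTimer); // new traffic cancels any pending "idle" decision
  };

  page.onResourceReceived = function (response) {
    if (finished || response.stage !== 'end' || !inFlight[response.id]) return;
    delete inFlight[response.id];
    count -= 1;
    if (count === 0) {
      clearTimeout(idleTimer);
      idleTimer = setTimeout(function () { finish(false); }, quietMs);
    }
  };
}

// Usage inside a PhantomJS script (URL and file name are placeholders):
// var page = require('webpage').create();
// waitForNetworkIdle(page, 1500, 30000, function (timedOut) {
//   page.render('screenshot.png');
//   phantom.exit(timedOut ? 1 : 0);
// });
// page.open('http://example.com/', function () {});
```

Content that is triggered by timers rather than network traffic (for example, an ad script that waits a few seconds before requesting anything) can still slip past the quiet period, so the values usually need tuning per site.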
Source: (StackOverflow)
We are using Selenium to automate our UI testing. Recently we have seen that the majority of our users use Chrome. So we wanted to know the pros and cons of using PhantomJS vs. Selenium:
- Is there any real advantage in terms of performance, e.g. time taken to execute the test cases?
- When should one prefer PhantomJS over Selenium?
Source: (StackOverflow)
I just installed PhantomJS on Mac OS X Yosemite. Whenever I run /bin/phantomjs, with any parameter, I get "Killed: 9". Any idea?
Source: (StackOverflow)
How do I click an element in PhantomJS?
page.evaluate(function() {
    document.getElementById('idButtonSpan').click();
});
This gives me an error "undefined is not a function..."
If I instead do
return document.getElementById('idButtonSpan');
and then print the result, it prints [object Object], so the element does exist.
The element acts as a button, but it's actually just a span element, not a submit input.
I was able to get this button click to work with Casper, but Casper had other limitations so I'm back to PhantomJS.
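In the older WebKit bundled with PhantomJS 1.x, click() is typically only defined on input-like elements, which would be consistent with the "undefined is not a function" error on a span. A widely used workaround is to dispatch a synthetic mouse event instead. The sketch below assumes the legacy document.createEvent / initMouseEvent DOM API; it is meant to be passed to page.evaluate, so it only touches in-page globals, and the argument-passing form page.evaluate(fn, arg) assumes PhantomJS 1.6 or later.

```javascript
// Intended to run inside the page via page.evaluate(clickById, 'idButtonSpan').
// Dispatches a synthetic, bubbling, cancelable "click" MouseEvent on the element
// with the given id, using the legacy createEvent/initMouseEvent API.
// Returns false when no element with that id exists.
function clickById(id) {
  var el = document.getElementById(id);
  if (!el) {
    return false;
  }
  var evt = document.createEvent('MouseEvents');
  // type, canBubble, cancelable, view, detail, screenX, screenY, clientX, clientY,
  // ctrlKey, altKey, shiftKey, metaKey, button, relatedTarget
  evt.initMouseEvent('click', true, true, window, 1, 0, 0, 0, 0,
                     false, false, false, false, 0, null);
  el.dispatchEvent(evt);
  return true;
}

// Usage inside a PhantomJS script:
// var clicked = page.evaluate(clickById, 'idButtonSpan');
// console.log('clicked: ' + clicked);
```

Because the event bubbles, click handlers attached to the span itself or to an ancestor (as many "button-like" span widgets do) should both receive it.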
Source: (StackOverflow)
I already did some research in this field, but didn't find any solution. I have a site where asynchronous AJAX calls are made to Facebook (using JSONP). I'm recording all my HTTP requests on the Ruby side with VCR, so I thought it would be nice to use this feature for AJAX calls as well.
So I experimented a bit and came up with a proxy-based approach. I'm using PhantomJS as a headless browser and Poltergeist for the integration with Capybara. Poltergeist is now configured to use a proxy like this:
Capybara.register_driver :poltergeist_vcr do |app|
  options = {
    :phantomjs_options => [
      "--proxy=127.0.0.1:9100",
      "--proxy-type=http",
      "--ignore-ssl-errors=yes",
      "--web-security=no"
    ],
    :inspector => true
  }
  Capybara::Poltergeist::Driver.new(app, options)
end
Capybara.javascript_driver = :poltergeist_vcr
For testing purposes, I wrote a proxy server based on WEBrick that integrates VCR:
require 'io/wait'
require 'webrick'
require 'webrick/httpproxy'
require 'rubygems'
require 'vcr'
module WEBrick
  class VCRProxyServer < HTTPProxyServer
    def service(*args)
      VCR.use_cassette('proxied') { super(*args) }
    end
  end
end
VCR.configure do |c|
  c.hook_into :webmock # stub_with is the deprecated VCR 1.x name for this setting
  c.cassette_library_dir = '.'
  c.default_cassette_options = { :record => :new_episodes }
  c.ignore_localhost = true
end
IP = '127.0.0.1'
PORT = 9100
reader, writer = IO.pipe
@pid = fork do
  reader.close
  $stderr = writer
  server = WEBrick::VCRProxyServer.new(:BindAddress => IP, :Port => PORT)
  trap('INT') { server.shutdown }
  server.start
end
raise 'VCR Proxy did not start in 10 seconds' unless reader.wait(10)
This works well for every localhost call, and they all get recorded: the HTML, JS and CSS files are recorded by VCR. Then I enabled the c.ignore_localhost = true option, because it's useless (in my opinion) to record localhost calls.
Then I tried again and found that the AJAX calls made on the page aren't recorded. Even worse, they don't work inside the tests anymore.
So, to come to the point, my question is: why are all calls to JS files on localhost recorded, but JSONP calls to external resources are not? It can't be the JSONP thing, because it's a "normal" AJAX request. Or is there a bug in PhantomJS that prevents AJAX calls from being proxied? If so, how could we fix that?
If it's running, I want to integrate the start and stop procedure inside
------- UPDATE -------
I did some research and came to the following point: the proxy has some problems with HTTPS calls and binary data through HTTPS calls.
I started the server, and made some curl calls:
curl --proxy 127.0.0.1:9100 http://d3jgo56a5b0my0.cloudfront.net/images/v7/application/stories_view/icons/bug.png
This call gets recorded as it should. The request and response output from the proxy is:
GET http://d3jgo56a5b0my0.cloudfront.net/images/v7/application/stories_view/icons/bug.png HTTP/1.1
User-Agent: curl/7.24.0 (x86_64-apple-darwin12.0) libcurl/7.24.0 OpenSSL/0.9.8r zlib/1.2.5
Host: d3jgo56a5b0my0.cloudfront.net
Accept: */*
Proxy-Connection: Keep-Alive
HTTP/1.1 200 OK
Server: WEBrick/1.3.1 (Ruby/1.9.3/2012-10-12)
Date: Tue, 20 Nov 2012 10:13:10 GMT
Content-Length: 0
Connection: Keep-Alive
But this call doesn't get recorded, so there must be some problem with HTTPS:
curl --proxy 127.0.0.1:9100 https://d3jgo56a5b0my0.cloudfront.net/images/v7/application/stories_view/icons/bug.png
The header output is:
CONNECT d3jgo56a5b0my0.cloudfront.net:443 HTTP/1.1
Host: d3jgo56a5b0my0.cloudfront.net:443
User-Agent: curl/7.24.0 (x86_64-apple-darwin12.0) libcurl/7.24.0 OpenSSL/0.9.8r zlib/1.2.5
Proxy-Connection: Keep-Alive
HTTP/1.1 200 OK
Server: WEBrick/1.3.1 (Ruby/1.9.3/2012-10-12)
Date: Tue, 20 Nov 2012 10:15:48 GMT
Content-Length: 0
Connection: close
So I thought maybe the proxy can't handle HTTPS, but it can (since I'm getting the output on the console after the cURL call). Then I thought maybe VCR can't mock HTTPS requests. But with this script, VCR does mock out HTTPS requests when I don't use it inside the proxy:
require 'net/https'
require 'vcr'
VCR.configure do |c|
  c.hook_into :webmock
  c.cassette_library_dir = 'cassettes'
end
uri = URI("https://d3jgo56a5b0my0.cloudfront.net/images/v7/application/stories_view/icons/bug.png")
VCR.use_cassette('https', :record => :new_episodes) do
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = true
  http.verify_mode = OpenSSL::SSL::VERIFY_NONE
  response = http.request_get(uri.path)
  puts response.body
end
So what is the problem? VCR handles HTTPS and the proxy handles HTTPS. Why don't they play together?
Source: (StackOverflow)
I'm trying to put together a list of possible solutions for automated browser test suites and headless browser platforms capable of scraping.
BROWSER TESTING / SCRAPING:
- Selenium - polyglot flagship in browser automation, bindings for Python, Ruby, JavaScript, C#, Haskell and more, IDE for Firefox (as an extension) for faster test deployment. Can act as a Server and has tons of features.
JAVASCRIPT
- PhantomJS - JavaScript, headless testing with screen capture and automation, uses WebKit. As of version 1.8, Selenium's WebDriver API is implemented, so you can use any WebDriver binding and tests will be compatible with Selenium
- SlimerJS - similar to PhantomJS, uses Gecko (Firefox) instead of WebKit
- CasperJS - JavaScript, built on both PhantomJS and SlimerJS, has extra features
- Ghost Driver - JavaScript implementation of the WebDriver Wire Protocol for PhantomJS.
- new PhantomCSS - CSS regression testing. A CasperJS module for automating visual regression testing with PhantomJS and Resemble.js.
- new WebdriverCSS - plugin for Webdriver.io for automating visual regression testing
- new PhantomFlow - Describe and visualize user flows through tests. An experimental approach to Web user interface testing.
- new trifleJS - ports the PhantomJS API to use the Internet Explorer engine.
- new CasperJS IDE (commercial)
NODE.JS
- Node-phantom - bridges the gap between PhantomJS and node.js
- WebDriverJs - Selenium WebDriver bindings for node.js by the Selenium team
- WD.js - node module for WebDriver/Selenium 2
- yiewd - WD.js wrapper using latest Harmony generators! Get rid of the callback pyramid with yield
- ZombieJs - Insanely fast, headless full-stack testing using node.js
- NightwatchJs - Node JS based testing solution using Selenium Webdriver
- Chimera - can do everything PhantomJS does, but in a full JS environment
- Dalek.js - Automated cross browser testing with JavaScript through Selenium Webdriver
- Webdriver.io - better implementation of WebDriver bindings with predefined 50+ actions
- Nightmare - Electron bridge with a high-level API.
- jsdom - Tailored towards web scraping. A very lightweight DOM implemented in Node.js; it supports pages with JavaScript.
WEB SCRAPING / MINING
- Scrapy - Python, mainly a scraper/miner: fast and well documented; can be linked with Django Dynamic Scraper for nice mining deployments, or Scrapy Cloud for PaaS (server-less) deployment; works in the terminal or as a stand-alone server process; can be used with Celery; built on top of Twisted
- Snailer - node.js module, not yet tested.
- Node-Crawler - node.js module, not yet tested.
Questions:
- Any pure Node.js solution or Node.js-to-PhantomJS/CasperJS module that actually works and is documented?
Answer: Chimera seems to go in that direction; check out Chimera.
Answer: Check out the list created by rjk with Ruby-based solutions.
- Do you know any related tech or solution?
Feel free to edit this question and add content as you wish! Thank you for your contributions!
Updates
- added SlimerJS to the list
- added Snailer and Node-Crawler and Node-phantom
- added Yiewd WebDriver wrapper
- added WebDriverJs and WD.js
- added Ghost Driver
- added Comparison of Web Scraping Software on Screen Scraper Blog
- added ZombieJs
- added Resemble.js, PhantomCSS and PhantomFlow; categorised and re-edited content
- 04.01.2014, added Chimera, answered 2 questions
- added NightWatchJs
- added DalekJS
- added WebdriverCSS
- added CasperBox
- added trifleJS
- added CasperJS IDE
- added Nightmare
- added jsdom
- added Online HTTP client
- updated CasperBox (dead)
Source: (StackOverflow)
I'm trying to use phantomJS (what an awesome tool btw!) to submit a form for a page that I have login credentials for, and then output the content of the destination page to stdout. I'm able to access the form and set its values successfully using phantom, but I'm not quite sure what the right syntax is to submit the form and output the content of the subsequent page. What I have so far is:
var page = new WebPage();
var url = phantom.args[0];
page.open(url, function (status) {
    if (status !== 'success') {
        console.log('Unable to access network');
    } else {
        console.log(page.evaluate(function () {
            var arr = document.getElementsByClassName("login-form");
            var i;
            for (i=0; i < arr.length; i++) {
                if (arr[i].getAttribute('method') == "POST") {
                    arr[i].elements["email"].value="mylogin@somedomain.com";
                    arr[i].elements["password"].value="mypassword";
                    // This part doesn't seem to work. It returns the content
                    // of the current page, not the content of the page after
                    // the submit has been executed. Am I correctly instrumenting
                    // the submit in Phantom?
                    arr[i].submit();
                    return document.querySelectorAll('html')[0].outerHTML;
                }
            }
            return "failed :-(";
        }));
    }
    phantom.exit();
});
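The comment in that code points at the core issue: form.submit() only starts a navigation, so the HTML returned from the same evaluate() call still belongs to the old page, and phantom.exit() runs before the next page has loaded. A minimal sketch of one way to restructure it, assuming the standard page.onLoadFinished callback and the evaluate(fn, args...) form from PhantomJS 1.6+, is to register a load handler first, submit inside the page, and read page.content only when that next load finishes. The function names and argument order are illustrative, not an established API.

```javascript
// Register a one-shot onLoadFinished handler, run `fillAndSubmit` inside the page,
// and hand the HTML of the page that loads after the submit to `done(err, html)`.
// Assumes the PhantomJS webpage API: page.onLoadFinished, page.content and
// page.evaluate(fn, arg1, arg2), which passes extra arguments into the page (1.6+).
function submitAndCapture(page, fillAndSubmit, email, password, done) {
  page.onLoadFinished = function (status) {
    page.onLoadFinished = null; // react only to the first load after the submit
    if (status !== 'success') {
      done(new Error('post-submit load failed: ' + status));
      return;
    }
    done(null, page.content);
  };
  var submitted = page.evaluate(fillAndSubmit, email, password);
  if (!submitted) {
    page.onLoadFinished = null;
    done(new Error('no POST login form found'));
  }
}

// Runs in the page sandbox, so it only uses in-page globals such as `document`.
// Mirrors the loop in the question: fill the first ".login-form" whose method is
// POST (compared case-insensitively here), then submit it.
function fillLoginForm(email, password) {
  var forms = document.getElementsByClassName('login-form');
  for (var i = 0; i < forms.length; i++) {
    if ((forms[i].getAttribute('method') || '').toUpperCase() === 'POST') {
      forms[i].elements['email'].value = email;
      forms[i].elements['password'].value = password;
      forms[i].submit();
      return true;
    }
  }
  return false;
}

// Usage inside a PhantomJS script:
// var page = require('webpage').create();
// var url = require('system').args[1];
// page.open(url, function (status) {
//   if (status !== 'success') {
//     console.log('Unable to access network');
//     phantom.exit(1);
//     return;
//   }
//   submitAndCapture(page, fillLoginForm, 'mylogin@somedomain.com', 'mypassword',
//     function (err, html) {
//       console.log(err ? err.message : html);
//       phantom.exit(err ? 1 : 0);
//     });
// });
```

Credentials are passed as evaluate() arguments rather than captured from the outer script because the in-page function is serialized and cannot see PhantomJS-side variables.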
Source: (StackOverflow)
Is it possible to create a page from a string?
example:
html = '<html><body>blah blah blah</body></html>'
page.open(html, function(status) {
// do something
});
I have already tried the above with no luck.
Also, I think it's worth mentioning that I'm using Node.js with phantomjs-node (https://github.com/sgentle/phantomjs-node).
Thanks!
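page.open() expects a URL, which is likely why passing markup to it fails. PhantomJS pages can instead be given markup directly. A minimal sketch, assuming page.setContent(content, url) (documented for newer PhantomJS releases, 1.8 onward as far as I know) with a fallback to assigning the settable page.content property; the base URL only matters for resolving relative links and is a placeholder. Through phantomjs-node the same properties are reached via that module's bridge API, which has changed between versions, so its documentation is the place to confirm the exact call.

```javascript
// Load an HTML string into a PhantomJS page object instead of opening a URL.
// `baseUrl` only matters for resolving relative links, scripts and images.
function setPageHtml(page, html, baseUrl) {
  if (typeof page.setContent === 'function') {
    page.setContent(html, baseUrl);   // assumed available in PhantomJS 1.8+
  } else {
    page.content = html;              // older fallback: assigning content reloads the page
  }
}

// Usage inside a PhantomJS script (inline markup is usually available right away;
// external resources referenced by it may still be loading):
// var page = require('webpage').create();
// setPageHtml(page, '<html><body>blah blah blah</body></html>', 'http://localhost/');
// console.log(page.evaluate(function () { return document.body.textContent; }));
// phantom.exit();
```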
Source: (StackOverflow)
I am using PhantomJS to make calls to a web page, like this:
page.open('http://example.com', function (s) {
console.log(page.content);
phantom.exit();
});
I am using this in the context of Drupal Simpletests, which require me to set a special user agent in order to use the test database instead of the real database. I would like to fetch the web page with a specific user agent. For example, in PHP with cURL, I can do this by setting CURLOPT_USERAGENT before making the cURL call.
Thanks!
Albert
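PhantomJS exposes the user agent as the page.settings.userAgent property, which needs to be assigned before page.open() so the request carries it, much like setting CURLOPT_USERAGENT before the cURL call. The wrapper below is a small sketch around that property; the user-agent string in the usage comment is a placeholder, and the exact value Drupal's Simpletest expects has to come from your test setup.

```javascript
// Open `url` with a custom User-Agent header and pass status and HTML to `done`.
// page.settings.userAgent is read when the request is made, so set it before open().
function fetchWithUserAgent(page, url, userAgent, done) {
  page.settings.userAgent = userAgent;
  page.open(url, function (status) {
    done(status, page.content);
  });
}

// Usage inside a PhantomJS script ('simpletest-ua-placeholder' is hypothetical):
// var page = require('webpage').create();
// fetchWithUserAgent(page, 'http://example.com', 'simpletest-ua-placeholder',
//   function (status, html) {
//     console.log(html);
//     phantom.exit();
//   });
```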
Source: (StackOverflow)
I followed these instructions (except for copying the executable to my PATH because I cannot seem to find it and it does not seem necessary). Then I made a file called image_render.js in my public javascripts directory with
console.log('Hello, world!');
phantom.exit();
inside it, saved it, and ran phantomjs render_image.js
in my terminal. However, my terminal does not recognize the command:
-bash: phantomjs: command not found
What have I done wrong?
Source: (StackOverflow)
I'm going through the documentation for Selenium WebDriver, and it can drive Chrome, for example. It got me thinking: wouldn't it be far more efficient to 'drive' PhantomJS?
Is there a way to use Selenium with PhantomJS?
My intended use would be web scraping: the sites I scrape are loaded with AJAX and lots of lovely JavaScript, and I'm thinking this setup could be a good replacement for the Scrapy Python framework that I'm currently working with.
Source: (StackOverflow)
I'm trying to set up remote debugging with PhantomJS, without much luck. I am following the instructions at https://github.com/ariya/phantomjs/wiki/Troubleshooting. I have a little program named debug.js:
var system = require('system'), fs = require('fs'), webpage = require('webpage');
(function(phantom){
    var page=webpage.create();
    function debugPage(){
        console.log("Refresh a second debugger-port page and open a second webkit inspector for the target page.");
        console.log("Letting this page continue will then trigger a break in the target page.");
        debugger; // pause here in first web browser tab for steps 5 & 6
        page.open(system.args[1]);
        page.evaluateAsync(function() {
            debugger; // step 7 will wait here in the second web browser tab
        });
    }
    debugPage();
}(phantom));
Now I run this from the command line:
$ phantomjs --remote-debugger-port=9001 --remote-debugger-autorun=yes debug.js my.xhtml
The console.log messages are now displayed in the shell window. I open a browser page to localhost:9001. It is at this point that the documentation says "get first web inspector for phantom context". However, I see only a single entry for about:blank. When I click on that, I get an inspector for the irrelevant about:blank page, with the URL http://localhost:9001/webkit/inspector/inspector.html?page=1.
The documentation talks about executing __run(), but I can't seem to get to the page where I would do that; about:html seems to contain a __run() which is a no-op.
FWIW, I am using PhantomJS 1.9.1 under Windows 8.
What am I missing?
Source: (StackOverflow)