adobe-acrobat interview questions
Top adobe-acrobat frequently asked interview questions
After cropping anything by using the crop tool in Adobe Acrobat, how do I ensure that the cropped area is fixed and can't be seen even when I increase the crop margin?
For example: how would I crop the following example (image and text) to ensure the image and the part "I don't want to include this text" are really removed, and not hidden somewhere in the result?
Adobe Reader's crop tool only seems to hide the cropped part; it does not actually remove it.
Source: (StackOverflow)
I have a scanned course in which every two consecutive pages appear as a single PDF page. How can I automatically split all the pages in one pass? Usually this is done by cropping the odd and even pages separately and then merging them back together, but that could take a very long time.
How can I split pages on scanned PDF in a single pass?
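The crop-and-merge approach mentioned above comes down to simple box arithmetic, which can be sketched as follows. This is only a sketch under assumptions: the function names are hypothetical, the boxes are (left, bottom, right, top) tuples in PDF points, every scan is assumed to hold exactly two pages side by side, and the left half is assumed to come first in reading order. Applying the boxes to a real file still needs a PDF tool or library, which is not shown here.

```python
def split_spread(left, bottom, right, top):
    """Return (left_half, right_half) crop boxes for one scanned double page.

    Boxes are (left, bottom, right, top) tuples in PDF points.
    """
    middle = (left + right) / 2
    return (left, bottom, middle, top), (middle, bottom, right, top)


def split_all(page_boxes):
    """Map N scanned spreads to 2N single-page crop boxes.

    Each result is (source_page_index, crop_box), left half first
    (assumption: left-to-right reading order).
    """
    singles = []
    for index, box in enumerate(page_boxes):
        for half in split_spread(*box):
            singles.append((index, half))
    return singles
```

Each source page is used twice, once per crop box, so the output already comes out in the right order and no separate odd/even merge step is needed.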
Source: (StackOverflow)
We get PDFs from our professor to read for homework, but they're often scanned documents. Is there a way to adjust the contrast of the text to make it easier to read?
Edit: I've got Photoshop but is there a way to do it from a PDF reader?
Edit 2: Windows XP or 7 (Windows or Ubuntu only).
Source: (StackOverflow)
This has been discussed a year ago here:
Batch OCR for many PDF files (not already OCRed)?
Is there any way to batch OCR PDFs that haven't been already OCRed? This is, I think, the current state of things dealing with two issues:
Batch OCR PDFs
Windows
Acrobat – This is the most straightforward OCR engine that will batch OCR. The only problems seem to be that 1) it won't skip files that have already been OCRed, and 2) if you throw a bunch of PDFs at it (some old), it crashes. It is a little buggy. It will warn you at each error it runs into (though you can tell the software not to notify you). But again, it dies horribly on certain types of PDFs, so your mileage may vary.
ABBYY FineReader (Batch/Scansnap), Omnipage – These have got to be some of the worst programmed pieces of software known to man. If you can find out how to fully automate (no prompting) batch OCR of PDFs saving with the same name then please post here. It seems the only solutions I could find failed somewhere--renaming, not fully automated, etc. etc. At best, there is a way to do it, but the documentation and programming is so horrible that you'll never find out.
ABBYY FineReader Engine, ABBYY Recognition Server – These really are more enterprise solutions. If you are a simple end user, you would probably be better off getting Acrobat to run over a folder and weeding out the PDFs that give you errors or crash the program than going through the hassle of installing evaluation software. They don't seem cost-competitive for the small user.
Autobahn DX workstation – The cost of this product is so prohibitive that you could probably buy 6 copies of Acrobat. Not really an end-user solution. If you're an enterprise setup, this may be worth it for you.
Linux
- WatchOCR – no longer developed, and basically impossible to run on modern Ubuntu distros
- pdfsandwich – no longer developed, basically impossible to run on modern Ubuntu distros
- ABBYY Linux OCR – this should be scriptable, and seems to have some good results:
http://www.splitbrain.org/blog/2010-06/15-linux_ocr_software_comparison
However, as with a lot of these other ABBYY products, they charge by the page. Again, you might be better off trying to get Acrobat batch OCR to work.
Ocrad, GOCR, OCRopus, tesseract – these may work, but there are a few problems:
- OCR results are not as good as, say, Acrobat's for some of these (see above link).
- None of the programs takes in a PDF file and outputs a PDF file. You have to write a script that breaks the PDF apart, runs the program over each page, and then reassembles the pages as a PDF.
- Once you do, you may find, like I did, that tesseract creates an OCR layer that is shifted over. So if you search for the word 'the', the highlight lands on part of the word next to it.
Batch DjVu → Convert to PDF – haven't looked into it, but it seems like a horribly roundabout solution.
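The break-apart, OCR, and reassemble script described above for the open-source engines might look roughly like this in Python. It is a sketch under assumptions, not a tested solution: it assumes `pdftoppm` and `pdfunite` from poppler-utils are installed, and that the installed Tesseract is a newer build that supports the `pdf` output config (the versions discussed above may not). The function names are hypothetical, and the sketch does nothing about the shifted text layer mentioned above.

```python
import glob
import os
import shutil
import subprocess
import tempfile


def render_command(pdf_path, prefix, dpi=300):
    """pdftoppm call that writes one PNG per page as <prefix>-<n>.png."""
    return ["pdftoppm", "-r", str(dpi), "-png", pdf_path, prefix]


def ocr_command(image_path):
    """tesseract call that writes <image stem>.pdf with a text layer."""
    base = os.path.splitext(image_path)[0]
    return ["tesseract", image_path, base, "pdf"]


def merge_command(page_pdfs, out_path):
    """pdfunite call that joins the per-page PDFs in the given order."""
    return ["pdfunite", *page_pdfs, out_path]


def ocr_pdf(pdf_path, out_path):
    """Break the PDF apart, OCR every page, and stitch the result together."""
    with tempfile.TemporaryDirectory() as workdir:
        prefix = os.path.join(workdir, "page")
        subprocess.run(render_command(pdf_path, prefix), check=True)
        # pdftoppm zero-pads page numbers, so a plain sort keeps page order.
        images = sorted(glob.glob(prefix + "-*.png"))
        page_pdfs = []
        for image in images:
            subprocess.run(ocr_command(image), check=True)
            page_pdfs.append(os.path.splitext(image)[0] + ".pdf")
        if len(page_pdfs) == 1:
            # Nothing to merge for a single-page document.
            shutil.copy(page_pdfs[0], out_path)
        else:
            subprocess.run(merge_command(page_pdfs, out_path), check=True)
```

Writing to a separate `out_path` and renaming afterwards keeps the original file intact if any step fails, which matters when running over thousands of files.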
Online
- PDFcubed.com – come on, not really a batch solution.
- ABBYY Cloud OCR - not sure if this is really a batch solution, either way, you have to pay by the page and this could get quite pricey.
Identifying non-OCRed PDFs
This is a slightly easier problem, which can be solved easily on Linux and much less so on Windows. I was able to write a Perl script that uses pdffonts to check whether a file contains any fonts; a file that lists none is treated as not OCRed.
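The font check can be sketched in a few lines. This is a Python illustration rather than the Perl used in this question, and it rests on one assumption about the tool: that `pdffonts` prints a two-line header (column names, then a row of dashes) followed by one row per font. The function names are hypothetical.

```python
import subprocess


def has_text_fonts(pdffonts_output: str) -> bool:
    """Return True if the pdffonts output lists at least one font.

    Assumes a two-line header (column names, then dashes) followed by
    one row per font, so any further non-blank line means fonts exist.
    """
    lines = [line for line in pdffonts_output.splitlines() if line.strip()]
    return len(lines) > 2


def looks_ocred(pdf_path: str) -> bool:
    """Run pdffonts on one file (hypothetical helper; needs poppler-utils)."""
    result = subprocess.run(
        ["pdffonts", pdf_path], capture_output=True, text=True, check=True
    )
    return has_text_fonts(result.stdout)
```

A scan that carries fonts for some other reason (a typed cover page, a stamp) would be misclassified as OCRed, so this is a heuristic, not a guarantee.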
Current "solutions"
1. Use a script to identify non-OCRed PDFs (so you don't rerun over thousands of OCRed PDFs), copy these to a temporary directory (retaining the correct directory tree), and then use Acrobat on Windows to run over these, hoping that the smaller batches won't crash.
2. Use the same script, but get one of the Linux OCR tools to work properly, risking OCR quality.
I think I'm going to try #1. I'm just too worried about the results of the Linux OCR tools (I don't suppose anyone has done a comparison), and breaking the files apart and stitching them together again seems to be unnecessary coding if Adobe can actually batch OCR a directory without choking.
If you want a completely free solution, you'll have to use a script to identify the non-OCRed PDFs (or just rerun over OCRed ones), and then use one of the Linux tools to try to OCR them. Tesseract seems to have the best results, but again, some of these tools are not well supported in modern versions of Ubuntu. If you can set it up and fix the problem I had, where the image layer does not match the text layer (with tesseract), then you would have a pretty workable solution, and once again Linux > Windows.
Do you have a working solution for fully automated batch OCR of PDFs that skips already-OCRed files, keeps the same file names, and gives high quality? If so, I would really appreciate the input.
Perl script to copy non-OCRed files to a temp directory. I can't guarantee this works and it probably needs to be rewritten, but if someone makes it work (assuming it doesn't) or work better, let me know and I'll post a better version here.
    #!/usr/bin/perl
    # Copy non-OCRed PDFs to a temporary directory, keeping the directory tree.
    # Change the variables below: you need a base directory (like /home/joe),
    # a source directory and an output directory (e.g. books and tempdir).
    # Put all your PDFs in the source directory first.
    use warnings;
    use strict;

    # Install any missing modules with CPAN or your distro's installer (e.g. apt-get).
    use CAM::PDF;
    use File::Find;
    use File::Basename;
    use File::Copy;
    use File::Path qw(make_path);

    #use PDF::OCR2;
    #$PDF::OCR2::CHECK_PDF   = 1;
    #$PDF::OCR2::REPAIR_XREF = 1;

    my $basedir         = '/your/base/directory';
    my $sourcedirectory = $basedir . '/books/';
    my @exts            = qw(.pdf);
    my $count           = 0;
    my $outputroot      = $basedir . '/tempdir/';

    open( my $errors, '>>', $basedir . '/errors.txt' )
        or die "Cannot open error log: $!";

    #check file
    #my $pdf = PDF::OCR2->new($basedir.'/tempfile.pdf');
    #print $pdf->page(10)->text;

    find(
        {
            wanted => \&process_file,
            # no_chdir => 1
        },
        $sourcedirectory
    );
    close($errors);

    sub process_file {
        my $file = $_;

        # must be a file
        return unless -f $file;

        # must be a pdf
        my ( $name, $dir, $ext ) = fileparse( $file, @exts );
        return unless $ext eq '.pdf';

        # Run pdffonts without a shell so quotes in file names are harmless.
        my $pdffonts;
        unless ( open( $pdffonts, '-|', 'pdffonts', $file ) ) {
            print {$errors} "$File::Find::name: cannot run pdffonts: $!\n";
            return;
        }
        my $output = do { local $/; <$pdffonts> } // '';
        close($pdffonts);

        # Font rows contain yes/no in the emb/sub/uni columns, so output
        # with neither word means no fonts were listed: treat as not OCRed.
        return if $output =~ /yes/ || $output =~ /no/;

        # Directory of the current file, relative to the source directory.
        my $currentdir = $File::Find::dir;
        return unless "$currentdir/" =~ /^\Q$sourcedirectory\E(.*)$/;
        my $targetdir = $outputroot . $1;

        # if directory doesn't exist, create it
        make_path($targetdir) unless -d $targetdir;

        # copy over file
        my $fromfile = "$currentdir/$file";
        my $tofile   = "$targetdir$file";
        print "copy from: $fromfile\n";
        print "copy to: $tofile\n";
        copy( $fromfile, $tofile ) or die "Copy failed: $!";
        $count++;
    }
Source: (StackOverflow)
Have several copies of Acrobat Professional that were purchased previously, installed and activated. However there is no documentation of the serial numbers, the Adobe online account ID or any details for these.
Need to move the licenses to upgraded Windows 7 PCs (current ones are on Windows XP that are about to be decommissioned).
Requirement is to ONLY move the licenses to the upgraded workstations. NOT to have multiple instances of the same license running concurrently.
Note: Adobe support is not very helpful since there isn't much information about the licenses.
DO NOT want to use 3rd party tools to extract serial numbers.
Is there a way to get this information from the registry or any other location so that the licenses can be transferred without breaking the activation? If so how?
Source: (StackOverflow)
I have Adobe Acrobat 11.0.07 for OS X and I can't find a way to set "Enable Scrolling" (in "View" -> "Page Display") as the default option.
Is there a way?
Source: (StackOverflow)
I want to edit the fields of a PDF in Acrobat using Forms->Add or Edit Fields, which I know how to use.
Unfortunately, a new form I want to edit seems to have been made in LiveCycle Designer, and so Acrobat keeps trying to open it in LiveCycle.
Is there a way to change this behavior and make that form editable in Acrobat Pro?
Source: (StackOverflow)
See title. I'd like to convert a bunch of XML files to PDF using Adobe Acrobat (9.0).
Currently, I'm opening each of these files in IE and then converting them manually using the Acrobat plugin.
I'm curious if this can be achieved without loading each of the files by hand since the generation of the PDFs is a part of an otherwise automated process.
Source: (StackOverflow)
I received an Adobe PDF scan of a document that displays upside-down.
I rotated it inside Adobe Acrobat and chose Save As to make a new document, however, the rotation is not saved and when I open the new document, it is upside-down again.
How can I correct this upside-down document as a new PDF file?
Source: (StackOverflow)
When I open PDFs in a non-Acrobat PDF viewer, they look fine, but when I open one in Acrobat, the fonts look absolutely terrible. They look very low resolution and I'm not sure how to improve them.
I've tried PDFs from internet, exporting from Word, and making LaTeX documents.
How can I correct this problem?
Source: (StackOverflow)
I am constantly getting the error message "there were no pages selected to print" with Acrobat v9 on XP. Does anyone know how to get rid of this problem, other than a re-install?
Source: (StackOverflow)
For a given PDF which uses a number of fonts (e.g., in Acrobat Reader, the fonts used can be seen when selecting File > Properties > Fonts), how can I find out where a certain font is used in the document (using Adobe Acrobat 7, Reader, or a free PDF tool)?
Just to be clear: I don't want to find which font is used on a certain piece of text (I know how to do that using Acrobat 9 Professional, see this Super User question). Instead I want to find where a specific font is used.
Source: (StackOverflow)
I've got Acrobat Pro 8 and Word 2003.
I've got a form in Word created using form fields. What I want to do is convert this document to a "fill in" PDF form automatically. In other words, replacing Word's form fields with Acrobat's form fields.
I can't seem to find any way to do this using Acrobat's integration with Word. Anybody know of a way?
Source: (StackOverflow)
I have a scanned PDF (two vertical pages on one horizontal page). How can I split them into single pages in Adobe Acrobat Pro Extended?
Source: (StackOverflow)