Maybe a dumb question, but why not just copy the original and create a PDF that contains the scanned image? The copier at our office does this with a few clicks and emails you the PDF as an attachment. Wtf is with all the Illustrator work? Just sayin.
That's what they did. They just happen to use a fairly high end scanner that layers the image so it's easier to OCR.
There's no Illustrator work. The only guy using Illustrator is the guy on the video. It's a PDF document, not an Illustrator document.
That's exactly my point. Why try to OCR the doc?
The online copy you can download from the WH was exported using the 'Preview' app on Mac OS X 10.6.7. I would venture the original was probably a multi-layer TIFF, and Preview was used to convert it to PDF.
I think he's suggesting it's a default for the scanner they used.
Fair enough.
ElNono, thanks for taking the time to explain. Sounds plausible.
Now, would it be fair to say the original from which the scan was made wouldn't have the artifacts described by the first video such as the Signature with no grayscale pixels?
They didn't. There's no OCR text in it. If OCR was done, the PDF would contain the text. What's likely is that they use a background removal function of the scanner (some Fujitsu models have that) so they can place the scan over a generic background.
The reason you would want to do that is that the background might obscure some of the text.
And, are you saying the process took the date and changed some of the characters into black and white with no gray pixels while leaving one number with the grayscale pixels? I think it was the 19 and the 1 that are straight black and white with the 6 being grayscale. I don't recall without going back and re-watching the video.
The variations in the actual text don't really matter. The signature could be explained by many things: the type of pen used, whether it's a single pen trace versus the pen going over the line multiple times, etc. The machine text looks like it came from a typewriter, which has the same problem.
What's going to tell you if the document is doctored is whether the white outline that surrounds the text (and it's on the background layer) matches the text above it. If you notice a pattern in it (signs of the clone tool) or a discontinuity in the gradient, then that would raise an alarm. If the guy that made the video had spotted something like that, it would've been worth looking at.
The scale stuff is really amateur stuff. Most docs are scanned at 150 or 300 dpi. PDF uses 72 dpi as its native resolution. The math isn't that hard.
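For anyone who wants the scale math spelled out, here is a minimal sketch. It assumes the scan is placed in the PDF at its physical size and that PDF coordinates run at 72 units per inch; the function names are made up for illustration.

```python
def pdf_scale_factor(scan_dpi, pdf_units_per_inch=72):
    """Scale applied to a scanned image so it keeps its physical size in a PDF."""
    return pdf_units_per_inch / scan_dpi


def page_size_in_points(width_px, height_px, scan_dpi, pdf_units_per_inch=72):
    """Convert a scan's pixel dimensions to PDF units (points)."""
    # Multiply before dividing so whole-number results stay exact.
    width_pt = width_px * pdf_units_per_inch / scan_dpi
    height_pt = height_px * pdf_units_per_inch / scan_dpi
    return width_pt, height_pt


if __name__ == "__main__":
    for dpi in (150, 300):
        print(dpi, "dpi -> scale", pdf_scale_factor(dpi))
    # A letter-size page scanned at 300 dpi is 2550 x 3300 pixels.
    print(page_size_in_points(2550, 3300, 300))
```

So a 300 dpi layer shows up scaled to 24% and a 150 dpi layer to 48%, which is just the ratio of the two resolutions.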
I don't know much about PDFs, but I've messed with images at the pixel level for a couple of decades. All scans of documents produce letters or handwriting where the pixels gray out at the edges. The only time I've ever seen a piece of text or handwriting made up entirely of pure black pixels is when the image was converted to black and white.
My question was, does the OCR do that to some but not other ink artifacts on scanned documents?
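The "gray at the edges" observation can be checked mechanically rather than by eye. This is only a sketch: it assumes you have already pulled the 8-bit gray values for a region (one letter, say) into a plain list, and `gray_level_summary` is a hypothetical helper, not part of any tool mentioned here.

```python
from collections import Counter


def gray_level_summary(pixels, black=0, white=255):
    """Count how many pixels in a region are neither pure black nor pure white."""
    counts = Counter(pixels)
    total = sum(counts.values())
    intermediate = sum(n for value, n in counts.items() if value not in (black, white))
    return {
        "total": total,
        "intermediate": intermediate,
        # True when the region looks like a 1-bit (black and white) image.
        "bilevel": intermediate == 0,
    }


if __name__ == "__main__":
    scanned_edge = [255, 230, 140, 20, 0, 0, 35, 160, 240, 255]
    bitmap_edge = [255, 255, 0, 0, 0, 0, 0, 255, 255, 255]
    print(gray_level_summary(scanned_edge))
    print(gray_level_summary(bitmap_edge))
```

Running it on two regions of the same scan would show whether one really has anti-aliased edges and the other does not.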
What do you do, if you don't mind me asking?
The first reason for that is that you're looking at this on a 72 dpi screen. You're basically compressing pixels to fit the screen, so you're missing basically 3 pixels for each one you see on screen.
With that in mind, the white outline you see in the background IS that edge grayscale. What happens when you do background removal is you compress the gray colorspace from 256 elements, to, say 200. The first 'whitest' 56 elements (white to light gray) are converted into white with a scaled alpha value. That way the gray contour 'fades' into 'transparent' (or the new background), instead of fading to white. Obviously, you also need to re-scale the remaining opaque image from 200 to 256 colors, which will remove some of the shading and 'wash' it a bit.
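Here is a rough per-pixel sketch of that background removal step. The cutoff of 200 and the linear alpha ramp are assumptions taken from the example numbers above, not the actual behavior of any particular scanner, and `remove_background` is a made-up name.

```python
def remove_background(gray, cutoff=200):
    """Map one 8-bit gray value (0 = black, 255 = white) to (value, alpha).

    Levels at or above the cutoff become white with an alpha that fades
    from opaque at the cutoff to fully transparent at pure white.
    Levels below the cutoff stay opaque and are stretched back to 0..255.
    """
    if gray >= cutoff:
        alpha = round(255 * (255 - gray) / (255 - cutoff))
        return 255, alpha
    value = round(gray * 255 / (cutoff - 1))
    return value, 255


if __name__ == "__main__":
    for g in (0, 100, 199, 200, 228, 255):
        print(g, "->", remove_background(g))
    # Only 200 opaque levels survive, spread over the 0..255 range,
    # which is the loss of shading ("wash") described above.
    print(len({remove_background(g)[0] for g in range(200)}))
```

The light gray fringe around each letter ends up as semi-transparent white, so it reads as a white outline once the layer is placed over another background.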
On top of that, the scanner could be set to sharpen the image (don't know that it is in this case). That would enhance the contours, sharpening the text.
But, on some and not others?
The graphics work has been a hobby since computer imaging came around. I've done some logo designs and graphics work for publication.
None of which is related to my paying job. But, in that capacity, I've manipulated images on a pixel-by-pixel level a number of times.
What do you mean on some and not others? Isn't there a white outline shading to the background around the text?
When he talks about the doctor's signature. The first letter of the name clearly has grayscale pixels that surround the letter. The rest of the signature is black.
Not enough ink flowing from the pen when he starts writing?
Does the whole signature have a matching white outline around it?
It's not an issue with how the ink was applied to the paper. The first letter is an actual scan of the letter. The rest of the signature is a black and white (transparent) image of the rest of the signature.
Forget the background for a minute because, if you assembled several originals on a transparency and then scanned them with the green background, it would create the white space you describe. That doesn't change the fact that the first letter of the signature is a full color (or at the very least a grayscale) image of the first letter and the remainder is a black and white image.
The same phenomenon occurs on the date at the bottom and in a couple of other places.
This is what I'm talking about.
I can't forget about the background because the background isn't created with transparencies, it's removed digitally by reducing the colorspace resolution, and actually using the image data to do it. You effectively lose levels of gray by doing that. The actual intensity or porosity of the trace of the ink will have an impact in the grayscale levels you see on the scan.
There's also some form of enhancement done on the text, probably to make OCR easier. It isn't just the signature that is entirely opaque. If you look at the actual text of the form, it's also entirely opaque. However, even that text has the properly shaped surrounding white outline.
In order to doctor any of the black text, you would also need to doctor the surrounding white outline, or it won't match.
Okay, so will your OCR software do what I just posted?
I see what you're talking about. I have the original PDF with me here.
Notice the white surrounding outline matching the text. You can't 'add that text later' without fixing the outline too.
Some laser copiers already do that.
Okay, back to my other questions...to some and not others?
Why not the 1?