You find a great article as a PDF. The document is nicely formatted and allows you to copy out the text. However, when you copy-paste it into a word processor, what was once nicely formatted text turns into text that either wraps in really stupid places or doesn't fill up the available width, which looks pretty stupid as well.
Why it happens
For whatever stupid reason, when you copy text out of most PDFs, every printed line comes out ending in a hard "line break" character. This means that if your Word document has different margins or a different font size than the PDF (as it surely will), the text will break in places it shouldn't. You can see these extra line break characters in a text editor that shows invisible characters: each printed line ends with a CR (carriage return) and LF (line feed) pair.
Word interprets the CR and LF characters as "Start a new paragraph here." That's the problem.
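If you want to see the same thing without a text editor, here's a minimal Python sketch that makes the invisible characters visible. The sample string and the `show_line_breaks` helper are made up for illustration, assuming the copied text uses Windows-style CR LF pairs (`\r\n`).

```python
def show_line_breaks(text: str) -> str:
    """Make CR and LF characters visible, roughly the way a text editor's
    'show invisible characters' mode does (hypothetical helper)."""
    # Replace CR with a marker, then LF with a marker followed by a real
    # newline so the output still reads one printed line at a time.
    return text.replace("\r", "[CR]").replace("\n", "[LF]\n")

# Hypothetical text copied out of a PDF: every printed line ends in CR LF.
copied = "For whatever stupid reason, most\r\nPDFs end every printed line\r\nwith a hard line break.\r\n"
print(show_line_breaks(copied))
```

Each `[CR][LF]` pair in the output is one spot where Word will start a new paragraph.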
We need to remove all the extra line breaks from the document. Word and OpenOffice both allow you to do this using the "Find and Replace" dialog, but the technique is not obvious.
- Hit CTRL+F or click Edit > Find and Replace (in Word, CTRL+H jumps straight to the Replace tab)
- In Word, click the tab at the top of the dialog that says "Replace"
- In the Find box, enter ^p for Word or $ for OpenOffice
- In OpenOffice, click "More" and then check "Regular Expressions" (without it, $ is searched for as a literal dollar sign)
- In the Replace box, type a single space character (i.e., hit the spacebar once)
- Click "Replace All"
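If you'd rather clean the text before it ever reaches the word processor, the same Replace All can be sketched in Python. This is an illustrative equivalent, not what Word or OpenOffice do internally; the `join_lines` helper is hypothetical, and it assumes the line breaks are CR LF pairs, lone LFs, or lone CRs.

```python
import re

def join_lines(text: str) -> str:
    """Replace every line break with a single space, like Replace All with
    ^p (Word) or $ (OpenOffice) in Find and a space in Replace."""
    # \r\n is listed first so a CR LF pair becomes one space, not two.
    return re.sub(r"\r\n|\r|\n", " ", text)

copied = "turns into text that\r\nwraps in really\r\nstupid places."
print(join_lines(copied))  # prints: turns into text that wraps in really stupid places.
```

Like Replace All, this joins every line, including the breaks between real paragraphs, so the result is one long paragraph.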
This is what the box looks like in Word:
And this is what it looks like in OpenOffice:
After you hit "Replace All", all the line breaks will be replaced with spaces, so the text reflows to fill the width of your document and looks much better.
- If you want to replace the line breaks one at a time (for example, to keep the breaks between real paragraphs), use "Replace" instead of "Replace All"
- It's usually easier to clean the text in its own document and then copy/paste it again into your card-cutting document
- This technique is also useful for removing extra line breaks left by some bozo who started a new page by hitting "Enter" a bunch of times instead of inserting a page break. In Word, use ^p^p in the Find box and ^p in the Replace box
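The blank-line cleanup can be sketched in Python as well. As an assumption, the hypothetical `collapse_blank_lines` helper treats any run of two or more line breaks as "extra" and collapses it to a single break. Replacing ^p^p with ^p in Word does the same job, though you may need to click "Replace All" more than once for long runs.

```python
import re

def collapse_blank_lines(text: str) -> str:
    """Collapse runs of two or more line breaks into one, similar to
    replacing ^p^p with ^p in Word (hypothetical helper)."""
    # (?:\r\n|\r|\n) matches one line break in any of the three styles;
    # {2,} requires at least two in a row. The replacement is a plain LF.
    return re.sub(r"(?:\r\n|\r|\n){2,}", "\n", text)

page_one = "End of page one.\n\n\n\n\nStart of page two."
print(collapse_blank_lines(page_one))
```

A single line break is left alone, so ordinary paragraph breaks survive.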