You find a great article as a PDF. The document is nicely formatted, and allows you to copy out the text. However, when you copy-paste to a word processor, what was once nicely formatted text:
turns into text that either wraps in really stupid places:
Or doesn't fill up the available width, which looks pretty stupid as well
Why it happens
For whatever stupid reason, most PDFs are encoded with hard "line break" characters at the end of each printed line. This means that if your Word document has different margins or a different font size than the PDF (as it surely will), the text will have extra line breaks in places it shouldn't. We can see these extra line break characters in a text editor:
The CR and LF characters are interpreted by Word as "Start a new paragraph here." That's the problem.
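If you don't have a text editor that shows these characters, a few lines of Python can make them visible. This is only a rough sketch: the sample string and the show_breaks name are made up for illustration, and real PDF text may use a lone LF instead of CR LF.

```python
# Hypothetical sample standing in for text copied out of a PDF.
# Each printed line ends with CR LF ("\r\n").
pasted = "The quick brown fox\r\njumps over the lazy dog\r\nand keeps on running.\r\n"


def show_breaks(text):
    """Return the text with CR and LF characters spelled out visibly."""
    # Keep a real newline after each <LF> so the output stays readable.
    return text.replace("\r", "<CR>").replace("\n", "<LF>\n")


print(show_breaks(pasted))
```

Every <CR><LF> pair in the output is a spot where Word would start a new paragraph.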
The Solution
We need to remove all the extra line breaks from the document. Word and OpenOffice both allow you to do this using the "Find and Replace" dialog, but the technique is not obvious.
- Hit CTRL+F or click Edit>Find and Replace
- In Word, click the tab at the top of the dialog that says "Replace"
- In the Find box, enter ^p for Word or $ for OpenOffice
- In OpenOffice, click "More" and then check "Regular Expressions"
- In the Replace box, type a single space character (i.e. hit the spacebar once)
- Click "Replace All"
This is what the box looks like in Word:
And this is what it looks like in OpenOffice:
After you hit "Replace All", all the line breaks will be replaced with spaces, so your document looks much better.
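If you'd rather clean the text before it ever reaches the word processor, the same "Replace All" can be approximated in Python. This is a minimal sketch, not what Word or OpenOffice do internally; the join_lines name is hypothetical, and it assumes every line break in the text is unwanted.

```python
import re


def join_lines(text):
    """Replace every line break (CR LF, lone CR, or lone LF) with one space."""
    # Match CR LF first so a Windows-style break becomes one space, not two.
    joined = re.sub(r"\r\n|\r|\n", " ", text)
    # Drop the space left behind by a trailing line break.
    return joined.strip()


print(join_lines("This sentence was\r\nbroken across\r\nthree lines.\r\n"))
```

Like "Replace All", this also flattens real paragraph breaks, so run it on one paragraph at a time if you want to keep them.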
Notes
- If you want to replace the line breaks one at a time, use "replace" instead of "replace all"
- It's usually easier to clean the text in its own document and then copy/paste again to your card-cutting document
- This technique is also useful for removing extra line breaks after some bozo began a new page by hitting "enter" a bunch of times instead of adding a new page break. Use ^p^p in the find box
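The last note can be sketched in Python too. The note doesn't say what goes in the Replace box for ^p^p, so this assumes the goal is to shrink each run of repeated line breaks down to a single one; the collapse_breaks name is made up for illustration.

```python
import re


def collapse_breaks(text):
    """Collapse each run of two or more line breaks into a single one."""
    # Normalize CR LF and lone CR to LF so runs are easy to match.
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")
    return re.sub(r"\n{2,}", "\n", normalized)


print(collapse_breaks("End of one page.\r\n\r\n\r\n\r\nStart of the next."))
```

Single line breaks are left alone, so ordinary paragraph breaks survive.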
22 comments:
CR LF LOL
Many templates have macros that do that, which might be easier
Thank you! I clean up hundreds of documents each month and had looked near and far for an easier way. Your method is MUCH simpler than anything I've found.
You can also use the online tools at texthandler.com to remove line breaks. Copy the text from the PDF, select the option "Every paragraph began by capital" and click the "execute" button.
Hooray! Exactly what I was looking for! We have several e-mails we want to print, but didn't want to waste all the paper by printing them with all the line breaks that kept adding up. Thanks for a quick, easy solution!
Thank you very much, this was exactly what I was looking for.
Much appreciated.
Thank you, I had forgotten what the code was for line break --- ^p
Thank you very much; you have saved lot of time of mine and others.
:) saved my life
Thanks a lot! This is really helpful. I was using online tool textfixer.com for this but this way is easier.
Absolute lifesaver -- currently writing up notes for my dissertation and you've probably just doubled how productive I am.
Thank you!
Glad I looked for this now - you're a massive time-saver! THANK YOU!
thanks this saved me lots of time!
Wonderful, I should have looked for this years ago! Thank you very much,
Claudio
The texthandler website didn't work well for my text (didn't split paragraphs with either option) so I wrote my own which splits into paragraphs where it finds a '.' at the end of a line: Format Text Page.
Hope it helps someone.
I just paste the text to Firefox address box. It removes the extra line breaks automatically. Ctrl+V, Ctrl+A, Ctrl+X
It will remove my "real-linebreak" (paragraph break) at the same time. What can I do?
Can I suggest an easier way is to load PDF Copy-Paster (http://www.onehourprogramming.com/blog/2010/9/1/fix-copy-and-pasting-in-pdfs.html). It's always available - no need to be on line and it works a treat.