Friday, August 14, 2009

Notepad++: A guide to using regular expressions and extended search mode

The information in this post will benefit anyone looking to understand how to use Notepad++ extended search mode and regular expressions.

Notepad++ is an excellent text editor and a replacement for Microsoft's notepad.exe.

Since the release of version 4.9, the Notepad++ Find and Replace commands have been updated. They now include a new Extended search mode that allows you to search for tabs (\t), newlines (\r\n), and a character by its value (\o, \x, \b, \d, \t, \n, \r and \\). Unfortunately, the Notepad++ documentation is lacking in its description of these new capabilities. These slides by Anjesh Tuladhar on regular expressions in Notepad++ are very useful.

One of the major disadvantages of using regular expressions in Notepad++ was that it did not handle the newline character well, especially in Replace. Now, we can use Extended search mode to make up for this shortcoming. Together, Extended and Regular Expression search modes give you the power to search, replace and reorder your text in ways that were not previously possible in Notepad++.

In the Find (Ctrl+F) and Replace (Ctrl+H) dialogs, the three available search modes (Normal, Extended and Regular expression) are listed in the bottom left corner. Select your desired search mode.

Example 1: Remove all lines that begin with a certain character, such as the exclamation point.

Open the Replace dialog box (Ctrl+H) and select the Regular Expression search mode.
  • Find what: ^[!].*
  • Replace with: (leave this blank)
Press Replace All. Every line that began with ! (the error messages in this example) is now empty.

Explanation:
  • ^ anchors the match to the start of a line, so an exclamation point in the middle of a line is ignored.
  • [!] finds the exclamation character.
  • .* selects the rest of the line.
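
For readers who want to script the same cleanup outside Notepad++, here is a minimal Python sketch of Example 1. It assumes Python's re module, whose syntax is close to but not identical with the Notepad++ engine, and the function name remove_bang_lines is made up for illustration. The match is anchored to the start of each line, so an exclamation point in the middle of a line is left alone.

```python
import re

def remove_bang_lines(text: str) -> str:
    """Empty every line that starts with '!' but keep its line break."""
    # re.MULTILINE lets ^ match at the start of every line.
    # [^\r\n]* plays the role of .* and stops before the CRLF line break.
    return re.sub(r"^![^\r\n]*", "", text, flags=re.MULTILINE)

# Hypothetical sample text, not taken from the original file.
sample = "ok line\r\n! error one\r\nvalue!kept\r\n"
print(repr(remove_bang_lines(sample)))
```

As in Notepad++, the emptied line stays behind as a blank line, which is what the next example cleans up.
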
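
Example 2: Remove all blank lines.

Switch to Extended search mode in the Replace dialog.
  • Find what: \r\n\r\n
  • Replace with: \r\n
Press Replace All and the blank lines will be removed. If the file contains several blank lines in a row, press Replace All again until no more matches are found.

Explanation:
  • \r\n is the newline sequence in Windows (carriage return followed by line feed).
  • \r\n\r\n finds two newlines in a row (what you get from pressing Enter twice).
  • Replacing the pair with a single \r\n keeps the line break between the surrounding lines. Leaving Replace with blank would join those lines together.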
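
A blank line is just two line breaks in a row, so removing it means keeping exactly one of them. The Python sketch below shows that idea under the assumption of Windows (\r\n) line endings; the function name drop_blank_lines is hypothetical. Unlike a single Replace All in Extended mode, the {2,} quantifier handles runs of several blank lines in one pass.

```python
import re

def drop_blank_lines(text: str) -> str:
    """Collapse each run of consecutive CRLF line breaks into one."""
    # (?:\r\n){2,} matches two or more Windows line breaks in a row.
    return re.sub(r"(?:\r\n){2,}", "\r\n", text)

# Hypothetical sample: two blank lines between "a" and "b".
print(repr(drop_blank_lines("a\r\n\r\n\r\nb\r\nc")))
```
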

Example 3: Put each Item on a new line.

Switch to Regular Expression search mode.
  • Find what: (\+.*)(Item)
  • Replace with: \1\r\n\2
Press Replace All. "Item"s have been placed on new lines.

Explanation:
  • \+ finds the + character.
  • .* selects the text after the + up until the word "Item".
  • Item finds the string "Item".
  • () allow us to access whatever is inside the parentheses. The first set of parentheses may be accessed with \1 and the second set with \2.
  • \1\r\n\2 will take + and whatever text comes after it, will then add a new line, and place the string "Item" on the new line.
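
The capture-and-reorder step can be tried out in Python as well. This is only a sketch: it assumes Python's re module rather than the Notepad++ engine, the function name item_to_new_line is invented, and the sample line is made up. Because .* is greedy, the first group runs up to the last "Item" on the line, so only that one is moved per pass.

```python
import re

def item_to_new_line(line: str) -> str:
    """Insert a CRLF before the last 'Item' that follows a '+'."""
    # Group 1: the + and everything up to the last "Item" (.* is greedy).
    # Group 2: the word "Item" itself.
    return re.sub(r"(\+.*)(Item)", "\\1\r\n\\2", line)

# Hypothetical sample line.
print(repr(item_to_new_line("+ Entry A Item 1, 5")))
```
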

Example 4: Delete duplicate or redundant information.

Remove all newline characters using Extended search mode, replacing them with a unique string of text that we will use as a signpost for redundant data later in Regular Expression mode. Choose a string of text that does not appear anywhere in your file. I have chosen RegEx_Example.

Switch to Extended search mode in the Replace dialog.
  • Find what: \r\n
  • Replace with: RegEx_Example
Press Replace All. All the newline characters are gone. Your entire file is now one very long line of text.

Using our RegEx_Example signpost keyword, let's separate the different values.

Stay in Extended search mode.
  • Find what: ,
  • Replace with: ,RegEx_Example
Press Replace All. Now, RegEx_Example appears after every comma.
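
The two Extended-mode replacements above are plain string substitutions, so they can be sketched in Python without any regular expressions. The function name flatten_with_signposts and the sample text are assumptions for illustration; the guard at the top simply enforces the advice that the signpost must not already occur in the file.

```python
SIGNPOST = "RegEx_Example"  # must not occur anywhere in the original text

def flatten_with_signposts(text: str) -> str:
    """Replace CRLF with the signpost, then add the signpost after each comma."""
    if SIGNPOST in text:
        raise ValueError("signpost already occurs in the text; pick another one")
    one_line = text.replace("\r\n", SIGNPOST)      # first Replace All
    return one_line.replace(",", "," + SIGNPOST)   # second Replace All

# Hypothetical two-line sample.
print(flatten_with_signposts("Item 1, 42\r\nItem 2, 7"))
```
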

Example 5: Put the remaining Items on new lines.

Switch to Regular Expression search mode.
  • Find what: RegEx_Example(Item)
  • Replace with: \r\n\1
Press Replace All. All "Item"s should now be on new lines.
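
A rough Python equivalent of this step, assuming the re module and an invented function name items_to_new_lines, looks like this. Only signposts that sit directly in front of "Item" are turned back into line breaks; the others are left in place for the next examples.

```python
import re

def items_to_new_lines(text: str) -> str:
    """Turn each signpost that directly precedes 'Item' into a CRLF."""
    # Group 1 captures "Item" so it can be written back after the line break.
    return re.sub(r"RegEx_Example(Item)", "\r\n\\1", text)

# Hypothetical sample in the flattened, signposted form.
print(repr(items_to_new_lines("aRegEx_ExampleItem 1,RegEx_Example 2")))
```
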

Example 6: Get rid of duplicate entries.

Stay in Regular Expression search mode.
  • Find what: RegEx_Example ([^A-Za-z]*)RegEx_Example [^A-Za-z]*\,RegEx_Example
  • Replace with: \1,
Press Replace All and all duplicate entries will be removed.

Explanation:
  • A-Z covers all upper case letters when used inside square brackets.
  • a-z covers all lower case letters.
  • [A-Za-z] therefore finds any alphabetic character.
  • [^...] is the inverse, so [^A-Za-z] finds any character except an alphabetic character.
  • Notice that only the first [^A-Za-z]* is in parentheses (). It is recalled by \1 in the Replace with field. The text matched outside the parentheses is discarded.
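
To see what this pattern keeps and what it throws away, it can be run in Python on a small made-up string. This is a sketch under assumptions: it uses Python's re module instead of the Notepad++ engine, the function name drop_second_value is hypothetical, and the sample data is invented because the original file is not shown.

```python
import re

# Same pattern as in the Find what field above.
PATTERN = r"RegEx_Example ([^A-Za-z]*)RegEx_Example [^A-Za-z]*\,RegEx_Example"

def drop_second_value(text: str) -> str:
    """Keep the first non-alphabetic value and discard the one after it."""
    return re.sub(PATTERN, r"\1,", text)

# Hypothetical flattened data with the value 42 repeated.
print(drop_second_value("Item 1:RegEx_Example 42RegEx_Example 42,RegEx_ExampleItem 2"))
```

Note that the pattern does not check that the two values are equal. It always keeps the first and drops the second, so it is only safe on data where the second value really is a repeat of the first.
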

Example 7: Let's get rid of all those RegEx_Examples.
  • Find what: RegEx_Example
  • Replace with: (leave blank)
Press Replace All. The RegEx_Examples are gone.

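Example 8: Separate each entry's data from the next.
  • Find what: (\**\*)
  • Replace with: \r\n\r\n\1\r\n\r\n
Press Replace All. The pattern (\**\*) finds a run of one or more asterisks, and each run now sits on its own line with a blank line above and below it. The final product is a beautiful, comma-delimited file that is ready to be imported into Excel for further analysis.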
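
The asterisk pattern is easy to misread because \* appears twice, so here is a small Python sketch of it. It assumes Python's re module, and the function name pad_asterisk_runs and the sample string are made up. \** matches zero or more literal asterisks and the final \* requires at least one, so together they match any run of one or more.

```python
import re

def pad_asterisk_runs(text: str) -> str:
    """Put two CRLF line breaks before and after each run of asterisks."""
    # (\**\*) captures a run of one or more literal * characters.
    return re.sub(r"(\**\*)", "\r\n\r\n\\1\r\n\r\n", text)

# Hypothetical sample with one separator run.
print(repr(pad_asterisk_runs("a***b")))
```
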

Notepad++ rocks!

4 comments:

  1. Worth mentioning that as of 2011-06-13, Notepad++ does not support alternation (|) in regular expressions. I use Notepad++ too, but its regular expression capabilities are below par (other editors like EditPad Pro and EditPlus do better on this).

  2. Nice! This info was very useful.

  3. I've loaded your website in several different internet browsers and I have to say your blog loads a great deal faster than most. Would you mind emailing me the name of your website hosting company? I will even sign up through your affiliate link if you'd like. Thanks a lot

  4. Very useful info, Notepad++ is great-documentation is not!
    How would I add a line break at a capital letter in an unorganized block of text?
    Example:
    From this:
    Very useful info Notepad++ is great-documentation is not!
    To this:
    Very useful info
    Notepad++ is great-documentation is not!
    Thanks
