While Excel offers a wide range of functions and features, there are times when we need to manipulate data in ways that standard Excel functions cannot achieve. This is where Regular Expressions, commonly known as Regex, come into play. Regex enables complex text pattern matching in Excel for efficient data manipulation. But Excel lacks native regex functions. This guide shows how to use regex in Excel with formulas, VBA and UDFs.
What is a regular expression?
A regular expression (REGEX) is a character sequence defining a search pattern. A REGEX pattern can consist of literal characters, such as “abc”, or special characters, such as “.”, “*”, “+”, “?”, and more. Special characters have special meanings and functions in REGEX.
A REGEX pattern can also contain groups enclosed by parentheses “( )”. Groups can be used to capture parts of the matched text, or to apply quantifiers or modifiers to the whole group. For example, “(ab)+” matches one or more occurrences of “ab”, and “(\d{3})-(\d{4})” matches a number written as three digits, a hyphen, and four digits (such as 555-1234) and captures the two parts separately.
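To see grouping in action outside Excel, here is a minimal sketch in Python, whose `re` module handles these basic patterns in the same way. Python itself and the sample text "Call 555-1234 today" are assumptions made for illustration and are not part of the Excel workflow.

```python
import re

# (ab)+ : the quantifier applies to the whole group "ab".
repeat = re.compile(r"(ab)+")
whole_repeats = repeat.fullmatch("ababab") is not None  # three full copies of "ab"
broken_repeat = repeat.fullmatch("aba") is not None     # trailing "a" is left over

# (\d{3})-(\d{4}) : each pair of parentheses captures its own piece.
number = re.compile(r"(\d{3})-(\d{4})")
match = number.search("Call 555-1234 today")  # made-up sample text
first_part, second_part = match.groups()

print(whole_repeats, broken_repeat)  # True False
print(first_part, second_part)       # 555 1234
```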
REGEX is a powerful and flexible way to search for and match patterns in text strings. You can use REGEX to perform various tasks, such as:
Extracting specific information from a text string, such as names, dates, numbers, etc.
Replacing parts of a text string with another text string, such as correcting spelling errors, formatting data, etc.
Validating user input, such as checking if an email address or a password is valid.
Transforming data, such as splitting or joining text strings, changing cases, etc.
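The four kinds of task listed above can each be sketched in a line or two. The example below uses Python's `re` module purely as an illustration; the sample strings are invented, and the Excel methods later in this guide achieve similar results through formulas and VBA.

```python
import re

text = "Order 1042 shipped on 2024-03-15 to alice@example.com"  # made-up sample

# Extract: pull the date out of the string.
date = re.search(r"\d{4}-\d{2}-\d{2}", text).group()

# Replace: mask every run of digits with "#".
masked = re.sub(r"\d+", "#", text)

# Validate: check that a string is exactly four digits.
is_valid = re.fullmatch(r"\d{4}", "1042") is not None

# Transform: split on commas or semicolons, ignoring surrounding spaces.
parts = re.split(r"\s*[,;]\s*", "red, green;blue")

print(date)      # 2024-03-15
print(masked)    # Order # shipped on #-#-# to alice@example.com
print(is_valid)  # True
print(parts)     # ['red', 'green', 'blue']
```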
Does Excel support regex?
Unfortunately, there are no built-in REGEX functions in Excel. This means that you cannot use REGEX patterns directly in functions like FIND, REPLACE, or SEARCH. However, there are workarounds. In the following sections, you will learn 3 methods to use REGEX in Excel with examples and tips.
What are the Regex cheat sheets in Excel?
Before diving into the 3 methods to use REGEX in Excel, let’s review some of the most common and useful REGEX patterns you can use in Excel. Here is a table that summarizes some of the basic REGEX patterns that you can use in Excel:
Pattern | Description | Example |
---|---|---|
. | Matches any single character except newline | .at matches cat, bat, rat |
[ ] | Matches any single character in brackets | [abc] matches a, b, c |
[^ ] | Matches any single character not in brackets | [^abc] matches any single character except a, b, c |
* | Matches zero or more occurrences of the preceding character | ab*c matches ac, abc, abbc |
+ | Matches one or more occurrences of the preceding character | ab+c matches abc, abbc |
? | Matches zero or one occurrence of the preceding character | ab?c matches ac, abc |
{n} | Matches exactly n occurrences of the preceding character | ab{2}c matches abbc |
{n,m} | Matches at least n and at most m occurrences of the preceding character | ab{1,3}c matches abc, abbc, abbbc |
( ) | Groups and captures the pattern in parentheses | (ab)+ matches ab, abab, ababab |
These are just some of the basic REGEX patterns that you can use in Excel. There are many more advanced patterns that you can use to create complex rules and logic for your data manipulation tasks. For more information on REGEX syntax and features, you can refer to a dedicated REGEX cheat sheet or tutorial.
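If you want to confirm how a basic pattern behaves before using it in a workbook, a quick script can help. The sketch below uses Python's `re` module as a stand-in (an assumption; Excel's VBA engine is a different implementation, although it agrees on these basic patterns). The extra "should not match" strings are added here for illustration.

```python
import re

# (pattern, strings that should match in full, strings that should not)
cases = [
    (r".at", ["cat", "bat", "rat"], ["at"]),
    (r"[abc]", ["a", "b", "c"], ["d"]),
    (r"[^abc]", ["d", "1"], ["a"]),
    (r"ab*c", ["ac", "abc", "abbc"], ["abd"]),
    (r"ab+c", ["abc", "abbc"], ["ac"]),
    (r"ab?c", ["ac", "abc"], ["abbc"]),
    (r"ab{2}c", ["abbc"], ["abc"]),
    (r"ab{1,3}c", ["abc", "abbc", "abbbc"], ["ac", "abbbbc"]),
    (r"(ab)+", ["ab", "abab", "ababab"], ["aba"]),
]

def check(pattern, should_match, should_fail):
    """Return True when every example behaves as expected."""
    ok = all(re.fullmatch(pattern, s) for s in should_match)
    return ok and not any(re.fullmatch(pattern, s) for s in should_fail)

results = {pattern: check(pattern, yes, no) for pattern, yes, no in cases}
print(all(results.values()))  # True
```

`re.fullmatch` is used so that each string must match the pattern from start to end, which makes the checks stricter than a plain search.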
3 Common Ways to use REGEX to match patterns in Excel
Now that you have learned some of the basic REGEX patterns, let’s see how to use them in Excel. In this section, you will learn 3 methods to use REGEX in Excel with examples and tips. Each method has its own advantages and disadvantages, so you can choose the one that suits your needs best.
Method 1: Using Combined Formula
One way to use REGEX in Excel is to combine some of the built-in functions and formulas that can mimic some of the REGEX features. For example, you can use the SUBSTITUTE function to replace parts of a text string with another text string or the LEN function to count the number of characters in a text string.
In this example, our REGEX criteria dictate that the total character length must be 9. The first 3 characters should consist of uppercase letters, the subsequent 3 should be numeric values, and the final 3 should be lowercase letters. To accomplish this, we will employ a combination of several Excel functions, including AND, LEN, COUNT, FIND, MID, LEFT, RIGHT, ROW, INDIRECT, and UPPER.
Step 1: To begin, we need to establish two named ranges. Navigate to the Formulas tab and select the Defined Names group, then choose Name Manager.
Step 2: The Name Manager dialog box will appear. Click on New.
Step 3: In the New Name dialog box that opens, enter “Letters” in the Name box and the following text in the Refers to field:
="abcdefghijklmnopqrstuvwxyz"
And click OK.
Step 4: Return to the Name Manager dialog box and click New.
Step 5: In the New Name dialog box, enter "Numbers" in the Name box and the following text in the Refers to field:
="1234567890"
Then click OK.
Step 6: Finally, within the Name Manager dialog box, click Close.
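Step 7: In this step, we will utilize the named ranges created earlier in a formula. Input the following formula into cell C5. In versions of Excel without dynamic arrays, you may need to confirm it with Ctrl + Shift + Enter because it works on arrays: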
=AND(LEN(B5)=9, COUNT(FIND(MID(LEFT(B5,3), ROW(INDIRECT("1:"&LEN(LEFT(B5,3)))),1), UPPER(Letters)))=LEN(LEFT(B5,3)), COUNT(FIND(MID(MID(B5,4,3), ROW(INDIRECT("1:"&LEN(MID(B5,4,3)))),1), Numbers))=LEN(MID(B5,4,3)), COUNT(FIND(MID(RIGHT(B5,3), ROW(INDIRECT("1:"&LEN(RIGHT(B5,3)))),1), Letters))=LEN(RIGHT(B5,3)))
Formula Breakdown:
LEN(B5)=9: Ensures the total character length is 9.
COUNT(FIND(MID(LEFT(B5,3), ROW(INDIRECT("1:"&LEN(LEFT(B5,3)))),1), UPPER(Letters)))=LEN(LEFT(B5,3)): Checks if the first 3 characters are uppercase letters.
COUNT(FIND(MID(MID(B5,4,3), ROW(INDIRECT("1:"&LEN(MID(B5,4,3)))),1), Numbers))=LEN(MID(B5,4,3)): Validates whether the middle 3 characters are numeric digits.
COUNT(FIND(MID(RIGHT(B5,3), ROW(INDIRECT("1:"&LEN(RIGHT(B5,3)))),1), Letters))=LEN(RIGHT(B5,3)): Verifies that the last 3 characters are lowercase letters. FIND is case-sensitive, so an uppercase letter is not found in Letters.
Step 8: Finally, drag down the Fill Handle tool to apply the formula to other cells.
You will receive a "TRUE" outcome when the pattern aligns with the REGEX criteria; otherwise, it will display "FALSE."
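For comparison, the criteria stated for this method (three uppercase letters, three digits, three lowercase letters, nine characters in total) fit in one short REGEX pattern. The sketch below expresses them with Python's `re` module. The function name `matches_code` and the sample values are hypothetical, and this is an illustration of the pattern rather than something Excel can run directly.

```python
import re

# Three uppercase letters, three digits, three lowercase letters, nothing else.
CODE_PATTERN = re.compile(r"[A-Z]{3}[0-9]{3}[a-z]{3}")

def matches_code(value: str) -> bool:
    """Return True when the whole value fits the stated criteria."""
    return CODE_PATTERN.fullmatch(value) is not None

for sample in ["ABC123xyz", "abc123xyz", "ABC12xyz", "ABC123xyzz"]:
    print(sample, matches_code(sample))
```

Only the first sample satisfies all the criteria. The others fail on letter case or on length.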
Method 2: Create VBA Function
Another way to use REGEX in Excel is to create a custom VBA function that can use the RegExp object from the Microsoft VBScript Regular Expressions library. This library provides a set of methods and properties that allow you to create and execute REGEX patterns in VBA.
Example: We will construct a custom function in VBA that checks whether a text string starts with one to four letters and, if so, returns the characters that follow them.
Step 1: Open the Visual Basic Editor by pressing Alt + F11.
Step 2: Go to Tools > References.
Step 3: Check the box for Microsoft VBScript Regular Expressions 5.5.
Step 4: Select Insert > Module.
Step 5: Paste the following code into the new module:
Function match_pat(val_rng As Range) As String
    ' Requires the reference set in Step 3 (Microsoft VBScript Regular Expressions 5.5)
    Dim char_form As String, char_renew As String, char_data As String
    Dim regEx As New RegExp
    char_form = "^[A-Za-z]{1,4}"   ' one to four letters at the start of the text
    char_renew = ""                ' replacement text: removes the matched letters
    If char_form <> "" Then
        char_data = val_rng.Value
        With regEx
            .IgnoreCase = False
            .Pattern = char_form
        End With
        If regEx.Test(char_data) Then
            match_pat = regEx.Replace(char_data, char_renew)
        Else
            match_pat = " "
        End If
    End If
End Function
Formula Breakdown:
To begin, within the "match_pat" function, we define "val_rng" as a Range, and the function's output is a string.
In this context, we declare "char_form," "char_renew," "char_data" as String variables, and introduce "regEx" as a New RegExp object.
We assign our regular expression pattern, "^[A-Za-z]{1,4}", to the "char_form" variable. This pattern matches between one and four letters, lowercase or uppercase, at the start of the text. "char_renew" is set to an empty string.
We use an IF statement so that the remaining steps run only when the pattern is not blank.
The input data range, "val_rng," is associated with the "char_data" variable.
Within the WITH statement, we define the properties of the "regEx" object.
The "Test(char_data)" method scans the input data for the specified pattern. If a match is found, it returns TRUE and the next line runs, which uses the Replace method to replace the matched leading letters with an empty string, removing them.
In the event of a FALSE result, a string containing a single space is returned.
Step 6: Return to the worksheet. In cell C5, enter the following formula:
=match_pat(B5)
Here, B5 represents our input data, and the "match_pat" function will return the characters that follow the leading letters (up to four of them).
Step 7: Drag the Fill Handle downward.
As a result, you will observe the extracted content in the "Extracted Portion" column.
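To make the behavior of "match_pat" easier to follow, here is the same logic sketched in Python. This is an illustrative equivalent under the assumption that Python's `re` module treats this simple pattern the same way as the VBScript engine. The name `strip_leading_letters` and the sample inputs are hypothetical.

```python
import re

# Same pattern as char_form in the VBA function: 1 to 4 letters at the start.
LEADING_LETTERS = re.compile(r"^[A-Za-z]{1,4}")

def strip_leading_letters(value: str) -> str:
    """Remove one to four leading letters; return a single space if none are found."""
    if LEADING_LETTERS.search(value):
        return LEADING_LETTERS.sub("", value, count=1)
    return " "

for sample in ["ABCD1234", "AB-77", "ABCDE12", "12AB"]:
    print(repr(sample), "->", repr(strip_leading_letters(sample)))
```

Note that "ABCDE12" keeps its fifth letter: the pattern removes at most four letters, so "E12" remains.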
Method 3: Using User-Defined Function
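The third way to use REGEX in Excel is to write a general-purpose user-defined function (UDF) that takes the REGEX pattern as an argument instead of hard-coding it. The version below creates the VBScript RegExp object at run time with CreateObject.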
Within this section, we are establishing a versatile function designed to facilitate REGEX pattern matching in Excel. This function is user-driven, allowing the user to specify the pattern they wish to apply.
Step 1: Follow Steps 1 to 4 in Method 2 to open the Visual Basic Editor and insert a module.
Step 2: Now, proceed to input the following code into your created module:
Function matchP(val_rng As Range, char_form As String) As Variant
    Dim storeV() As Variant
    Dim limit_1 As Long, limit_2 As Long, R_count As Long, C_count As Long
    Dim regEx As Object
    On Error GoTo handleER
    matchP = storeV
    ' Create the RegExp object at run time (late binding)
    Set regEx = CreateObject("VBScript.RegExp")
    With regEx
        .IgnoreCase = False
        .Pattern = char_form
    End With
    R_count = val_rng.Rows.Count
    C_count = val_rng.Columns.Count
    ReDim storeV(1 To R_count, 1 To C_count)
    ' Test every cell of the input range against the pattern
    For limit_1 = 1 To R_count
        For limit_2 = 1 To C_count
            storeV(limit_1, limit_2) = regEx.Test(val_rng.Cells(limit_1, limit_2).Value)
        Next
    Next
    matchP = storeV
    Exit Function
handleER:
    matchP = CVErr(xlErrValue)
End Function
Formula Breakdown:
In the "matchP" function, we declare "val_rng" as a Range and "char_form" as a String. The function returns a Variant.
We declare "storeV()" as a Variant array, and "limit_1," "limit_2," "R_count," and "C_count" as Long variables.
Initially, the "storeV" array is assigned to the function's return value.
We create the "regEx" object and utilize the WITH statement to configure its properties. Specifically, we set "IgnoreCase" to False and define the "Pattern" based on the user-specified "char_form."
"R_count" and "C_count" calculate the total number of rows and columns in the input range. Subsequently, we resize the dynamic array "storeV()" accordingly.
Utilizing two FOR loops, we test the values within the input range, iterating through all rows and columns. The results are stored in the "storeV()" array, which serves as the output.
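The loop structure described above (one pattern, tested against every cell of a rectangular range) can be sketched outside Excel as follows. Python is used only for illustration; the name `match_grid` and the sample grid are hypothetical, and a list of lists stands in for the worksheet range.

```python
import re
from typing import List

def match_grid(values: List[List[str]], pattern: str) -> List[List[bool]]:
    """Test every cell of a rectangular grid against one pattern."""
    compiled = re.compile(pattern)  # compile once, reuse for every cell
    return [
        [compiled.search(str(cell)) is not None for cell in row]
        for row in values
    ]

# Made-up 2 x 2 "range" of cell values.
grid = [["ABCD1234", "12345678"], ["abcd-999", "WXYZ0001"]]
print(match_grid(grid, r"\D{4}\d{4}"))  # [[True, False], [False, True]]
```

As in the VBA version, the result has the same shape as the input, with one TRUE or FALSE per cell.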
For example: to assess patterns involving a combination of letters and numbers within the "Pattern" column, follow these steps:
Step 1: Return to the Excel sheet and input the following formula in cell C5:
=matchP(B5,"\D{4}\d{4}")
"\D{4}\d{4}" represents the pattern. "\D{4}" matches any four non-digit characters (letters, but also spaces or symbols), and "\d{4}" matches four digits immediately after them. Because the pattern has no "^" or "$" anchors, the match can occur anywhere in the cell.
Step 2: Drag the Fill Handle down.
You will obtain "TRUE" for cells that contain four non-digit characters, such as letters, followed by 4 digits.
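One detail worth checking is that "\D" accepts any non-digit and that a pattern without anchors can match in the middle of a cell. The sketch below contrasts the pattern from the formula with a stricter variant that uses "^", "$", and an explicit letter class. The stricter pattern, the function name `compare`, and the sample values are assumptions added for illustration, and Python's `re` module stands in for the VBA engine.

```python
import re

UNANCHORED = re.compile(r"\D{4}\d{4}", re.ASCII)           # pattern used in the formula
ANCHORED = re.compile(r"^[A-Za-z]{4}\d{4}$", re.ASCII)     # stricter, hypothetical variant

def compare(value: str):
    """Return (unanchored result, anchored result) for one value."""
    return (
        UNANCHORED.search(value) is not None,
        ANCHORED.search(value) is not None,
    )

for sample in ["ABCD1234", "xxABCD1234yy", "#$%&1234", "ABC12345"]:
    print(sample, compare(sample))
```

If your data must consist of exactly four letters followed by four digits and nothing else, passing the anchored pattern to "matchP" would be the closer fit.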
As another example, we can check email address patterns by utilizing REGEX in Excel. Input the following formula:
=matchP(B5,"[\w\.\-]+@[A-Za-z0-9]+[A-Za-z0-9\.\-]*[A-Za-z0-9]+")
Formula Breakdown:
[\w\.\-]+: Represents the first part of an email address, which can include letters, digits, underscores, periods, or hyphens.
@: Signifies the "@" symbol.
[A-Za-z0-9]+[A-Za-z0-9\.\-]*[A-Za-z0-9]+: This part corresponds to the domain name in an email address. It must start and end with a letter or digit, and may contain uppercase or lowercase letters, digits, hyphens, and dots in between. The "+" means one or more occurrences and the "*" means zero or more.
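It can be useful to probe how permissive this email pattern is before relying on it. The sketch below runs the same pattern through Python's `re` module (an assumption made for illustration; the `re.ASCII` flag keeps "\w" limited to letters, digits, and underscores). The function name `looks_like_email` and the sample addresses are hypothetical.

```python
import re

EMAIL = re.compile(
    r"[\w\.\-]+@[A-Za-z0-9]+[A-Za-z0-9\.\-]*[A-Za-z0-9]+",
    re.ASCII,
)

def looks_like_email(value: str) -> bool:
    """Return True when the pattern is found anywhere in the value."""
    return EMAIL.search(value) is not None

for sample in ["jane.doe@example.com", "user@ab", "user@x", "no-at-sign.example.com"]:
    print(sample, looks_like_email(sample))
```

The pattern accepts "user@ab" because it does not require a dot in the domain, and it rejects "user@x" because the domain part needs at least two characters. Treat it as a rough screen rather than a full email validator.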
A Comparison of 3 Methods – Which One is Best for You?
Now that you have learned 3 methods to use REGEX in Excel, let’s compare them and see which one is best for you. Here is a brief overview of the differences between the 3 methods based on the previous section:
Method | Advantages | Disadvantages |
---|---|---|
Using Combined Formula | Simple and easy to use, no additional tools or code required | Limited and cumbersome, long and complex formulas, cannot handle advanced REGEX features |
Create VBA Function | Powerful and flexible, can use all REGEX features and syntax | Requires programming skills and knowledge of VBA and REGEX, may not work on some versions or platforms of Excel |
Using User-Defined Function | Convenient and reusable, the pattern is passed as an argument so one function covers many checks | Still needs VBA and the VBScript RegExp object, so it shares the platform limits of the VBA function and is restricted to the REGEX syntax that object supports |
As you can see, each method has its own pros and cons, so you can choose the one that suits your needs best. Here are some tips to help you decide:
The combined formula method is a quick and simple solution for basic data manipulation tasks. However, it may not be able to handle complex or dynamic data.
The VBA function method is a powerful and flexible solution for complex or dynamic data manipulation tasks. However, it requires some programming skills and knowledge of VBA and REGEX. You may also need to enable macros or install libraries in Excel.
The user-defined function method is a convenient and easy solution once the function is in place, because the same function can be reused with any pattern. It still depends on VBA and on the VBScript RegExp object being available, so you may need to adjust your REGEX syntax or features accordingly.
In our experience, the user-defined function method is the most convenient of the three for everyday use, as one function handles whatever pattern you pass to it. However, this is just one opinion based on personal experience. You may have different preferences or needs depending on your situation.
Edit your spreadsheets for free - WPS Office
If you are looking for free office software that can help you edit your spreadsheets easily and efficiently, you should check out WPS Office. WPS Office is a free office suite that can open, create, edit, and save Microsoft Office files, including Word, Excel, and PowerPoint, and is fully compatible with Windows and Mac.
WPS Office has many features and benefits that make it a great choice for editing your spreadsheets. Here are some of them:
WPS Office has a familiar and intuitive user interface that resembles Microsoft Office. You can easily access all the functions and features that you need from the ribbon menu or toolbar.
WPS Office supports all kinds of file formats, such as .xls, .xlsx, .csv, .txt, .pdf, etc. You can open and edit any spreadsheet file without losing any formatting or data.
WPS Office has a powerful spreadsheet editor that offers a wide range of functions and features, such as formulas, charts, tables, filters, pivot tables, conditional formatting, data validation, etc. You can perform any data analysis or manipulation task with ease and accuracy.
WPS Office has a cloud service that allows you to sync and backup your files online. You can also share and collaborate on your files with others in real time. You can access your files from any device or platform with WPS Office installed.
WPS Office has a mobile app that lets you edit your spreadsheets on the go. You can view, edit, create, or save your files on your phone or tablet with the same functionality and compatibility as the desktop version.
WPS Office is popular office software that has been praised for its performance and quality. Users can edit spreadsheets without any hassle or cost, and WPS Office is compatible with a variety of devices and platforms. Overall, WPS Office is a great option for users who need free and reliable office software.
How to download WPS Office for free?
If you want to download and install WPS Office for free, you can follow these simple steps:
Step 1: Go to https://www.wps.com/ and click on the Free Download button.
Step 2: Run the file, click Install Now and follow the instructions to install WPS Office on your device.
WPS Office installers are also available for macOS, Linux (deb/rpm packages) and Android mobile. The iOS version can be directly downloaded from the App Store.
FAQs
1. Can I use Regex in older versions of Excel?
Yes, Regex capabilities work in Excel 2007 and above using the UDF approach. For legacy Excel 2003 or earlier, advanced regex is not possible.
2. Can I use Regex functions in Excel on a Mac?
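Partly. The combined formula method works in Excel for Mac because it uses only built-in functions. The VBA function and the UDF shown in this guide both rely on the VBScript RegExp object, which is a Windows component and is not available in Excel for Mac, so they will not work there.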
3. Is there an easier way to validate data with Regex in Excel?
Yes, the Excel Data Validation feature can be used with Regex to validate cell inputs. But this only checks new entries rather than manipulating existing data.
Summary
Regex provides powerful pattern matching capabilities to Excel users. While Excel does not have native regex functions, formulas, VBA and UDFs allow harnessing its potential for efficient text analysis and data cleaning. WPS Office offers a free alternative to easily view, create and edit Excel workbooks with regex support. Both casual and power users can benefit from its advanced features.