
regex interview questions

Top frequently asked regex interview questions

A comprehensive regex for phone number validation

I'm trying to put together a comprehensive regex to validate phone numbers. Ideally it would handle international formats, but it must handle US formats, including the following:

  • 1-234-567-8901
  • 1-234-567-8901 x1234
  • 1-234-567-8901 ext1234
  • 1 (234) 567-8901
  • 1.234.567.8901
  • 1/234/567/8901
  • 12345678901

I'll answer with my current attempt, but I'm hoping somebody has something better and/or more elegant.
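
As a point of comparison (an illustrative sketch, not the asker's attempt or an answer from the thread), one pattern that accepts the seven US formats listed above could look like this. The names US_PHONE and parseUsPhone are made up for the example, and international formats are deliberately out of scope.

```javascript
// Hypothetical pattern: covers only the US formats listed in the question.
// Optional leading 1, optional separator (- . space /), optional parentheses
// around the area code, and an optional "x1234" / "ext1234" extension.
// Note: it does not check that parentheses are balanced.
const US_PHONE =
  /^1?[-. \/]?\(?(\d{3})\)?[-. \/]?(\d{3})[-. \/]?(\d{4})(?: ?(?:x|ext)\.? ?(\d+))?$/;

// Returns the captured parts, or null when the input does not match.
function parseUsPhone(input) {
  const m = US_PHONE.exec(input.trim());
  if (!m) return null;
  return { area: m[1], prefix: m[2], line: m[3], extension: m[4] || null };
}

const samples = [
  "1-234-567-8901",
  "1-234-567-8901 x1234",
  "1-234-567-8901 ext1234",
  "1 (234) 567-8901",
  "1.234.567.8901",
  "1/234/567/8901",
  "12345678901",
];
const allAccepted = samples.every((s) => parseUsPhone(s) !== null);
```

A common alternative is to strip everything except digits first and validate the digit count, which tends to be easier to maintain than one long pattern.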

Source: (StackOverflow)

How do you pass a variable to a regular expression in JavaScript?

I would like to create a String.replaceAll() method in JavaScript, and I'm thinking that using a RegEx would be the most terse way to do it. However, I can't figure out how to pass a variable into a RegEx. I can already do this, which replaces all instances of "B" with "A":

"ABABAB".replace(/B/g, "A");

But I want to do something like this:

String.prototype.replaceAll = function(replaceThis, withThis) {
    // The literal /replaceThis/ matches the text "replaceThis", not the variable
    this.replace(/replaceThis/g, withThis);
};

But obviously this will only replace the text "replaceThis"...so how do I pass this variable in to my RegEx string?
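
One way to approach this (a minimal sketch, not necessarily what the thread settled on) is to build the pattern with the RegExp constructor instead of a regex literal. The helper names escapeRegExp and replaceAllText are made up for the example; the search text is escaped first so that characters like "." or "+" are matched literally.

```javascript
// Escape characters that have special meaning inside a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Replace every occurrence of `replaceThis` in `source` with `withThis`.
function replaceAllText(source, replaceThis, withThis) {
  // new RegExp(...) accepts a string, so a variable can be used here.
  const pattern = new RegExp(escapeRegExp(replaceThis), "g");
  // A function replacement keeps "$" sequences in `withThis` literal.
  return source.replace(pattern, () => withThis);
}

const swapped = replaceAllText("ABABAB", "B", "A"); // "AAAAAA"
const dotted = replaceAllText("a.b.c", ".", "-");   // "a-b-c"
```

Newer JavaScript engines (ES2021 and later) also ship a built-in String.prototype.replaceAll, so a hand-written version is mainly useful on older runtimes.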

Source: (StackOverflow)


Regular expression to match line that doesn't contain a word?

I know it's possible to match a word and then reverse the matches using other tools (e.g. grep -v). However, I'd like to know if it's possible to match lines that don't contain a specific word (e.g. hede) using a regular expression?
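
A negative lookahead is one commonly used way to express this in regex flavors that support it. The sketch below is illustrative only: the sample lines and the helper name linesWithout are made up, and POSIX grep without -P does not support lookaheads.

```javascript
// ^(?!.*hede) asserts that "hede" does not occur anywhere in the line;
// .*$ then consumes the whole line.
const NO_HEDE = /^(?!.*hede).*$/;

// Keep only the lines that do not contain the word.
function linesWithout(text) {
  return text.split("\n").filter((line) => NO_HEDE.test(line));
}

// Hypothetical sample input, one entry per line.
const kept = linesWithout("hoho\nhihi\nhaha\nhede\nxhedex");
// kept is ["hoho", "hihi", "haha"]
```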




# grep "Regex for doesn't contain hede" Input

Desired output:


Source: (StackOverflow)

Validate email address in JavaScript?

How can an email address be validated in JavaScript?
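
A deliberately loose check is sketched below, on the assumption that the goal is to catch obvious typos rather than to implement the full address grammar. LOOSE_EMAIL and looksLikeEmail are hypothetical names; only sending a confirmation message can prove that an address really exists.

```javascript
// Loose shape check: something, "@", something, ".", something,
// with no whitespace and no second "@" anywhere.
const LOOSE_EMAIL = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

function looksLikeEmail(value) {
  return LOOSE_EMAIL.test(value);
}

const ok = looksLikeEmail("user@example.com");   // true
const noDot = looksLikeEmail("user@localhost");  // false: no dot after "@"
```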

Source: (StackOverflow)

startsWith() and endsWith() functions in PHP

How can I write two functions that take a string and return whether it starts with the specified character/string or ends with it?

For example:

$str = '|apples}';

echo startsWith($str, '|'); //Returns true
echo endsWith($str, '}'); //Returns true

Source: (StackOverflow)

Using a regular expression to validate an email address

Over the years I have slowly developed a regular expression that validates MOST email addresses correctly, assuming they don't use an IP address as the server part.

I use it in several PHP programs, and it works most of the time. However, from time to time I get contacted by someone that is having trouble with a site that uses it, and I end up having to make some adjustment (most recently I realized that I wasn't allowing 4-character TLDs).

What's the best regular expression you have or have seen for validating emails?

I've seen several solutions that use functions that use several shorter expressions, but I'd rather have one long complex expression in a simple function instead of several short expressions in a more complex function.

Source: (StackOverflow)

How to negate specific word in regex?

I know that I can negate a group of chars as in [^bar], but I need a regular expression where the negation applies to the specific word. So in my example, how do I negate an actual "bar" and not "any chars in bar"?
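
The difference can be illustrated with a short sketch (the sample strings are made up): a negated character class excludes individual characters, while a negative lookahead with word boundaries excludes the whole word.

```javascript
// [^bar] rejects any single character b, a, or r, so "cob" fails
// even though it does not contain the word "bar".
const charClass = /^[^bar]+$/.test("cob"); // false

// \b(?!bar\b)\w+ matches whole words, skipping only the exact word "bar".
// "barn" is still matched because "bar" is not followed by a word boundary.
const words = "foo bar barn baz".match(/\b(?!bar\b)\w+/g);
// words is ["foo", "barn", "baz"]
```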

Source: (StackOverflow)

How to count string occurrence in string?

How can I count the number of times a particular string occurs in another string? For example, this is what I am trying to do in JavaScript:

var temp = "This is a string.";
alert(temp.count("is")); //should output '2'
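
Strings have no built-in count() method, so a small helper is needed. The sketch below uses a hypothetical function name, countOccurrences, and counts non-overlapping matches with indexOf, which avoids having to escape regex metacharacters in the search string.

```javascript
// Count non-overlapping occurrences of `needle` in `haystack`.
function countOccurrences(haystack, needle) {
  if (needle.length === 0) return 0; // avoid an infinite loop on ""
  let count = 0;
  let pos = haystack.indexOf(needle);
  while (pos !== -1) {
    count++;
    // Continue searching after the end of the current match.
    pos = haystack.indexOf(needle, pos + needle.length);
  }
  return count;
}

const temp = "This is a string.";
const total = countOccurrences(temp, "is"); // 2 ("This" and "is")
```

For a fixed search string, a regex one-liner such as (temp.match(/is/g) || []).length gives the same count.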

Source: (StackOverflow)

jQuery selector regular expressions

I am after documentation on using wildcards or regular expressions (not sure of the exact terminology) with a jQuery selector.

I have looked for this myself but have been unable to find information on the syntax and how to use it. Does anyone know where the documentation for the syntax is?

EDIT: The attribute filters allow you to select based on patterns of an attribute value.

Source: (StackOverflow)

Split Java String by New Line

I'm trying to split text in a JTextArea using a regex to split the String by \n. However, this does not work, and I also tried \r\n|\r|n and many other combinations of regexes. Code:

public void insertUpdate(DocumentEvent e) {
    String split[], docStr = null;
    Document textAreaDoc = (Document)e.getDocument();

    try {
        docStr = textAreaDoc.getText(textAreaDoc.getStartPosition().getOffset(), textAreaDoc.getEndPosition().getOffset());
    } catch (BadLocationException e1) {
        // TODO Auto-generated catch block
    }

    split = docStr.split("\\n");
}

Source: (StackOverflow)

How can I make my match non greedy in vim?

I have a big HTML file that has lots of markup that looks like this:

<p class="MsoNormal" style="margin: 0in 0in 0pt;">
  <span style="font-size: small; font-family: Times New Roman;">stuff here</span>
</p>

I'm trying to do a Vim search-and-replace to get rid of all class="" and style="" but I'm having trouble making the match ungreedy.

My first attempt was this


but Vim doesn't seem to like the ?. Unfortunately removing the ? makes the match too greedy.

How can I make my match ungreedy?

Source: (StackOverflow)

How to do a regular expression replace in MySQL?

I have a table with ~500k rows; the varchar(255) UTF8 column filename contains a file name.

I'm trying to strip various strange characters out of the filename, and thought I'd use a character class: [^a-zA-Z0-9()_ .\-]

Now, is there a function in MySQL that lets you replace through a regular expression? I'm looking for functionality similar to the REPLACE() function. A simplified example follows:

SELECT REPLACE('stackowerflow', 'ower', 'over');

Output: "stackoverflow"

/* does something like this exist? */
SELECT X_REG_REPLACE('Stackoverflow','/[A-Zf]/','-'); 

Output: "-tackover-low"

I know about REGEXP/RLIKE, but those only check if there is a match, not what the match is.

(I could do a "SELECT pkey_id,filename FROM foo WHERE filename RLIKE '[^a-zA-Z0-9()_ .\-]'" from a PHP script, do a preg_replace and then "UPDATE foo ... WHERE pkey_id=...", but that looks like a last-resort slow & ugly hack)

Source: (StackOverflow)

Regular expression to match DNS hostname or IP Address?

Does anyone have a regular expression handy that will match any legal DNS hostname or IP address?

It's easy to write one that works 95% of the time, but I'm hoping to get something that's well tested to exactly match the latest RFC specs for DNS hostnames.
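
The sketch below is one possible starting point, not a vetted RFC-exact pattern. It assumes dotted-decimal IPv4 only (no IPv6) and hostname labels of 1 to 63 letters, digits, or hyphens that neither start nor end with a hyphen, with an overall length of at most 253 characters. IPV4 and HOSTNAME are hypothetical names.

```javascript
// One octet: 0-255 without leading zeros.
const IPV4 =
  /^(?:(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)\.){3}(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)$/;

// Labels separated by dots; the lookahead caps the total length at 253.
const HOSTNAME =
  /^(?=.{1,253}$)[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?(?:\.[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?)*$/;

function isHostOrIp(value) {
  return IPV4.test(value) || HOSTNAME.test(value);
}

const validIp = IPV4.test("192.168.0.1");        // true
const badOctet = IPV4.test("256.1.1.1");         // false
const validHost = HOSTNAME.test("example.com");  // true
const badLabel = HOSTNAME.test("-bad.example");  // false
```

Because labels may consist of digits only, a string such as "256.1.1.1" still passes the hostname pattern, so the two checks should be applied separately when the distinction matters.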

Source: (StackOverflow)

Regular Expression for alphanumeric and underscores

I would like to have a regular expression that checks if a string contains only upper and lowercase letters, numbers, and underscores.
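
A minimal sketch of such a check, assuming ASCII letters only and that an empty string should be rejected (WORD_CHARS_ONLY is a made-up name):

```javascript
// Anchored at both ends so the whole string must consist of the class.
const WORD_CHARS_ONLY = /^[A-Za-z0-9_]+$/;

const good = WORD_CHARS_ONLY.test("Hello_123");  // true
const spaced = WORD_CHARS_ONLY.test("has space"); // false
const empty = WORD_CHARS_ONLY.test("");           // false, because of +
```

Changing + to * would accept the empty string as well.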

Source: (StackOverflow)

Can you provide some examples of why it is hard to parse XML and HTML with a regex?

One mistake I see people making over and over again is trying to parse XML or HTML with a regex. Here are a few of the reasons parsing XML and HTML is hard:

People want to treat a file as a sequence of lines, but this is valid:


People want to treat < or <tag as the start of a tag, but stuff like this exists in the wild:

<img src="imgtag.gif" alt="<img>" />

People often want to match starting tags to ending tags, but XML and HTML allow tags to contain themselves (which traditional regexes cannot handle at all):

<span id="outer"><span id="inner">foo</span></span>

People often want to match against the content of a document (such as the famous "find all phone numbers on a given page" problem), but the data may be marked up (even if it appears to be normal when viewed):

<span class="phonenum">(<span class="area code">703</span>)
<span class="prefix">348</span>-<span class="linenum">3020</span></span>

Comments may contain poorly formatted or incomplete tags:

<a rel='nofollow' href="foo">foo</a>
<!-- FIXME:
    <a rel='nofollow' href="
-->
<a rel='nofollow' href="bar">bar</a>
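
Two of the pitfalls above can be reproduced with a short sketch. The patterns are deliberately naive ones of the kind people often reach for; they are illustrations, not recommended approaches.

```javascript
// Pitfall: ">" inside an attribute value ends the match too early.
const imgHtml = '<img src="imgtag.gif" alt="<img>" />';
const truncated = imgHtml.match(/<img[^>]*>/)[0];
// truncated is '<img src="imgtag.gif" alt="<img>' (cut off inside alt)

// Pitfall: nested tags. The lazy match stops at the first closing tag,
// so the outer span is paired with the inner span's </span>.
const nestedHtml = '<span id="outer"><span id="inner">foo</span></span>';
const captured = /<span[^>]*>(.*?)<\/span>/.exec(nestedHtml)[1];
// captured is '<span id="inner">foo'
```

A real HTML or XML parser tracks quoting and nesting, which is why it handles both inputs correctly where these patterns do not.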

What other gotchas are you aware of?

Source: (StackOverflow)