EzDevInfo.com

Regex

A Swift µframework providing an NSRegularExpression-backed Regex type

Regular expression to match line that doesn't contain a word?

I know it's possible to match for a word and then reverse the matches using other tools (e.g. grep -v). However, I'd like to know if it's possible to match lines that don't contain a specific word (e.g. hede) using a regular expression?

Input:

hoho
hihi
haha
hede

# grep "Regex for do not contain hede" Input

Output:

hoho
hihi
haha
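
One approach that is often suggested for this kind of filter is a negative lookahead anchored at the start of the line. The sketch below is only an illustration in JavaScript, using the sample input from the question; the variable names are made up for the example, and whether a given grep build supports lookaheads is not covered here.

```javascript
// Sample input from the question, one entry per line.
const lines = ["hoho", "hihi", "haha", "hede"];

// ^(?!.*hede) asserts, from the start of the line, that "hede" does not
// occur anywhere later in it; .*$ then consumes the rest of the line.
const withoutHede = /^(?!.*hede).*$/;

// Keep only the lines the pattern accepts.
const kept = lines.filter((line) => withoutHede.test(line));

console.log(kept); // [ 'hoho', 'hihi', 'haha' ]
```

The lookahead consumes no characters, so the pattern still matches the whole line when the word is absent.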

Source: (StackOverflow)

jQuery selector regular expressions

I am after documentation on using wildcard or regular expressions (not sure on the exact terminology) with a jQuery selector.

I have looked for this myself but have been unable to find information on the syntax and how to use it. Does anyone know where the documentation for the syntax is?

EDIT: The attribute filters allow you to select based on patterns of an attribute value.


Source: (StackOverflow)


What is a non capturing group? (?:)

After reading some tutorials I still don't get it.

Could someone explain how ?: is used and what it's good for?
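
As a rough illustration (not taken from any particular tutorial), the difference can be seen by comparing the same pattern with and without ?: in JavaScript. The URL and variable names below are assumptions made for the example.

```javascript
const url = "https://example.com/path";

// Capturing: the scheme alternation is group 1 and the host is group 2.
const capturing = /(https?|ftp):\/\/([^\/]+)/.exec(url);

// Non-capturing: (?:...) still groups the alternation for matching,
// but it is not stored, so the host becomes group 1.
const nonCapturing = /(?:https?|ftp):\/\/([^\/]+)/.exec(url);

console.log(capturing.length);    // 3 (whole match plus two groups)
console.log(nonCapturing.length); // 2 (whole match plus one group)
console.log(nonCapturing[1]);     // example.com
```

Both patterns match the same text; only the numbering of the stored groups changes.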


Source: (StackOverflow)

How do you access the matched groups in a JavaScript regular expression?

I want to match a portion of a string using a regular expression and then access that parenthesized substring:

var myString = "something format_abc"; // I want "abc"

var arr = /(?:^|\s)format_(.*?)(?:\s|$)/.exec(myString);

console.log(arr);     // Prints: [" format_abc", "abc"] .. so far so good.
console.log(arr[1]);  // Prints: undefined  (???)
console.log(arr[0]);  // Prints: format_undefined (!!!)

What am I doing wrong?


I've discovered that there was nothing wrong with the regular expression code above: the actual string which I was testing against was this:

"date format_%A"

Reporting that "%A" is undefined seems a very strange behaviour, but it is not directly related to this question, so I've opened a new one, Why is a matched substring returning "undefined" in JavaScript?.


The issue was that console.log takes its parameters like a printf statement, and since the string I was logging ("%A") had a special value, it was trying to find the value of the next parameter.
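
A small sketch of that conclusion, assuming a JavaScript runtime such as Node.js: the capture itself is correct, and passing it as an argument to an explicit "%s" format string keeps the logger from interpreting the value as a format specifier. How a particular browser console treats "%A" is not something this example tries to reproduce.

```javascript
// The string that was actually being tested in the question.
const myString = "date format_%A";

const arr = /(?:^|\s)format_(.*?)(?:\s|$)/.exec(myString);

// The capture group holds the expected text.
const captured = arr[1];

// Logging with an explicit format string: the captured value is passed as
// data, so it is not itself parsed for printf-style specifiers.
console.log("%s", captured); // %A
```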


Source: (StackOverflow)

How do you pass a variable to a Regular Expression JavaScript?

I would like to create a String.replaceAll() method in JavaScript, and I'm thinking that using a RegEx would be the most terse way to do it. However, I can't figure out how to pass a variable into a RegEx. I can already do this, which will replace all the instances of "B" with "A".

"ABABAB".replace(/B/g, "A");

But I want to do something like this:

String.prototype.replaceAll = function(replaceThis, withThis) {
    this.replace(/replaceThis/g, withThis);
};

But obviously this will only replace the text "replaceThis"... so how do I pass this variable into my RegEx string?
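
One possible direction, sketched here under assumptions: a regex literal cannot contain a variable, but the RegExp constructor accepts a string. The helper names escapeRegExp and replaceAll below are hypothetical, written as standalone functions rather than on String.prototype, and the search text is escaped so that characters such as "." are treated literally.

```javascript
// Hypothetical helper: backslash-escape characters that are special in a regex.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: replace every occurrence of replaceThis in source.
function replaceAll(source, replaceThis, withThis) {
  // Build the pattern from the variable; "g" makes it replace all matches.
  const pattern = new RegExp(escapeRegExp(replaceThis), "g");
  return source.replace(pattern, withThis);
}

const result = replaceAll("ABABAB", "B", "A");
const dotted = replaceAll("a.b.c", ".", "-");

console.log(result); // AAAAAA
console.log(dotted); // a-b-c
```

Note that withThis is still passed to replace() unchanged, so sequences such as $1 or $& in the replacement keep their special meaning.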


Source: (StackOverflow)

How can I make my match non greedy in vim?

I have a big HTML file that has lots of markup that looks like this:

<p class="MsoNormal" style="margin: 0in 0in 0pt;">
  <span style="font-size: small; font-family: Times New Roman;">stuff here</span>
</p>

I'm trying to do a Vim search-and-replace to get rid of all class="" and style="" but I'm having trouble making the match ungreedy.

My first attempt was this:

%s/style=".*?"//g

but Vim doesn't seem to like the ?. Unfortunately removing the ? makes the match too greedy.

How can I make my match ungreedy?


Source: (StackOverflow)

How to do a regular expression replace in MySQL?

I have a table with ~500k rows; the varchar(255) UTF8 column filename contains a file name.

I'm trying to strip various strange characters out of the filename and thought I'd use a character class: [^a-zA-Z0-9()_ .\-]

Now, is there a function in MySQL that lets you replace through a regular expression? I'm looking for functionality similar to the REPLACE() function. A simplified example follows:

SELECT REPLACE('stackowerflow', 'ower', 'over');

Output: "stackoverflow"

/* does something like this exist? */
SELECT X_REG_REPLACE('Stackoverflow','/[A-Zf]/','-'); 

Output: "-tackover-low"

I know about REGEXP/RLIKE, but those only check if there is a match, not what the match is.

(I could do a "SELECT pkey_id,filename FROM foo WHERE filename RLIKE '[^a-zA-Z0-9()_ .\-]'" from a PHP script, do a preg_replace and then "UPDATE foo ... WHERE pkey_id=...", but that looks like a last-resort slow & ugly hack)


Source: (StackOverflow)

How to replace plain URLs with links?

I am using the function below to match URLs inside a given text and replace them with HTML links. The regular expression is working great, but currently I am only replacing the first match.

How can I replace all the URLs? I guess I should be using the exec command, but I could not figure out how to do it.

function replaceURLWithHTMLLinks(text) {
    var exp = /(\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/i;
    return text.replace(exp,"<a rel='nofollow' href='$1'>$1</a>"); 
}
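
A minimal sketch of one way this might be addressed: in JavaScript, String.prototype.replace with a regex only replaces every match when the regex carries the g (global) flag. The function below is the one from the question with only the flags changed from i to gi; the sample sentence and its example URLs are made up for illustration.

```javascript
function replaceURLWithHTMLLinks(text) {
  // Same expression as in the question, with the "g" flag added so that
  // replace() processes every match instead of stopping at the first.
  const exp = /(\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/gi;
  return text.replace(exp, "<a rel='nofollow' href='$1'>$1</a>");
}

// Hypothetical input containing two URLs.
const html = replaceURLWithHTMLLinks("See http://a.example and https://b.example/x.");

console.log(html);
```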

Source: (StackOverflow)

Can you provide some examples of why it is hard to parse XML and HTML with a regex?

One mistake I see people making over and over again is trying to parse XML or HTML with a regex. Here are a few of the reasons parsing XML and HTML is hard:

People want to treat a file as a sequence of lines, but this is valid:

<tag
attr="5"
/>

People want to treat < or <tag as the start of a tag, but stuff like this exists in the wild:

<img src="imgtag.gif" alt="<img>" />

People often want to match starting tags to ending tags, but XML and HTML allow tags to contain themselves (which traditional regexes cannot handle at all):

<span id="outer"><span id="inner">foo</span></span>

People often want to match against the content of a document (such as the famous "find all phone numbers on a given page" problem), but the data may be marked up (even if it appears to be normal when viewed):

<span class="phonenum">(<span class="area code">703</span>)
<span class="prefix">348</span>-<span class="linenum">3020</span></span>

Comments may contain poorly formatted or incomplete tags:

<a rel='nofollow' href="foo">foo</a>
<!-- FIXME:
    <a rel='nofollow' href="
-->
<a rel='nofollow' href="bar">bar</a>

What other gotchas are you aware of?
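
The nested-tag case above can be reproduced in a few lines. This is only a sketch in JavaScript with a deliberately naive pattern (an assumption for the example, not one taken from the question): a lazy match between an opening and a closing span stops at the first closing tag, which belongs to the inner element.

```javascript
// The nested example from the question.
const html = '<span id="outer"><span id="inner">foo</span></span>';

// Naive attempt: opening span, lazily captured content, closing span.
const naive = /<span[^>]*>(.*?)<\/span>/.exec(html);

// The "content" of the outer span ends at the inner span's closing tag.
console.log(naive[1]); // <span id="inner">foo
```

Switching to a greedy quantifier fixes this particular input but pairs the wrong tags as soon as two sibling spans appear on one line.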


Source: (StackOverflow)

Generic htaccess redirect www to non-www

I would like to redirect www.example.com to example.com. The following htaccess code makes this happen:

RewriteCond %{HTTP_HOST} ^www\.example\.com [NC]
RewriteRule ^(.*)$ http://example.com/$1 [L,R=301]

But, is there a way to do this in a generic fashion without hardcoding the domain name?


Source: (StackOverflow)

Regular expression to search for Gadaffi

I'm trying to search for the word Gadaffi. What's the best regular expression to search for this?

My best attempt so far is:

\b[KG]h?add?af?fi$\b

But I still seem to be missing some journals. Any suggestions?

Update: I found a pretty extensive list here: http://blogs.abcnews.com/theworldnewser/2009/09/how-many-different-ways-can-you-spell-gaddafi.html

The answer below matches all 30 variants:

Gadaffi
Gadafi
Gadafy
Gaddafi
Gaddafy
Gaddhafi
Gadhafi
Gathafi
Ghadaffi
Ghadafi
Ghaddafi
Ghaddafy
Gheddafi
Kadaffi
Kadafi
Kaddafi
Kadhafi
Kazzafi
Khadaffy
Khadafy
Khaddafi
Qadafi
Qaddafi
Qadhafi
Qadhdhafi
Qadthafi
Qathafi
Quathafi
Qudhafi
Kad'afi
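
To see how far the attempted pattern gets, it can be run against the list above. The following is a rough check in JavaScript, assuming each variant is tested as its own string; the variable names are illustrative only.

```javascript
// The 30 spellings listed in the question.
const variants = [
  "Gadaffi", "Gadafi", "Gadafy", "Gaddafi", "Gaddafy", "Gaddhafi",
  "Gadhafi", "Gathafi", "Ghadaffi", "Ghadafi", "Ghaddafi", "Ghaddafy",
  "Gheddafi", "Kadaffi", "Kadafi", "Kaddafi", "Kadhafi", "Kazzafi",
  "Khadaffy", "Khadafy", "Khaddafi", "Qadafi", "Qaddafi", "Qadhafi",
  "Qadhdhafi", "Qadthafi", "Qathafi", "Quathafi", "Qudhafi", "Kad'afi",
];

// The attempt from the question, unchanged.
const attempt = /\b[KG]h?add?af?fi$\b/;

// Split the list into spellings the attempt finds and spellings it misses.
const matched = variants.filter((name) => attempt.test(name));
const missed = variants.filter((name) => !attempt.test(name));

console.log(matched.length + " matched, " + missed.length + " missed");
console.log(missed.join(", "));
```

The misses cluster around spellings starting with Q, endings in y, and middles such as dh, th, or zz, none of which the attempt allows for.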

Source: (StackOverflow)

\d is less efficient than [0-9]

I made a comment yesterday on an answer where someone had used [0123456789] in a regular expression rather than [0-9] or \d. I said it was probably more efficient to use a range or digit specifier than a character set.

I decided to test that out today and found out to my surprise that (in the C# regex engine at least) \d appears to be less efficient than either of the other two which don't seem to differ much. Here is my test output over 10000 random strings of 1000 random characters with 5077 actually containing a digit:

Regular expression \d           took 00:00:00.2141226 result: 5077/10000
Regular expression [0-9]        took 00:00:00.1357972 result: 5077/10000  63.42 % of first
Regular expression [0123456789] took 00:00:00.1388997 result: 5077/10000  64.87 % of first

It's a surprise to me for two reasons:

  1. I would have thought the range would be implemented much more efficiently than the set.
  2. I can't understand why \d is worse than [0-9]. Is there more to \d than simply shorthand for [0-9]?

Here is the test code:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Diagnostics;
using System.Text.RegularExpressions;

namespace SO_RegexPerformance
{
    class Program
    {
        static void Main(string[] args)
        {
            var rand = new Random(1234);
            var strings = new List<string>();
            //10K random strings
            for (var i = 0; i < 10000; i++)
            {
                //Generate random string
                var sb = new StringBuilder();
                for (var c = 0; c < 1000; c++)
                {
                    //Add a-z randomly
                    sb.Append((char)('a' + rand.Next(26)));
                }
                //In roughly 50% of them, put a digit
                if (rand.Next(2) == 0)
                {
                    //Replace one character with a digit, 0-9
                    sb[rand.Next(sb.Length)] = (char)('0' + rand.Next(10));
                }
                strings.Add(sb.ToString());
            }

            var baseTime = testPerfomance(strings, @"\d");
            Console.WriteLine();
            var testTime = testPerfomance(strings, "[0-9]");
            Console.WriteLine("  {0:P2} of first", testTime.TotalMilliseconds / baseTime.TotalMilliseconds);
            testTime = testPerfomance(strings, "[0123456789]");
            Console.WriteLine("  {0:P2} of first", testTime.TotalMilliseconds / baseTime.TotalMilliseconds);
        }

        private static TimeSpan testPerfomance(List<string> strings, string regex)
        {
            var sw = new Stopwatch();

            int successes = 0;

            var rex = new Regex(regex);

            sw.Start();
            foreach (var str in strings)
            {
                if (rex.Match(str).Success)
                {
                    successes++;
                }
            }
            sw.Stop();

            Console.Write("Regex {0,-12} took {1} result: {2}/{3}", regex, sw.Elapsed, successes, strings.Count);

            return sw.Elapsed;
        }
    }
}

Source: (StackOverflow)

RegEx match open tags except XHTML self-contained tags

I need to match all of these opening tags:

<p>
<a rel='nofollow' href="foo">

But not these:

<br />
<hr class="foo" />

I came up with this and wanted to make sure I've got it right. I am only capturing the a-z.

<([a-z]+) *[^/]*?>

I believe it says:

  • Find a less-than, then
  • Find (and capture) a-z one or more times, then
  • Find zero or more spaces, then
  • Find any character zero or more times, greedy, except /, then
  • Find a greater-than

Do I have that right? And more importantly, what do you think?
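
One way to probe the pattern is simply to run it against the four samples. The sketch below does that in JavaScript and adds one extra input that is not in the question (a hypothetical opening tag whose attribute value contains a slash) to show where the [^/] restriction may bite.

```javascript
// The pattern from the question (the slash is escaped for the JS literal).
const openTag = /<([a-z]+) *[^\/]*?>/;

const shouldMatch = ["<p>", "<a rel='nofollow' href=\"foo\">"];
const shouldNotMatch = ["<br />", "<hr class=\"foo\" />"];

// Hypothetical extra case: an opening tag with a slash inside an attribute.
const slashInAttribute = '<a href="docs/index.html">';

const results = {
  matchedOpening: shouldMatch.every((tag) => openTag.test(tag)),
  rejectedSelfClosing: shouldNotMatch.every((tag) => !openTag.test(tag)),
  matchedSlashCase: openTag.test(slashInAttribute),
};

console.log(results);
// { matchedOpening: true, rejectedSelfClosing: true, matchedSlashCase: false }
```

So the pattern behaves as intended on the four samples, but it also rejects an ordinary opening tag as soon as any attribute value contains a slash.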


Source: (StackOverflow)

Regular expression search replace in Sublime Text 2

I'm looking to do search and replace with regular expressions in Sublime Text 2. The documentation on this is rather anemic. Specifically, I want to do a replace on groups, for example given this text:

Hello my name is bob

and this search term:

Find what: my name is (\w)+

Replace with: my name used to be $(1)

The search term works just fine but I can't figure out a way to actually do a replace using the regexp group.
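
Independent of the editor, there is a detail in the search term worth checking: (\w)+ repeats a one-character group, so the group ends up holding only the last character, while (\w+) captures the whole word. The sketch below shows this in JavaScript, where $1 is the group reference; whether Sublime Text 2 expects $1, \1, or another form is not confirmed here.

```javascript
const text = "Hello my name is bob";

// Group repeated from outside: only the last repetition is kept in group 1.
const lastCharOnly = text.replace(/my name is (\w)+/, "my name used to be $1");

// Quantifier inside the group: the whole word is captured.
const wholeWord = text.replace(/my name is (\w+)/, "my name used to be $1");

console.log(lastCharOnly); // Hello my name used to be b
console.log(wholeWord);    // Hello my name used to be bob
```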


Source: (StackOverflow)

Split Java String by New Line

I'm trying to split text in a JTextArea using a regex to split the String by \n. However, this does not work. I also tried \r\n|\r|n and many other combinations of regexes. Code:

public void insertUpdate(DocumentEvent e) {
    String split[], docStr = null;
    Document textAreaDoc = (Document)e.getDocument();

    try {
        docStr = textAreaDoc.getText(textAreaDoc.getStartPosition().getOffset(), textAreaDoc.getEndPosition().getOffset());
    } catch (BadLocationException e1) {
        // TODO Auto-generated catch block
        e1.printStackTrace();
    }

    split = docStr.split("\\n");
}
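
The alternation itself can be checked outside of Swing. The sketch below uses JavaScript rather than Java, purely to illustrate the regex; the sample strings are made up. It splits on the three common line endings, and also shows what the second pattern does exactly as written in the question, where the last alternative is a bare n rather than \n.

```javascript
// Hypothetical text mixing Windows, Unix, and old Mac line endings.
const text = "line one\r\nline two\nline three\rline four";

// \r\n is listed first so a Windows line ending is consumed as one separator.
const lines = text.split(/\r\n|\r|\n/);

// The pattern as typed in the question: the final alternative matches the
// letter "n", so the string is cut at that letter instead of at the newline.
const typo = "one\ntwo".split(/\r\n|\r|n/);

console.log(lines); // [ 'line one', 'line two', 'line three', 'line four' ]
console.log(typo);  // [ 'o', 'e\ntwo' ]
```

This does not explain why splitting on \n alone fails in the original code; it only shows that the alternation behaves as expected once every alternative is a line ending.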

Source: (StackOverflow)