Tech Monger

Programming, Web Development and Computer Science.


Generate Regular Expression from Glob Patterns

Not all of us are experts at writing regular expressions, yet our programs often need them. Frequently we already know the Linux glob pattern that matches the strings we are working with, but most programming APIs expect a regular expression as input. In this tutorial we will learn how to use Python's built-in fnmatch module to convert a Unix glob pattern into a full-fledged regular expression. No familiarity with Python is required to follow along.

Glob Expressions - Quick Primer

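Glob expressions in Linux are a quick way to match file names in a Unix-like file system. Globs are simple yet powerful: * matches any sequence of characters (including none), ? matches exactly one character, and [...] matches one character from a set or range such as [0-9]. Below are some examples of glob patterns.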

# Match all files whose names end with sh
ls *sh
# Match all files whose names contain the word execute
ls *execute*
# Match all files whose names start with a digit
ls [0-9]*
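You can try these patterns from Python without touching the file system by using the standard library's fnmatch.fnmatchcase, which tests a single name against a glob. The sketch below is illustrative only, and the file names in it are made up. One difference from ls is that fnmatch does not treat names beginning with a dot as special, so it also matches hidden files.

```python
import fnmatch

# Hypothetical file names, used only for this illustration
names = ["build.sh", "deploy.sh", "execute_job.py", "run_execute", "2024_report.txt", "notes.md"]

# fnmatchcase() applies glob rules case-sensitively, as a Linux shell would
results = {
    pattern: [name for name in names if fnmatch.fnmatchcase(name, pattern)]
    for pattern in ["*sh", "*execute*", "[0-9]*"]
}

for pattern, matches in results.items():
    print(pattern, "->", matches)
# *sh -> ['build.sh', 'deploy.sh']
# *execute* -> ['execute_job.py', 'run_execute']
# [0-9]* -> ['2024_report.txt']
```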

Glob to Regular Expression Conversion using Python

Python's built-in fnmatch module provides a function called translate, which takes a glob pattern as its argument and returns the equivalent regular expression as a string.

$ python

>>> import fnmatch

>>> fnmatch.translate('*sh')
'.*sh\\Z(?ms)'

>>> fnmatch.translate('*execute*')
'.*execute.*\\Z(?ms)'

>>> fnmatch.translate('[0-9]*')
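'[0-9].*\\Z(?ms)'

The output above comes from an older Python release (2.x, or 3.5 and earlier). Since Python 3.6, translate wraps the pattern in a scoped flag group instead, for example '(?s:.*sh)\\Z' for '*sh'. Later releases change the form further, so run the function on your own interpreter rather than copying these strings. In particular, the re module in Python 3.11 and newer rejects the old trailing (?ms) form, because global inline flags are only allowed at the start of an expression.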
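The regex text varies between Python versions, so checking how a translated pattern behaves is more reliable than comparing strings. The sketch below prints what translate returns on the running interpreter and lists the sample strings each compiled regex accepts. The helper matches_for is hypothetical, and the sample strings are arbitrary.

```python
import fnmatch
import re

# Arbitrary sample strings to test each glob against
samples = ["run.sh", "shell", "pre_execute_step", "9lives", "lives9"]

def matches_for(pattern, samples):
    """Hypothetical helper: return the samples accepted by the translated regex."""
    compiled = re.compile(fnmatch.translate(pattern))
    # match() anchors at the start; the translated regex anchors the end itself
    return [s for s in samples if compiled.match(s)]

for pattern in ["*sh", "*execute*", "[0-9]*"]:
    # The exact regex text depends on the Python version
    print(repr(pattern), "->", repr(fnmatch.translate(pattern)))
    print("   matches:", matches_for(pattern, samples))
# '*sh' matches ['run.sh'], '*execute*' matches ['pre_execute_step'],
# '[0-9]*' matches ['9lives']
```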

Compile the Result with re

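You can use the regular expressions produced by translate in other languages. However, they rely on Python-specific syntax such as \Z and inline flag groups, which other regex engines may interpret differently or not support at all, so check them against your target engine first. If you are working in Python, the built-in re module can match strings against the regular expression directly. Simply compile the output of fnmatch.translate as shown below.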

$ python
>>> import fnmatch
>>> import re

# Store the translated regular expression string in a variable
>>> reg_exp = fnmatch.translate('c*t')

# Compile the stored regular expression
>>> pattern = re.compile(reg_exp)

# Match strings against the compiled pattern
>>> bool(pattern.match('cat'))
True

>>> bool(pattern.match('cap'))
False
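In a real program you may want to wrap these two steps in a small function. The sketch below uses a hypothetical helper named compile_glob and adds an optional ignore_case switch based on the standard re.IGNORECASE flag. The word list is arbitrary.

```python
import fnmatch
import re

def compile_glob(pattern, ignore_case=False):
    """Hypothetical helper: compile a shell-style glob via fnmatch.translate."""
    flags = re.IGNORECASE if ignore_case else 0
    return re.compile(fnmatch.translate(pattern), flags)

# Arbitrary example strings
words = ["cat", "cap", "coat", "Cat", "cart.txt"]

strict = compile_glob("c*t")
loose = compile_glob("c*t", ignore_case=True)

print([w for w in words if strict.match(w)])  # ['cat', 'coat', 'cart.txt']
print([w for w in words if loose.match(w)])   # ['cat', 'coat', 'Cat', 'cart.txt']

# translate() anchors only the end (\Z); match() supplies the start anchor
print(bool(strict.match("scat")))   # False
print(bool(strict.search("scat")))  # True: search() may begin mid-string
```

Use match() or fullmatch() with these patterns. Because translate anchors only the end of the expression, search() also accepts strings where the glob match starts partway through, such as 'scat'. The example also shows that * matches across dots, so c*t accepts 'cart.txt'.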

Conclusion

By using the translate function of Python's fnmatch module, you can convert a glob pattern into an equivalent regular expression. You can use the result for pattern matching in Python or, after checking that your regex engine supports its syntax, in other programming languages.
