Not all of us are experts at writing regular expressions, but we often need them in our programs. Often we already know the Linux glob pattern that matches the strings we are working with, yet most programming APIs expect a regex as input. In this tutorial we will learn how to use Python's built-in fnmatch module to convert Unix glob patterns into full-fledged regular expressions. No familiarity with Python is required to follow this tutorial.
Glob Expressions - Quick Primer
Glob expressions in Linux are a quick way to pattern-match filenames on Unix-like filesystems. Globs are simple yet very powerful. Below are some examples of glob patterns.
# Match all files ending with sh
ls *sh
# Match all files with the word execute in their name
ls *execute*
# Match all files starting with a digit
ls [0-9]*
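The same three patterns can also be tried without a shell. The sketch below uses `fnmatch.fnmatchcase` from Python's standard library; the filenames and the `matching` helper are made up purely for illustration.

```python
import fnmatch

# Hypothetical filenames, chosen only to exercise the three patterns above.
names = ["deploy.sh", "execute_job.py", "2024_report.txt", "notes.md"]

def matching(pattern, candidates):
    """Return the candidates that match the glob pattern (case-sensitive)."""
    return [name for name in candidates if fnmatch.fnmatchcase(name, pattern)]

ends_with_sh = matching("*sh", names)
contains_execute = matching("*execute*", names)
starts_with_digit = matching("[0-9]*", names)

print(ends_with_sh)       # ['deploy.sh']
print(contains_execute)   # ['execute_job.py']
print(starts_with_digit)  # ['2024_report.txt']
```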
Glob to Regular Expression Conversion using Python
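Python's built-in fnmatch module provides a function called translate, which takes a glob expression as its argument and returns the equivalent regular expression as a string. The exact string returned varies between Python versions, so your output may differ from the session below while still matching the same names.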
$ python
>>> import fnmatch
>>> fnmatch.translate('*sh')
'.*sh\\Z(?ms)'
>>> fnmatch.translate('*execute*')
'.*execute.*\\Z(?ms)'
>>> fnmatch.translate('[0-9]*')
'[0-9].*\\Z(?ms)'
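To inspect the translations on your own interpreter, a short script can stand in for the interactive session. This is a minimal sketch: the `globs` list and `translated` dictionary are names chosen here for illustration, and no particular output text is assumed.

```python
import fnmatch
import re

# The three glob patterns from the session above.
globs = ["*sh", "*execute*", "[0-9]*"]

# Map each glob to whatever regex string this interpreter produces.
translated = {glob: fnmatch.translate(glob) for glob in globs}

for glob, regex in translated.items():
    re.compile(regex)  # raises re.error if the string is not a valid pattern
    print(f"{glob!r} -> {regex!r}")
```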
Compile result with re
You can use the resulting regular expressions in other languages as well, though constructs such as \Z and the inline flags (?ms) are not supported by every regex engine. If you are using Python, the built-in re module can match strings against them directly: simply compile the output of fnmatch.translate, as shown below.
$ python
>>> import fnmatch
>>> import re
>>> # Store the translated regular expression string in a variable
>>> reg_exp = fnmatch.translate('c*t')
>>> # Compile the stored regular expression
>>> pattern = re.compile(reg_exp)
>>> # Match the compiled pattern against strings
>>> bool(pattern.match('cat'))
True
>>> bool(pattern.match('cap'))
False
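The translate-then-compile steps can be wrapped in a small helper so that a glob is compiled once and reused across many strings. This is only a sketch: `compile_glob`, `filter_names`, and the word list are hypothetical names introduced here, not part of fnmatch or re.

```python
import fnmatch
import re

def compile_glob(glob_pattern):
    """Translate a glob into a regex string and compile it for reuse."""
    return re.compile(fnmatch.translate(glob_pattern))

def filter_names(glob_pattern, names):
    """Return the names that the glob matches in full."""
    pattern = compile_glob(glob_pattern)
    # match() anchors at the start and the translated regex anchors at the
    # end, so a name is kept only when the whole string fits the glob.
    return [name for name in names if pattern.match(name)]

words = ["cat", "cap", "coat", "scat", "ct"]
print(filter_names("c*t", words))  # ['cat', 'coat', 'ct']
```

Note that 'scat' is rejected even though it contains 'cat', because the pattern has to match from the first character of the string.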
Conclusion
By using the translate function of Python's fnmatch module, you can convert a glob expression into the equivalent regular expression and use it for pattern matching in Python or, with attention to engine-specific syntax, in other programming languages.