Not all of us are experts at writing regular expressions, yet we often need them in our programs. Frequently we already know the Linux glob pattern that matches the strings we are working with, but most programming APIs expect a regular expression as input. In this tutorial we will learn how to use Python's built-in fnmatch module to convert Unix glob patterns into full-fledged regular expressions. No familiarity with Python is required to follow this tutorial.
Glob Expressions - Quick Primer
Glob expressions in Linux are a quick way to match filenames in a Unix-like filesystem. Globs are simple yet very powerful. Below are some examples of glob patterns.
    # Match all files ending with sh
    ls *sh
    # Match all files with the word execute in it
    ls *execute*
    # Match all files starting with a number
    ls [0-9]*
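The same three patterns can also be tried without a shell. The sketch below uses fnmatch.filter from Python's standard library on a list of file names; the names themselves are hypothetical and exist only for illustration.

```python
import fnmatch

# Hypothetical file names, invented for illustration only.
names = ["deploy.sh", "execute_job.py", "2024_report.txt", "notes.md"]

# fnmatch.filter returns the names that match the glob pattern.
ends_with_sh = fnmatch.filter(names, "*sh")
contains_execute = fnmatch.filter(names, "*execute*")
starts_with_digit = fnmatch.filter(names, "[0-9]*")

print(ends_with_sh)       # ['deploy.sh']
print(contains_execute)   # ['execute_job.py']
print(starts_with_digit)  # ['2024_report.txt']
```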
Glob to Regular Expression Conversion using Python
Python's built-in fnmatch module provides a function called translate, which takes a glob expression as an argument and returns the equivalent regular expression as a string.
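    $ python
    >>> import fnmatch
    >>> fnmatch.translate('*sh')
    '.*sh\\Z(?ms)'
    >>> fnmatch.translate('*execute*')
    '.*execute.*\\Z(?ms)'
    >>> fnmatch.translate('[0-9]*')
    '[0-9].*\\Z(?ms)'

The exact string depends on the Python version. The output above comes from an older release; newer releases (Python 3.6 onward) return an equivalent pattern written differently, for example '(?s:.*sh)\\Z'.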
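Because the returned string varies between Python versions, it can be more reliable to check how the translated pattern behaves than to compare the text itself. The following is a minimal sketch of such a check; the sample names are hypothetical, and the variable names are chosen only for this example.

```python
import fnmatch
import re

globs = ["*sh", "*execute*", "[0-9]*"]
# Hypothetical sample names, invented for illustration only.
samples = ["deploy.sh", "execute_job.py", "2024_report.txt", "notes.md"]

agreement = {}
for glob in globs:
    # Translate the glob and compile the resulting regular expression.
    regex = re.compile(fnmatch.translate(glob))
    by_regex = [s for s in samples if regex.match(s)]
    # fnmatchcase applies the glob directly, without case normalization.
    by_glob = [s for s in samples if fnmatch.fnmatchcase(s, glob)]
    agreement[glob] = (by_regex == by_glob)
    print(glob, "->", regex.pattern, by_regex)
```

Each entry in agreement should be True, since the translated regular expression is meant to accept exactly the strings the glob accepts.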
Compile result with re
You can use the regular expressions produced above in other languages as well, provided their regex engine understands the same syntax (constructs such as \Z and inline flags are not supported everywhere). If you are using Python, you can use the built-in re module to match strings against the regular expression. Simply compile the output of fnmatch.translate as shown below.
    $ python
    >>> import fnmatch
    >>> import re
    >>> # Store the translated regular expression string in a variable
    >>> reg_exp = fnmatch.translate('c*t')
    >>> # Compile the stored regular expression
    >>> pattern = re.compile(reg_exp)
    >>> # Match the regular expression against strings
    >>> bool(pattern.match('cat'))
    True
    >>> bool(pattern.match('cap'))
    False
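The two steps above can be wrapped in a small helper. This is only a sketch: glob_to_regex is a hypothetical name, not part of the standard library. It also shows that the translated pattern must match the whole string, so 'cats' and 'scat' are rejected by the glob c*t.

```python
import fnmatch
import re


def glob_to_regex(glob_pattern):
    """Hypothetical helper: translate a glob and compile the result."""
    return re.compile(fnmatch.translate(glob_pattern))


pattern = glob_to_regex("c*t")
words = ["cat", "cap", "coat", "cats", "scat"]

# match() anchors at the start, and the translated pattern anchors at the end.
matches = [w for w in words if pattern.match(w)]
print(matches)  # ['cat', 'coat']
```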
By using the translate function of Python's fnmatch module, you can convert any glob expression into a regular expression and use it for pattern matching in any programming language whose regex engine supports the resulting syntax.