Using Glob With Directories

An Explanation of Dir.glob and How to Use It in Ruby


"Globbing" files (with Dir.glob) means you can use regular expression-like pattern matching to select just the files you want, such as all the XML files in a directory. 

The opposite, iterating over all the files in a directory, can be done with the Dir.foreach method.
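For comparison, here is a minimal sketch of Dir.foreach. It assumes a throwaway directory created with Dir.mktmpdir and two made-up file names, so it does not depend on the files on your machine. Note that Dir.foreach also yields the special entries "." and "..", which are skipped here.

```ruby
require 'tmpdir'
require 'fileutils'

foreach_entries = []

# Build a temporary directory so the example is self-contained.
Dir.mktmpdir do |dir|
  FileUtils.touch(File.join(dir, 'a.xml'))
  FileUtils.touch(File.join(dir, 'b.txt'))

  # Dir.foreach yields every entry name, with no filtering at all.
  Dir.foreach(dir) do |entry|
    next if entry == '.' || entry == '..'
    foreach_entries << entry
  end
end

puts foreach_entries.sort
```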

Note: Even though Dir.glob patterns look like regular expressions, they are not. Globs are very limited compared to Ruby's regular expressions and are more closely related to shell expansion wildcards.

Example of a Glob

The following glob will match all files ending in .rb in the current directory. It uses a single wildcard, the asterisk. The asterisk matches zero or more characters, so any file ending in .rb matches this glob. The one exception is a hidden file, such as a file called simply .rb: by default the asterisk does not match a leading period, so that file is skipped unless you pass the File::FNM_DOTMATCH flag. The glob method returns all files that match the globbing rules as an array, which can be saved for later use or iterated over.

 #!/usr/bin/env ruby
 
 Dir.glob('*.rb').each do |f|
   puts f
 end
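The sketch below saves the returned array for later use instead of iterating over it immediately. It runs against a temporary directory with made-up file names, and relies on the base: keyword of Dir.glob, which assumes Ruby 2.5 or later. It also creates a file named just .rb to show how Ruby treats names that begin with a period: the asterisk skips them by default and matches them only when File::FNM_DOTMATCH is passed.

```ruby
require 'tmpdir'
require 'fileutils'

ruby_files = nil
with_hidden = nil

Dir.mktmpdir do |dir|
  %w[app.rb helper.rb notes.txt .rb].each do |name|
    FileUtils.touch(File.join(dir, name))
  end

  # The result is an ordinary Array, so it can be stored and reused.
  ruby_files = Dir.glob('*.rb', base: dir).sort

  # With File::FNM_DOTMATCH, * may also match a leading period.
  with_hidden = Dir.glob('*.rb', File::FNM_DOTMATCH, base: dir).sort
end

puts "Found #{ruby_files.size} Ruby files: #{ruby_files.join(', ')}"
puts "Including hidden files: #{with_hidden.join(', ')}"
```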

Wildcards and More Information on Globs

There are only a few wildcards to learn:

  • * - Match zero or more characters. A glob consisting of only the asterisk and no other characters or wildcards will match every file in the current directory except hidden files (names that begin with a period). The asterisk is usually combined with a file extension, and often with more characters, to narrow down the search.

  • ** - Match all directories recursively. This is used to descend into the directory tree and find all files in sub-directories of the current directory, rather than just files in the current directory. This wildcard is explored in the example code below.

  • ? - Match any one character. This is useful for finding files whose names follow a particular format. For example, 5 characters and a .xml extension could be expressed as ?????.xml.

  • [a-z] - Match any character in the character set. The set can be either a list of characters, or a range separated with the hyphen character. Character sets follow the same syntax as and behave in the same manner as character sets in regular expressions.

  • {a,b} - Match pattern a or b. Though this looks like a regular expression quantifier, it isn't. For example, in a regular expression, the pattern a{1,2} will match 1 or 2 'a' characters. In globbing, it will match the string a1 or a2. Other patterns can be nested inside of this construct.
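Character sets and braces are the two wildcards most easily confused with regular expression syntax, so here is a small sketch of both. The file names are invented for illustration, the files live in a temporary directory, and the base: keyword assumes Ruby 2.5 or later.

```ruby
require 'tmpdir'
require 'fileutils'

set_matches = nil
brace_matches = nil

Dir.mktmpdir do |dir|
  %w[a1.txt a2.txt b1.txt c9.txt report.xml notes.xml].each do |name|
    FileUtils.touch(File.join(dir, name))
  end

  # [ab] is a character set: exactly one character, "a" or "b".
  # [0-9] is a range: exactly one digit.
  set_matches = Dir.glob('[ab][0-9].txt', base: dir).sort

  # Braces list alternatives; they are not a quantifier.
  # a{1,2}.txt means "a1.txt or a2.txt", and it is nested here
  # inside an outer pair of braces next to another pattern.
  brace_matches = Dir.glob('{a{1,2}.txt,*.xml}', base: dir).sort
end

p set_matches
p brace_matches
```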

One thing to consider is case sensitivity. It's up to the operating system, and the file system it uses, to determine whether TEST.txt and TeSt.TxT refer to the same file. On Linux and most other Unix-like systems, these are different files. On Windows, they refer to the same file.
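If you need the same matches regardless of platform, one option is to make the case handling explicit rather than relying on the operating system. The sketch below shows two ways to do that, under the assumption that a temporary directory with made-up file names is an acceptable stand-in for real data: spelling out both cases with character sets, and listing the directory with Dir.children (Ruby 2.5 or later) and filtering with File.fnmatch? and the File::FNM_CASEFOLD flag.

```ruby
require 'tmpdir'
require 'fileutils'

by_charset = nil
by_fnmatch = nil

Dir.mktmpdir do |dir|
  %w[notes.txt README.TXT data.csv].each do |name|
    FileUtils.touch(File.join(dir, name))
  end

  # Spell out both cases of each letter with character sets.
  by_charset = Dir.glob('*.[tT][xX][tT]', base: dir).sort

  # Or list the directory and filter the names, asking for a
  # case-insensitive comparison explicitly.
  by_fnmatch = Dir.children(dir).select do |name|
    File.fnmatch?('*.txt', name, File::FNM_CASEFOLD)
  end.sort
end

p by_charset
p by_fnmatch
```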

The order of the results can also vary. Older Ruby versions return matches in whatever order the operating system supplies them, which may differ between Windows and Linux, for example. Since Ruby 3.0, Dir.glob sorts its results by default.
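If your code depends on a particular order, sorting the array explicitly removes any dependence on the platform or Ruby version. A minimal sketch, again using a temporary directory and invented file names:

```ruby
require 'tmpdir'
require 'fileutils'

sorted_names = nil

Dir.mktmpdir do |dir|
  # Create the files in a deliberately scrambled order.
  %w[c.txt a.txt b.txt].each { |name| FileUtils.touch(File.join(dir, name)) }

  # An explicit sort gives the same order everywhere.
  sorted_names = Dir.glob('*.txt', base: dir).sort
end

p sorted_names
```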

One final thing to note is the Dir[globstring] convenience method. This is functionally the same as Dir.glob(globstring) and is also semantically correct (you are indexing a directory, much like an array). For this reason, you may see Dir[] more often than Dir.glob, but they are the same thing.
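The equivalence is easy to check for yourself. This sketch changes into a temporary directory holding a few made-up files and compares the two calls. It also shows that Dir[] accepts several patterns at once.

```ruby
require 'tmpdir'
require 'fileutils'

same_result = nil
several = nil

Dir.mktmpdir do |dir|
  %w[a.rb b.rb c.txt].each { |name| FileUtils.touch(File.join(dir, name)) }

  # Work inside the temporary directory for the duration of the block.
  Dir.chdir(dir) do
    same_result = Dir['*.rb'].sort == Dir.glob('*.rb').sort

    # Dir[] can take more than one pattern.
    several = Dir['*.rb', '*.txt'].sort
  end
end

puts same_result
p several
```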

Examples Using Wildcards

The following example program will demonstrate as many patterns as it can in many different combinations.

 #!/usr/bin/env ruby
 
 # Get all .xml files
 Dir['*.xml']
 
 # Get all files with 5 characters and a .jpg extension
 Dir['?????.jpg']
 
 # Get all jpg, png and gif images
 Dir['*.{jpg,png,gif}']
 
 # Descend into the directory tree and get all jpg images
 # Note: this will also find jpg images in the current directory
 Dir['**/*.jpg']
 
 # Descend into all directories starting with Uni and find all
 # jpg images.
 # Note: this only descends down one directory
 Dir['Uni**/*.jpg']
 
 # Descend into all directories starting with Uni and all
 # subdirectories of directories starting with Uni and find
 # all .jpg images
 Dir['Uni**/**/*.jpg']
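To see what the recursive patterns actually return, the following sketch builds a small directory tree in a temporary location and runs several of the globs above against it. The directory and file names (Unicorn, Other, u1.jpg and so on) are invented for the demonstration, and the base: keyword assumes Ruby 2.5 or later.

```ruby
require 'tmpdir'
require 'fileutils'

results = {}

Dir.mktmpdir do |dir|
  # Lay out a small tree: two top-level directories, one nested.
  %w[Unicorn/deep Other].each { |d| FileUtils.mkdir_p(File.join(dir, d)) }
  %w[top.jpg abcde.jpg data.xml
     Unicorn/u1.jpg Unicorn/deep/u2.jpg Other/o1.jpg].each do |path|
    FileUtils.touch(File.join(dir, path))
  end

  # Run each pattern relative to the temporary directory.
  ['?????.jpg', '**/*.jpg', 'Uni**/*.jpg', 'Uni**/**/*.jpg'].each do |pattern|
    results[pattern] = Dir.glob(pattern, base: dir).sort
  end
end

results.each { |pattern, files| puts "#{pattern} => #{files.inspect}" }
```

Only a ** that stands alone between slashes descends recursively. Inside Uni** it behaves like a single asterisk, which is why Uni**/*.jpg stops one level down.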