What is the most elegant way to check if the directory a file is going to be written to exists, and if not, create the directory using Python? Here is what I tried:
import os file_path = "/my/directory/filename.txt" directory = os.path.dirname(file_path) try: os.stat(directory) except: os.mkdir(directory) f = file(filename)
Somehow, I missed
os.path.exists (thanks kanja, Blair, and Douglas). This is what I have now:
def ensure_dir(file_path): directory = os.path.dirname(file_path) if not os.path.exists(directory): os.makedirs(directory)
Is there a flag for “open”, that makes this happen automatically?
How to create a nested directory in Python? Answer #1:
On Python ≥ 3.5, use
from pathlib import Path Path("/my/directory").mkdir(parents=True, exist_ok=True)
For older versions of Python, I see two answers with good qualities, each with a small flaw, so I will give my take on it:
import os if not os.path.exists(directory): os.makedirs(directory)
As noted in comments and elsewhere, there’s a race condition – if the directory is created between the
os.path.exists and the
os.makedirs calls, the
os.makedirs will fail with an
OSError. Unfortunately, blanket-catching
OSError and continuing is not foolproof, as it will ignore a failure to create the directory due to other factors, such as insufficient permissions, full disk, etc.
One option would be to trap the
OSError and examine the embedded error code:
import os, errno try: os.makedirs(directory) except OSError as e: if e.errno != errno.EEXIST: raise
Alternatively, there could be a second
os.path.exists, but suppose another created the directory after the first check, then removed it before the second one – we could still be fooled.
Depending on the application, the danger of concurrent operations may be more or less than the danger posed by other factors such as file permissions. The developer would have to know more about the particular application being developed and its expected environment before choosing an implementation.
Modern versions of Python improve this code quite a bit, both by exposing
FileExistsError (in 3.3+)…
try: os.makedirs("path/to/directory") except FileExistsError: # directory already exists pass
…and by allowing a keyword argument to
exist_ok (in 3.2+).
os.makedirs("path/to/directory", exist_ok=True) # succeeds even if directory exists.
import pathlib pathlib.Path('/my/directory').mkdir(parents=True, exist_ok=True)
pathlib.Path.mkdir as used above recursively creates the directory and does not raise an exception if the directory already exists. If you don’t need or want the parents to be created, skip the
If using Python 3.4, even though it comes with
pathlib, it is missing the useful
exist_ok option. The backport is intended to offer a newer and superior implementation of
mkdir which includes this missing option.
import os os.makedirs(path, exist_ok=True)
os.makedirs as used above recursively creates the directory and does not raise an exception if the directory already exists. It has the optional
exist_ok argument only if using Python 3.2+, with a default value of
False. This argument does not exist in Python 2.x up to 2.7. As such, there is no need for manual exception handling as with Python 2.7.
import os try: os.makedirs(path) except OSError: if not os.path.isdir(path): raise
While a naive solution may first use
os.path.isdir followed by
os.makedirs, the solution above reverses the order of the two operations. In doing so, it prevents a common race condition having to do with a duplicated attempt at creating the directory, and also disambiguates files from directories.
Note that capturing the exception and using
errno is of limited usefulness because
OSError: [Errno 17] File exists, i.e.
errno.EEXIST, is raised for both files and directories. It is more reliable simply to check if the directory exists.
mkpath creates the nested directory, and does nothing if the directory already exists. This works in both Python 2 and 3.
import distutils.dir_util distutils.dir_util.mkpath(path)
Per Bug 10948, a severe limitation of this alternative is that it works only once per python process for a given path. In other words, if you use it to create a directory, then delete the directory from inside or outside Python, then use
mkpath again to recreate the same directory,
mkpath will simply silently use its invalid cached info of having previously created the directory, and will not actually make the directory again. In contrast,
os.makedirs doesn’t rely on any such cache. This limitation may be okay for some applications.
With regard to the directory’s mode, please refer to the documentation if you care about it.
Using try except and the right error code from errno module gets rid of the race condition and is cross-platform:
import os import errno def make_sure_path_exists(path): try: os.makedirs(path) except OSError as exception: if exception.errno != errno.EEXIST: raise
In other words, we try to create the directories, but if they already exist we ignore the error. On the other hand, any other error gets reported. For example, if you create dir ‘a’ beforehand and remove all permissions from it, you will get an
OSError raised with
errno.EACCES (Permission denied, error 13).
I would personally recommend that you use
os.path.isdir() to test instead of
>>> os.path.exists('/tmp/dirname') True >>> os.path.exists('/tmp/dirname/filename.etc') True >>> os.path.isdir('/tmp/dirname/filename.etc') False >>> os.path.isdir('/tmp/fakedirname') False
If you have:
>>> dir = raw_input(":: ")
And a foolish user input:
… You’re going to end up with a directory named
filename.etc when you pass that argument to
os.makedirs() if you test with
Insights on the specifics of this situation
You give a particular file at a certain path and you pull the directory from the file path. Then after making sure you have the directory, you attempt to open a file for reading. To comment on this code:
filename = "/my/directory/filename.txt" dir = os.path.dirname(filename)
We want to avoid overwriting the builtin function,
filepath or perhaps
fullfilepath is probably a better semantic name than
filename so this would be better written:
import os filepath = '/my/directory/filename.txt' directory = os.path.dirname(filepath)
Your end goal is to open this file, you initially state, for writing, but you’re essentially approaching this goal (based on your code) like this, which opens the file for reading:
if not os.path.exists(directory): os.makedirs(directory) f = file(filename)
Assuming opening for reading
Why would you make a directory for a file that you expect to be there and be able to read?
Just attempt to open the file.
with open(filepath) as my_file: do_stuff(my_file)
If the directory or file isn’t there, you’ll get an
IOError with an associated error number:
errno.ENOENT will point to the correct error number regardless of your platform. You can catch it if you want, for example:
import errno try: with open(filepath) as my_file: do_stuff(my_file) except IOError as error: if error.errno == errno.ENOENT: print 'ignoring error because directory or file is not there' else: raise
Assuming we’re opening for writing
This is probably what you’re wanting.
In this case, we probably aren’t facing any race conditions. So just do as you were, but note that for writing, you need to open with the
w mode (or
a to append). It’s also a Python best practice to use the context manager for opening files.
import os if not os.path.exists(directory): os.makedirs(directory) with open(filepath, 'w') as my_file: do_stuff(my_file)
However, say we have several Python processes that attempt to put all their data into the same directory. Then we may have contention over creation of the directory. In that case it’s best to wrap the
makedirs call in a try-except block.
import os import errno if not os.path.exists(directory): try: os.makedirs(directory) except OSError as error: if error.errno != errno.EEXIST: raise with open(filepath, 'w') as my_file: do_stuff(my_file)
The direct answer to this is, assuming a simple situation where you don’t expect other users or processes to be messing with your directory:
if not os.path.exists(d): os.makedirs(d)
or if making the directory is subject to race conditions (i.e. if after checking the path exists, something else may have already made it) do this:
import errno try: os.makedirs(d) except OSError as exception: if exception.errno != errno.EEXIST: raise
But perhaps an even better approach is to sidestep the resource contention issue, by using temporary directories via
import tempfile d = tempfile.mkdtemp()
Here’s the essentials from the online doc:
mkdtemp(suffix='', prefix='tmp', dir=None) User-callable function to create and return a unique temporary directory. The return value is the pathname of the directory. The directory is readable, writable, and searchable only by the creating user. Caller is responsible for deleting the directory when done with it.
New in Python 3.5:
There’s a new
Path object (as of 3.4) with lots of methods one would want to use with paths – one of which is
(For context, I’m tracking my weekly rep with a script. Here’s the relevant parts of code from the script that allow me to avoid hitting Stack Overflow more than once a day for the same data.)
First the relevant imports:
from pathlib import Path import tempfile
We don’t have to deal with
os.path.join now – just join path parts with a
directory = Path(tempfile.gettempdir()) / 'sodata'
Then I idempotently ensure the directory exists – the
exist_ok argument shows up in Python 3.5:
Here’s the relevant part of the documentation:
FileExistsErrorexceptions will be ignored (same behavior as the
POSIX mkdir -pcommand), but only if the last path component is not an existing non-directory file.
Here’s a little more of the script – in my case, I’m not subject to a race condition, I only have one process that expects the directory (or contained files) to be there, and I don’t have anything trying to remove the directory.
todays_file = directory / str(datetime.datetime.utcnow().date()) if todays_file.exists(): logger.info("todays_file exists: " + str(todays_file)) df = pd.read_json(str(todays_file))
Path objects have to be coerced to
str before other APIs that expect
str paths can use them.
Perhaps Pandas should be updated to accept instances of the abstract base class,
Hope you learned something from this post.
Follow Programming Articles for more!