How to deal with SettingWithCopyWarning in Pandas? [Answered]

Sample problem:

Background

I just upgraded my Pandas from 0.11 to 0.13.0rc1. Now, the application is popping out many new warnings. One of them like this:

E:\FinReporter\FM_EXT.py:449: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_index,col_indexer] = value instead
  quote_df['TVol']   = quote_df['TVol']/TVOL_SCALE

I want to know what exactly it means? Do I need to change something?

How should I suspend the warning if I insist to use quote_df['TVol'] = quote_df['TVol']/TVOL_SCALE?

The function that gives errors

def _decode_stock_quote(list_of_150_stk_str):
    """decode the webpage and return dataframe"""

    from cStringIO import StringIO

    str_of_all = "".join(list_of_150_stk_str)

    quote_df = pd.read_csv(StringIO(str_of_all), sep=',', names=list('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefg')) #dtype={'A': object, 'B': object, 'C': np.float64}
    quote_df.rename(columns={'A':'STK', 'B':'TOpen', 'C':'TPCLOSE', 'D':'TPrice', 'E':'THigh', 'F':'TLow', 'I':'TVol', 'J':'TAmt', 'e':'TDate', 'f':'TTime'}, inplace=True)
    quote_df = quote_df.ix[:,[0,3,2,1,4,5,8,9,30,31]]
    quote_df['TClose'] = quote_df['TPrice']
    quote_df['RT']     = 100 * (quote_df['TPrice']/quote_df['TPCLOSE'] - 1)
    quote_df['TVol']   = quote_df['TVol']/TVOL_SCALE
    quote_df['TAmt']   = quote_df['TAmt']/TAMT_SCALE
    quote_df['STK_ID'] = quote_df['STK'].str.slice(13,19)
    quote_df['STK_Name'] = quote_df['STK'].str.slice(21,30)#.decode('gb2312')
    quote_df['TDate']  = quote_df.TDate.map(lambda x: x[0:4]+x[5:7]+x[8:10])
    
    return quote_df

More error messages

E:\FinReporter\FM_EXT.py:449: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_index,col_indexer] = value instead
  quote_df['TVol']   = quote_df['TVol']/TVOL_SCALE
E:\FinReporter\FM_EXT.py:450: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_index,col_indexer] = value instead
  quote_df['TAmt']   = quote_df['TAmt']/TAMT_SCALE
E:\FinReporter\FM_EXT.py:453: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_index,col_indexer] = value instead
  quote_df['TDate']  = quote_df.TDate.map(lambda x: x[0:4]+x[5:7]+x[8:10])

How to deal with SettingWithCopyWarning in Pandas? Answer #1:

The SettingWithCopyWarning was created to flag potentially confusing “chained” assignments, such as the following, which does not always work as expected, particularly when the first selection returns a copy. [see GH5390 and GH5597 for background discussion.]

df[df['A'] > 2]['B'] = new_val  # new_val not set in df

The warning offers a suggestion to rewrite as follows:

df.loc[df['A'] > 2, 'B'] = new_val

However, this doesn’t fit your usage, which is equivalent to:

df = df[df['A'] > 2]
df['B'] = new_val

While it’s clear that you don’t care about writes making it back to the original frame (since you are overwriting the reference to it), unfortunately this pattern cannot be differentiated from the first chained assignment example. Hence the (false positive) warning. The potential for false positives is addressed in the docs on indexing, if you’d like to read further. You can safely disable this new warning with the following assignment.

import pandas as pd
pd.options.mode.chained_assignment = None  # default='warn'

Answer #2:

How to deal with SettingWithCopyWarning in Pandas?

This post is meant for readers who,

  1. Would like to understand what this warning means
  2. Would like to understand different ways of suppressing this warning
  3. Would like to understand how to improve their code and follow good practices to avoid this warning in the future.

Setup

np.random.seed(0)
df = pd.DataFrame(np.random.choice(10, (3, 5)), columns=list('ABCDE'))
df
   A  B  C  D  E
0  5  0  3  3  7
1  9  3  5  2  4
2  7  6  8  8  1

What is the SettingWithCopyWarning?

To know how to deal with this warning, it is important to understand what it means and why it is raised in the first place.

When filtering DataFrames, it is possible slice/index a frame to return either a view, or a copy, depending on the internal layout and various implementation details. A “view” is, as the term suggests, a view into the original data, so modifying the view may modify the original object. On the other hand, a “copy” is a replication of data from the original, and modifying the copy has no effect on the original.

As mentioned by other answers, the SettingWithCopyWarning was created to flag “chained assignment” operations. Consider df in the setup above. Suppose you would like to select all values in column “B” where values in column “A” is > 5. Pandas allows you to do this in different ways, some more correct than others. For example,

df[df.A > 5]['B']
 
1    3
2    6
Name: B, dtype: int64

And,

df.loc[df.A > 5, 'B']

1    3
2    6
Name: B, dtype: int64

These return the same result, so if you are only reading these values, it makes no difference. So, what is the issue? The problem with chained assignment, is that it is generally difficult to predict whether a view or a copy is returned, so this largely becomes an issue when you are attempting to assign values back. To build on the earlier example, consider how this code is executed by the interpreter:

df.loc[df.A > 5, 'B'] = 4
# becomes
df.__setitem__((df.A > 5, 'B'), 4)

With a single __setitem__ call to df. OTOH, consider this code:

df[df.A > 5]['B'] = 4
# becomes
df.__getitem__(df.A > 5).__setitem__('B', 4)

Now, depending on whether __getitem__ returned a view or a copy, the __setitem__ operation may not work.

In general, you should use loc for label-based assignment, and iloc for integer/positional based assignment, as the spec guarantees that they always operate on the original. Additionally, for setting a single cell, you should use at and iat.

More can be found in the documentation.

Note
All boolean indexing operations done with loc can also be done with iloc. The only difference is that iloc expects either integers/positions for index or a numpy array of boolean values, and integer/position indexes for the columns.

For example,

df.loc[df.A > 5, 'B'] = 4

Can be written nas

df.iloc[(df.A > 5).values, 1] = 4

And,

df.loc[1, 'A'] = 100

Can be written as

df.iloc[1, 0] = 100

And so on.


Just tell me how to suppress the warning!

Consider a simple operation on the “A” column of df. Selecting “A” and dividing by 2 will raise the warning, but the operation will work.

df2 = df[['A']]
df2['A'] /= 2
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/IPython/__main__.py:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

df2
     A
0  2.5
1  4.5
2  3.5

There are a couple of ways of directly silencing this warning:

  1. (recommended) Use loc to slice subsets: df2 = df.loc[:, ['A']] df2['A'] /= 2 # Does not raise
  2. Change pd.options.mode.chained_assignment
    Can be set to None"warn", or "raise""warn" is the default. None will suppress the warning entirely, and "raise" will throw a SettingWithCopyError, preventing the operation from going through. pd.options.mode.chained_assignment = None df2['A'] /= 2
  3. Make a deepcopy df2 = df[['A']].copy(deep=True) df2['A'] /= 2

A nice way of non-intrusively changing the mode (modified from this gist) using a context manager, to set the mode only as long as it is required, and then reset it back to the original state when finished.

class ChainedAssignent:
    def __init__(self, chained=None):
        acceptable = [None, 'warn', 'raise']
        assert chained in acceptable, "chained must be in " + str(acceptable)
        self.swcw = chained

    def __enter__(self):
        self.saved_swcw = pd.options.mode.chained_assignment
        pd.options.mode.chained_assignment = self.swcw
        return self

    def __exit__(self, *args):
        pd.options.mode.chained_assignment = self.saved_swcw

The usage is as follows:

# some code here
with ChainedAssignent():
    df2['A'] /= 2
# more code follows

Or, to raise the exception

with ChainedAssignent(chained='raise'):
    df2['A'] /= 2

SettingWithCopyError: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

The “XY Problem”: What am I doing wrong?

A lot of the time, users attempt to look for ways of suppressing this exception without fully understanding why it was raised in the first place. This is a good example of an XY problem, where users attempt to solve a problem “Y” that is actually a symptom of a deeper rooted problem “X”. Questions will be raised based on common problems that encounter this warning, and solutions will then be presented.

Question 1
I have a DataFrame

df
       A  B  C  D  E
    0  5  0  3  3  7
    1  9  3  5  2  4
    2  7  6  8  8  1

I want to assign values in col “A” > 5 to 1000. My expected output is

      A  B  C  D  E
0     5  0  3  3  7
1  1000  3  5  2  4
2  1000  6  8  8  1

Wrong way to do this:

df.A[df.A > 5] = 1000         # works, because df.A returns a view
df[df.A > 5]['A'] = 1000      # does not work
df.loc[df.A > 5]['A'] = 1000   # does not work

Right way using loc:

df.loc[df.A > 5, 'A'] = 1000

Question 21
I am trying to set the value in cell (1, ‘D’) to 12345. My expected output is

   A  B  C      D  E
0  5  0  3      3  7
1  9  3  5  12345  4
2  7  6  8      8  1

I have tried different ways of accessing this cell, such as df['D'][1]. What is the best way to do this?

1. This question isn’t specifically related to the warning, but it is good to understand how to do this particular operation correctly so as to avoid situations where the warning could potentially arise in future.

You can use any of the following methods to do this.

df.loc[1, 'D'] = 12345
df.iloc[1, 3] = 12345
df.at[1, 'D'] = 12345
df.iat[1, 3] = 12345

Question 3
I am trying to subset values based on some condition. I have a DataFrame

   A  B  C  D  E
1  9  3  5  2  4
2  7  6  8  8  1

I would like to assign values in “D” to 123 such that “C” == 5. I tried

df2.loc[df2.C == 5, 'D'] = 123

Which seems fine but I am still getting the SettingWithCopyWarning! How do I fix this?

This is actually probably because of code higher up in your pipeline. Did you create df2 from something larger, like

df2 = df[df.A > 5]

? In this case, boolean indexing will return a view, so df2 will reference the original. What you’d need to do is assign df2 to a copy:

df2 = df[df.A > 5].copy()
# Or,
# df2 = df.loc[df.A > 5, :]

Question 4
I’m trying to drop column “C” in-place from

   A  B  C  D  E
1  9  3  5  2  4
2  7  6  8  8  1

But using

df2.drop('C', axis=1, inplace=True)

Throws SettingWithCopyWarning. Why is this happening?

This is because df2 must have been created as a view from some other slicing operation, such as

df2 = df[df.A > 5]

The solution here is to either make a copy() of df, or use loc, as before.

Answer #3:

In general the point of the SettingWithCopyWarning is to show users (and especially new users) that they may be operating on a copy and not the original as they think. There are false positives (IOW if you know what you are doing it could be ok). One possibility is simply to turn off the (by default warn) warning as @Garrett suggest.

Here is another option:

In [1]: df = DataFrame(np.random.randn(5, 2), columns=list('AB'))

In [2]: dfa = df.ix[:, [1, 0]]

In [3]: dfa.is_copy
Out[3]: True

In [4]: dfa['A'] /= 2
/usr/local/bin/ipython:1: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_index,col_indexer] = value instead
  #!/usr/local/bin/python

You can set the is_copy flag to False, which will effectively turn off the check, for that object:

In [5]: dfa.is_copy = False

In [6]: dfa['A'] /= 2

If you explicitly copy then no further warning will happen:

In [7]: dfa = df.ix[:, [1, 0]].copy()

In [8]: dfa['A'] /= 2

The code the OP is showing above, while legitimate, and probably something I do as well, is technically a case for this warning, and not a false positive. Another way to not have the warning would be to do the selection operation via reindex, e.g.

quote_df = quote_df.reindex(columns=['STK', ...])

Or,

quote_df = quote_df.reindex(['STK', ...], axis=1)  # v.0.21

Answer #4:

Pandas dataframe copy warning

When you go and do something like this:

quote_df = quote_df.ix[:,[0,3,2,1,4,5,8,9,30,31]]

pandas.ix in this case returns a new, stand alone dataframe.

Any values you decide to change in this dataframe, will not change the original dataframe.

This is what pandas tries to warn you about.


Why .ix is a bad idea

The .ix object tries to do more than one thing, and for anyone who has read anything about clean code, this is a strong smell.

Given this dataframe:

df = pd.DataFrame({"a": [1,2,3,4], "b": [1,1,2,2]})

Two behaviors:

dfcopy = df.ix[:,["a"]]
dfcopy.a.ix[0] = 2

Behavior one: dfcopy is now a stand alone dataframe. Changing it will not change df

df.ix[0, "a"] = 3

Behavior two: This changes the original dataframe.


Use .loc instead

The pandas developers recognized that the .ix object was quite smelly[speculatively] and thus created two new objects which helps in the accession and assignment of data. (The other being .iloc)

.loc is faster, because it does not try to create a copy of the data.

.loc is meant to modify your existing dataframe inplace, which is more memory efficient.

.loc is predictable, it has one behavior.


The solution

What you are doing in your code example is loading a big file with lots of columns, then modifying it to be smaller.

The pd.read_csv function can help you out with a lot of this and also make the loading of the file a lot faster.

So instead of doing this

quote_df = pd.read_csv(StringIO(str_of_all), sep=',', names=list('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefg')) #dtype={'A': object, 'B': object, 'C': np.float64}
quote_df.rename(columns={'A':'STK', 'B':'TOpen', 'C':'TPCLOSE', 'D':'TPrice', 'E':'THigh', 'F':'TLow', 'I':'TVol', 'J':'TAmt', 'e':'TDate', 'f':'TTime'}, inplace=True)
quote_df = quote_df.ix[:,[0,3,2,1,4,5,8,9,30,31]]

Do this

columns = ['STK', 'TPrice', 'TPCLOSE', 'TOpen', 'THigh', 'TLow', 'TVol', 'TAmt', 'TDate', 'TTime']
df = pd.read_csv(StringIO(str_of_all), sep=',', usecols=[0,3,2,1,4,5,8,9,30,31])
df.columns = columns

This will only read the columns you are interested in, and name them properly. No need for using the evil .ix object to do magical stuff.

Hope you learned something from this post.

Follow Programming Articles for more!

About ᴾᴿᴼᵍʳᵃᵐᵐᵉʳ

Linux and Python enthusiast, in love with open source since 2014, Writer at programming-articles.com, India.

View all posts by ᴾᴿᴼᵍʳᵃᵐᵐᵉʳ →