To delete a column in a DataFrame, I can successfully use:
del df['column_name']
But why can’t I use the following?
del df.column_name
Since it is possible to access the column/Series as df.column_name, I expected this to work.
How to delete a column from a Pandas DataFrame?
The best way to do this in Pandas is to use drop:
df = df.drop('column_name', axis=1)
where 1 is the axis number (0 for rows and 1 for columns).
To delete the column without having to reassign df, you can do:
df.drop('column_name', axis=1, inplace=True)
Finally, to drop by column position instead of by column label, index into df.columns. For example, to delete the 1st, 2nd and 4th columns:
df = df.drop(df.columns[[0, 1, 3]], axis=1) # df.columns is zero-based pd.Index
Passing a list of column labels also works:
df.drop(['column_nameA', 'column_nameB'], axis=1, inplace=True)
Note: Since v0.21.0 (October 27, 2017), the drop() method accepts index/columns keywords as an alternative to specifying the axis.
So we can now just do:
df = df.drop(columns=['column_nameA', 'column_nameB'])
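The spellings above can be compared side by side. The following is a minimal sketch on a small made-up frame (the column names a to d are purely illustrative), assuming pandas 0.21 or later so that the columns keyword is available:

```python
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6], 'd': [7, 8]})

# Three spellings of "drop columns a and b"; each returns a new DataFrame.
by_axis = df.drop(['a', 'b'], axis=1)
by_keyword = df.drop(columns=['a', 'b'])
by_position = df.drop(df.columns[[0, 1]], axis=1)  # positions are turned into labels first

print(list(by_axis.columns))                                        # remaining labels
print(by_axis.equals(by_keyword) and by_axis.equals(by_position))   # same result?
print(list(df.columns))                                             # original frame
```

Because drop returns a new object by default, the result has to be assigned back to df (or to a new name) for the deletion to be visible.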
Answer #2:
As you’ve guessed, the right syntax is
del df['column_name']
It’s difficult to make del df.column_name work simply as the result of syntactic limitations in Python. del df[name] gets translated to df.__delitem__(name) under the covers by Python.
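To see that the two forms of del take different routes, here is a small pure-Python sketch. The Recorder class is hypothetical and is not how pandas is implemented; it only logs which special method Python invokes for each statement.

```python
class Recorder:
    """Hypothetical class that logs which deletion hook Python calls."""

    def __init__(self):
        self.calls = []

    def __delitem__(self, key):
        # Invoked for: del obj[key]
        self.calls.append(('__delitem__', key))

    def __delattr__(self, name):
        # Invoked for: del obj.name
        self.calls.append(('__delattr__', name))


r = Recorder()
del r['column_name']   # bracket form
del r.column_name      # attribute form
print(r.calls)
```

The bracket form reaches __delitem__ and the attribute form reaches __delattr__, so a class that only implements the former gets no column-deleting behaviour from the latter.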
Answer #3:
Use:
columns = ['Col1', 'Col2', ...]
df.drop(columns, inplace=True, axis=1)
This will delete one or more columns in-place. Note that inplace=True was added in pandas v0.13 and won’t work on older versions. You’d have to assign the result back in that case:
df = df.drop(columns, axis=1)
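One thing that is easy to trip over is that the in-place form returns None, so its result should not be assigned back. A minimal sketch, using made-up data for the column names above:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({'Col1': [1, 2], 'Col2': [3, 4], 'Col3': [5, 6]})

# In-place: df itself is modified and the call returns None.
result = df.drop(['Col1', 'Col2'], axis=1, inplace=True)

print(result)             # the return value of the in-place call
print(list(df.columns))   # columns left in df
```

Writing df = df.drop(..., inplace=True) would therefore replace df with None.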
Answer #4:
Drop by index
Delete first, second and fourth columns:
df.drop(df.columns[[0,1,3]], axis=1, inplace=True)
Delete first column:
df.drop(df.columns[[0]], axis=1, inplace=True)
The optional parameter inplace=True modifies the original DataFrame instead of returning a new one.
Popped
See “Column selection, addition, deletion” in the pandas documentation.
Delete column column-name:
df.pop('column-name')
Examples:
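import pandas as pd
# DataFrame.from_items was removed in pandas 1.0; from_dict builds the same frame.
df = pd.DataFrame.from_dict({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]}, orient='index', columns=['one', 'two', 'three'])
print(df)
   one  two  three
A    1    2      3
B    4    5      6
C    7    8      9
# Drop the first column by position.
df.drop(df.columns[[0]], axis=1, inplace=True)
print(df)
   two  three
A    2      3
B    5      6
C    8      9
# pop removes the column and returns it as a Series.
three = df.pop('three')
print(df)
   two
A    2
B    5
C    8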
Answer #5:
The actual question posed, missed by most answers here, is:
Why can’t I use del df.column_name?
First, we need to understand the problem, which requires us to dive into Python magic methods.
del df['column'] maps to the Python magic method df.__delitem__('column'), which is implemented in Pandas to drop the column.
However, as pointed out in the link above about Python magic methods:
In fact, __del__ should almost never be used because of the precarious circumstances under which it is called; use it with caution!
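You could argue that del df['column_name'] should not be used or encouraged, and thereby del df.column_name should not even be considered. (Strictly speaking, the quoted warning is about __del__, the finalizer, which is a different method from __delitem__, so it applies here only by analogy.)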
However, in theory, del df.column_name could be implemented to work in Pandas using the magic method __delattr__. This does, however, introduce certain problems, problems which the del df['column_name'] implementation already has, but to a lesser degree.
Example Problem
What if I define a column in a dataframe called “dtypes” or “columns”?
Then assume I want to delete these columns.
del df.dtypes would leave the __delattr__ method unable to tell whether it should delete the “dtypes” attribute or the “dtypes” column.
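The name clash can be reproduced directly. This is a minimal sketch with a made-up frame whose first column is deliberately named dtypes:

```python
import pandas as pd

# Hypothetical frame with a column that shares its name with a DataFrame attribute.
df = pd.DataFrame({'dtypes': ['x', 'y'], 'value': [1, 2]})

attr = df.dtypes      # the built-in attribute: one dtype per column
col = df['dtypes']    # the column, reachable only with brackets

print(list(attr.index))   # labels of the dtypes attribute
print(list(col))          # values of the dtypes column

# Bracket deletion is unambiguous: it removes the column, not the attribute.
del df['dtypes']
print(list(df.columns))
```

After the deletion the frame still has its dtypes attribute; only the column is gone.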
Architectural questions behind this problem
- Is a dataframe a collection of columns?
- Is a dataframe a collection of rows?
- Is a column an attribute of a dataframe?
Pandas answers:
- Yes, in all ways
- No, but if you want it to be, you can use the .loc or .iloc indexers (.ix was removed in pandas 1.0).
- Maybe. Do you want to read data? Then yes, unless the name of the attribute is already taken by another attribute belonging to the dataframe. Do you want to modify data? Then no.
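The third answer can be illustrated with a short sketch. It assumes a recent pandas version, where assigning a list-like to a new attribute name triggers a UserWarning (silenced here); the frame and column names are made up.

```python
import warnings
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3]})

# Reading an existing column through attribute access works.
values_read = list(df.a)
print(values_read)

# Assigning to a new attribute name does not create a column.
with warnings.catch_warnings():
    warnings.simplefilter('ignore')
    df.b = [4, 5, 6]
columns_after_attr = list(df.columns)
print(columns_after_attr)

# Bracket assignment does create the column.
df['b'] = [4, 5, 6]
columns_after_item = list(df.columns)
print(columns_after_item)
```

Attribute access is therefore a read-only convenience as far as new columns are concerned, while the bracket form behaves the same for reading, creating and deleting.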
TL;DR:
You cannot do del df.column_name, because Pandas has a quite wildly grown architecture that needs to be reconsidered in order for this kind of cognitive dissonance not to occur to its users.
Pro tip:
Don’t use df.column_name. It may be pretty, but it causes cognitive dissonance.
Zen of Python quotes that fit in here:
There are multiple ways of deleting a column.
There should be one-- and preferably only one --obvious way to do it.
Columns are sometimes attributes but sometimes not.
Special cases aren’t special enough to break the rules.
Does del df.dtypes delete the dtypes attribute or the dtypes column?
In the face of ambiguity, refuse the temptation to guess.
Delete a column from a Pandas DataFrame
We can delete one or more specified columns with the drop() method.
Suppose df is a dataframe.
Column to be removed = column0
Code:
df = df.drop(column0, axis=1)
To remove multiple columns col1, col2, . . . , coln, put all the columns that need to be removed in a list, then remove them with the drop() method.
Code:
df = df.drop([col1, col2, . . . , coln], axis=1)
Answer #6:
Another way of deleting a column in a Pandas DataFrame
If you’re not looking for in-place deletion, you can create a new DataFrame by specifying the columns to keep in the DataFrame(...) constructor:
import pandas as pd
my_dict = {'name': ['a', 'b', 'c', 'd'], 'age': [10, 20, 25, 22], 'designation': ['CEO', 'VP', 'MD', 'CEO']}
df = pd.DataFrame(my_dict)
Create a new DataFrame that keeps only name and age:
newdf = pd.DataFrame(df, columns=['name', 'age'])
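A hedged sketch of how this constructor form behaves in practice: it reindexes the columns, so a label that does not exist in df comes back as an all-NaN column instead of raising an error. The salary column below is a made-up example of such a missing label.

```python
import pandas as pd

df = pd.DataFrame({'name': ['a', 'b'], 'age': [10, 20], 'designation': ['CEO', 'VP']})

# Keep only the listed columns; 'designation' is left out.
newdf = pd.DataFrame(df, columns=['name', 'age'])
print(list(newdf.columns))

# A label that is not in df ('salary' is hypothetical) yields a NaN-filled column.
with_missing = pd.DataFrame(df, columns=['name', 'salary'])
all_missing = bool(with_missing['salary'].isna().all())
print(all_missing)
```

This makes the constructor form more forgiving than bracket selection, but it also means a typo in a column name goes unnoticed.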
Answer #7:
It’s good practice to always use the [] notation. One reason is that attribute notation (df.column_name) does not work for numbered indices:
In [1]: df = DataFrame([[1, 2, 3], [4, 5, 6]])
In [2]: df[1]
Out[2]:
0 2
1 5
Name: 1
In [3]: df.1
File "<ipython-input-3-e4803c0d1066>", line 1
df.1
^
SyntaxError: invalid syntax
A nice addition is the ability to drop columns only if they exist. This way you can cover more use cases, and it will only drop the existing columns from the labels passed to it:
Simply add errors='ignore', for example:
df.drop(['col_name_1', 'col_name_2', ..., 'col_name_N'], inplace=True, axis=1, errors='ignore')
This is available from pandas 0.16.1 onward.
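The difference between the default behaviour and errors='ignore' can be sketched as follows. The frame is made up, and col_name_2 is deliberately absent from it:

```python
import pandas as pd

df = pd.DataFrame({'col_name_1': [1, 2], 'keep': [3, 4]})

# Without errors='ignore', a label that is not present raises KeyError.
try:
    df.drop(['col_name_1', 'col_name_2'], axis=1)
    raised = False
except KeyError:
    raised = True
print(raised)

# With errors='ignore', only the labels that exist are dropped.
cleaned = df.drop(['col_name_1', 'col_name_2'], axis=1, errors='ignore')
print(list(cleaned.columns))
```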
Answer #8:
Pandas 0.21+ answer
Pandas version 0.21 changed the drop method slightly to include both the index and columns parameters, matching the signature of the rename and reindex methods.
df.drop(columns=['column_a', 'column_c'])
Personally, I prefer using the axis parameter to denote columns or index because it is the predominant keyword parameter used in nearly all pandas methods. But now you have some added choices in version 0.21.
Answer #9:
If your original dataframe df is not too big, you have no memory constraints, and you only need to keep a few columns, or if you don’t know beforehand the names of all the extra columns that you do not need, then you might as well create a new dataframe with only the columns you need:
new_df = df[['spam', 'sausage']]
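A minimal sketch of this keep-only approach on made-up data. The explicit .copy() is an addition that is not in the snippet above; it makes it plain that new_df is independent of df and avoids chained-assignment warnings when new_df is modified later.

```python
import pandas as pd

# Hypothetical frame; 'eggs' stands in for the columns you do not need.
df = pd.DataFrame({'spam': [1, 2], 'eggs': [3, 4], 'sausage': [5, 6]})

# Select the columns to keep; everything else is left behind.
new_df = df[['spam', 'sausage']].copy()
new_df['spam'] = 0   # modifying the new frame...

print(list(new_df.columns))
print(list(df['spam']))   # ...leaves the original data as it was
```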