How to update the value of a row in a Python Dataframe?

In this article, we’ll learn how to update the value of a row in a pandas DataFrame.

If you want a direct answer and are not interested in the explanation and details, go to step 3 below.

As part of our data wrangling process, we often need to modify data previously acquired from a CSV, text, or JSON file, an API, a database, or another data source.

Replace existing data in Pandas DataFrames

We’ll look into several cases:

  1. Replacing values in an entire DF
  2. Updating values in specific cells by index
  3. Changing values in an entire DF row
  4. Replacing cell contents according to a condition
  5. Setting values for an entire column / Series

Creating the dataframe

Let’s define a simple survey DataFrame:


# Import DA packages

import pandas as pd
import numpy as np

# Create test Data

survey_dict = {
    'language': ['Python', 'Java', 'Haskell', 'Go', 'C++'],
    'salary': [120, 85, 95, 80, 90],
    'num_candidates': [18, 22, 34, 10, np.nan]
}

# Initialize the survey DataFrame
survey_df = pd.DataFrame(survey_dict)

# Review our DF
survey_df

   language  salary  num_candidates
0    Python     120            18.0
1      Java      85            22.0
2   Haskell      95            34.0
3        Go      80            10.0
4       C++      90             NaN

1. Set cell values in the entire DF using replace()

We’ll use the DataFrame replace method to modify DF cells according to their value. In this example, we’ll replace the null value in the last row. Note that we could accomplish the same result with the more elegant fillna() method.

survey_df.replace(to_replace=np.nan, value=17, inplace=True)
survey_df

Note: The replace method is fairly self-explanatory. We pass inplace=True so that the update is applied to survey_df itself rather than returned as a new DataFrame.

Here’s the output:

   language  salary  num_candidates
0    Python     120            18.0
1      Java      85            22.0
2   Haskell      95            34.0
3        Go      80            10.0
4       C++      90            17.0
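
The fillna() alternative mentioned above could look roughly like the sketch below. It rebuilds the survey data so it runs on its own, and it assigns the result back to the column instead of using inplace=True; that choice is ours, not something the example above requires.

```python
import numpy as np
import pandas as pd

# Rebuild the survey data so the sketch is self-contained
survey_df = pd.DataFrame({
    'language': ['Python', 'Java', 'Haskell', 'Go', 'C++'],
    'salary': [120, 85, 95, 80, 90],
    'num_candidates': [18, 22, 34, 10, np.nan],
})

# Fill missing values in a single column and assign the result back
survey_df['num_candidates'] = survey_df['num_candidates'].fillna(17)
print(survey_df)
```

Unlike replace() on the whole DataFrame, this only touches the num_candidates column.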

2. Change the value of cell content by index in pandas dataframe

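To pick a specific row by position, we’ll use the iloc indexer. Note that we could also use the loc indexer to select the cell by row/column label. Keep in mind that calling replace() on the selected row returns a new Series with the updated value; survey_df itself is not modified unless the result is written back.

survey_df.iloc[0].replace(to_replace=120, value=130)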

Our output:

language          Python
salary               130
num_candidates      18.0
Name: 0, dtype: object
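
To make a single-cell change stick in the DataFrame, one option is to assign through loc (by label) or iloc (by position). The sketch below is a minimal illustration on a rebuilt copy of the survey data; the value 88 for row 1 is only a made-up example.

```python
import pandas as pd

# Survey data as it looks after the null value was replaced with 17
survey_df = pd.DataFrame({
    'language': ['Python', 'Java', 'Haskell', 'Go', 'C++'],
    'salary': [120, 85, 95, 80, 90],
    'num_candidates': [18.0, 22.0, 34.0, 10.0, 17.0],
})

# replace() on a selected row returns a new Series; the DataFrame is unchanged
row = survey_df.iloc[0].replace(to_replace=120, value=130)
unchanged_salary = survey_df.loc[0, 'salary']

# Assigning through loc (row label, column label) updates the DataFrame itself
survey_df.loc[0, 'salary'] = 130

# iloc does the same by integer position: row 1, column 1 ('salary')
survey_df.iloc[1, 1] = 88  # hypothetical value, for illustration only
print(survey_df)
```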

3. How to update the value of a row in a Python Dataframe?

Similar to before, but this time we’ll pass a list of values to replace and a list of their respective replacements. As in the previous step, the call returns a new Series and leaves survey_df untouched:

survey_df.loc[0].replace(to_replace=[120, 18], value=[130, 20])
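
If the goal is to write the new row values into the DataFrame itself, a direct assignment through loc is one way to do it. This is a sketch under the assumption that we know which columns to overwrite; the replacement values for row 3 are made up for illustration.

```python
import pandas as pd

survey_df = pd.DataFrame({
    'language': ['Python', 'Java', 'Haskell', 'Go', 'C++'],
    'salary': [120, 85, 95, 80, 90],
    'num_candidates': [18.0, 22.0, 34.0, 10.0, 17.0],
})

# Overwrite several cells of row 0 in one assignment (label-based)
survey_df.loc[0, ['salary', 'num_candidates']] = [130, 20]

# Overwrite a whole row by supplying one value per column, in column order
survey_df.loc[3] = ['Go', 82, 12.0]  # hypothetical values
print(survey_df)
```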

4. Update cells based on conditions in Dataframe

In practice, we’ll usually update our data based on specific conditions. Here’s an example of how to update cells conditionally. Let’s assume that we would like to update the salary figures in our data so that the minimum salary is $90/hour.

We’ll first slice the DataFrame with a boolean condition to find the relevant rows to update:

cond = survey_df['salary'] < 90

We’ll then pass the rows and columns labels to be updated into the loc indexer:

survey_df.loc[cond,'salary'] = 90
survey_df

Here’s our output:

   language  salary  num_candidates
0    Python     120            18.0
1      Java      90            22.0
2   Haskell      95            34.0
3        Go      90            10.0
4       C++      90            17.0

Important note: We can write more complex conditions as needed. Below is an example with multiple conditions.


cond = (survey_df['salary'] < 90) | (survey_df['num_candidates'] <18)
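
As a rough illustration of how combined conditions behave, the sketch below builds an "or" mask and an "and" mask on the survey data (using the salaries from before the $90 update, which is our assumption) and applies one of them through loc.

```python
import pandas as pd

survey_df = pd.DataFrame({
    'language': ['Python', 'Java', 'Haskell', 'Go', 'C++'],
    'salary': [120, 85, 95, 80, 90],
    'num_candidates': [18.0, 22.0, 34.0, 10.0, 17.0],
})

# | means "or", & means "and"; each comparison needs its own parentheses
either = (survey_df['salary'] < 90) | (survey_df['num_candidates'] < 18)
both = (survey_df['salary'] < 90) & (survey_df['num_candidates'] < 18)

matches_either = survey_df.loc[either, 'language'].tolist()
matches_both = survey_df.loc[both, 'language'].tolist()
print(matches_either)  # ['Java', 'Go', 'C++']
print(matches_both)    # ['Go']

# Raise the salary only in the rows where both conditions hold
survey_df.loc[both, 'salary'] = 90
```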

5. Replace values for an entire column

Let’s now assume that we would like to modify the num_candidates figure for all the DF entries. That’s fairly easy:

survey_df['num_candidates'] = 25
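
Besides a single scalar, a column can also be set from a list with one entry per row or from an expression on existing columns. The following sketch shows the three variants side by side; the +5 adjustment is an arbitrary example.

```python
import pandas as pd

survey_df = pd.DataFrame({
    'language': ['Python', 'Java', 'Haskell', 'Go', 'C++'],
    'salary': [120, 90, 95, 90, 90],
    'num_candidates': [18.0, 22.0, 34.0, 10.0, 17.0],
})

# A scalar is broadcast to every row of the column
survey_df['num_candidates'] = 25
after_scalar = survey_df['num_candidates'].tolist()

# A list with one entry per row sets each row individually
survey_df['num_candidates'] = [18, 22, 34, 10, 17]

# An expression on an existing column updates all rows at once
survey_df['salary'] = survey_df['salary'] + 5
print(survey_df)
```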

How to update the value of a row in pandas dataframe based on condition?

Sample question:

Say I have the following dataframe:

[Table: a DataFrame with the columns stream, feat and another_feat]

What is the most efficient way to update the values of the columns feat and another_feat where the stream is number 2?

Is this it?

for index, row in df1.iterrows():
    if df1.loc[index, 'stream'] == 2:
        pass  # do something

UPDATE: What should I do if I have more than 100 columns? I don’t want to explicitly name the columns that I want to update. I want to divide the value of each column by 2 (except for the stream column).

So to be clear what my goal is:

Dividing all values by 2 of all rows that have stream 2, but not changing the stream column.

Answer #1:

You can use loc if you need to update two columns to the same value:

df1.loc[df1['stream'] == 2, ['feat','another_feat']] = 'aaaa'
print(df1)
   stream        feat another_feat
a       1  some_value   some_value
b       2        aaaa         aaaa
c       2        aaaa         aaaa
d       3  some_value   some_value

If you need to update the columns separately, one option is:

df1.loc[df1['stream'] == 2, 'feat'] = 10
print(df1)
   stream        feat another_feat
a       1  some_value   some_value
b       2          10   some_value
c       2          10   some_value
d       3  some_value   some_value

Another common option is to use numpy.where, which also sets a value (here 20) for the rows that do not match:

df1['feat'] = np.where(df1['stream'] == 2, 10, 20)
print(df1)
   stream  feat another_feat
a       1    20   some_value
b       2    10   some_value
c       2    10   some_value
d       3    20   some_value
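
If the non-matching rows should keep their current values instead of being overwritten, Series.mask is one alternative to numpy.where. The sketch below assumes a small numeric df1 of our own making (the column names follow the question).

```python
import pandas as pd

# Hypothetical numeric data with the same column names as in the question
df1 = pd.DataFrame(
    {'stream': [1, 2, 2, 3], 'feat': [4, 4, 2, 1], 'another_feat': [5, 5, 9, 7]},
    index=list('abcd'),
)

# mask() replaces values where the condition is True and keeps the rest
df1['feat'] = df1['feat'].mask(df1['stream'] == 2, 10)
print(df1)
```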

EDIT: If you need to divide all columns except stream in the rows where the condition is True, use:

print(df1)
   stream  feat  another_feat
a       1     4             5
b       2     4             5
c       2     2             9
d       3     1             7

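# Select every column except stream
cols = [col for col in df1.columns if col != 'stream']
print(cols)
['feat', 'another_feat']

# The right-hand side is aligned on index and columns,
# so only the selected cells are overwritten
df1.loc[df1['stream'] == 2, cols] = df1 / 2
print(df1)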
   stream  feat  another_feat
a       1   4.0           5.0
b       2   2.0           2.5
c       2   1.0           4.5
d       3   1.0           7.0
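
A variant of the same idea is sketched below. It casts the affected columns to float first, since halving integers can produce fractional values and recent pandas versions may warn when floats are written into integer columns. The explicit cast is our assumption about what is convenient here, not part of the original answer.

```python
import pandas as pd

df1 = pd.DataFrame(
    {'stream': [1, 2, 2, 3], 'feat': [4, 4, 2, 1], 'another_feat': [5, 5, 9, 7]},
    index=list('abcd'),
)

# Every column except stream, without naming them one by one
cols = [col for col in df1.columns if col != 'stream']

# Cast to float up front so the halved values fit the column dtype
df1 = df1.astype({col: float for col in cols})

# Halve only the rows where stream == 2; stream itself is left alone
mask = df1['stream'] == 2
df1.loc[mask, cols] = df1.loc[mask, cols] / 2
print(df1)
```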

If you are working with multiple conditions, you can nest numpy.where calls or use numpy.select:

df0 = pd.DataFrame({'Col':[5,0,-6]})

df0['New Col1'] = np.where((df0['Col'] > 0), 'Increasing', 
                          np.where((df0['Col'] < 0), 'Decreasing', 'No Change'))

df0['New Col2'] = np.select([df0['Col'] > 0, df0['Col'] < 0],
                            ['Increasing',  'Decreasing'], 
                            default='No Change')

print(df0)
   Col    New Col1    New Col2
0    5  Increasing  Increasing
1    0   No Change   No Change
2   -6  Decreasing  Decreasing

Answer #2:

You can do the same with .loc (older answers used the .ix indexer, which was deprecated and later removed from pandas), like this:

In [1]: df = pd.DataFrame(np.random.randn(5,4), columns=list('abcd'))

In [2]: df
Out[2]: 
          a         b         c         d
0 -0.323772  0.839542  0.173414 -1.341793
1 -1.001287  0.676910  0.465536  0.229544
2  0.963484 -0.905302 -0.435821  1.934512
3  0.266113 -0.034305 -0.110272 -0.720599
4 -0.522134 -0.913792  1.862832  0.314315

In [3]: df.loc[df.a>0, ['b','c']] = 0

In [4]: df
Out[4]: 
          a         b         c         d
0 -0.323772  0.839542  0.173414 -1.341793
1 -1.001287  0.676910  0.465536  0.229544
2  0.963484  0.000000  0.000000  1.934512
3  0.266113  0.000000  0.000000 -0.720599
4 -0.522134 -0.913792  1.862832  0.314315

EDIT

After the extra information, the following returns a new DataFrame with all columns except a, restricted to the rows where the condition is met, with halved values (df itself is not modified):

>> condition = df.a > 0
>> df[condition][[i for i in df.columns.values if i not in ['a']]].apply(lambda x: x/2)

About ᴾᴿᴼᵍʳᵃᵐᵐᵉʳ

Linux and Python enthusiast, in love with open source since 2014, Writer at programming-articles.com, India.
