In this article, we’ll learn how to update the value of a row in a pandas DataFrame.
If you want a direct answer and are not interested in the explanation and details, go to step 3 below.
As part of our data wrangling process, we are often required to modify data previously acquired from a CSV, text, JSON, API, database or other data source.
Replace existing data in Pandas DataFrames
We’ll look into several cases:
- Replacing values in an entire DF
- Updating values in specific cells by index
- Changing values in an entire DF row
- Replacing cell contents according to a condition
- Setting values for an entire column / series
Creating the DataFrame
Let’s define a simple survey DataFrame:
# Import data analysis packages
import pandas as pd
import numpy as np
# Create test Data
survey_dict = { 'language': ['Python', 'Java', 'Haskell', 'Go', 'C++'],
'salary': [120,85,95,80,90],
'num_candidates': [18,22,34,10, np.nan]
}
# Initialize the survey DataFrame
survey_df = pd.DataFrame(survey_dict)
# Review our DF
survey_df
| | language | salary | num_candidates |
|---|---|---|---|
| 0 | Python | 120 | 18.0 |
| 1 | Java | 85 | 22.0 |
| 2 | Haskell | 95 | 34.0 |
| 3 | Go | 80 | 10.0 |
| 4 | C++ | 90 | NaN |
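The .0 on the num_candidates values comes from the NaN in the last row: NumPy integer columns cannot represent a missing value, so pandas stores that column as float64. A quick check, rebuilding the same data so the snippet runs on its own:

```python
import numpy as np
import pandas as pd

# Same survey data as above
survey_df = pd.DataFrame({
    'language': ['Python', 'Java', 'Haskell', 'Go', 'C++'],
    'salary': [120, 85, 95, 80, 90],
    'num_candidates': [18, 22, 34, 10, np.nan],
})

# salary holds only integers, so it stays an integer column;
# the NaN in num_candidates forces that column to float64
print(survey_df.dtypes)
```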
1. Set cell values in the entire DF using replace()
We’ll use the DataFrame replace() method to modify cells according to their value. In this example, we’ll replace the missing (NaN) value in the last row. Note that we could accomplish the same result with the more idiomatic fillna() method.
survey_df.replace(to_replace=np.nan, value=17, inplace=True)
survey_df
Note: The replace method is pretty self-explanatory. Note the use of inplace=True, which modifies survey_df directly instead of returning a new DataFrame, so the update persists going forward.
Here’s the output:
| | language | salary | num_candidates |
|---|---|---|---|
| 0 | Python | 120 | 18.0 |
| 1 | Java | 85 | 22.0 |
| 2 | Haskell | 95 | 34.0 |
| 3 | Go | 80 | 10.0 |
| 4 | C++ | 90 | 17.0 |
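As mentioned, fillna() targets missing values directly. A minimal comparison on a fresh copy of the survey data (the names replaced_df and filled_df are just for illustration), showing that both approaches give the same DataFrame when the only NaN is in num_candidates:

```python
import numpy as np
import pandas as pd

survey_df = pd.DataFrame({
    'language': ['Python', 'Java', 'Haskell', 'Go', 'C++'],
    'salary': [120, 85, 95, 80, 90],
    'num_candidates': [18, 22, 34, 10, np.nan],
})

# replace() treats NaN like any other value to match
replaced_df = survey_df.replace(to_replace=np.nan, value=17)

# fillna() is purpose-built for missing values; restricting it to one
# column avoids filling NaNs that might appear in other columns
filled_df = survey_df.copy()
filled_df['num_candidates'] = filled_df['num_candidates'].fillna(17)

print(filled_df.equals(replaced_df))  # True
```

Neither call above modifies survey_df; both return new objects that we keep under new names.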
2. Change the value of a cell by index in a pandas DataFrame
To pick a specific row by position, we’ll use the iloc indexer (the loc indexer does the same by row/column label). Note that calling replace() on the selected row returns a new Series and leaves survey_df unchanged; assign the result back to the row if you want to keep the change.
survey_df.iloc[0].replace(to_replace=120, value=130)
Our output:
language          Python
salary               130
num_candidates      18.0
Name: 0, dtype: object
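Calling replace() on iloc[0] produces a modified copy of the row. To change a single cell in place, one option is to assign through an indexer directly. A sketch on fresh survey data (the new salary values are arbitrary):

```python
import numpy as np
import pandas as pd

survey_df = pd.DataFrame({
    'language': ['Python', 'Java', 'Haskell', 'Go', 'C++'],
    'salary': [120, 85, 95, 80, 90],
    'num_candidates': [18, 22, 34, 10, np.nan],
})

# Label-based: row label 0, column 'salary'
survey_df.loc[0, 'salary'] = 130

# Position-based: row 1, column 1 (salary is the second column)
survey_df.iloc[1, 1] = 88

# .at / .iat are the scalar-only counterparts of .loc / .iloc
survey_df.at[2, 'salary'] = 99
survey_df.iat[3, 1] = 81

print(survey_df['salary'].tolist())  # [130, 88, 99, 81, 90]
```

.loc and .at use labels while .iloc and .iat use integer positions; .at and .iat accept only a single cell, which makes them a natural fit for scalar updates.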
3. How to update the value of a row in a pandas DataFrame?
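Similar to before, but this time we’ll pass a list of values to replace and a list of their respective replacements. As in step 2, replace() returns a new Series; to persist the change, assign it back with survey_df.loc[0] = ...
survey_df.loc[0].replace(to_replace=[120, 18], value=[130, 20])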
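Since replace() on a row returns a new Series, persisting the change means assigning it back. A minimal sketch on fresh survey data, also showing how to overwrite an entire row with one value per column (the Java values are made up):

```python
import numpy as np
import pandas as pd

survey_df = pd.DataFrame({
    'language': ['Python', 'Java', 'Haskell', 'Go', 'C++'],
    'salary': [120, 85, 95, 80, 90],
    'num_candidates': [18, 22, 34, 10, np.nan],
})

# replace() builds a new Series; assigning it through .loc
# (aligned on the column labels) writes it back into row 0
survey_df.loc[0] = survey_df.loc[0].replace(to_replace=[120, 18], value=[130, 20])

# Overwrite a whole row: one value per column, in column order
survey_df.loc[1] = ['Java', 100, 25]

print(survey_df.head(2))
```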
4. Update cells based on conditions in a DataFrame
In practice, we’ll often update our data based on specific conditions. Here’s an example of how to update cells conditionally. Let’s assume that we would like to update the salary figures in our data so that the minimum salary is $90/hour.
We’ll first build a boolean mask that marks the rows to update:
cond = survey_df['salary'] < 90
We’ll then pass the mask (selecting the rows) and the label of the column to update into the loc indexer:
survey_df.loc[cond,'salary'] = 90
survey_df
Here’s our output:
| | language | salary | num_candidates |
|---|---|---|---|
| 0 | Python | 120 | 18.0 |
| 1 | Java | 90 | 22.0 |
| 2 | Haskell | 95 | 34.0 |
| 3 | Go | 90 | 10.0 |
| 4 | C++ | 90 | 17.0 |
Important note: We can write more complex conditions as needed. Below is an example with multiple conditions; each comparison must be wrapped in parentheses before combining them with | (or) or & (and).
cond = (survey_df['salary'] < 90) | (survey_df['num_candidates'] <18)
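Putting several conditions to work, here is a sketch that uses the survey data as it stands after step 1 (NaN replaced by 17). It shows the difference between & and |, and, as an alternative for a simple floor like the $90 minimum, Series.clip():

```python
import pandas as pd

# Survey data after step 1 (the NaN already replaced by 17)
survey_df = pd.DataFrame({
    'language': ['Python', 'Java', 'Haskell', 'Go', 'C++'],
    'salary': [120, 85, 95, 80, 90],
    'num_candidates': [18.0, 22.0, 34.0, 10.0, 17.0],
})

low_salary = survey_df['salary'] < 90
few_candidates = survey_df['num_candidates'] < 18

# & and | bind more tightly than < in Python, so inline comparisons
# need their own parentheses; named masks sidestep the issue
both = low_salary & few_candidates      # AND: only Go
either = low_salary | few_candidates    # OR: Java, Go and C++
print(survey_df.loc[both, 'language'].tolist())  # ['Go']

survey_df.loc[either, 'salary'] = 90
print(survey_df['salary'].tolist())  # [120, 90, 95, 90, 90]

# For a plain minimum, clip() applies the same floor without a mask
print(pd.Series([120, 85, 95, 80, 90]).clip(lower=90).tolist())  # [120, 90, 95, 90, 90]
```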
5. Replace values for an entire column
Let’s now assume that we would like to modify the num_candidates figure for all the DF entries. That’s fairly easy:
survey_df['num_candidates'] = 25
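Assigning to a column accepts more than a scalar. A sketch on fresh survey data of the common right-hand sides (bonus is a hypothetical column added only to show index alignment):

```python
import numpy as np
import pandas as pd

survey_df = pd.DataFrame({
    'language': ['Python', 'Java', 'Haskell', 'Go', 'C++'],
    'salary': [120, 85, 95, 80, 90],
    'num_candidates': [18, 22, 34, 10, np.nan],
})

# A scalar is broadcast to every row (the NaN is overwritten too)
survey_df['num_candidates'] = 25

# An expression on the existing column updates every row at once
survey_df['salary'] = survey_df['salary'] + 5

# A Series is aligned on the index labels, not on position
survey_df['bonus'] = pd.Series([5, 4, 3, 2, 1], index=[4, 3, 2, 1, 0])

# A plain list must have exactly one value per row
try:
    survey_df['salary'] = [100, 110, 120]
except ValueError as err:
    print('ValueError:', err)

print(survey_df)
```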
How to update the value of a row in a pandas DataFrame based on a condition?
Sample question:
Say I have the following dataframe:

What is the most efficient way to update the values of the columns feat and another_feat where the stream is number 2?
Is this it?
for index, row in df1.iterrows():
    if df1.loc[index, 'stream'] == 2:
        # do something
        pass
UPDATE: What should I do if I have more than 100 columns? I don’t want to explicitly name the columns that I want to update. I want to divide the value of each column by 2 (except for the stream column).
So, to be clear, my goal is:
Divide by 2 all values in the rows that have stream 2, without changing the stream column.
Answer #1:
You can use loc if you need to update two columns to the same value:
df1.loc[df1['stream'] == 2, ['feat','another_feat']] = 'aaaa'
print(df1)
stream feat another_feat
a 1 some_value some_value
b 2 aaaa aaaa
c 2 aaaa aaaa
d 3 some_value some_value
If you need to update the columns separately, one option is:
df1.loc[df1['stream'] == 2, 'feat'] = 10
print(df1)
stream feat another_feat
a 1 some_value some_value
b 2 10 some_value
c 2 10 some_value
d 3 some_value some_value
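Assigning a list through loc is also a way to give each selected column its own value in one call. The question’s original DataFrame isn’t shown, so the df1 below is a hypothetical reconstruction from the printed outputs, with every feat and another_feat cell set to some_value:

```python
import pandas as pd

# Hypothetical df1, reconstructed from the outputs printed above
df1 = pd.DataFrame(
    {'stream': [1, 2, 2, 3],
     'feat': ['some_value'] * 4,
     'another_feat': ['some_value'] * 4},
    index=list('abcd'),
)

mask = df1['stream'] == 2
# One value per selected column: feat gets 'aaaa', another_feat gets 'bbbb'
df1.loc[mask, ['feat', 'another_feat']] = ['aaaa', 'bbbb']
print(df1)
```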
Another common option is to use numpy.where:
df1['feat'] = np.where(df1['stream'] == 2, 10,20)
print(df1)
stream feat another_feat
a 1 20 some_value
b 2 10 some_value
c 2 10 some_value
d 3 20 some_value
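Note that np.where needs a value for both outcomes, so rows where stream is not 2 are overwritten with 20 as well. If those rows should keep their current values, Series.mask() or passing the column itself as the third argument of np.where are two options. A sketch with hypothetical numeric feat values:

```python
import numpy as np
import pandas as pd

# Hypothetical numeric data so the effect is easy to see
df1 = pd.DataFrame({'stream': [1, 2, 2, 3], 'feat': [1, 2, 3, 4]}, index=list('abcd'))

# np.where fills both branches: non-matching rows become 20
where_result = np.where(df1['stream'] == 2, 10, 20)

# Series.mask replaces only where the condition is True
masked = df1['feat'].mask(df1['stream'] == 2, 10)

# np.where keeps existing values when the column is the "else" branch
kept = np.where(df1['stream'] == 2, 10, df1['feat'])

print(where_result.tolist())  # [20, 10, 10, 20]
print(masked.tolist())        # [1, 10, 10, 4]
print(kept.tolist())          # [1, 10, 10, 4]
```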
EDIT: If you need to divide all columns except stream where the condition is True, use:
print(df1)
stream feat another_feat
a 1 4 5
b 2 4 5
c 2 2 9
d 3 1 7
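# select every column except stream
cols = [col for col in df1.columns if col != 'stream']
print(cols)
['feat', 'another_feat']
# cast to float first so the halved values fit the columns' dtype
df1[cols] = df1[cols].astype(float)
mask = df1['stream'] == 2
df1.loc[mask, cols] = df1.loc[mask, cols] / 2
print(df1)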
stream feat another_feat
a 1 4.0 5.0
b 2 2.0 2.5
c 2 1.0 4.5
d 3 1.0 7.0
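With 100+ columns, some of them may not be numeric. One variation on the answer selects the numeric columns with select_dtypes() and drops stream, so nothing has to be named by hand. The label column here is hypothetical:

```python
import pandas as pd

# Hypothetical frame mixing numeric and text columns
df1 = pd.DataFrame(
    {'stream': [1, 2, 2, 3],
     'feat': [4, 4, 2, 1],
     'another_feat': [5, 5, 9, 7],
     'label': ['w', 'x', 'y', 'z']},
    index=list('abcd'),
)

# Numeric columns only, minus 'stream'
cols = df1.select_dtypes(include='number').columns.drop('stream')

# Work in float so halving odd integers fits the column dtype
df1[cols] = df1[cols].astype(float)

mask = df1['stream'] == 2
df1.loc[mask, cols] = df1.loc[mask, cols] / 2
print(df1)
```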
If you are working with multiple conditions, you can use nested numpy.where calls or numpy.select:
df0 = pd.DataFrame({'Col':[5,0,-6]})
df0['New Col1'] = np.where((df0['Col'] > 0), 'Increasing',
np.where((df0['Col'] < 0), 'Decreasing', 'No Change'))
df0['New Col2'] = np.select([df0['Col'] > 0, df0['Col'] < 0],
['Increasing', 'Decreasing'],
default='No Change')
print(df0)
Col New Col1 New Col2
0 5 Increasing Increasing
1 0 No Change No Change
2 -6 Decreasing Decreasing
Answer #2:
You can do the same with .loc (older code often uses .ix for this, but .ix was removed in pandas 1.0), like this:
In [1]: df = pd.DataFrame(np.random.randn(5,4), columns=list('abcd'))
In [2]: df
Out[2]:
a b c d
0 -0.323772 0.839542 0.173414 -1.341793
1 -1.001287 0.676910 0.465536 0.229544
2 0.963484 -0.905302 -0.435821 1.934512
3 0.266113 -0.034305 -0.110272 -0.720599
4 -0.522134 -0.913792 1.862832 0.314315
In [3]: df.loc[df.a > 0, ['b', 'c']] = 0
In [4]: df
Out[4]:
a b c d
0 -0.323772 0.839542 0.173414 -1.341793
1 -1.001287 0.676910 0.465536 0.229544
2 0.963484 0.000000 0.000000 1.934512
3 0.266113 0.000000 0.000000 -0.720599
4 -0.522134 -0.913792 1.862832 0.314315
EDIT
After the extra information, the following returns every column except a, with halved values, for the rows where the condition is met. It builds a new DataFrame and leaves df unchanged:
>> condition = df.a > 0
>> df[condition][[i for i in df.columns.values if i not in ['a']]].apply(lambda x: x/2)
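The expression above returns the halved rows without changing df. If the goal is to write them back, as the question asks, one option is to assign through .loc. A sketch with seeded random data (the seed is arbitrary) in place of the answer’s random values:

```python
import numpy as np
import pandas as pd

# Seeded random data standing in for the answer's example
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((5, 4)), columns=list('abcd'))
before = df.copy()

condition = df['a'] > 0
other_cols = df.columns.drop('a')  # every column except 'a'

# Selecting then dividing builds a new DataFrame; df is untouched
halved_copy = df.loc[condition, other_cols] / 2

# Assigning through .loc writes the halved values back into df
df.loc[condition, other_cols] = halved_copy
print(df)
```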