sub and gsub functions in R

In this article, we’ll learn about the sub() and gsub() functions of R in detail with examples.

You can replace a string or specific characters in a vector or a data frame using the sub() and gsub() functions in R.

Hello folks, we are going to focus on two of the most useful text-handling functions in R: sub() and gsub().

The sub() and gsub() functions in R replace text that matches a pattern with a string you specify. Both functions accept regular expressions as the pattern.

Let’s move forward and explore these functions using relevant illustrations.

Sub function in R

Need to selectively replace the text in an R string? The R sub function can handle this, scanning the string for the text you want to replace and returning a revised version of the string.

sub() differs from gsub() because it only replaces the first instance of the search string, not every instance in the text you are searching.

How to Use sub() in R – Examples

The basic syntax of sub in R:

sub(search_term, replacement_term, string_searched, ignore.case = FALSE, perl = FALSE, fixed = FALSE, useBytes = FALSE)

Breaking down the components:

  • The search term – can be a text fragment or a regular expression.
  • Replacement term – usually a text fragment.
  • String searched – a character vector (a single string is a vector of length one).
  • Ignore case – set ignore.case = TRUE to ignore case when searching.
  • Perl – set perl = TRUE to use Perl-compatible regular expressions.
  • Fixed – set fixed = TRUE to force sub() to treat the search term as a literal string, overriding any other instructions (useful when a search string could also be interpreted as a regular expression).
  • Use bytes – set useBytes = TRUE to match byte by byte rather than character by character.
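
The effect of the optional arguments is easier to see side by side. The snippet below is a minimal sketch; the sample strings and variable names (price, greeting, and the result names) are made up for illustration and do not come from the examples in this article.

```r
# Sample inputs (hypothetical, for illustration only)
price <- "Total: 3.50 (was 4.50)"
greeting <- "Hello hello HELLO"

# As a regular expression, "." matches any character,
# so the very first character of the string is replaced.
regex_dot <- sub(".", "!", price)

# fixed = TRUE treats "." as a literal dot,
# so the first real dot in the string is replaced.
literal_dot <- sub(".", ",", price, fixed = TRUE)

# By default the search is case-sensitive: the lowercase "hello" matches.
case_sensitive <- sub("hello", "bye", greeting)

# ignore.case = TRUE lets the pattern match the capitalised "Hello" first.
case_insensitive <- sub("hello", "bye", greeting, ignore.case = TRUE)

print(c(regex_dot, literal_dot, case_sensitive, case_insensitive))
```

In every call only one replacement is made, because sub() stops after the first match.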

A working code example – sub in R with basic text:

# sub in R
> base <- "Diogenes the cynic searched Athens for an honest man."
> sub("an honest man", "himself", base)
[1] "Diogenes the cynic searched Athens for himself."

Sub in R – Regular Expressions

R’s sub() function can work with regular expressions, which gives it a fair amount of power. We’re going to show a very basic version of this below, where we protect the privacy of some address data with a generic string substitution.

sub() works extremely well in this case. We know the typical US address has a street number in front. This number is of unknown length (number of digits). Furthermore, other numbers may also exist within an address that we want to preserve (e.g. 4th Street, 57th Street SE), so we really do only want to perform a single replacement on the first number.

# sub in R
> base <- "1155 East Main Street, Anytown, AL"
> sub("[[:digit:]]+","_", base)
[1] "_ East Main Street, Anytown, AL"
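
If some addresses might not start with a street number, one option is to anchor the pattern with ^ so that only a leading number is replaced. The sketch below assumes a small vector of made-up addresses; the names addresses, unanchored, and anchored are illustrative.

```r
# Hypothetical sample addresses; the third has no leading street number
addresses <- c("1155 East Main Street, Anytown, AL",
               "20 West 57th Street, Anytown, NY",
               "Corner of 4th Street and Pine")

# Unanchored: the first run of digits is replaced wherever it occurs,
# so the "4" in "4th" is hit when there is no street number in front.
unanchored <- sub("[[:digit:]]+", "_", addresses)

# Anchored with ^: only a run of digits at the very start is replaced.
anchored <- sub("^[[:digit:]]+", "_", addresses)

print(unanchored)
print(anchored)
```

With the anchored pattern, the third address comes back unchanged, while the first two lose only their street number.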

Sub in R – Searching for patterns

You can use regular expressions to look for more advanced patterns. In the example below, we’re going to grab the first sequence of 1 – 3 n’s and replace them with a star (not harming any additional n’s in excess of that amount).

# sub in r - regular expression pattern matching
> base <- "bnnnnnannannasplit"
> sub("n{1,3}","*",base)
[1] "b*nnannannasplit"

As you can see, we find the initial sequence of 5 n’s, replace the first three, and preserve the remaining two. A second example, looking for a word (or anything that fits the pattern), is shown below.

# sub in R
> base <- "I love my dog"
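> sub("l[A-Za-z]*e", "like", base)
[1] "I like my dog"

In this example, we’re looking for a word of uncertain length which starts with l and ends with e. Since we’re not into that whole love thing, we’re going to demote it to like and call it a day. The class [A-Za-z] matches letters only; a range written as [A-z] would also take in the punctuation characters that sit between Z and a in ASCII. This is one way to clean up text.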
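
A pattern like this can also match inside a longer word. The sketch below, using a made-up sentence, shows one way to restrict the match to whole words with the \\b word-boundary marker; it assumes perl = TRUE for the boundary syntax, and the variable names are illustrative.

```r
# Hypothetical sample sentence containing "lovely" before "love"
sentence <- "A lovely dog I love"

# Without boundaries, the pattern matches the "love" inside "lovely".
loose <- sub("l[A-Za-z]*e", "like", sentence)

# \\b marks a word boundary, so only a whole word that starts with l
# and ends with e can match; perl = TRUE selects Perl-style regex.
strict <- sub("\\bl[A-Za-z]*e\\b", "like", sentence, perl = TRUE)

print(loose)
print(strict)
```

The first call turns "lovely" into "likely", while the second leaves "lovely" alone and replaces the standalone "love".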

Sub in R – Finding Alternative Matches

Sometimes what you’re looking for may involve more than one thing. In the example below, we want to adjust a pet-specific text (dog, cat, etc.) to refer to the companion animal as a more generic “pet”. We use the | operator within a regular expression to set this up.

# sub in r - regular expression for alternatives
> base <- "I love my dog"
> sub("dog|cat|hamster|goat|pig","pet", base)
[1] "I love my pet"

Gsub function in R

Need to selectively replace multiple occurrences of a text within an R string? Never fear, the R gsub() function is here! This souped-up version of the sub() function doesn’t just stop at the first instance of the string you want to replace. It gets them all.

So when you want to utterly sanitize an entire string full of data, clearing out every instance of heretical thought, gsub() in r is your go-to solution.

How To Use gsub() in R

The basic syntax of gsub in R:

gsub(search_term, replacement_term, string_searched, ignore.case = FALSE, perl = FALSE, fixed = FALSE, useBytes = FALSE)

Breaking down the components:

  • The search term – can be a text fragment or a regular expression.
  • Replacement term – usually a text fragment.
  • String searched – a character vector (a single string is a vector of length one).
  • Ignore case – set ignore.case = TRUE to ignore case when searching.
  • Perl – set perl = TRUE to use Perl-compatible regular expressions.
  • Fixed – set fixed = TRUE to force gsub() to treat the search term as a literal string, overriding any other instructions (useful when a search string could also be interpreted as a regular expression).
  • Use bytes – set useBytes = TRUE to match byte by byte rather than character by character.

A working code example – gsub in R with basic text (the string contains only one match, so the result is the same as with sub()):

# gsub in R
> base <- "Diogenes the cynic searched Athens for an honest man."
> gsub("an honest man", "himself", base)
[1] "Diogenes the cynic searched Athens for himself."

Gsub in R – Regular Expressions

R’s gsub() function can work with regular expressions. In the example below, we remove all of the punctuation and blanks from a phone number.

# gsub in R - regular expressions
> phone <-"(206) 555 - 1212"
> gsub("[[:punct:][:blank:]]","",phone)
[1] "2065551212"

As you can see, that phone number got a lot skinnier in a hurry! It will also now fit neatly in a numeric field within a database, which is a much easier way to store and manage this type of information.
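
When phone numbers arrive in several formats, an alternative is to delete everything that is not a digit rather than listing the characters to remove. This is a minimal sketch with made-up numbers; phones and digits_only are illustrative names.

```r
# Hypothetical phone numbers in three different formats
phones <- c("(206) 555 - 1212", "206.555.1212", "206 555 1212")

# [^[:digit:]] matches any single character that is NOT a digit;
# gsub() removes every such character from each element.
digits_only <- gsub("[^[:digit:]]", "", phones)

print(digits_only)
print(nchar(digits_only))
```

All three inputs reduce to the same ten-digit string, which makes them easy to compare or deduplicate.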

Gsub in R – Searching for patterns

You can use regular expressions to look for more advanced patterns. In the example below, we replace every sequence of 1 to 3 n’s with a star; a longer run of n’s is split into chunks of at most three, and each chunk gets its own star.

# gsub in r - regular expression pattern matching
> base <- "bnnnnnannannasplit"
> gsub("n{1,3}","*",base)
[1] "b**a*a*asplit"

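As you can see, it tagged multiple subsets of n’s, far more than the original version of this example in our explanation of sub above: the run of five n’s becomes two stars (three n’s, then two), and each later pair of n’s becomes one star.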

Gsub in R – Finding Alternative Matches

Sometimes what you’re looking for may involve more than one thing. In the example below, we want to adjust a pet-specific text (dog, cat, etc.) to refer to the companion animal as a more generic “pet”. We use the | operator within a regular expression to set this up, and gsub() replaces every alternative it finds.

# gsub in r - regular expression for alternatives
> base <- "I love my dog even though it may annoy with my cat"
> gsub("dog|cat|hamster|goat|pig","pet", base)
[1] "I love my pet even though it may annoy with my pet"

Summary on sub and gsub in R

sub(pattern, replacement, x)
gsub(pattern, replacement, x)
Replace the first occurrence of a pattern with sub or replace all occurrences with gsub.

  • pattern – A pattern to search for, which is assumed to be a regular expression. Use an additional argument fixed=TRUE to look for a pattern without using regular expressions.
  • replacement – A character string to replace the occurrence (or occurrences for gsub) of pattern.
  • x – A character vector to search for pattern. Each element will be searched separately.

Example. The simple pattern below contains no regular-expression metacharacters, so it matches the literal text “is”.

> x <- c("This is a sentence about axis",
+        "A second pattern is also listed here")
> sub("is", "XY", x)
[1] "ThXY is a sentence about axis"       
[2] "A second pattern XY also listed here"
> gsub("is", "XY", x)
[1] "ThXY XY a sentence about axXY"       
[2] "A second pattern XY also lXYted here"
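
Because each element of x is searched separately, the result always has the same length as the input. The sketch below, with made-up inputs, shows how missing values and non-character input are handled; the variable names are illustrative.

```r
# Hypothetical input with a missing value and an element without a match
x <- c("This is a sentence", NA, "no match here")

# Each element is processed on its own: NA stays NA,
# and an element without a match is returned unchanged.
result <- gsub("is", "XY", x)
print(result)

# Non-character input is converted with as.character() first,
# so the result is a character vector, not a numeric one.
codes <- gsub("0", "9", c(100, 205))
print(codes)
```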

Alternative explanations of sub and gsub functions in R

sub() and gsub() are replacement functions in R: they replace occurrences of a substring with another substring. Both can be applied to a vector or to a column of a dataframe, and both can be combined with regular expressions. Let’s see an example of each.

  • sub() function in R replaces the first instance of a substring
  • gsub() function in R replaces all the instances of a substring
  • Replacing occurrences of a string in a column of an R dataframe using sub() and gsub()
  • Replacing occurrences of a string in a vector using gsub() and sub()

Syntax for sub() and gsub() function in R:

  1. sub(old, new, string)
  2. gsub(old, new, string)

old – existing pattern to be replaced.
new – new string to be used for replacement.
string – the string, character vector, or dataframe column in which to make the replacement.

Example of sub() function in R:

sub() function in R replaces only the first occurrence of a substring. The sub function finds the first instance of the old substring and replaces it with the new substring. Let’s see an example.

# sub function in R
mysentence <- "England is Beautiful. England is not the part of EU"
sub("England", "UK", mysentence)

Only the first occurrence of England is replaced with UK, so the output will be:
[1] "UK is Beautiful. England is not the part of EU"

Example of gsub() function in R:

gsub() function in R is a global replace function, which replaces all instances of the substring, not just the first. Let’s see the same example.

# gsub function in R
mysentence <- "England is Beautiful. England is not the part of EU"
gsub("England", "UK", mysentence)

All the occurrences of England are replaced with UK, so the output will be:
[1] "UK is Beautiful. UK is not the part of EU"

Example of gsub() function with regular expression in R:

The old argument in the syntax can be a regular expression, which allows you to match patterns in which you want to replace a substring. Let’s see an example.

# gsub function in R with regular expression
mysentence <- "UK is Beautiful. UK is not the part of EU since 2016"
# [0-9]+ matches each run of one or more digits
gsub("[0-9]+", "", mysentence)

In the above example we have removed all the numbers from the sentence with the help of a regular expression. Note that the space before the number is kept.

So the output will be:
[1] "UK is Beautiful. UK is not the part of EU since "
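
Removing the digits leaves a trailing space behind. The sketch below shows two possible ways to tidy that up, either trimming afterwards with trimws() or matching the optional space together with the digits; the variable names are illustrative.

```r
mysentence <- "UK is Beautiful. UK is not the part of EU since 2016"

# Removing only the digits keeps the space that preceded them.
no_digits <- gsub("[0-9]+", "", mysentence)

# Option 1: strip leading and trailing whitespace afterwards.
trimmed <- trimws(no_digits)

# Option 2: let the pattern swallow one optional space before the digits.
compact <- gsub(" ?[0-9]+", "", mysentence)

print(trimmed)
print(compact)
```

Both options give the same result here; the second one would also remove the space before a number in the middle of a sentence, which may or may not be what you want.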

Example of gsub() function in the column of a dataframe :

First let’s create the dataframe as shown below.

df <- data.frame(
  NAME = c('Alisa','Bobby','jodha','jack','raghu','Cathrine',
           'Alisa','Bobby','kumar','Alisa','jack','Cathrine'),
  Age = c(26,24,26,22,23,24,26,24,22,26,22,25),
  Score = c(85,63,55,74,31,77,85,63,42,85,74,78))
df

Printing df displays the twelve rows with their NAME, Age, and Score columns.

gsub() function in the column of R dataframe to replace a substring:

gsub() function is also applicable to a column of a dataframe in R. Let’s see the example below.

## Replace substring of the column in R dataframe
df$NAME <- gsub("A", "E", df$NAME)
df

As mentioned, every occurrence of an uppercase “A” is replaced with “E”, so each “Alisa” in the NAME column becomes “Elisa”. The lowercase “a” in names such as “jack” is untouched because the match is case-sensitive.
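
The same idea can be narrowed or widened with the options covered earlier. The sketch below uses a small made-up dataframe (people, with the hypothetical columns FIRST_A and ANY_A) to compare an anchored, case-sensitive replacement with a case-insensitive one.

```r
# Hypothetical dataframe; stringsAsFactors = FALSE keeps NAME as character
people <- data.frame(NAME = c("Alisa", "Bobby", "jack", "Cathrine"),
                     stringsAsFactors = FALSE)

# Replace an uppercase "A" only when it is the first character.
people$FIRST_A <- sub("^A", "E", people$NAME)

# Replace every "a" or "A" anywhere in the name with "e".
people$ANY_A <- gsub("a", "e", people$NAME, ignore.case = TRUE)

print(people)
```

Writing the results to new columns, as done here, keeps the original NAME values available for comparison.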


gsub() function with a regular expression in the column of R dataframe:

gsub() can also be combined with a regular expression to modify every value in a dataframe column. In the example below, the pattern "^" matches the start of each string, so the replacement is inserted as a prefix.

## Replace substring of the column in R dataframe using REGEX
df$NAME <- gsub("^", "MR/MRS.", df$NAME)
df

As mentioned, “MR/MRS.” is added to the start of every value in the NAME column using a regular expression, so “Bobby” becomes “MR/MRS.Bobby”.
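
For a plain prefix, base R’s paste0() is a common alternative that avoids regular expressions altogether. The sketch below, using a made-up vector names_vec, checks that both approaches give the same result.

```r
# Hypothetical names used only for this comparison
names_vec <- c("Alisa", "Bobby", "jack")

# Regex approach: "^" matches the empty position at the start of each string.
with_regex <- sub("^", "MR/MRS.", names_vec)

# Plain concatenation: no pattern matching involved.
with_paste <- paste0("MR/MRS.", names_vec)

print(with_regex)
print(identical(with_regex, with_paste))
```

The regex version becomes the better fit once the prefix should be added only to values that match some condition, for example only names starting with a capital letter.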

In this article, we learned about the sub and gsub functions in R.

Hope you learned something from this post. The primary source of this article is StackOverflow.


About ᴾᴿᴼᵍʳᵃᵐᵐᵉʳ

Linux and Python enthusiast, in love with open source since 2014, Writer at programming-articles.com, India.
