How to select only rows with max value on a column in SQL [Answered]


I have this table for documents (simplified version here):


How do I select one row per id and only the greatest rev?
With the above data, the result should contain two rows: [1, 3, ...] and [2, 1, ...]. I’m using MySQL.

Currently, I use checks in the while loop to detect and overwrite old revs from the result set. But is this the only method to achieve the result? Isn’t there a SQL solution?

Select only rows with max value on a column- Answer #1:

At first glance…

All you need is a GROUP BY clause with the MAX aggregate function:

SELECT id, MAX(rev)
FROM YourTable
GROUP BY id

It’s never that simple, is it?

I just noticed you need the content column as well.

This is a very common question in SQL: find the whole data for the row with some max value in a column per some group identifier. I heard that a lot during my career. Actually, it was one of the questions I answered in my current job’s technical interview.

It is, actually, so common that the Stack Overflow community has created a single tag just to deal with questions like that: greatest-n-per-group.

Basically, you have two approaches to solve that problem:

Joining with simple group-identifier, max-value-in-group Sub-query

In this approach, you first find the group-identifier, max-value-in-group (already solved above) in a sub-query. Then you join your table to the sub-query with equality on both group-identifier and max-value-in-group:

SELECT a.id, a.rev, a.contents
FROM YourTable a
INNER JOIN (
    SELECT id, MAX(rev) rev
    FROM YourTable
    GROUP BY id
) b ON a.id = b.id AND a.rev = b.rev
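
To try the sub-query join end to end, here is a minimal sketch that runs it against an in-memory SQLite database from Python. The sample rows are an assumption made up for illustration (the original table is not reproduced here), and SQLite stands in for MySQL only because it ships with Python.

```python
import sqlite3

# Assumed sample data: (id, rev, contents) rows invented for illustration.
ROWS = [(1, 1, "..a"), (2, 1, "..b"), (1, 2, "..c"), (1, 3, "..d")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (id INTEGER, rev INTEGER, contents TEXT)")
conn.executemany("INSERT INTO YourTable VALUES (?, ?, ?)", ROWS)

# Join each row to the (id, MAX(rev)) pairs computed per group.
JOIN_SQL = """
SELECT a.id, a.rev, a.contents
FROM YourTable a
INNER JOIN (
    SELECT id, MAX(rev) AS rev
    FROM YourTable
    GROUP BY id
) b ON a.id = b.id AND a.rev = b.rev
ORDER BY a.id
"""

latest = conn.execute(JOIN_SQL).fetchall()
print(latest)  # [(1, 3, '..d'), (2, 1, '..b')]
```

The ORDER BY is only there to make the output order predictable; it is not part of the technique.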

Left Joining with self, tweaking join conditions and filters

In this approach, you left join the table with itself. Equality goes in the group-identifier. Then, 2 smart moves:

  1. The second join condition is having left side value less than right value
  2. When you do step 1, the row(s) that actually have the max value will have NULL in the right side (it’s a LEFT JOIN, remember?). Then, we filter the joined result, showing only the rows where the right side is NULL.

So you end up with:

SELECT a.*
FROM YourTable a
LEFT OUTER JOIN YourTable b
    ON a.id = b.id AND a.rev < b.rev
WHERE b.id IS NULL;


Both approaches bring the exact same result.

If you have two rows with max-value-in-group for group-identifier, both rows will be in the result in both approaches.
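
The tie behaviour is easy to check. The sketch below runs both queries from Python against an in-memory SQLite database; the rows are assumed data in which id 1 has two rows sharing the maximum rev, and SQLite is used here only as a convenient stand-in engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (id INTEGER, rev INTEGER, contents TEXT)")
# Assumed data: id 1 has two rows sharing the maximum rev of 3.
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?)",
    [(1, 1, "..a"), (1, 3, "..b"), (1, 3, "..c"), (2, 1, "..d")],
)

# Approach 1: join against the per-group maximum.
SUBQUERY_JOIN = """
SELECT a.id, a.rev, a.contents
FROM YourTable a
INNER JOIN (SELECT id, MAX(rev) AS rev FROM YourTable GROUP BY id) b
    ON a.id = b.id AND a.rev = b.rev
ORDER BY a.id, a.contents
"""

# Approach 2: self left join, keeping rows with no larger rev in the group.
LEFT_SELF_JOIN = """
SELECT a.id, a.rev, a.contents
FROM YourTable a
LEFT OUTER JOIN YourTable b
    ON a.id = b.id AND a.rev < b.rev
WHERE b.id IS NULL
ORDER BY a.id, a.contents
"""

by_subquery = conn.execute(SUBQUERY_JOIN).fetchall()
by_left_join = conn.execute(LEFT_SELF_JOIN).fetchall()
print(by_subquery)   # [(1, 3, '..b'), (1, 3, '..c'), (2, 1, '..d')]
print(by_left_join)  # same three rows
```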

Both approaches are ANSI SQL compatible and thus will work with your favorite RDBMS, regardless of its “flavor”.

Both approaches are also performance-friendly; however, your mileage may vary (RDBMS, DB structure, indexes, etc.). So when you pick one approach over the other, benchmark. And make sure you pick the one that makes the most sense to you.

Select only rows with max value on a column- Answer #2:

My preference is to use as little code as possible…

You can do it using IN. Try this:

SELECT *
FROM t1 WHERE (id,rev) IN 
( SELECT id, MAX(rev)
  FROM t1
  GROUP BY id
)

To my mind it is less complicated… easier to read and maintain.
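
As a quick check of the IN form, here is a small sketch run from Python against in-memory SQLite, which accepts (id, rev) row values in version 3.15 and later. The table contents are assumed sample data, not taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (id INTEGER, rev INTEGER, contents TEXT)")
# Assumed sample rows for illustration.
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?)",
    [(1, 1, "..a"), (2, 1, "..b"), (1, 2, "..c"), (1, 3, "..d")],
)

# Keep rows whose (id, rev) pair appears among the per-id maxima.
IN_SQL = """
SELECT *
FROM t1
WHERE (id, rev) IN (SELECT id, MAX(rev) FROM t1 GROUP BY id)
ORDER BY id
"""

rows_via_in = conn.execute(IN_SQL).fetchall()
print(rows_via_in)  # [(1, 3, '..d'), (2, 1, '..b')]
```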

Answer #3:

I am flabbergasted that no answer offered a SQL window function solution:

SELECT a.id, a.rev, a.contents
  FROM (SELECT id, rev, contents,
               -- the alias avoids the name rank, which is a reserved word in MySQL 8
               ROW_NUMBER() OVER (PARTITION BY id ORDER BY rev DESC) ranked_order
          FROM YourTable) a
 WHERE a.ranked_order = 1 

Window (or windowing) functions were added to the SQL standard in ANSI/ISO SQL:2003 and later extended in ANSI/ISO SQL:2008; they are now available with all major vendors. There are more ranking functions available to deal with ties: RANK, DENSE_RANK, PERCENT_RANK.
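
To see how the choice of ranking function changes tie handling, here is a hedged sketch using Python's sqlite3 module (window functions need SQLite 3.25 or newer). The data is assumed, with two rows tied at the top rev for id 1, and the helper name top_rows is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (id INTEGER, rev INTEGER, contents TEXT)")
# Assumed data: id 1 has two rows tied at the maximum rev of 3.
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?)",
    [(1, 1, "..a"), (1, 3, "..b"), (1, 3, "..c"), (2, 1, "..d")],
)

def top_rows(ranking_function):
    """Return rows ranked first per id by the given window function.

    ranking_function is only ever one of the fixed names used below,
    never user input, so interpolating it into the SQL is safe here.
    """
    sql = f"""
        SELECT id, rev, contents
        FROM (SELECT id, rev, contents,
                     {ranking_function}() OVER (PARTITION BY id ORDER BY rev DESC) AS rn
              FROM YourTable) a
        WHERE a.rn = 1
        ORDER BY id, contents
    """
    return conn.execute(sql).fetchall()

one_per_group = top_rows("ROW_NUMBER")  # one row per id; which tied row wins is arbitrary
with_ties = top_rows("RANK")            # every row tied for the maximum rev
print(len(one_per_group), len(with_ties))  # 2 3
```

If a deterministic winner is needed with ROW_NUMBER, adding a tie-breaker column to the ORDER BY inside OVER is one option.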

Answer #4:

Yet another solution is to use a correlated subquery:

select yt.id, yt.rev, yt.contents
    from YourTable yt
    where rev = 
        (select max(rev) from YourTable st where yt.id=st.id)

Having an index on (id,rev) renders the subquery almost as a simple lookup…
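
A minimal sketch of the correlated subquery together with the (id, rev) index, run from Python on in-memory SQLite. The index name idx_id_rev and the sample rows are assumptions, and the plan SQLite prints says nothing about how MySQL would execute the same query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (id INTEGER, rev INTEGER, contents TEXT)")
# Hypothetical index name; the column pair matches the one suggested above.
conn.execute("CREATE INDEX idx_id_rev ON YourTable (id, rev)")
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?)",
    [(1, 1, "..a"), (2, 1, "..b"), (1, 2, "..c"), (1, 3, "..d")],
)

# For every outer row, look up the maximum rev within the same id.
CORRELATED_SQL = """
SELECT yt.id, yt.rev, yt.contents
FROM YourTable yt
WHERE yt.rev = (SELECT MAX(st.rev) FROM YourTable st WHERE st.id = yt.id)
ORDER BY yt.id
"""

correlated_rows = conn.execute(CORRELATED_SQL).fetchall()
print(correlated_rows)  # [(1, 3, '..d'), (2, 1, '..b')]

# Print SQLite's plan to inspect index usage; the wording varies by version.
for plan_row in conn.execute("EXPLAIN QUERY PLAN " + CORRELATED_SQL):
    print(plan_row)
```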

Following are comparisons to the solutions in @AdrianCarneiro’s answer (subquery, leftjoin), based on MySQL measurements with an InnoDB table of ~1 million records, group size being 1-3.

While for full table scans subquery/leftjoin/correlated timings relate to each other as 6/8/9, when it comes to direct lookups or batch (id in (1,2,3)), subquery is much slower than the others (due to rerunning the subquery). However, I couldn’t differentiate between leftjoin and correlated solutions in speed.

One final note: because leftjoin pairs up rows within each group, it produces on the order of n² intermediate rows for a group of n rows, so its performance can be heavily affected by the size of groups…
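
To get a feel for how group size affects the left join, the sketch below counts the rows the self join produces before the IS NULL filter, for a single group of growing size. It uses Python's sqlite3 module; the helper name left_join_row_count and the generated data (one id with distinct revs 1..n) are assumptions for illustration, and this counts rows rather than measuring time.

```python
import sqlite3

def left_join_row_count(group_size):
    """Count rows produced by the self LEFT JOIN, before the IS NULL filter."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE YourTable (id INTEGER, rev INTEGER)")
    # One group (id = 1) holding distinct revs 1..group_size.
    conn.executemany(
        "INSERT INTO YourTable VALUES (1, ?)",
        [(rev,) for rev in range(1, group_size + 1)],
    )
    (count,) = conn.execute(
        """
        SELECT COUNT(*)
        FROM YourTable a
        LEFT OUTER JOIN YourTable b
            ON a.id = b.id AND a.rev < b.rev
        """
    ).fetchone()
    conn.close()
    return count

counts = [left_join_row_count(n) for n in (1, 2, 4, 8, 16)]
print(counts)  # [1, 2, 7, 29, 121]
```

Doubling the group size roughly quadruples the intermediate row count, which is why large groups can hurt this approach.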

Answer #5:

I can’t vouch for the performance, but here’s a trick inspired by the limitations of Microsoft Excel. It has some good features:


  • It should force return of only one “max record” even if there is a tie (sometimes useful)
  • It doesn’t require a join


It is a little bit ugly and requires that you know something about the range of valid values of the rev column. Let us assume that we know the rev column is a number between 0.00 and 999 including decimals but that there will only ever be two digits to the right of the decimal point (e.g. 34.17 would be a valid value).

The gist of the thing is that you create a single synthetic column by string concatenating/packing the primary comparison field along with the data you want. In this way, you can force SQL’s MAX() aggregate function to return all of the data (because it has been packed into a single column). Then you have to unpack the data.

Here’s how it looks with the above example, written in SQL:

SELECT id,
       CAST(SUBSTRING(max(packed_col) FROM 2 FOR 6) AS float) as max_rev,
       SUBSTRING(max(packed_col) FROM 12) AS content_for_max_rev  -- skips the 8-character number and '---'
FROM  (SELECT id,
       CAST(1000 + rev + .001 as CHAR) || '---' || CAST(content AS char) AS packed_col
       FROM yourtable
      ) packed
GROUP BY id

The packing begins by forcing the rev column to be a number of known character length regardless of the value of rev so that for example

  • 3.2 becomes 1003.201
  • 57 becomes 1057.001
  • 923.88 becomes 1923.881

If you do it right, string comparison of two numbers should yield the same “max” as numeric comparison of the two numbers and it’s easy to convert back to the original number using the substring function (which is available in one form or another pretty much everywhere).
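
Here is a rough sketch of the pack/unpack round trip, adapted to SQLite so it can run from Python. The adaptation is an assumption on my part: printf('%.3f', ...) replaces the CAST to fix the width, and substr() replaces SUBSTRING ... FROM. Counting characters, the packed number takes 8 positions and the '---' separator 3 more, so the content starts at position 12. The sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (id INTEGER, rev REAL, content TEXT)")
# Assumed rows; rev stays within 0.00 to 999 with at most two decimals.
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?, ?)",
    [(1, 3.2, "a"), (1, 57, "b"), (1, 923.88, "c"), (2, 1, "d")],
)

PACKED_SQL = """
SELECT id,
       CAST(substr(MAX(packed_col), 2, 6) AS REAL) AS max_rev,
       substr(MAX(packed_col), 12) AS content_for_max_rev
FROM (
    -- Fixed-width number (e.g. 1003.201), then '---', then the payload.
    SELECT id,
           printf('%.3f', 1000 + rev + 0.001) || '---' || content AS packed_col
    FROM yourtable
) AS packed
GROUP BY id
ORDER BY id
"""

packed_result = conn.execute(PACKED_SQL).fetchall()
for row in packed_result:
    print(row)
```

Each group yields a single row: the largest rev recovered from the packed string and the content that travelled with it.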

Answer #6:

Unique Identifiers? Yes! Unique identifiers!

One of the best ways to develop a MySQL DB is to have each id AUTO_INCREMENT. This allows a variety of advantages, too many to cover here. The problem with the question is that its example has duplicate ids. This disregards these tremendous advantages of unique identifiers, and at the same time, is confusing to those familiar with this already.

Cleanest Solution

Newer versions of MySQL come with ONLY_FULL_GROUP_BY enabled by default, and many of the solutions here will fail in testing with this condition.

Even so, we can simply select DISTINCT someuniquefield, MAX(whateverotherfieldtoselect), (somethirdfield), etc., and have no worries understanding the result or how the query works:

SELECT DISTINCT t1.id, MAX(t1.rev), MAX(t2.content)
FROM Table1 AS t1
JOIN Table1 AS t2 ON t2.id = t1.id AND t2.rev = (
    SELECT MAX(rev) FROM Table1 t3 WHERE t3.id = t1.id
)
GROUP BY t1.id;

  • SELECT DISTINCT Table1.id, max(Table1.rev), max(Table2.content) : Return DISTINCT somefield, MAX() some otherfield; the last MAX() is redundant, because I know it’s just one row, but it’s required by the query.
  • FROM Table1 : Table searched on.
  • JOIN Table1 AS Table2 ON Table2.rev = Table1.rev : Join the second table on the first, because we need to get the max(table1.rev)’s comment.
  • GROUP BY Table1.id : Force the top-sorted rev row of each id to be the returned result.

Note that since “content” was “…” in OP’s question, there’s no way to test that this works. So, I changed that to “..a”, “..b”, so, we can actually now see that the results are correct:

id  max(Table1.rev) max(Table2.content)
1   3   ..d
2   1   ..b

Why is it clean? DISTINCT(), MAX(), etc., all make wonderful use of MySQL indices. This will be faster. Or, it will be much faster, if you have indexing, and you compare it to a query that looks at all rows.

Original Solution

With ONLY_FULL_GROUP_BY disabled, we can still use GROUP BY, but then we are only using it on the Salary, and not the id:

SELECT *
FROM
    (SELECT *
    FROM Employee
    ORDER BY Salary DESC)
AS employeesub
GROUP BY employeesub.Salary;

  • SELECT * : Return all fields.
  • FROM Employee : Table searched on.
  • (SELECT *...) subquery : Return all people, sorted by Salary.
  • GROUP BY employeesub.Salary: Force the top-sorted, Salary row of each employee to be the returned result.

Unique-Row Solution

Note the Definition of a Relational Database: “Each row in a table has its own unique key.” This would mean that, in the question’s example, id would have to be unique, and in that case, we can just do :

SELECT *
FROM Employee
WHERE Employee.id = 12345
ORDER BY Employee.Salary DESC
LIMIT 1

Hopefully, this is a solution that solves the problem and helps everyone better understand what’s happening in the DB.

Answer #7:

Something like this?

SELECT yourtable.id, rev, content
FROM yourtable
INNER JOIN (
    SELECT id, max(rev) as maxrev
    FROM yourtable
    GROUP BY id
) AS child ON (yourtable.id = child.id) AND (yourtable.rev = maxrev)

Answer #8:

Another way to do the job is to use the MAX() analytic function with an OVER (PARTITION BY ...) clause:

SELECT t.*
  FROM (
    SELECT id
          ,rev
          ,contents
          ,MAX(rev) OVER (PARTITION BY id) as max_rev
      FROM YourTable
    ) t
  WHERE t.rev = t.max_rev 
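
A small sketch of the MAX() OVER variant, run from Python on in-memory SQLite (3.25 or newer) rather than Oracle. The sample rows are assumed and include a tie for id 1, to show that this form keeps tied rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (id INTEGER, rev INTEGER, contents TEXT)")
# Assumed data: id 1 has two rows tied at the maximum rev of 3.
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?)",
    [(1, 1, "..a"), (1, 3, "..b"), (1, 3, "..c"), (2, 1, "..d")],
)

# Attach the per-id maximum to every row, then keep rows that equal it.
MAX_OVER_SQL = """
SELECT t.id, t.rev, t.contents
FROM (
    SELECT id, rev, contents,
           MAX(rev) OVER (PARTITION BY id) AS max_rev
    FROM YourTable
) t
WHERE t.rev = t.max_rev
ORDER BY t.id, t.contents
"""

max_over_rows = conn.execute(MAX_OVER_SQL).fetchall()
print(max_over_rows)  # [(1, 3, '..b'), (1, 3, '..c'), (2, 1, '..d')]
```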

The other ROW_NUMBER() OVER PARTITION solution already documented in this post is

SELECT t.*
  FROM (
    SELECT id, rev, contents
          ,ROW_NUMBER() OVER (PARTITION BY id ORDER BY rev DESC) rank
      FROM YourTable
    ) t
  WHERE t.rank = 1 

These two SELECT statements work well on Oracle 10g.

The MAX() solution certainly runs FASTER than the ROW_NUMBER() solution because MAX() complexity is O(n) while ROW_NUMBER() complexity is at minimum O(n.log(n)), where n represents the number of records in the table!

Hope you learned something from this post.


About ᴾᴿᴼᵍʳᵃᵐᵐᵉʳ

Linux and Python enthusiast, in love with open source since 2014, Writer at, India.
