
group-by interview questions

Top group-by frequently asked interview questions

Group By Multiple Columns

How can I group by multiple columns in LINQ?

Something similar to this in SQL:

SELECT * FROM <TableName> GROUP BY <Column1>,<Column2>

How can I convert this to LINQ:

DECLARE @QuantityBreakdown TABLE (
    MaterialID int,
    ProductID int,
    Quantity float
)

INSERT INTO @QuantityBreakdown (MaterialID, ProductID, Quantity)
SELECT MaterialID, ProductID, SUM(Quantity)
FROM @Transactions
GROUP BY MaterialID, ProductID

Source: (StackOverflow)

How to group time by hour or by 10 minutes

For example, when I run a query like:

  FROM [FRIIB].[dbo].[ArchiveAnalog]
  GROUP BY [Date]

how can I specify the grouping period?

MS SQL 2008

2nd Edit

I'm trying

SELECT MIN([Date]) AS RecT, AVG(Value)
  FROM [FRIIB].[dbo].[ArchiveAnalog]

I changed % 10 to / 10. Is it possible to output the Date without milliseconds?

Source: (StackOverflow)


SQL - using alias in Group By

Just curious about SQL syntax. So if I have

SELECT
 itemName as ItemName,
 substring(itemName, 1,1) as FirstLetter
FROM table1
GROUP BY itemName, FirstLetter

This would be incorrect because

GROUP BY itemName, FirstLetter 

really should be

GROUP BY itemName, substring(itemName, 1,1)

But why can't we simply use the former for convenience?

Source: (StackOverflow)

LINQ Group By Multiple fields -Syntax help

What correction does Example 2 need in order to group by multiple columns?

Example 1

var query = from cm in cust
            group cm by new { cm.Customer, cm.OrderDate } into cms
            select new
            { Key1 = cms.Key.Customer, Key2 = cms.Key.OrderDate, Count = cms.Count() };

Example 2 (incorrect)

   var qry = 
   cust.GroupBy(p => p.Customer, q => q.OrderDate, (k1, k2, group) =>
   new { Key1 = k1, Key2 = k2, Count = group.Count() });

Source: (StackOverflow)

What does SQL clause "GROUP BY 1" mean?

Someone sent me a SQL query where the GROUP BY clause consisted of the statement: GROUP BY 1.

This must be a typo right? No column is given the alias 1. What could this mean? Am I right to assume that this must be a typo?
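For context, several SQL dialects (MySQL, PostgreSQL, and SQLite among them) read a bare integer in GROUP BY as the position of a column in the select list, so GROUP BY 1 is not necessarily a typo. The following is a minimal sketch using Python's standard sqlite3 module; the orders table and its rows are hypothetical and exist only for illustration.

```python
import sqlite3

# Hypothetical table and data, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("Joe", 5), ("Sally", 3), ("Joe", 2)],
)

# GROUP BY 1 groups by the first select-list column (customer).
by_position = conn.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY 1 ORDER BY 1"
).fetchall()

# The same query with the grouping column named explicitly.
by_name = conn.execute(
    "SELECT customer, SUM(amount) FROM orders "
    "GROUP BY customer ORDER BY customer"
).fetchall()

print(by_position)
print(by_name)
conn.close()
```

Both queries return the same rows here. Positional grouping is compact but can break silently if the select list is reordered, so naming the column is usually the safer choice.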

Source: (StackOverflow)

Linq with group by having count

How do I write this query in LINQ (VB.NET)?

 select B.Name
 from Company B
 group by B.Name
 having COUNT(1) > 1

Source: (StackOverflow)

Group by in LINQ

Suppose we have a class like this:

class Person { 
    internal int PersonID; 
    internal string car; 
}

Now I have a list of this class: List<Person> persons;

This list can contain multiple instances with the same PersonID, for example:

persons[0] = new Person { PersonID = 1, car = "Ferrari" }; 
persons[1] = new Person { PersonID = 1, car = "BMW"     }; 
persons[2] = new Person { PersonID = 2, car = "Audi"    }; 

Is there a way I can group by PersonID and get the list of all the cars each person has? For example, the expected result would be:

class Result { 
   int PersonID;
   List<string> cars; 
}

So after grouping by I would get:

results[0].PersonID = 1; 
List<string> cars = results[0].cars; 

results[1].PersonID = 2; 
List<string> cars = results[1].cars;

From what I have done so far:

var results = from p in persons
              group p by p.PersonID into g
              select new { PersonID = g.Key, // this is where I am not sure what to do

Could someone please point me in the right direction?

Source: (StackOverflow)

Select first row in each GROUP BY group?

As the title suggests, I'd like to select the first row of each set of rows grouped with a GROUP BY.

Specifically, if I've got a purchases table that looks like this:

SELECT * FROM purchases;
id | customer | total
 1 | Joe      | 5
 2 | Sally    | 3
 3 | Joe      | 2
 4 | Sally    | 1

I'd like to query for the id of the largest purchase (total) made by each customer. Something like this:

SELECT FIRST(id), customer, FIRST(total)
FROM  purchases
GROUP BY customer

FIRST(id) | customer | FIRST(total)
        1 | Joe      | 5
        2 | Sally    | 3
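
One possible way to get the row with the largest total per customer is a correlated subquery on MAX(total). The sketch below runs it through Python's standard sqlite3 module against the purchases rows from the question; the column types are assumptions. It is one approach among several, and if a customer has two purchases tied for the largest total, it returns both rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column types are assumed; the rows are the ones shown in the question.
conn.execute(
    "CREATE TABLE purchases (id INTEGER PRIMARY KEY, customer TEXT, total INTEGER)"
)
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?)",
    [(1, "Joe", 5), (2, "Sally", 3), (3, "Joe", 2), (4, "Sally", 1)],
)

# Keep a row only when its total equals the maximum total for its customer.
largest = conn.execute(
    """
    SELECT p.id, p.customer, p.total
    FROM purchases AS p
    WHERE p.total = (
        SELECT MAX(p2.total)
        FROM purchases AS p2
        WHERE p2.customer = p.customer
    )
    ORDER BY p.id
    """
).fetchall()

print(largest)
conn.close()
```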

Source: (StackOverflow)

MySQL Query GROUP BY day / month / year

Is it possible to write a simple query that counts how many records I have in a given period of time, such as a year, month, or day, using a TIMESTAMP field? For example:

FROM stats
WHERE record_date.YEAR = 2009
GROUP BY record_date.YEAR

Or even:

FROM stats
GROUP BY record_date.YEAR, record_date.MONTH

To have a monthly statistic.
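
MySQL provides YEAR() and MONTH() functions that can be used in GROUP BY for this kind of statistic. The sketch below is not MySQL: it swaps in SQLite's strftime() through Python's standard sqlite3 module to show the same idea of grouping on parts of a timestamp. The stats rows and the text storage format of record_date are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data; record_date is stored as ISO-8601 text for SQLite.
conn.execute("CREATE TABLE stats (id INTEGER PRIMARY KEY, record_date TEXT)")
conn.executemany(
    "INSERT INTO stats (record_date) VALUES (?)",
    [
        ("2009-01-15 10:00:00",),
        ("2009-01-20 08:30:00",),
        ("2009-03-02 12:00:00",),
        ("2010-01-05 09:15:00",),
    ],
)

# Count records per (year, month) by grouping on both extracted parts.
monthly = conn.execute(
    """
    SELECT strftime('%Y', record_date) AS yr,
           strftime('%m', record_date) AS mon,
           COUNT(*) AS records
    FROM stats
    GROUP BY strftime('%Y', record_date), strftime('%m', record_date)
    ORDER BY yr, mon
    """
).fetchall()

print(monthly)
conn.close()
```

Grouping on the year as well as the month keeps January 2009 and January 2010 in separate buckets.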


Source: (StackOverflow)

Retrieving the last record in each group

There is a table messages that contains data as shown below:

Id   Name   Other_Columns
1    A       A_data_1
2    A       A_data_2
3    A       A_data_3
4    B       B_data_1
5    B       B_data_2
6    C       C_data_1

If I run a query select * from messages group by name, I will get the result as:

1    A       A_data_1
4    B       B_data_1
6    C       C_data_1

What query will return the following result?

3    A       A_data_3
5    B       B_data_2
6    C       C_data_1

That is, the last record in each group should be returned.

At present, this is the query that I use:

select * from (select * from messages ORDER BY id DESC) AS x GROUP BY name

But this looks highly inefficient. Any other ways to achieve the same result?
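
One common alternative is to compute MAX(id) per name in a subquery and join back to the table, which avoids relying on the order in which rows reach GROUP BY. A minimal sketch using Python's standard sqlite3 module and the messages rows from the question follows; the column name other_columns and the column types are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column names follow the question; the types are assumed.
conn.execute(
    "CREATE TABLE messages (id INTEGER PRIMARY KEY, name TEXT, other_columns TEXT)"
)
conn.executemany(
    "INSERT INTO messages VALUES (?, ?, ?)",
    [
        (1, "A", "A_data_1"),
        (2, "A", "A_data_2"),
        (3, "A", "A_data_3"),
        (4, "B", "B_data_1"),
        (5, "B", "B_data_2"),
        (6, "C", "C_data_1"),
    ],
)

# Find the highest id per name, then join back to fetch the full row.
last_rows = conn.execute(
    """
    SELECT m.id, m.name, m.other_columns
    FROM messages AS m
    JOIN (
        SELECT name, MAX(id) AS max_id
        FROM messages
        GROUP BY name
    ) AS latest ON m.id = latest.max_id
    ORDER BY m.id
    """
).fetchall()

print(last_rows)
conn.close()
```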

Source: (StackOverflow)

Using group by on multiple columns

I understand the point of group by x

But how does group by x, y work and what does it mean?
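
As a rough illustration, group by x, y produces one result row per distinct combination of x and y, rather than one per distinct x. The sketch below uses Python's standard sqlite3 module; the table t, its value column v, and the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: x and y are grouping columns, v is the value to sum.
conn.execute("CREATE TABLE t (x TEXT, y INTEGER, v INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [("a", 1, 10), ("a", 1, 20), ("a", 2, 5), ("b", 1, 7)],
)

# One row per distinct x.
by_x = conn.execute(
    "SELECT x, SUM(v) FROM t GROUP BY x ORDER BY x"
).fetchall()

# One row per distinct (x, y) pair.
by_x_y = conn.execute(
    "SELECT x, y, SUM(v) FROM t GROUP BY x, y ORDER BY x, y"
).fetchall()

print(by_x)
print(by_x_y)
conn.close()
```

With these rows, grouping by x alone merges all three "a" rows, while grouping by x, y keeps ("a", 1) and ("a", 2) apart.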

Source: (StackOverflow)

How to concatenate strings of a string field in a PostgreSQL 'group by' query?

I am looking for a way to concatenate the strings of a field within a group by query. So for example, I have a table:

1    1            Anna
2    1            Bill
3    2            Carol
4    2            Dave

and I wanted to group by company_id to get something like:

1            Anna, Bill
2            Carol, Dave

MySQL has a built-in function for this: group_concat.
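
PostgreSQL 9.0 and later offer string_agg() for this kind of aggregation. The sketch below does not run PostgreSQL: it swaps in SQLite's group_concat() through Python's standard sqlite3 module to show the shape of such a query. The table name employees and the column name employee are assumptions, since the question's table header is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and column names; the rows mirror the question's data.
conn.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, company_id INTEGER, employee TEXT)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, 1, "Anna"), (2, 1, "Bill"), (3, 2, "Carol"), (4, 2, "Dave")],
)

# Concatenate the employee names within each company_id group.
rows = conn.execute(
    """
    SELECT company_id, group_concat(employee, ', ')
    FROM employees
    GROUP BY company_id
    ORDER BY company_id
    """
).fetchall()

# SQLite does not promise an order inside each concatenated string,
# so sort the names before comparing or displaying them.
names_by_company = {cid: sorted(names.split(", ")) for cid, names in rows}

print(names_by_company)
conn.close()
```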

Source: (StackOverflow)

Get top 1 row of each group

I have a table from which I want to get the latest entry for each group. Here's the table:

DocumentStatusLogs Table

|ID| DocumentID | Status | DateCreated |
| 2| 1          | S1     | 7/29/2011   |
| 3| 1          | S2     | 7/30/2011   |
| 6| 1          | S1     | 8/02/2011   |
| 1| 2          | S1     | 7/28/2011   |
| 4| 2          | S2     | 7/30/2011   |
| 5| 2          | S3     | 8/01/2011   |
| 6| 3          | S1     | 8/02/2011   |

The table will be grouped by DocumentID and sorted by DateCreated in descending order. For each DocumentID, I want to get the latest status.

My preferred output:

| DocumentID | Status | DateCreated |
| 1          | S1     | 8/02/2011   |
| 2          | S3     | 8/01/2011   |
| 3          | S1     | 8/02/2011   |

  • Is there any aggregate function to get only the top row from each group? See the pseudo-code GetOnlyTheTop below:

    select DocumentID, GetOnlyTheTop(Status), GetOnlyTheTop(DateCreated) from DocumentStatusLogs group by DocumentID order by DateCreated desc

  • If no such function exists, is there any way I can achieve the output I want?

  • Or, in the first place, could this be caused by an unnormalized database? Since what I'm looking for is just one row, should that status also be stored in the parent table?

Please see the parent table for more information:

Current Documents Table

| DocumentID | Title  | Content  | DateCreated |
| 1          | TitleA | ...      | ...         |
| 2          | TitleB | ...      | ...         |
| 3          | TitleC | ...      | ...         |

Should the parent table be like this so that I can easily access its status?

| DocumentID | Title  | Content  | DateCreated | CurrentStatus |
| 1          | TitleA | ...      | ...         | s1            |
| 2          | TitleB | ...      | ...         | s3            |
| 3          | TitleC | ...      | ...         | s1            |

UPDATE: I just learned how to use "apply", which makes it easier to address such problems.

Source: (StackOverflow)

C# Linq Group By on multiple columns [duplicate]

public class ConsolidatedChild
{
    public string School { get; set; }
    public string Friend { get; set; }
    public string FavoriteColor { get; set; }
    public List<Child> Children { get; set; }
}

public class Child
{
    public string School { get; set; }
    public string Name { get; set; }
    public string Address { get; set; }
    public string Friend { get; set; }
    public string Mother { get; set; }
    public string FavoriteColor { get; set; }
}

Given the two classes above, I would like to use LINQ to create a List<ConsolidatedChild> from the List<Child>, grouped by the School, Friend and FavoriteColor properties. Is this possible with LINQ?

Please ignore the properties, the code has been written just to help with the question.

Source: (StackOverflow)

Is there any difference between GROUP BY and DISTINCT

I learned something simple about SQL the other day:


Has the same result as:


What I am curious about is whether there is any difference in the way an SQL engine processes the two commands, or whether they are truly the same thing.

I personally prefer the distinct syntax, but I am sure it's more out of habit than anything else.

EDIT: This is not a question about aggregates. The use of GROUP BY with aggregate functions is understood.
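
To make the comparison concrete, the sketch below runs a GROUP BY query without aggregates next to a SELECT DISTINCT query, using Python's standard sqlite3 module. The table my_table, its column c, and the rows are hypothetical and are not the queries from the question. The sketch only shows that the result sets match in this case; it says nothing about how any particular engine plans or executes either form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical single-column table containing duplicate values.
conn.execute("CREATE TABLE my_table (c TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?)",
    [("a",), ("b",), ("a",), ("c",), ("b",)],
)

# GROUP BY with no aggregate collapses duplicates of c.
grouped = conn.execute(
    "SELECT c FROM my_table GROUP BY c ORDER BY c"
).fetchall()

# DISTINCT removes duplicate rows from the result.
distinct = conn.execute(
    "SELECT DISTINCT c FROM my_table ORDER BY c"
).fetchall()

print(grouped)
print(distinct)
conn.close()
```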

Source: (StackOverflow)