Skip to main content

20. GROUP BY & HAVING -- Grouping and Filtering Aggregates

Level: Guided beginner

ℹ️ What You'll Learn
  • GROUP BY syntax -- organize rows by column
  • GROUP BY with aggregate functions (SUM, AVG, COUNT)
  • HAVING clause -- filter grouped results
  • GROUP BY multiple columns
  • GROUP BY with ORDER BY
  • Reporting and dashboard queries
  • Common GROUP BY mistakes
  • Best practices for grouping

Aggregate functions (SUM, AVG, COUNT) combine multiple rows. GROUP BY organizes rows by category before aggregating. HAVING filters groups. This article teaches reporting queries.

Why GROUP BY & HAVING Matter for Backend Developers

APIs need summary data for dashboards:

Frontend asks: "Show average exam marks per class"
v
Backend needs: Average marks grouped by class, only for classes with 10+ students
v
GROUP BY ClassName + AVG(MarksObtained)
HAVING COUNT(*) >= 10
v
Class summary report

GROUP BY & HAVING power reports.

GROUP BY Syntax

SELECT
GroupColumn,
COUNT(*) AS RowCount,
SUM(NumericColumn) AS Total,
AVG(NumericColumn) AS Average
FROM Table
GROUP BY GroupColumn;

Rule: All non-aggregate columns in SELECT must be in GROUP BY.

Example 1: Count Students per Class

Group students by class:

SELECT
s.ClassName,
COUNT(s.StudentId) AS StudentCount
FROM Student s
GROUP BY s.ClassName
ORDER BY s.ClassName;

Result (students per class):

ClassName | StudentCount
10-A | 3
10-B | 2
11-A | 4

GROUP BY ClassName groups all students by class, COUNT per group.

Example 2: Average Exam Marks by Subject

Get average marks per subject:

SELECT
sub.Name AS Subject,
COUNT(r.ExamResultId) AS StudentsTested,
AVG(r.MarksObtained) AS AvgMarks,
MAX(r.MarksObtained) AS HighestMarks,
MIN(r.MarksObtained) AS LowestMarks
FROM Subject sub
LEFT JOIN Exam e ON sub.SubjectId = e.SubjectId
LEFT JOIN ExamResult r ON e.ExamId = r.ExamId
GROUP BY sub.SubjectId, sub.Name
ORDER BY AvgMarks DESC;

Result (subject statistics):

Subject | StudentsTested | AvgMarks | HighestMarks | LowestMarks
Maths | 12 | 82.5 | 98 | 65
Science | 11 | 79.3 | 95 | 58
English | 10 | 85.2 | 99 | 70

Multiple aggregates per group. LEFT JOIN includes subjects with no results (NULL counts).

Example 3: HAVING Clause (filter groups)

Show classes with average marks > 80:

SELECT
s.ClassName,
COUNT(s.StudentId) AS StudentCount,
AVG(r.MarksObtained) AS AvgMarks
FROM Student s
LEFT JOIN ExamResult r ON s.StudentId = r.StudentId
GROUP BY s.StudentId, s.ClassName
HAVING AVG(r.MarksObtained) > 80
ORDER BY AvgMarks DESC;

Result (only high-performing classes):

ClassName | StudentCount | AvgMarks
10-A | 3 | 82.5
11-A | 4 | 85.3

HAVING filters groups AFTER grouping. WHERE filters rows BEFORE grouping.

Example 4: Multiple GROUP BY Columns

Exam results grouped by class and subject:

SELECT
s.ClassName,
sub.Name AS Subject,
COUNT(r.ExamResultId) AS StudentCount,
AVG(r.MarksObtained) AS AvgMarks
FROM Student s
INNER JOIN ExamResult r ON s.StudentId = r.StudentId
INNER JOIN Exam e ON r.ExamId = e.ExamId
INNER JOIN Subject sub ON e.SubjectId = sub.SubjectId
GROUP BY s.ClassName, sub.SubjectId, sub.Name
ORDER BY s.ClassName, sub.Name;

Result (per class per subject):

ClassName | Subject | StudentCount | AvgMarks
10-A | English | 2 | 82.5
10-A | Maths | 3 | 80.2
10-B | Maths | 2 | 79.8
11-A | English | 3 | 85.5

GROUP BY multiple columns creates separate group per combination.

Example 5: Fee Collection Report

Total fees by payment status:

SELECT
f.Status AS PaymentStatus,
COUNT(f.FeeAccountId) AS StudentCount,
SUM(f.TotalFees) AS TotalAmount,
SUM(f.PaidAmount) AS AmountPaid,
SUM(f.TotalFees) - SUM(f.PaidAmount) AS AmountDue
FROM FeeAccount f
GROUP BY f.Status
ORDER BY TotalAmount DESC;

Result (fee summary):

PaymentStatus | StudentCount | TotalAmount | AmountPaid | AmountDue
Paid | 45 | 2250000 | 2250000 | 0
Partial | 12 | 600000 | 300000 | 300000
Pending | 8 | 400000 | 0 | 400000

GROUP BY payment status shows financial summary.

Example 6: Attendance Report per Subject

Student attendance percentage per subject:

SELECT
sub.Name AS Subject,
COUNT(DISTINCT a.StudentId) AS StudentCount,
COUNT(DISTINCT a.Date) AS ClassesHeld,
SUM(CASE WHEN a.IsPresent = 1 THEN 1 ELSE 0 END) AS TotalPresent,
CAST(
SUM(CASE WHEN a.IsPresent = 1 THEN 1 ELSE 0 END) * 100.0 /
COUNT(a.AttendanceId) AS DECIMAL(5,2)
) AS AttendancePercentage
FROM Subject sub
LEFT JOIN Attendance a ON sub.SubjectId = a.SubjectId
GROUP BY sub.SubjectId, sub.Name
HAVING COUNT(a.AttendanceId) > 0
ORDER BY AttendancePercentage DESC;

Result (attendance statistics):

Subject | StudentCount | ClassesHeld | TotalPresent | AttendancePercentage
Maths | 30 | 40 | 1140 | 95.00
English | 28 | 40 | 1064 | 94.86

Complex formula: attendance percentage per subject.

Example 7: Teachers with Multiple Subjects

Teachers and subject count:

SELECT
t.Name AS Teacher,
COUNT(sub.SubjectId) AS SubjectCount
FROM Teacher t
LEFT JOIN Subject sub ON t.TeacherId = sub.TeacherId
GROUP BY t.TeacherId, t.Name
HAVING COUNT(sub.SubjectId) >= 2
ORDER BY SubjectCount DESC;

Result (teachers with 2+ subjects):

Teacher | SubjectCount
Dr. Mehta | 3
Mrs. Rao | 2

HAVING filters to teachers with at least 2 subjects.


GROUP BY with WHERE and HAVING

WHERE filters rows BEFORE grouping. HAVING filters groups AFTER grouping.

SELECT
s.ClassName,
AVG(r.MarksObtained) AS AvgMarks
FROM Student s
INNER JOIN ExamResult r ON s.StudentId = r.StudentId
WHERE r.MarksObtained >= 60 -- Filter: only pass marks
GROUP BY s.ClassName
HAVING AVG(r.MarksObtained) > 75 -- Filter: class average > 75
ORDER BY AvgMarks DESC;

WHERE: Exclude fail marks. HAVING: Only show classes averaging > 75.


Common GROUP BY Mistakes

Mistake 1: Non-aggregate column not in GROUP BY

Wrong:

SELECT
s.ClassName,
s.Name, -- Name not in GROUP BY!
COUNT(r.ExamResultId) AS ExamCount
FROM Student s
LEFT JOIN ExamResult r ON s.StudentId = r.StudentId
GROUP BY s.ClassName;
-- ERROR: Cannot select non-aggregate column (Name) not in GROUP BY

Result: SQL error.

Fix: Add to GROUP BY or aggregate:

SELECT
s.ClassName,
COUNT(r.ExamResultId) AS ExamCount
FROM Student s
LEFT JOIN ExamResult r ON s.StudentId = r.StudentId
GROUP BY s.ClassName;

Mistake 2: Using WHERE instead of HAVING

Wrong:

SELECT
s.ClassName,
AVG(r.MarksObtained) AS AvgMarks
FROM Student s
LEFT JOIN ExamResult r ON s.StudentId = r.StudentId
GROUP BY s.ClassName
WHERE AVG(r.MarksObtained) > 80;
-- ERROR: Cannot use aggregate in WHERE

Result: Invalid syntax error.

Fix: Use HAVING:

SELECT
s.ClassName,
AVG(r.MarksObtained) AS AvgMarks
FROM Student s
LEFT JOIN ExamResult r ON s.StudentId = r.StudentId
GROUP BY s.ClassName
HAVING AVG(r.MarksObtained) > 80;

Mistake 3: GROUP BY with NULL causes unexpected group

Wrong:

SELECT
sub.Name AS Subject,
COUNT(r.ExamResultId) AS ExamCount
FROM Subject sub
LEFT JOIN ExamResult r ON sub.SubjectId = r.SubjectId
GROUP BY sub.SubjectId, sub.Name;
-- Subject with no exam results: NULL group created

Result includes NULL group (subjects never tested).

Fix: Filter if needed:

SELECT
sub.Name AS Subject,
COUNT(r.ExamResultId) AS ExamCount
FROM Subject sub
LEFT JOIN ExamResult r ON sub.SubjectId = r.SubjectId
GROUP BY sub.SubjectId, sub.Name
HAVING COUNT(r.ExamResultId) > 0;
-- Exclude untested subjects

Mistake 4: Wrong aggregate function

Wrong:

SELECT
s.ClassName,
MAX(s.Name) AS StudentName -- MAX on VARCHAR meaningless
FROM Student s
GROUP BY s.ClassName;

Result: Max student name alphabetically (nonsense).

Fix: Use correct aggregate:

SELECT
s.ClassName,
COUNT(*) AS StudentCount
FROM Student s
GROUP BY s.ClassName;

Mistake 5: GROUP BY with ORDER BY wrong column

Wrong:

SELECT
s.ClassName,
AVG(r.MarksObtained) AS AvgMarks
FROM Student s
LEFT JOIN ExamResult r ON s.StudentId = r.StudentId
GROUP BY s.ClassName
ORDER BY s.Name; -- ORDER BY: s.Name not in GROUP BY
-- ERROR

Fix: ORDER BY grouped or aggregate column:

SELECT
s.ClassName,
AVG(r.MarksObtained) AS AvgMarks
FROM Student s
LEFT JOIN ExamResult r ON s.StudentId = r.StudentId
GROUP BY s.ClassName
ORDER BY AvgMarks DESC;

Best Practices for GROUP BY & HAVING

  1. All non-aggregate columns in GROUP BY -- Strict rule
  2. Use HAVING for group filters -- Not WHERE
  3. Use WHERE for row pre-filters -- Before grouping
  4. Multiple GROUP BY for detail -- Drill-down reports
  5. Test NULL handling -- LEFT JOINs with GROUP BY create NULL groups
  6. Order by aggregate -- For top-N reports
  7. Use derived table for complex logic -- If multiple GROUP BY levels needed

🎯 Q1: What's the difference between WHERE and HAVING?

WHERE: Filters BEFORE grouping (individual rows).

FROM Student s
WHERE s.ClassName = '10-A' -- Filter: only 10-A students

HAVING: Filters AFTER grouping (groups).

GROUP BY s.ClassName
HAVING COUNT(*) > 5 -- Filter: classes with 5+ students

WHERE: Row filter. HAVING: Group filter.

🎯 Q2: Why must all non-aggregate columns be in GROUP BY?

Each group must have single value per column:

SELECT s.ClassName, COUNT(*) FROM Student s GROUP BY s.ClassName;
-- ClassName has 1 value per group (all 10-A students)
-- COUNT(*) is aggregate (1 value per group)
-- Valid

If Name not in GROUP BY:

SELECT s.ClassName, s.Name FROM Student s GROUP BY s.ClassName;
-- ClassName: 1 value (10-A)
-- Name: multiple values (Ravi, Priya, Arjun) -- which one to show?
-- ERROR

SQL requires deterministic result: 1 row per group, 1 column value per row.

🎯 Q3: Can I GROUP BY by column number (SELECT position)?

Yes, some databases allow it (not SQL Server standard):

SELECT ClassName, COUNT(*) FROM Student
GROUP BY 1; -- Group by first column (ClassName)

Better: Explicit column name:

SELECT ClassName, COUNT(*) FROM Student
GROUP BY ClassName;

More readable. Always use explicit column names.

🎯 Q4: What happens with GROUP BY and NULL?

NULL values grouped together:

SELECT FeeStatus, COUNT(*) FROM FeeAccount
GROUP BY FeeStatus;

Result:
FeeStatus | COUNT
NULL | 5 <- NULL is a group
Paid | 45
Pending | 8

NULL = NULL, so all NULL rows in one group.

To handle: Use COALESCE:

SELECT COALESCE(FeeStatus, 'Unknown') AS Status, COUNT(*)
FROM FeeAccount
GROUP BY FeeStatus;
🎯 Q5: How do I get top N groups (e.g., top 5 classes by student count)?

Use ORDER BY and TOP:

SELECT TOP 5
s.ClassName,
COUNT(*) AS StudentCount
FROM Student s
GROUP BY s.ClassName
ORDER BY StudentCount DESC;

Result: Top 5 classes by student count.

With OFFSET/FETCH:

SELECT
s.ClassName,
COUNT(*) AS StudentCount
FROM Student s
GROUP BY s.ClassName
ORDER BY StudentCount DESC
OFFSET 0 ROWS FETCH NEXT 5 ROWS ONLY;

Same result, more explicit.

🎯 Q6: When would I use DISTINCT instead of GROUP BY?

DISTINCT: Get unique values only (no aggregates).

SELECT DISTINCT ClassName FROM Student;
-- Unique class names

GROUP BY: Aggregates per group.

SELECT ClassName, COUNT(*) FROM Student
GROUP BY ClassName;
-- Count per class

Use DISTINCT for unique values. GROUP BY for aggregates. DISTINCT simpler if no aggregate needed.


🤖Use AI to Learn Faster
⚠️ Important for beginners: Do NOT use AI to write your code yet. Type every example yourself. Your brain learns by doing, not by reading AI output. Use AI only to explain and quiz you — not to code for you. Once you have strong fundamentals, AI becomes a powerful productivity tool for repetitive tasks.

Use ChatGPT, Claude, or Copilot to go deeper on GROUP BY and HAVING for reporting in SQL Server. Try these prompts:

  • "Explain GROUP BY: average exam marks per class"
  • "What's the difference between WHERE and HAVING?"
  • "Show multiple GROUP BY columns: class and subject statistics"
  • "How do I use GROUP BY to create a fee collection report?"
  • "Quiz me: ask 5 questions about GROUP BY, HAVING, aggregates"

💡 Tip: After reading this article, paste your own code into AI and ask "What could go wrong here and why?" — fastest way to find edge cases and deepen understanding.

Next Article

21. String Functions -- Manipulating Text in SQL ->

nexcoding.in