[SOLVED] Finding the count of duplicates between two values within a function

Issue

This Content is from Stack Overflow. Question asked by zachs snachs

So I have this problem:

Create a function skillsInRange(n1 int, n2 int) returns the count ofWesterosis that have at least n1 skills and at most n2 skills. Test yourqueries with inputs:

With the relevant table:

”’
INSERT INTO WesterosiSkill VALUES (1001,’Archery’), (1001,’Politics’), (1002,’Archery’), (1002,’Politics’), (1004,’Politics’), (1004,’Archery’), (1005,’Politics’), (1005,’Archery’), (1005,’Swordsmanship’), (1006,’Archery’), (1006,’HorseRiding’), … ………………………..

”’
(wid, skill) — Columns

So, it seems obvious that we only need to count the duplicate values in wid (first column). And then our second condition is only keeping the ones that have between n1 and n2 occurrences.

So I created this function:

CREATE FUNCTION skillsInRange (n1 int, n2 int)
RETURNS INTEGER AS
$$
SELECT COUNT(wid) AS wcount
FROM westerosiSkill
GROUP BY wid
HAVING wcount BETWEEN n1 AND n2
$$ LANGUAGE SQL;

And a few variants of it. None of which have worked. In this version, it tells me that ‘wcount’ doesn’t exist even when I change ‘HAVING’ -> ‘WHERE’. The SELECT COUNT, FROM, GROUP BY returns the incorrect number of occurrences for each wid. And any of the ‘HAVING’ clauses I’ve tried returns the incorrect number of occurrences between these two values, or in this instance, is syntactically incorrect due to the aggregate functions.



Solution

You need two levels of aggregation:

CREATE FUNCTION skillsinrange (n1 int, n2 int)
  RETURNS int
  LANGUAGE sql AS
$func$
SELECT count(*)::int
FROM  (
   SELECT count(*)::int AS wcount
   FROM   westerosiSkill
   GROUP  BY wid
   ) sub
WHERE wcount BETWEEN n1 AND n2;
$func$;

This is assuming that (wid, skill) is the PK, so both columns are defined NOT NULL, and the combination UNIQUE. Else you need to do more.

The function wrapper seems arbitrary. You might just use the plain query.

The cast to integer is just because you show a function signature with type integer for input and output. You might use bigint instead and ditch the casts.

For the error messages you saw, consider the manual:

An output column’s name can be used to refer to the column’s value in
ORDER BY and GROUP BY clauses, but not in the WHERE or HAVING
clauses; there you must write out the expression instead.

See:


This Question was asked in StackOverflow by zachs snachs and Answered by Erwin Brandstetter It is licensed under the terms of CC BY-SA 2.5. - CC BY-SA 3.0. - CC BY-SA 4.0.

people found this article helpful. What about you?