Record number of uploads of a Debian package in an arbitrary 24-hour window


Since Dimitri has given me the SQL virus I have a hard time avoiding opportunities for twisting my brain.

Seeing the latest post from Chris Lamb made me wonder: how hard would it be to do better? Splitting by date is rather arbitrary (the split may even depend on the timezone you’re using when you’re doing the query), so let’s try to find out the maximum number of uploads that happened for each package in any 24 hour window.

First, for each upload, we get how many uploads of the same package happened in the subsequent 24 hours.

SELECT
  source,
  date,
  (
    SELECT
      count(*)
    FROM
      upload_history AS other_upload
    WHERE
      other_upload.source = first_upload.source
      AND other_upload.date >= first_upload.date
      AND other_upload.date < first_upload.date + '24 hours') AS count
  FROM
    upload_history AS first_upload

For each source package, we want the maximum count of uploads in a 24 hour window.

SELECT
  source,
  max(count)
FROM
  upload_counts
GROUP BY
  source

We can then join both queries together, to get the 24-hour window in which the most uploads of a given source package has happened.

WITH upload_counts AS (
  SELECT
    source,
    date,
    (
      SELECT
        count(*)
      FROM
        upload_history AS other_upload
      WHERE
        other_upload.source = first_upload.source
        AND other_upload.date >= first_upload.date
        AND other_upload.date < first_upload.date + '24 hours') AS count
    FROM
      upload_history AS first_upload
)
SELECT
  source,
  date,
  count
FROM
  upload_counts
INNER JOIN (
  SELECT
    source,
    max(count) AS max_uploads
  FROM
    upload_counts
  GROUP BY
    source
  ) AS m
  USING (source)
WHERE
  count = max_uploads
  AND max_uploads >= 9
ORDER BY
  max_uploads DESC,
  date ASC;

The results are almost the ones Chris has found, but cl-sql and live-config now have one more upload than live-boot.

       source       |          date          | count 
--------------------+------------------------+-------
 cl-sql             | 2004-04-17 03:34:52+00 |    14
 live-config        | 2010-07-15 17:19:11+00 |    14
 live-boot          | 2010-07-15 17:17:07+00 |    13
 zutils             | 2010-12-30 17:33:45+00 |    11
 belocs-locales-bin | 2005-03-20 21:05:44+00 |    10
 openerp-web        | 2010-12-30 17:32:07+00 |    10
 debconf            | 1999-09-25 18:52:37+00 |     9
 gretl              | 2000-06-16 18:53:11+00 |     9
 posh               | 2002-07-24 17:04:46+00 |     9
 module-assistant   | 2003-09-11 05:53:18+00 |     9
 live-helper        | 2007-04-20 18:16:38+00 |     9
 dxvk               | 2018-11-06 00:04:02+00 |     9
(12 lines)

Thanks to Adrian and Chris for the involuntary challenge!

,

Leave a Reply

Your email address will not be published. Required fields are marked *