In the example below, I have a bibliographic table of authors and their papers. For example, authors ‘001’ and ‘003’ wrote article ‘678’ together.
articleId | authorId
123 | 001
123 | 002
345 | 002
345 | 003
345 | 004
678 | 001
678 | 003
I need to select co-occurrences between authors based on their common authorship. For example, from the table above I need to build the following table:
AuthorA | AuthorB
001 | 002
002 | 003
002 | 004
003 | 004
001 | 003
The first table is very large (approx. 1,800,000 rows). When I first tried this on MS SQL Server 2008, building the second table was fast, but now I'm stuck with MySQL. I use the following query:
SELECT foo.authorId AS authorA, bar.authorId AS authorB
FROM
(SELECT * FROM tblAuthorHasBib) AS foo,
(SELECT * FROM tblAuthorHasBib) AS bar
WHERE
foo.articleId = bar.articleId
AND
foo.authorId <> bar.authorId
GROUP BY foo.authorId, bar.authorId
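To see what this query returns on the sample data, here is a minimal sketch that runs it unchanged against an in-memory SQLite database. SQLite is an assumption standing in for MySQL, and the helper names `load_sample` and `run` are hypothetical. Because `<>` keeps both orderings of every pair, the query returns 10 rows (for example both 001 | 002 and 002 | 001) rather than the 5 rows of the target table.

```python
import sqlite3

# Sample data from the question: (articleId, authorId).
# IDs are stored as TEXT so the leading zeros survive.
ROWS = [
    ("123", "001"), ("123", "002"),
    ("345", "002"), ("345", "003"), ("345", "004"),
    ("678", "001"), ("678", "003"),
]

# The query from the question, unchanged apart from whitespace.
QUESTION_QUERY = """
SELECT foo.authorId AS authorA, bar.authorId AS authorB
FROM
  (SELECT * FROM tblAuthorHasBib) AS foo,
  (SELECT * FROM tblAuthorHasBib) AS bar
WHERE foo.articleId = bar.articleId
  AND foo.authorId <> bar.authorId
GROUP BY foo.authorId, bar.authorId
"""


def load_sample() -> sqlite3.Connection:
    """Hypothetical helper: in-memory copy of tblAuthorHasBib with the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tblAuthorHasBib (articleId TEXT, authorId TEXT)")
    conn.executemany("INSERT INTO tblAuthorHasBib VALUES (?, ?)", ROWS)
    return conn


def run(conn: sqlite3.Connection, sql: str) -> list:
    # Sort in Python so the result does not depend on the engine's row order.
    return sorted(conn.execute(sql).fetchall())


if __name__ == "__main__":
    pairs = run(load_sample(), QUESTION_QUERY)
    print(len(pairs))  # 10: each co-author pair appears once in each direction
    for a, b in pairs:
        print(a, "|", b)
```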
How can I rewrite my query so that it runs as fast as it did on MS SQL Server? Thanks in advance for any pointers.
You could rewrite your query as a direct self-join of tblAuthorHasBib on articleId, which avoids both the GROUP BY clause and the inline views. Alternatively, as per @1osmi’s comment, if you want each pair of authors only once (unique combinations rather than both orderings of every pair), you could replace the != (or <>) comparison with <.
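A minimal sketch of such a self-join using `<` follows; it is an illustration, not necessarily the exact query the answer had in mind. It runs against an in-memory SQLite database as a stand-in for MySQL, and `coauthor_pairs` and `idx_article_author` are hypothetical names. Two further assumptions are added here: `DISTINCT`, because without it two authors who co-wrote several articles would appear once per shared article, and a composite index on `(articleId, authorId)`, which is one common way to make this kind of join fast; whether MySQL needs it depends on the indexes your table already has.

```python
import sqlite3

# Sample data from the question: (articleId, authorId), IDs kept as TEXT.
ROWS = [
    ("123", "001"), ("123", "002"),
    ("345", "002"), ("345", "003"), ("345", "004"),
    ("678", "001"), ("678", "003"),
]

# Direct self-join: no inline views and no GROUP BY.
# "<" keeps each unordered pair once per shared article;
# DISTINCT (an assumption) collapses pairs that share several articles.
COAUTHOR_QUERY = """
SELECT DISTINCT a.authorId AS authorA, b.authorId AS authorB
FROM tblAuthorHasBib AS a
JOIN tblAuthorHasBib AS b
  ON a.articleId = b.articleId
 AND a.authorId < b.authorId
"""


def coauthor_pairs(rows):
    """Hypothetical helper: load rows into SQLite and return sorted co-author pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tblAuthorHasBib (articleId TEXT, authorId TEXT)")
    # Hypothetical composite index: lets the engine find an article's
    # co-authors by index lookup instead of scanning the whole table.
    conn.execute(
        "CREATE INDEX idx_article_author ON tblAuthorHasBib (articleId, authorId)"
    )
    conn.executemany("INSERT INTO tblAuthorHasBib VALUES (?, ?)", rows)
    pairs = sorted(conn.execute(COAUTHOR_QUERY).fetchall())
    conn.close()
    return pairs


if __name__ == "__main__":
    for a, b in coauthor_pairs(ROWS):
        print(a, "|", b)
```

On the sample data this yields the same five pairs as the target table in the question. Comparing IDs with `<` works here because the sample IDs are zero-padded strings of equal length; numeric IDs compare correctly as well.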