mysql5.6.9rc 子查询优化之后带来的问题

mysql5.6.9rc 子查询优化之后带来的问题

测试数据如下:
表t1和t2通过t1_id关联在一起;
t1中只有一条记录的type为0,另外三条记录的type为1;
对于t1中的每个t1_id, t2中都有6条记录与其关联。

CREATE TABLE `t1` (
  `t1_id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
  `type` INT(10) UNSIGNED NOT NULL DEFAULT '0',
  PRIMARY KEY (`t1_id`),
  KEY `type` (`type`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
 
CREATE TABLE `t2` (
  `t2_id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
  `t1_id` INT(10) UNSIGNED NOT NULL DEFAULT '0',
  PRIMARY KEY (`t2_id`),
  KEY `t1_id` (`t1_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
 
 
INSERT INTO t1 SET TYPE=0;
INSERT INTO t1 SET TYPE=1;
INSERT INTO t1 SET TYPE=1;
INSERT INTO t1 SET TYPE=1;
 
INSERT INTO t2 SET t1_id=1;
INSERT INTO t2 SET t1_id=1;
INSERT INTO t2 SET t1_id=1;
INSERT INTO t2 SET t1_id=1;
INSERT INTO t2 SET t1_id=1;
INSERT INTO t2 SET t1_id=1;
 
INSERT INTO t2 SET t1_id=2;
INSERT INTO t2 SET t1_id=2;
INSERT INTO t2 SET t1_id=2;
INSERT INTO t2 SET t1_id=2;
INSERT INTO t2 SET t1_id=2;
INSERT INTO t2 SET t1_id=2;
 
INSERT INTO t2 SET t1_id=3;
INSERT INTO t2 SET t1_id=3;
INSERT INTO t2 SET t1_id=3;
INSERT INTO t2 SET t1_id=3;
INSERT INTO t2 SET t1_id=3;
INSERT INTO t2 SET t1_id=3;
 
INSERT INTO t2 SET t1_id=4;
INSERT INTO t2 SET t1_id=4;
INSERT INTO t2 SET t1_id=4;
INSERT INTO t2 SET t1_id=4;
INSERT INTO t2 SET t1_id=4;
INSERT INTO t2 SET t1_id=4;
 
mysql> SELECT * FROM t1;
+-------+------+
| t1_id | TYPE |
+-------+------+
|     1 |    0 |
|     2 |    1 |
|     3 |    1 |
|     4 |    1 |
+-------+------+
4 ROWS IN SET (0.00 sec)
 
mysql> SELECT * FROM t2;
+-------+-------+
| t2_id | t1_id |
+-------+-------+
|     1 |     1 |
|     2 |     1 |
|     3 |     1 |
|     4 |     1 |
|     5 |     1 |
|     6 |     1 |
|     7 |     2 |
|     8 |     2 |
|     9 |     2 |
|    10 |     2 |
|    11 |     2 |
|    12 |     2 |
|    13 |     3 |
|    14 |     3 |
|    15 |     3 |
|    16 |     3 |
|    17 |     3 |
|    18 |     3 |
|    19 |     4 |
|    20 |     4 |
|    21 |     4 |
|    22 |     4 |
|    23 |     4 |
|    24 |     4 |
+-------+-------+
24 ROWS IN SET (0.00 sec)

写一条sql语句,在t1中按照type分组,在每组中各取一个t1_id,找出它们对应的t2_id来,
个人比较习惯的还是使用in:

SELECT  t2_id,t1_id FROM t2 WHERE t2.t1_id IN (SELECT t1_id FROM t1 GROUP BY TYPE);

在mysql5.5中看一下:

 
mysql> SELECT version();
+------------+
| version()  |
+------------+
| 5.5.22-log |
+------------+
1 ROW IN SET (0.02 sec)
 
mysql> SELECT  t2_id,t1_id FROM t2 WHERE t2.t1_id IN (SELECT t1_id FROM t1 GROUP BY TYPE);
+-------+-------+
| t2_id | t1_id |
+-------+-------+
|     1 |     1 |
|     2 |     1 |
|     3 |     1 |
|     4 |     1 |
|     5 |     1 |
|     6 |     1 |
|     7 |     2 |
|     8 |     2 |
|     9 |     2 |
|    10 |     2 |
|    11 |     2 |
|    12 |     2 |
+-------+-------+
12 ROWS IN SET (0.01 sec)
 
mysql> EXPLAIN SELECT  t2_id,t1_id FROM t2 WHERE t2.t1_id IN (SELECT t1_id FROM t1 GROUP BY TYPE);
+----+--------------------+-------+-------+---------------+-------+---------+------+------+--------------------------+
| id | select_type        | TABLE | TYPE  | possible_keys | KEY   | key_len | REF  | ROWS | Extra                    |
+----+--------------------+-------+-------+---------------+-------+---------+------+------+--------------------------+
|  1 | PRIMARY            | t2    | INDEX | NULL          | t1_id | 4       | NULL |   24 | USING WHERE; USING INDEX |
|  2 | DEPENDENT SUBQUERY | t1    | INDEX | NULL          | TYPE  | 4       | NULL |    1 | USING INDEX              |
+----+--------------------+-------+-------+---------------+-------+---------+------+------+--------------------------+
2 ROWS IN SET (0.00 sec)

返回12条数据,用到了DEPENDENT SUBQUERY, 如果数据量大的话, 效率会非常低。
可以把此语句改写成inner join的形式。

据说mysql 5.6对于子查询做了相当不错的优化,
那就看看mysql5.6.9的表现:

mysql> SELECT version();
+--------------+
| version()    |
+--------------+
| 5.6.9-rc-log |
+--------------+
1 ROW IN SET (0.00 sec)
 
mysql> SELECT  t2_id,t1_id FROM t2 WHERE t2.t1_id IN (SELECT t1_id FROM t1 GROUP BY TYPE);
+-------+-------+
| t2_id | t1_id |
+-------+-------+
|     1 |     1 |
|     2 |     1 |
|     3 |     1 |
|     4 |     1 |
|     5 |     1 |
|     6 |     1 |
|     7 |     2 |
|     8 |     2 |
|     9 |     2 |
|    10 |     2 |
|    11 |     2 |
|    12 |     2 |
|    13 |     3 |
|    14 |     3 |
|    15 |     3 |
|    16 |     3 |
|    17 |     3 |
|    18 |     3 |
|    19 |     4 |
|    20 |     4 |
|    21 |     4 |
|    22 |     4 |
|    23 |     4 |
|    24 |     4 |
+-------+-------+
24 ROWS IN SET (0.00 sec)
 
mysql> EXPLAIN SELECT  t2_id,t1_id FROM t2 WHERE t2.t1_id IN (SELECT t1_id FROM t1 GROUP BY TYPE);
+----+-------------+-------+-------+---------------+-------+---------+---------------+------+-------------+
| id | select_type | TABLE | TYPE  | possible_keys | KEY   | key_len | REF           | ROWS | Extra       |
+----+-------------+-------+-------+---------------+-------+---------+---------------+------+-------------+
|  1 | SIMPLE      | t1    | INDEX | PRIMARY       | TYPE  | 4       | NULL          |    4 | USING INDEX |
|  1 | SIMPLE      | t2    | REF   | t1_id         | t1_id | 4       | test.t1.t1_id |    1 | USING INDEX |
+----+-------------+-------+-------+---------------+-------+---------+---------------+------+-------------+
2 ROWS IN SET (0.00 sec)

竟然返回了全部的24条记录……DEPENDENT SUBQUERY被优化成了SIMPLE, 效率肯定是提高了,可惜返回的结果集有问题…

About 花荣

He is the founder, designer, and managing editor of zhaokunyao.com, and he is perpetually behind schedule.
This entry was posted in mysql and tagged . Bookmark the permalink.

One Response to mysql5.6.9rc 子查询优化之后带来的问题

  1. Louis hust says:

    首先你的subquery的用法不是标准的group by用法,而是mysql extensions to group by的语法,因为你的查询列没有出现在group by list里面,而且查询列t1_id并不能保证每个group里唯一。
    然后是mysql 优化的问题,5.6将subquery优化成semi join的时候,如果子查询中仅有group by约束,没有having或者集函数时,则会直接把group by过滤掉(参见remove_redundant_subquery_clauses函数),因为optimizer以为你的查询列必然是group里面的列, 优化掉group by对查询列没有影响,所以这就导致了原始语句转化成:
    select `t2`.`t2_id` AS `t2_id`,`t2`.`t1_id` AS `t1_id` from `t2` semi join (`t1`) where (1 and (`t2`.`t1_id` = `t1`.`t1_id`))
    所以现在有两种途径解决这个问题:
    1. 让你的group by语法是规范的语法,而不要用mysql扩展的语法,而且这种扩展的语法在其他数据库是直接报错的,迁移性也不好。
    2. 在group by后面增加一个having子句,如having t1_id > -1,这种恒真的语法,让执行计划发生变化,当然这样优化器就利用不了semi join了,而是subquery类型了。

    我查了下mysql bug系统,你好像没提交bug,我建了个新bug,用了你的测试用例。

    http://bugs.mysql.com/bug.php?id=68254

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>