1/*
2 Copyright (c) 2010, 2015, MariaDB
3
4 This program is free software; you can redistribute it and/or modify
5 it under the terms of the GNU General Public License as published by
6 the Free Software Foundation; version 2 of the License.
7
8 This program is distributed in the hope that it will be useful,
9 but WITHOUT ANY WARRANTY; without even the implied warranty of
10 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
11 GNU General Public License for more details.
12
13 You should have received a copy of the GNU General Public License
14 along with this program; if not, write to the Free Software
15 Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02111-1301 USA */
16
17/**
18 @file
19
20 @brief
21 Semi-join subquery optimizations code
22
23*/
24
25#ifdef USE_PRAGMA_IMPLEMENTATION
26#pragma implementation // gcc: Class implementation
27#endif
28
29#include "mariadb.h"
30#include "sql_base.h"
31#include "sql_select.h"
32#include "filesort.h"
33#include "opt_subselect.h"
34#include "sql_test.h"
35#include <my_bit.h>
36
37/*
38 This file contains optimizations for semi-join subqueries.
39
40 Contents
41 --------
42 1. What is a semi-join subquery
43 2. General idea about semi-join execution
44 2.1 Correlated vs uncorrelated semi-joins
45 2.2 Mergeable vs non-mergeable semi-joins
46 3. Code-level view of semi-join processing
47 3.1 Conversion
48 3.1.1 Merged semi-join TABLE_LIST object
49 3.1.2 Non-merged semi-join data structure
50 3.2 Semi-joins and query optimization
51 3.2.1 Non-merged semi-joins and join optimization
52 3.2.2 Merged semi-joins and join optimization
53 3.3 Semi-joins and query execution
54
55 1. What is a semi-join subquery
56 -------------------------------
57 We use this definition of semi-join:
58
59 outer_tbl SEMI JOIN inner_tbl ON cond = {set of outer_tbl.row such that
60 exist inner_tbl.row, for which
61 cond(outer_tbl.row,inner_tbl.row)
62 is satisfied}
63
64 That is, semi-join operation is similar to inner join operation, with
65 exception that we don't care how many matches a row from outer_tbl has in
66 inner_tbl.
67
68 In SQL terms: a semi-join subquery is an IN subquery that is an AND-part of
69 the WHERE/ON clause.
70
71 2. General idea about semi-join execution
72 -----------------------------------------
73 We can execute semi-join in a way similar to inner join, with exception that
74 we need to somehow ensure that we do not generate record combinations that
75 differ only in rows of inner tables.
76 There is a number of different ways to achieve this property, implemented by
77 a number of semi-join execution strategies.
78 Some strategies can handle any semi-joins, other can be applied only to
79 semi-joins that have certain properties that are described below:
80
81 2.1 Correlated vs uncorrelated semi-joins
82 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
83 Uncorrelated semi-joins are special in the respect that they allow to
84 - execute the subquery (possible as it's uncorrelated)
85 - somehow make sure that generated set does not have duplicates
86 - perform an inner join with outer tables.
87
88 or, rephrasing in SQL form:
89
90 SELECT ... FROM ot WHERE ot.col IN (SELECT it.col FROM it WHERE uncorr_cond)
91 ->
92 SELECT ... FROM ot JOIN (SELECT DISTINCT it.col FROM it WHERE uncorr_cond)
93
94 2.2 Mergeable vs non-mergeable semi-joins
95 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
96 Semi-join operation has some degree of commutability with inner join
97 operation: we can join subquery's tables with ouside table(s) and eliminate
98 duplicate record combination after that:
99
100 ot1 JOIN ot2 SEMI_JOIN{it1,it2} (it1 JOIN it2) ON sjcond(ot2,it*) ->
101 |
102 +-------------------------------+
103 v
104 ot1 SEMI_JOIN{it1,it2} (it1 JOIN it2 JOIN ot2) ON sjcond(ot2,it*)
105
106 In order for this to work, subquery's top-level operation must be join, and
107 grouping or ordering with limit (grouping or ordering with limit are not
108 commutative with duplicate removal). In other words, the conversion is
109 possible when the subquery doesn't have GROUP BY clause, any aggregate
110 functions*, or ORDER BY ... LIMIT clause.
111
112 Definitions:
113 - Subquery whose top-level operation is a join is called *mergeable semi-join*
114 - All other kinds of semi-join subqueries are considered non-mergeable.
115
116 *- this requirement is actually too strong, but its exceptions are too
117 complicated to be considered here.
118
119 3. Code-level view of semi-join processing
120 ------------------------------------------
121
122 3.1 Conversion and pre-optimization data structures
123 ---------------------------------------------------
124 * When doing JOIN::prepare for the subquery, we detect that it can be
125 converted into a semi-join and register it in parent_join->sj_subselects
126
127 * At the start of parent_join->optimize(), the predicate is converted into
128 a semi-join node. A semi-join node is a TABLE_LIST object that is linked
129 somewhere in parent_join->join_list (either it is just present there, or
130 it is a descendant of some of its members).
131
132 There are two kinds of semi-joins:
133 - Merged semi-joins
134 - Non-merged semi-joins
135
136 3.1.1 Merged semi-join TABLE_LIST object
137 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
138 Merged semi-join object is a TABLE_LIST that contains a sub-join of
139 subquery tables and the semi-join ON expression (in this respect it is
140 very similar to nested outer join representation)
141 Merged semi-join represents this SQL:
142
143 ... SEMI JOIN (inner_tbl1 JOIN ... JOIN inner_tbl_n) ON sj_on_expr
144
145 Semi-join objects of this kind have TABLE_LIST::sj_subq_pred set.
146
147 3.1.2 Non-merged semi-join data structure
148 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
149 Non-merged semi-join object is a leaf TABLE_LIST object that has a subquery
150 that produces rows. It is similar to a base table and represents this SQL:
151
152 ... SEMI_JOIN (SELECT non_mergeable_select) ON sj_on_expr
153
154 Subquery items that were converted into semi-joins are removed from the WHERE
155 clause. (They do remain in PS-saved WHERE clause, and they replace themselves
156 with Item_int(1) on subsequent re-executions).
157
158 3.2 Semi-joins and join optimization
159 ------------------------------------
160
161 3.2.1 Non-merged semi-joins and join optimization
162 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
163 For join optimization purposes, non-merged semi-join nests are similar to
164 base tables. Each such nest is represented by one one JOIN_TAB, which has
165 two possible access strategies:
166 - full table scan (representing SJ-Materialization-Scan strategy)
167 - eq_ref-like table lookup (representing SJ-Materialization-Lookup)
168
169 Unlike regular base tables, non-merged semi-joins have:
170 - non-zero JOIN_TAB::startup_cost, and
171 - join_tab->table->is_filled_at_execution()==TRUE, which means one
172 cannot do const table detection, range analysis or other dataset-dependent
173 optimizations.
174 Instead, get_delayed_table_estimates() will run optimization for the
175 subquery and produce an E(materialized table size).
176
177 3.2.2 Merged semi-joins and join optimization
178 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
179 - optimize_semijoin_nests() does pre-optimization
180 - during join optimization, the join has one JOIN_TAB (or is it POSITION?)
181 array, and suffix-based detection is used, see advance_sj_state()
182 - after join optimization is done, get_best_combination() switches
183 the data-structure to prefix-based, multiple JOIN_TAB ranges format.
184
185 3.3 Semi-joins and query execution
186 ----------------------------------
187 * Join executor has hooks for all semi-join strategies.
188 TODO elaborate.
189
190*/
191
192/*
193EqualityPropagationAndSjmNests
194******************************
195
196Equalities are used for:
197P1. Equality propagation
198P2. Equality substitution [for a certain join order]
199
200The equality propagation is not affected by SJM nests. In fact, it is done
201before we determine the execution plan, i.e. before we even know we will use
202SJM-nests for execution.
203
204The equality substitution is affected.
205
206Substitution without SJMs
207=========================
208When one doesn't have SJM nests, tables have a strict join order:
209
210 --------------------------------->
211 t1 -- t2 -- t3 -- t4 --- t5
212
213
214 ? ^
215 \
216 --(part-of-WHERE)
217
218
219parts WHERE/ON and ref. expressions are attached at some point along the axis.
220Expression is allowed to refer to a table column if the table is to the left of
221the attachment point. For any given expression, we have a goal:
222
223 "Move leftmost allowed attachment point as much as possible to the left"
224
225Substitution with SJMs - task setting
226=====================================
227
228When SJM nests are present, there is no global strict table ordering anymore:
229
230
231 --------------------------------->
232
233 ot1 -- ot2 --- sjm -- ot4 --- ot5
234 |
235 | Main execution
236 - - - - - - - - - - - - - - - - - - - - - - - -
237 | Materialization
238 it1 -- it2 --/
239
240
241Besides that, we must take into account that
242 - values for outer table columns, otN.col, are inaccessible at
243 materialization step (SJM-RULE)
244 - values for inner table columns, itN.col, are inaccessible at Main execution
245 step, except for SJ-Materialization-Scan and columns that are in the
246 subquery's select list. (SJM-RULE)
247
248Substitution with SJMs - solution
249=================================
250
251First, we introduce global strict table ordering like this:
252
253 ot1 - ot2 --\ /--- ot3 -- ot5
254 \--- it1 --- it2 --/
255
256Now, let's see how to meet (SJM-RULE).
257
258SJ-Materialization is only applicable for uncorrelated subqueries. From this, it
259follows that any multiple equality will either
2601. include only columns of outer tables, or
2612. include only columns of inner tables, or
2623. include columns of inner and outer tables, joined together through one
263 of IN-equalities.
264
265Cases #1 and #2 can be handled in the same way as with regular inner joins.
266
267Case #3 requires special handling, so that we don't construct violations of
268(SJM-RULE). Let's consider possible ways to build violations.
269
270Equality propagation starts with the clause in this form
271
272 top_query_where AND subquery_where AND in_equalities
273
274First, it builds multi-equalities. It can also build a mixed multi-equality
275
276 multiple-equal(ot1.col, ot2.col, ... it1.col, itN.col)
277
278Multi-equalities are pushed down the OR-clauses in top_query_where and in
279subquery_where, so it's possible that clauses like this one are built:
280
281 subquery_cond OR (multiple-equal(it1.col, ot1.col,...) AND ...)
282 ^^^^^^^^^^^^^ \
283 | this must be evaluated
284 \- can only be evaluated at the main phase.
285 at the materialization phase
286
287Finally, equality substitution is started. It does two operations:
288
289
2901. Field reference substitution
291~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
292
293(In the code, this is Item_field::replace_equal_field)
294
295This is a process of replacing each reference to "tblX.col"
296with the first element of the multi-equality. (REF-SUBST-ORIG)
297
298This behaviour can cause problems with Semi-join nests. Suppose, we have a
299condition:
300
301 func(it1.col, it2.col)
302
303and a multi-equality(ot1.col, it1.col). Then, reference to "it1.col" will be
304replaced with "ot1.col", constructing a condition
305
306 func(ot1.col, it2.col)
307
308which will be a violation of (SJM-RULE).
309
310In order to avoid this, (REF-SUBST-ORIG) is amended as follows:
311
312- references to tables "itX.col" that are inner wrt some SJM nest, are
313 replaced with references to the first inner table from the same SJM nest.
314
315- references to top-level tables "otX.col" are replaced with references to
316 the first element of the multi-equality, no matter if that first element is
317 a column of a top-level table or of table from some SJM nest.
318 (REF-SUBST-SJM)
319
320 The case where the first element is a table from an SJM nest $SJM is ok,
321 because it can be proven that $SJM uses SJ-Materialization-Scan, and
322 "unpacks" correct column values to the first element during the main
323 execution phase.
324
3252. Item_equal elimination
326~~~~~~~~~~~~~~~~~~~~~~~~~
327(In the code: eliminate_item_equal) This is a process of taking
328
329 multiple-equal(a,b,c,d,e)
330
331and replacing it with an equivalent expression which is an AND of pair-wise
332equalities:
333
334 a=b AND a=c AND ...
335
336The equalities are picked such that for any given join prefix (t1,t2...) the
337subset of equalities that can be evaluated gives the most restrictive
338filtering.
339
340Without SJM nests, it is sufficient to compare every multi-equality member
341with the first one:
342
343 elem1=elem2 AND elem1=elem3 AND elem1=elem4 ...
344
345When SJM nests are present, we should take care not to construct equalities
346that violate the (SJM-RULE). This is achieved by generating separate sets of
347equalites for top-level tables and for inner tables. That is, for the join
348order
349
350 ot1 - ot2 --\ /--- ot3 -- ot5
351 \--- it1 --- it2 --/
352
353we will generate
354 ot1.col=ot2.col
355 ot1.col=ot3.col
356 ot1.col=ot5.col
357 it2.col=it1.col
358
359
3602.1 The problem with Item_equals and ORs
361~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
362As has been mentioned above, multiple equalities are pushed down into OR
363clauses, possibly building clauses like this:
364
365 func(it.col2) OR multiple-equal(it1.col1, it1.col2, ot1.col) (1)
366
367where the first part of the clause has references to inner tables, while the
368second has references to the top-level tables, which is a violation of
369(SJM-RULE).
370
371AND-clauses of this kind do not create problems, because make_cond_for_table()
372will take them apart. OR-clauses will not be split. It is possible to
373split-out the part that's dependent on the inner table:
374
375 func(it.col2) OR it1.col1=it1.col2
376
377but this is a less-restrictive condition than condition (1). Current execution
378scheme will still try to generate the "remainder" condition:
379
380 func(it.col2) OR it1.col1=ot1.col
381
382which is a violation of (SJM-RULE).
383
384QQ: "ot1.col=it1.col" is checked at the upper level. Why was it not removed
385here?
386AA: because has a proper subset of conditions that are found on this level.
387 consider a join order of ot, sjm(it)
388 and a condition
389 ot.col=it.col AND ( ot.col=it.col='foo' OR it.col2='bar')
390
391 we will produce:
392 table ot: nothing
393 table it: ot.col=it.col AND (ot.col='foo' OR it.col2='bar')
394 ^^^^ ^^^^^^^^^^^^^^^^
395 | \ the problem is that
396 | this part condition didnt
397 | receive a substitution
398 |
399 +--- it was correct to subst, 'ot' is
400 the left-most.
401
402
403Does it make sense to push "inner=outer" down into ORs?
404~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
405
406Yes. Consider the query:
407
408 select * from ot
409 where ot.col in (select it.col from it where (it.col='foo' OR it.col='bar'))
410
411here, it may be useful to infer that
412
413 (ot.col='foo' OR ot.col='bar') (CASE-FOR-SUBST)
414
415and attach that condition to the table 'ot'.
416
417Possible solutions for Item_equals and ORs
418~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
419
420Solution #1
421~~~~~~~~~~~
422Let make_cond_for_table() chop analyze the OR clauses it has produced and
423discard them if they violate (SJM-RULE). This solution would allow to handle
424cases like (CASE-FOR-SUBST) at the expense of making semantics of
425make_cond_for_table() complicated.
426
427Solution #2
428~~~~~~~~~~~
429Before the equality propagation phase, none of the OR clauses violate the
430(SJM-RULE). This way, if we remember which tables the original equality
431referred to, we can only generate equalities that refer to the outer (or inner)
432tables. Note that this will disallow handling of cases like (CASE-FOR-SUBST).
433
434Currently, solution #2 is implemented.
435*/
436
437LEX_CSTRING weedout_key= {STRING_WITH_LEN("weedout_key")};
438
439static
440bool subquery_types_allow_materialization(Item_in_subselect *in_subs);
441static bool replace_where_subcondition(JOIN *, Item **, Item *, Item *, bool);
442static int subq_sj_candidate_cmp(Item_in_subselect* el1, Item_in_subselect* el2,
443 void *arg);
444static bool convert_subq_to_sj(JOIN *parent_join, Item_in_subselect *subq_pred);
445static bool convert_subq_to_jtbm(JOIN *parent_join,
446 Item_in_subselect *subq_pred, bool *remove);
447static TABLE_LIST *alloc_join_nest(THD *thd);
448static uint get_tmp_table_rec_length(Ref_ptr_array p_list, uint elements);
449bool find_eq_ref_candidate(TABLE *table, table_map sj_inner_tables);
450static SJ_MATERIALIZATION_INFO *
451at_sjmat_pos(const JOIN *join, table_map remaining_tables, const JOIN_TAB *tab,
452 uint idx, bool *loose_scan);
453void best_access_path(JOIN *join, JOIN_TAB *s,
454 table_map remaining_tables, uint idx,
455 bool disable_jbuf, double record_count,
456 POSITION *pos, POSITION *loose_scan_pos);
457
458static Item *create_subq_in_equalities(THD *thd, SJ_MATERIALIZATION_INFO *sjm,
459 Item_in_subselect *subq_pred);
460static bool remove_sj_conds(THD *thd, Item **tree);
461static bool is_cond_sj_in_equality(Item *item);
462static bool sj_table_is_included(JOIN *join, JOIN_TAB *join_tab);
463static Item *remove_additional_cond(Item* conds);
464static void remove_subq_pushed_predicates(JOIN *join, Item **where);
465
466enum_nested_loop_state
467end_sj_materialize(JOIN *join, JOIN_TAB *join_tab, bool end_of_records);
468
469
470/*
471 Check if Materialization strategy is allowed for given subquery predicate.
472
473 @param thd Thread handle
474 @param in_subs The subquery predicate
475 @param child_select The select inside predicate (the function will
476 check it is the only one)
477
478 @return TRUE - Materialization is applicable
479 FALSE - Otherwise
480*/
481
482bool is_materialization_applicable(THD *thd, Item_in_subselect *in_subs,
483 st_select_lex *child_select)
484{
485 st_select_lex_unit* parent_unit= child_select->master_unit();
486 /*
487 Check if the subquery predicate can be executed via materialization.
488 The required conditions are:
489 0. The materialization optimizer switch was set.
490 1. Subquery is a single SELECT (not a UNION).
491 TODO: this is a limitation that can be fixed
492 2. Subquery is not a table-less query. In this case there is no
493 point in materializing.
494 2A The upper query is not a table-less SELECT ... FROM DUAL. We
495 can't do materialization for SELECT .. FROM DUAL because it
496 does not call setup_subquery_materialization(). We could make
497 SELECT ... FROM DUAL call that function but that doesn't seem
498 to be the case that is worth handling.
499 3. Either the subquery predicate is a top-level predicate, or at
500 least one partial match strategy is enabled. If no partial match
501 strategy is enabled, then materialization cannot be used for
502 non-top-level queries because it cannot handle NULLs correctly.
503 4. Subquery is non-correlated
504 TODO:
505 This condition is too restrictive (limitation). It can be extended to:
506 (Subquery is non-correlated ||
507 Subquery is correlated to any query outer to IN predicate ||
508 (Subquery is correlated to the immediate outer query &&
509 Subquery !contains {GROUP BY, ORDER BY [LIMIT],
510 aggregate functions}) && subquery predicate is not under "NOT IN"))
511 5. Subquery does not contain recursive references
512
513 A note about prepared statements: we want the if-branch to be taken on
514 PREPARE and each EXECUTE. The rewrites are only done once, but we need
515 select_lex->sj_subselects list to be populated for every EXECUTE.
516
517 */
518 if (optimizer_flag(thd, OPTIMIZER_SWITCH_MATERIALIZATION) && // 0
519 !child_select->is_part_of_union() && // 1
520 parent_unit->first_select()->leaf_tables.elements && // 2
521 child_select->outer_select()->leaf_tables.elements && // 2A
522 subquery_types_allow_materialization(in_subs) &&
523 (in_subs->is_top_level_item() || //3
524 optimizer_flag(thd,
525 OPTIMIZER_SWITCH_PARTIAL_MATCH_ROWID_MERGE) || //3
526 optimizer_flag(thd,
527 OPTIMIZER_SWITCH_PARTIAL_MATCH_TABLE_SCAN)) && //3
528 !in_subs->is_correlated && //4
529 !in_subs->with_recursive_reference) //5
530 {
531 return TRUE;
532 }
533 return FALSE;
534}
535
536
537/*
538 Check if we need JOIN::prepare()-phase subquery rewrites and if yes, do them
539
540 SYNOPSIS
541 check_and_do_in_subquery_rewrites()
542 join Subquery's join
543
544 DESCRIPTION
545 Check if we need to do
546 - subquery -> mergeable semi-join rewrite
547 - if the subquery can be handled with materialization
548 - 'substitution' rewrite for table-less subqueries like "(select 1)"
549 - IN->EXISTS rewrite
550 and, depending on the rewrite, either do it, or record it to be done at a
551 later phase.
552
553 RETURN
554 0 - OK
555 Other - Some sort of query error
556*/
557
558int check_and_do_in_subquery_rewrites(JOIN *join)
559{
560 THD *thd=join->thd;
561 st_select_lex *select_lex= join->select_lex;
562 st_select_lex_unit* parent_unit= select_lex->master_unit();
563 DBUG_ENTER("check_and_do_in_subquery_rewrites");
564
565 /*
566 IN/ALL/ANY rewrites are not applicable for so called fake select
567 (this select exists only to filter results of union if it is needed).
568 */
569 if (select_lex == select_lex->master_unit()->fake_select_lex)
570 DBUG_RETURN(0);
571
572 /*
573 If
574 1) this join is inside a subquery (of any type except FROM-clause
575 subquery) and
576 2) we aren't just normalizing a VIEW
577
578 Then perform early unconditional subquery transformations:
579 - Convert subquery predicate into semi-join, or
580 - Mark the subquery for execution using materialization, or
581 - Perform IN->EXISTS transformation, or
582 - Perform more/less ALL/ANY -> MIN/MAX rewrite
583 - Substitute trivial scalar-context subquery with its value
584
585 TODO: for PS, make the whole block execute only on the first execution
586 */
587 Item_subselect *subselect;
588 if (!thd->lex->is_view_context_analysis() && // (1)
589 (subselect= parent_unit->item)) // (2)
590 {
591 Item_in_subselect *in_subs= NULL;
592 Item_allany_subselect *allany_subs= NULL;
593 switch (subselect->substype()) {
594 case Item_subselect::IN_SUBS:
595 in_subs= (Item_in_subselect *)subselect;
596 break;
597 case Item_subselect::ALL_SUBS:
598 case Item_subselect::ANY_SUBS:
599 allany_subs= (Item_allany_subselect *)subselect;
600 break;
601 default:
602 break;
603 }
604
605
606 /* Resolve expressions and perform semantic analysis for IN query */
607 if (in_subs != NULL)
608 /*
609 TODO: Add the condition below to this if statement when we have proper
610 support for is_correlated handling for materialized semijoins.
611 If we were to add this condition now, the fix_fields() call in
612 convert_subq_to_sj() would force the flag is_correlated to be set
613 erroneously for prepared queries.
614
615 thd->stmt_arena->state != Query_arena::PREPARED)
616 */
617 {
618 SELECT_LEX *current= thd->lex->current_select;
619 thd->lex->current_select= current->return_after_parsing();
620 char const *save_where= thd->where;
621 thd->where= "IN/ALL/ANY subquery";
622
623 bool failure= !in_subs->left_expr->fixed &&
624 in_subs->left_expr->fix_fields(thd, &in_subs->left_expr);
625 thd->lex->current_select= current;
626 thd->where= save_where;
627 if (failure)
628 DBUG_RETURN(-1); /* purecov: deadcode */
629
630 /*
631 Check if the left and right expressions have the same # of
632 columns, i.e. we don't have a case like
633 (oe1, oe2) IN (SELECT ie1, ie2, ie3 ...)
634
635 TODO why do we have this duplicated in IN->EXISTS transformers?
636 psergey-todo: fix these: grep for duplicated_subselect_card_check
637 */
638 if (select_lex->item_list.elements != in_subs->left_expr->cols())
639 {
640 my_error(ER_OPERAND_COLUMNS, MYF(0), in_subs->left_expr->cols());
641 DBUG_RETURN(-1);
642 }
643 }
644
645 DBUG_PRINT("info", ("Checking if subq can be converted to semi-join"));
646 /*
647 Check if we're in subquery that is a candidate for flattening into a
648 semi-join (which is done in flatten_subqueries()). The
649 requirements are:
650 1. Subquery predicate is an IN/=ANY subq predicate
651 2. Subquery is a single SELECT (not a UNION)
652 3. Subquery does not have GROUP BY or ORDER BY
653 4. Subquery does not use aggregate functions or HAVING
654 5. Subquery predicate is at the AND-top-level of ON/WHERE clause
655 6. We are not in a subquery of a single table UPDATE/DELETE that
656 doesn't have a JOIN (TODO: We should handle this at some
657 point by switching to multi-table UPDATE/DELETE)
658 7. We're not in a table-less subquery like "SELECT 1"
659 8. No execution method was already chosen (by a prepared statement)
660 9. Parent select is not a table-less select
661 10. Neither parent nor child select have STRAIGHT_JOIN option.
662 11. It is first optimisation (the subquery could be moved from ON
663 clause during first optimisation and then be considered for SJ
664 on the second when it is too late)
665 */
666 if (optimizer_flag(thd, OPTIMIZER_SWITCH_SEMIJOIN) &&
667 in_subs && // 1
668 !select_lex->is_part_of_union() && // 2
669 !select_lex->group_list.elements && !join->order && // 3
670 !join->having && !select_lex->with_sum_func && // 4
671 in_subs->emb_on_expr_nest && // 5
672 select_lex->outer_select()->join && // 6
673 parent_unit->first_select()->leaf_tables.elements && // 7
674 !in_subs->has_strategy() && // 8
675 select_lex->outer_select()->leaf_tables.elements && // 9
676 !((join->select_options | // 10
677 select_lex->outer_select()->join->select_options) // 10
678 & SELECT_STRAIGHT_JOIN) && // 10
679 select_lex->first_cond_optimization) // 11
680 {
681 DBUG_PRINT("info", ("Subquery is semi-join conversion candidate"));
682
683 (void)subquery_types_allow_materialization(in_subs);
684
685 in_subs->is_flattenable_semijoin= TRUE;
686
687 /* Register the subquery for further processing in flatten_subqueries() */
688 if (!in_subs->is_registered_semijoin)
689 {
690 Query_arena *arena, backup;
691 arena= thd->activate_stmt_arena_if_needed(&backup);
692 select_lex->outer_select()->sj_subselects.push_back(in_subs,
693 thd->mem_root);
694 if (arena)
695 thd->restore_active_arena(arena, &backup);
696 in_subs->is_registered_semijoin= TRUE;
697 }
698 }
699 else
700 {
701 DBUG_PRINT("info", ("Subquery can't be converted to merged semi-join"));
702 /* Test if the user has set a legal combination of optimizer switches. */
703 if (!optimizer_flag(thd, OPTIMIZER_SWITCH_IN_TO_EXISTS) &&
704 !optimizer_flag(thd, OPTIMIZER_SWITCH_MATERIALIZATION))
705 my_error(ER_ILLEGAL_SUBQUERY_OPTIMIZER_SWITCHES, MYF(0));
706 /*
707 Transform each subquery predicate according to its overloaded
708 transformer.
709 */
710 if (subselect->select_transformer(join))
711 DBUG_RETURN(-1);
712
713 /*
714 If the subquery predicate is IN/=ANY, analyse and set all possible
715 subquery execution strategies based on optimizer switches and syntactic
716 properties.
717 */
718 if (in_subs && !in_subs->has_strategy())
719 {
720 if (is_materialization_applicable(thd, in_subs, select_lex))
721 {
722 in_subs->add_strategy(SUBS_MATERIALIZATION);
723
724 /*
725 If the subquery is an AND-part of WHERE register for being processed
726 with jtbm strategy
727 */
728 if (in_subs->emb_on_expr_nest == NO_JOIN_NEST &&
729 optimizer_flag(thd, OPTIMIZER_SWITCH_SEMIJOIN))
730 {
731 in_subs->is_flattenable_semijoin= FALSE;
732 if (!in_subs->is_registered_semijoin)
733 {
734 Query_arena *arena, backup;
735 arena= thd->activate_stmt_arena_if_needed(&backup);
736 select_lex->outer_select()->sj_subselects.push_back(in_subs,
737 thd->mem_root);
738 if (arena)
739 thd->restore_active_arena(arena, &backup);
740 in_subs->is_registered_semijoin= TRUE;
741 }
742 }
743 }
744
745 /*
746 IN-TO-EXISTS is the only universal strategy. Choose it if the user
747 allowed it via an optimizer switch, or if materialization is not
748 possible.
749 */
750 if (optimizer_flag(thd, OPTIMIZER_SWITCH_IN_TO_EXISTS) ||
751 !in_subs->has_strategy())
752 in_subs->add_strategy(SUBS_IN_TO_EXISTS);
753 }
754
755 /* Check if max/min optimization applicable */
756 if (allany_subs && !allany_subs->is_set_strategy())
757 {
758 uchar strategy= (allany_subs->is_maxmin_applicable(join) ?
759 (SUBS_MAXMIN_INJECTED | SUBS_MAXMIN_ENGINE) :
760 SUBS_IN_TO_EXISTS);
761 allany_subs->add_strategy(strategy);
762 }
763
764 }
765 }
766 DBUG_RETURN(0);
767}
768
769
770/**
771 @brief Check if subquery's compared types allow materialization.
772
773 @param in_subs Subquery predicate, updated as follows:
774 types_allow_materialization TRUE if subquery materialization is allowed.
775 sjm_scan_allowed If types_allow_materialization is TRUE,
776 indicates whether it is possible to use subquery
777 materialization and scan the materialized table.
778
779 @retval TRUE If subquery types allow materialization.
780 @retval FALSE Otherwise.
781
782 @details
783 This is a temporary fix for BUG#36752.
784
785 There are two subquery materialization strategies:
786
787 1. Materialize and do index lookups in the materialized table. See
788 BUG#36752 for description of restrictions we need to put on the
789 compared expressions.
790
791 2. Materialize and then do a full scan of the materialized table. At the
792 moment, this strategy's applicability criteria are even stricter than
793 in #1.
794
795 This is so because of the following: consider an uncorrelated subquery
796
797 ...WHERE (ot1.col1, ot2.col2 ...) IN (SELECT ie1,ie2,... FROM it1 ...)
798
799 and a join order that could be used to do sjm-materialization:
800
801 SJM-Scan(it1, it1), ot1, ot2
802
803 IN-equalities will be parts of conditions attached to the outer tables:
804
805 ot1: ot1.col1 = ie1 AND ... (C1)
806 ot2: ot1.col2 = ie2 AND ... (C2)
807
808 besides those there may be additional references to ie1 and ie2
809 generated by equality propagation. The problem with evaluating C1 and
810 C2 is that ie{1,2} refer to subquery tables' columns, while we only have
811 current value of materialization temptable. Our solution is to
812 * require that all ie{N} are table column references. This allows
813 to copy the values of materialization temptable columns to the
814 original table's columns (see setup_sj_materialization for more
815 details)
816 * require that compared columns have exactly the same type. This is
817 a temporary measure to avoid BUG#36752-type problems.
818*/
819
820static
821bool subquery_types_allow_materialization(Item_in_subselect *in_subs)
822{
823 DBUG_ENTER("subquery_types_allow_materialization");
824
825 DBUG_ASSERT(in_subs->left_expr->fixed);
826
827 List_iterator<Item> it(in_subs->unit->first_select()->item_list);
828 uint elements= in_subs->unit->first_select()->item_list.elements;
829
830 in_subs->types_allow_materialization= FALSE; // Assign default values
831 in_subs->sjm_scan_allowed= FALSE;
832
833 bool all_are_fields= TRUE;
834 uint32 total_key_length = 0;
835 for (uint i= 0; i < elements; i++)
836 {
837 Item *outer= in_subs->left_expr->element_index(i);
838 Item *inner= it++;
839 all_are_fields &= (outer->real_item()->type() == Item::FIELD_ITEM &&
840 inner->real_item()->type() == Item::FIELD_ITEM);
841 total_key_length += inner->max_length;
842 if (!inner->type_handler()->subquery_type_allows_materialization(inner,
843 outer))
844 DBUG_RETURN(FALSE);
845 }
846
847 /*
848 Make sure that create_tmp_table will not fail due to too long keys.
849 See MDEV-7122. This check is performed inside create_tmp_table also and
850 we must do it so that we know the table has keys created.
851 Make sure that the length of the key for the temp_table is atleast
852 greater than 0.
853 */
854 if (!total_key_length || total_key_length > tmp_table_max_key_length() ||
855 elements > tmp_table_max_key_parts())
856 DBUG_RETURN(FALSE);
857
858 in_subs->types_allow_materialization= TRUE;
859 in_subs->sjm_scan_allowed= all_are_fields;
860 DBUG_PRINT("info",("subquery_types_allow_materialization: ok, allowed"));
861 DBUG_RETURN(TRUE);
862}
863
864
865/**
866 Apply max min optimization of all/any subselect
867*/
868
869bool JOIN::transform_max_min_subquery()
870{
871 DBUG_ENTER("JOIN::transform_max_min_subquery");
872 Item_subselect *subselect= unit->item;
873 if (!subselect || (subselect->substype() != Item_subselect::ALL_SUBS &&
874 subselect->substype() != Item_subselect::ANY_SUBS))
875 DBUG_RETURN(0);
876 DBUG_RETURN(((Item_allany_subselect *) subselect)->
877 transform_into_max_min(this));
878}
879
880
881/*
882 Finalize IN->EXISTS conversion in case we couldn't use materialization.
883
884 DESCRIPTION Invoke the IN->EXISTS converter
885 Replace the Item_in_subselect with its wrapper Item_in_optimizer in WHERE.
886
887 RETURN
888 FALSE - Ok
889 TRUE - Fatal error
890*/
891
892bool make_in_exists_conversion(THD *thd, JOIN *join, Item_in_subselect *item)
893{
894 DBUG_ENTER("make_in_exists_conversion");
895 JOIN *child_join= item->unit->first_select()->join;
896 bool res;
897
898 /*
899 We're going to finalize IN->EXISTS conversion.
900 Normally, IN->EXISTS conversion takes place inside the
901 Item_subselect::fix_fields() call, where item_subselect->fixed==FALSE (as
902 fix_fields() haven't finished yet) and item_subselect->changed==FALSE (as
903 the conversion haven't been finalized)
904
905 At the end of Item_subselect::fix_fields() we had to set fixed=TRUE,
906 changed=TRUE (the only other option would have been to return error).
907
908 So, now we have to set these back for the duration of select_transformer()
909 call.
910 */
911 item->changed= 0;
912 item->fixed= 0;
913
914 SELECT_LEX *save_select_lex= thd->lex->current_select;
915 thd->lex->current_select= item->unit->first_select();
916
917 res= item->select_transformer(child_join);
918
919 thd->lex->current_select= save_select_lex;
920
921 if (res)
922 DBUG_RETURN(TRUE);
923
924 item->changed= 1;
925 item->fixed= 1;
926
927 Item *substitute= item->substitution;
928 bool do_fix_fields= !item->substitution->fixed;
929 /*
930 The Item_subselect has already been wrapped with Item_in_optimizer, so we
931 should search for item->optimizer, not 'item'.
932 */
933 Item *replace_me= item->optimizer;
934 DBUG_ASSERT(replace_me==substitute);
935
936 Item **tree= (item->emb_on_expr_nest == NO_JOIN_NEST)?
937 &join->conds : &(item->emb_on_expr_nest->on_expr);
938 if (replace_where_subcondition(join, tree, replace_me, substitute,
939 do_fix_fields))
940 DBUG_RETURN(TRUE);
941 item->substitution= NULL;
942
943 /*
944 If this is a prepared statement, repeat the above operation for
945 prep_where (or prep_on_expr).
946 */
947 if (!thd->stmt_arena->is_conventional())
948 {
949 tree= (item->emb_on_expr_nest == (TABLE_LIST*)NO_JOIN_NEST)?
950 &join->select_lex->prep_where :
951 &(item->emb_on_expr_nest->prep_on_expr);
952
953 if (replace_where_subcondition(join, tree, replace_me, substitute,
954 FALSE))
955 DBUG_RETURN(TRUE);
956 }
957 DBUG_RETURN(FALSE);
958}
959
960
961bool check_for_outer_joins(List<TABLE_LIST> *join_list)
962{
963 TABLE_LIST *table;
964 NESTED_JOIN *nested_join;
965 List_iterator<TABLE_LIST> li(*join_list);
966 while ((table= li++))
967 {
968 if ((nested_join= table->nested_join))
969 {
970 if (check_for_outer_joins(&nested_join->join_list))
971 return TRUE;
972 }
973
974 if (table->outer_join)
975 return TRUE;
976 }
977 return FALSE;
978}
979
980
981void find_and_block_conversion_to_sj(Item *to_find,
982 List_iterator_fast<Item_in_subselect> &li)
983{
984 if (to_find->type() == Item::FUNC_ITEM &&
985 ((Item_func*)to_find)->functype() == Item_func::IN_OPTIMIZER_FUNC)
986 to_find= ((Item_in_optimizer*)to_find)->get_wrapped_in_subselect_item();
987
988 if (to_find->type() != Item::SUBSELECT_ITEM ||
989 ((Item_subselect *) to_find)->substype() != Item_subselect::IN_SUBS)
990 return;
991 Item_in_subselect *in_subq;
992 li.rewind();
993 while ((in_subq= li++))
994 {
995 if (in_subq == to_find)
996 {
997 in_subq->block_conversion_to_sj();
998 return;
999 }
1000 }
1001}
1002
1003
1004/*
1005 Convert semi-join subquery predicates into semi-join join nests
1006
1007 SYNOPSIS
1008 convert_join_subqueries_to_semijoins()
1009
1010 DESCRIPTION
1011
1012 Convert candidate subquery predicates into semi-join join nests. This
1013 transformation is performed once in query lifetime and is irreversible.
1014
1015 Conversion of one subquery predicate
1016 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1017 We start with a join that has a semi-join subquery:
1018
1019 SELECT ...
1020 FROM ot, ...
1021 WHERE oe IN (SELECT ie FROM it1 ... itN WHERE subq_where) AND outer_where
1022
1023 and convert it into a semi-join nest:
1024
1025 SELECT ...
1026 FROM ot SEMI JOIN (it1 ... itN), ...
1027 WHERE outer_where AND subq_where AND oe=ie
1028
1029 that is, in order to do the conversion, we need to
1030
1031 * Create the "SEMI JOIN (it1 .. itN)" part and add it into the parent
1032 query's FROM structure.
1033 * Add "AND subq_where AND oe=ie" into parent query's WHERE (or ON if
1034 the subquery predicate was in an ON expression)
1035 * Remove the subquery predicate from the parent query's WHERE
1036
1037 Considerations when converting many predicates
1038 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1039 A join may have at most MAX_TABLES tables. This may prevent us from
1040 flattening all subqueries when the total number of tables in parent and
1041 child selects exceeds MAX_TABLES.
1042 We deal with this problem by flattening children's subqueries first and
1043 then using a heuristic rule to determine each subquery predicate's
1044 "priority".
1045
1046 RETURN
1047 FALSE OK
1048 TRUE Error
1049*/
1050
1051bool convert_join_subqueries_to_semijoins(JOIN *join)
1052{
1053 Query_arena *arena, backup;
1054 Item_in_subselect *in_subq;
1055 THD *thd= join->thd;
1056 DBUG_ENTER("convert_join_subqueries_to_semijoins");
1057
1058 if (join->select_lex->sj_subselects.is_empty())
1059 DBUG_RETURN(FALSE);
1060
1061 List_iterator_fast<Item_in_subselect> li(join->select_lex->sj_subselects);
1062
1063 while ((in_subq= li++))
1064 {
1065 SELECT_LEX *subq_sel= in_subq->get_select_lex();
1066 if (subq_sel->handle_derived(thd->lex, DT_MERGE))
1067 DBUG_RETURN(TRUE);
1068 if (subq_sel->join->transform_in_predicates_into_in_subq(thd))
1069 DBUG_RETURN(TRUE);
1070 subq_sel->update_used_tables();
1071 }
1072
1073 /*
1074 Check all candidates to semi-join conversion that occur
1075 in ON expressions of outer join. Set the flag blocking
1076 this conversion for them.
1077 */
1078 TABLE_LIST *tbl;
1079 List_iterator<TABLE_LIST> ti(join->select_lex->leaf_tables);
1080 while ((tbl= ti++))
1081 {
1082 TABLE_LIST *embedded;
1083 TABLE_LIST *embedding= tbl;
1084 do
1085 {
1086 embedded= embedding;
1087 bool block_conversion_to_sj= false;
1088 if (embedded->on_expr)
1089 {
1090 /*
1091 Conversion of an IN subquery predicate into semi-join
1092 is blocked now if the predicate occurs:
1093 - in the ON expression of an outer join
1094 - in the ON expression of an inner join embedded directly
1095 or indirectly in the inner nest of an outer join
1096 */
1097 for (TABLE_LIST *tl= embedded; tl; tl= tl->embedding)
1098 {
1099 if (tl->outer_join)
1100 {
1101 block_conversion_to_sj= true;
1102 break;
1103 }
1104 }
1105 }
1106 if (block_conversion_to_sj)
1107 {
1108 Item *cond= embedded->on_expr;
1109 if (!cond)
1110 ;
1111 else if (cond->type() != Item::COND_ITEM)
1112 find_and_block_conversion_to_sj(cond, li);
1113 else if (((Item_cond*) cond)->functype() ==
1114 Item_func::COND_AND_FUNC)
1115 {
1116 Item *item;
1117 List_iterator<Item> it(*(((Item_cond*) cond)->argument_list()));
1118 while ((item= it++))
1119 {
1120 find_and_block_conversion_to_sj(item, li);
1121 }
1122 }
1123 }
1124 embedding= embedded->embedding;
1125 }
1126 while (embedding &&
1127 embedding->nested_join->join_list.head() == embedded);
1128 }
1129
1130 /*
1131 Block conversion to semi-joins for those candidates that
1132 are encountered in the WHERE condition of the multi-table view
1133 with CHECK OPTION if this view is used in UPDATE/DELETE.
1134 (This limitation can be, probably, easily lifted.)
1135 */
1136 li.rewind();
1137 while ((in_subq= li++))
1138 {
1139 if (in_subq->emb_on_expr_nest != NO_JOIN_NEST &&
1140 in_subq->emb_on_expr_nest->effective_with_check)
1141 {
1142 in_subq->block_conversion_to_sj();
1143 }
1144 }
1145
1146 if (join->select_options & SELECT_STRAIGHT_JOIN)
1147 {
1148 /* Block conversion to semijoins for all candidates */
1149 li.rewind();
1150 while ((in_subq= li++))
1151 {
1152 in_subq->block_conversion_to_sj();
1153 }
1154 }
1155
1156 li.rewind();
1157 /* First, convert child join's subqueries. We proceed bottom-up here */
1158 while ((in_subq= li++))
1159 {
1160 st_select_lex *child_select= in_subq->get_select_lex();
1161 JOIN *child_join= child_select->join;
1162 child_join->outer_tables = child_join->table_count;
1163
1164 /*
1165 child_select->where contains only the WHERE predicate of the
1166 subquery itself here. We may be selecting from a VIEW, which has its
1167 own predicate. The combined predicates are available in child_join->conds,
1168 which was built by setup_conds() doing prepare_where() for all views.
1169 */
1170 child_select->where= child_join->conds;
1171
1172 if (convert_join_subqueries_to_semijoins(child_join))
1173 DBUG_RETURN(TRUE);
1174
1175
1176 in_subq->sj_convert_priority=
1177 MY_TEST(in_subq->do_not_convert_to_sj) * MAX_TABLES * 2 +
1178 in_subq->is_correlated * MAX_TABLES + child_join->outer_tables;
1179 }
1180
1181 // Temporary measure: disable semi-joins when they are together with outer
1182 // joins.
1183#if 0
1184 if (check_for_outer_joins(join->join_list))
1185 {
1186 in_subq= join->select_lex->sj_subselects.head();
1187 arena= thd->activate_stmt_arena_if_needed(&backup);
1188 goto skip_conversion;
1189 }
1190#endif
1191 //dump_TABLE_LIST_struct(select_lex, select_lex->leaf_tables);
1192 /*
1193 2. Pick which subqueries to convert:
1194 sort the subquery array
1195 - prefer correlated subqueries over uncorrelated;
1196 - prefer subqueries that have greater number of outer tables;
1197 */
1198 bubble_sort<Item_in_subselect>(&join->select_lex->sj_subselects,
1199 subq_sj_candidate_cmp, NULL);
1200 // #tables-in-parent-query + #tables-in-subquery < MAX_TABLES
1201 /* Replace all subqueries to be flattened with Item_int(1) */
1202 arena= thd->activate_stmt_arena_if_needed(&backup);
1203
1204 li.rewind();
1205 while ((in_subq= li++))
1206 {
1207 bool remove_item= TRUE;
1208
1209 /* Stop processing if we've reached a subquery that's attached to the ON clause */
1210 if (in_subq->do_not_convert_to_sj)
1211 break;
1212
1213 if (in_subq->is_flattenable_semijoin)
1214 {
1215 if (join->table_count +
1216 in_subq->unit->first_select()->join->table_count >= MAX_TABLES)
1217 break;
1218 if (convert_subq_to_sj(join, in_subq))
1219 goto restore_arena_and_fail;
1220 }
1221 else
1222 {
1223 if (join->table_count + 1 >= MAX_TABLES)
1224 break;
1225 if (convert_subq_to_jtbm(join, in_subq, &remove_item))
1226 goto restore_arena_and_fail;
1227 }
1228 if (remove_item)
1229 {
1230 Item **tree= (in_subq->emb_on_expr_nest == NO_JOIN_NEST)?
1231 &join->conds : &(in_subq->emb_on_expr_nest->on_expr);
1232 Item *replace_me= in_subq->original_item();
1233 if (replace_where_subcondition(join, tree, replace_me,
1234 new (thd->mem_root) Item_int(thd, 1),
1235 FALSE))
1236 goto restore_arena_and_fail;
1237 }
1238 }
1239//skip_conversion:
1240 /*
1241 3. Finalize (perform IN->EXISTS rewrite) the subqueries that we didn't
1242 convert:
1243 */
1244 while (in_subq)
1245 {
1246 JOIN *child_join= in_subq->unit->first_select()->join;
1247 in_subq->changed= 0;
1248 in_subq->fixed= 0;
1249
1250 SELECT_LEX *save_select_lex= thd->lex->current_select;
1251 thd->lex->current_select= in_subq->unit->first_select();
1252
1253 bool res= in_subq->select_transformer(child_join);
1254
1255 thd->lex->current_select= save_select_lex;
1256
1257 if (res)
1258 DBUG_RETURN(TRUE);
1259
1260 in_subq->changed= 1;
1261 in_subq->fixed= 1;
1262
1263 Item *substitute= in_subq->substitution;
1264 bool do_fix_fields= !in_subq->substitution->fixed;
1265 Item **tree= (in_subq->emb_on_expr_nest == NO_JOIN_NEST)?
1266 &join->conds : &(in_subq->emb_on_expr_nest->on_expr);
1267 Item *replace_me= in_subq->original_item();
1268 if (replace_where_subcondition(join, tree, replace_me, substitute,
1269 do_fix_fields))
1270 DBUG_RETURN(TRUE);
1271 in_subq->substitution= NULL;
1272 /*
1273 If this is a prepared statement, repeat the above operation for
1274 prep_where (or prep_on_expr). Subquery-to-semijoin conversion is
1275 done once for prepared statement.
1276 */
1277 if (!thd->stmt_arena->is_conventional())
1278 {
1279 tree= (in_subq->emb_on_expr_nest == NO_JOIN_NEST)?
1280 &join->select_lex->prep_where :
1281 &(in_subq->emb_on_expr_nest->prep_on_expr);
1282 /*
1283 prep_on_expr/ prep_where may be NULL in some cases.
1284 If that is the case, do nothing - simplify_joins() will copy
1285 ON/WHERE expression into prep_on_expr/prep_where.
1286 */
1287 if (*tree && replace_where_subcondition(join, tree, replace_me, substitute,
1288 FALSE))
1289 DBUG_RETURN(TRUE);
1290 }
1291 /*
1292 Revert to the IN->EXISTS strategy in the rare case when the subquery could
1293 not be flattened.
1294 */
1295 in_subq->reset_strategy(SUBS_IN_TO_EXISTS);
1296 if (is_materialization_applicable(thd, in_subq,
1297 in_subq->unit->first_select()))
1298 {
1299 in_subq->add_strategy(SUBS_MATERIALIZATION);
1300 }
1301
1302 in_subq= li++;
1303 }
1304
1305 if (arena)
1306 thd->restore_active_arena(arena, &backup);
1307 join->select_lex->sj_subselects.empty();
1308 DBUG_RETURN(FALSE);
1309
1310restore_arena_and_fail:
1311 if (arena)
1312 thd->restore_active_arena(arena, &backup);
1313 DBUG_RETURN(TRUE);
1314}
1315
1316
1317/*
1318 Get #output_rows and scan_time estimates for a "delayed" table.
1319
1320 SYNOPSIS
1321 get_delayed_table_estimates()
1322 table IN Table to get estimates for
1323 out_rows OUT E(#rows in the table)
1324 scan_time OUT E(scan_time).
1325 startup_cost OUT cost to populate the table.
1326
1327 DESCRIPTION
1328 Get #output_rows and scan_time estimates for a "delayed" table. By
1329 "delayed" here we mean that the table is filled at the start of query
1330 execution. This means that the optimizer can't use table statistics to
1331 get #rows estimate for it, it has to call this function instead.
1332
1333 This function is expected to make different actions depending on the nature
1334 of the table. At the moment there is only one kind of delayed tables,
1335 non-flattenable semi-joins.
1336*/
1337
1338void get_delayed_table_estimates(TABLE *table,
1339 ha_rows *out_rows,
1340 double *scan_time,
1341 double *startup_cost)
1342{
1343 Item_in_subselect *item= table->pos_in_table_list->jtbm_subselect;
1344
1345 DBUG_ASSERT(item->engine->engine_type() ==
1346 subselect_engine::HASH_SJ_ENGINE);
1347
1348 subselect_hash_sj_engine *hash_sj_engine=
1349 ((subselect_hash_sj_engine*)item->engine);
1350
1351 *out_rows= (ha_rows)item->jtbm_record_count;
1352 *startup_cost= item->jtbm_read_time;
1353
1354 /* Calculate cost of scanning the temptable */
1355 double data_size= item->jtbm_record_count *
1356 hash_sj_engine->tmp_table->s->reclength;
1357 /* Do like in handler::read_time */
1358 *scan_time= data_size/IO_SIZE + 2;
1359}
1360
1361
1362/**
1363 @brief Replaces an expression destructively inside the expression tree of
1364 the WHERE clase.
1365
1366 @note We substitute AND/OR structure because it was copied by
1367 copy_andor_structure and some changes could be done in the copy but
1368 should be left permanent, also there could be several layers of AND over
1369 AND and OR over OR because ::fix_field() possibly is not called.
1370
1371 @param join The top-level query.
1372 @param old_cond The expression to be replaced.
1373 @param new_cond The expression to be substituted.
1374 @param do_fix_fields If true, Item::fix_fields(THD*, Item**) is called for
1375 the new expression.
1376 @return <code>true</code> if there was an error, <code>false</code> if
1377 successful.
1378*/
1379
1380static bool replace_where_subcondition(JOIN *join, Item **expr,
1381 Item *old_cond, Item *new_cond,
1382 bool do_fix_fields)
1383{
1384 if (*expr == old_cond)
1385 {
1386 *expr= new_cond;
1387 if (do_fix_fields)
1388 new_cond->fix_fields(join->thd, expr);
1389 return FALSE;
1390 }
1391
1392 if ((*expr)->type() == Item::COND_ITEM)
1393 {
1394 List_iterator<Item> li(*((Item_cond*)(*expr))->argument_list());
1395 Item *item;
1396 while ((item= li++))
1397 {
1398 if (item == old_cond)
1399 {
1400 li.replace(new_cond);
1401 if (do_fix_fields)
1402 new_cond->fix_fields(join->thd, li.ref());
1403 return FALSE;
1404 }
1405 else if (item->type() == Item::COND_ITEM)
1406 {
1407 replace_where_subcondition(join, li.ref(),
1408 old_cond, new_cond,
1409 do_fix_fields);
1410 }
1411 }
1412 }
1413 /*
1414 We can come to here when
1415 - we're doing replace operations on both on_expr and prep_on_expr
1416 - on_expr is the same as prep_on_expr, or they share a sub-tree
1417 (so, when we do replace in on_expr, we replace in prep_on_expr, too,
1418 and when we try doing a replace in prep_on_expr, the item we wanted
1419 to replace there has already been replaced)
1420 */
1421 return FALSE;
1422}
1423
1424static int subq_sj_candidate_cmp(Item_in_subselect* el1, Item_in_subselect* el2,
1425 void *arg)
1426{
1427 return (el1->sj_convert_priority > el2->sj_convert_priority) ? -1 :
1428 ( (el1->sj_convert_priority == el2->sj_convert_priority)? 0 : 1);
1429}
1430
1431
1432/*
1433 Convert a subquery predicate into a TABLE_LIST semi-join nest
1434
1435 SYNOPSIS
1436 convert_subq_to_sj()
1437 parent_join Parent join, the one that has subq_pred in its WHERE/ON
1438 clause
1439 subq_pred Subquery predicate to be converted
1440
1441 DESCRIPTION
1442 Convert a subquery predicate into a TABLE_LIST semi-join nest. All the
1443 prerequisites are already checked, so the conversion is always successfull.
1444
1445 Prepared Statements: the transformation is permanent:
1446 - Changes in TABLE_LIST structures are naturally permanent
1447 - Item tree changes are performed on statement MEM_ROOT:
1448 = we activate statement MEM_ROOT
1449 = this function is called before the first fix_prepare_information
1450 call.
1451
1452 This is intended because the criteria for subquery-to-sj conversion remain
1453 constant for the lifetime of the Prepared Statement.
1454
1455 RETURN
1456 FALSE OK
1457 TRUE Out of memory error
1458*/
1459
1460static bool convert_subq_to_sj(JOIN *parent_join, Item_in_subselect *subq_pred)
1461{
1462 SELECT_LEX *parent_lex= parent_join->select_lex;
1463 TABLE_LIST *emb_tbl_nest= NULL;
1464 List<TABLE_LIST> *emb_join_list= &parent_lex->top_join_list;
1465 THD *thd= parent_join->thd;
1466 DBUG_ENTER("convert_subq_to_sj");
1467
1468 /*
1469 1. Find out where to put the predicate into.
1470 Note: for "t1 LEFT JOIN t2" this will be t2, a leaf.
1471 */
1472 if ((void*)subq_pred->emb_on_expr_nest != (void*)NO_JOIN_NEST)
1473 {
1474 if (subq_pred->emb_on_expr_nest->nested_join)
1475 {
1476 /*
1477 We're dealing with
1478
1479 ... [LEFT] JOIN ( ... ) ON (subquery AND whatever) ...
1480
1481 The sj-nest will be inserted into the brackets nest.
1482 */
1483 emb_tbl_nest= subq_pred->emb_on_expr_nest;
1484 emb_join_list= &emb_tbl_nest->nested_join->join_list;
1485 }
1486 else if (!subq_pred->emb_on_expr_nest->outer_join)
1487 {
1488 /*
1489 We're dealing with
1490
1491 ... INNER JOIN tblX ON (subquery AND whatever) ...
1492
1493 The sj-nest will be tblX's "sibling", i.e. another child of its
1494 parent. This is ok because tblX is joined as an inner join.
1495 */
1496 emb_tbl_nest= subq_pred->emb_on_expr_nest->embedding;
1497 if (emb_tbl_nest)
1498 emb_join_list= &emb_tbl_nest->nested_join->join_list;
1499 }
1500 else if (!subq_pred->emb_on_expr_nest->nested_join)
1501 {
1502 TABLE_LIST *outer_tbl= subq_pred->emb_on_expr_nest;
1503 TABLE_LIST *wrap_nest;
1504 LEX_CSTRING sj_wrap_name= { STRING_WITH_LEN("(sj-wrap)") };
1505 /*
1506 We're dealing with
1507
1508 ... LEFT JOIN tbl ON (on_expr AND subq_pred) ...
1509
1510 we'll need to convert it into:
1511
1512 ... LEFT JOIN ( tbl SJ (subq_tables) ) ON (on_expr AND subq_pred) ...
1513 | |
1514 |<----- wrap_nest ---->|
1515
1516 Q: other subqueries may be pointing to this element. What to do?
1517 A1: simple solution: copy *subq_pred->expr_join_nest= *parent_nest.
1518 But we'll need to fix other pointers.
1519 A2: Another way: have TABLE_LIST::next_ptr so the following
1520 subqueries know the table has been nested.
1521 A3: changes in the TABLE_LIST::outer_join will make everything work
1522 automatically.
1523 */
1524 if (!(wrap_nest= alloc_join_nest(thd)))
1525 {
1526 DBUG_RETURN(TRUE);
1527 }
1528 wrap_nest->embedding= outer_tbl->embedding;
1529 wrap_nest->join_list= outer_tbl->join_list;
1530 wrap_nest->alias= sj_wrap_name;
1531
1532 wrap_nest->nested_join->join_list.empty();
1533 wrap_nest->nested_join->join_list.push_back(outer_tbl, thd->mem_root);
1534
1535 outer_tbl->embedding= wrap_nest;
1536 outer_tbl->join_list= &wrap_nest->nested_join->join_list;
1537
1538 /*
1539 wrap_nest will take place of outer_tbl, so move the outer join flag
1540 and on_expr
1541 */
1542 wrap_nest->outer_join= outer_tbl->outer_join;
1543 outer_tbl->outer_join= 0;
1544
1545 wrap_nest->on_expr= outer_tbl->on_expr;
1546 outer_tbl->on_expr= NULL;
1547
1548 List_iterator<TABLE_LIST> li(*wrap_nest->join_list);
1549 TABLE_LIST *tbl;
1550 while ((tbl= li++))
1551 {
1552 if (tbl == outer_tbl)
1553 {
1554 li.replace(wrap_nest);
1555 break;
1556 }
1557 }
1558 /*
1559 Ok now wrap_nest 'contains' outer_tbl and we're ready to add the
1560 semi-join nest into it
1561 */
1562 emb_join_list= &wrap_nest->nested_join->join_list;
1563 emb_tbl_nest= wrap_nest;
1564 }
1565 }
1566
1567 TABLE_LIST *sj_nest;
1568 NESTED_JOIN *nested_join;
1569 LEX_CSTRING sj_nest_name= { STRING_WITH_LEN("(sj-nest)") };
1570 if (!(sj_nest= alloc_join_nest(thd)))
1571 {
1572 DBUG_RETURN(TRUE);
1573 }
1574 nested_join= sj_nest->nested_join;
1575
1576 sj_nest->join_list= emb_join_list;
1577 sj_nest->embedding= emb_tbl_nest;
1578 sj_nest->alias= sj_nest_name;
1579 sj_nest->sj_subq_pred= subq_pred;
1580 sj_nest->original_subq_pred_used_tables= subq_pred->used_tables() |
1581 subq_pred->left_expr->used_tables();
1582 /* Nests do not participate in those 'chains', so: */
1583 /* sj_nest->next_leaf= sj_nest->next_local= sj_nest->next_global == NULL*/
1584 emb_join_list->push_back(sj_nest, thd->mem_root);
1585
1586 /*
1587 nested_join->used_tables and nested_join->not_null_tables are
1588 initialized in simplify_joins().
1589 */
1590
1591 /*
1592 2. Walk through subquery's top list and set 'embedding' to point to the
1593 sj-nest.
1594 */
1595 st_select_lex *subq_lex= subq_pred->unit->first_select();
1596 DBUG_ASSERT(subq_lex->next_select() == NULL);
1597 nested_join->join_list.empty();
1598 List_iterator_fast<TABLE_LIST> li(subq_lex->top_join_list);
1599 TABLE_LIST *tl;
1600 while ((tl= li++))
1601 {
1602 tl->embedding= sj_nest;
1603 tl->join_list= &nested_join->join_list;
1604 nested_join->join_list.push_back(tl, thd->mem_root);
1605 }
1606
1607 /*
1608 Reconnect the next_leaf chain.
1609 TODO: Do we have to put subquery's tables at the end of the chain?
1610 Inserting them at the beginning would be a bit faster.
1611 NOTE: We actually insert them at the front! That's because the order is
1612 reversed in this list.
1613 */
1614 parent_lex->leaf_tables.append(&subq_lex->leaf_tables);
1615
1616 if (subq_lex->options & OPTION_SCHEMA_TABLE)
1617 parent_lex->options |= OPTION_SCHEMA_TABLE;
1618
1619 /*
1620 Same as above for next_local chain
1621 (a theory: a next_local chain always starts with ::leaf_tables
1622 because view's tables are inserted after the view)
1623 */
1624
1625 for (tl= (TABLE_LIST*)(parent_lex->table_list.first); tl->next_local; tl= tl->next_local)
1626 {}
1627
1628 tl->next_local= subq_lex->join->tables_list;
1629
1630 /* A theory: no need to re-connect the next_global chain */
1631
1632 /* 3. Remove the original subquery predicate from the WHERE/ON */
1633
1634 // The subqueries were replaced for Item_int(1) earlier
1635 subq_pred->reset_strategy(SUBS_SEMI_JOIN); // for subsequent executions
1636 /*TODO: also reset the 'm_with_subquery' there. */
1637
1638 /* n. Adjust the parent_join->table_count counter */
1639 uint table_no= parent_join->table_count;
1640 /* n. Walk through child's tables and adjust table->map */
1641 List_iterator_fast<TABLE_LIST> si(subq_lex->leaf_tables);
1642 while ((tl= si++))
1643 {
1644 tl->set_tablenr(table_no);
1645 if (tl->is_jtbm())
1646 {
1647 tl->jtbm_table_no= table_no;
1648 Item *dummy= tl->jtbm_subselect;
1649 tl->jtbm_subselect->fix_after_pullout(parent_lex, &dummy, true);
1650 DBUG_ASSERT(dummy == tl->jtbm_subselect);
1651 }
1652 SELECT_LEX *old_sl= tl->select_lex;
1653 tl->select_lex= parent_join->select_lex;
1654 for (TABLE_LIST *emb= tl->embedding;
1655 emb && emb->select_lex == old_sl;
1656 emb= emb->embedding)
1657 emb->select_lex= parent_join->select_lex;
1658 table_no++;
1659 }
1660 parent_join->table_count += subq_lex->join->table_count;
1661 //parent_join->table_count += subq_lex->leaf_tables.elements;
1662
1663 /*
1664 Put the subquery's WHERE into semi-join's sj_on_expr
1665 Add the subquery-induced equalities too.
1666 */
1667 SELECT_LEX *save_lex= thd->lex->current_select;
1668 thd->lex->current_select=subq_lex;
1669 if (!subq_pred->left_expr->fixed &&
1670 subq_pred->left_expr->fix_fields(thd, &subq_pred->left_expr))
1671 DBUG_RETURN(TRUE);
1672 thd->lex->current_select=save_lex;
1673
1674 table_map subq_pred_used_tables= subq_pred->used_tables();
1675 sj_nest->nested_join->sj_corr_tables= subq_pred_used_tables;
1676 sj_nest->nested_join->sj_depends_on= subq_pred_used_tables |
1677 subq_pred->left_expr->used_tables();
1678 sj_nest->sj_on_expr= subq_lex->join->conds;
1679
1680 /*
1681 Create the IN-equalities and inject them into semi-join's ON expression.
1682 Additionally, for LooseScan strategy
1683 - Record the number of IN-equalities.
1684 - Create list of pointers to (oe1, ..., ieN). We'll need the list to
1685 see which of the expressions are bound and which are not (for those
1686 we'll produce a distinct stream of (ie_i1,...ie_ik).
1687
1688 (TODO: can we just create a list of pointers and hope the expressions
1689 will not substitute themselves on fix_fields()? or we need to wrap
1690 them into Item_direct_view_refs and store pointers to those. The
1691 pointers to Item_direct_view_refs are guaranteed to be stable as
1692 Item_direct_view_refs doesn't substitute itself with anything in
1693 Item_direct_view_ref::fix_fields.
1694 */
1695 sj_nest->sj_in_exprs= subq_pred->left_expr->cols();
1696 sj_nest->nested_join->sj_outer_expr_list.empty();
1697
1698 if (subq_pred->left_expr->cols() == 1)
1699 {
1700 /* add left = select_list_element */
1701 nested_join->sj_outer_expr_list.push_back(&subq_pred->left_expr,
1702 thd->mem_root);
1703 /*
1704 Create Item_func_eq. Note that
1705 1. this is done on the statement, not execution, arena
1706 2. if it's a PS then this happens only once - on the first execution.
1707 On following re-executions, the item will be fix_field-ed normally.
1708 3. Thus it should be created as if it was fix_field'ed, in particular
1709 all pointers to items in the execution arena should be protected
1710 with thd->change_item_tree
1711 */
1712 Item_func_eq *item_eq=
1713 new (thd->mem_root) Item_func_eq(thd, subq_pred->left_expr_orig,
1714 subq_lex->ref_pointer_array[0]);
1715 if (!item_eq)
1716 DBUG_RETURN(TRUE);
1717 if (subq_pred->left_expr_orig != subq_pred->left_expr)
1718 thd->change_item_tree(item_eq->arguments(), subq_pred->left_expr);
1719 item_eq->in_equality_no= 0;
1720 sj_nest->sj_on_expr= and_items(thd, sj_nest->sj_on_expr, item_eq);
1721 }
1722 else if (subq_pred->left_expr->type() == Item::ROW_ITEM)
1723 {
1724 /*
1725 disassemple left expression and add
1726 left1 = select_list_element1 and left2 = select_list_element2 ...
1727 */
1728 for (uint i= 0; i < subq_pred->left_expr->cols(); i++)
1729 {
1730 nested_join->sj_outer_expr_list.push_back(subq_pred->left_expr->addr(i),
1731 thd->mem_root);
1732 Item_func_eq *item_eq=
1733 new (thd->mem_root)
1734 Item_func_eq(thd, subq_pred->left_expr_orig->element_index(i),
1735 subq_lex->ref_pointer_array[i]);
1736 if (!item_eq)
1737 DBUG_RETURN(TRUE);
1738 DBUG_ASSERT(subq_pred->left_expr->element_index(i)->fixed);
1739 if (subq_pred->left_expr_orig->element_index(i) !=
1740 subq_pred->left_expr->element_index(i))
1741 thd->change_item_tree(item_eq->arguments(),
1742 subq_pred->left_expr->element_index(i));
1743 item_eq->in_equality_no= i;
1744 sj_nest->sj_on_expr= and_items(thd, sj_nest->sj_on_expr, item_eq);
1745 }
1746 }
1747 else
1748 {
1749 /*
1750 add row operation
1751 left = (select_list_element1, select_list_element2, ...)
1752 */
1753 Item_row *row= new (thd->mem_root) Item_row(thd, subq_lex->pre_fix);
1754 /* fix fields on subquery was call so they should be the same */
1755 if (!row)
1756 DBUG_RETURN(TRUE);
1757 DBUG_ASSERT(subq_pred->left_expr->cols() == row->cols());
1758 nested_join->sj_outer_expr_list.push_back(&subq_pred->left_expr);
1759 Item_func_eq *item_eq=
1760 new (thd->mem_root) Item_func_eq(thd, subq_pred->left_expr_orig, row);
1761 if (!item_eq)
1762 DBUG_RETURN(TRUE);
1763 for (uint i= 0; i < row->cols(); i++)
1764 {
1765 if (row->element_index(i) != subq_lex->ref_pointer_array[i])
1766 thd->change_item_tree(row->addr(i), subq_lex->ref_pointer_array[i]);
1767 }
1768 item_eq->in_equality_no= 0;
1769 sj_nest->sj_on_expr= and_items(thd, sj_nest->sj_on_expr, item_eq);
1770 }
1771 /*
1772 Fix the created equality and AND
1773
1774 Note that fix_fields() can actually fail in a meaningful way here. One
1775 example is when the IN-equality is not valid, because it compares columns
1776 with incompatible collations. (One can argue it would be more appropriate
1777 to check for this at name resolution stage, but as a legacy of IN->EXISTS
1778 we have in here).
1779 */
1780 if (!sj_nest->sj_on_expr->fixed &&
1781 sj_nest->sj_on_expr->fix_fields(thd, &sj_nest->sj_on_expr))
1782 {
1783 DBUG_RETURN(TRUE);
1784 }
1785
1786 /*
1787 Walk through sj nest's WHERE and ON expressions and call
1788 item->fix_table_changes() for all items.
1789 */
1790 sj_nest->sj_on_expr->fix_after_pullout(parent_lex, &sj_nest->sj_on_expr,
1791 TRUE);
1792 fix_list_after_tbl_changes(parent_lex, &sj_nest->nested_join->join_list);
1793
1794
1795 /* Unlink the child select_lex so it doesn't show up in EXPLAIN: */
1796 subq_lex->master_unit()->exclude_level();
1797
1798 DBUG_EXECUTE("where",
1799 print_where(sj_nest->sj_on_expr,"SJ-EXPR", QT_ORDINARY););
1800
1801 /* Inject sj_on_expr into the parent's WHERE or ON */
1802 if (emb_tbl_nest)
1803 {
1804 emb_tbl_nest->on_expr= and_items(thd, emb_tbl_nest->on_expr,
1805 sj_nest->sj_on_expr);
1806 emb_tbl_nest->on_expr->top_level_item();
1807 if (!emb_tbl_nest->on_expr->fixed &&
1808 emb_tbl_nest->on_expr->fix_fields(thd,
1809 &emb_tbl_nest->on_expr))
1810 {
1811 DBUG_RETURN(TRUE);
1812 }
1813 }
1814 else
1815 {
1816 /* Inject into the WHERE */
1817 parent_join->conds= and_items(thd, parent_join->conds, sj_nest->sj_on_expr);
1818 parent_join->conds->top_level_item();
1819 /*
1820 fix_fields must update the properties (e.g. st_select_lex::cond_count of
1821 the correct select_lex.
1822 */
1823 save_lex= thd->lex->current_select;
1824 thd->lex->current_select=parent_join->select_lex;
1825 if (!parent_join->conds->fixed &&
1826 parent_join->conds->fix_fields(thd,
1827 &parent_join->conds))
1828 {
1829 DBUG_RETURN(1);
1830 }
1831 thd->lex->current_select=save_lex;
1832 parent_join->select_lex->where= parent_join->conds;
1833 }
1834
1835 if (subq_lex->ftfunc_list->elements)
1836 {
1837 Item_func_match *ifm;
1838 List_iterator_fast<Item_func_match> li(*(subq_lex->ftfunc_list));
1839 while ((ifm= li++))
1840 parent_lex->ftfunc_list->push_front(ifm, thd->mem_root);
1841 }
1842
1843 parent_lex->have_merged_subqueries= TRUE;
1844 /* Fatal error may have been set to by fix_after_pullout() */
1845 DBUG_RETURN(thd->is_fatal_error);
1846}
1847
1848
1849const int SUBQERY_TEMPTABLE_NAME_MAX_LEN= 20;
1850
1851static void create_subquery_temptable_name(LEX_STRING *str, uint number)
1852{
1853 char *to= str->str;
1854 DBUG_ASSERT(number < 10000);
1855 to= strmov(to, "<subquery");
1856 to= int10_to_str((int) number, to, 10);
1857 to[0]= '>';
1858 to[1]= 0;
1859 str->length= (size_t) (to - str->str)+1;
1860}
1861
1862
1863/*
1864 Convert subquery predicate into non-mergeable semi-join nest.
1865
1866 TODO:
1867 why does this do IN-EXISTS conversion? Can't we unify it with mergeable
1868 semi-joins? currently, convert_subq_to_sj() cannot fail to convert (unless
1869 fatal errors)
1870
1871
1872 RETURN
1873 FALSE - Ok
1874 TRUE - Fatal error
1875*/
1876
1877static bool convert_subq_to_jtbm(JOIN *parent_join,
1878 Item_in_subselect *subq_pred,
1879 bool *remove_item)
1880{
1881 SELECT_LEX *parent_lex= parent_join->select_lex;
1882 List<TABLE_LIST> *emb_join_list= &parent_lex->top_join_list;
1883 TABLE_LIST *emb_tbl_nest= NULL; // will change when we learn to handle outer joins
1884 TABLE_LIST *tl;
1885 bool optimization_delayed= TRUE;
1886 TABLE_LIST *jtbm;
1887 LEX_STRING tbl_alias;
1888 THD *thd= parent_join->thd;
1889 DBUG_ENTER("convert_subq_to_jtbm");
1890
1891 subq_pred->set_strategy(SUBS_MATERIALIZATION);
1892 subq_pred->is_jtbm_merged= TRUE;
1893
1894 *remove_item= TRUE;
1895
1896 if (!(tbl_alias.str= (char*)thd->calloc(SUBQERY_TEMPTABLE_NAME_MAX_LEN)) ||
1897 !(jtbm= alloc_join_nest(thd))) //todo: this is not a join nest!
1898 {
1899 DBUG_RETURN(TRUE);
1900 }
1901
1902 jtbm->join_list= emb_join_list;
1903 jtbm->embedding= emb_tbl_nest;
1904 jtbm->jtbm_subselect= subq_pred;
1905 jtbm->nested_join= NULL;
1906
1907 /* Nests do not participate in those 'chains', so: */
1908 /* jtbm->next_leaf= jtbm->next_local= jtbm->next_global == NULL*/
1909 emb_join_list->push_back(jtbm, thd->mem_root);
1910
1911 /*
1912 Inject the jtbm table into TABLE_LIST::next_leaf list, so that
1913 make_join_statistics() and co. can find it.
1914 */
1915 parent_lex->leaf_tables.push_back(jtbm, thd->mem_root);
1916
1917 if (subq_pred->unit->first_select()->options & OPTION_SCHEMA_TABLE)
1918 parent_lex->options |= OPTION_SCHEMA_TABLE;
1919
1920 /*
1921 Same as above for TABLE_LIST::next_local chain
1922 (a theory: a next_local chain always starts with ::leaf_tables
1923 because view's tables are inserted after the view)
1924 */
1925 for (tl= (TABLE_LIST*)(parent_lex->table_list.first); tl->next_local; tl= tl->next_local)
1926 {}
1927 tl->next_local= jtbm;
1928
1929 /* A theory: no need to re-connect the next_global chain */
1930 if (optimization_delayed)
1931 {
1932 DBUG_ASSERT(parent_join->table_count < MAX_TABLES);
1933
1934 jtbm->jtbm_table_no= parent_join->table_count;
1935
1936 create_subquery_temptable_name(&tbl_alias,
1937 subq_pred->unit->first_select()->select_number);
1938 jtbm->alias.str= tbl_alias.str;
1939 jtbm->alias.length= tbl_alias.length;
1940 parent_join->table_count++;
1941 DBUG_RETURN(thd->is_fatal_error);
1942 }
1943 subselect_hash_sj_engine *hash_sj_engine=
1944 ((subselect_hash_sj_engine*)subq_pred->engine);
1945 jtbm->table= hash_sj_engine->tmp_table;
1946
1947 jtbm->table->tablenr= parent_join->table_count;
1948 jtbm->table->map= table_map(1) << (parent_join->table_count);
1949 jtbm->jtbm_table_no= jtbm->table->tablenr;
1950
1951 parent_join->table_count++;
1952 DBUG_ASSERT(parent_join->table_count < MAX_TABLES);
1953
1954 Item *conds= hash_sj_engine->semi_join_conds;
1955 conds->fix_after_pullout(parent_lex, &conds, TRUE);
1956
1957 DBUG_EXECUTE("where", print_where(conds,"SJ-EXPR", QT_ORDINARY););
1958
1959 create_subquery_temptable_name(&tbl_alias, hash_sj_engine->materialize_join->
1960 select_lex->select_number);
1961 jtbm->alias.str= tbl_alias.str;
1962 jtbm->alias.length= tbl_alias.length;
1963
1964 parent_lex->have_merged_subqueries= TRUE;
1965
1966 /* Don't unlink the child subselect, as the subquery will be used. */
1967
1968 DBUG_RETURN(thd->is_fatal_error);
1969}
1970
1971
1972static TABLE_LIST *alloc_join_nest(THD *thd)
1973{
1974 TABLE_LIST *tbl;
1975 if (!(tbl= (TABLE_LIST*) thd->calloc(ALIGN_SIZE(sizeof(TABLE_LIST))+
1976 sizeof(NESTED_JOIN))))
1977 return NULL;
1978 tbl->nested_join= (NESTED_JOIN*) ((uchar*)tbl +
1979 ALIGN_SIZE(sizeof(TABLE_LIST)));
1980 return tbl;
1981}
1982
1983/*
1984 @Note thd->is_fatal_error can be set in case of OOM
1985*/
1986
1987void fix_list_after_tbl_changes(SELECT_LEX *new_parent, List<TABLE_LIST> *tlist)
1988{
1989 List_iterator<TABLE_LIST> it(*tlist);
1990 TABLE_LIST *table;
1991 while ((table= it++))
1992 {
1993 if (table->on_expr)
1994 table->on_expr->fix_after_pullout(new_parent, &table->on_expr, TRUE);
1995 if (table->nested_join)
1996 fix_list_after_tbl_changes(new_parent, &table->nested_join->join_list);
1997 }
1998}
1999
2000
2001static void set_emb_join_nest(List<TABLE_LIST> *tables, TABLE_LIST *emb_sj_nest)
2002{
2003 List_iterator<TABLE_LIST> it(*tables);
2004 TABLE_LIST *tbl;
2005 while ((tbl= it++))
2006 {
2007 /*
2008 Note: check for nested_join first.
2009 derived-merged tables have tbl->table!=NULL &&
2010 tbl->table->reginfo==NULL.
2011 */
2012 if (tbl->nested_join)
2013 set_emb_join_nest(&tbl->nested_join->join_list, emb_sj_nest);
2014 else if (tbl->table)
2015 tbl->table->reginfo.join_tab->emb_sj_nest= emb_sj_nest;
2016
2017 }
2018}
2019
2020/*
2021 Pull tables out of semi-join nests, if possible
2022
2023 SYNOPSIS
2024 pull_out_semijoin_tables()
2025 join The join where to do the semi-join flattening
2026
2027 DESCRIPTION
2028 Try to pull tables out of semi-join nests.
2029
2030 PRECONDITIONS
2031 When this function is called, the join may have several semi-join nests
2032 but it is guaranteed that one semi-join nest does not contain another.
2033
2034 ACTION
2035 A table can be pulled out of the semi-join nest if
2036 - It is a constant table, or
2037 - It is accessed via eq_ref(outer_tables)
2038
2039 POSTCONDITIONS
2040 * Tables that were pulled out have JOIN_TAB::emb_sj_nest == NULL
2041 * Tables that were not pulled out have JOIN_TAB::emb_sj_nest pointing
2042 to semi-join nest they are in.
2043 * Semi-join nests' TABLE_LIST::sj_inner_tables is updated accordingly
2044
2045 This operation is (and should be) performed at each PS execution since
2046 tables may become/cease to be constant across PS reexecutions.
2047
2048 NOTE
2049 Table pullout may make uncorrelated subquery correlated. Consider this
2050 example:
2051
2052 ... WHERE oe IN (SELECT it1.primary_key WHERE p(it1, it2) ... )
2053
2054 here table it1 can be pulled out (we have it1.primary_key=oe which gives
2055 us functional dependency). Once it1 is pulled out, all references to it1
2056 from p(it1, it2) become references to outside of the subquery and thus
2057 make the subquery (i.e. its semi-join nest) correlated.
2058 Making the subquery (i.e. its semi-join nest) correlated prevents us from
2059 using Materialization or LooseScan to execute it.
2060
2061 RETURN
2062 0 - OK
2063 1 - Out of memory error
2064*/
2065
2066int pull_out_semijoin_tables(JOIN *join)
2067{
2068 TABLE_LIST *sj_nest;
2069 DBUG_ENTER("pull_out_semijoin_tables");
2070 List_iterator<TABLE_LIST> sj_list_it(join->select_lex->sj_nests);
2071
2072 /* Try pulling out of the each of the semi-joins */
2073 while ((sj_nest= sj_list_it++))
2074 {
2075 List_iterator<TABLE_LIST> child_li(sj_nest->nested_join->join_list);
2076 TABLE_LIST *tbl;
2077
2078 /*
2079 Don't do table pull-out for nested joins (if we get nested joins here, it
2080 means these are outer joins. It is theoretically possible to do pull-out
2081 for some of the outer tables but we dont support this currently.
2082 */
2083 bool have_join_nest_children= FALSE;
2084
2085 set_emb_join_nest(&sj_nest->nested_join->join_list, sj_nest);
2086
2087 while ((tbl= child_li++))
2088 {
2089 if (tbl->nested_join)
2090 {
2091 have_join_nest_children= TRUE;
2092 break;
2093 }
2094 }
2095
2096 table_map pulled_tables= 0;
2097 table_map dep_tables= 0;
2098 if (have_join_nest_children)
2099 goto skip;
2100
2101 /*
2102 Calculate set of tables within this semi-join nest that have
2103 other dependent tables
2104 */
2105 child_li.rewind();
2106 while ((tbl= child_li++))
2107 {
2108 TABLE *const table= tbl->table;
2109 if (table &&
2110 (table->reginfo.join_tab->dependent &
2111 sj_nest->nested_join->used_tables))
2112 dep_tables|= table->reginfo.join_tab->dependent;
2113 }
2114
2115 /* Action #1: Mark the constant tables to be pulled out */
2116 child_li.rewind();
2117 while ((tbl= child_li++))
2118 {
2119 if (tbl->table)
2120 {
2121 tbl->table->reginfo.join_tab->emb_sj_nest= sj_nest;
2122#if 0
2123 /*
2124 Do not pull out tables because they are constant. This operation has
2125 a problem:
2126 - Some constant tables may become/cease to be constant across PS
2127 re-executions
2128 - Contrary to our initial assumption, it turned out that table pullout
2129 operation is not easily undoable.
2130
2131 The solution is to leave constant tables where they are. This will
2132 affect only constant tables that are 1-row or empty, tables that are
2133 constant because they are accessed via eq_ref(const) access will
2134 still be pulled out as functionally-dependent.
2135
2136 This will cause us to miss the chance to flatten some of the
2137 subqueries, but since const tables do not generate many duplicates,
2138 it really doesn't matter that much whether they were pulled out or
2139 not.
2140
2141 All of this was done as fix for BUG#43768.
2142 */
2143 if (tbl->table->map & join->const_table_map)
2144 {
2145 pulled_tables |= tbl->table->map;
2146 DBUG_PRINT("info", ("Table %s pulled out (reason: constant)",
2147 tbl->table->alias));
2148 }
2149#endif
2150 }
2151 }
2152
2153 /*
2154 Action #2: Find which tables we can pull out based on
2155 update_ref_and_keys() data. Note that pulling one table out can allow
2156 us to pull out some other tables too.
2157 */
2158 bool pulled_a_table;
2159 do
2160 {
2161 pulled_a_table= FALSE;
2162 child_li.rewind();
2163 while ((tbl= child_li++))
2164 {
2165 if (tbl->table && !(pulled_tables & tbl->table->map) &&
2166 !(dep_tables & tbl->table->map))
2167 {
2168 if (find_eq_ref_candidate(tbl->table,
2169 sj_nest->nested_join->used_tables &
2170 ~pulled_tables))
2171 {
2172 pulled_a_table= TRUE;
2173 pulled_tables |= tbl->table->map;
2174 DBUG_PRINT("info", ("Table %s pulled out (reason: func dep)",
2175 tbl->table->alias.c_ptr()));
2176 /*
2177 Pulling a table out of uncorrelated subquery in general makes
2178 makes it correlated. See the NOTE to this funtion.
2179 */
2180 sj_nest->sj_subq_pred->is_correlated= TRUE;
2181 sj_nest->nested_join->sj_corr_tables|= tbl->table->map;
2182 sj_nest->nested_join->sj_depends_on|= tbl->table->map;
2183 }
2184 }
2185 }
2186 } while (pulled_a_table);
2187
2188 child_li.rewind();
2189 skip:
2190 /*
2191 Action #3: Move the pulled out TABLE_LIST elements to the parents.
2192 */
2193 table_map inner_tables= sj_nest->nested_join->used_tables &
2194 ~pulled_tables;
2195 /* Record the bitmap of inner tables */
2196 sj_nest->sj_inner_tables= inner_tables;
2197 if (pulled_tables)
2198 {
2199 List<TABLE_LIST> *upper_join_list= (sj_nest->embedding != NULL)?
2200 (&sj_nest->embedding->nested_join->join_list):
2201 (&join->select_lex->top_join_list);
2202 Query_arena *arena, backup;
2203 arena= join->thd->activate_stmt_arena_if_needed(&backup);
2204 while ((tbl= child_li++))
2205 {
2206 if (tbl->table)
2207 {
2208 if (inner_tables & tbl->table->map)
2209 {
2210 /* This table is not pulled out */
2211 tbl->table->reginfo.join_tab->emb_sj_nest= sj_nest;
2212 }
2213 else
2214 {
2215 /* This table has been pulled out of the semi-join nest */
2216 tbl->table->reginfo.join_tab->emb_sj_nest= NULL;
2217 /*
2218 Pull the table up in the same way as simplify_joins() does:
2219 update join_list and embedding pointers but keep next[_local]
2220 pointers.
2221 */
2222 child_li.remove();
2223 sj_nest->nested_join->used_tables &= ~tbl->table->map;
2224 upper_join_list->push_back(tbl, join->thd->mem_root);
2225 tbl->join_list= upper_join_list;
2226 tbl->embedding= sj_nest->embedding;
2227 }
2228 }
2229 }
2230
2231 /* Remove the sj-nest itself if we've removed everything from it */
2232 if (!inner_tables)
2233 {
2234 List_iterator<TABLE_LIST> li(*upper_join_list);
2235 /* Find the sj_nest in the list. */
2236 while (sj_nest != li++) ;
2237 li.remove();
2238 /* Also remove it from the list of SJ-nests: */
2239 sj_list_it.remove();
2240 }
2241
2242 if (arena)
2243 join->thd->restore_active_arena(arena, &backup);
2244 }
2245 }
2246 DBUG_RETURN(0);
2247}
2248
2249
2250/*
2251 Optimize semi-join nests that could be run with sj-materialization
2252
2253 SYNOPSIS
2254 optimize_semijoin_nests()
2255 join The join to optimize semi-join nests for
2256 all_table_map Bitmap of all tables in the join
2257
2258 DESCRIPTION
2259 Optimize each of the semi-join nests that can be run with
2260 materialization. For each of the nests, we
2261 - Generate the best join order for this "sub-join" and remember it;
2262 - Remember the sub-join execution cost (it's part of materialization
2263 cost);
2264 - Calculate other costs that will be incurred if we decide
2265 to use materialization strategy for this semi-join nest.
2266
2267 All obtained information is saved and will be used by the main join
2268 optimization pass.
2269
2270 NOTES
2271 Because of Join::reoptimize(), this function may be called multiple times.
2272
2273 RETURN
2274 FALSE Ok
2275 TRUE Out of memory error
2276*/
2277
2278bool optimize_semijoin_nests(JOIN *join, table_map all_table_map)
2279{
2280 DBUG_ENTER("optimize_semijoin_nests");
2281 List_iterator<TABLE_LIST> sj_list_it(join->select_lex->sj_nests);
2282 TABLE_LIST *sj_nest;
2283 while ((sj_nest= sj_list_it++))
2284 {
2285 /* semi-join nests with only constant tables are not valid */
2286 /// DBUG_ASSERT(sj_nest->sj_inner_tables & ~join->const_table_map);
2287
2288 sj_nest->sj_mat_info= NULL;
2289 /*
2290 The statement may have been executed with 'semijoin=on' earlier.
2291 We need to verify that 'semijoin=on' still holds.
2292 */
2293 if (optimizer_flag(join->thd, OPTIMIZER_SWITCH_SEMIJOIN) &&
2294 optimizer_flag(join->thd, OPTIMIZER_SWITCH_MATERIALIZATION))
2295 {
2296 if ((sj_nest->sj_inner_tables & ~join->const_table_map) && /* not everything was pulled out */
2297 !sj_nest->sj_subq_pred->is_correlated &&
2298 sj_nest->sj_subq_pred->types_allow_materialization)
2299 {
2300 join->emb_sjm_nest= sj_nest;
2301 if (choose_plan(join, all_table_map &~join->const_table_map))
2302 DBUG_RETURN(TRUE); /* purecov: inspected */
2303 /*
2304 The best plan to run the subquery is now in join->best_positions,
2305 save it.
2306 */
2307 uint n_tables= my_count_bits(sj_nest->sj_inner_tables & ~join->const_table_map);
2308 SJ_MATERIALIZATION_INFO* sjm;
2309 if (!(sjm= new SJ_MATERIALIZATION_INFO) ||
2310 !(sjm->positions= (POSITION*)join->thd->alloc(sizeof(POSITION)*
2311 n_tables)))
2312 DBUG_RETURN(TRUE); /* purecov: inspected */
2313 sjm->tables= n_tables;
2314 sjm->is_used= FALSE;
2315 double subjoin_out_rows, subjoin_read_time;
2316
2317 /*
2318 join->get_partial_cost_and_fanout(n_tables + join->const_tables,
2319 table_map(-1),
2320 &subjoin_read_time,
2321 &subjoin_out_rows);
2322 */
2323 join->get_prefix_cost_and_fanout(n_tables,
2324 &subjoin_read_time,
2325 &subjoin_out_rows);
2326
2327 sjm->materialization_cost.convert_from_cost(subjoin_read_time);
2328 sjm->rows= subjoin_out_rows;
2329
2330 // Don't use the following list because it has "stale" items. use
2331 // ref_pointer_array instead:
2332 //
2333 //List<Item> &right_expr_list=
2334 // sj_nest->sj_subq_pred->unit->first_select()->item_list;
2335 /*
2336 Adjust output cardinality estimates. If the subquery has form
2337
2338 ... oe IN (SELECT t1.colX, t2.colY, func(X,Y,Z) )
2339
2340 then the number of distinct output record combinations has an
2341 upper bound of product of number of records matching the tables
2342 that are used by the SELECT clause.
2343 TODO:
2344 We can get a more precise estimate if we
2345 - use rec_per_key cardinality estimates. For simple cases like
2346 "oe IN (SELECT t.key ...)" it is trivial.
2347 - Functional dependencies between the tables in the semi-join
2348 nest (the payoff is probably less here?)
2349
2350 See also get_post_group_estimate().
2351 */
2352 SELECT_LEX *subq_select= sj_nest->sj_subq_pred->unit->first_select();
2353 {
2354 for (uint i=0 ; i < join->const_tables + sjm->tables ; i++)
2355 {
2356 JOIN_TAB *tab= join->best_positions[i].table;
2357 join->map2table[tab->table->tablenr]= tab;
2358 }
2359 table_map map= 0;
2360 for (uint i=0; i < subq_select->item_list.elements; i++)
2361 map|= subq_select->ref_pointer_array[i]->used_tables();
2362 map= map & ~PSEUDO_TABLE_BITS;
2363 Table_map_iterator tm_it(map);
2364 int tableno;
2365 double rows= 1.0;
2366 while ((tableno = tm_it.next_bit()) != Table_map_iterator::BITMAP_END)
2367 rows *= join->map2table[tableno]->table->quick_condition_rows;
2368 sjm->rows= MY_MIN(sjm->rows, rows);
2369 }
2370 memcpy((uchar*) sjm->positions,
2371 (uchar*) (join->best_positions + join->const_tables),
2372 sizeof(POSITION) * n_tables);
2373
2374 /*
2375 Calculate temporary table parameters and usage costs
2376 */
2377 uint rowlen= get_tmp_table_rec_length(subq_select->ref_pointer_array,
2378 subq_select->item_list.elements);
2379 double lookup_cost= get_tmp_table_lookup_cost(join->thd,
2380 subjoin_out_rows, rowlen);
2381 double write_cost= get_tmp_table_write_cost(join->thd,
2382 subjoin_out_rows, rowlen);
2383
2384 /*
2385 Let materialization cost include the cost to write the data into the
2386 temporary table:
2387 */
2388 sjm->materialization_cost.add_io(subjoin_out_rows, write_cost);
2389
2390 /*
2391 Set the cost to do a full scan of the temptable (will need this to
2392 consider doing sjm-scan):
2393 */
2394 sjm->scan_cost.reset();
2395 sjm->scan_cost.add_io(sjm->rows, lookup_cost);
2396
2397 sjm->lookup_cost.convert_from_cost(lookup_cost);
2398 sj_nest->sj_mat_info= sjm;
2399 DBUG_EXECUTE("opt", print_sjm(sjm););
2400 }
2401 }
2402 }
2403 join->emb_sjm_nest= NULL;
2404 DBUG_RETURN(FALSE);
2405}
2406
2407
2408/*
2409 Get estimated record length for semi-join materialization temptable
2410
2411 SYNOPSIS
2412 get_tmp_table_rec_length()
2413 items IN subquery's select list.
2414
2415 DESCRIPTION
2416 Calculate estimated record length for semi-join materialization
2417 temptable. It's an estimate because we don't follow every bit of
2418 create_tmp_table()'s logic. This isn't necessary as the return value of
2419 this function is used only for cost calculations.
2420
2421 RETURN
2422 Length of the temptable record, in bytes
2423*/
2424
2425static uint get_tmp_table_rec_length(Ref_ptr_array p_items, uint elements)
2426{
2427 uint len= 0;
2428 Item *item;
2429 //List_iterator<Item> it(items);
2430 for (uint i= 0; i < elements ; i++)
2431 {
2432 item = p_items[i];
2433 switch (item->result_type()) {
2434 case REAL_RESULT:
2435 len += sizeof(double);
2436 break;
2437 case INT_RESULT:
2438 if (item->max_length >= (MY_INT32_NUM_DECIMAL_DIGITS - 1))
2439 len += 8;
2440 else
2441 len += 4;
2442 break;
2443 case STRING_RESULT:
2444 enum enum_field_types type;
2445 /* DATE/TIME and GEOMETRY fields have STRING_RESULT result type. */
2446 if ((type= item->field_type()) == MYSQL_TYPE_DATETIME ||
2447 type == MYSQL_TYPE_TIME || type == MYSQL_TYPE_DATE ||
2448 type == MYSQL_TYPE_TIMESTAMP || type == MYSQL_TYPE_GEOMETRY)
2449 len += 8;
2450 else
2451 len += item->max_length;
2452 break;
2453 case DECIMAL_RESULT:
2454 len += 10;
2455 break;
2456 case ROW_RESULT:
2457 default:
2458 DBUG_ASSERT(0); /* purecov: deadcode */
2459 break;
2460 }
2461 }
2462 return len;
2463}
2464
2465
2466/**
2467 The cost of a lookup into a unique hash/btree index on a temporary table
2468 with 'row_count' rows each of size 'row_size'.
2469
2470 @param thd current query context
2471 @param row_count number of rows in the temp table
2472 @param row_size average size in bytes of the rows
2473
2474 @return the cost of one lookup
2475*/
2476
2477double
2478get_tmp_table_lookup_cost(THD *thd, double row_count, uint row_size)
2479{
2480 if (row_count * row_size > thd->variables.max_heap_table_size)
2481 return (double) DISK_TEMPTABLE_LOOKUP_COST;
2482 else
2483 return (double) HEAP_TEMPTABLE_LOOKUP_COST;
2484}
2485
2486/**
2487 The cost of writing a row into a temporary table with 'row_count' unique
2488 rows each of size 'row_size'.
2489
2490 @param thd current query context
2491 @param row_count number of rows in the temp table
2492 @param row_size average size in bytes of the rows
2493
2494 @return the cost of writing one row
2495*/
2496
2497double
2498get_tmp_table_write_cost(THD *thd, double row_count, uint row_size)
2499{
2500 double lookup_cost= get_tmp_table_lookup_cost(thd, row_count, row_size);
2501 /*
2502 TODO:
2503 This is an optimistic estimate. Add additional costs resulting from
2504 actually writing the row to memory/disk and possible index reorganization.
2505 */
2506 return lookup_cost;
2507}
2508
2509
2510/*
2511 Check if table's KEYUSE elements have an eq_ref(outer_tables) candidate
2512
2513 SYNOPSIS
2514 find_eq_ref_candidate()
2515 table Table to be checked
2516 sj_inner_tables Bitmap of inner tables. eq_ref(inner_table) doesn't
2517 count.
2518
2519 DESCRIPTION
2520 Check if table's KEYUSE elements have an eq_ref(outer_tables) candidate
2521
2522 TODO
2523 Check again if it is feasible to factor common parts with constant table
2524 search
2525
2526 Also check if it's feasible to factor common parts with table elimination
2527
2528 RETURN
2529 TRUE - There exists an eq_ref(outer-tables) candidate
2530 FALSE - Otherwise
2531*/
2532
2533bool find_eq_ref_candidate(TABLE *table, table_map sj_inner_tables)
2534{
2535 KEYUSE *keyuse= table->reginfo.join_tab->keyuse;
2536
2537 if (keyuse)
2538 {
2539 do
2540 {
2541 uint key= keyuse->key;
2542 KEY *keyinfo;
2543 key_part_map bound_parts= 0;
2544 bool is_excluded_key= keyuse->is_for_hash_join();
2545 if (!is_excluded_key)
2546 {
2547 keyinfo= table->key_info + key;
2548 is_excluded_key= !MY_TEST(keyinfo->flags & HA_NOSAME);
2549 }
2550 if (!is_excluded_key)
2551 {
2552 do /* For all equalities on all key parts */
2553 {
2554 /* Check if this is "t.keypart = expr(outer_tables) */
2555 if (!(keyuse->used_tables & sj_inner_tables) &&
2556 !(keyuse->optimize & KEY_OPTIMIZE_REF_OR_NULL))
2557 {
2558 bound_parts |= 1 << keyuse->keypart;
2559 }
2560 keyuse++;
2561 } while (keyuse->key == key && keyuse->table == table);
2562
2563 if (bound_parts == PREV_BITS(uint, keyinfo->user_defined_key_parts))
2564 return TRUE;
2565 }
2566 else
2567 {
2568 do
2569 {
2570 keyuse++;
2571 } while (keyuse->key == key && keyuse->table == table);
2572 }
2573 } while (keyuse->table == table);
2574 }
2575 return FALSE;
2576}
2577
2578
2579/*
2580 Do semi-join optimization step after we've added a new tab to join prefix
2581
2582 SYNOPSIS
2583 advance_sj_state()
2584 join The join we're optimizing
2585 remaining_tables Tables not in the join prefix
2586 new_join_tab Join tab we've just added to the join prefix
2587 idx Index of this join tab (i.e. number of tables
2588 in the prefix minus one)
2589 current_record_count INOUT Estimate of #records in join prefix's output
2590 current_read_time INOUT Cost to execute the join prefix
2591 loose_scan_pos IN A POSITION with LooseScan plan to access
2592 table new_join_tab
2593 (produced by the last best_access_path call)
2594
2595 DESCRIPTION
2596 Update semi-join optimization state after we've added another tab (table
2597 and access method) to the join prefix.
2598
2599 The state is maintained in join->positions[#prefix_size]. Each of the
2600 available strategies has its own state variables.
2601
2602 for each semi-join strategy
2603 {
2604 update strategy's state variables;
2605
2606 if (join prefix has all the tables that are needed to consider
2607 using this strategy for the semi-join(s))
2608 {
2609 calculate cost of using the strategy
2610 if ((this is the first strategy to handle the semi-join nest(s) ||
2611 the cost is less than other strategies))
2612 {
2613 // Pick this strategy
2614 pos->sj_strategy= ..
2615 ..
2616 }
2617 }
2618
2619 Most of the new state is saved join->positions[idx] (and hence no undo
2620 is necessary). Several members of class JOIN are updated also, these
2621 changes can be rolled back with restore_prev_sj_state().
2622
2623 See setup_semijoin_dups_elimination() for a description of what kinds of
2624 join prefixes each strategy can handle.
2625*/
2626
2627bool is_multiple_semi_joins(JOIN *join, POSITION *prefix, uint idx, table_map inner_tables)
2628{
2629 for (int i= (int)idx; i >= 0; i--)
2630 {
2631 TABLE_LIST *emb_sj_nest;
2632 if ((emb_sj_nest= prefix[i].table->emb_sj_nest))
2633 {
2634 if (inner_tables & emb_sj_nest->sj_inner_tables)
2635 return !MY_TEST(inner_tables == (emb_sj_nest->sj_inner_tables &
2636 ~join->const_table_map));
2637 }
2638 }
2639 return FALSE;
2640}
2641
2642
2643void advance_sj_state(JOIN *join, table_map remaining_tables, uint idx,
2644 double *current_record_count, double *current_read_time,
2645 POSITION *loose_scan_pos)
2646{
2647 POSITION *pos= join->positions + idx;
2648 const JOIN_TAB *new_join_tab= pos->table;
2649 Semi_join_strategy_picker *pickers[]=
2650 {
2651 &pos->firstmatch_picker,
2652 &pos->loosescan_picker,
2653 &pos->sjmat_picker,
2654 &pos->dups_weedout_picker,
2655 NULL,
2656 };
2657
2658 if (join->emb_sjm_nest)
2659 {
2660 /*
2661 We're performing optimization inside SJ-Materialization nest:
2662 - there are no other semi-joins inside semi-join nests
2663 - attempts to build semi-join strategies here will confuse
2664 the optimizer, so bail out.
2665 */
2666 pos->sj_strategy= SJ_OPT_NONE;
2667 return;
2668 }
2669
2670 /*
2671 Update join->cur_sj_inner_tables (Used by FirstMatch in this function and
2672 LooseScan detector in best_access_path)
2673 */
2674 remaining_tables &= ~new_join_tab->table->map;
2675 table_map dups_producing_tables, UNINIT_VAR(prev_dups_producing_tables),
2676 UNINIT_VAR(prev_sjm_lookup_tables);
2677
2678 if (idx == join->const_tables)
2679 dups_producing_tables= 0;
2680 else
2681 dups_producing_tables= pos[-1].dups_producing_tables;
2682
2683 TABLE_LIST *emb_sj_nest;
2684 if ((emb_sj_nest= new_join_tab->emb_sj_nest))
2685 dups_producing_tables |= emb_sj_nest->sj_inner_tables;
2686
2687 Semi_join_strategy_picker **strategy, **prev_strategy= 0;
2688 if (idx == join->const_tables)
2689 {
2690 /* First table, initialize pickers */
2691 for (strategy= pickers; *strategy != NULL; strategy++)
2692 (*strategy)->set_empty();
2693 pos->inner_tables_handled_with_other_sjs= 0;
2694 }
2695 else
2696 {
2697 for (strategy= pickers; *strategy != NULL; strategy++)
2698 {
2699 (*strategy)->set_from_prev(pos - 1);
2700 }
2701 pos->inner_tables_handled_with_other_sjs=
2702 pos[-1].inner_tables_handled_with_other_sjs;
2703 }
2704
2705 pos->prefix_cost.convert_from_cost(*current_read_time);
2706 pos->prefix_record_count= *current_record_count;
2707
2708 {
2709 pos->sj_strategy= SJ_OPT_NONE;
2710
2711 for (strategy= pickers; *strategy != NULL; strategy++)
2712 {
2713 table_map handled_fanout;
2714 sj_strategy_enum sj_strategy;
2715 double rec_count= *current_record_count;
2716 double read_time= *current_read_time;
2717 if ((*strategy)->check_qep(join, idx, remaining_tables,
2718 new_join_tab,
2719 &rec_count,
2720 &read_time,
2721 &handled_fanout,
2722 &sj_strategy,
2723 loose_scan_pos))
2724 {
2725 /*
2726 It's possible to use the strategy. Use it, if
2727 - it removes semi-join fanout that was not removed before
2728 - using it is cheaper than using something else,
2729 and {if some other strategy has removed fanout
2730 that this strategy is trying to remove, then it
2731 did remove the fanout only for one semi-join}
2732 This is to avoid a situation when
2733 1. strategy X removes fanout for semijoin X,Y
2734 2. using strategy Z is cheaper, but it only removes
2735 fanout from semijoin X.
2736 3. We have no clue what to do about fanount of semi-join Y.
2737 */
2738 if ((dups_producing_tables & handled_fanout) ||
2739 (read_time < *current_read_time &&
2740 !(handled_fanout & pos->inner_tables_handled_with_other_sjs)))
2741 {
2742 DBUG_ASSERT(pos->sj_strategy != sj_strategy);
2743 /*
2744 If the strategy choosen first time or
2745 the strategy replace strategy which was used to exectly the same
2746 tables
2747 */
2748 if (pos->sj_strategy == SJ_OPT_NONE ||
2749 handled_fanout ==
2750 (prev_dups_producing_tables ^ dups_producing_tables))
2751 {
2752 prev_strategy= strategy;
2753 if (pos->sj_strategy == SJ_OPT_NONE)
2754 {
2755 prev_dups_producing_tables= dups_producing_tables;
2756 prev_sjm_lookup_tables= join->sjm_lookup_tables;
2757 }
2758 /* Mark strategy as used */
2759 (*strategy)->mark_used();
2760 pos->sj_strategy= sj_strategy;
2761 if (sj_strategy == SJ_OPT_MATERIALIZE)
2762 join->sjm_lookup_tables |= handled_fanout;
2763 else
2764 join->sjm_lookup_tables &= ~handled_fanout;
2765 *current_read_time= read_time;
2766 *current_record_count= rec_count;
2767 dups_producing_tables &= ~handled_fanout;
2768 //TODO: update bitmap of semi-joins that were handled together with
2769 // others.
2770 if (is_multiple_semi_joins(join, join->positions, idx,
2771 handled_fanout))
2772 pos->inner_tables_handled_with_other_sjs |= handled_fanout;
2773 }
2774 else
2775 {
2776 /* Conflict fall to most general variant */
2777 (*prev_strategy)->set_empty();
2778 dups_producing_tables= prev_dups_producing_tables;
2779 join->sjm_lookup_tables= prev_sjm_lookup_tables;
2780 // mark it 'none' to avpoid loops
2781 pos->sj_strategy= SJ_OPT_NONE;
2782 // next skip to last;
2783 strategy= pickers +
2784 (sizeof(pickers)/sizeof(Semi_join_strategy_picker*) - 3);
2785 continue;
2786 }
2787 }
2788 else
2789 {
2790 /* We decided not to apply the strategy. */
2791 (*strategy)->set_empty();
2792 }
2793 }
2794 }
2795 }
2796
2797 if ((emb_sj_nest= new_join_tab->emb_sj_nest))
2798 {
2799 join->cur_sj_inner_tables |= emb_sj_nest->sj_inner_tables;
2800
2801 /* Remove the sj_nest if all of its SJ-inner tables are in cur_table_map */
2802 if (!(remaining_tables &
2803 emb_sj_nest->sj_inner_tables & ~new_join_tab->table->map))
2804 join->cur_sj_inner_tables &= ~emb_sj_nest->sj_inner_tables;
2805 }
2806
2807 pos->prefix_cost.convert_from_cost(*current_read_time);
2808 pos->prefix_record_count= *current_record_count;
2809 pos->dups_producing_tables= dups_producing_tables;
2810}
2811
2812
2813void Sj_materialization_picker::set_from_prev(struct st_position *prev)
2814{
2815 if (prev->sjmat_picker.is_used)
2816 set_empty();
2817 else
2818 {
2819 sjm_scan_need_tables= prev->sjmat_picker.sjm_scan_need_tables;
2820 sjm_scan_last_inner= prev->sjmat_picker.sjm_scan_last_inner;
2821 }
2822 is_used= FALSE;
2823}
2824
2825
2826bool Sj_materialization_picker::check_qep(JOIN *join,
2827 uint idx,
2828 table_map remaining_tables,
2829 const JOIN_TAB *new_join_tab,
2830 double *record_count,
2831 double *read_time,
2832 table_map *handled_fanout,
2833 sj_strategy_enum *strategy,
2834 POSITION *loose_scan_pos)
2835{
2836 bool sjm_scan;
2837 SJ_MATERIALIZATION_INFO *mat_info;
2838 if ((mat_info= at_sjmat_pos(join, remaining_tables,
2839 new_join_tab, idx, &sjm_scan)))
2840 {
2841 if (sjm_scan)
2842 {
2843 /*
2844 We can't yet evaluate this option yet. This is because we can't
2845 accout for fanout of sj-inner tables yet:
2846
2847 ntX SJM-SCAN(it1 ... itN) | ot1 ... otN |
2848 ^(1) ^(2)
2849
2850 we're now at position (1). SJM temptable in general has multiple
2851 records, so at point (1) we'll get the fanout from sj-inner tables (ie
2852 there will be multiple record combinations).
2853
2854 The final join result will not contain any semi-join produced
2855 fanout, i.e. tables within SJM-SCAN(...) will not contribute to
2856 the cardinality of the join output. Extra fanout produced by
2857 SJM-SCAN(...) will be 'absorbed' into fanout produced by ot1 ... otN.
2858
2859 The simple way to model this is to remove SJM-SCAN(...) fanout once
2860 we reach the point #2.
2861 */
2862 sjm_scan_need_tables=
2863 new_join_tab->emb_sj_nest->sj_inner_tables |
2864 new_join_tab->emb_sj_nest->nested_join->sj_depends_on |
2865 new_join_tab->emb_sj_nest->nested_join->sj_corr_tables;
2866 sjm_scan_last_inner= idx;
2867 }
2868 else
2869 {
2870 /* This is SJ-Materialization with lookups */
2871 Cost_estimate prefix_cost;
2872 signed int first_tab= (int)idx - mat_info->tables;
2873 double prefix_rec_count;
2874 if (first_tab < (int)join->const_tables)
2875 {
2876 prefix_cost.reset();
2877 prefix_rec_count= 1.0;
2878 }
2879 else
2880 {
2881 prefix_cost= join->positions[first_tab].prefix_cost;
2882 prefix_rec_count= join->positions[first_tab].prefix_record_count;
2883 }
2884
2885 double mat_read_time= prefix_cost.total_cost();
2886 mat_read_time += mat_info->materialization_cost.total_cost() +
2887 prefix_rec_count * mat_info->lookup_cost.total_cost();
2888
2889 /*
2890 NOTE: When we pick to use SJM[-Scan] we don't memcpy its POSITION
2891 elements to join->positions as that makes it hard to return things
2892 back when making one step back in join optimization. That's done
2893 after the QEP has been chosen.
2894 */
2895 *read_time= mat_read_time;
2896 *record_count= prefix_rec_count;
2897 *handled_fanout= new_join_tab->emb_sj_nest->sj_inner_tables;
2898 *strategy= SJ_OPT_MATERIALIZE;
2899 return TRUE;
2900 }
2901 }
2902
2903 /* 4.A SJM-Scan second phase check */
2904 if (sjm_scan_need_tables && /* Have SJM-Scan prefix */
2905 !(sjm_scan_need_tables & remaining_tables))
2906 {
2907 TABLE_LIST *mat_nest=
2908 join->positions[sjm_scan_last_inner].table->emb_sj_nest;
2909 SJ_MATERIALIZATION_INFO *mat_info= mat_nest->sj_mat_info;
2910
2911 double prefix_cost;
2912 double prefix_rec_count;
2913 int first_tab= sjm_scan_last_inner + 1 - mat_info->tables;
2914 /* Get the prefix cost */
2915 if (first_tab == (int)join->const_tables)
2916 {
2917 prefix_rec_count= 1.0;
2918 prefix_cost= 0.0;
2919 }
2920 else
2921 {
2922 prefix_cost= join->positions[first_tab - 1].prefix_cost.total_cost();
2923 prefix_rec_count= join->positions[first_tab - 1].prefix_record_count;
2924 }
2925
2926 /* Add materialization cost */
2927 prefix_cost += mat_info->materialization_cost.total_cost() +
2928 prefix_rec_count * mat_info->scan_cost.total_cost();
2929 prefix_rec_count *= mat_info->rows;
2930
2931 uint i;
2932 table_map rem_tables= remaining_tables;
2933 for (i= idx; i != (first_tab + mat_info->tables - 1); i--)
2934 rem_tables |= join->positions[i].table->table->map;
2935
2936 POSITION curpos, dummy;
2937 /* Need to re-run best-access-path as we prefix_rec_count has changed */
2938 bool disable_jbuf= (join->thd->variables.join_cache_level == 0);
2939 for (i= first_tab + mat_info->tables; i <= idx; i++)
2940 {
2941 best_access_path(join, join->positions[i].table, rem_tables, i,
2942 disable_jbuf, prefix_rec_count, &curpos, &dummy);
2943 prefix_rec_count *= curpos.records_read;
2944 prefix_cost += curpos.read_time;
2945 }
2946
2947 *strategy= SJ_OPT_MATERIALIZE_SCAN;
2948 *read_time= prefix_cost;
2949 *record_count= prefix_rec_count;
2950 *handled_fanout= mat_nest->sj_inner_tables;
2951 return TRUE;
2952 }
2953 return FALSE;
2954}
2955
2956
2957void LooseScan_picker::set_from_prev(struct st_position *prev)
2958{
2959 if (prev->loosescan_picker.is_used)
2960 set_empty();
2961 else
2962 {
2963 first_loosescan_table= prev->loosescan_picker.first_loosescan_table;
2964 loosescan_need_tables= prev->loosescan_picker.loosescan_need_tables;
2965 }
2966 is_used= FALSE;
2967}
2968
2969
2970bool LooseScan_picker::check_qep(JOIN *join,
2971 uint idx,
2972 table_map remaining_tables,
2973 const JOIN_TAB *new_join_tab,
2974 double *record_count,
2975 double *read_time,
2976 table_map *handled_fanout,
2977 sj_strategy_enum *strategy,
2978 struct st_position *loose_scan_pos)
2979{
2980 POSITION *first= join->positions + first_loosescan_table;
2981 /*
2982 LooseScan strategy can't handle interleaving between tables from the
2983 semi-join that LooseScan is handling and any other tables.
2984
2985 If we were considering LooseScan for the join prefix (1)
2986 and the table we're adding creates an interleaving (2)
2987 then
2988 stop considering loose scan
2989 */
2990 if ((first_loosescan_table != MAX_TABLES) && // (1)
2991 (first->table->emb_sj_nest->sj_inner_tables & remaining_tables) && //(2)
2992 new_join_tab->emb_sj_nest != first->table->emb_sj_nest) //(2)
2993 {
2994 first_loosescan_table= MAX_TABLES;
2995 }
2996
2997 /*
2998 If we got an option to use LooseScan for the current table, start
2999 considering using LooseScan strategy
3000 */
3001 if (loose_scan_pos->read_time != DBL_MAX && !join->outer_join)
3002 {
3003 first_loosescan_table= idx;
3004 loosescan_need_tables=
3005 new_join_tab->emb_sj_nest->sj_inner_tables |
3006 new_join_tab->emb_sj_nest->nested_join->sj_depends_on |
3007 new_join_tab->emb_sj_nest->nested_join->sj_corr_tables;
3008 }
3009
3010 if ((first_loosescan_table != MAX_TABLES) &&
3011 !(remaining_tables & loosescan_need_tables) &&
3012 (new_join_tab->table->map & loosescan_need_tables))
3013 {
3014 /*
3015 Ok we have LooseScan plan and also have all LooseScan sj-nest's
3016 inner tables and outer correlated tables into the prefix.
3017 */
3018
3019 first= join->positions + first_loosescan_table;
3020 uint n_tables= my_count_bits(first->table->emb_sj_nest->sj_inner_tables);
3021 /* Got a complete LooseScan range. Calculate its cost */
3022 /*
3023 The same problem as with FirstMatch - we need to save POSITIONs
3024 somewhere but reserving space for all cases would require too
3025 much space. We will re-calculate POSITION structures later on.
3026 */
3027 bool disable_jbuf= (join->thd->variables.join_cache_level == 0);
3028 optimize_wo_join_buffering(join, first_loosescan_table, idx,
3029 remaining_tables,
3030 TRUE, //first_alt
3031 disable_jbuf ? join->table_count :
3032 first_loosescan_table + n_tables,
3033 record_count,
3034 read_time);
3035 /*
3036 We don't yet have any other strategies that could handle this
3037 semi-join nest (the other options are Duplicate Elimination or
3038 Materialization, which need at least the same set of tables in
3039 the join prefix to be considered) so unconditionally pick the
3040 LooseScan.
3041 */
3042 *strategy= SJ_OPT_LOOSE_SCAN;
3043 *handled_fanout= first->table->emb_sj_nest->sj_inner_tables;
3044 return TRUE;
3045 }
3046 return FALSE;
3047}
3048
3049void Firstmatch_picker::set_from_prev(struct st_position *prev)
3050{
3051 if (prev->firstmatch_picker.is_used)
3052 invalidate_firstmatch_prefix();
3053 else
3054 {
3055 first_firstmatch_table= prev->firstmatch_picker.first_firstmatch_table;
3056 first_firstmatch_rtbl= prev->firstmatch_picker.first_firstmatch_rtbl;
3057 firstmatch_need_tables= prev->firstmatch_picker.firstmatch_need_tables;
3058 }
3059 is_used= FALSE;
3060}
3061
3062bool Firstmatch_picker::check_qep(JOIN *join,
3063 uint idx,
3064 table_map remaining_tables,
3065 const JOIN_TAB *new_join_tab,
3066 double *record_count,
3067 double *read_time,
3068 table_map *handled_fanout,
3069 sj_strategy_enum *strategy,
3070 POSITION *loose_scan_pos)
3071{
3072 if (new_join_tab->emb_sj_nest &&
3073 optimizer_flag(join->thd, OPTIMIZER_SWITCH_FIRSTMATCH) &&
3074 !join->outer_join)
3075 {
3076 const table_map outer_corr_tables=
3077 new_join_tab->emb_sj_nest->nested_join->sj_corr_tables |
3078 new_join_tab->emb_sj_nest->nested_join->sj_depends_on;
3079 const table_map sj_inner_tables=
3080 new_join_tab->emb_sj_nest->sj_inner_tables & ~join->const_table_map;
3081
3082 /*
3083 Enter condition:
3084 1. The next join tab belongs to semi-join nest
3085 (verified for the encompassing code block above).
3086 2. We're not in a duplicate producer range yet
3087 3. All outer tables that
3088 - the subquery is correlated with, or
3089 - referred to from the outer_expr
3090 are in the join prefix
3091 4. All inner tables are still part of remaining_tables.
3092 */
3093 if (!join->cur_sj_inner_tables && // (2)
3094 !(remaining_tables & outer_corr_tables) && // (3)
3095 (sj_inner_tables == // (4)
3096 ((remaining_tables | new_join_tab->table->map) & sj_inner_tables)))
3097 {
3098 /* Start tracking potential FirstMatch range */
3099 first_firstmatch_table= idx;
3100 firstmatch_need_tables= sj_inner_tables;
3101 first_firstmatch_rtbl= remaining_tables;
3102 }
3103
3104 if (in_firstmatch_prefix())
3105 {
3106 if (outer_corr_tables & first_firstmatch_rtbl)
3107 {
3108 /*
3109 Trying to add an sj-inner table whose sj-nest has an outer correlated
3110 table that was not in the prefix. This means FirstMatch can't be used.
3111 */
3112 invalidate_firstmatch_prefix();
3113 }
3114 else
3115 {
3116 /* Record that we need all of this semi-join's inner tables, too */
3117 firstmatch_need_tables|= sj_inner_tables;
3118 }
3119
3120 if (in_firstmatch_prefix() &&
3121 !(firstmatch_need_tables & remaining_tables))
3122 {
3123 /*
3124 Got a complete FirstMatch range. Calculate correct costs and fanout
3125 */
3126
3127 if (idx == first_firstmatch_table &&
3128 optimizer_flag(join->thd, OPTIMIZER_SWITCH_SEMIJOIN_WITH_CACHE))
3129 {
3130 /*
3131 An important special case: only one inner table, and @@optimizer_switch
3132 allows join buffering.
3133 - read_time is the same (i.e. FirstMatch doesn't add any cost
3134 - remove fanout added by the last table
3135 */
3136 if (*record_count)
3137 *record_count /= join->positions[idx].records_read;
3138 }
3139 else
3140 {
3141 optimize_wo_join_buffering(join, first_firstmatch_table, idx,
3142 remaining_tables, FALSE, idx,
3143 record_count,
3144 read_time);
3145 }
3146 /*
3147 We ought to save the alternate POSITIONs produced by
3148 optimize_wo_join_buffering but the problem is that providing save
3149 space uses too much space. Instead, we will re-calculate the
3150 alternate POSITIONs after we've picked the best QEP.
3151 */
3152 *handled_fanout= firstmatch_need_tables;
3153 /* *record_count and *read_time were set by the above call */
3154 *strategy= SJ_OPT_FIRST_MATCH;
3155 return TRUE;
3156 }
3157 }
3158 }
3159 else
3160 invalidate_firstmatch_prefix();
3161 return FALSE;
3162}
3163
3164
3165void Duplicate_weedout_picker::set_from_prev(POSITION *prev)
3166{
3167 if (prev->dups_weedout_picker.is_used)
3168 set_empty();
3169 else
3170 {
3171 dupsweedout_tables= prev->dups_weedout_picker.dupsweedout_tables;
3172 first_dupsweedout_table= prev->dups_weedout_picker.first_dupsweedout_table;
3173 }
3174 is_used= FALSE;
3175}
3176
3177
3178bool Duplicate_weedout_picker::check_qep(JOIN *join,
3179 uint idx,
3180 table_map remaining_tables,
3181 const JOIN_TAB *new_join_tab,
3182 double *record_count,
3183 double *read_time,
3184 table_map *handled_fanout,
3185 sj_strategy_enum *strategy,
3186 POSITION *loose_scan_pos
3187 )
3188{
3189 TABLE_LIST *nest;
3190 if ((nest= new_join_tab->emb_sj_nest))
3191 {
3192 if (!dupsweedout_tables)
3193 first_dupsweedout_table= idx;
3194
3195 dupsweedout_tables |= nest->sj_inner_tables |
3196 nest->nested_join->sj_depends_on |
3197 nest->nested_join->sj_corr_tables;
3198 }
3199
3200 if (dupsweedout_tables)
3201 {
3202 /* we're in the process of constructing a DuplicateWeedout range */
3203 TABLE_LIST *emb= new_join_tab->table->pos_in_table_list->embedding;
3204 /* and we've entered an inner side of an outer join*/
3205 if (emb && emb->on_expr)
3206 dupsweedout_tables |= emb->nested_join->used_tables;
3207 }
3208
3209 /* If this is the last table that we need for DuplicateWeedout range */
3210 if (dupsweedout_tables && !(remaining_tables & ~new_join_tab->table->map &
3211 dupsweedout_tables))
3212 {
3213 /*
3214 Ok, reached a state where we could put a dups weedout point.
3215 Walk back and calculate
3216 - the join cost (this is needed as the accumulated cost may assume
3217 some other duplicate elimination method)
3218 - extra fanout that will be removed by duplicate elimination
3219 - duplicate elimination cost
3220 There are two cases:
3221 1. We have other strategy/ies to remove all of the duplicates.
3222 2. We don't.
3223
3224 We need to calculate the cost in case #2 also because we need to make
3225 choice between this join order and others.
3226 */
3227 uint first_tab= first_dupsweedout_table;
3228 double dups_cost;
3229 double prefix_rec_count;
3230 double sj_inner_fanout= 1.0;
3231 double sj_outer_fanout= 1.0;
3232 uint temptable_rec_size;
3233 if (first_tab == join->const_tables)
3234 {
3235 prefix_rec_count= 1.0;
3236 temptable_rec_size= 0;
3237 dups_cost= 0.0;
3238 }
3239 else
3240 {
3241 dups_cost= join->positions[first_tab - 1].prefix_cost.total_cost();
3242 prefix_rec_count= join->positions[first_tab - 1].prefix_record_count;
3243 temptable_rec_size= 8; /* This is not true but we'll make it so */
3244 }
3245
3246 table_map dups_removed_fanout= 0;
3247 double current_fanout= prefix_rec_count;
3248 for (uint j= first_dupsweedout_table; j <= idx; j++)
3249 {
3250 POSITION *p= join->positions + j;
3251 current_fanout *= p->records_read;
3252 dups_cost += p->read_time + current_fanout / TIME_FOR_COMPARE;
3253 if (p->table->emb_sj_nest)
3254 {
3255 sj_inner_fanout *= p->records_read;
3256 dups_removed_fanout |= p->table->table->map;
3257 }
3258 else
3259 {
3260 sj_outer_fanout *= p->records_read;
3261 temptable_rec_size += p->table->table->file->ref_length;
3262 }
3263 }
3264
3265 /*
3266 Add the cost of temptable use. The table will have sj_outer_fanout
3267 records, and we will make
3268 - sj_outer_fanout table writes
3269 - sj_inner_fanout*sj_outer_fanout lookups.
3270
3271 */
3272 double one_lookup_cost= get_tmp_table_lookup_cost(join->thd,
3273 sj_outer_fanout,
3274 temptable_rec_size);
3275 double one_write_cost= get_tmp_table_write_cost(join->thd,
3276 sj_outer_fanout,
3277 temptable_rec_size);
3278
3279 double write_cost= join->positions[first_tab].prefix_record_count*
3280 sj_outer_fanout * one_write_cost;
3281 double full_lookup_cost= join->positions[first_tab].prefix_record_count*
3282 sj_outer_fanout* sj_inner_fanout *
3283 one_lookup_cost;
3284 dups_cost += write_cost + full_lookup_cost;
3285
3286 *read_time= dups_cost;
3287 *record_count= prefix_rec_count * sj_outer_fanout;
3288 *handled_fanout= dups_removed_fanout;
3289 *strategy= SJ_OPT_DUPS_WEEDOUT;
3290 return TRUE;
3291 }
3292 return FALSE;
3293}
3294
3295
3296/*
3297 Remove the last join tab from from join->cur_sj_inner_tables bitmap
3298 we assume remaining_tables doesnt contain @tab.
3299*/
3300
3301void restore_prev_sj_state(const table_map remaining_tables,
3302 const JOIN_TAB *tab, uint idx)
3303{
3304 TABLE_LIST *emb_sj_nest;
3305
3306 if (tab->emb_sj_nest)
3307 {
3308 table_map subq_tables= tab->emb_sj_nest->sj_inner_tables;
3309 tab->join->sjm_lookup_tables &= ~subq_tables;
3310 }
3311
3312 if ((emb_sj_nest= tab->emb_sj_nest))
3313 {
3314 /* If we're removing the last SJ-inner table, remove the sj-nest */
3315 if ((remaining_tables & emb_sj_nest->sj_inner_tables) ==
3316 (emb_sj_nest->sj_inner_tables & ~tab->table->map))
3317 {
3318 tab->join->cur_sj_inner_tables &= ~emb_sj_nest->sj_inner_tables;
3319 }
3320 }
3321}
3322
3323
3324/*
3325 Given a semi-join nest, find out which of the IN-equalities are bound
3326
3327 SYNOPSIS
3328 get_bound_sj_equalities()
3329 sj_nest Semi-join nest
3330 remaining_tables Tables that are not yet bound
3331
3332 DESCRIPTION
3333 Given a semi-join nest, find out which of the IN-equalities have their
3334 left part expression bound (i.e. the said expression doesn't refer to
3335 any of remaining_tables and can be evaluated).
3336
3337 RETURN
3338 Bitmap of bound IN-equalities.
3339*/
3340
3341ulonglong get_bound_sj_equalities(TABLE_LIST *sj_nest,
3342 table_map remaining_tables)
3343{
3344 List_iterator<Item_ptr> li(sj_nest->nested_join->sj_outer_expr_list);
3345 Item **item;
3346 uint i= 0;
3347 ulonglong res= 0;
3348 while ((item= li++))
3349 {
3350 /*
3351 Q: should this take into account equality propagation and how?
3352 A: If e->outer_side is an Item_field, walk over the equality
3353 class and see if there is an element that is bound?
3354 (this is an optional feature)
3355 */
3356 if (!(item[0]->used_tables() & remaining_tables))
3357 {
3358 res |= 1ULL << i;
3359 }
3360 i++;
3361 }
3362 return res;
3363}
3364
3365
3366/*
3367 Check if the last tables of the partial join order allow to use
3368 sj-materialization strategy for them
3369
3370 SYNOPSIS
3371 at_sjmat_pos()
3372 join
3373 remaining_tables
3374 tab the last table's join tab
3375 idx last table's index
3376 loose_scan OUT TRUE <=> use LooseScan
3377
3378 RETURN
3379 TRUE Yes, can apply sj-materialization
3380 FALSE No, some of the requirements are not met
3381*/
3382
3383static SJ_MATERIALIZATION_INFO *
3384at_sjmat_pos(const JOIN *join, table_map remaining_tables, const JOIN_TAB *tab,
3385 uint idx, bool *loose_scan)
3386{
3387 /*
3388 Check if
3389 1. We're in a semi-join nest that can be run with SJ-materialization
3390 2. All the tables correlated through the IN subquery are in the prefix
3391 */
3392 TABLE_LIST *emb_sj_nest= tab->emb_sj_nest;
3393 table_map suffix= remaining_tables & ~tab->table->map;
3394 if (emb_sj_nest && emb_sj_nest->sj_mat_info &&
3395 !(suffix & emb_sj_nest->sj_inner_tables))
3396 {
3397 /*
3398 Walk back and check if all immediately preceding tables are from
3399 this semi-join.
3400 */
3401 uint n_tables= my_count_bits(tab->emb_sj_nest->sj_inner_tables);
3402 for (uint i= 1; i < n_tables ; i++)
3403 {
3404 if (join->positions[idx - i].table->emb_sj_nest != tab->emb_sj_nest)
3405 return NULL;
3406 }
3407 *loose_scan= MY_TEST(remaining_tables & ~tab->table->map &
3408 (emb_sj_nest->sj_inner_tables |
3409 emb_sj_nest->nested_join->sj_depends_on));
3410 if (*loose_scan && !emb_sj_nest->sj_subq_pred->sjm_scan_allowed)
3411 return NULL;
3412 else
3413 return emb_sj_nest->sj_mat_info;
3414 }
3415 return NULL;
3416}
3417
3418
3419/*
3420 Re-calculate values of join->best_positions[start..end].prefix_record_count
3421*/
3422
3423static void recalculate_prefix_record_count(JOIN *join, uint start, uint end)
3424{
3425 for (uint j= start; j < end ;j++)
3426 {
3427 double prefix_count;
3428 if (j == join->const_tables)
3429 prefix_count= 1.0;
3430 else
3431 prefix_count= join->best_positions[j-1].prefix_record_count *
3432 join->best_positions[j-1].records_read;
3433
3434 join->best_positions[j].prefix_record_count= prefix_count;
3435 }
3436}
3437
3438
3439/*
3440 Fix semi-join strategies for the picked join order
3441
3442 SYNOPSIS
3443 fix_semijoin_strategies_for_picked_join_order()
3444 join The join with the picked join order
3445
3446 DESCRIPTION
3447 Fix semi-join strategies for the picked join order. This is a step that
3448 needs to be done right after we have fixed the join order. What we do
3449 here is switch join's semi-join strategy description from backward-based
3450 to forwards based.
3451
3452 When join optimization is in progress, we re-consider semi-join
3453 strategies after we've added another table. Here's an illustration.
3454 Suppose the join optimization is underway:
3455
3456 1) ot1 it1 it2
3457 sjX -- looking at (ot1, it1, it2) join prefix, we decide
3458 to use semi-join strategy sjX.
3459
3460 2) ot1 it1 it2 ot2
3461 sjX sjY -- Having added table ot2, we now may consider
3462 another semi-join strategy and decide to use a
3463 different strategy sjY. Note that the record
3464 of sjX has remained under it2. That is
3465 necessary because we need to be able to get
3466 back to (ot1, it1, it2) join prefix.
3467 what makes things even worse is that there are cases where the choice
3468 of sjY changes the way we should access it2.
3469
3470 3) [ot1 it1 it2 ot2 ot3]
3471 sjX sjY -- This means that after join optimization is
3472 finished, semi-join info should be read
3473 right-to-left (while nearly all plan refinement
3474 functions, EXPLAIN, etc proceed from left to
3475 right)
3476
3477 This function does the needed reversal, making it possible to read the
3478 join and semi-join order from left to right.
3479*/
3480
3481void fix_semijoin_strategies_for_picked_join_order(JOIN *join)
3482{
3483 uint table_count=join->table_count;
3484 uint tablenr;
3485 table_map remaining_tables= 0;
3486 table_map handled_tabs= 0;
3487 join->sjm_lookup_tables= 0;
3488 join->sjm_scan_tables= 0;
3489 for (tablenr= table_count - 1 ; tablenr != join->const_tables - 1; tablenr--)
3490 {
3491 POSITION *pos= join->best_positions + tablenr;
3492 JOIN_TAB *s= pos->table;
3493 uint UNINIT_VAR(first); // Set by every branch except SJ_OPT_NONE which doesn't use it
3494
3495 if ((handled_tabs & s->table->map) || pos->sj_strategy == SJ_OPT_NONE)
3496 {
3497 remaining_tables |= s->table->map;
3498 continue;
3499 }
3500
3501 if (pos->sj_strategy == SJ_OPT_MATERIALIZE)
3502 {
3503 SJ_MATERIALIZATION_INFO *sjm= s->emb_sj_nest->sj_mat_info;
3504 sjm->is_used= TRUE;
3505 sjm->is_sj_scan= FALSE;
3506 memcpy((uchar*) (pos - sjm->tables + 1), (uchar*) sjm->positions,
3507 sizeof(POSITION) * sjm->tables);
3508 recalculate_prefix_record_count(join, tablenr - sjm->tables + 1,
3509 tablenr);
3510 first= tablenr - sjm->tables + 1;
3511 join->best_positions[first].n_sj_tables= sjm->tables;
3512 join->best_positions[first].sj_strategy= SJ_OPT_MATERIALIZE;
3513 join->sjm_lookup_tables|= s->table->map;
3514 }
3515 else if (pos->sj_strategy == SJ_OPT_MATERIALIZE_SCAN)
3516 {
3517 POSITION *first_inner= join->best_positions + pos->sjmat_picker.sjm_scan_last_inner;
3518 SJ_MATERIALIZATION_INFO *sjm= first_inner->table->emb_sj_nest->sj_mat_info;
3519 sjm->is_used= TRUE;
3520 sjm->is_sj_scan= TRUE;
3521 first= pos->sjmat_picker.sjm_scan_last_inner - sjm->tables + 1;
3522 memcpy((uchar*) (join->best_positions + first),
3523 (uchar*) sjm->positions, sizeof(POSITION) * sjm->tables);
3524 recalculate_prefix_record_count(join, first, first + sjm->tables);
3525 join->best_positions[first].sj_strategy= SJ_OPT_MATERIALIZE_SCAN;
3526 join->best_positions[first].n_sj_tables= sjm->tables;
3527 /*
3528 Do what advance_sj_state did: re-run best_access_path for every table
3529 in the [last_inner_table + 1; pos..) range
3530 */
3531 double prefix_rec_count;
3532 /* Get the prefix record count */
3533 if (first == join->const_tables)
3534 prefix_rec_count= 1.0;
3535 else
3536 prefix_rec_count= join->best_positions[first-1].prefix_record_count;
3537
3538 /* Add materialization record count*/
3539 prefix_rec_count *= sjm->rows;
3540
3541 uint i;
3542 table_map rem_tables= remaining_tables;
3543 for (i= tablenr; i != (first + sjm->tables - 1); i--)
3544 rem_tables |= join->best_positions[i].table->table->map;
3545
3546 for (i= first; i < first+ sjm->tables; i++)
3547 join->sjm_scan_tables |= join->best_positions[i].table->table->map;
3548
3549 POSITION dummy;
3550 join->cur_sj_inner_tables= 0;
3551 for (i= first + sjm->tables; i <= tablenr; i++)
3552 {
3553 best_access_path(join, join->best_positions[i].table, rem_tables, i,
3554 FALSE, prefix_rec_count,
3555 join->best_positions + i, &dummy);
3556 prefix_rec_count *= join->best_positions[i].records_read;
3557 rem_tables &= ~join->best_positions[i].table->table->map;
3558 }
3559 }
3560
3561 if (pos->sj_strategy == SJ_OPT_FIRST_MATCH)
3562 {
3563 first= pos->firstmatch_picker.first_firstmatch_table;
3564 join->best_positions[first].sj_strategy= SJ_OPT_FIRST_MATCH;
3565 join->best_positions[first].n_sj_tables= tablenr - first + 1;
3566 POSITION dummy; // For loose scan paths
3567 double record_count= (first== join->const_tables)? 1.0:
3568 join->best_positions[tablenr - 1].prefix_record_count;
3569
3570 table_map rem_tables= remaining_tables;
3571 uint idx;
3572 for (idx= first; idx <= tablenr; idx++)
3573 {
3574 rem_tables |= join->best_positions[idx].table->table->map;
3575 }
3576 /*
3577 Re-run best_access_path to produce best access methods that do not use
3578 join buffering
3579 */
3580 join->cur_sj_inner_tables= 0;
3581 for (idx= first; idx <= tablenr; idx++)
3582 {
3583 if (join->best_positions[idx].use_join_buffer)
3584 {
3585 best_access_path(join, join->best_positions[idx].table,
3586 rem_tables, idx, TRUE /* no jbuf */,
3587 record_count, join->best_positions + idx, &dummy);
3588 }
3589 record_count *= join->best_positions[idx].records_read;
3590 rem_tables &= ~join->best_positions[idx].table->table->map;
3591 }
3592 }
3593
3594 if (pos->sj_strategy == SJ_OPT_LOOSE_SCAN)
3595 {
3596 first= pos->loosescan_picker.first_loosescan_table;
3597 POSITION *first_pos= join->best_positions + first;
3598 POSITION loose_scan_pos; // For loose scan paths
3599 double record_count= (first== join->const_tables)? 1.0:
3600 join->best_positions[tablenr - 1].prefix_record_count;
3601
3602 table_map rem_tables= remaining_tables;
3603 uint idx;
3604 for (idx= first; idx <= tablenr; idx++)
3605 rem_tables |= join->best_positions[idx].table->table->map;
3606 /*
3607 Re-run best_access_path to produce best access methods that do not use
3608 join buffering
3609 */
3610 join->cur_sj_inner_tables= 0;
3611 for (idx= first; idx <= tablenr; idx++)
3612 {
3613 if (join->best_positions[idx].use_join_buffer || (idx == first))
3614 {
3615 best_access_path(join, join->best_positions[idx].table,
3616 rem_tables, idx, TRUE /* no jbuf */,
3617 record_count, join->best_positions + idx,
3618 &loose_scan_pos);
3619 if (idx==first)
3620 {
3621 join->best_positions[idx]= loose_scan_pos;
3622 /*
3623 If LooseScan is based on ref access (including the "degenerate"
3624 one with 0 key parts), we should use full index scan.
3625
3626 Unfortunately, lots of code assumes that if tab->type==JT_ALL &&
3627 tab->quick!=NULL, then quick select should be used. The only
3628 simple way to fix this is to remove the quick select:
3629 */
3630 if (join->best_positions[idx].key)
3631 {
3632 delete join->best_positions[idx].table->quick;
3633 join->best_positions[idx].table->quick= NULL;
3634 }
3635 }
3636 }
3637 rem_tables &= ~join->best_positions[idx].table->table->map;
3638 record_count *= join->best_positions[idx].records_read;
3639 }
3640 first_pos->sj_strategy= SJ_OPT_LOOSE_SCAN;
3641 first_pos->n_sj_tables= my_count_bits(first_pos->table->emb_sj_nest->sj_inner_tables);
3642 }
3643
3644 if (pos->sj_strategy == SJ_OPT_DUPS_WEEDOUT)
3645 {
3646 /*
3647 Duplicate Weedout starting at pos->first_dupsweedout_table, ending at
3648 this table.
3649 */
3650 first= pos->dups_weedout_picker.first_dupsweedout_table;
3651 join->best_positions[first].sj_strategy= SJ_OPT_DUPS_WEEDOUT;
3652 join->best_positions[first].n_sj_tables= tablenr - first + 1;
3653 }
3654
3655 uint i_end= first + join->best_positions[first].n_sj_tables;
3656 for (uint i= first; i < i_end; i++)
3657 {
3658 if (i != first)
3659 join->best_positions[i].sj_strategy= SJ_OPT_NONE;
3660 handled_tabs |= join->best_positions[i].table->table->map;
3661 }
3662
3663 if (tablenr != first)
3664 pos->sj_strategy= SJ_OPT_NONE;
3665 remaining_tables |= s->table->map;
3666 join->join_tab[first].sj_strategy= join->best_positions[first].sj_strategy;
3667 join->join_tab[first].n_sj_tables= join->best_positions[first].n_sj_tables;
3668 }
3669}
3670
3671
3672/*
3673 Setup semi-join materialization strategy for one semi-join nest
3674
3675 SYNOPSIS
3676
3677 setup_sj_materialization()
3678 tab The first tab in the semi-join
3679
3680 DESCRIPTION
3681 Setup execution structures for one semi-join materialization nest:
3682 - Create the materialization temporary table
3683 - If we're going to do index lookups
3684 create TABLE_REF structure to make the lookus
3685 - else (if we're going to do a full scan of the temptable)
3686 create Copy_field structures to do copying.
3687
3688 RETURN
3689 FALSE Ok
3690 TRUE Error
3691*/
3692
3693bool setup_sj_materialization_part1(JOIN_TAB *sjm_tab)
3694{
3695 JOIN_TAB *tab= sjm_tab->bush_children->start;
3696 TABLE_LIST *emb_sj_nest= tab->table->pos_in_table_list->embedding;
3697 SJ_MATERIALIZATION_INFO *sjm;
3698 THD *thd;
3699
3700 DBUG_ENTER("setup_sj_materialization");
3701
3702 /* Walk out of outer join nests until we reach the semi-join nest we're in */
3703 while (!emb_sj_nest->sj_mat_info)
3704 emb_sj_nest= emb_sj_nest->embedding;
3705
3706 sjm= emb_sj_nest->sj_mat_info;
3707 thd= tab->join->thd;
3708 /* First the calls come to the materialization function */
3709
3710 DBUG_ASSERT(sjm->is_used);
3711 /*
3712 Set up the table to write to, do as select_union::create_result_table does
3713 */
3714 sjm->sjm_table_param.init();
3715 sjm->sjm_table_param.bit_fields_as_long= TRUE;
3716 SELECT_LEX *subq_select= emb_sj_nest->sj_subq_pred->unit->first_select();
3717 const LEX_CSTRING sj_materialize_name= { STRING_WITH_LEN("sj-materialize") };
3718 List_iterator<Item> it(subq_select->item_list);
3719 Item *item;
3720 while((item= it++))
3721 {
3722 /*
3723 This semi-join replaced the subquery (subq_select) and so on
3724 re-executing it will not be prepared. To use the Items from its
3725 select list we have to prepare (fix_fields) them
3726 */
3727 if (!item->fixed && item->fix_fields(thd, it.ref()))
3728 DBUG_RETURN(TRUE);
3729 item= *(it.ref()); // it can be changed by fix_fields
3730 DBUG_ASSERT(!item->name.length || item->name.length == strlen(item->name.str));
3731 sjm->sjm_table_cols.push_back(item, thd->mem_root);
3732 }
3733
3734 sjm->sjm_table_param.field_count= subq_select->item_list.elements;
3735 sjm->sjm_table_param.force_not_null_cols= TRUE;
3736
3737 if (!(sjm->table= create_tmp_table(thd, &sjm->sjm_table_param,
3738 sjm->sjm_table_cols, (ORDER*) 0,
3739 TRUE /* distinct */,
3740 1, /*save_sum_fields*/
3741 thd->variables.option_bits | TMP_TABLE_ALL_COLUMNS,
3742 HA_POS_ERROR /*rows_limit */,
3743 &sj_materialize_name)))
3744 DBUG_RETURN(TRUE); /* purecov: inspected */
3745 sjm->table->map= emb_sj_nest->nested_join->used_tables;
3746 sjm->table->file->extra(HA_EXTRA_WRITE_CACHE);
3747 sjm->table->file->extra(HA_EXTRA_IGNORE_DUP_KEY);
3748
3749 tab->join->sj_tmp_tables.push_back(sjm->table, thd->mem_root);
3750 tab->join->sjm_info_list.push_back(sjm, thd->mem_root);
3751
3752 sjm->materialized= FALSE;
3753 sjm_tab->table= sjm->table;
3754 sjm->table->pos_in_table_list= emb_sj_nest;
3755
3756 DBUG_RETURN(FALSE);
3757}
3758
3759/**
3760 @retval
3761 FALSE ok
3762 TRUE error
3763*/
3764
3765bool setup_sj_materialization_part2(JOIN_TAB *sjm_tab)
3766{
3767 DBUG_ENTER("setup_sj_materialization_part2");
3768 JOIN_TAB *tab= sjm_tab->bush_children->start;
3769 TABLE_LIST *emb_sj_nest= tab->table->pos_in_table_list->embedding;
3770 /* Walk out of outer join nests until we reach the semi-join nest we're in */
3771 while (!emb_sj_nest->sj_mat_info)
3772 emb_sj_nest= emb_sj_nest->embedding;
3773 SJ_MATERIALIZATION_INFO *sjm= emb_sj_nest->sj_mat_info;
3774 THD *thd= tab->join->thd;
3775 uint i;
3776
3777 if (!sjm->is_sj_scan)
3778 {
3779 KEY *tmp_key; /* The only index on the temporary table. */
3780 uint tmp_key_parts; /* Number of keyparts in tmp_key. */
3781 tmp_key= sjm->table->key_info;
3782 tmp_key_parts= tmp_key->user_defined_key_parts;
3783
3784 /*
3785 Create/initialize everything we will need to index lookups into the
3786 temptable.
3787 */
3788 TABLE_REF *tab_ref;
3789 tab_ref= &sjm_tab->ref;
3790 tab_ref->key= 0; /* The only temp table index. */
3791 tab_ref->key_length= tmp_key->key_length;
3792 if (!(tab_ref->key_buff=
3793 (uchar*) thd->calloc(ALIGN_SIZE(tmp_key->key_length) * 2)) ||
3794 !(tab_ref->key_copy=
3795 (store_key**) thd->alloc((sizeof(store_key*) *
3796 (tmp_key_parts + 1)))) ||
3797 !(tab_ref->items=
3798 (Item**) thd->alloc(sizeof(Item*) * tmp_key_parts)))
3799 DBUG_RETURN(TRUE); /* purecov: inspected */
3800
3801 tab_ref->key_buff2=tab_ref->key_buff+ALIGN_SIZE(tmp_key->key_length);
3802 tab_ref->key_err=1;
3803 tab_ref->null_rejecting= 1;
3804 tab_ref->disable_cache= FALSE;
3805
3806 KEY_PART_INFO *cur_key_part= tmp_key->key_part;
3807 store_key **ref_key= tab_ref->key_copy;
3808 uchar *cur_ref_buff= tab_ref->key_buff;
3809
3810 for (i= 0; i < tmp_key_parts; i++, cur_key_part++, ref_key++)
3811 {
3812 tab_ref->items[i]= emb_sj_nest->sj_subq_pred->left_expr->element_index(i);
3813 int null_count= MY_TEST(cur_key_part->field->real_maybe_null());
3814 *ref_key= new store_key_item(thd, cur_key_part->field,
3815 /* TODO:
3816 the NULL byte is taken into account in
3817 cur_key_part->store_length, so instead of
3818 cur_ref_buff + MY_TEST(maybe_null), we could
3819 use that information instead.
3820 */
3821 cur_ref_buff + null_count,
3822 null_count ? cur_ref_buff : 0,
3823 cur_key_part->length, tab_ref->items[i],
3824 FALSE);
3825 if (!*ref_key)
3826 DBUG_RETURN(TRUE);
3827 cur_ref_buff+= cur_key_part->store_length;
3828 }
3829 *ref_key= NULL; /* End marker. */
3830
3831 /*
3832 We don't ever have guarded conditions for SJM tables, but code at SQL
3833 layer depends on cond_guards array being alloced.
3834 */
3835 if (!(tab_ref->cond_guards= (bool**) thd->calloc(sizeof(uint*)*tmp_key_parts)))
3836 {
3837 DBUG_RETURN(TRUE);
3838 }
3839
3840 tab_ref->key_err= 1;
3841 tab_ref->key_parts= tmp_key_parts;
3842 sjm->tab_ref= tab_ref;
3843
3844 /*
3845 Remove the injected semi-join IN-equalities from join_tab conds. This
3846 needs to be done because the IN-equalities refer to columns of
3847 sj-inner tables which are not available after the materialization
3848 has been finished.
3849 */
3850 for (i= 0; i < sjm->tables; i++)
3851 {
3852 if (remove_sj_conds(thd, &tab[i].select_cond) ||
3853 (tab[i].select && remove_sj_conds(thd, &tab[i].select->cond)))
3854 DBUG_RETURN(TRUE);
3855 }
3856 if (!(sjm->in_equality= create_subq_in_equalities(thd, sjm,
3857 emb_sj_nest->sj_subq_pred)))
3858 DBUG_RETURN(TRUE); /* purecov: inspected */
3859 sjm_tab->type= JT_EQ_REF;
3860 sjm_tab->select_cond= sjm->in_equality;
3861 }
3862 else
3863 {
3864 /*
3865 We'll be doing full scan of the temptable.
3866 Setup copying of temptable columns back to the record buffers
3867 for their source tables. We need this because IN-equalities
3868 refer to the original tables.
3869
3870 EXAMPLE
3871
3872 Consider the query:
3873 SELECT * FROM ot WHERE ot.col1 IN (SELECT it.col2 FROM it)
3874
3875 Suppose it's executed with SJ-Materialization-scan. We choose to do scan
3876 if we can't do the lookup, i.e. the join order is (it, ot). The plan
3877 would look as follows:
3878
3879 table access method condition
3880 it materialize+scan -
3881 ot (whatever) ot1.col1=it.col2 (C2)
3882
3883 The condition C2 refers to current row of table it. The problem is
3884 that by the time we evaluate C2, we would have finished with scanning
3885 it itself and will be scanning the temptable.
3886
3887 At the moment, our solution is to copy back: when we get the next
3888 temptable record, we copy its columns to their corresponding columns
3889 in the record buffers for the source tables.
3890 */
3891 if (!(sjm->copy_field= new Copy_field[sjm->sjm_table_cols.elements]))
3892 DBUG_RETURN(TRUE);
3893
3894 //it.rewind();
3895 Ref_ptr_array p_items= emb_sj_nest->sj_subq_pred->unit->first_select()->ref_pointer_array;
3896 for (uint i=0; i < sjm->sjm_table_cols.elements; i++)
3897 {
3898 bool dummy;
3899 Item_equal *item_eq;
3900 //Item *item= (it++)->real_item();
3901 Item *item= p_items[i]->real_item();
3902 DBUG_ASSERT(item->type() == Item::FIELD_ITEM);
3903 Field *copy_to= ((Item_field*)item)->field;
3904 /*
3905 Tricks with Item_equal are due to the following: suppose we have a
3906 query:
3907
3908 ... WHERE cond(ot.col) AND ot.col IN (SELECT it2.col FROM it1,it2
3909 WHERE it1.col= it2.col)
3910 then equality propagation will create an
3911
3912 Item_equal(it1.col, it2.col, ot.col)
3913
3914 then substitute_for_best_equal_field() will change the conditions
3915 according to the join order:
3916
3917 table | attached condition
3918 ------+--------------------
3919 it1 |
3920 it2 | it1.col=it2.col
3921 ot | cond(it1.col)
3922
3923 although we've originally had "SELECT it2.col", conditions attached
3924 to subsequent outer tables will refer to it1.col, so SJM-Scan will
3925 need to unpack data to there.
3926 That is, if an element from subquery's select list participates in
3927 equality propagation, then we need to unpack it to the first
3928 element equality propagation member that refers to table that is
3929 within the subquery.
3930 */
3931 item_eq= find_item_equal(tab->join->cond_equal, copy_to, &dummy);
3932
3933 if (item_eq)
3934 {
3935 List_iterator<Item> it(item_eq->equal_items);
3936 /* We're interested in field items only */
3937 if (item_eq->get_const())
3938 it++;
3939 Item *item;
3940 while ((item= it++))
3941 {
3942 if (!(item->used_tables() & ~emb_sj_nest->sj_inner_tables))
3943 {
3944 DBUG_ASSERT(item->real_item()->type() == Item::FIELD_ITEM);
3945 copy_to= ((Item_field *) (item->real_item()))->field;
3946 break;
3947 }
3948 }
3949 }
3950 sjm->copy_field[i].set(copy_to, sjm->table->field[i], FALSE);
3951 /* The write_set for source tables must be set up to allow the copying */
3952 bitmap_set_bit(copy_to->table->write_set, copy_to->field_index);
3953 }
3954 sjm_tab->type= JT_ALL;
3955
3956 /* Initialize full scan */
3957 sjm_tab->read_first_record= join_read_record_no_init;
3958 sjm_tab->read_record.copy_field= sjm->copy_field;
3959 sjm_tab->read_record.copy_field_end= sjm->copy_field +
3960 sjm->sjm_table_cols.elements;
3961 sjm_tab->read_record.read_record_func= rr_sequential_and_unpack;
3962 }
3963
3964 sjm_tab->bush_children->end[-1].next_select= end_sj_materialize;
3965
3966 DBUG_RETURN(FALSE);
3967}
3968
3969
3970
3971/*
3972 Create subquery IN-equalities assuming use of materialization strategy
3973
3974 SYNOPSIS
3975 create_subq_in_equalities()
3976 thd Thread handle
3977 sjm Semi-join materialization structure
3978 subq_pred The subquery predicate
3979
3980 DESCRIPTION
3981 Create subquery IN-equality predicates. That is, for a subquery
3982
3983 (oe1, oe2, ...) IN (SELECT ie1, ie2, ... FROM ...)
3984
3985 create "oe1=ie1 AND ie1=ie2 AND ..." expression, such that ie1, ie2, ..
3986 refer to the columns of the table that's used to materialize the
3987 subquery.
3988
3989 RETURN
3990 Created condition
3991*/
3992
3993static Item *create_subq_in_equalities(THD *thd, SJ_MATERIALIZATION_INFO *sjm,
3994 Item_in_subselect *subq_pred)
3995{
3996 Item *res= NULL;
3997 if (subq_pred->left_expr->cols() == 1)
3998 {
3999 if (!(res= new (thd->mem_root) Item_func_eq(thd, subq_pred->left_expr,
4000 new (thd->mem_root) Item_field(thd, sjm->table->field[0]))))
4001 return NULL; /* purecov: inspected */
4002 }
4003 else
4004 {
4005 Item *conj;
4006 for (uint i= 0; i < subq_pred->left_expr->cols(); i++)
4007 {
4008 if (!(conj= new (thd->mem_root) Item_func_eq(thd, subq_pred->left_expr->element_index(i),
4009 new (thd->mem_root) Item_field(thd, sjm->table->field[i]))) ||
4010 !(res= and_items(thd, res, conj)))
4011 return NULL; /* purecov: inspected */
4012 }
4013 }
4014 if (res->fix_fields(thd, &res))
4015 return NULL; /* purecov: inspected */
4016 return res;
4017}
4018
4019
4020/**
4021 @retval
4022 0 ok
4023 1 error
4024*/
4025
4026static bool remove_sj_conds(THD *thd, Item **tree)
4027{
4028 if (*tree)
4029 {
4030 if (is_cond_sj_in_equality(*tree))
4031 {
4032 *tree= NULL;
4033 return 0;
4034 }
4035 else if ((*tree)->type() == Item::COND_ITEM)
4036 {
4037 Item *item;
4038 List_iterator<Item> li(*(((Item_cond*)*tree)->argument_list()));
4039 while ((item= li++))
4040 {
4041 if (is_cond_sj_in_equality(item))
4042 {
4043 Item_int *tmp= new (thd->mem_root) Item_int(thd, 1);
4044 if (!tmp)
4045 return 1;
4046 li.replace(tmp);
4047 }
4048 }
4049 }
4050 }
4051 return 0;
4052}
4053
4054
4055/* Check if given Item was injected by semi-join equality */
4056static bool is_cond_sj_in_equality(Item *item)
4057{
4058 if (item->type() == Item::FUNC_ITEM &&
4059 ((Item_func*)item)->functype()== Item_func::EQ_FUNC)
4060 {
4061 Item_func_eq *item_eq= (Item_func_eq*)item;
4062 return MY_TEST(item_eq->in_equality_no != UINT_MAX);
4063 }
4064 return FALSE;
4065}
4066
4067
4068/*
4069 Create a temporary table to weed out duplicate rowid combinations
4070
4071 SYNOPSIS
4072
4073 create_sj_weedout_tmp_table()
4074 thd Thread handle
4075
4076 DESCRIPTION
4077 Create a temporary table to weed out duplicate rowid combinations. The
4078 table has a single column that is a concatenation of all rowids in the
4079 combination.
4080
4081 Depending on the needed length, there are two cases:
4082
4083 1. When the length of the column < max_key_length:
4084
4085 CREATE TABLE tmp (col VARBINARY(n) NOT NULL, UNIQUE KEY(col));
4086
4087 2. Otherwise (not a valid SQL syntax but internally supported):
4088
4089 CREATE TABLE tmp (col VARBINARY NOT NULL, UNIQUE CONSTRAINT(col));
4090
4091 The code in this function was produced by extraction of relevant parts
4092 from create_tmp_table().
4093
4094 RETURN
4095 created table
4096 NULL on error
4097*/
4098
4099bool
4100SJ_TMP_TABLE::create_sj_weedout_tmp_table(THD *thd)
4101{
4102 MEM_ROOT *mem_root_save, own_root;
4103 TABLE *table;
4104 TABLE_SHARE *share;
4105 uint temp_pool_slot=MY_BIT_NONE;
4106 char *tmpname,path[FN_REFLEN];
4107 Field **reg_field;
4108 KEY_PART_INFO *key_part_info;
4109 KEY *keyinfo;
4110 uchar *group_buff;
4111 uchar *bitmaps;
4112 uint *blob_field;
4113 bool using_unique_constraint=FALSE;
4114 bool use_packed_rows= FALSE;
4115 Field *field, *key_field;
4116 uint null_pack_length, null_count;
4117 uchar *null_flags;
4118 uchar *pos;
4119 DBUG_ENTER("create_sj_weedout_tmp_table");
4120 DBUG_ASSERT(!is_degenerate);
4121
4122 tmp_table= NULL;
4123 uint uniq_tuple_length_arg= rowid_len + null_bytes;
4124 /*
4125 STEP 1: Get temporary table name
4126 */
4127 if (use_temp_pool && !(test_flags & TEST_KEEP_TMP_TABLES))
4128 temp_pool_slot = bitmap_lock_set_next(&temp_pool);
4129
4130 if (temp_pool_slot != MY_BIT_NONE) // we got a slot
4131 sprintf(path, "%s_%lx_%i", tmp_file_prefix,
4132 current_pid, temp_pool_slot);
4133 else
4134 {
4135 /* if we run out of slots or we are not using tempool */
4136 sprintf(path,"%s%lx_%lx_%x", tmp_file_prefix,current_pid,
4137 (ulong) thd->thread_id, thd->tmp_table++);
4138 }
4139 fn_format(path, path, mysql_tmpdir, "", MY_REPLACE_EXT|MY_UNPACK_FILENAME);
4140
4141 /* STEP 2: Figure if we'll be using a key or blob+constraint */
4142 /* it always has my_charset_bin, so mbmaxlen==1 */
4143 if (uniq_tuple_length_arg >= CONVERT_IF_BIGGER_TO_BLOB)
4144 using_unique_constraint= TRUE;
4145
4146 /* STEP 3: Allocate memory for temptable description */
4147 init_sql_alloc(&own_root, "SJ_TMP_TABLE",
4148 TABLE_ALLOC_BLOCK_SIZE, 0, MYF(MY_THREAD_SPECIFIC));
4149 if (!multi_alloc_root(&own_root,
4150 &table, sizeof(*table),
4151 &share, sizeof(*share),
4152 &reg_field, sizeof(Field*) * (1+1),
4153 &blob_field, sizeof(uint)*2,
4154 &keyinfo, sizeof(*keyinfo),
4155 &key_part_info, sizeof(*key_part_info) * 2,
4156 &start_recinfo,
4157 sizeof(*recinfo)*(1*2+4),
4158 &tmpname, (uint) strlen(path)+1,
4159 &group_buff, (!using_unique_constraint ?
4160 uniq_tuple_length_arg : 0),
4161 &bitmaps, bitmap_buffer_size(1)*6,
4162 NullS))
4163 {
4164 if (temp_pool_slot != MY_BIT_NONE)
4165 bitmap_lock_clear_bit(&temp_pool, temp_pool_slot);
4166 DBUG_RETURN(TRUE);
4167 }
4168 strmov(tmpname,path);
4169
4170
4171 /* STEP 4: Create TABLE description */
4172 bzero((char*) table,sizeof(*table));
4173 bzero((char*) reg_field,sizeof(Field*)*2);
4174
4175 table->mem_root= own_root;
4176 mem_root_save= thd->mem_root;
4177 thd->mem_root= &table->mem_root;
4178
4179 table->field=reg_field;
4180 table->alias.set("weedout-tmp", sizeof("weedout-tmp")-1,
4181 table_alias_charset);
4182 table->reginfo.lock_type=TL_WRITE; /* Will be updated */
4183 table->db_stat=HA_OPEN_KEYFILE;
4184 table->map=1;
4185 table->temp_pool_slot = temp_pool_slot;
4186 table->copy_blobs= 1;
4187 table->in_use= thd;
4188 table->quick_keys.init();
4189 table->covering_keys.init();
4190 table->keys_in_use_for_query.init();
4191
4192 table->s= share;
4193 init_tmp_table_share(thd, share, "", 0, tmpname, tmpname);
4194 share->blob_field= blob_field;
4195 share->table_charset= NULL;
4196 share->primary_key= MAX_KEY; // Indicate no primary key
4197 share->keys_for_keyread.init();
4198 share->keys_in_use.init();
4199
4200 /* Create the field */
4201 {
4202 LEX_CSTRING field_name= {STRING_WITH_LEN("rowids") };
4203 /*
4204 For the sake of uniformity, always use Field_varstring (altough we could
4205 use Field_string for shorter keys)
4206 */
4207 field= new Field_varstring(uniq_tuple_length_arg, FALSE, &field_name,
4208 share, &my_charset_bin);
4209 if (!field)
4210 DBUG_RETURN(0);
4211 field->table= table;
4212 field->key_start.init(0);
4213 field->part_of_key.init(0);
4214 field->part_of_sortkey.init(0);
4215 field->unireg_check= Field::NONE;
4216 field->flags= (NOT_NULL_FLAG | BINARY_FLAG | NO_DEFAULT_VALUE_FLAG);
4217 field->reset_fields();
4218 field->init(table);
4219 field->orig_table= NULL;
4220
4221 field->field_index= 0;
4222
4223 *(reg_field++)= field;
4224 *blob_field= 0;
4225 *reg_field= 0;
4226
4227 share->fields= 1;
4228 share->blob_fields= 0;
4229 }
4230
4231 uint reclength= field->pack_length();
4232 if (using_unique_constraint)
4233 {
4234 share->db_plugin= ha_lock_engine(0, TMP_ENGINE_HTON);
4235 table->file= get_new_handler(share, &table->mem_root,
4236 share->db_type());
4237 }
4238 else
4239 {
4240 share->db_plugin= ha_lock_engine(0, heap_hton);
4241 table->file= get_new_handler(share, &table->mem_root,
4242 share->db_type());
4243 DBUG_ASSERT(!table->file || uniq_tuple_length_arg <= table->file->max_key_length());
4244 }
4245 if (!table->file)
4246 goto err;
4247
4248 if (table->file->set_ha_share_ref(&share->ha_share))
4249 {
4250 delete table->file;
4251 goto err;
4252 }
4253
4254 null_count=1;
4255
4256 null_pack_length= 1;
4257 reclength += null_pack_length;
4258
4259 share->reclength= reclength;
4260 {
4261 uint alloc_length=ALIGN_SIZE(share->reclength + MI_UNIQUE_HASH_LENGTH+1);
4262 share->rec_buff_length= alloc_length;
4263 if (!(table->record[0]= (uchar*)
4264 alloc_root(&table->mem_root, alloc_length*3)))
4265 goto err;
4266 table->record[1]= table->record[0]+alloc_length;
4267 share->default_values= table->record[1]+alloc_length;
4268 }
4269 setup_tmp_table_column_bitmaps(table, bitmaps);
4270
4271 recinfo= start_recinfo;
4272 null_flags=(uchar*) table->record[0];
4273 pos=table->record[0]+ null_pack_length;
4274 if (null_pack_length)
4275 {
4276 bzero((uchar*) recinfo,sizeof(*recinfo));
4277 recinfo->type=FIELD_NORMAL;
4278 recinfo->length=null_pack_length;
4279 recinfo++;
4280 bfill(null_flags,null_pack_length,255); // Set null fields
4281
4282 table->null_flags= (uchar*) table->record[0];
4283 share->null_fields= null_count;
4284 share->null_bytes= null_pack_length;
4285 }
4286 null_count=1;
4287
4288 {
4289 //Field *field= *reg_field;
4290 uint length;
4291 bzero((uchar*) recinfo,sizeof(*recinfo));
4292 field->move_field(pos,(uchar*) 0,0);
4293
4294 field->reset();
4295 /*
4296 Test if there is a default field value. The test for ->ptr is to skip
4297 'offset' fields generated by initalize_tables
4298 */
4299 // Initialize the table field:
4300 bzero(field->ptr, field->pack_length());
4301
4302 length=field->pack_length();
4303 pos+= length;
4304
4305 /* Make entry for create table */
4306 recinfo->length=length;
4307 if (field->flags & BLOB_FLAG)
4308 recinfo->type= FIELD_BLOB;
4309 else if (use_packed_rows &&
4310 field->real_type() == MYSQL_TYPE_STRING &&
4311 length >= MIN_STRING_LENGTH_TO_PACK_ROWS)
4312 recinfo->type=FIELD_SKIP_ENDSPACE;
4313 else
4314 recinfo->type=FIELD_NORMAL;
4315
4316 field->set_table_name(&table->alias);
4317 }
4318
4319 if (thd->variables.tmp_memory_table_size == ~ (ulonglong) 0) // No limit
4320 share->max_rows= ~(ha_rows) 0;
4321 else
4322 share->max_rows= (ha_rows) (((share->db_type() == heap_hton) ?
4323 MY_MIN(thd->variables.tmp_memory_table_size,
4324 thd->variables.max_heap_table_size) :
4325 thd->variables.tmp_memory_table_size) /
4326 share->reclength);
4327 set_if_bigger(share->max_rows,1); // For dummy start options
4328
4329
4330 //// keyinfo= param->keyinfo;
4331 if (TRUE)
4332 {
4333 DBUG_PRINT("info",("Creating group key in temporary table"));
4334 share->keys=1;
4335 share->uniques= MY_TEST(using_unique_constraint);
4336 table->key_info=keyinfo;
4337 keyinfo->key_part=key_part_info;
4338 keyinfo->flags=HA_NOSAME;
4339 keyinfo->usable_key_parts= keyinfo->user_defined_key_parts= 1;
4340 keyinfo->key_length=0;
4341 keyinfo->rec_per_key=0;
4342 keyinfo->algorithm= HA_KEY_ALG_UNDEF;
4343 keyinfo->name= weedout_key;
4344 {
4345 key_part_info->null_bit=0;
4346 key_part_info->field= field;
4347 key_part_info->offset= field->offset(table->record[0]);
4348 key_part_info->length= (uint16) field->key_length();
4349 key_part_info->type= (uint8) field->key_type();
4350 key_part_info->key_type = FIELDFLAG_BINARY;
4351 if (!using_unique_constraint)
4352 {
4353 if (!(key_field= field->new_key_field(thd->mem_root, table,
4354 group_buff,
4355 key_part_info->length,
4356 field->null_ptr,
4357 field->null_bit)))
4358 goto err;
4359 }
4360 keyinfo->key_length+= key_part_info->length;
4361 }
4362 }
4363
4364 if (unlikely(thd->is_fatal_error)) // If end of memory
4365 goto err;
4366 share->db_record_offset= 1;
4367 table->no_rows= 1; // We don't need the data
4368
4369 // recinfo must point after last field
4370 recinfo++;
4371 if (share->db_type() == TMP_ENGINE_HTON)
4372 {
4373 if (unlikely(create_internal_tmp_table(table, keyinfo, start_recinfo,
4374 &recinfo, 0)))
4375 goto err;
4376 }
4377 if (unlikely(open_tmp_table(table)))
4378 goto err;
4379
4380 thd->mem_root= mem_root_save;
4381 tmp_table= table;
4382 DBUG_RETURN(FALSE);
4383
4384err:
4385 thd->mem_root= mem_root_save;
4386 free_tmp_table(thd,table); /* purecov: inspected */
4387 if (temp_pool_slot != MY_BIT_NONE)
4388 bitmap_lock_clear_bit(&temp_pool, temp_pool_slot);
4389 DBUG_RETURN(TRUE); /* purecov: inspected */
4390}
4391
4392
4393/*
4394 SemiJoinDuplicateElimination: Reset the temporary table
4395*/
4396
4397int SJ_TMP_TABLE::sj_weedout_delete_rows()
4398{
4399 DBUG_ENTER("SJ_TMP_TABLE::sj_weedout_delete_rows");
4400 if (tmp_table)
4401 {
4402 int rc= tmp_table->file->ha_delete_all_rows();
4403 DBUG_RETURN(rc);
4404 }
4405 have_degenerate_row= FALSE;
4406 DBUG_RETURN(0);
4407}
4408
4409
4410/*
4411 SemiJoinDuplicateElimination: Weed out duplicate row combinations
4412
4413 SYNPOSIS
4414 sj_weedout_check_row()
4415 thd Thread handle
4416
4417 DESCRIPTION
4418 Try storing current record combination of outer tables (i.e. their
4419 rowids) in the temporary table. This records the fact that we've seen
4420 this record combination and also tells us if we've seen it before.
4421
4422 RETURN
4423 -1 Error
4424 1 The row combination is a duplicate (discard it)
4425 0 The row combination is not a duplicate (continue)
4426*/
4427
4428int SJ_TMP_TABLE::sj_weedout_check_row(THD *thd)
4429{
4430 int error;
4431 SJ_TMP_TABLE::TAB *tab= tabs;
4432 SJ_TMP_TABLE::TAB *tab_end= tabs_end;
4433 uchar *ptr;
4434 uchar *nulls_ptr;
4435
4436 DBUG_ENTER("SJ_TMP_TABLE::sj_weedout_check_row");
4437
4438 if (is_degenerate)
4439 {
4440 if (have_degenerate_row)
4441 DBUG_RETURN(1);
4442
4443 have_degenerate_row= TRUE;
4444 DBUG_RETURN(0);
4445 }
4446
4447 ptr= tmp_table->record[0] + 1;
4448
4449 /* Put the the rowids tuple into table->record[0]: */
4450
4451 // 1. Store the length
4452 if (((Field_varstring*)(tmp_table->field[0]))->length_bytes == 1)
4453 {
4454 *ptr= (uchar)(rowid_len + null_bytes);
4455 ptr++;
4456 }
4457 else
4458 {
4459 int2store(ptr, rowid_len + null_bytes);
4460 ptr += 2;
4461 }
4462
4463 nulls_ptr= ptr;
4464 // 2. Zero the null bytes
4465 if (null_bytes)
4466 {
4467 bzero(ptr, null_bytes);
4468 ptr += null_bytes;
4469 }
4470
4471 // 3. Put the rowids
4472 for (uint i=0; tab != tab_end; tab++, i++)
4473 {
4474 handler *h= tab->join_tab->table->file;
4475 if (tab->join_tab->table->maybe_null && tab->join_tab->table->null_row)
4476 {
4477 /* It's a NULL-complemented row */
4478 *(nulls_ptr + tab->null_byte) |= tab->null_bit;
4479 bzero(ptr + tab->rowid_offset, h->ref_length);
4480 }
4481 else
4482 {
4483 /* Copy the rowid value */
4484 memcpy(ptr + tab->rowid_offset, h->ref, h->ref_length);
4485 }
4486 }
4487
4488 error= tmp_table->file->ha_write_tmp_row(tmp_table->record[0]);
4489 if (unlikely(error))
4490 {
4491 /* create_internal_tmp_table_from_heap will generate error if needed */
4492 if (!tmp_table->file->is_fatal_error(error, HA_CHECK_DUP))
4493 DBUG_RETURN(1); /* Duplicate */
4494
4495 bool is_duplicate;
4496 if (create_internal_tmp_table_from_heap(thd, tmp_table, start_recinfo,
4497 &recinfo, error, 1, &is_duplicate))
4498 DBUG_RETURN(-1);
4499 if (is_duplicate)
4500 DBUG_RETURN(1);
4501 }
4502 DBUG_RETURN(0);
4503}
4504
4505
4506int init_dups_weedout(JOIN *join, uint first_table, int first_fanout_table, uint n_tables)
4507{
4508 THD *thd= join->thd;
4509 DBUG_ENTER("init_dups_weedout");
4510 SJ_TMP_TABLE::TAB sjtabs[MAX_TABLES];
4511 SJ_TMP_TABLE::TAB *last_tab= sjtabs;
4512 uint jt_rowid_offset= 0; // # tuple bytes are already occupied (w/o NULL bytes)
4513 uint jt_null_bits= 0; // # null bits in tuple bytes
4514 /*
4515 Walk through the range and remember
4516 - tables that need their rowids to be put into temptable
4517 - the last outer table
4518 */
4519 for (JOIN_TAB *j=join->join_tab + first_table;
4520 j < join->join_tab + first_table + n_tables; j++)
4521 {
4522 if (sj_table_is_included(join, j))
4523 {
4524 last_tab->join_tab= j;
4525 last_tab->rowid_offset= jt_rowid_offset;
4526 jt_rowid_offset += j->table->file->ref_length;
4527 if (j->table->maybe_null)
4528 {
4529 last_tab->null_byte= jt_null_bits / 8;
4530 last_tab->null_bit= jt_null_bits++;
4531 }
4532 last_tab++;
4533 j->table->prepare_for_position();
4534 j->keep_current_rowid= TRUE;
4535 }
4536 }
4537
4538 SJ_TMP_TABLE *sjtbl;
4539 if (jt_rowid_offset) /* Temptable has at least one rowid */
4540 {
4541 size_t tabs_size= (last_tab - sjtabs) * sizeof(SJ_TMP_TABLE::TAB);
4542 if (!(sjtbl= (SJ_TMP_TABLE*)thd->alloc(sizeof(SJ_TMP_TABLE))) ||
4543 !(sjtbl->tabs= (SJ_TMP_TABLE::TAB*) thd->alloc(tabs_size)))
4544 DBUG_RETURN(TRUE); /* purecov: inspected */
4545 memcpy(sjtbl->tabs, sjtabs, tabs_size);
4546 sjtbl->is_degenerate= FALSE;
4547 sjtbl->tabs_end= sjtbl->tabs + (last_tab - sjtabs);
4548 sjtbl->rowid_len= jt_rowid_offset;
4549 sjtbl->null_bits= jt_null_bits;
4550 sjtbl->null_bytes= (jt_null_bits + 7)/8;
4551 if (sjtbl->create_sj_weedout_tmp_table(thd))
4552 DBUG_RETURN(TRUE);
4553 join->sj_tmp_tables.push_back(sjtbl->tmp_table, thd->mem_root);
4554 }
4555 else
4556 {
4557 /*
4558 This is a special case where the entire subquery predicate does
4559 not depend on anything at all, ie this is
4560 WHERE const IN (uncorrelated select)
4561 */
4562 if (!(sjtbl= (SJ_TMP_TABLE*)thd->alloc(sizeof(SJ_TMP_TABLE))))
4563 DBUG_RETURN(TRUE); /* purecov: inspected */
4564 sjtbl->tmp_table= NULL;
4565 sjtbl->is_degenerate= TRUE;
4566 sjtbl->have_degenerate_row= FALSE;
4567 }
4568
4569 sjtbl->next_flush_table= join->join_tab[first_table].flush_weedout_table;
4570 join->join_tab[first_table].flush_weedout_table= sjtbl;
4571 join->join_tab[first_fanout_table].first_weedout_table= sjtbl;
4572 join->join_tab[first_table + n_tables - 1].check_weed_out_table= sjtbl;
4573 DBUG_RETURN(0);
4574}
4575
4576
4577/*
4578 @brief
4579 Set up semi-join Loose Scan strategy for execution
4580
4581 @detail
4582 Other strategies are done in setup_semijoin_dups_elimination(),
4583 however, we need to set up Loose Scan earlier, before make_join_select is
4584 called. This is to prevent make_join_select() from switching full index
4585 scans into quick selects (which will break Loose Scan access).
4586
4587 @return
4588 0 OK
4589 1 Error
4590*/
4591
4592int setup_semijoin_loosescan(JOIN *join)
4593{
4594 uint i;
4595 DBUG_ENTER("setup_semijoin_loosescan");
4596
4597 POSITION *pos= join->best_positions + join->const_tables;
4598 for (i= join->const_tables ; i < join->top_join_tab_count; )
4599 {
4600 JOIN_TAB *tab=join->join_tab + i;
4601 switch (pos->sj_strategy) {
4602 case SJ_OPT_MATERIALIZE:
4603 case SJ_OPT_MATERIALIZE_SCAN:
4604 i+= 1; /* join tabs are embedded in the nest */
4605 pos += pos->n_sj_tables;
4606 break;
4607 case SJ_OPT_LOOSE_SCAN:
4608 {
4609 /* We jump from the last table to the first one */
4610 tab->loosescan_match_tab= tab + pos->n_sj_tables - 1;
4611
4612 /* LooseScan requires records to be produced in order */
4613 if (tab->select && tab->select->quick)
4614 tab->select->quick->need_sorted_output();
4615
4616 for (uint j= i; j < i + pos->n_sj_tables; j++)
4617 join->join_tab[j].inside_loosescan_range= TRUE;
4618
4619 /* Calculate key length */
4620 uint keylen= 0;
4621 uint keyno= pos->loosescan_picker.loosescan_key;
4622 for (uint kp=0; kp < pos->loosescan_picker.loosescan_parts; kp++)
4623 keylen += tab->table->key_info[keyno].key_part[kp].store_length;
4624
4625 tab->loosescan_key= keyno;
4626 tab->loosescan_key_len= keylen;
4627 if (pos->n_sj_tables > 1)
4628 tab[pos->n_sj_tables - 1].do_firstmatch= tab;
4629 i+= pos->n_sj_tables;
4630 pos+= pos->n_sj_tables;
4631 break;
4632 }
4633 default:
4634 {
4635 i++;
4636 pos++;
4637 break;
4638 }
4639 }
4640 }
4641 DBUG_RETURN(FALSE);
4642}
4643
4644
4645/*
4646 Setup the strategies to eliminate semi-join duplicates.
4647
4648 SYNOPSIS
4649 setup_semijoin_dups_elimination()
4650 join Join to process
4651 options Join options (needed to see if join buffering will be
4652 used or not)
4653 no_jbuf_after Another bit of information re where join buffering will
4654 be used.
4655
4656 DESCRIPTION
4657 Setup the strategies to eliminate semi-join duplicates. ATM there are 4
4658 strategies:
4659
4660 1. DuplicateWeedout (use of temptable to remove duplicates based on rowids
4661 of row combinations)
4662 2. FirstMatch (pick only the 1st matching row combination of inner tables)
4663 3. LooseScan (scanning the sj-inner table in a way that groups duplicates
4664 together and picking the 1st one)
4665 4. SJ-Materialization.
4666
4667 The join order has "duplicate-generating ranges", and every range is
4668 served by one strategy or a combination of FirstMatch with with some
4669 other strategy.
4670
4671 "Duplicate-generating range" is defined as a range within the join order
4672 that contains all of the inner tables of a semi-join. All ranges must be
4673 disjoint, if tables of several semi-joins are interleaved, then the ranges
4674 are joined together, which is equivalent to converting
4675 SELECT ... WHERE oe1 IN (SELECT ie1 ...) AND oe2 IN (SELECT ie2 )
4676 to
4677 SELECT ... WHERE (oe1, oe2) IN (SELECT ie1, ie2 ... ...)
4678 .
4679
4680 Applicability conditions are as follows:
4681
4682 DuplicateWeedout strategy
4683 ~~~~~~~~~~~~~~~~~~~~~~~~~
4684
4685 (ot|nt)* [ it ((it|ot|nt)* (it|ot))] (nt)*
4686 +------+ +=========================+ +---+
4687 (1) (2) (3)
4688
4689 (1) - Prefix of OuterTables (those that participate in
4690 IN-equality and/or are correlated with subquery) and outer
4691 Non-correlated tables.
4692 (2) - The handled range. The range starts with the first sj-inner
4693 table, and covers all sj-inner and outer tables
4694 Within the range, Inner, Outer, outer non-correlated tables
4695 may follow in any order.
4696 (3) - The suffix of outer non-correlated tables.
4697
4698 FirstMatch strategy
4699 ~~~~~~~~~~~~~~~~~~~
4700
4701 (ot|nt)* [ it ((it|nt)* it) ] (nt)*
4702 +------+ +==================+ +---+
4703 (1) (2) (3)
4704
4705 (1) - Prefix of outer and non-correlated tables
4706 (2) - The handled range, which may contain only inner and
4707 non-correlated tables.
4708 (3) - The suffix of outer non-correlated tables.
4709
4710 LooseScan strategy
4711 ~~~~~~~~~~~~~~~~~~
4712
4713 (ot|ct|nt) [ loosescan_tbl (ot|nt|it)* it ] (ot|nt)*
4714 +--------+ +===========+ +=============+ +------+
4715 (1) (2) (3) (4)
4716
4717 (1) - Prefix that may contain any outer tables. The prefix must contain
4718 all the non-trivially correlated outer tables. (non-trivially means
4719 that the correlation is not just through the IN-equality).
4720
4721 (2) - Inner table for which the LooseScan scan is performed.
4722
4723 (3) - The remainder of the duplicate-generating range. It is served by
4724 application of FirstMatch strategy, with the exception that
4725 outer IN-correlated tables are considered to be non-correlated.
4726
4727 (4) - THe suffix of outer and outer non-correlated tables.
4728
4729
4730 The choice between the strategies is made by the join optimizer (see
4731 advance_sj_state() and fix_semijoin_strategies_for_picked_join_order()).
4732 This function sets up all fields/structures/etc needed for execution except
4733 for setup/initialization of semi-join materialization which is done in
4734 setup_sj_materialization() (todo: can't we move that to here also?)
4735
4736 RETURN
4737 FALSE OK
4738 TRUE Out of memory error
4739*/
4740
4741int setup_semijoin_dups_elimination(JOIN *join, ulonglong options,
4742 uint no_jbuf_after)
4743{
4744 uint i;
4745 DBUG_ENTER("setup_semijoin_dups_elimination");
4746
4747 join->complex_firstmatch_tables= table_map(0);
4748
4749 POSITION *pos= join->best_positions + join->const_tables;
4750 for (i= join->const_tables ; i < join->top_join_tab_count; )
4751 {
4752 JOIN_TAB *tab=join->join_tab + i;
4753 switch (pos->sj_strategy) {
4754 case SJ_OPT_MATERIALIZE:
4755 case SJ_OPT_MATERIALIZE_SCAN:
4756 /* Do nothing */
4757 i+= 1;// It used to be pos->n_sj_tables, but now they are embedded in a nest
4758 pos += pos->n_sj_tables;
4759 break;
4760 case SJ_OPT_LOOSE_SCAN:
4761 {
4762 /* Setup already handled by setup_semijoin_loosescan */
4763 i+= pos->n_sj_tables;
4764 pos+= pos->n_sj_tables;
4765 break;
4766 }
4767 case SJ_OPT_DUPS_WEEDOUT:
4768 {
4769 /*
4770 Check for join buffering. If there is one, move the first table
4771 forwards, but do not destroy other duplicate elimination methods.
4772 */
4773 uint first_table= i;
4774
4775 uint join_cache_level= join->thd->variables.join_cache_level;
4776 for (uint j= i; j < i + pos->n_sj_tables; j++)
4777 {
4778 /*
4779 When we'll properly take join buffering into account during
4780 join optimization, the below check should be changed to
4781 "if (join->best_positions[j].use_join_buffer &&
4782 j <= no_jbuf_after)".
4783 For now, use a rough criteria:
4784 */
4785 JOIN_TAB *js_tab=join->join_tab + j;
4786 if (j != join->const_tables && js_tab->use_quick != 2 &&
4787 j <= no_jbuf_after &&
4788 ((js_tab->type == JT_ALL && join_cache_level != 0) ||
4789 (join_cache_level > 2 && (js_tab->type == JT_REF ||
4790 js_tab->type == JT_EQ_REF))))
4791 {
4792 /* Looks like we'll be using join buffer */
4793 first_table= join->const_tables;
4794 /*
4795 Make sure that possible sorting of rows from the head table
4796 is not to be employed.
4797 */
4798 if (join->get_sort_by_join_tab())
4799 {
4800 join->simple_order= 0;
4801 join->simple_group= 0;
4802 join->need_tmp= join->test_if_need_tmp_table();
4803 }
4804 break;
4805 }
4806 }
4807
4808 init_dups_weedout(join, first_table, i, i + pos->n_sj_tables - first_table);
4809 i+= pos->n_sj_tables;
4810 pos+= pos->n_sj_tables;
4811 break;
4812 }
4813 case SJ_OPT_FIRST_MATCH:
4814 {
4815 JOIN_TAB *j;
4816 JOIN_TAB *jump_to= tab-1;
4817
4818 bool complex_range= FALSE;
4819 table_map tables_in_range= table_map(0);
4820
4821 for (j= tab; j != tab + pos->n_sj_tables; j++)
4822 {
4823 tables_in_range |= j->table->map;
4824 if (!j->emb_sj_nest)
4825 {
4826 /*
4827 Got a table that's not within any semi-join nest. This is a case
4828 like this:
4829
4830 SELECT * FROM ot1, nt1 WHERE ot1.col IN (SELECT expr FROM it1, it2)
4831
4832 with a join order of
4833
4834 +----- FirstMatch range ----+
4835 | |
4836 ot1 it1 nt1 nt2 it2 it3 ...
4837 | ^
4838 | +-------- 'j' points here
4839 +------------- SJ_OPT_FIRST_MATCH was set for this table as
4840 it's the first one that produces duplicates
4841
4842 */
4843 DBUG_ASSERT(j != tab); /* table ntX must have an itX before it */
4844
4845 /*
4846 If the table right before us is an inner table (like it1 in the
4847 picture), it should be set to jump back to previous outer-table
4848 */
4849 if (j[-1].emb_sj_nest)
4850 j[-1].do_firstmatch= jump_to;
4851
4852 jump_to= j; /* Jump back to us */
4853 complex_range= TRUE;
4854 }
4855 else
4856 {
4857 j->first_sj_inner_tab= tab;
4858 j->last_sj_inner_tab= tab + pos->n_sj_tables - 1;
4859 }
4860 }
4861 j[-1].do_firstmatch= jump_to;
4862 i+= pos->n_sj_tables;
4863 pos+= pos->n_sj_tables;
4864
4865 if (complex_range)
4866 join->complex_firstmatch_tables|= tables_in_range;
4867 break;
4868 }
4869 case SJ_OPT_NONE:
4870 i++;
4871 pos++;
4872 break;
4873 }
4874 }
4875 DBUG_RETURN(FALSE);
4876}
4877
4878
4879/*
4880 Destroy all temporary tables created by NL-semijoin runtime
4881*/
4882
4883void destroy_sj_tmp_tables(JOIN *join)
4884{
4885 List_iterator<TABLE> it(join->sj_tmp_tables);
4886 TABLE *table;
4887 while ((table= it++))
4888 {
4889 /*
4890 SJ-Materialization tables are initialized for either sequential reading
4891 or index lookup, DuplicateWeedout tables are not initialized for read
4892 (we only write to them), so need to call ha_index_or_rnd_end.
4893 */
4894 table->file->ha_index_or_rnd_end();
4895 free_tmp_table(join->thd, table);
4896 }
4897 join->sj_tmp_tables.empty();
4898 join->sjm_info_list.empty();
4899}
4900
4901
4902/*
4903 Remove all records from all temp tables used by NL-semijoin runtime
4904
4905 SYNOPSIS
4906 clear_sj_tmp_tables()
4907 join The join to remove tables for
4908
4909 DESCRIPTION
4910 Remove all records from all temp tables used by NL-semijoin runtime. This
4911 must be done before every join re-execution.
4912*/
4913
4914int clear_sj_tmp_tables(JOIN *join)
4915{
4916 int res;
4917 List_iterator<TABLE> it(join->sj_tmp_tables);
4918 TABLE *table;
4919 while ((table= it++))
4920 {
4921 if ((res= table->file->ha_delete_all_rows()))
4922 return res; /* purecov: inspected */
4923 }
4924
4925 SJ_MATERIALIZATION_INFO *sjm;
4926 List_iterator<SJ_MATERIALIZATION_INFO> it2(join->sjm_info_list);
4927 while ((sjm= it2++))
4928 {
4929 sjm->materialized= FALSE;
4930 }
4931 return 0;
4932}
4933
4934
4935/*
4936 Check if the table's rowid is included in the temptable
4937
4938 SYNOPSIS
4939 sj_table_is_included()
4940 join The join
4941 join_tab The table to be checked
4942
4943 DESCRIPTION
4944 SemiJoinDuplicateElimination: check the table's rowid should be included
4945 in the temptable. This is so if
4946
4947 1. The table is not embedded within some semi-join nest
4948 2. The has been pulled out of a semi-join nest, or
4949
4950 3. The table is functionally dependent on some previous table
4951
4952 [4. This is also true for constant tables that can't be
4953 NULL-complemented but this function is not called for such tables]
4954
4955 RETURN
4956 TRUE - Include table's rowid
4957 FALSE - Don't
4958*/
4959
4960static bool sj_table_is_included(JOIN *join, JOIN_TAB *join_tab)
4961{
4962 if (join_tab->emb_sj_nest)
4963 return FALSE;
4964
4965 /* Check if this table is functionally dependent on the tables that
4966 are within the same outer join nest
4967 */
4968 TABLE_LIST *embedding= join_tab->table->pos_in_table_list->embedding;
4969 if (join_tab->type == JT_EQ_REF)
4970 {
4971 table_map depends_on= 0;
4972 uint idx;
4973
4974 for (uint kp= 0; kp < join_tab->ref.key_parts; kp++)
4975 depends_on |= join_tab->ref.items[kp]->used_tables();
4976
4977 Table_map_iterator it(depends_on & ~PSEUDO_TABLE_BITS);
4978 while ((idx= it.next_bit())!=Table_map_iterator::BITMAP_END)
4979 {
4980 JOIN_TAB *ref_tab= join->map2table[idx];
4981 if (embedding != ref_tab->table->pos_in_table_list->embedding)
4982 return TRUE;
4983 }
4984 /* Ok, functionally dependent */
4985 return FALSE;
4986 }
4987 /* Not functionally dependent => need to include*/
4988 return TRUE;
4989}
4990
4991
4992/*
4993 Index lookup-based subquery: save some flags for EXPLAIN output
4994
4995 SYNOPSIS
4996 save_index_subquery_explain_info()
4997 join_tab Subquery's join tab (there is only one as index lookup is
4998 only used for subqueries that are single-table SELECTs)
4999 where Subquery's WHERE clause
5000
5001 DESCRIPTION
5002 For index lookup-based subquery (i.e. one executed with
5003 subselect_uniquesubquery_engine or subselect_indexsubquery_engine),
5004 check its EXPLAIN output row should contain
5005 "Using index" (TAB_INFO_FULL_SCAN_ON_NULL)
5006 "Using Where" (TAB_INFO_USING_WHERE)
5007 "Full scan on NULL key" (TAB_INFO_FULL_SCAN_ON_NULL)
5008 and set appropriate flags in join_tab->packed_info.
5009*/
5010
5011static void save_index_subquery_explain_info(JOIN_TAB *join_tab, Item* where)
5012{
5013 join_tab->packed_info= TAB_INFO_HAVE_VALUE;
5014 if (join_tab->table->covering_keys.is_set(join_tab->ref.key))
5015 join_tab->packed_info |= TAB_INFO_USING_INDEX;
5016 if (where)
5017 join_tab->packed_info |= TAB_INFO_USING_WHERE;
5018 for (uint i = 0; i < join_tab->ref.key_parts; i++)
5019 {
5020 if (join_tab->ref.cond_guards[i])
5021 {
5022 join_tab->packed_info |= TAB_INFO_FULL_SCAN_ON_NULL;
5023 break;
5024 }
5025 }
5026}
5027
5028
5029/*
5030 Check if the join can be rewritten to [unique_]indexsubquery_engine
5031
5032 DESCRIPTION
5033 Check if the join can be changed into [unique_]indexsubquery_engine.
5034
5035 The check is done after join optimization, the idea is that if the join
5036 has only one table and uses a [eq_]ref access generated from subselect's
5037 IN-equality then we replace it with a subselect_indexsubquery_engine or a
5038 subselect_uniquesubquery_engine.
5039
5040 RETURN
5041 0 - Ok, rewrite done (stop join optimization and return)
5042 1 - Fatal error (stop join optimization and return)
5043 -1 - No rewrite performed, continue with join optimization
5044*/
5045
5046int rewrite_to_index_subquery_engine(JOIN *join)
5047{
5048 THD *thd= join->thd;
5049 JOIN_TAB* join_tab=join->join_tab;
5050 SELECT_LEX_UNIT *unit= join->unit;
5051 DBUG_ENTER("rewrite_to_index_subquery_engine");
5052
5053 /*
5054 is this simple IN subquery?
5055 */
5056 /* TODO: In order to use these more efficient subquery engines in more cases,
5057 the following problems need to be solved:
5058 - the code that removes GROUP BY (group_list), also adds an ORDER BY
5059 (order), thus GROUP BY queries (almost?) never pass through this branch.
5060 Solution: remove the test below '!join->order', because we remove the
5061 ORDER clase for subqueries anyway.
5062 - in order to set a more efficient engine, the optimizer needs to both
5063 decide to remove GROUP BY, *and* select one of the JT_[EQ_]REF[_OR_NULL]
5064 access methods, *and* loose scan should be more expensive or
5065 inapliccable. When is that possible?
5066 - Consider expanding the applicability of this rewrite for loose scan
5067 for group by queries.
5068 */
5069 if (!join->group_list && !join->order &&
5070 join->unit->item &&
5071 join->unit->item->substype() == Item_subselect::IN_SUBS &&
5072 join->table_count == 1 && join->conds &&
5073 !join->unit->is_unit_op())
5074 {
5075 if (!join->having)
5076 {
5077 Item *where= join->conds;
5078 if (join_tab[0].type == JT_EQ_REF &&
5079 join_tab[0].ref.items[0]->name.str == in_left_expr_name.str)
5080 {
5081 remove_subq_pushed_predicates(join, &where);
5082 save_index_subquery_explain_info(join_tab, where);
5083 join_tab[0].type= JT_UNIQUE_SUBQUERY;
5084 join->error= 0;
5085 DBUG_RETURN(unit->item->
5086 change_engine(new
5087 subselect_uniquesubquery_engine(thd,
5088 join_tab,
5089 unit->item,
5090 where)));
5091 }
5092 else if (join_tab[0].type == JT_REF &&
5093 join_tab[0].ref.items[0]->name.str == in_left_expr_name.str)
5094 {
5095 remove_subq_pushed_predicates(join, &where);
5096 save_index_subquery_explain_info(join_tab, where);
5097 join_tab[0].type= JT_INDEX_SUBQUERY;
5098 join->error= 0;
5099 DBUG_RETURN(unit->item->
5100 change_engine(new
5101 subselect_indexsubquery_engine(thd,
5102 join_tab,
5103 unit->item,
5104 where,
5105 NULL,
5106 0)));
5107 }
5108 } else if (join_tab[0].type == JT_REF_OR_NULL &&
5109 join_tab[0].ref.items[0]->name.str == in_left_expr_name.str &&
5110 join->having->name.str == in_having_cond.str)
5111 {
5112 join_tab[0].type= JT_INDEX_SUBQUERY;
5113 join->error= 0;
5114 join->conds= remove_additional_cond(join->conds);
5115 save_index_subquery_explain_info(join_tab, join->conds);
5116 DBUG_RETURN(unit->item->
5117 change_engine(new subselect_indexsubquery_engine(thd,
5118 join_tab,
5119 unit->item,
5120 join->conds,
5121 join->having,
5122 1)));
5123 }
5124 }
5125
5126 DBUG_RETURN(-1); /* Haven't done the rewrite */
5127}
5128
5129
5130/**
5131 Remove additional condition inserted by IN/ALL/ANY transformation.
5132
5133 @param conds condition for processing
5134
5135 @return
5136 new conditions
5137*/
5138
5139static Item *remove_additional_cond(Item* conds)
5140{
5141 if (conds->name.str == in_additional_cond.str)
5142 return 0;
5143 if (conds->type() == Item::COND_ITEM)
5144 {
5145 Item_cond *cnd= (Item_cond*) conds;
5146 List_iterator<Item> li(*(cnd->argument_list()));
5147 Item *item;
5148 while ((item= li++))
5149 {
5150 if (item->name.str == in_additional_cond.str)
5151 {
5152 li.remove();
5153 if (cnd->argument_list()->elements == 1)
5154 return cnd->argument_list()->head();
5155 return conds;
5156 }
5157 }
5158 }
5159 return conds;
5160}
5161
5162
5163/*
5164 Remove the predicates pushed down into the subquery
5165
5166 SYNOPSIS
5167 remove_subq_pushed_predicates()
5168 where IN Must be NULL
5169 OUT The remaining WHERE condition, or NULL
5170
5171 DESCRIPTION
5172 Given that this join will be executed using (unique|index)_subquery,
5173 without "checking NULL", remove the predicates that were pushed down
5174 into the subquery.
5175
5176 If the subquery compares scalar values, we can remove the condition that
5177 was wrapped into trig_cond (it will be checked when needed by the subquery
5178 engine)
5179
5180 If the subquery compares row values, we need to keep the wrapped
5181 equalities in the WHERE clause: when the left (outer) tuple has both NULL
5182 and non-NULL values, we'll do a full table scan and will rely on the
5183 equalities corresponding to non-NULL parts of left tuple to filter out
5184 non-matching records.
5185
5186 TODO: We can remove the equalities that will be guaranteed to be true by the
5187 fact that subquery engine will be using index lookup. This must be done only
5188 for cases where there are no conversion errors of significance, e.g. 257
5189 that is searched in a byte. But this requires homogenization of the return
5190 codes of all Field*::store() methods.
5191*/
5192
5193static void remove_subq_pushed_predicates(JOIN *join, Item **where)
5194{
5195 if (join->conds->type() == Item::FUNC_ITEM &&
5196 ((Item_func *)join->conds)->functype() == Item_func::EQ_FUNC &&
5197 ((Item_func *)join->conds)->arguments()[0]->type() == Item::REF_ITEM &&
5198 ((Item_func *)join->conds)->arguments()[1]->type() == Item::FIELD_ITEM &&
5199 test_if_ref (join->conds,
5200 (Item_field *)((Item_func *)join->conds)->arguments()[1],
5201 ((Item_func *)join->conds)->arguments()[0]))
5202 {
5203 *where= 0;
5204 return;
5205 }
5206}
5207
5208
5209
5210
5211/**
5212 Optimize all subqueries of a query that were not flattened into a semijoin.
5213
5214 @details
5215 Optimize all immediate children subqueries of a query.
5216
5217 This phase must be called after substitute_for_best_equal_field() because
5218 that function may replace items with other items from a multiple equality,
5219 and we need to reference the correct items in the index access method of the
5220 IN predicate.
5221
5222 @return Operation status
5223 @retval FALSE success.
5224 @retval TRUE error occurred.
5225*/
5226
5227bool JOIN::optimize_unflattened_subqueries()
5228{
5229 return select_lex->optimize_unflattened_subqueries(false);
5230}
5231
5232/**
5233 Optimize all constant subqueries of a query that were not flattened into
5234 a semijoin.
5235
5236 @details
5237 Similar to other constant conditions, constant subqueries can be used in
5238 various constant optimizations. Having optimized constant subqueries before
5239 these constant optimizations, makes it possible to estimate if a subquery
5240 is "cheap" enough to be executed during the optimization phase.
5241
5242 Constant subqueries can be optimized and evaluated independent of the outer
5243 query, therefore if const_only = true, this method can be called early in
5244 the optimization phase of the outer query.
5245
5246 @return Operation status
5247 @retval FALSE success.
5248 @retval TRUE error occurred.
5249*/
5250
5251bool JOIN::optimize_constant_subqueries()
5252{
5253 ulonglong save_options= select_lex->options;
5254 bool res;
5255 /*
5256 Constant subqueries may be executed during the optimization phase.
5257 In EXPLAIN mode the optimizer doesn't initialize many of the data structures
5258 needed for execution. In order to make it possible to execute subqueries
5259 during optimization, constant subqueries must be optimized for execution,
5260 not for EXPLAIN.
5261 */
5262 select_lex->options&= ~SELECT_DESCRIBE;
5263 res= select_lex->optimize_unflattened_subqueries(true);
5264 select_lex->options= save_options;
5265 return res;
5266}
5267
5268
5269/*
5270 Join tab execution startup function.
5271
5272 SYNOPSIS
5273 join_tab_execution_startup()
5274 tab Join tab to perform startup actions for
5275
5276 DESCRIPTION
5277 Join tab execution startup function. This is different from
5278 tab->read_first_record in the regard that this has actions that are to be
5279 done once per join execution.
5280
5281 Currently there are only two possible startup functions, so we have them
5282 both here inside if (...) branches. In future we could switch to function
5283 pointers.
5284
5285 TODO: consider moving this together with JOIN_TAB::preread_init
5286
5287 RETURN
5288 NESTED_LOOP_OK - OK
5289 NESTED_LOOP_ERROR| NESTED_LOOP_KILLED - Error, abort the join execution
5290*/
5291
5292enum_nested_loop_state join_tab_execution_startup(JOIN_TAB *tab)
5293{
5294 Item_in_subselect *in_subs;
5295 DBUG_ENTER("join_tab_execution_startup");
5296
5297 if (tab->table->pos_in_table_list &&
5298 (in_subs= tab->table->pos_in_table_list->jtbm_subselect))
5299 {
5300 /* It's a non-merged SJM nest */
5301 DBUG_ASSERT(in_subs->engine->engine_type() ==
5302 subselect_engine::HASH_SJ_ENGINE);
5303 subselect_hash_sj_engine *hash_sj_engine=
5304 ((subselect_hash_sj_engine*)in_subs->engine);
5305 if (!hash_sj_engine->is_materialized)
5306 {
5307 hash_sj_engine->materialize_join->exec();
5308 hash_sj_engine->is_materialized= TRUE;
5309
5310 if (unlikely(hash_sj_engine->materialize_join->error) ||
5311 unlikely(tab->join->thd->is_fatal_error))
5312 DBUG_RETURN(NESTED_LOOP_ERROR);
5313 }
5314 }
5315 else if (tab->bush_children)
5316 {
5317 /* It's a merged SJM nest */
5318 enum_nested_loop_state rc;
5319 SJ_MATERIALIZATION_INFO *sjm= tab->bush_children->start->emb_sj_nest->sj_mat_info;
5320
5321 if (!sjm->materialized)
5322 {
5323 JOIN *join= tab->join;
5324 JOIN_TAB *join_tab= tab->bush_children->start;
5325 JOIN_TAB *save_return_tab= join->return_tab;
5326 /*
5327 Now run the join for the inner tables. The first call is to run the
5328 join, the second one is to signal EOF (this is essential for some
5329 join strategies, e.g. it will make join buffering flush the records)
5330 */
5331 if ((rc= sub_select(join, join_tab, FALSE/* no EOF */)) < 0 ||
5332 (rc= sub_select(join, join_tab, TRUE/* now EOF */)) < 0)
5333 {
5334 join->return_tab= save_return_tab;
5335 DBUG_RETURN(rc); /* it's NESTED_LOOP_(ERROR|KILLED)*/
5336 }
5337 join->return_tab= save_return_tab;
5338 sjm->materialized= TRUE;
5339 }
5340 }
5341
5342 DBUG_RETURN(NESTED_LOOP_OK);
5343}
5344
5345
5346/*
5347 Create a dummy temporary table, useful only for the sake of having a
5348 TABLE* object with map,tablenr and maybe_null properties.
5349
5350 This is used by non-mergeable semi-join materilization code to handle
5351 degenerate cases where materialized subquery produced "Impossible WHERE"
5352 and thus wasn't materialized.
5353*/
5354
5355TABLE *create_dummy_tmp_table(THD *thd)
5356{
5357 DBUG_ENTER("create_dummy_tmp_table");
5358 TABLE *table;
5359 TMP_TABLE_PARAM sjm_table_param;
5360 sjm_table_param.init();
5361 sjm_table_param.field_count= 1;
5362 List<Item> sjm_table_cols;
5363 const LEX_CSTRING dummy_name= { STRING_WITH_LEN("dummy") };
5364 Item *column_item= new (thd->mem_root) Item_int(thd, 1);
5365 if (!column_item)
5366 DBUG_RETURN(NULL);
5367
5368 sjm_table_cols.push_back(column_item, thd->mem_root);
5369 if (!(table= create_tmp_table(thd, &sjm_table_param,
5370 sjm_table_cols, (ORDER*) 0,
5371 TRUE /* distinct */,
5372 1, /*save_sum_fields*/
5373 thd->variables.option_bits |
5374 TMP_TABLE_ALL_COLUMNS,
5375 HA_POS_ERROR /*rows_limit */,
5376 &dummy_name, TRUE /* Do not open */)))
5377 {
5378 DBUG_RETURN(NULL);
5379 }
5380 DBUG_RETURN(table);
5381}
5382
5383
5384/*
5385 A class that is used to catch one single tuple that is sent to the join
5386 output, and save it in Item_cache element(s).
5387
5388 It is very similar to select_singlerow_subselect but doesn't require a
5389 Item_singlerow_subselect item.
5390*/
5391
5392class select_value_catcher :public select_subselect
5393{
5394public:
5395 select_value_catcher(THD *thd_arg, Item_subselect *item_arg):
5396 select_subselect(thd_arg, item_arg)
5397 {}
5398 int send_data(List<Item> &items);
5399 int setup(List<Item> *items);
5400 bool assigned; /* TRUE <=> we've caught a value */
5401 uint n_elements; /* How many elements we get */
5402 Item_cache **row; /* Array of cache elements */
5403};
5404
5405
5406int select_value_catcher::setup(List<Item> *items)
5407{
5408 assigned= FALSE;
5409 n_elements= items->elements;
5410
5411 if (!(row= (Item_cache**) thd->alloc(sizeof(Item_cache*) * n_elements)))
5412 return TRUE;
5413
5414 Item *sel_item;
5415 List_iterator<Item> li(*items);
5416 for (uint i= 0; (sel_item= li++); i++)
5417 {
5418 if (!(row[i]= sel_item->get_cache(thd)))
5419 return TRUE;
5420 row[i]->setup(thd, sel_item);
5421 }
5422 return FALSE;
5423}
5424
5425
5426int select_value_catcher::send_data(List<Item> &items)
5427{
5428 DBUG_ENTER("select_value_catcher::send_data");
5429 DBUG_ASSERT(!assigned);
5430 DBUG_ASSERT(items.elements == n_elements);
5431
5432 if (unit->offset_limit_cnt)
5433 { // Using limit offset,count
5434 unit->offset_limit_cnt--;
5435 DBUG_RETURN(0);
5436 }
5437
5438 Item *val_item;
5439 List_iterator_fast<Item> li(items);
5440 for (uint i= 0; (val_item= li++); i++)
5441 {
5442 row[i]->store(val_item);
5443 row[i]->cache_value();
5444 }
5445 assigned= TRUE;
5446 DBUG_RETURN(0);
5447}
5448
5449
5450/*
5451 Setup JTBM join tabs for execution
5452*/
5453
5454bool setup_jtbm_semi_joins(JOIN *join, List<TABLE_LIST> *join_list,
5455 Item **join_where)
5456{
5457 TABLE_LIST *table;
5458 NESTED_JOIN *nested_join;
5459 List_iterator<TABLE_LIST> li(*join_list);
5460 THD *thd= join->thd;
5461 DBUG_ENTER("setup_jtbm_semi_joins");
5462
5463 while ((table= li++))
5464 {
5465 Item_in_subselect *item;
5466
5467 if ((item= table->jtbm_subselect))
5468 {
5469 Item_in_subselect *subq_pred= item;
5470 double rows;
5471 double read_time;
5472
5473 /*
5474 Perform optimization of the subquery, so that we know estmated
5475 - cost of materialization process
5476 - how many records will be in the materialized temp.table
5477 */
5478 if (subq_pred->optimize(&rows, &read_time))
5479 DBUG_RETURN(TRUE);
5480
5481 subq_pred->jtbm_read_time= read_time;
5482 subq_pred->jtbm_record_count=rows;
5483 JOIN *subq_join= subq_pred->unit->first_select()->join;
5484
5485 if (!subq_join->tables_list || !subq_join->table_count)
5486 {
5487 /*
5488 A special case; subquery's join is degenerate, and it either produces
5489 0 or 1 record. Examples of both cases:
5490
5491 select * from ot where col in (select ... from it where 2>3)
5492 select * from ot where col in (select MY_MIN(it.key) from it)
5493
5494 in this case, the subquery predicate has not been setup for
5495 materialization. In particular, there is no materialized temp.table.
5496 We'll now need to
5497 1. Check whether 1 or 0 records are produced, setup this as a
5498 constant join tab.
5499 2. Create a dummy temporary table, because all of the join
5500 optimization code relies on TABLE object being present (here we
5501 follow a bad tradition started by derived tables)
5502 */
5503 DBUG_ASSERT(subq_pred->engine->engine_type() ==
5504 subselect_engine::SINGLE_SELECT_ENGINE);
5505 subselect_single_select_engine *engine=
5506 (subselect_single_select_engine*)subq_pred->engine;
5507 select_value_catcher *new_sink;
5508 if (!(new_sink=
5509 new (thd->mem_root) select_value_catcher(thd, subq_pred)))
5510 DBUG_RETURN(TRUE);
5511 if (new_sink->setup(&engine->select_lex->join->fields_list) ||
5512 engine->select_lex->join->change_result(new_sink, NULL) ||
5513 engine->exec())
5514 {
5515 DBUG_RETURN(TRUE);
5516 }
5517 subq_pred->is_jtbm_const_tab= TRUE;
5518
5519 if (new_sink->assigned)
5520 {
5521 subq_pred->jtbm_const_row_found= TRUE;
5522 /*
5523 Subselect produced one row, which is saved in new_sink->row.
5524 Inject "left_expr[i] == row[i] equalities into parent's WHERE.
5525 */
5526 Item *eq_cond;
5527 for (uint i= 0; i < subq_pred->left_expr->cols(); i++)
5528 {
5529 eq_cond= new (thd->mem_root)
5530 Item_func_eq(thd, subq_pred->left_expr->element_index(i),
5531 new_sink->row[i]);
5532 if (!eq_cond)
5533 DBUG_RETURN(1);
5534
5535 if (!((*join_where)= and_items(thd, *join_where, eq_cond)) ||
5536 (*join_where)->fix_fields(thd, join_where))
5537 DBUG_RETURN(1);
5538 }
5539 }
5540 else
5541 {
5542 /* Subselect produced no rows. Just set the flag, */
5543 subq_pred->jtbm_const_row_found= FALSE;
5544 }
5545
5546 /* Set up a dummy TABLE*, optimizer code needs JOIN_TABs to have TABLE */
5547 TABLE *dummy_table;
5548 if (!(dummy_table= create_dummy_tmp_table(thd)))
5549 DBUG_RETURN(1);
5550 table->table= dummy_table;
5551 table->table->pos_in_table_list= table;
5552 /*
5553 Note: the table created above may be freed by:
5554 1. JOIN_TAB::cleanup(), when the parent join is a regular join.
5555 2. cleanup_empty_jtbm_semi_joins(), when the parent join is a
5556 degenerate join (e.g. one with "Impossible where").
5557 */
5558 setup_table_map(table->table, table, table->jtbm_table_no);
5559 }
5560 else
5561 {
5562 DBUG_ASSERT(subq_pred->test_set_strategy(SUBS_MATERIALIZATION));
5563 subq_pred->is_jtbm_const_tab= FALSE;
5564 subselect_hash_sj_engine *hash_sj_engine=
5565 ((subselect_hash_sj_engine*)item->engine);
5566
5567 table->table= hash_sj_engine->tmp_table;
5568 table->table->pos_in_table_list= table;
5569
5570 setup_table_map(table->table, table, table->jtbm_table_no);
5571
5572 Item *sj_conds= hash_sj_engine->semi_join_conds;
5573
5574 (*join_where)= and_items(thd, *join_where, sj_conds);
5575 if (!(*join_where)->fixed)
5576 (*join_where)->fix_fields(thd, join_where);
5577 }
5578 table->table->maybe_null= MY_TEST(join->mixed_implicit_grouping);
5579 }
5580
5581 if ((nested_join= table->nested_join))
5582 {
5583 if (setup_jtbm_semi_joins(join, &nested_join->join_list, join_where))
5584 DBUG_RETURN(TRUE);
5585 }
5586 }
5587 DBUG_RETURN(FALSE);
5588}
5589
5590
5591/*
5592 Cleanup non-merged semi-joins (JBMs) that have empty.
5593
5594 This function is to cleanups for a special case:
5595 Consider a query like
5596
5597 select * from t1 where 1=2 AND t1.col IN (select max(..) ... having 1=2)
5598
5599 For this query, optimization of subquery will short-circuit, and
5600 setup_jtbm_semi_joins() will call create_dummy_tmp_table() so that we have
5601 empty, constant temp.table to stand in as materialized temp. table.
5602
5603 Now, suppose that the upper join is also found to be degenerate. In that
5604 case, no JOIN_TAB array will be produced, and hence, JOIN::cleanup() will
5605 have a problem with cleaning up empty JTBMs (non-empty ones are cleaned up
5606 through Item::cleanup() calls).
5607*/
5608
5609void cleanup_empty_jtbm_semi_joins(JOIN *join, List<TABLE_LIST> *join_list)
5610{
5611 List_iterator<TABLE_LIST> li(*join_list);
5612 TABLE_LIST *table;
5613 while ((table= li++))
5614 {
5615 if ((table->jtbm_subselect && table->jtbm_subselect->is_jtbm_const_tab))
5616 {
5617 if (table->table)
5618 {
5619 free_tmp_table(join->thd, table->table);
5620 table->table= NULL;
5621 }
5622 }
5623 else if (table->nested_join && table->sj_subq_pred)
5624 {
5625 cleanup_empty_jtbm_semi_joins(join, &table->nested_join->join_list);
5626 }
5627 }
5628}
5629
5630
5631/**
5632 Choose an optimal strategy to execute an IN/ALL/ANY subquery predicate
5633 based on cost.
5634
5635 @param join_tables the set of tables joined in the subquery
5636
5637 @notes
5638 The method chooses between the materialization and IN=>EXISTS rewrite
5639 strategies for the execution of a non-flattened subquery IN predicate.
5640 The cost-based decision is made as follows:
5641
5642 1. compute materialize_strategy_cost based on the unmodified subquery
5643 2. reoptimize the subquery taking into account the IN-EXISTS predicates
5644 3. compute in_exists_strategy_cost based on the reoptimized plan
5645 4. compare and set the cheaper strategy
5646 if (materialize_strategy_cost >= in_exists_strategy_cost)
5647 in_strategy = MATERIALIZATION
5648 else
5649 in_strategy = IN_TO_EXISTS
5650 5. if in_strategy = MATERIALIZATION and it is not possible to initialize it
5651 revert to IN_TO_EXISTS
5652 6. if (in_strategy == MATERIALIZATION)
5653 revert the subquery plan to the original one before reoptimizing
5654 else
5655 inject the IN=>EXISTS predicates into the new EXISTS subquery plan
5656
5657 The implementation itself is a bit more complicated because it takes into
5658 account two more factors:
5659 - whether the user allowed both strategies through an optimizer_switch, and
5660 - if materialization was the cheaper strategy, whether it can be executed
5661 or not.
5662
5663 @retval FALSE success.
5664 @retval TRUE error occurred.
5665*/
5666
5667bool JOIN::choose_subquery_plan(table_map join_tables)
5668{
5669 enum_reopt_result reopt_result= REOPT_NONE;
5670 Item_in_subselect *in_subs;
5671
5672 /*
5673 IN/ALL/ANY optimizations are not applicable for so called fake select
5674 (this select exists only to filter results of union if it is needed).
5675 */
5676 if (select_lex == select_lex->master_unit()->fake_select_lex)
5677 return 0;
5678
5679 if (is_in_subquery())
5680 {
5681 in_subs= (Item_in_subselect*) unit->item;
5682 if (in_subs->create_in_to_exists_cond(this))
5683 return true;
5684 }
5685 else
5686 return false;
5687
5688 /* A strategy must be chosen earlier. */
5689 DBUG_ASSERT(in_subs->has_strategy());
5690 DBUG_ASSERT(in_to_exists_where || in_to_exists_having);
5691 DBUG_ASSERT(!in_to_exists_where || in_to_exists_where->fixed);
5692 DBUG_ASSERT(!in_to_exists_having || in_to_exists_having->fixed);
5693
5694 /* The original QEP of the subquery. */
5695 Join_plan_state save_qep(table_count);
5696
5697 /*
5698 Compute and compare the costs of materialization and in-exists if both
5699 strategies are possible and allowed by the user (checked during the prepare
5700 phase.
5701 */
5702 if (in_subs->test_strategy(SUBS_MATERIALIZATION) &&
5703 in_subs->test_strategy(SUBS_IN_TO_EXISTS))
5704 {
5705 JOIN *outer_join;
5706 JOIN *inner_join= this;
5707 /* Number of unique value combinations filtered by the IN predicate. */
5708 double outer_lookup_keys;
5709 /* Cost and row count of the unmodified subquery. */
5710 double inner_read_time_1, inner_record_count_1;
5711 /* Cost of the subquery with injected IN-EXISTS predicates. */
5712 double inner_read_time_2;
5713 /* The cost to compute IN via materialization. */
5714 double materialize_strategy_cost;
5715 /* The cost of the IN->EXISTS strategy. */
5716 double in_exists_strategy_cost;
5717 double dummy;
5718
5719 /*
5720 A. Estimate the number of rows of the outer table that will be filtered
5721 by the IN predicate.
5722 */
5723 outer_join= unit->outer_select() ? unit->outer_select()->join : NULL;
5724 /*
5725 Get the cost of the outer join if:
5726 (1) It has at least one table, and
5727 (2) It has been already optimized (if there is no join_tab, then the
5728 outer join has not been optimized yet).
5729 */
5730 if (outer_join && outer_join->table_count > 0 && // (1)
5731 outer_join->join_tab && // (2)
5732 !in_subs->const_item())
5733 {
5734 /*
5735 TODO:
5736 Currently outer_lookup_keys is computed as the number of rows in
5737 the partial join including the JOIN_TAB where the IN predicate is
5738 pushed to. In the general case this is a gross overestimate because
5739 due to caching we are interested only in the number of unique keys.
5740 The search key may be formed by columns from much fewer than all
5741 tables in the partial join. Example:
5742 select * from t1, t2 where t1.c1 = t2.key AND t2.c2 IN (select ...);
5743 If the join order: t1, t2, the number of unique lookup keys is ~ to
5744 the number of unique values t2.c2 in the partial join t1 join t2.
5745 */
5746 outer_join->get_partial_cost_and_fanout(in_subs->get_join_tab_idx(),
5747 table_map(-1),
5748 &dummy,
5749 &outer_lookup_keys);
5750 }
5751 else
5752 {
5753 /*
5754 TODO: outer_join can be NULL for DELETE statements.
5755 How to compute its cost?
5756 */
5757 outer_lookup_keys= 1;
5758 }
5759
5760 /*
5761 B. Estimate the cost and number of records of the subquery both
5762 unmodified, and with injected IN->EXISTS predicates.
5763 */
5764 inner_read_time_1= inner_join->best_read;
5765 inner_record_count_1= inner_join->join_record_count;
5766
5767 if (in_to_exists_where && const_tables != table_count)
5768 {
5769 /*
5770 Re-optimize and cost the subquery taking into account the IN-EXISTS
5771 conditions.
5772 */
5773 reopt_result= reoptimize(in_to_exists_where, join_tables, &save_qep);
5774 if (reopt_result == REOPT_ERROR)
5775 return TRUE;
5776
5777 /* Get the cost of the modified IN-EXISTS plan. */
5778 inner_read_time_2= inner_join->best_read;
5779
5780 }
5781 else
5782 {
5783 /* Reoptimization would not produce any better plan. */
5784 inner_read_time_2= inner_read_time_1;
5785 }
5786
5787 /*
5788 C. Compute execution costs.
5789 */
5790 /* C.1 Compute the cost of the materialization strategy. */
5791 //uint rowlen= get_tmp_table_rec_length(unit->first_select()->item_list);
5792 uint rowlen= get_tmp_table_rec_length(ref_ptrs,
5793 select_lex->item_list.elements);
5794 /* The cost of writing one row into the temporary table. */
5795 double write_cost= get_tmp_table_write_cost(thd, inner_record_count_1,
5796 rowlen);
5797 /* The cost of a lookup into the unique index of the materialized table. */
5798 double lookup_cost= get_tmp_table_lookup_cost(thd, inner_record_count_1,
5799 rowlen);
5800 /*
5801 The cost of executing the subquery and storing its result in an indexed
5802 temporary table.
5803 */
5804 double materialization_cost= inner_read_time_1 +
5805 write_cost * inner_record_count_1;
5806
5807 materialize_strategy_cost= materialization_cost +
5808 outer_lookup_keys * lookup_cost;
5809
5810 /* C.2 Compute the cost of the IN=>EXISTS strategy. */
5811 in_exists_strategy_cost= outer_lookup_keys * inner_read_time_2;
5812
5813 /* C.3 Compare the costs and choose the cheaper strategy. */
5814 if (materialize_strategy_cost >= in_exists_strategy_cost)
5815 in_subs->set_strategy(SUBS_IN_TO_EXISTS);
5816 else
5817 in_subs->set_strategy(SUBS_MATERIALIZATION);
5818
5819 DBUG_PRINT("info",
5820 ("mat_strategy_cost: %.2f, mat_cost: %.2f, write_cost: %.2f, lookup_cost: %.2f",
5821 materialize_strategy_cost, materialization_cost, write_cost, lookup_cost));
5822 DBUG_PRINT("info",
5823 ("inx_strategy_cost: %.2f, inner_read_time_2: %.2f",
5824 in_exists_strategy_cost, inner_read_time_2));
5825 DBUG_PRINT("info",("outer_lookup_keys: %.2f", outer_lookup_keys));
5826 }
5827
5828 /*
5829 If (1) materialization is a possible strategy based on semantic analysis
5830 during the prepare phase, then if
5831 (2) it is more expensive than the IN->EXISTS transformation, and
5832 (3) it is not possible to create usable indexes for the materialization
5833 strategy,
5834 fall back to IN->EXISTS.
5835 otherwise
5836 use materialization.
5837 */
5838 if (in_subs->test_strategy(SUBS_MATERIALIZATION) &&
5839 in_subs->setup_mat_engine())
5840 {
5841 /*
5842 If materialization was the cheaper or the only user-selected strategy,
5843 but it is not possible to execute it due to limitations in the
5844 implementation, fall back to IN-TO-EXISTS.
5845 */
5846 in_subs->set_strategy(SUBS_IN_TO_EXISTS);
5847 }
5848
5849 if (in_subs->test_strategy(SUBS_MATERIALIZATION))
5850 {
5851 /* Restore the original query plan used for materialization. */
5852 if (reopt_result == REOPT_NEW_PLAN)
5853 restore_query_plan(&save_qep);
5854
5855 in_subs->unit->uncacheable&= ~UNCACHEABLE_DEPENDENT_INJECTED;
5856 select_lex->uncacheable&= ~UNCACHEABLE_DEPENDENT_INJECTED;
5857
5858 /*
5859 Reset the "LIMIT 1" set in Item_exists_subselect::fix_length_and_dec.
5860 TODO:
5861 Currently we set the subquery LIMIT to infinity, and this is correct
5862 because we forbid at parse time LIMIT inside IN subqueries (see
5863 Item_in_subselect::test_limit). However, once we allow this, here
5864 we should set the correct limit if given in the query.
5865 */
5866 in_subs->unit->global_parameters()->select_limit= NULL;
5867 in_subs->unit->set_limit(unit->global_parameters());
5868 /*
5869 Set the limit of this JOIN object as well, because normally its being
5870 set in the beginning of JOIN::optimize, which was already done.
5871 */
5872 select_limit= in_subs->unit->select_limit_cnt;
5873 }
5874 else if (in_subs->test_strategy(SUBS_IN_TO_EXISTS))
5875 {
5876 if (reopt_result == REOPT_NONE && in_to_exists_where &&
5877 const_tables != table_count)
5878 {
5879 /*
5880 The subquery was not reoptimized with the newly injected IN-EXISTS
5881 conditions either because the user allowed only the IN-EXISTS strategy,
5882 or because materialization was not possible based on semantic analysis.
5883 */
5884 reopt_result= reoptimize(in_to_exists_where, join_tables, NULL);
5885 if (reopt_result == REOPT_ERROR)
5886 return TRUE;
5887 }
5888
5889 if (in_subs->inject_in_to_exists_cond(this))
5890 return TRUE;
5891 /*
5892 If the injected predicate is correlated the IN->EXISTS transformation
5893 make the subquery dependent.
5894 */
5895 if ((in_to_exists_where &&
5896 in_to_exists_where->used_tables() & OUTER_REF_TABLE_BIT) ||
5897 (in_to_exists_having &&
5898 in_to_exists_having->used_tables() & OUTER_REF_TABLE_BIT))
5899 {
5900 in_subs->unit->uncacheable|= UNCACHEABLE_DEPENDENT_INJECTED;
5901 select_lex->uncacheable|= UNCACHEABLE_DEPENDENT_INJECTED;
5902 }
5903 select_limit= 1;
5904 }
5905 else
5906 DBUG_ASSERT(FALSE);
5907
5908 return FALSE;
5909}
5910
5911
5912/**
5913 Choose a query plan for a table-less subquery.
5914
5915 @notes
5916
5917 @retval FALSE success.
5918 @retval TRUE error occurred.
5919*/
5920
5921bool JOIN::choose_tableless_subquery_plan()
5922{
5923 DBUG_ASSERT(!tables_list || !table_count);
5924 if (unit->item)
5925 {
5926 DBUG_ASSERT(unit->item->type() == Item::SUBSELECT_ITEM);
5927 Item_subselect *subs_predicate= unit->item;
5928
5929 /*
5930 If the optimizer determined that his query has an empty result,
5931 in most cases the subquery predicate is a known constant value -
5932 either of TRUE, FALSE or NULL. The implementation of
5933 Item_subselect::no_rows_in_result() determines which one.
5934 */
5935 if (zero_result_cause)
5936 {
5937 if (!implicit_grouping)
5938 {
5939 /*
5940 Both group by queries and non-group by queries without aggregate
5941 functions produce empty subquery result. There is no need to further
5942 rewrite the subquery because it will not be executed at all.
5943 */
5944 return FALSE;
5945 }
5946
5947 /* @todo
5948 A further optimization is possible when a non-group query with
5949 MIN/MAX/COUNT is optimized by opt_sum_query. Then, if there are
5950 only MIN/MAX functions over an empty result set, the subquery
5951 result is a NULL value/row, thus the value of subs_predicate is
5952 NULL.
5953 */
5954 }
5955
5956 /*
5957 For IN subqueries, use IN->EXISTS transfomation, unless the subquery
5958 has been converted to a JTBM semi-join. In that case, just leave
5959 everything as-is, setup_jtbm_semi_joins() has special handling for cases
5960 like this.
5961 */
5962 if (subs_predicate->is_in_predicate() &&
5963 !(subs_predicate->substype() == Item_subselect::IN_SUBS &&
5964 ((Item_in_subselect*)subs_predicate)->is_jtbm_merged))
5965 {
5966 Item_in_subselect *in_subs;
5967 in_subs= (Item_in_subselect*) subs_predicate;
5968 in_subs->set_strategy(SUBS_IN_TO_EXISTS);
5969 if (in_subs->create_in_to_exists_cond(this) ||
5970 in_subs->inject_in_to_exists_cond(this))
5971 return TRUE;
5972 tmp_having= having;
5973 }
5974 }
5975 exec_const_cond= conds;
5976 return FALSE;
5977}
5978