opt_subselect.cc source code [MariaDB/sql/opt_subselect.cc]

1	/*
2	Copyright (c) 2010, 2015, MariaDB
3
4	This program is free software; you can redistribute it and/or modify
5	it under the terms of the GNU General Public License as published by
6	the Free Software Foundation; version 2 of the License.
7
8	This program is distributed in the hope that it will be useful,
9	but WITHOUT ANY WARRANTY; without even the implied warranty of
10	MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
11	GNU General Public License for more details.
12
13	You should have received a copy of the GNU General Public License
14	along with this program; if not, write to the Free Software
15	Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02111-1301 USA /*
16
17	/**
18	@file
19
20	@brief
21	Semi-join subquery optimizations code
22
23	*/
24
25	#ifdef USE_PRAGMA_IMPLEMENTATION
26	#pragma implementation // gcc: Class implementation
27	#endif
28
29	#include "mariadb.h"
30	#include "sql_base.h"
31	#include "sql_select.h"
32	#include "filesort.h"
33	#include "opt_subselect.h"
34	#include "sql_test.h"
35	#include <my_bit.h>
36
37	/*
38	This file contains optimizations for semi-join subqueries.
39
40	Contents
41	--------
42	1. What is a semi-join subquery
43	2. General idea about semi-join execution
44	2.1 Correlated vs uncorrelated semi-joins
45	2.2 Mergeable vs non-mergeable semi-joins
46	3. Code-level view of semi-join processing
47	3.1 Conversion
48	3.1.1 Merged semi-join TABLE_LIST object
49	3.1.2 Non-merged semi-join data structure
50	3.2 Semi-joins and query optimization
51	3.2.1 Non-merged semi-joins and join optimization
52	3.2.2 Merged semi-joins and join optimization
53	3.3 Semi-joins and query execution
54
55	1. What is a semi-join subquery
56	-------------------------------
57	We use this definition of semi-join:
58
59	outer_tbl SEMI JOIN inner_tbl ON cond = {set of outer_tbl.row such that
60	exist inner_tbl.row, for which
61	cond(outer_tbl.row,inner_tbl.row)
62	is satisfied}
63
64	That is, semi-join operation is similar to inner join operation, with
65	exception that we don't care how many matches a row from outer_tbl has in
66	inner_tbl.
67
68	In SQL terms: a semi-join subquery is an IN subquery that is an AND-part of
69	the WHERE/ON clause.
70
71	2. General idea about semi-join execution
72	-----------------------------------------
73	We can execute semi-join in a way similar to inner join, with exception that
74	we need to somehow ensure that we do not generate record combinations that
75	differ only in rows of inner tables.
76	There is a number of different ways to achieve this property, implemented by
77	a number of semi-join execution strategies.
78	Some strategies can handle any semi-joins, other can be applied only to
79	semi-joins that have certain properties that are described below:
80
81	2.1 Correlated vs uncorrelated semi-joins
82	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
83	Uncorrelated semi-joins are special in the respect that they allow to
84	- execute the subquery (possible as it's uncorrelated)
85	- somehow make sure that generated set does not have duplicates
86	- perform an inner join with outer tables.
87
88	or, rephrasing in SQL form:
89
90	SELECT ... FROM ot WHERE ot.col IN (SELECT it.col FROM it WHERE uncorr_cond)
91	->
92	SELECT ... FROM ot JOIN (SELECT DISTINCT it.col FROM it WHERE uncorr_cond)
93
94	2.2 Mergeable vs non-mergeable semi-joins
95	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
96	Semi-join operation has some degree of commutability with inner join
97	operation: we can join subquery's tables with ouside table(s) and eliminate
98	duplicate record combination after that:
99
100	ot1 JOIN ot2 SEMI_JOIN{it1,it2} (it1 JOIN it2) ON sjcond(ot2,it) ->*
101	\|
102	+-------------------------------+
103	v
104	ot1 SEMI_JOIN{it1,it2} (it1 JOIN it2 JOIN ot2) ON sjcond(ot2,it)*
105
106	In order for this to work, subquery's top-level operation must be join, and
107	grouping or ordering with limit (grouping or ordering with limit are not
108	commutative with duplicate removal). In other words, the conversion is
109	possible when the subquery doesn't have GROUP BY clause, any aggregate
110	functions, or ORDER BY ... LIMIT clause.*
111
112	Definitions:
113	- Subquery whose top-level operation is a join is called mergeable semi-join
114	- All other kinds of semi-join subqueries are considered non-mergeable.
115
116	*- this requirement is actually too strong, but its exceptions are too
117	complicated to be considered here.
118
119	3. Code-level view of semi-join processing
120	------------------------------------------
121
122	3.1 Conversion and pre-optimization data structures
123	---------------------------------------------------
124	* When doing JOIN::prepare for the subquery, we detect that it can be
125	converted into a semi-join and register it in parent_join->sj_subselects
126
127	* At the start of parent_join->optimize(), the predicate is converted into
128	a semi-join node. A semi-join node is a TABLE_LIST object that is linked
129	somewhere in parent_join->join_list (either it is just present there, or
130	it is a descendant of some of its members).
131
132	There are two kinds of semi-joins:
133	- Merged semi-joins
134	- Non-merged semi-joins
135
136	3.1.1 Merged semi-join TABLE_LIST object
137	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
138	Merged semi-join object is a TABLE_LIST that contains a sub-join of
139	subquery tables and the semi-join ON expression (in this respect it is
140	very similar to nested outer join representation)
141	Merged semi-join represents this SQL:
142
143	... SEMI JOIN (inner_tbl1 JOIN ... JOIN inner_tbl_n) ON sj_on_expr
144
145	Semi-join objects of this kind have TABLE_LIST::sj_subq_pred set.
146
147	3.1.2 Non-merged semi-join data structure
148	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
149	Non-merged semi-join object is a leaf TABLE_LIST object that has a subquery
150	that produces rows. It is similar to a base table and represents this SQL:
151
152	... SEMI_JOIN (SELECT non_mergeable_select) ON sj_on_expr
153
154	Subquery items that were converted into semi-joins are removed from the WHERE
155	clause. (They do remain in PS-saved WHERE clause, and they replace themselves
156	with Item_int(1) on subsequent re-executions).
157
158	3.2 Semi-joins and join optimization
159	------------------------------------
160
161	3.2.1 Non-merged semi-joins and join optimization
162	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
163	For join optimization purposes, non-merged semi-join nests are similar to
164	base tables. Each such nest is represented by one one JOIN_TAB, which has
165	two possible access strategies:
166	- full table scan (representing SJ-Materialization-Scan strategy)
167	- eq_ref-like table lookup (representing SJ-Materialization-Lookup)
168
169	Unlike regular base tables, non-merged semi-joins have:
170	- non-zero JOIN_TAB::startup_cost, and
171	- join_tab->table->is_filled_at_execution()==TRUE, which means one
172	cannot do const table detection, range analysis or other dataset-dependent
173	optimizations.
174	Instead, get_delayed_table_estimates() will run optimization for the
175	subquery and produce an E(materialized table size).
176
177	3.2.2 Merged semi-joins and join optimization
178	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
179	- optimize_semijoin_nests() does pre-optimization
180	- during join optimization, the join has one JOIN_TAB (or is it POSITION?)
181	array, and suffix-based detection is used, see advance_sj_state()
182	- after join optimization is done, get_best_combination() switches
183	the data-structure to prefix-based, multiple JOIN_TAB ranges format.
184
185	3.3 Semi-joins and query execution
186	----------------------------------
187	* Join executor has hooks for all semi-join strategies.
188	TODO elaborate.
189
190	*/
191
192	/*
193	EqualityPropagationAndSjmNests
194	******************************
195
196	Equalities are used for:
197	P1. Equality propagation
198	P2. Equality substitution [for a certain join order]
199
200	The equality propagation is not affected by SJM nests. In fact, it is done
201	before we determine the execution plan, i.e. before we even know we will use
202	SJM-nests for execution.
203
204	The equality substitution is affected.
205
206	Substitution without SJMs
207	=========================
208	When one doesn't have SJM nests, tables have a strict join order:
209
210	--------------------------------->
211	t1 -- t2 -- t3 -- t4 --- t5
212
213
214	? ^
215	\
216	--(part-of-WHERE)
217
218
219	parts WHERE/ON and ref. expressions are attached at some point along the axis.
220	Expression is allowed to refer to a table column if the table is to the left of
221	the attachment point. For any given expression, we have a goal:
222
223	"Move leftmost allowed attachment point as much as possible to the left"
224
225	Substitution with SJMs - task setting
226	=====================================
227
228	When SJM nests are present, there is no global strict table ordering anymore:
229
230
231	--------------------------------->
232
233	ot1 -- ot2 --- sjm -- ot4 --- ot5
234	\|
235	\| Main execution
236	- - - - - - - - - - - - - - - - - - - - - - - -
237	\| Materialization
238	it1 -- it2 --/
239
240
241	Besides that, we must take into account that
242	- values for outer table columns, otN.col, are inaccessible at
243	materialization step (SJM-RULE)
244	- values for inner table columns, itN.col, are inaccessible at Main execution
245	step, except for SJ-Materialization-Scan and columns that are in the
246	subquery's select list. (SJM-RULE)
247
248	Substitution with SJMs - solution
249	=================================
250
251	First, we introduce global strict table ordering like this:
252
253	ot1 - ot2 --\ /--- ot3 -- ot5
254	\--- it1 --- it2 --/
255
256	Now, let's see how to meet (SJM-RULE).
257
258	SJ-Materialization is only applicable for uncorrelated subqueries. From this, it
259	follows that any multiple equality will either
260	1. include only columns of outer tables, or
261	2. include only columns of inner tables, or
262	3. include columns of inner and outer tables, joined together through one
263	of IN-equalities.
264
265	Cases #1 and #2 can be handled in the same way as with regular inner joins.
266
267	Case #3 requires special handling, so that we don't construct violations of
268	(SJM-RULE). Let's consider possible ways to build violations.
269
270	Equality propagation starts with the clause in this form
271
272	top_query_where AND subquery_where AND in_equalities
273
274	First, it builds multi-equalities. It can also build a mixed multi-equality
275
276	multiple-equal(ot1.col, ot2.col, ... it1.col, itN.col)
277
278	Multi-equalities are pushed down the OR-clauses in top_query_where and in
279	subquery_where, so it's possible that clauses like this one are built:
280
281	subquery_cond OR (multiple-equal(it1.col, ot1.col,...) AND ...)
282	^^^^^^^^^^^^^ \
283	\| this must be evaluated
284	\- can only be evaluated at the main phase.
285	at the materialization phase
286
287	Finally, equality substitution is started. It does two operations:
288
289
290	1. Field reference substitution
291	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
292
293	(In the code, this is Item_field::replace_equal_field)
294
295	This is a process of replacing each reference to "tblX.col"
296	with the first element of the multi-equality. (REF-SUBST-ORIG)
297
298	This behaviour can cause problems with Semi-join nests. Suppose, we have a
299	condition:
300
301	func(it1.col, it2.col)
302
303	and a multi-equality(ot1.col, it1.col). Then, reference to "it1.col" will be
304	replaced with "ot1.col", constructing a condition
305
306	func(ot1.col, it2.col)
307
308	which will be a violation of (SJM-RULE).
309
310	In order to avoid this, (REF-SUBST-ORIG) is amended as follows:
311
312	- references to tables "itX.col" that are inner wrt some SJM nest, are
313	replaced with references to the first inner table from the same SJM nest.
314
315	- references to top-level tables "otX.col" are replaced with references to
316	the first element of the multi-equality, no matter if that first element is
317	a column of a top-level table or of table from some SJM nest.
318	(REF-SUBST-SJM)
319
320	The case where the first element is a table from an SJM nest $SJM is ok,
321	because it can be proven that $SJM uses SJ-Materialization-Scan, and
322	"unpacks" correct column values to the first element during the main
323	execution phase.
324
325	2. Item_equal elimination
326	~~~~~~~~~~~~~~~~~~~~~~~~~
327	(In the code: eliminate_item_equal) This is a process of taking
328
329	multiple-equal(a,b,c,d,e)
330
331	and replacing it with an equivalent expression which is an AND of pair-wise
332	equalities:
333
334	a=b AND a=c AND ...
335
336	The equalities are picked such that for any given join prefix (t1,t2...) the
337	subset of equalities that can be evaluated gives the most restrictive
338	filtering.
339
340	Without SJM nests, it is sufficient to compare every multi-equality member
341	with the first one:
342
343	elem1=elem2 AND elem1=elem3 AND elem1=elem4 ...
344
345	When SJM nests are present, we should take care not to construct equalities
346	that violate the (SJM-RULE). This is achieved by generating separate sets of
347	equalites for top-level tables and for inner tables. That is, for the join
348	order
349
350	ot1 - ot2 --\ /--- ot3 -- ot5
351	\--- it1 --- it2 --/
352
353	we will generate
354	ot1.col=ot2.col
355	ot1.col=ot3.col
356	ot1.col=ot5.col
357	it2.col=it1.col
358
359
360	2.1 The problem with Item_equals and ORs
361	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
362	As has been mentioned above, multiple equalities are pushed down into OR
363	clauses, possibly building clauses like this:
364
365	func(it.col2) OR multiple-equal(it1.col1, it1.col2, ot1.col) (1)
366
367	where the first part of the clause has references to inner tables, while the
368	second has references to the top-level tables, which is a violation of
369	(SJM-RULE).
370
371	AND-clauses of this kind do not create problems, because make_cond_for_table()
372	will take them apart. OR-clauses will not be split. It is possible to
373	split-out the part that's dependent on the inner table:
374
375	func(it.col2) OR it1.col1=it1.col2
376
377	but this is a less-restrictive condition than condition (1). Current execution
378	scheme will still try to generate the "remainder" condition:
379
380	func(it.col2) OR it1.col1=ot1.col
381
382	which is a violation of (SJM-RULE).
383
384	QQ: "ot1.col=it1.col" is checked at the upper level. Why was it not removed
385	here?
386	AA: because has a proper subset of conditions that are found on this level.
387	consider a join order of ot, sjm(it)
388	and a condition
389	ot.col=it.col AND ( ot.col=it.col='foo' OR it.col2='bar')
390
391	we will produce:
392	table ot: nothing
393	table it: ot.col=it.col AND (ot.col='foo' OR it.col2='bar')
394	^^^^ ^^^^^^^^^^^^^^^^
395	\| \ the problem is that
396	\| this part condition didnt
397	\| receive a substitution
398	\|
399	+--- it was correct to subst, 'ot' is
400	the left-most.
401
402
403	Does it make sense to push "inner=outer" down into ORs?
404	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
405
406	Yes. Consider the query:
407
408	select from ot*
409	where ot.col in (select it.col from it where (it.col='foo' OR it.col='bar'))
410
411	here, it may be useful to infer that
412
413	(ot.col='foo' OR ot.col='bar') (CASE-FOR-SUBST)
414
415	and attach that condition to the table 'ot'.
416
417	Possible solutions for Item_equals and ORs
418	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
419
420	Solution #1
421	~~~~~~~~~~~
422	Let make_cond_for_table() chop analyze the OR clauses it has produced and
423	discard them if they violate (SJM-RULE). This solution would allow to handle
424	cases like (CASE-FOR-SUBST) at the expense of making semantics of
425	make_cond_for_table() complicated.
426
427	Solution #2
428	~~~~~~~~~~~
429	Before the equality propagation phase, none of the OR clauses violate the
430	(SJM-RULE). This way, if we remember which tables the original equality
431	referred to, we can only generate equalities that refer to the outer (or inner)
432	tables. Note that this will disallow handling of cases like (CASE-FOR-SUBST).
433
434	Currently, solution #2 is implemented.
435	*/
436
437	LEX_CSTRING weedout_key= {STRING_WITH_LEN("weedout_key")};
438
439	static
440	bool subquery_types_allow_materialization(Item_in_subselect *in_subs);
441	static bool replace_where_subcondition(JOIN , Item , Item , Item , bool*);
442	static int subq_sj_candidate_cmp(Item_in_subselect* el1, Item_in_subselect* el2,
443	void *arg);
444	static bool convert_subq_to_sj(JOIN parent_join, Item_in_subselect subq_pred);
445	static bool convert_subq_to_jtbm(JOIN *parent_join,
446	Item_in_subselect subq_pred, bool* *remove);
447	static TABLE_LIST alloc_join_nest(THD thd);
448	static uint get_tmp_table_rec_length(Ref_ptr_array p_list, uint elements);
449	bool find_eq_ref_candidate(TABLE *table, table_map sj_inner_tables);
450	static SJ_MATERIALIZATION_INFO *
451	at_sjmat_pos(const JOIN join, table_map remaining_tables, const* JOIN_TAB *tab,
452	uint idx, bool *loose_scan);
453	void best_access_path(JOIN join, JOIN_TAB s,
454	table_map remaining_tables, uint idx,
455	bool disable_jbuf, double record_count,
456	POSITION pos, POSITION loose_scan_pos);
457
458	static Item create_subq_in_equalities(THD thd, SJ_MATERIALIZATION_INFO *sjm,
459	Item_in_subselect *subq_pred);
460	static bool remove_sj_conds(THD thd, Item *tree);
461	static bool is_cond_sj_in_equality(Item *item);
462	static bool sj_table_is_included(JOIN join, JOIN_TAB join_tab);
463	static Item remove_additional_cond(Item conds);
464	static void remove_subq_pushed_predicates(JOIN join, Item *where);
465
466	enum_nested_loop_state
467	end_sj_materialize(JOIN join, JOIN_TAB join_tab, bool end_of_records);
468
469
470	/*
471	Check if Materialization strategy is allowed for given subquery predicate.
472
473	@param thd Thread handle
474	@param in_subs The subquery predicate
475	@param child_select The select inside predicate (the function will
476	check it is the only one)
477
478	@return TRUE - Materialization is applicable
479	FALSE - Otherwise
480	*/
481
482	bool is_materialization_applicable(THD thd, Item_in_subselect in_subs,
483	st_select_lex *child_select)
484	{
485	st_select_lex_unit* parent_unit= child_select->master_unit();
486	/*
487	Check if the subquery predicate can be executed via materialization.
488	The required conditions are:
489	0. The materialization optimizer switch was set.
490	1. Subquery is a single SELECT (not a UNION).
491	TODO: this is a limitation that can be fixed
492	2. Subquery is not a table-less query. In this case there is no
493	point in materializing.
494	2A The upper query is not a table-less SELECT ... FROM DUAL. We
495	can't do materialization for SELECT .. FROM DUAL because it
496	does not call setup_subquery_materialization(). We could make
497	SELECT ... FROM DUAL call that function but that doesn't seem
498	to be the case that is worth handling.
499	3. Either the subquery predicate is a top-level predicate, or at
500	least one partial match strategy is enabled. If no partial match
501	strategy is enabled, then materialization cannot be used for
502	non-top-level queries because it cannot handle NULLs correctly.
503	4. Subquery is non-correlated
504	TODO:
505	This condition is too restrictive (limitation). It can be extended to:
506	(Subquery is non-correlated \|\|
507	Subquery is correlated to any query outer to IN predicate \|\|
508	(Subquery is correlated to the immediate outer query &&
509	Subquery !contains {GROUP BY, ORDER BY [LIMIT],
510	aggregate functions}) && subquery predicate is not under "NOT IN"))
511	5. Subquery does not contain recursive references
512
513	A note about prepared statements: we want the if-branch to be taken on
514	PREPARE and each EXECUTE. The rewrites are only done once, but we need
515	select_lex->sj_subselects list to be populated for every EXECUTE.
516
517	*/
518	if (optimizer_flag(thd, OPTIMIZER_SWITCH_MATERIALIZATION) && // 0
519	!child_select->is_part_of_union() && // 1
520	parent_unit->first_select()->leaf_tables.elements && // 2
521	child_select->outer_select()->leaf_tables.elements && // 2A
522	subquery_types_allow_materialization(in_subs) &&
523	(in_subs->is_top_level_item() \|\| //3
524	optimizer_flag(thd,
525	OPTIMIZER_SWITCH_PARTIAL_MATCH_ROWID_MERGE) \|\| //3
526	optimizer_flag(thd,
527	OPTIMIZER_SWITCH_PARTIAL_MATCH_TABLE_SCAN)) && //3
528	!in_subs->is_correlated && //4
529	!in_subs->with_recursive_reference) //5
530	{
531	return TRUE;
532	}
533	return FALSE;
534	}
535
536
537	/*
538	Check if we need JOIN::prepare()-phase subquery rewrites and if yes, do them
539
540	SYNOPSIS
541	check_and_do_in_subquery_rewrites()
542	join Subquery's join
543
544	DESCRIPTION
545	Check if we need to do
546	- subquery -> mergeable semi-join rewrite
547	- if the subquery can be handled with materialization
548	- 'substitution' rewrite for table-less subqueries like "(select 1)"
549	- IN->EXISTS rewrite
550	and, depending on the rewrite, either do it, or record it to be done at a
551	later phase.
552
553	RETURN
554	0 - OK
555	Other - Some sort of query error
556	*/
557
558	int check_and_do_in_subquery_rewrites(JOIN *join)
559	{
560	THD *thd=join->thd;
561	st_select_lex *select_lex= join->select_lex;
562	st_select_lex_unit* parent_unit= select_lex->master_unit();
563	DBUG_ENTER("check_and_do_in_subquery_rewrites");
564
565	/*
566	IN/ALL/ANY rewrites are not applicable for so called fake select
567	(this select exists only to filter results of union if it is needed).
568	*/
569	if (select_lex == select_lex->master_unit()->fake_select_lex)
570	DBUG_RETURN(`0`);
571
572	/*
573	If
574	1) this join is inside a subquery (of any type except FROM-clause
575	subquery) and
576	2) we aren't just normalizing a VIEW
577
578	Then perform early unconditional subquery transformations:
579	- Convert subquery predicate into semi-join, or
580	- Mark the subquery for execution using materialization, or
581	- Perform IN->EXISTS transformation, or
582	- Perform more/less ALL/ANY -> MIN/MAX rewrite
583	- Substitute trivial scalar-context subquery with its value
584
585	TODO: for PS, make the whole block execute only on the first execution
586	*/
587	Item_subselect *subselect;
588	if (!thd->lex->is_view_context_analysis() && // (1)
589	(subselect= parent_unit->item)) // (2)
590	{
591	Item_in_subselect *in_subs= NULL;
592	Item_allany_subselect *allany_subs= NULL;
593	switch (subselect->substype()) {
594	case Item_subselect::IN_SUBS:
595	in_subs= (Item_in_subselect *)subselect;
596	break;
597	case Item_subselect::ALL_SUBS:
598	case Item_subselect::ANY_SUBS:
599	allany_subs= (Item_allany_subselect *)subselect;
600	break;
601	default:
602	break;
603	}
604
605
606	/ Resolve expressions and perform semantic analysis for IN query /
607	if (in_subs != NULL)
608	/*
609	TODO: Add the condition below to this if statement when we have proper
610	support for is_correlated handling for materialized semijoins.
611	If we were to add this condition now, the fix_fields() call in
612	convert_subq_to_sj() would force the flag is_correlated to be set
613	erroneously for prepared queries.
614
615	thd->stmt_arena->state != Query_arena::PREPARED)
616	*/
617	{
618	SELECT_LEX *current= thd->lex->current_select;
619	thd->lex->current_select= current->return_after_parsing();
620	char const *save_where= thd->where;
621	thd->where= "IN/ALL/ANY subquery";
622
623	bool failure= !in_subs->left_expr->fixed &&
624	in_subs->left_expr->fix_fields(thd, &in_subs->left_expr);
625	thd->lex->current_select= current;
626	thd->where= save_where;
627	if (failure)
628	DBUG_RETURN(-`1`); / purecov: deadcode /
629
630	/*
631	Check if the left and right expressions have the same # of
632	columns, i.e. we don't have a case like
633	(oe1, oe2) IN (SELECT ie1, ie2, ie3 ...)
634
635	TODO why do we have this duplicated in IN->EXISTS transformers?
636	psergey-todo: fix these: grep for duplicated_subselect_card_check
637	*/
638	if (select_lex->item_list.elements != in_subs->left_expr->cols())
639	{
640	my_error(ER_OPERAND_COLUMNS, MYF(`0`), in_subs->left_expr->cols());
641	DBUG_RETURN(-`1`);
642	}
643	}
644
645	DBUG_PRINT("info", ("Checking if subq can be converted to semi-join"));
646	/*
647	Check if we're in subquery that is a candidate for flattening into a
648	semi-join (which is done in flatten_subqueries()). The
649	requirements are:
650	1. Subquery predicate is an IN/=ANY subq predicate
651	2. Subquery is a single SELECT (not a UNION)
652	3. Subquery does not have GROUP BY or ORDER BY
653	4. Subquery does not use aggregate functions or HAVING
654	5. Subquery predicate is at the AND-top-level of ON/WHERE clause
655	6. We are not in a subquery of a single table UPDATE/DELETE that
656	doesn't have a JOIN (TODO: We should handle this at some
657	point by switching to multi-table UPDATE/DELETE)
658	7. We're not in a table-less subquery like "SELECT 1"
659	8. No execution method was already chosen (by a prepared statement)
660	9. Parent select is not a table-less select
661	10. Neither parent nor child select have STRAIGHT_JOIN option.
662	11. It is first optimisation (the subquery could be moved from ON
663	clause during first optimisation and then be considered for SJ
664	on the second when it is too late)
665	*/
666	if (optimizer_flag(thd, OPTIMIZER_SWITCH_SEMIJOIN) &&
667	in_subs && // 1
668	!select_lex->is_part_of_union() && // 2
669	!select_lex->group_list.elements && !join->order && // 3
670	!join->having && !select_lex->with_sum_func && // 4
671	in_subs->emb_on_expr_nest && // 5
672	select_lex->outer_select()->join && // 6
673	parent_unit->first_select()->leaf_tables.elements && // 7
674	!in_subs->has_strategy() && // 8
675	select_lex->outer_select()->leaf_tables.elements && // 9
676	!((join->select_options \| // 10
677	select_lex->outer_select()->join->select_options) // 10
678	& SELECT_STRAIGHT_JOIN) && // 10
679	select_lex->first_cond_optimization) // 11
680	{
681	DBUG_PRINT("info", ("Subquery is semi-join conversion candidate"));
682
683	(void)subquery_types_allow_materialization(in_subs);
684
685	in_subs->is_flattenable_semijoin= TRUE;
686
687	/ Register the subquery for further processing in flatten_subqueries() /
688	if (!in_subs->is_registered_semijoin)
689	{
690	Query_arena *arena, backup;
691	arena= thd->activate_stmt_arena_if_needed(&backup);
692	select_lex->outer_select()->sj_subselects.push_back(in_subs,
693	thd->mem_root);
694	if (arena)
695	thd->restore_active_arena(arena, &backup);
696	in_subs->is_registered_semijoin= TRUE;
697	}
698	}
699	else
700	{
701	DBUG_PRINT("info", ("Subquery can't be converted to merged semi-join"));
702	/ Test if the user has set a legal combination of optimizer switches. /
703	if (!optimizer_flag(thd, OPTIMIZER_SWITCH_IN_TO_EXISTS) &&
704	!optimizer_flag(thd, OPTIMIZER_SWITCH_MATERIALIZATION))
705	my_error(ER_ILLEGAL_SUBQUERY_OPTIMIZER_SWITCHES, MYF(`0`));
706	/*
707	Transform each subquery predicate according to its overloaded
708	transformer.
709	*/
710	if (subselect->select_transformer(join))
711	DBUG_RETURN(-`1`);
712
713	/*
714	If the subquery predicate is IN/=ANY, analyse and set all possible
715	subquery execution strategies based on optimizer switches and syntactic
716	properties.
717	*/
718	if (in_subs && !in_subs->has_strategy())
719	{
720	if (is_materialization_applicable(thd, in_subs, select_lex))
721	{
722	in_subs->add_strategy(SUBS_MATERIALIZATION);
723
724	/*
725	If the subquery is an AND-part of WHERE register for being processed
726	with jtbm strategy
727	*/
728	if (in_subs->emb_on_expr_nest == NO_JOIN_NEST &&
729	optimizer_flag(thd, OPTIMIZER_SWITCH_SEMIJOIN))
730	{
731	in_subs->is_flattenable_semijoin= FALSE;
732	if (!in_subs->is_registered_semijoin)
733	{
734	Query_arena *arena, backup;
735	arena= thd->activate_stmt_arena_if_needed(&backup);
736	select_lex->outer_select()->sj_subselects.push_back(in_subs,
737	thd->mem_root);
738	if (arena)
739	thd->restore_active_arena(arena, &backup);
740	in_subs->is_registered_semijoin= TRUE;
741	}
742	}
743	}
744
745	/*
746	IN-TO-EXISTS is the only universal strategy. Choose it if the user
747	allowed it via an optimizer switch, or if materialization is not
748	possible.
749	*/
750	if (optimizer_flag(thd, OPTIMIZER_SWITCH_IN_TO_EXISTS) \|\|
751	!in_subs->has_strategy())
752	in_subs->add_strategy(SUBS_IN_TO_EXISTS);
753	}
754
755	/ Check if max/min optimization applicable /
756	if (allany_subs && !allany_subs->is_set_strategy())
757	{
758	uchar strategy= (allany_subs->is_maxmin_applicable(join) ?
759	(SUBS_MAXMIN_INJECTED \| SUBS_MAXMIN_ENGINE) :
760	SUBS_IN_TO_EXISTS);
761	allany_subs->add_strategy(strategy);
762	}
763
764	}
765	}
766	DBUG_RETURN(`0`);
767	}
768
769
770	/**
771	@brief Check if subquery's compared types allow materialization.
772
773	@param in_subs Subquery predicate, updated as follows:
774	types_allow_materialization TRUE if subquery materialization is allowed.
775	sjm_scan_allowed If types_allow_materialization is TRUE,
776	indicates whether it is possible to use subquery
777	materialization and scan the materialized table.
778
779	@retval TRUE If subquery types allow materialization.
780	@retval FALSE Otherwise.
781
782	@details
783	This is a temporary fix for BUG#36752.
784
785	There are two subquery materialization strategies:
786
787	1. Materialize and do index lookups in the materialized table. See
788	BUG#36752 for description of restrictions we need to put on the
789	compared expressions.
790
791	2. Materialize and then do a full scan of the materialized table. At the
792	moment, this strategy's applicability criteria are even stricter than
793	in #1.
794
795	This is so because of the following: consider an uncorrelated subquery
796
797	...WHERE (ot1.col1, ot2.col2 ...) IN (SELECT ie1,ie2,... FROM it1 ...)
798
799	and a join order that could be used to do sjm-materialization:
800
801	SJM-Scan(it1, it1), ot1, ot2
802
803	IN-equalities will be parts of conditions attached to the outer tables:
804
805	ot1: ot1.col1 = ie1 AND ... (C1)
806	ot2: ot1.col2 = ie2 AND ... (C2)
807
808	besides those there may be additional references to ie1 and ie2
809	generated by equality propagation. The problem with evaluating C1 and
810	C2 is that ie{1,2} refer to subquery tables' columns, while we only have
811	current value of materialization temptable. Our solution is to
812	* require that all ie{N} are table column references. This allows
813	to copy the values of materialization temptable columns to the
814	original table's columns (see setup_sj_materialization for more
815	details)
816	* require that compared columns have exactly the same type. This is
817	a temporary measure to avoid BUG#36752-type problems.
818	*/
819
820	static
821	bool subquery_types_allow_materialization(Item_in_subselect *in_subs)
822	{
823	DBUG_ENTER("subquery_types_allow_materialization");
824
825	DBUG_ASSERT(in_subs->left_expr->fixed);
826
827	List_iterator<Item> it(in_subs->unit->first_select()->item_list);
828	uint elements= in_subs->unit->first_select()->item_list.elements;
829
830	in_subs->types_allow_materialization= FALSE; // Assign default values
831	in_subs->sjm_scan_allowed= FALSE;
832
833	bool all_are_fields= TRUE;
834	uint32 total_key_length = `0`;
835	for (uint i= `0`; i < elements; i++)
836	{
837	Item *outer= in_subs->left_expr->element_index(i);
838	Item *inner= it ++;
839	all_are_fields &= (outer->real_item()->type() == Item::FIELD_ITEM &&
840	inner->real_item()->type() == Item::FIELD_ITEM);
841	total_key_length += inner->max_length;
842	if (!inner->type_handler()->subquery_type_allows_materialization(inner,
843	outer))
844	DBUG_RETURN(FALSE);
845	}
846
847	/*
848	Make sure that create_tmp_table will not fail due to too long keys.
849	See MDEV-7122. This check is performed inside create_tmp_table also and
850	we must do it so that we know the table has keys created.
851	Make sure that the length of the key for the temp_table is atleast
852	greater than 0.
853	*/
854	if (!total_key_length \|\| total_key_length > tmp_table_max_key_length() \|\|
855	elements > tmp_table_max_key_parts())
856	DBUG_RETURN(FALSE);
857
858	in_subs->types_allow_materialization= TRUE;
859	in_subs->sjm_scan_allowed= all_are_fields;
860	DBUG_PRINT("info",("subquery_types_allow_materialization: ok, allowed"));
861	DBUG_RETURN(TRUE);
862	}
863
864
865	/**
866	Apply max min optimization of all/any subselect
867	*/
868
869	bool JOIN::transform_max_min_subquery()
870	{
871	DBUG_ENTER("JOIN::transform_max_min_subquery");
872	Item_subselect *subselect= unit->item;
873	if (!subselect \|\| (subselect->substype() != Item_subselect::ALL_SUBS &&
874	subselect->substype() != Item_subselect::ANY_SUBS))
875	DBUG_RETURN(`0`);
876	DBUG_RETURN(((Item_allany_subselect *) subselect)->
877	transform_into_max_min(this));
878	}
879
880
881	/*
882	Finalize IN->EXISTS conversion in case we couldn't use materialization.
883
884	DESCRIPTION Invoke the IN->EXISTS converter
885	Replace the Item_in_subselect with its wrapper Item_in_optimizer in WHERE.
886
887	RETURN
888	FALSE - Ok
889	TRUE - Fatal error
890	*/
891
892	bool make_in_exists_conversion(THD thd, JOIN join, Item_in_subselect *item)
893	{
894	DBUG_ENTER("make_in_exists_conversion");
895	JOIN *child_join= item->unit->first_select()->join;
896	bool res;
897
898	/*
899	We're going to finalize IN->EXISTS conversion.
900	Normally, IN->EXISTS conversion takes place inside the
901	Item_subselect::fix_fields() call, where item_subselect->fixed==FALSE (as
902	fix_fields() haven't finished yet) and item_subselect->changed==FALSE (as
903	the conversion haven't been finalized)
904
905	At the end of Item_subselect::fix_fields() we had to set fixed=TRUE,
906	changed=TRUE (the only other option would have been to return error).
907
908	So, now we have to set these back for the duration of select_transformer()
909	call.
910	*/
911	item->changed= `0`;
912	item->fixed= `0`;
913
914	SELECT_LEX *save_select_lex= thd->lex->current_select;
915	thd->lex->current_select= item->unit->first_select();
916
917	res= item->select_transformer(child_join);
918
919	thd->lex->current_select= save_select_lex;
920
921	if (res)
922	DBUG_RETURN(TRUE);
923
924	item->changed= `1`;
925	item->fixed= `1`;
926
927	Item *substitute= item->substitution;
928	bool do_fix_fields= !item->substitution->fixed;
929	/*
930	The Item_subselect has already been wrapped with Item_in_optimizer, so we
931	should search for item->optimizer, not 'item'.
932	*/
933	Item *replace_me= item->optimizer;
934	DBUG_ASSERT(replace_me==substitute);
935
936	Item **tree= (item->emb_on_expr_nest == NO_JOIN_NEST)?
937	&join->conds : &(item->emb_on_expr_nest->on_expr);
938	if (replace_where_subcondition(join, tree, replace_me, substitute,
939	do_fix_fields))
940	DBUG_RETURN(TRUE);
941	item->substitution= NULL;
942
943	/*
944	If this is a prepared statement, repeat the above operation for
945	prep_where (or prep_on_expr).
946	*/
947	if (!thd->stmt_arena->is_conventional())
948	{
949	tree= (item->emb_on_expr_nest == (TABLE_LIST*)NO_JOIN_NEST)?
950	&join->select_lex->prep_where :
951	&(item->emb_on_expr_nest->prep_on_expr);
952
953	if (replace_where_subcondition(join, tree, replace_me, substitute,
954	FALSE))
955	DBUG_RETURN(TRUE);
956	}
957	DBUG_RETURN(FALSE);
958	}
959
960
961	bool check_for_outer_joins(List<TABLE_LIST> *join_list)
962	{
963	TABLE_LIST *table;
964	NESTED_JOIN *nested_join;
965	List_iterator<TABLE_LIST> li(*join_list);
966	while ((table= li ++))
967	{
968	if ((nested_join= table->nested_join))
969	{
970	if (check_for_outer_joins(&nested_join->join_list))
971	return TRUE;
972	}
973
974	if (table->outer_join)
975	return TRUE;
976	}
977	return FALSE;
978	}
979
980
981	void find_and_block_conversion_to_sj(Item *to_find,
982	List_iterator_fast<Item_in_subselect> &li)
983	{
984	if (to_find->type() == Item::FUNC_ITEM &&
985	((Item_func*)to_find)->functype() == Item_func::IN_OPTIMIZER_FUNC)
986	to_find= ((Item_in_optimizer*)to_find)->get_wrapped_in_subselect_item();
987
988	if (to_find->type() != Item::SUBSELECT_ITEM \|\|
989	((Item_subselect *) to_find)->substype() != Item_subselect::IN_SUBS)
990	return;
991	Item_in_subselect *in_subq;
992	li.rewind();
993	while ((in_subq= li ++))
994	{
995	if (in_subq == to_find)
996	{
997	in_subq->block_conversion_to_sj();
998	return;
999	}
1000	}
1001	}
1002
1003
1004	/*
1005	Convert semi-join subquery predicates into semi-join join nests
1006
1007	SYNOPSIS
1008	convert_join_subqueries_to_semijoins()
1009
1010	DESCRIPTION
1011
1012	Convert candidate subquery predicates into semi-join join nests. This
1013	transformation is performed once in query lifetime and is irreversible.
1014
1015	Conversion of one subquery predicate
1016	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1017	We start with a join that has a semi-join subquery:
1018
1019	SELECT ...
1020	FROM ot, ...
1021	WHERE oe IN (SELECT ie FROM it1 ... itN WHERE subq_where) AND outer_where
1022
1023	and convert it into a semi-join nest:
1024
1025	SELECT ...
1026	FROM ot SEMI JOIN (it1 ... itN), ...
1027	WHERE outer_where AND subq_where AND oe=ie
1028
1029	that is, in order to do the conversion, we need to
1030
1031	* Create the "SEMI JOIN (it1 .. itN)" part and add it into the parent
1032	query's FROM structure.
1033	* Add "AND subq_where AND oe=ie" into parent query's WHERE (or ON if
1034	the subquery predicate was in an ON expression)
1035	* Remove the subquery predicate from the parent query's WHERE
1036
1037	Considerations when converting many predicates
1038	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1039	A join may have at most MAX_TABLES tables. This may prevent us from
1040	flattening all subqueries when the total number of tables in parent and
1041	child selects exceeds MAX_TABLES.
1042	We deal with this problem by flattening children's subqueries first and
1043	then using a heuristic rule to determine each subquery predicate's
1044	"priority".
1045
1046	RETURN
1047	FALSE OK
1048	TRUE Error
1049	*/
1050
1051	bool convert_join_subqueries_to_semijoins(JOIN *join)
1052	{
1053	Query_arena *arena, backup;
1054	Item_in_subselect *in_subq;
1055	THD *thd= join->thd;
1056	DBUG_ENTER("convert_join_subqueries_to_semijoins");
1057
1058	if (join->select_lex->sj_subselects.is_empty())
1059	DBUG_RETURN(FALSE);
1060
1061	List_iterator_fast<Item_in_subselect> li(join->select_lex->sj_subselects);
1062
1063	while ((in_subq= li ++))
1064	{
1065	SELECT_LEX *subq_sel= in_subq->get_select_lex();
1066	if (subq_sel->handle_derived(thd->lex, DT_MERGE))
1067	DBUG_RETURN(TRUE);
1068	if (subq_sel->join->transform_in_predicates_into_in_subq(thd))
1069	DBUG_RETURN(TRUE);
1070	subq_sel->update_used_tables();
1071	}
1072
1073	/*
1074	Check all candidates to semi-join conversion that occur
1075	in ON expressions of outer join. Set the flag blocking
1076	this conversion for them.
1077	*/
1078	TABLE_LIST *tbl;
1079	List_iterator<TABLE_LIST> ti(join->select_lex->leaf_tables);
1080	while ((tbl= ti ++))
1081	{
1082	TABLE_LIST *embedded;
1083	TABLE_LIST *embedding= tbl;
1084	do
1085	{
1086	embedded= embedding;
1087	bool block_conversion_to_sj= false;
1088	if (embedded->on_expr)
1089	{
1090	/*
1091	Conversion of an IN subquery predicate into semi-join
1092	is blocked now if the predicate occurs:
1093	- in the ON expression of an outer join
1094	- in the ON expression of an inner join embedded directly
1095	or indirectly in the inner nest of an outer join
1096	*/
1097	for (TABLE_LIST *tl= embedded; tl; tl= tl->embedding)
1098	{
1099	if (tl->outer_join)
1100	{
1101	block_conversion_to_sj= true;
1102	break;
1103	}
1104	}
1105	}
1106	if (block_conversion_to_sj)
1107	{
1108	Item *cond= embedded->on_expr;
1109	if (!cond)
1110	;
1111	else if (cond->type() != Item::COND_ITEM)
1112	find_and_block_conversion_to_sj(cond, li);
1113	else if (((Item_cond*) cond)->functype() ==
1114	Item_func::COND_AND_FUNC)
1115	{
1116	Item *item;
1117	List_iterator<Item> it((((Item_cond) cond)->argument_list()));
1118	while ((item= it ++))
1119	{
1120	find_and_block_conversion_to_sj(item, li);
1121	}
1122	}
1123	}
1124	embedding= embedded->embedding;
1125	}
1126	while (embedding &&
1127	embedding->nested_join->join_list.head() == embedded);
1128	}
1129
1130	/*
1131	Block conversion to semi-joins for those candidates that
1132	are encountered in the WHERE condition of the multi-table view
1133	with CHECK OPTION if this view is used in UPDATE/DELETE.
1134	(This limitation can be, probably, easily lifted.)
1135	*/
1136	li.rewind();
1137	while ((in_subq= li ++))
1138	{
1139	if (in_subq->emb_on_expr_nest != NO_JOIN_NEST &&
1140	in_subq->emb_on_expr_nest->effective_with_check)
1141	{
1142	in_subq->block_conversion_to_sj();
1143	}
1144	}
1145
1146	if (join->select_options & SELECT_STRAIGHT_JOIN)
1147	{
1148	/ Block conversion to semijoins for all candidates /
1149	li.rewind();
1150	while ((in_subq= li ++))
1151	{
1152	in_subq->block_conversion_to_sj();
1153	}
1154	}
1155
1156	li.rewind();
1157	/ First, convert child join's subqueries. We proceed bottom-up here /
1158	while ((in_subq= li ++))
1159	{
1160	st_select_lex *child_select= in_subq->get_select_lex();
1161	JOIN *child_join= child_select->join;
1162	child_join->outer_tables = child_join->table_count;
1163
1164	/*
1165	child_select->where contains only the WHERE predicate of the
1166	subquery itself here. We may be selecting from a VIEW, which has its
1167	own predicate. The combined predicates are available in child_join->conds,
1168	which was built by setup_conds() doing prepare_where() for all views.
1169	*/
1170	child_select->where= child_join->conds;
1171
1172	if (convert_join_subqueries_to_semijoins(child_join))
1173	DBUG_RETURN(TRUE);
1174
1175
1176	in_subq->sj_convert_priority=
1177	MY_TEST(in_subq->do_not_convert_to_sj) * MAX_TABLES * `2` +
1178	in_subq->is_correlated * MAX_TABLES + child_join->outer_tables;
1179	}
1180
1181	// Temporary measure: disable semi-joins when they are together with outer
1182	// joins.
1183	#if 0
1184	if (check_for_outer_joins(join->join_list))
1185	{
1186	in_subq= join->select_lex->sj_subselects.head();
1187	arena= thd->activate_stmt_arena_if_needed(&backup);
1188	goto skip_conversion;
1189	}
1190	#endif
1191	//dump_TABLE_LIST_struct(select_lex, select_lex->leaf_tables);
1192	/*
1193	2. Pick which subqueries to convert:
1194	sort the subquery array
1195	- prefer correlated subqueries over uncorrelated;
1196	- prefer subqueries that have greater number of outer tables;
1197	*/
1198	bubble_sort<Item_in_subselect>(&join->select_lex->sj_subselects,
1199	subq_sj_candidate_cmp, NULL);
1200	// #tables-in-parent-query + #tables-in-subquery < MAX_TABLES
1201	/ Replace all subqueries to be flattened with Item_int(1) /
1202	arena= thd->activate_stmt_arena_if_needed(&backup);
1203
1204	li.rewind();
1205	while ((in_subq= li ++))
1206	{
1207	bool remove_item= TRUE;
1208
1209	/ Stop processing if we've reached a subquery that's attached to the ON clause /
1210	if (in_subq->do_not_convert_to_sj)
1211	break;
1212
1213	if (in_subq->is_flattenable_semijoin)
1214	{
1215	if (join->table_count +
1216	in_subq->unit->first_select()->join->table_count >= MAX_TABLES)
1217	break;
1218	if (convert_subq_to_sj(join, in_subq))
1219	goto restore_arena_and_fail;
1220	}
1221	else
1222	{
1223	if (join->table_count + `1` >= MAX_TABLES)
1224	break;
1225	if (convert_subq_to_jtbm(join, in_subq, &remove_item))
1226	goto restore_arena_and_fail;
1227	}
1228	if (remove_item)
1229	{
1230	Item **tree= (in_subq->emb_on_expr_nest == NO_JOIN_NEST)?
1231	&join->conds : &(in_subq->emb_on_expr_nest->on_expr);
1232	Item *replace_me= in_subq->original_item();
1233	if (replace_where_subcondition(join, tree, replace_me,
1234	new (thd->mem_root) Item_int (thd, `1`),
1235	FALSE))
1236	goto restore_arena_and_fail;
1237	}
1238	}
1239	//skip_conversion:
1240	/*
1241	3. Finalize (perform IN->EXISTS rewrite) the subqueries that we didn't
1242	convert:
1243	*/
1244	while (in_subq)
1245	{
1246	JOIN *child_join= in_subq->unit->first_select()->join;
1247	in_subq->changed= `0`;
1248	in_subq->fixed= `0`;
1249
1250	SELECT_LEX *save_select_lex= thd->lex->current_select;
1251	thd->lex->current_select= in_subq->unit->first_select();
1252
1253	bool res= in_subq->select_transformer(child_join);
1254
1255	thd->lex->current_select= save_select_lex;
1256
1257	if (res)
1258	DBUG_RETURN(TRUE);
1259
1260	in_subq->changed= `1`;
1261	in_subq->fixed= `1`;
1262
1263	Item *substitute= in_subq->substitution;
1264	bool do_fix_fields= !in_subq->substitution->fixed;
1265	Item **tree= (in_subq->emb_on_expr_nest == NO_JOIN_NEST)?
1266	&join->conds : &(in_subq->emb_on_expr_nest->on_expr);
1267	Item *replace_me= in_subq->original_item();
1268	if (replace_where_subcondition(join, tree, replace_me, substitute,
1269	do_fix_fields))
1270	DBUG_RETURN(TRUE);
1271	in_subq->substitution= NULL;
1272	/*
1273	If this is a prepared statement, repeat the above operation for
1274	prep_where (or prep_on_expr). Subquery-to-semijoin conversion is
1275	done once for prepared statement.
1276	*/
1277	if (!thd->stmt_arena->is_conventional())
1278	{
1279	tree= (in_subq->emb_on_expr_nest == NO_JOIN_NEST)?
1280	&join->select_lex->prep_where :
1281	&(in_subq->emb_on_expr_nest->prep_on_expr);
1282	/*
1283	prep_on_expr/ prep_where may be NULL in some cases.
1284	If that is the case, do nothing - simplify_joins() will copy
1285	ON/WHERE expression into prep_on_expr/prep_where.
1286	*/
1287	if (*tree && replace_where_subcondition(join, tree, replace_me, substitute,
1288	FALSE))
1289	DBUG_RETURN(TRUE);
1290	}
1291	/*
1292	Revert to the IN->EXISTS strategy in the rare case when the subquery could
1293	not be flattened.
1294	*/
1295	in_subq->reset_strategy(SUBS_IN_TO_EXISTS);
1296	if (is_materialization_applicable(thd, in_subq,
1297	in_subq->unit->first_select()))
1298	{
1299	in_subq->add_strategy(SUBS_MATERIALIZATION);
1300	}
1301
1302	in_subq= li ++;
1303	}
1304
1305	if (arena)
1306	thd->restore_active_arena(arena, &backup);
1307	join->select_lex->sj_subselects.empty();
1308	DBUG_RETURN(FALSE);
1309
1310	restore_arena_and_fail:
1311	if (arena)
1312	thd->restore_active_arena(arena, &backup);
1313	DBUG_RETURN(TRUE);
1314	}
1315
1316
1317	/*
1318	Get #output_rows and scan_time estimates for a "delayed" table.
1319
1320	SYNOPSIS
1321	get_delayed_table_estimates()
1322	table IN Table to get estimates for
1323	out_rows OUT E(#rows in the table)
1324	scan_time OUT E(scan_time).
1325	startup_cost OUT cost to populate the table.
1326
1327	DESCRIPTION
1328	Get #output_rows and scan_time estimates for a "delayed" table. By
1329	"delayed" here we mean that the table is filled at the start of query
1330	execution. This means that the optimizer can't use table statistics to
1331	get #rows estimate for it, it has to call this function instead.
1332
1333	This function is expected to make different actions depending on the nature
1334	of the table. At the moment there is only one kind of delayed tables,
1335	non-flattenable semi-joins.
1336	*/
1337
1338	void get_delayed_table_estimates(TABLE *table,
1339	ha_rows *out_rows,
1340	double *scan_time,
1341	double *startup_cost)
1342	{
1343	Item_in_subselect *item= table->pos_in_table_list->jtbm_subselect;
1344
1345	DBUG_ASSERT(item->engine->engine_type() ==
1346	subselect_engine::HASH_SJ_ENGINE);
1347
1348	subselect_hash_sj_engine *hash_sj_engine=
1349	((subselect_hash_sj_engine*)item->engine);
1350
1351	*out_rows= (ha_rows)item->jtbm_record_count;
1352	*startup_cost= item->jtbm_read_time;
1353
1354	/ Calculate cost of scanning the temptable /
1355	double data_size= item->jtbm_record_count *
1356	hash_sj_engine->tmp_table->s->reclength;
1357	/ Do like in handler::read_time /
1358	*scan_time= data_size/IO_SIZE + `2`;
1359	}
1360
1361
1362	/**
1363	@brief Replaces an expression destructively inside the expression tree of
1364	the WHERE clase.
1365
1366	@note We substitute AND/OR structure because it was copied by
1367	copy_andor_structure and some changes could be done in the copy but
1368	should be left permanent, also there could be several layers of AND over
1369	AND and OR over OR because ::fix_field() possibly is not called.
1370
1371	@param join The top-level query.
1372	@param old_cond The expression to be replaced.
1373	@param new_cond The expression to be substituted.
1374	@param do_fix_fields If true, Item::fix_fields(THD, Item*) is called for
1375	the new expression.
1376	@return <code>true</code> if there was an error, <code>false</code> if
1377	successful.
1378	*/
1379
1380	static bool replace_where_subcondition(JOIN join, Item *expr,
1381	Item old_cond, Item new_cond,
1382	bool do_fix_fields)
1383	{
1384	if (*expr == old_cond)
1385	{
1386	*expr= new_cond;
1387	if (do_fix_fields)
1388	new_cond->fix_fields(join->thd, expr);
1389	return FALSE;
1390	}
1391
1392	if ((*expr)->type() == Item::COND_ITEM)
1393	{
1394	List_iterator<Item> li(((Item_cond)(*expr))->argument_list());
1395	Item *item;
1396	while ((item= li ++))
1397	{
1398	if (item == old_cond)
1399	{
1400	li.replace(new_cond);
1401	if (do_fix_fields)
1402	new_cond->fix_fields(join->thd, li.ref());
1403	return FALSE;
1404	}
1405	else if (item->type() == Item::COND_ITEM)
1406	{
1407	replace_where_subcondition(join, li.ref(),
1408	old_cond, new_cond,
1409	do_fix_fields);
1410	}
1411	}
1412	}
1413	/*
1414	We can come to here when
1415	- we're doing replace operations on both on_expr and prep_on_expr
1416	- on_expr is the same as prep_on_expr, or they share a sub-tree
1417	(so, when we do replace in on_expr, we replace in prep_on_expr, too,
1418	and when we try doing a replace in prep_on_expr, the item we wanted
1419	to replace there has already been replaced)
1420	*/
1421	return FALSE;
1422	}
1423
1424	static int subq_sj_candidate_cmp(Item_in_subselect* el1, Item_in_subselect* el2,
1425	void *arg)
1426	{
1427	return (el1->sj_convert_priority > el2->sj_convert_priority) ? -`1` :
1428	( (el1->sj_convert_priority == el2->sj_convert_priority)? `0` : `1`);
1429	}
1430
1431
1432	/*
1433	Convert a subquery predicate into a TABLE_LIST semi-join nest
1434
1435	SYNOPSIS
1436	convert_subq_to_sj()
1437	parent_join Parent join, the one that has subq_pred in its WHERE/ON
1438	clause
1439	subq_pred Subquery predicate to be converted
1440
1441	DESCRIPTION
1442	Convert a subquery predicate into a TABLE_LIST semi-join nest. All the
1443	prerequisites are already checked, so the conversion is always successfull.
1444
1445	Prepared Statements: the transformation is permanent:
1446	- Changes in TABLE_LIST structures are naturally permanent
1447	- Item tree changes are performed on statement MEM_ROOT:
1448	= we activate statement MEM_ROOT
1449	= this function is called before the first fix_prepare_information
1450	call.
1451
1452	This is intended because the criteria for subquery-to-sj conversion remain
1453	constant for the lifetime of the Prepared Statement.
1454
1455	RETURN
1456	FALSE OK
1457	TRUE Out of memory error
1458	*/
1459
1460	static bool convert_subq_to_sj(JOIN parent_join, Item_in_subselect subq_pred)
1461	{
1462	SELECT_LEX *parent_lex= parent_join->select_lex;
1463	TABLE_LIST *emb_tbl_nest= NULL;
1464	List<TABLE_LIST> *emb_join_list= &parent_lex->top_join_list;
1465	THD *thd= parent_join->thd;
1466	DBUG_ENTER("convert_subq_to_sj");
1467
1468	/*
1469	1. Find out where to put the predicate into.
1470	Note: for "t1 LEFT JOIN t2" this will be t2, a leaf.
1471	*/
1472	if ((void)subq_pred->emb_on_expr_nest != (void**)NO_JOIN_NEST)
1473	{
1474	if (subq_pred->emb_on_expr_nest->nested_join)
1475	{
1476	/*
1477	We're dealing with
1478
1479	... [LEFT] JOIN ( ... ) ON (subquery AND whatever) ...
1480
1481	The sj-nest will be inserted into the brackets nest.
1482	*/
1483	emb_tbl_nest= subq_pred->emb_on_expr_nest;
1484	emb_join_list= &emb_tbl_nest->nested_join->join_list;
1485	}
1486	else if (!subq_pred->emb_on_expr_nest->outer_join)
1487	{
1488	/*
1489	We're dealing with
1490
1491	... INNER JOIN tblX ON (subquery AND whatever) ...
1492
1493	The sj-nest will be tblX's "sibling", i.e. another child of its
1494	parent. This is ok because tblX is joined as an inner join.
1495	*/
1496	emb_tbl_nest= subq_pred->emb_on_expr_nest->embedding;
1497	if (emb_tbl_nest)
1498	emb_join_list= &emb_tbl_nest->nested_join->join_list;
1499	}
1500	else if (!subq_pred->emb_on_expr_nest->nested_join)
1501	{
1502	TABLE_LIST *outer_tbl= subq_pred->emb_on_expr_nest;
1503	TABLE_LIST *wrap_nest;
1504	LEX_CSTRING sj_wrap_name= { STRING_WITH_LEN("(sj-wrap)") };
1505	/*
1506	We're dealing with
1507
1508	... LEFT JOIN tbl ON (on_expr AND subq_pred) ...
1509
1510	we'll need to convert it into:
1511
1512	... LEFT JOIN ( tbl SJ (subq_tables) ) ON (on_expr AND subq_pred) ...
1513	\| \|
1514	\|<----- wrap_nest ---->\|
1515
1516	Q: other subqueries may be pointing to this element. What to do?
1517	A1: simple solution: copy subq_pred->expr_join_nest= parent_nest.
1518	But we'll need to fix other pointers.
1519	A2: Another way: have TABLE_LIST::next_ptr so the following
1520	subqueries know the table has been nested.
1521	A3: changes in the TABLE_LIST::outer_join will make everything work
1522	automatically.
1523	*/
1524	if (!(wrap_nest= alloc_join_nest(thd)))
1525	{
1526	DBUG_RETURN(TRUE);
1527	}
1528	wrap_nest->embedding= outer_tbl->embedding;
1529	wrap_nest->join_list= outer_tbl->join_list;
1530	wrap_nest->alias = sj_wrap_name;
1531
1532	wrap_nest->nested_join->join_list.empty();
1533	wrap_nest->nested_join->join_list.push_back(outer_tbl, thd->mem_root);
1534
1535	outer_tbl->embedding= wrap_nest;
1536	outer_tbl->join_list= &wrap_nest->nested_join->join_list;
1537
1538	/*
1539	wrap_nest will take place of outer_tbl, so move the outer join flag
1540	and on_expr
1541	*/
1542	wrap_nest->outer_join= outer_tbl->outer_join;
1543	outer_tbl->outer_join= `0`;
1544
1545	wrap_nest->on_expr= outer_tbl->on_expr;
1546	outer_tbl->on_expr= NULL;
1547
1548	List_iterator<TABLE_LIST> li(*wrap_nest->join_list);
1549	TABLE_LIST *tbl;
1550	while ((tbl= li ++))
1551	{
1552	if (tbl == outer_tbl)
1553	{
1554	li.replace(wrap_nest);
1555	break;
1556	}
1557	}
1558	/*
1559	Ok now wrap_nest 'contains' outer_tbl and we're ready to add the
1560	semi-join nest into it
1561	*/
1562	emb_join_list= &wrap_nest->nested_join->join_list;
1563	emb_tbl_nest= wrap_nest;
1564	}
1565	}
1566
1567	TABLE_LIST *sj_nest;
1568	NESTED_JOIN *nested_join;
1569	LEX_CSTRING sj_nest_name= { STRING_WITH_LEN("(sj-nest)") };
1570	if (!(sj_nest= alloc_join_nest(thd)))
1571	{
1572	DBUG_RETURN(TRUE);
1573	}
1574	nested_join= sj_nest->nested_join;
1575
1576	sj_nest->join_list= emb_join_list;
1577	sj_nest->embedding= emb_tbl_nest;
1578	sj_nest->alias = sj_nest_name;
1579	sj_nest->sj_subq_pred= subq_pred;
1580	sj_nest->original_subq_pred_used_tables= subq_pred->used_tables() \|
1581	subq_pred->left_expr->used_tables();
1582	/ Nests do not participate in those 'chains', so: /
1583	/ sj_nest->next_leaf= sj_nest->next_local= sj_nest->next_global == NULL/
1584	emb_join_list->push_back(sj_nest, thd->mem_root);
1585
1586	/*
1587	nested_join->used_tables and nested_join->not_null_tables are
1588	initialized in simplify_joins().
1589	*/
1590
1591	/*
1592	2. Walk through subquery's top list and set 'embedding' to point to the
1593	sj-nest.
1594	*/
1595	st_select_lex *subq_lex= subq_pred->unit->first_select();
1596	DBUG_ASSERT(subq_lex->next_select() == NULL);
1597	nested_join->join_list.empty();
1598	List_iterator_fast<TABLE_LIST> li(subq_lex->top_join_list);
1599	TABLE_LIST *tl;
1600	while ((tl= li ++))
1601	{
1602	tl->embedding= sj_nest;
1603	tl->join_list= &nested_join->join_list;
1604	nested_join->join_list.push_back(tl, thd->mem_root);
1605	}
1606
1607	/*
1608	Reconnect the next_leaf chain.
1609	TODO: Do we have to put subquery's tables at the end of the chain?
1610	Inserting them at the beginning would be a bit faster.
1611	NOTE: We actually insert them at the front! That's because the order is
1612	reversed in this list.
1613	*/
1614	parent_lex->leaf_tables.append(&subq_lex->leaf_tables);
1615
1616	if (subq_lex->options & OPTION_SCHEMA_TABLE)
1617	parent_lex->options \|= OPTION_SCHEMA_TABLE;
1618
1619	/*
1620	Same as above for next_local chain
1621	(a theory: a next_local chain always starts with ::leaf_tables
1622	because view's tables are inserted after the view)
1623	*/
1624
1625	for (tl= (TABLE_LIST*)(parent_lex->table_list.first); tl->next_local; tl= tl->next_local)
1626	{}
1627
1628	tl->next_local= subq_lex->join->tables_list;
1629
1630	/ A theory: no need to re-connect the next_global chain /
1631
1632	/ 3. Remove the original subquery predicate from the WHERE/ON /
1633
1634	// The subqueries were replaced for Item_int(1) earlier
1635	subq_pred->reset_strategy(SUBS_SEMI_JOIN); // for subsequent executions
1636	/TODO: also reset the 'm_with_subquery' there. /
1637
1638	/ n. Adjust the parent_join->table_count counter /
1639	uint table_no= parent_join->table_count;
1640	/ n. Walk through child's tables and adjust table->map /
1641	List_iterator_fast<TABLE_LIST> si(subq_lex->leaf_tables);
1642	while ((tl= si ++))
1643	{
1644	tl->set_tablenr(table_no);
1645	if (tl->is_jtbm())
1646	{
1647	tl->jtbm_table_no= table_no;
1648	Item *dummy= tl->jtbm_subselect;
1649	tl->jtbm_subselect->fix_after_pullout(parent_lex, &dummy, true);
1650	DBUG_ASSERT(dummy == tl->jtbm_subselect);
1651	}
1652	SELECT_LEX *old_sl= tl->select_lex;
1653	tl->select_lex= parent_join->select_lex;
1654	for (TABLE_LIST *emb= tl->embedding;
1655	emb && emb->select_lex == old_sl;
1656	emb= emb->embedding)
1657	emb->select_lex= parent_join->select_lex;
1658	table_no++;
1659	}
1660	parent_join->table_count += subq_lex->join->table_count;
1661	//parent_join->table_count += subq_lex->leaf_tables.elements;
1662
1663	/*
1664	Put the subquery's WHERE into semi-join's sj_on_expr
1665	Add the subquery-induced equalities too.
1666	*/
1667	SELECT_LEX *save_lex= thd->lex->current_select;
1668	thd->lex->current_select=subq_lex;
1669	if (!subq_pred->left_expr->fixed &&
1670	subq_pred->left_expr->fix_fields(thd, &subq_pred->left_expr))
1671	DBUG_RETURN(TRUE);
1672	thd->lex->current_select=save_lex;
1673
1674	table_map subq_pred_used_tables= subq_pred->used_tables();
1675	sj_nest->nested_join->sj_corr_tables= subq_pred_used_tables;
1676	sj_nest->nested_join->sj_depends_on= subq_pred_used_tables \|
1677	subq_pred->left_expr->used_tables();
1678	sj_nest->sj_on_expr= subq_lex->join->conds;
1679
1680	/*
1681	Create the IN-equalities and inject them into semi-join's ON expression.
1682	Additionally, for LooseScan strategy
1683	- Record the number of IN-equalities.
1684	- Create list of pointers to (oe1, ..., ieN). We'll need the list to
1685	see which of the expressions are bound and which are not (for those
1686	we'll produce a distinct stream of (ie_i1,...ie_ik).
1687
1688	(TODO: can we just create a list of pointers and hope the expressions
1689	will not substitute themselves on fix_fields()? or we need to wrap
1690	them into Item_direct_view_refs and store pointers to those. The
1691	pointers to Item_direct_view_refs are guaranteed to be stable as
1692	Item_direct_view_refs doesn't substitute itself with anything in
1693	Item_direct_view_ref::fix_fields.
1694	*/
1695	sj_nest->sj_in_exprs= subq_pred->left_expr->cols();
1696	sj_nest->nested_join->sj_outer_expr_list.empty();
1697
1698	if (subq_pred->left_expr->cols() == `1`)
1699	{
1700	/ add left = select_list_element /
1701	nested_join->sj_outer_expr_list.push_back(&subq_pred->left_expr,
1702	thd->mem_root);
1703	/*
1704	Create Item_func_eq. Note that
1705	1. this is done on the statement, not execution, arena
1706	2. if it's a PS then this happens only once - on the first execution.
1707	On following re-executions, the item will be fix_field-ed normally.
1708	3. Thus it should be created as if it was fix_field'ed, in particular
1709	all pointers to items in the execution arena should be protected
1710	with thd->change_item_tree
1711	*/
1712	Item_func_eq *item_eq=
1713	new (thd->mem_root) Item_func_eq (thd, subq_pred->left_expr_orig,
1714	subq_lex->ref_pointer_array [`0`]);
1715	if (!item_eq)
1716	DBUG_RETURN(TRUE);
1717	if (subq_pred->left_expr_orig != subq_pred->left_expr)
1718	thd->change_item_tree(item_eq->arguments(), subq_pred->left_expr);
1719	item_eq->in_equality_no= `0`;
1720	sj_nest->sj_on_expr= and_items(thd, sj_nest->sj_on_expr, item_eq);
1721	}
1722	else if (subq_pred->left_expr->type() == Item::ROW_ITEM)
1723	{
1724	/*
1725	disassemple left expression and add
1726	left1 = select_list_element1 and left2 = select_list_element2 ...
1727	*/
1728	for (uint i= `0`; i < subq_pred->left_expr->cols(); i++)
1729	{
1730	nested_join->sj_outer_expr_list.push_back(subq_pred->left_expr->addr(i),
1731	thd->mem_root);
1732	Item_func_eq *item_eq=
1733	new (thd->mem_root)
1734	Item_func_eq (thd, subq_pred->left_expr_orig->element_index(i),
1735	subq_lex->ref_pointer_array [i]);
1736	if (!item_eq)
1737	DBUG_RETURN(TRUE);
1738	DBUG_ASSERT(subq_pred->left_expr->element_index(i)->fixed);
1739	if (subq_pred->left_expr_orig->element_index(i) !=
1740	subq_pred->left_expr->element_index(i))
1741	thd->change_item_tree(item_eq->arguments(),
1742	subq_pred->left_expr->element_index(i));
1743	item_eq->in_equality_no= i;
1744	sj_nest->sj_on_expr= and_items(thd, sj_nest->sj_on_expr, item_eq);
1745	}
1746	}
1747	else
1748	{
1749	/*
1750	add row operation
1751	left = (select_list_element1, select_list_element2, ...)
1752	*/
1753	Item_row row= new* (thd->mem_root) Item_row (thd, subq_lex->pre_fix);
1754	/ fix fields on subquery was call so they should be the same /
1755	if (!row)
1756	DBUG_RETURN(TRUE);
1757	DBUG_ASSERT(subq_pred->left_expr->cols() == row->cols());
1758	nested_join->sj_outer_expr_list.push_back(&subq_pred->left_expr);
1759	Item_func_eq *item_eq=
1760	new (thd->mem_root) Item_func_eq (thd, subq_pred->left_expr_orig, row);
1761	if (!item_eq)
1762	DBUG_RETURN(TRUE);
1763	for (uint i= `0`; i < row->cols(); i++)
1764	{
1765	if (row->element_index(i) != subq_lex->ref_pointer_array [i])
1766	thd->change_item_tree(row->addr(i), subq_lex->ref_pointer_array [i]);
1767	}
1768	item_eq->in_equality_no= `0`;
1769	sj_nest->sj_on_expr= and_items(thd, sj_nest->sj_on_expr, item_eq);
1770	}
1771	/*
1772	Fix the created equality and AND
1773
1774	Note that fix_fields() can actually fail in a meaningful way here. One
1775	example is when the IN-equality is not valid, because it compares columns
1776	with incompatible collations. (One can argue it would be more appropriate
1777	to check for this at name resolution stage, but as a legacy of IN->EXISTS
1778	we have in here).
1779	*/
1780	if (!sj_nest->sj_on_expr->fixed &&
1781	sj_nest->sj_on_expr->fix_fields(thd, &sj_nest->sj_on_expr))
1782	{
1783	DBUG_RETURN(TRUE);
1784	}
1785
1786	/*
1787	Walk through sj nest's WHERE and ON expressions and call
1788	item->fix_table_changes() for all items.
1789	*/
1790	sj_nest->sj_on_expr->fix_after_pullout(parent_lex, &sj_nest->sj_on_expr,
1791	TRUE);
1792	fix_list_after_tbl_changes(parent_lex, &sj_nest->nested_join->join_list);
1793
1794
1795	/ Unlink the child select_lex so it doesn't show up in EXPLAIN: /
1796	subq_lex->master_unit()->exclude_level();
1797
1798	DBUG_EXECUTE("where",
1799	print_where(sj_nest->sj_on_expr,"SJ-EXPR", QT_ORDINARY););
1800
1801	/ Inject sj_on_expr into the parent's WHERE or ON /
1802	if (emb_tbl_nest)
1803	{
1804	emb_tbl_nest->on_expr= and_items(thd, emb_tbl_nest->on_expr,
1805	sj_nest->sj_on_expr);
1806	emb_tbl_nest->on_expr->top_level_item();
1807	if (!emb_tbl_nest->on_expr->fixed &&
1808	emb_tbl_nest->on_expr->fix_fields(thd,
1809	&emb_tbl_nest->on_expr))
1810	{
1811	DBUG_RETURN(TRUE);
1812	}
1813	}
1814	else
1815	{
1816	/ Inject into the WHERE /
1817	parent_join->conds= and_items(thd, parent_join->conds, sj_nest->sj_on_expr);
1818	parent_join->conds->top_level_item();
1819	/*
1820	fix_fields must update the properties (e.g. st_select_lex::cond_count of
1821	the correct select_lex.
1822	*/
1823	save_lex= thd->lex->current_select;
1824	thd->lex->current_select=parent_join->select_lex;
1825	if (!parent_join->conds->fixed &&
1826	parent_join->conds->fix_fields(thd,
1827	&parent_join->conds))
1828	{
1829	DBUG_RETURN(`1`);
1830	}
1831	thd->lex->current_select=save_lex;
1832	parent_join->select_lex->where= parent_join->conds;
1833	}
1834
1835	if (subq_lex->ftfunc_list->elements)
1836	{
1837	Item_func_match *ifm;
1838	List_iterator_fast<Item_func_match> li(*(subq_lex->ftfunc_list));
1839	while ((ifm= li ++))
1840	parent_lex->ftfunc_list->push_front(ifm, thd->mem_root);
1841	}
1842
1843	parent_lex->have_merged_subqueries= TRUE;
1844	/ Fatal error may have been set to by fix_after_pullout() /
1845	DBUG_RETURN(thd->is_fatal_error);
1846	}
1847
1848
1849	const int SUBQERY_TEMPTABLE_NAME_MAX_LEN= `20`;
1850
1851	static void create_subquery_temptable_name(LEX_STRING *str, uint number)
1852	{
1853	char *to= str->str;
1854	DBUG_ASSERT(number < `10000`);
1855	to= strmov(to, "<subquery");
1856	to= int10_to_str((int) number, to, `10`);
1857	to[`0`]= `'>'`;
1858	to[`1`]= `0`;
1859	str->length= (size_t) (to - str->str)+`1`;
1860	}
1861
1862
1863	/*
1864	Convert subquery predicate into non-mergeable semi-join nest.
1865
1866	TODO:
1867	why does this do IN-EXISTS conversion? Can't we unify it with mergeable
1868	semi-joins? currently, convert_subq_to_sj() cannot fail to convert (unless
1869	fatal errors)
1870
1871
1872	RETURN
1873	FALSE - Ok
1874	TRUE - Fatal error
1875	*/
1876
1877	static bool convert_subq_to_jtbm(JOIN *parent_join,
1878	Item_in_subselect *subq_pred,
1879	bool *remove_item)
1880	{
1881	SELECT_LEX *parent_lex= parent_join->select_lex;
1882	List<TABLE_LIST> *emb_join_list= &parent_lex->top_join_list;
1883	TABLE_LIST emb_tbl_nest= NULL; // will change when we learn to handle outer joins*
1884	TABLE_LIST *tl;
1885	bool optimization_delayed= TRUE;
1886	TABLE_LIST *jtbm;
1887	LEX_STRING tbl_alias;
1888	THD *thd= parent_join->thd;
1889	DBUG_ENTER("convert_subq_to_jtbm");
1890
1891	subq_pred->set_strategy(SUBS_MATERIALIZATION);
1892	subq_pred->is_jtbm_merged= TRUE;
1893
1894	*remove_item= TRUE;
1895
1896	if (!(tbl_alias.str= (char*)thd->calloc(SUBQERY_TEMPTABLE_NAME_MAX_LEN)) \|\|
1897	!(jtbm= alloc_join_nest(thd))) //todo: this is not a join nest!
1898	{
1899	DBUG_RETURN(TRUE);
1900	}
1901
1902	jtbm->join_list= emb_join_list;
1903	jtbm->embedding= emb_tbl_nest;
1904	jtbm->jtbm_subselect= subq_pred;
1905	jtbm->nested_join= NULL;
1906
1907	/ Nests do not participate in those 'chains', so: /
1908	/ jtbm->next_leaf= jtbm->next_local= jtbm->next_global == NULL/
1909	emb_join_list->push_back(jtbm, thd->mem_root);
1910
1911	/*
1912	Inject the jtbm table into TABLE_LIST::next_leaf list, so that
1913	make_join_statistics() and co. can find it.
1914	*/
1915	parent_lex->leaf_tables.push_back(jtbm, thd->mem_root);
1916
1917	if (subq_pred->unit->first_select()->options & OPTION_SCHEMA_TABLE)
1918	parent_lex->options \|= OPTION_SCHEMA_TABLE;
1919
1920	/*
1921	Same as above for TABLE_LIST::next_local chain
1922	(a theory: a next_local chain always starts with ::leaf_tables
1923	because view's tables are inserted after the view)
1924	*/
1925	for (tl= (TABLE_LIST*)(parent_lex->table_list.first); tl->next_local; tl= tl->next_local)
1926	{}
1927	tl->next_local= jtbm;
1928
1929	/ A theory: no need to re-connect the next_global chain /
1930	if (optimization_delayed)
1931	{
1932	DBUG_ASSERT(parent_join->table_count < MAX_TABLES);
1933
1934	jtbm->jtbm_table_no= parent_join->table_count;
1935
1936	create_subquery_temptable_name(&tbl_alias,
1937	subq_pred->unit->first_select()->select_number);
1938	jtbm->alias.str= tbl_alias.str;
1939	jtbm->alias.length= tbl_alias.length;
1940	parent_join->table_count++;
1941	DBUG_RETURN(thd->is_fatal_error);
1942	}
1943	subselect_hash_sj_engine *hash_sj_engine=
1944	((subselect_hash_sj_engine*)subq_pred->engine);
1945	jtbm->table= hash_sj_engine->tmp_table;
1946
1947	jtbm->table->tablenr= parent_join->table_count;
1948	jtbm->table->map= table_map(`1`) << (parent_join->table_count);
1949	jtbm->jtbm_table_no= jtbm->table->tablenr;
1950
1951	parent_join->table_count++;
1952	DBUG_ASSERT(parent_join->table_count < MAX_TABLES);
1953
1954	Item *conds= hash_sj_engine->semi_join_conds;
1955	conds->fix_after_pullout(parent_lex, &conds, TRUE);
1956
1957	DBUG_EXECUTE("where", print_where(conds,"SJ-EXPR", QT_ORDINARY););
1958
1959	create_subquery_temptable_name(&tbl_alias, hash_sj_engine->materialize_join->
1960	select_lex->select_number);
1961	jtbm->alias.str= tbl_alias.str;
1962	jtbm->alias.length= tbl_alias.length;
1963
1964	parent_lex->have_merged_subqueries= TRUE;
1965
1966	/ Don't unlink the child subselect, as the subquery will be used. /
1967
1968	DBUG_RETURN(thd->is_fatal_error);
1969	}
1970
1971
1972	static TABLE_LIST alloc_join_nest(THD thd)
1973	{
1974	TABLE_LIST *tbl;
1975	if (!(tbl= (TABLE_LIST) thd->calloc(ALIGN_SIZE(sizeof*(TABLE_LIST))+
1976	sizeof(NESTED_JOIN))))
1977	return NULL;
1978	tbl->nested_join= (NESTED_JOIN) ((uchar)tbl +
1979	ALIGN_SIZE(sizeof(TABLE_LIST)));
1980	return tbl;
1981	}
1982
1983	/*
1984	@Note thd->is_fatal_error can be set in case of OOM
1985	*/
1986
1987	void fix_list_after_tbl_changes(SELECT_LEX new_parent, List<TABLE_LIST> tlist)
1988	{
1989	List_iterator<TABLE_LIST> it(*tlist);
1990	TABLE_LIST *table;
1991	while ((table= it ++))
1992	{
1993	if (table->on_expr)
1994	table->on_expr->fix_after_pullout(new_parent, &table->on_expr, TRUE);
1995	if (table->nested_join)
1996	fix_list_after_tbl_changes(new_parent, &table->nested_join->join_list);
1997	}
1998	}
1999
2000
2001	static void set_emb_join_nest(List<TABLE_LIST> tables, TABLE_LIST emb_sj_nest)
2002	{
2003	List_iterator<TABLE_LIST> it(*tables);
2004	TABLE_LIST *tbl;
2005	while ((tbl= it ++))
2006	{
2007	/*
2008	Note: check for nested_join first.
2009	derived-merged tables have tbl->table!=NULL &&
2010	tbl->table->reginfo==NULL.
2011	*/
2012	if (tbl->nested_join)
2013	set_emb_join_nest(&tbl->nested_join->join_list, emb_sj_nest);
2014	else if (tbl->table)
2015	tbl->table->reginfo.join_tab->emb_sj_nest= emb_sj_nest;
2016
2017	}
2018	}
2019
2020	/*
2021	Pull tables out of semi-join nests, if possible
2022
2023	SYNOPSIS
2024	pull_out_semijoin_tables()
2025	join The join where to do the semi-join flattening
2026
2027	DESCRIPTION
2028	Try to pull tables out of semi-join nests.
2029
2030	PRECONDITIONS
2031	When this function is called, the join may have several semi-join nests
2032	but it is guaranteed that one semi-join nest does not contain another.
2033
2034	ACTION
2035	A table can be pulled out of the semi-join nest if
2036	- It is a constant table, or
2037	- It is accessed via eq_ref(outer_tables)
2038
2039	POSTCONDITIONS
2040	* Tables that were pulled out have JOIN_TAB::emb_sj_nest == NULL
2041	* Tables that were not pulled out have JOIN_TAB::emb_sj_nest pointing
2042	to semi-join nest they are in.
2043	* Semi-join nests' TABLE_LIST::sj_inner_tables is updated accordingly
2044
2045	This operation is (and should be) performed at each PS execution since
2046	tables may become/cease to be constant across PS reexecutions.
2047
2048	NOTE
2049	Table pullout may make uncorrelated subquery correlated. Consider this
2050	example:
2051
2052	... WHERE oe IN (SELECT it1.primary_key WHERE p(it1, it2) ... )
2053
2054	here table it1 can be pulled out (we have it1.primary_key=oe which gives
2055	us functional dependency). Once it1 is pulled out, all references to it1
2056	from p(it1, it2) become references to outside of the subquery and thus
2057	make the subquery (i.e. its semi-join nest) correlated.
2058	Making the subquery (i.e. its semi-join nest) correlated prevents us from
2059	using Materialization or LooseScan to execute it.
2060
2061	RETURN
2062	0 - OK
2063	1 - Out of memory error
2064	*/
2065
2066	int pull_out_semijoin_tables(JOIN *join)
2067	{
2068	TABLE_LIST *sj_nest;
2069	DBUG_ENTER("pull_out_semijoin_tables");
2070	List_iterator<TABLE_LIST> sj_list_it(join->select_lex->sj_nests);
2071
2072	/ Try pulling out of the each of the semi-joins /
2073	while ((sj_nest= sj_list_it ++))
2074	{
2075	List_iterator<TABLE_LIST> child_li(sj_nest->nested_join->join_list);
2076	TABLE_LIST *tbl;
2077
2078	/*
2079	Don't do table pull-out for nested joins (if we get nested joins here, it
2080	means these are outer joins. It is theoretically possible to do pull-out
2081	for some of the outer tables but we dont support this currently.
2082	*/
2083	bool have_join_nest_children= FALSE;
2084
2085	set_emb_join_nest(&sj_nest->nested_join->join_list, sj_nest);
2086
2087	while ((tbl= child_li ++))
2088	{
2089	if (tbl->nested_join)
2090	{
2091	have_join_nest_children= TRUE;
2092	break;
2093	}
2094	}
2095
2096	table_map pulled_tables= `0`;
2097	table_map dep_tables= `0`;
2098	if (have_join_nest_children)
2099	goto skip;
2100
2101	/*
2102	Calculate set of tables within this semi-join nest that have
2103	other dependent tables
2104	*/
2105	child_li.rewind();
2106	while ((tbl= child_li ++))
2107	{
2108	TABLE *const table= tbl->table;
2109	if (table &&
2110	(table->reginfo.join_tab->dependent &
2111	sj_nest->nested_join->used_tables))
2112	dep_tables\|= table->reginfo.join_tab->dependent;
2113	}
2114
2115	/ Action #1: Mark the constant tables to be pulled out /
2116	child_li.rewind();
2117	while ((tbl= child_li ++))
2118	{
2119	if (tbl->table)
2120	{
2121	tbl->table->reginfo.join_tab->emb_sj_nest= sj_nest;
2122	#if 0
2123	/*
2124	Do not pull out tables because they are constant. This operation has
2125	a problem:
2126	- Some constant tables may become/cease to be constant across PS
2127	re-executions
2128	- Contrary to our initial assumption, it turned out that table pullout
2129	operation is not easily undoable.
2130
2131	The solution is to leave constant tables where they are. This will
2132	affect only constant tables that are 1-row or empty, tables that are
2133	constant because they are accessed via eq_ref(const) access will
2134	still be pulled out as functionally-dependent.
2135
2136	This will cause us to miss the chance to flatten some of the
2137	subqueries, but since const tables do not generate many duplicates,
2138	it really doesn't matter that much whether they were pulled out or
2139	not.
2140
2141	All of this was done as fix for BUG#43768.
2142	*/
2143	if (tbl->table->map & join->const_table_map)
2144	{
2145	pulled_tables \|= tbl->table->map;
2146	DBUG_PRINT("info", ("Table %s pulled out (reason: constant)",
2147	tbl->table->alias));
2148	}
2149	#endif
2150	}
2151	}
2152
2153	/*
2154	Action #2: Find which tables we can pull out based on
2155	update_ref_and_keys() data. Note that pulling one table out can allow
2156	us to pull out some other tables too.
2157	*/
2158	bool pulled_a_table;
2159	do
2160	{
2161	pulled_a_table= FALSE;
2162	child_li.rewind();
2163	while ((tbl= child_li ++))
2164	{
2165	if (tbl->table && !(pulled_tables & tbl->table->map) &&
2166	!(dep_tables & tbl->table->map))
2167	{
2168	if (find_eq_ref_candidate(tbl->table,
2169	sj_nest->nested_join->used_tables &
2170	~pulled_tables))
2171	{
2172	pulled_a_table= TRUE;
2173	pulled_tables \|= tbl->table->map;
2174	DBUG_PRINT("info", ("Table %s pulled out (reason: func dep)",
2175	tbl->table->alias.c_ptr()));
2176	/*
2177	Pulling a table out of uncorrelated subquery in general makes
2178	makes it correlated. See the NOTE to this funtion.
2179	*/
2180	sj_nest->sj_subq_pred->is_correlated= TRUE;
2181	sj_nest->nested_join->sj_corr_tables\|= tbl->table->map;
2182	sj_nest->nested_join->sj_depends_on\|= tbl->table->map;
2183	}
2184	}
2185	}
2186	} while (pulled_a_table);
2187
2188	child_li.rewind();
2189	skip:
2190	/*
2191	Action #3: Move the pulled out TABLE_LIST elements to the parents.
2192	*/
2193	table_map inner_tables= sj_nest->nested_join->used_tables &
2194	~pulled_tables;
2195	/ Record the bitmap of inner tables /
2196	sj_nest->sj_inner_tables= inner_tables;
2197	if (pulled_tables)
2198	{
2199	List<TABLE_LIST> *upper_join_list= (sj_nest->embedding != NULL)?
2200	(&sj_nest->embedding->nested_join->join_list):
2201	(&join->select_lex->top_join_list);
2202	Query_arena *arena, backup;
2203	arena= join->thd->activate_stmt_arena_if_needed(&backup);
2204	while ((tbl= child_li ++))
2205	{
2206	if (tbl->table)
2207	{
2208	if (inner_tables & tbl->table->map)
2209	{
2210	/ This table is not pulled out /
2211	tbl->table->reginfo.join_tab->emb_sj_nest= sj_nest;
2212	}
2213	else
2214	{
2215	/ This table has been pulled out of the semi-join nest /
2216	tbl->table->reginfo.join_tab->emb_sj_nest= NULL;
2217	/*
2218	Pull the table up in the same way as simplify_joins() does:
2219	update join_list and embedding pointers but keep next[_local]
2220	pointers.
2221	*/
2222	child_li.remove();
2223	sj_nest->nested_join->used_tables &= ~tbl->table->map;
2224	upper_join_list->push_back(tbl, join->thd->mem_root);
2225	tbl->join_list= upper_join_list;
2226	tbl->embedding= sj_nest->embedding;
2227	}
2228	}
2229	}
2230
2231	/ Remove the sj-nest itself if we've removed everything from it /
2232	if (!inner_tables)
2233	{
2234	List_iterator<TABLE_LIST> li(*upper_join_list);
2235	/ Find the sj_nest in the list. /
2236	while (sj_nest != li ++) ;
2237	li.remove();
2238	/ Also remove it from the list of SJ-nests: /
2239	sj_list_it.remove();
2240	}
2241
2242	if (arena)
2243	join->thd->restore_active_arena(arena, &backup);
2244	}
2245	}
2246	DBUG_RETURN(`0`);
2247	}
2248
2249
2250	/*
2251	Optimize semi-join nests that could be run with sj-materialization
2252
2253	SYNOPSIS
2254	optimize_semijoin_nests()
2255	join The join to optimize semi-join nests for
2256	all_table_map Bitmap of all tables in the join
2257
2258	DESCRIPTION
2259	Optimize each of the semi-join nests that can be run with
2260	materialization. For each of the nests, we
2261	- Generate the best join order for this "sub-join" and remember it;
2262	- Remember the sub-join execution cost (it's part of materialization
2263	cost);
2264	- Calculate other costs that will be incurred if we decide
2265	to use materialization strategy for this semi-join nest.
2266
2267	All obtained information is saved and will be used by the main join
2268	optimization pass.
2269
2270	NOTES
2271	Because of Join::reoptimize(), this function may be called multiple times.
2272
2273	RETURN
2274	FALSE Ok
2275	TRUE Out of memory error
2276	*/
2277
2278	bool optimize_semijoin_nests(JOIN *join, table_map all_table_map)
2279	{
2280	DBUG_ENTER("optimize_semijoin_nests");
2281	List_iterator<TABLE_LIST> sj_list_it(join->select_lex->sj_nests);
2282	TABLE_LIST *sj_nest;
2283	while ((sj_nest= sj_list_it ++))
2284	{
2285	/ semi-join nests with only constant tables are not valid /
2286	/// DBUG_ASSERT(sj_nest->sj_inner_tables & ~join->const_table_map);
2287
2288	sj_nest->sj_mat_info= NULL;
2289	/*
2290	The statement may have been executed with 'semijoin=on' earlier.
2291	We need to verify that 'semijoin=on' still holds.
2292	*/
2293	if (optimizer_flag(join->thd, OPTIMIZER_SWITCH_SEMIJOIN) &&
2294	optimizer_flag(join->thd, OPTIMIZER_SWITCH_MATERIALIZATION))
2295	{
2296	if ((sj_nest->sj_inner_tables & ~join->const_table_map) && / not everything was pulled out /
2297	!sj_nest->sj_subq_pred->is_correlated &&
2298	sj_nest->sj_subq_pred->types_allow_materialization)
2299	{
2300	join->emb_sjm_nest= sj_nest;
2301	if (choose_plan(join, all_table_map &~join->const_table_map))
2302	DBUG_RETURN(TRUE); / purecov: inspected /
2303	/*
2304	The best plan to run the subquery is now in join->best_positions,
2305	save it.
2306	*/
2307	uint n_tables= my_count_bits(sj_nest->sj_inner_tables & ~join->const_table_map);
2308	SJ_MATERIALIZATION_INFO* sjm;
2309	if (!(sjm= new SJ_MATERIALIZATION_INFO) \|\|
2310	!(sjm->positions= (POSITION)join->thd->alloc(sizeof(POSITION)
2311	n_tables)))
2312	DBUG_RETURN(TRUE); / purecov: inspected /
2313	sjm->tables= n_tables;
2314	sjm->is_used= FALSE;
2315	double subjoin_out_rows, subjoin_read_time;
2316
2317	/*
2318	join->get_partial_cost_and_fanout(n_tables + join->const_tables,
2319	table_map(-1),
2320	&subjoin_read_time,
2321	&subjoin_out_rows);
2322	*/
2323	join->get_prefix_cost_and_fanout(n_tables,
2324	&subjoin_read_time,
2325	&subjoin_out_rows);
2326
2327	sjm->materialization_cost.convert_from_cost(subjoin_read_time);
2328	sjm->rows= subjoin_out_rows;
2329
2330	// Don't use the following list because it has "stale" items. use
2331	// ref_pointer_array instead:
2332	//
2333	//List<Item> &right_expr_list=
2334	// sj_nest->sj_subq_pred->unit->first_select()->item_list;
2335	/*
2336	Adjust output cardinality estimates. If the subquery has form
2337
2338	... oe IN (SELECT t1.colX, t2.colY, func(X,Y,Z) )
2339
2340	then the number of distinct output record combinations has an
2341	upper bound of product of number of records matching the tables
2342	that are used by the SELECT clause.
2343	TODO:
2344	We can get a more precise estimate if we
2345	- use rec_per_key cardinality estimates. For simple cases like
2346	"oe IN (SELECT t.key ...)" it is trivial.
2347	- Functional dependencies between the tables in the semi-join
2348	nest (the payoff is probably less here?)
2349
2350	See also get_post_group_estimate().
2351	*/
2352	SELECT_LEX *subq_select= sj_nest->sj_subq_pred->unit->first_select();
2353	{
2354	for (uint i=`0` ; i < join->const_tables + sjm->tables ; i++)
2355	{
2356	JOIN_TAB *tab= join->best_positions[i].table;
2357	join->map2table[tab->table->tablenr]= tab;
2358	}
2359	table_map map= `0`;
2360	for (uint i=`0`; i < subq_select->item_list.elements; i++)
2361	map\|= subq_select->ref_pointer_array [i]->used_tables();
2362	map= map & ~PSEUDO_TABLE_BITS;
2363	Table_map_iterator tm_it(map);
2364	int tableno;
2365	double rows= `1.0`;
2366	while ((tableno = tm_it.next_bit()) != Table_map_iterator::BITMAP_END)
2367	rows *= join->map2table[tableno]->table->quick_condition_rows;
2368	sjm->rows= MY_MIN(sjm->rows, rows);
2369	}
2370	memcpy((uchar*) sjm->positions,
2371	(uchar*) (join->best_positions + join->const_tables),
2372	sizeof(POSITION) * n_tables);
2373
2374	/*
2375	Calculate temporary table parameters and usage costs
2376	*/
2377	uint rowlen= get_tmp_table_rec_length(subq_select->ref_pointer_array,
2378	subq_select->item_list.elements);
2379	double lookup_cost= get_tmp_table_lookup_cost(join->thd,
2380	subjoin_out_rows, rowlen);
2381	double write_cost= get_tmp_table_write_cost(join->thd,
2382	subjoin_out_rows, rowlen);
2383
2384	/*
2385	Let materialization cost include the cost to write the data into the
2386	temporary table:
2387	*/
2388	sjm->materialization_cost.add_io(subjoin_out_rows, write_cost);
2389
2390	/*
2391	Set the cost to do a full scan of the temptable (will need this to
2392	consider doing sjm-scan):
2393	*/
2394	sjm->scan_cost.reset();
2395	sjm->scan_cost.add_io(sjm->rows, lookup_cost);
2396
2397	sjm->lookup_cost.convert_from_cost(lookup_cost);
2398	sj_nest->sj_mat_info= sjm;
2399	DBUG_EXECUTE("opt", print_sjm(sjm););
2400	}
2401	}
2402	}
2403	join->emb_sjm_nest= NULL;
2404	DBUG_RETURN(FALSE);
2405	}
2406
2407
2408	/*
2409	Get estimated record length for semi-join materialization temptable
2410
2411	SYNOPSIS
2412	get_tmp_table_rec_length()
2413	items IN subquery's select list.
2414
2415	DESCRIPTION
2416	Calculate estimated record length for semi-join materialization
2417	temptable. It's an estimate because we don't follow every bit of
2418	create_tmp_table()'s logic. This isn't necessary as the return value of
2419	this function is used only for cost calculations.
2420
2421	RETURN
2422	Length of the temptable record, in bytes
2423	*/
2424
2425	static uint get_tmp_table_rec_length(Ref_ptr_array p_items, uint elements)
2426	{
2427	uint len= `0`;
2428	Item *item;
2429	//List_iterator<Item> it(items);
2430	for (uint i= `0`; i < elements ; i++)
2431	{
2432	item = p_items [i];
2433	switch (item->result_type()) {
2434	case REAL_RESULT:
2435	len += sizeof(double);
2436	break;
2437	case INT_RESULT:
2438	if (item->max_length >= (MY_INT32_NUM_DECIMAL_DIGITS - `1`))
2439	len += `8`;
2440	else
2441	len += `4`;
2442	break;
2443	case STRING_RESULT:
2444	enum enum_field_types type;
2445	/ DATE/TIME and GEOMETRY fields have STRING_RESULT result type. /
2446	if ((type= item->field_type()) == MYSQL_TYPE_DATETIME \|\|
2447	type == MYSQL_TYPE_TIME \|\| type == MYSQL_TYPE_DATE \|\|
2448	type == MYSQL_TYPE_TIMESTAMP \|\| type == MYSQL_TYPE_GEOMETRY)
2449	len += `8`;
2450	else
2451	len += item->max_length;
2452	break;
2453	case DECIMAL_RESULT:
2454	len += `10`;
2455	break;
2456	case ROW_RESULT:
2457	default:
2458	DBUG_ASSERT(`0`); / purecov: deadcode /
2459	break;
2460	}
2461	}
2462	return len;
2463	}
2464
2465
2466	/**
2467	The cost of a lookup into a unique hash/btree index on a temporary table
2468	with 'row_count' rows each of size 'row_size'.
2469
2470	@param thd current query context
2471	@param row_count number of rows in the temp table
2472	@param row_size average size in bytes of the rows
2473
2474	@return the cost of one lookup
2475	*/
2476
2477	double
2478	get_tmp_table_lookup_cost(THD thd, double* row_count, uint row_size)
2479	{
2480	if (row_count * row_size > thd->variables.max_heap_table_size)
2481	return (double) DISK_TEMPTABLE_LOOKUP_COST;
2482	else
2483	return (double) HEAP_TEMPTABLE_LOOKUP_COST;
2484	}
2485
2486	/**
2487	The cost of writing a row into a temporary table with 'row_count' unique
2488	rows each of size 'row_size'.
2489
2490	@param thd current query context
2491	@param row_count number of rows in the temp table
2492	@param row_size average size in bytes of the rows
2493
2494	@return the cost of writing one row
2495	*/
2496
2497	double
2498	get_tmp_table_write_cost(THD thd, double* row_count, uint row_size)
2499	{
2500	double lookup_cost= get_tmp_table_lookup_cost(thd, row_count, row_size);
2501	/*
2502	TODO:
2503	This is an optimistic estimate. Add additional costs resulting from
2504	actually writing the row to memory/disk and possible index reorganization.
2505	*/
2506	return lookup_cost;
2507	}
2508
2509
2510	/*
2511	Check if table's KEYUSE elements have an eq_ref(outer_tables) candidate
2512
2513	SYNOPSIS
2514	find_eq_ref_candidate()
2515	table Table to be checked
2516	sj_inner_tables Bitmap of inner tables. eq_ref(inner_table) doesn't
2517	count.
2518
2519	DESCRIPTION
2520	Check if table's KEYUSE elements have an eq_ref(outer_tables) candidate
2521
2522	TODO
2523	Check again if it is feasible to factor common parts with constant table
2524	search
2525
2526	Also check if it's feasible to factor common parts with table elimination
2527
2528	RETURN
2529	TRUE - There exists an eq_ref(outer-tables) candidate
2530	FALSE - Otherwise
2531	*/
2532
2533	bool find_eq_ref_candidate(TABLE *table, table_map sj_inner_tables)
2534	{
2535	KEYUSE *keyuse= table->reginfo.join_tab->keyuse;
2536
2537	if (keyuse)
2538	{
2539	do
2540	{
2541	uint key= keyuse->key;
2542	KEY *keyinfo;
2543	key_part_map bound_parts= `0`;
2544	bool is_excluded_key= keyuse->is_for_hash_join();
2545	if (!is_excluded_key)
2546	{
2547	keyinfo= table->key_info + key;
2548	is_excluded_key= !MY_TEST(keyinfo->flags & HA_NOSAME);
2549	}
2550	if (!is_excluded_key)
2551	{
2552	do / For all equalities on all key parts /
2553	{
2554	/ Check if this is "t.keypart = expr(outer_tables) /
2555	if (!(keyuse->used_tables & sj_inner_tables) &&
2556	!(keyuse->optimize & KEY_OPTIMIZE_REF_OR_NULL))
2557	{
2558	bound_parts \|= `1` << keyuse->keypart;
2559	}
2560	keyuse++;
2561	} while (keyuse->key == key && keyuse->table == table);
2562
2563	if (bound_parts == PREV_BITS(uint, keyinfo->user_defined_key_parts))
2564	return TRUE;
2565	}
2566	else
2567	{
2568	do
2569	{
2570	keyuse++;
2571	} while (keyuse->key == key && keyuse->table == table);
2572	}
2573	} while (keyuse->table == table);
2574	}
2575	return FALSE;
2576	}
2577
2578
2579	/*
2580	Do semi-join optimization step after we've added a new tab to join prefix
2581
2582	SYNOPSIS
2583	advance_sj_state()
2584	join The join we're optimizing
2585	remaining_tables Tables not in the join prefix
2586	new_join_tab Join tab we've just added to the join prefix
2587	idx Index of this join tab (i.e. number of tables
2588	in the prefix minus one)
2589	current_record_count INOUT Estimate of #records in join prefix's output
2590	current_read_time INOUT Cost to execute the join prefix
2591	loose_scan_pos IN A POSITION with LooseScan plan to access
2592	table new_join_tab
2593	(produced by the last best_access_path call)
2594
2595	DESCRIPTION
2596	Update semi-join optimization state after we've added another tab (table
2597	and access method) to the join prefix.
2598
2599	The state is maintained in join->positions[#prefix_size]. Each of the
2600	available strategies has its own state variables.
2601
2602	for each semi-join strategy
2603	{
2604	update strategy's state variables;
2605
2606	if (join prefix has all the tables that are needed to consider
2607	using this strategy for the semi-join(s))
2608	{
2609	calculate cost of using the strategy
2610	if ((this is the first strategy to handle the semi-join nest(s) \|\|
2611	the cost is less than other strategies))
2612	{
2613	// Pick this strategy
2614	pos->sj_strategy= ..
2615	..
2616	}
2617	}
2618
2619	Most of the new state is saved join->positions[idx] (and hence no undo
2620	is necessary). Several members of class JOIN are updated also, these
2621	changes can be rolled back with restore_prev_sj_state().
2622
2623	See setup_semijoin_dups_elimination() for a description of what kinds of
2624	join prefixes each strategy can handle.
2625	*/
2626
2627	bool is_multiple_semi_joins(JOIN join, POSITION prefix, uint idx, table_map inner_tables)
2628	{
2629	for (int i= (int)idx; i >= `0`; i--)
2630	{
2631	TABLE_LIST *emb_sj_nest;
2632	if ((emb_sj_nest= prefix[i].table->emb_sj_nest))
2633	{
2634	if (inner_tables & emb_sj_nest->sj_inner_tables)
2635	return !MY_TEST(inner_tables == (emb_sj_nest->sj_inner_tables &
2636	~join->const_table_map));
2637	}
2638	}
2639	return FALSE;
2640	}
2641
2642
2643	void advance_sj_state(JOIN *join, table_map remaining_tables, uint idx,
2644	double current_record_count, double* *current_read_time,
2645	POSITION *loose_scan_pos)
2646	{
2647	POSITION *pos= join->positions + idx;
2648	const JOIN_TAB *new_join_tab= pos->table;
2649	Semi_join_strategy_picker *pickers[]=
2650	{
2651	&pos->firstmatch_picker,
2652	&pos->loosescan_picker,
2653	&pos->sjmat_picker,
2654	&pos->dups_weedout_picker,
2655	NULL,
2656	};
2657
2658	if (join->emb_sjm_nest)
2659	{
2660	/*
2661	We're performing optimization inside SJ-Materialization nest:
2662	- there are no other semi-joins inside semi-join nests
2663	- attempts to build semi-join strategies here will confuse
2664	the optimizer, so bail out.
2665	*/
2666	pos->sj_strategy= SJ_OPT_NONE;
2667	return;
2668	}
2669
2670	/*
2671	Update join->cur_sj_inner_tables (Used by FirstMatch in this function and
2672	LooseScan detector in best_access_path)
2673	*/
2674	remaining_tables &= ~new_join_tab->table->map;
2675	table_map dups_producing_tables, UNINIT_VAR(prev_dups_producing_tables),
2676	UNINIT_VAR(prev_sjm_lookup_tables);
2677
2678	if (idx == join->const_tables)
2679	dups_producing_tables= `0`;
2680	else
2681	dups_producing_tables= pos[-`1`].dups_producing_tables;
2682
2683	TABLE_LIST *emb_sj_nest;
2684	if ((emb_sj_nest= new_join_tab->emb_sj_nest))
2685	dups_producing_tables \|= emb_sj_nest->sj_inner_tables;
2686
2687	Semi_join_strategy_picker strategy, prev_strategy= `0`;
2688	if (idx == join->const_tables)
2689	{
2690	/ First table, initialize pickers /
2691	for (strategy= pickers; *strategy != NULL; strategy++)
2692	(*strategy)->set_empty();
2693	pos->inner_tables_handled_with_other_sjs= `0`;
2694	}
2695	else
2696	{
2697	for (strategy= pickers; *strategy != NULL; strategy++)
2698	{
2699	(*strategy)->set_from_prev(pos - `1`);
2700	}
2701	pos->inner_tables_handled_with_other_sjs=
2702	pos[-`1`].inner_tables_handled_with_other_sjs;
2703	}
2704
2705	pos->prefix_cost.convert_from_cost(*current_read_time);
2706	pos->prefix_record_count= *current_record_count;
2707
2708	{
2709	pos->sj_strategy= SJ_OPT_NONE;
2710
2711	for (strategy= pickers; *strategy != NULL; strategy++)
2712	{
2713	table_map handled_fanout;
2714	sj_strategy_enum sj_strategy;
2715	double rec_count= *current_record_count;
2716	double read_time= *current_read_time;
2717	if ((*strategy)->check_qep(join, idx, remaining_tables,
2718	new_join_tab,
2719	&rec_count,
2720	&read_time,
2721	&handled_fanout,
2722	&sj_strategy,
2723	loose_scan_pos))
2724	{
2725	/*
2726	It's possible to use the strategy. Use it, if
2727	- it removes semi-join fanout that was not removed before
2728	- using it is cheaper than using something else,
2729	and {if some other strategy has removed fanout
2730	that this strategy is trying to remove, then it
2731	did remove the fanout only for one semi-join}
2732	This is to avoid a situation when
2733	1. strategy X removes fanout for semijoin X,Y
2734	2. using strategy Z is cheaper, but it only removes
2735	fanout from semijoin X.
2736	3. We have no clue what to do about fanount of semi-join Y.
2737	*/
2738	if ((dups_producing_tables & handled_fanout) \|\|
2739	(read_time < *current_read_time &&
2740	!(handled_fanout & pos->inner_tables_handled_with_other_sjs)))
2741	{
2742	DBUG_ASSERT(pos->sj_strategy != sj_strategy);
2743	/*
2744	If the strategy choosen first time or
2745	the strategy replace strategy which was used to exectly the same
2746	tables
2747	*/
2748	if (pos->sj_strategy == SJ_OPT_NONE \|\|
2749	handled_fanout ==
2750	(prev_dups_producing_tables ^ dups_producing_tables))
2751	{
2752	prev_strategy= strategy;
2753	if (pos->sj_strategy == SJ_OPT_NONE)
2754	{
2755	prev_dups_producing_tables= dups_producing_tables;
2756	prev_sjm_lookup_tables= join->sjm_lookup_tables;
2757	}
2758	/ Mark strategy as used /
2759	(*strategy)->mark_used();
2760	pos->sj_strategy= sj_strategy;
2761	if (sj_strategy == SJ_OPT_MATERIALIZE)
2762	join->sjm_lookup_tables \|= handled_fanout;
2763	else
2764	join->sjm_lookup_tables &= ~handled_fanout;
2765	*current_read_time= read_time;
2766	*current_record_count= rec_count;
2767	dups_producing_tables &= ~handled_fanout;
2768	//TODO: update bitmap of semi-joins that were handled together with
2769	// others.
2770	if (is_multiple_semi_joins(join, join->positions, idx,
2771	handled_fanout))
2772	pos->inner_tables_handled_with_other_sjs \|= handled_fanout;
2773	}
2774	else
2775	{
2776	/ Conflict fall to most general variant /
2777	(*prev_strategy)->set_empty();
2778	dups_producing_tables= prev_dups_producing_tables;
2779	join->sjm_lookup_tables= prev_sjm_lookup_tables;
2780	// mark it 'none' to avpoid loops
2781	pos->sj_strategy= SJ_OPT_NONE;
2782	// next skip to last;
2783	strategy= pickers +
2784	(sizeof(pickers)/sizeof(Semi_join_strategy_picker*) - `3`);
2785	continue;
2786	}
2787	}
2788	else
2789	{
2790	/ We decided not to apply the strategy. /
2791	(*strategy)->set_empty();
2792	}
2793	}
2794	}
2795	}
2796
2797	if ((emb_sj_nest= new_join_tab->emb_sj_nest))
2798	{
2799	join->cur_sj_inner_tables \|= emb_sj_nest->sj_inner_tables;
2800
2801	/ Remove the sj_nest if all of its SJ-inner tables are in cur_table_map /
2802	if (!(remaining_tables &
2803	emb_sj_nest->sj_inner_tables & ~new_join_tab->table->map))
2804	join->cur_sj_inner_tables &= ~emb_sj_nest->sj_inner_tables;
2805	}
2806
2807	pos->prefix_cost.convert_from_cost(*current_read_time);
2808	pos->prefix_record_count= *current_record_count;
2809	pos->dups_producing_tables= dups_producing_tables;
2810	}
2811
2812
2813	void Sj_materialization_picker::set_from_prev(struct st_position *prev)
2814	{
2815	if (prev->sjmat_picker.is_used)
2816	set_empty();
2817	else
2818	{
2819	sjm_scan_need_tables= prev->sjmat_picker.sjm_scan_need_tables;
2820	sjm_scan_last_inner= prev->sjmat_picker.sjm_scan_last_inner;
2821	}
2822	is_used= FALSE;
2823	}
2824
2825
2826	bool Sj_materialization_picker::check_qep(JOIN *join,
2827	uint idx,
2828	table_map remaining_tables,
2829	const JOIN_TAB *new_join_tab,
2830	double *record_count,
2831	double *read_time,
2832	table_map *handled_fanout,
2833	sj_strategy_enum *strategy,
2834	POSITION *loose_scan_pos)
2835	{
2836	bool sjm_scan;
2837	SJ_MATERIALIZATION_INFO *mat_info;
2838	if ((mat_info= at_sjmat_pos(join, remaining_tables,
2839	new_join_tab, idx, &sjm_scan)))
2840	{
2841	if (sjm_scan)
2842	{
2843	/*
2844	We can't yet evaluate this option yet. This is because we can't
2845	accout for fanout of sj-inner tables yet:
2846
2847	ntX SJM-SCAN(it1 ... itN) \| ot1 ... otN \|
2848	^(1) ^(2)
2849
2850	we're now at position (1). SJM temptable in general has multiple
2851	records, so at point (1) we'll get the fanout from sj-inner tables (ie
2852	there will be multiple record combinations).
2853
2854	The final join result will not contain any semi-join produced
2855	fanout, i.e. tables within SJM-SCAN(...) will not contribute to
2856	the cardinality of the join output. Extra fanout produced by
2857	SJM-SCAN(...) will be 'absorbed' into fanout produced by ot1 ... otN.
2858
2859	The simple way to model this is to remove SJM-SCAN(...) fanout once
2860	we reach the point #2.
2861	*/
2862	sjm_scan_need_tables=
2863	new_join_tab->emb_sj_nest->sj_inner_tables \|
2864	new_join_tab->emb_sj_nest->nested_join->sj_depends_on \|
2865	new_join_tab->emb_sj_nest->nested_join->sj_corr_tables;
2866	sjm_scan_last_inner= idx;
2867	}
2868	else
2869	{
2870	/ This is SJ-Materialization with lookups /
2871	Cost_estimate prefix_cost;
2872	signed int first_tab= (int)idx - mat_info->tables;
2873	double prefix_rec_count;
2874	if (first_tab < (int)join->const_tables)
2875	{
2876	prefix_cost.reset();
2877	prefix_rec_count= `1.0`;
2878	}
2879	else
2880	{
2881	prefix_cost = join->positions[first_tab].prefix_cost;
2882	prefix_rec_count= join->positions[first_tab].prefix_record_count;
2883	}
2884
2885	double mat_read_time= prefix_cost.total_cost();
2886	mat_read_time += mat_info->materialization_cost.total_cost() +
2887	prefix_rec_count * mat_info->lookup_cost.total_cost();
2888
2889	/*
2890	NOTE: When we pick to use SJM[-Scan] we don't memcpy its POSITION
2891	elements to join->positions as that makes it hard to return things
2892	back when making one step back in join optimization. That's done
2893	after the QEP has been chosen.
2894	*/
2895	*read_time= mat_read_time;
2896	*record_count= prefix_rec_count;
2897	*handled_fanout= new_join_tab->emb_sj_nest->sj_inner_tables;
2898	*strategy= SJ_OPT_MATERIALIZE;
2899	return TRUE;
2900	}
2901	}
2902
2903	/ 4.A SJM-Scan second phase check /
2904	if (sjm_scan_need_tables && / Have SJM-Scan prefix /
2905	!(sjm_scan_need_tables & remaining_tables))
2906	{
2907	TABLE_LIST *mat_nest=
2908	join->positions[sjm_scan_last_inner].table->emb_sj_nest;
2909	SJ_MATERIALIZATION_INFO *mat_info= mat_nest->sj_mat_info;
2910
2911	double prefix_cost;
2912	double prefix_rec_count;
2913	int first_tab= sjm_scan_last_inner + `1` - mat_info->tables;
2914	/ Get the prefix cost /
2915	if (first_tab == (int)join->const_tables)
2916	{
2917	prefix_rec_count= `1.0`;
2918	prefix_cost= `0.0`;
2919	}
2920	else
2921	{
2922	prefix_cost= join->positions[first_tab - `1`].prefix_cost.total_cost();
2923	prefix_rec_count= join->positions[first_tab - `1`].prefix_record_count;
2924	}
2925
2926	/ Add materialization cost /
2927	prefix_cost += mat_info->materialization_cost.total_cost() +
2928	prefix_rec_count * mat_info->scan_cost.total_cost();
2929	prefix_rec_count *= mat_info->rows;
2930
2931	uint i;
2932	table_map rem_tables= remaining_tables;
2933	for (i= idx; i != (first_tab + mat_info->tables - `1`); i--)
2934	rem_tables \|= join->positions[i].table->table->map;
2935
2936	POSITION curpos, dummy;
2937	/ Need to re-run best-access-path as we prefix_rec_count has changed /
2938	bool disable_jbuf= (join->thd->variables.join_cache_level == `0`);
2939	for (i= first_tab + mat_info->tables; i <= idx; i++)
2940	{
2941	best_access_path(join, join->positions[i].table, rem_tables, i,
2942	disable_jbuf, prefix_rec_count, &curpos, &dummy);
2943	prefix_rec_count *= curpos.records_read;
2944	prefix_cost += curpos.read_time;
2945	}
2946
2947	*strategy= SJ_OPT_MATERIALIZE_SCAN;
2948	*read_time= prefix_cost;
2949	*record_count= prefix_rec_count;
2950	*handled_fanout= mat_nest->sj_inner_tables;
2951	return TRUE;
2952	}
2953	return FALSE;
2954	}
2955
2956
2957	void LooseScan_picker::set_from_prev(struct st_position *prev)
2958	{
2959	if (prev->loosescan_picker.is_used)
2960	set_empty();
2961	else
2962	{
2963	first_loosescan_table= prev->loosescan_picker.first_loosescan_table;
2964	loosescan_need_tables= prev->loosescan_picker.loosescan_need_tables;
2965	}
2966	is_used= FALSE;
2967	}
2968
2969
2970	bool LooseScan_picker::check_qep(JOIN *join,
2971	uint idx,
2972	table_map remaining_tables,
2973	const JOIN_TAB *new_join_tab,
2974	double *record_count,
2975	double *read_time,
2976	table_map *handled_fanout,
2977	sj_strategy_enum *strategy,
2978	struct st_position *loose_scan_pos)
2979	{
2980	POSITION *first= join->positions + first_loosescan_table;
2981	/*
2982	LooseScan strategy can't handle interleaving between tables from the
2983	semi-join that LooseScan is handling and any other tables.
2984
2985	If we were considering LooseScan for the join prefix (1)
2986	and the table we're adding creates an interleaving (2)
2987	then
2988	stop considering loose scan
2989	*/
2990	if ((first_loosescan_table != MAX_TABLES) && // (1)
2991	(first->table->emb_sj_nest->sj_inner_tables & remaining_tables) && //(2)
2992	new_join_tab->emb_sj_nest != first->table->emb_sj_nest) //(2)
2993	{
2994	first_loosescan_table= MAX_TABLES;
2995	}
2996
2997	/*
2998	If we got an option to use LooseScan for the current table, start
2999	considering using LooseScan strategy
3000	*/
3001	if (loose_scan_pos->read_time != DBL_MAX && !join->outer_join)
3002	{
3003	first_loosescan_table= idx;
3004	loosescan_need_tables=
3005	new_join_tab->emb_sj_nest->sj_inner_tables \|
3006	new_join_tab->emb_sj_nest->nested_join->sj_depends_on \|
3007	new_join_tab->emb_sj_nest->nested_join->sj_corr_tables;
3008	}
3009
3010	if ((first_loosescan_table != MAX_TABLES) &&
3011	!(remaining_tables & loosescan_need_tables) &&
3012	(new_join_tab->table->map & loosescan_need_tables))
3013	{
3014	/*
3015	Ok we have LooseScan plan and also have all LooseScan sj-nest's
3016	inner tables and outer correlated tables into the prefix.
3017	*/
3018
3019	first= join->positions + first_loosescan_table;
3020	uint n_tables= my_count_bits(first->table->emb_sj_nest->sj_inner_tables);
3021	/ Got a complete LooseScan range. Calculate its cost /
3022	/*
3023	The same problem as with FirstMatch - we need to save POSITIONs
3024	somewhere but reserving space for all cases would require too
3025	much space. We will re-calculate POSITION structures later on.
3026	*/
3027	bool disable_jbuf= (join->thd->variables.join_cache_level == `0`);
3028	optimize_wo_join_buffering(join, first_loosescan_table, idx,
3029	remaining_tables,
3030	TRUE, //first_alt
3031	disable_jbuf ? join->table_count :
3032	first_loosescan_table + n_tables,
3033	record_count,
3034	read_time);
3035	/*
3036	We don't yet have any other strategies that could handle this
3037	semi-join nest (the other options are Duplicate Elimination or
3038	Materialization, which need at least the same set of tables in
3039	the join prefix to be considered) so unconditionally pick the
3040	LooseScan.
3041	*/
3042	*strategy= SJ_OPT_LOOSE_SCAN;
3043	*handled_fanout= first->table->emb_sj_nest->sj_inner_tables;
3044	return TRUE;
3045	}
3046	return FALSE;
3047	}
3048
3049	void Firstmatch_picker::set_from_prev(struct st_position *prev)
3050	{
3051	if (prev->firstmatch_picker.is_used)
3052	invalidate_firstmatch_prefix();
3053	else
3054	{
3055	first_firstmatch_table= prev->firstmatch_picker.first_firstmatch_table;
3056	first_firstmatch_rtbl= prev->firstmatch_picker.first_firstmatch_rtbl;
3057	firstmatch_need_tables= prev->firstmatch_picker.firstmatch_need_tables;
3058	}
3059	is_used= FALSE;
3060	}
3061
3062	bool Firstmatch_picker::check_qep(JOIN *join,
3063	uint idx,
3064	table_map remaining_tables,
3065	const JOIN_TAB *new_join_tab,
3066	double *record_count,
3067	double *read_time,
3068	table_map *handled_fanout,
3069	sj_strategy_enum *strategy,
3070	POSITION *loose_scan_pos)
3071	{
3072	if (new_join_tab->emb_sj_nest &&
3073	optimizer_flag(join->thd, OPTIMIZER_SWITCH_FIRSTMATCH) &&
3074	!join->outer_join)
3075	{
3076	const table_map outer_corr_tables=
3077	new_join_tab->emb_sj_nest->nested_join->sj_corr_tables \|
3078	new_join_tab->emb_sj_nest->nested_join->sj_depends_on;
3079	const table_map sj_inner_tables=
3080	new_join_tab->emb_sj_nest->sj_inner_tables & ~join->const_table_map;
3081
3082	/*
3083	Enter condition:
3084	1. The next join tab belongs to semi-join nest
3085	(verified for the encompassing code block above).
3086	2. We're not in a duplicate producer range yet
3087	3. All outer tables that
3088	- the subquery is correlated with, or
3089	- referred to from the outer_expr
3090	are in the join prefix
3091	4. All inner tables are still part of remaining_tables.
3092	*/
3093	if (!join->cur_sj_inner_tables && // (2)
3094	!(remaining_tables & outer_corr_tables) && // (3)
3095	(sj_inner_tables == // (4)
3096	((remaining_tables \| new_join_tab->table->map) & sj_inner_tables)))
3097	{
3098	/ Start tracking potential FirstMatch range /
3099	first_firstmatch_table= idx;
3100	firstmatch_need_tables= sj_inner_tables;
3101	first_firstmatch_rtbl= remaining_tables;
3102	}
3103
3104	if (in_firstmatch_prefix())
3105	{
3106	if (outer_corr_tables & first_firstmatch_rtbl)
3107	{
3108	/*
3109	Trying to add an sj-inner table whose sj-nest has an outer correlated
3110	table that was not in the prefix. This means FirstMatch can't be used.
3111	*/
3112	invalidate_firstmatch_prefix();
3113	}
3114	else
3115	{
3116	/ Record that we need all of this semi-join's inner tables, too /
3117	firstmatch_need_tables\|= sj_inner_tables;
3118	}
3119
3120	if (in_firstmatch_prefix() &&
3121	!(firstmatch_need_tables & remaining_tables))
3122	{
3123	/*
3124	Got a complete FirstMatch range. Calculate correct costs and fanout
3125	*/
3126
3127	if (idx == first_firstmatch_table &&
3128	optimizer_flag(join->thd, OPTIMIZER_SWITCH_SEMIJOIN_WITH_CACHE))
3129	{
3130	/*
3131	An important special case: only one inner table, and @@optimizer_switch
3132	allows join buffering.
3133	- read_time is the same (i.e. FirstMatch doesn't add any cost
3134	- remove fanout added by the last table
3135	*/
3136	if (*record_count)
3137	*record_count /= join->positions[idx].records_read;
3138	}
3139	else
3140	{
3141	optimize_wo_join_buffering(join, first_firstmatch_table, idx,
3142	remaining_tables, FALSE, idx,
3143	record_count,
3144	read_time);
3145	}
3146	/*
3147	We ought to save the alternate POSITIONs produced by
3148	optimize_wo_join_buffering but the problem is that providing save
3149	space uses too much space. Instead, we will re-calculate the
3150	alternate POSITIONs after we've picked the best QEP.
3151	*/
3152	*handled_fanout= firstmatch_need_tables;
3153	/ record_count and read_time were set by the above call /
3154	*strategy= SJ_OPT_FIRST_MATCH;
3155	return TRUE;
3156	}
3157	}
3158	}
3159	else
3160	invalidate_firstmatch_prefix();
3161	return FALSE;
3162	}
3163
3164
3165	void Duplicate_weedout_picker::set_from_prev(POSITION *prev)
3166	{
3167	if (prev->dups_weedout_picker.is_used)
3168	set_empty();
3169	else
3170	{
3171	dupsweedout_tables= prev->dups_weedout_picker.dupsweedout_tables;
3172	first_dupsweedout_table= prev->dups_weedout_picker.first_dupsweedout_table;
3173	}
3174	is_used= FALSE;
3175	}
3176
3177
3178	bool Duplicate_weedout_picker::check_qep(JOIN *join,
3179	uint idx,
3180	table_map remaining_tables,
3181	const JOIN_TAB *new_join_tab,
3182	double *record_count,
3183	double *read_time,
3184	table_map *handled_fanout,
3185	sj_strategy_enum *strategy,
3186	POSITION *loose_scan_pos
3187	)
3188	{
3189	TABLE_LIST *nest;
3190	if ((nest= new_join_tab->emb_sj_nest))
3191	{
3192	if (!dupsweedout_tables)
3193	first_dupsweedout_table= idx;
3194
3195	dupsweedout_tables \|= nest->sj_inner_tables \|
3196	nest->nested_join->sj_depends_on \|
3197	nest->nested_join->sj_corr_tables;
3198	}
3199
3200	if (dupsweedout_tables)
3201	{
3202	/ we're in the process of constructing a DuplicateWeedout range /
3203	TABLE_LIST *emb= new_join_tab->table->pos_in_table_list->embedding;
3204	/ and we've entered an inner side of an outer join/
3205	if (emb && emb->on_expr)
3206	dupsweedout_tables \|= emb->nested_join->used_tables;
3207	}
3208
3209	/ If this is the last table that we need for DuplicateWeedout range /
3210	if (dupsweedout_tables && !(remaining_tables & ~new_join_tab->table->map &
3211	dupsweedout_tables))
3212	{
3213	/*
3214	Ok, reached a state where we could put a dups weedout point.
3215	Walk back and calculate
3216	- the join cost (this is needed as the accumulated cost may assume
3217	some other duplicate elimination method)
3218	- extra fanout that will be removed by duplicate elimination
3219	- duplicate elimination cost
3220	There are two cases:
3221	1. We have other strategy/ies to remove all of the duplicates.
3222	2. We don't.
3223
3224	We need to calculate the cost in case #2 also because we need to make
3225	choice between this join order and others.
3226	*/
3227	uint first_tab= first_dupsweedout_table;
3228	double dups_cost;
3229	double prefix_rec_count;
3230	double sj_inner_fanout= `1.0`;
3231	double sj_outer_fanout= `1.0`;
3232	uint temptable_rec_size;
3233	if (first_tab == join->const_tables)
3234	{
3235	prefix_rec_count= `1.0`;
3236	temptable_rec_size= `0`;
3237	dups_cost= `0.0`;
3238	}
3239	else
3240	{
3241	dups_cost= join->positions[first_tab - `1`].prefix_cost.total_cost();
3242	prefix_rec_count= join->positions[first_tab - `1`].prefix_record_count;
3243	temptable_rec_size= `8`; / This is not true but we'll make it so /
3244	}
3245
3246	table_map dups_removed_fanout= `0`;
3247	double current_fanout= prefix_rec_count;
3248	for (uint j= first_dupsweedout_table; j <= idx; j++)
3249	{
3250	POSITION *p= join->positions + j;
3251	current_fanout *= p->records_read;
3252	dups_cost += p->read_time + current_fanout / TIME_FOR_COMPARE;
3253	if (p->table->emb_sj_nest)
3254	{
3255	sj_inner_fanout *= p->records_read;
3256	dups_removed_fanout \|= p->table->table->map;
3257	}
3258	else
3259	{
3260	sj_outer_fanout *= p->records_read;
3261	temptable_rec_size += p->table->table->file->ref_length;
3262	}
3263	}
3264
3265	/*
3266	Add the cost of temptable use. The table will have sj_outer_fanout
3267	records, and we will make
3268	- sj_outer_fanout table writes
3269	- sj_inner_fanoutsj_outer_fanout lookups.*
3270
3271	*/
3272	double one_lookup_cost= get_tmp_table_lookup_cost(join->thd,
3273	sj_outer_fanout,
3274	temptable_rec_size);
3275	double one_write_cost= get_tmp_table_write_cost(join->thd,
3276	sj_outer_fanout,
3277	temptable_rec_size);
3278
3279	double write_cost= join->positions[first_tab].prefix_record_count*
3280	sj_outer_fanout * one_write_cost;
3281	double full_lookup_cost= join->positions[first_tab].prefix_record_count*
3282	sj_outer_fanout* sj_inner_fanout *
3283	one_lookup_cost;
3284	dups_cost += write_cost + full_lookup_cost;
3285
3286	*read_time= dups_cost;
3287	record_count= prefix_rec_count sj_outer_fanout;
3288	*handled_fanout= dups_removed_fanout;
3289	*strategy= SJ_OPT_DUPS_WEEDOUT;
3290	return TRUE;
3291	}
3292	return FALSE;
3293	}
3294
3295
3296	/*
3297	Remove the last join tab from from join->cur_sj_inner_tables bitmap
3298	we assume remaining_tables doesnt contain @tab.
3299	*/
3300
3301	void restore_prev_sj_state(const table_map remaining_tables,
3302	const JOIN_TAB *tab, uint idx)
3303	{
3304	TABLE_LIST *emb_sj_nest;
3305
3306	if (tab->emb_sj_nest)
3307	{
3308	table_map subq_tables= tab->emb_sj_nest->sj_inner_tables;
3309	tab->join->sjm_lookup_tables &= ~subq_tables;
3310	}
3311
3312	if ((emb_sj_nest= tab->emb_sj_nest))
3313	{
3314	/ If we're removing the last SJ-inner table, remove the sj-nest /
3315	if ((remaining_tables & emb_sj_nest->sj_inner_tables) ==
3316	(emb_sj_nest->sj_inner_tables & ~tab->table->map))
3317	{
3318	tab->join->cur_sj_inner_tables &= ~emb_sj_nest->sj_inner_tables;
3319	}
3320	}
3321	}
3322
3323
3324	/*
3325	Given a semi-join nest, find out which of the IN-equalities are bound
3326
3327	SYNOPSIS
3328	get_bound_sj_equalities()
3329	sj_nest Semi-join nest
3330	remaining_tables Tables that are not yet bound
3331
3332	DESCRIPTION
3333	Given a semi-join nest, find out which of the IN-equalities have their
3334	left part expression bound (i.e. the said expression doesn't refer to
3335	any of remaining_tables and can be evaluated).
3336
3337	RETURN
3338	Bitmap of bound IN-equalities.
3339	*/
3340
3341	ulonglong get_bound_sj_equalities(TABLE_LIST *sj_nest,
3342	table_map remaining_tables)
3343	{
3344	List_iterator<Item_ptr> li(sj_nest->nested_join->sj_outer_expr_list);
3345	Item **item;
3346	uint i= `0`;
3347	ulonglong res= `0`;
3348	while ((item= li ++))
3349	{
3350	/*
3351	Q: should this take into account equality propagation and how?
3352	A: If e->outer_side is an Item_field, walk over the equality
3353	class and see if there is an element that is bound?
3354	(this is an optional feature)
3355	*/
3356	if (!(item[`0`]->used_tables() & remaining_tables))
3357	{
3358	res \|= `1ULL` << i;
3359	}
3360	i++;
3361	}
3362	return res;
3363	}
3364
3365
3366	/*
3367	Check if the last tables of the partial join order allow to use
3368	sj-materialization strategy for them
3369
3370	SYNOPSIS
3371	at_sjmat_pos()
3372	join
3373	remaining_tables
3374	tab the last table's join tab
3375	idx last table's index
3376	loose_scan OUT TRUE <=> use LooseScan
3377
3378	RETURN
3379	TRUE Yes, can apply sj-materialization
3380	FALSE No, some of the requirements are not met
3381	*/
3382
3383	static SJ_MATERIALIZATION_INFO *
3384	at_sjmat_pos(const JOIN join, table_map remaining_tables, const* JOIN_TAB *tab,
3385	uint idx, bool *loose_scan)
3386	{
3387	/*
3388	Check if
3389	1. We're in a semi-join nest that can be run with SJ-materialization
3390	2. All the tables correlated through the IN subquery are in the prefix
3391	*/
3392	TABLE_LIST *emb_sj_nest= tab->emb_sj_nest;
3393	table_map suffix= remaining_tables & ~tab->table->map;
3394	if (emb_sj_nest && emb_sj_nest->sj_mat_info &&
3395	!(suffix & emb_sj_nest->sj_inner_tables))
3396	{
3397	/*
3398	Walk back and check if all immediately preceding tables are from
3399	this semi-join.
3400	*/
3401	uint n_tables= my_count_bits(tab->emb_sj_nest->sj_inner_tables);
3402	for (uint i= `1`; i < n_tables ; i++)
3403	{
3404	if (join->positions[idx - i].table->emb_sj_nest != tab->emb_sj_nest)
3405	return NULL;
3406	}
3407	*loose_scan= MY_TEST(remaining_tables & ~tab->table->map &
3408	(emb_sj_nest->sj_inner_tables \|
3409	emb_sj_nest->nested_join->sj_depends_on));
3410	if (*loose_scan && !emb_sj_nest->sj_subq_pred->sjm_scan_allowed)
3411	return NULL;
3412	else
3413	return emb_sj_nest->sj_mat_info;
3414	}
3415	return NULL;
3416	}
3417
3418
3419	/*
3420	Re-calculate values of join->best_positions[start..end].prefix_record_count
3421	*/
3422
3423	static void recalculate_prefix_record_count(JOIN *join, uint start, uint end)
3424	{
3425	for (uint j= start; j < end ;j++)
3426	{
3427	double prefix_count;
3428	if (j == join->const_tables)
3429	prefix_count= `1.0`;
3430	else
3431	prefix_count= join->best_positions[j-`1`].prefix_record_count *
3432	join->best_positions[j-`1`].records_read;
3433
3434	join->best_positions[j].prefix_record_count= prefix_count;
3435	}
3436	}
3437
3438
3439	/*
3440	Fix semi-join strategies for the picked join order
3441
3442	SYNOPSIS
3443	fix_semijoin_strategies_for_picked_join_order()
3444	join The join with the picked join order
3445
3446	DESCRIPTION
3447	Fix semi-join strategies for the picked join order. This is a step that
3448	needs to be done right after we have fixed the join order. What we do
3449	here is switch join's semi-join strategy description from backward-based
3450	to forwards based.
3451
3452	When join optimization is in progress, we re-consider semi-join
3453	strategies after we've added another table. Here's an illustration.
3454	Suppose the join optimization is underway:
3455
3456	1) ot1 it1 it2
3457	sjX -- looking at (ot1, it1, it2) join prefix, we decide
3458	to use semi-join strategy sjX.
3459
3460	2) ot1 it1 it2 ot2
3461	sjX sjY -- Having added table ot2, we now may consider
3462	another semi-join strategy and decide to use a
3463	different strategy sjY. Note that the record
3464	of sjX has remained under it2. That is
3465	necessary because we need to be able to get
3466	back to (ot1, it1, it2) join prefix.
3467	what makes things even worse is that there are cases where the choice
3468	of sjY changes the way we should access it2.
3469
3470	3) [ot1 it1 it2 ot2 ot3]
3471	sjX sjY -- This means that after join optimization is
3472	finished, semi-join info should be read
3473	right-to-left (while nearly all plan refinement
3474	functions, EXPLAIN, etc proceed from left to
3475	right)
3476
3477	This function does the needed reversal, making it possible to read the
3478	join and semi-join order from left to right.
3479	*/
3480
3481	void fix_semijoin_strategies_for_picked_join_order(JOIN *join)
3482	{
3483	uint table_count=join->table_count;
3484	uint tablenr;
3485	table_map remaining_tables= `0`;
3486	table_map handled_tabs= `0`;
3487	join->sjm_lookup_tables= `0`;
3488	join->sjm_scan_tables= `0`;
3489	for (tablenr= table_count - `1` ; tablenr != join->const_tables - `1`; tablenr--)
3490	{
3491	POSITION *pos= join->best_positions + tablenr;
3492	JOIN_TAB *s= pos->table;
3493	uint UNINIT_VAR(first); // Set by every branch except SJ_OPT_NONE which doesn't use it
3494
3495	if ((handled_tabs & s->table->map) \|\| pos->sj_strategy == SJ_OPT_NONE)
3496	{
3497	remaining_tables \|= s->table->map;
3498	continue;
3499	}
3500
3501	if (pos->sj_strategy == SJ_OPT_MATERIALIZE)
3502	{
3503	SJ_MATERIALIZATION_INFO *sjm= s->emb_sj_nest->sj_mat_info;
3504	sjm->is_used= TRUE;
3505	sjm->is_sj_scan= FALSE;
3506	memcpy((uchar) (pos - sjm->tables + `1`), (uchar) sjm->positions,
3507	sizeof(POSITION) * sjm->tables);
3508	recalculate_prefix_record_count(join, tablenr - sjm->tables + `1`,
3509	tablenr);
3510	first= tablenr - sjm->tables + `1`;
3511	join->best_positions[first].n_sj_tables= sjm->tables;
3512	join->best_positions[first].sj_strategy= SJ_OPT_MATERIALIZE;
3513	join->sjm_lookup_tables\|= s->table->map;
3514	}
3515	else if (pos->sj_strategy == SJ_OPT_MATERIALIZE_SCAN)
3516	{
3517	POSITION *first_inner= join->best_positions + pos->sjmat_picker.sjm_scan_last_inner;
3518	SJ_MATERIALIZATION_INFO *sjm= first_inner->table->emb_sj_nest->sj_mat_info;
3519	sjm->is_used= TRUE;
3520	sjm->is_sj_scan= TRUE;
3521	first= pos->sjmat_picker.sjm_scan_last_inner - sjm->tables + `1`;
3522	memcpy((uchar*) (join->best_positions + first),
3523	(uchar) sjm->positions, sizeof(POSITION) sjm->tables);
3524	recalculate_prefix_record_count(join, first, first + sjm->tables);
3525	join->best_positions[first].sj_strategy= SJ_OPT_MATERIALIZE_SCAN;
3526	join->best_positions[first].n_sj_tables= sjm->tables;
3527	/*
3528	Do what advance_sj_state did: re-run best_access_path for every table
3529	in the [last_inner_table + 1; pos..) range
3530	*/
3531	double prefix_rec_count;
3532	/ Get the prefix record count /
3533	if (first == join->const_tables)
3534	prefix_rec_count= `1.0`;
3535	else
3536	prefix_rec_count= join->best_positions[first-`1`].prefix_record_count;
3537
3538	/ Add materialization record count/
3539	prefix_rec_count *= sjm->rows;
3540
3541	uint i;
3542	table_map rem_tables= remaining_tables;
3543	for (i= tablenr; i != (first + sjm->tables - `1`); i--)
3544	rem_tables \|= join->best_positions[i].table->table->map;
3545
3546	for (i= first; i < first+ sjm->tables; i++)
3547	join->sjm_scan_tables \|= join->best_positions[i].table->table->map;
3548
3549	POSITION dummy;
3550	join->cur_sj_inner_tables= `0`;
3551	for (i= first + sjm->tables; i <= tablenr; i++)
3552	{
3553	best_access_path(join, join->best_positions[i].table, rem_tables, i,
3554	FALSE, prefix_rec_count,
3555	join->best_positions + i, &dummy);
3556	prefix_rec_count *= join->best_positions[i].records_read;
3557	rem_tables &= ~join->best_positions[i].table->table->map;
3558	}
3559	}
3560
3561	if (pos->sj_strategy == SJ_OPT_FIRST_MATCH)
3562	{
3563	first= pos->firstmatch_picker.first_firstmatch_table;
3564	join->best_positions[first].sj_strategy= SJ_OPT_FIRST_MATCH;
3565	join->best_positions[first].n_sj_tables= tablenr - first + `1`;
3566	POSITION dummy; // For loose scan paths
3567	double record_count= (first== join->const_tables)? `1.0`:
3568	join->best_positions[tablenr - `1`].prefix_record_count;
3569
3570	table_map rem_tables= remaining_tables;
3571	uint idx;
3572	for (idx= first; idx <= tablenr; idx++)
3573	{
3574	rem_tables \|= join->best_positions[idx].table->table->map;
3575	}
3576	/*
3577	Re-run best_access_path to produce best access methods that do not use
3578	join buffering
3579	*/
3580	join->cur_sj_inner_tables= `0`;
3581	for (idx= first; idx <= tablenr; idx++)
3582	{
3583	if (join->best_positions[idx].use_join_buffer)
3584	{
3585	best_access_path(join, join->best_positions[idx].table,
3586	rem_tables, idx, TRUE / no jbuf /,
3587	record_count, join->best_positions + idx, &dummy);
3588	}
3589	record_count *= join->best_positions[idx].records_read;
3590	rem_tables &= ~join->best_positions[idx].table->table->map;
3591	}
3592	}
3593
3594	if (pos->sj_strategy == SJ_OPT_LOOSE_SCAN)
3595	{
3596	first= pos->loosescan_picker.first_loosescan_table;
3597	POSITION *first_pos= join->best_positions + first;
3598	POSITION loose_scan_pos; // For loose scan paths
3599	double record_count= (first== join->const_tables)? `1.0`:
3600	join->best_positions[tablenr - `1`].prefix_record_count;
3601
3602	table_map rem_tables= remaining_tables;
3603	uint idx;
3604	for (idx= first; idx <= tablenr; idx++)
3605	rem_tables \|= join->best_positions[idx].table->table->map;
3606	/*
3607	Re-run best_access_path to produce best access methods that do not use
3608	join buffering
3609	*/
3610	join->cur_sj_inner_tables= `0`;
3611	for (idx= first; idx <= tablenr; idx++)
3612	{
3613	if (join->best_positions[idx].use_join_buffer \|\| (idx == first))
3614	{
3615	best_access_path(join, join->best_positions[idx].table,
3616	rem_tables, idx, TRUE / no jbuf /,
3617	record_count, join->best_positions + idx,
3618	&loose_scan_pos);
3619	if (idx==first)
3620	{
3621	join->best_positions[idx]= loose_scan_pos;
3622	/*
3623	If LooseScan is based on ref access (including the "degenerate"
3624	one with 0 key parts), we should use full index scan.
3625
3626	Unfortunately, lots of code assumes that if tab->type==JT_ALL &&
3627	tab->quick!=NULL, then quick select should be used. The only
3628	simple way to fix this is to remove the quick select:
3629	*/
3630	if (join->best_positions[idx].key)
3631	{
3632	delete join->best_positions[idx].table->quick;
3633	join->best_positions[idx].table->quick= NULL;
3634	}
3635	}
3636	}
3637	rem_tables &= ~join->best_positions[idx].table->table->map;
3638	record_count *= join->best_positions[idx].records_read;
3639	}
3640	first_pos->sj_strategy= SJ_OPT_LOOSE_SCAN;
3641	first_pos->n_sj_tables= my_count_bits(first_pos->table->emb_sj_nest->sj_inner_tables);
3642	}
3643
3644	if (pos->sj_strategy == SJ_OPT_DUPS_WEEDOUT)
3645	{
3646	/*
3647	Duplicate Weedout starting at pos->first_dupsweedout_table, ending at
3648	this table.
3649	*/
3650	first= pos->dups_weedout_picker.first_dupsweedout_table;
3651	join->best_positions[first].sj_strategy= SJ_OPT_DUPS_WEEDOUT;
3652	join->best_positions[first].n_sj_tables= tablenr - first + `1`;
3653	}
3654
3655	uint i_end= first + join->best_positions[first].n_sj_tables;
3656	for (uint i= first; i < i_end; i++)
3657	{
3658	if (i != first)
3659	join->best_positions[i].sj_strategy= SJ_OPT_NONE;
3660	handled_tabs \|= join->best_positions[i].table->table->map;
3661	}
3662
3663	if (tablenr != first)
3664	pos->sj_strategy= SJ_OPT_NONE;
3665	remaining_tables \|= s->table->map;
3666	join->join_tab[first].sj_strategy= join->best_positions[first].sj_strategy;
3667	join->join_tab[first].n_sj_tables= join->best_positions[first].n_sj_tables;
3668	}
3669	}
3670
3671
3672	/*
3673	Setup semi-join materialization strategy for one semi-join nest
3674
3675	SYNOPSIS
3676
3677	setup_sj_materialization()
3678	tab The first tab in the semi-join
3679
3680	DESCRIPTION
3681	Setup execution structures for one semi-join materialization nest:
3682	- Create the materialization temporary table
3683	- If we're going to do index lookups
3684	create TABLE_REF structure to make the lookus
3685	- else (if we're going to do a full scan of the temptable)
3686	create Copy_field structures to do copying.
3687
3688	RETURN
3689	FALSE Ok
3690	TRUE Error
3691	*/
3692
3693	bool setup_sj_materialization_part1(JOIN_TAB *sjm_tab)
3694	{
3695	JOIN_TAB *tab= sjm_tab->bush_children->start;
3696	TABLE_LIST *emb_sj_nest= tab->table->pos_in_table_list->embedding;
3697	SJ_MATERIALIZATION_INFO *sjm;
3698	THD *thd;
3699
3700	DBUG_ENTER("setup_sj_materialization");
3701
3702	/ Walk out of outer join nests until we reach the semi-join nest we're in /
3703	while (!emb_sj_nest->sj_mat_info)
3704	emb_sj_nest= emb_sj_nest->embedding;
3705
3706	sjm= emb_sj_nest->sj_mat_info;
3707	thd= tab->join->thd;
3708	/ First the calls come to the materialization function /
3709
3710	DBUG_ASSERT(sjm->is_used);
3711	/*
3712	Set up the table to write to, do as select_union::create_result_table does
3713	*/
3714	sjm->sjm_table_param.init();
3715	sjm->sjm_table_param.bit_fields_as_long= TRUE;
3716	SELECT_LEX *subq_select= emb_sj_nest->sj_subq_pred->unit->first_select();
3717	const LEX_CSTRING sj_materialize_name= { STRING_WITH_LEN("sj-materialize") };
3718	List_iterator<Item> it(subq_select->item_list);
3719	Item *item;
3720	while((item= it ++))
3721	{
3722	/*
3723	This semi-join replaced the subquery (subq_select) and so on
3724	re-executing it will not be prepared. To use the Items from its
3725	select list we have to prepare (fix_fields) them
3726	*/
3727	if (!item->fixed && item->fix_fields(thd, it.ref()))
3728	DBUG_RETURN(TRUE);
3729	item= (it.ref()); // it can be changed by fix_fields*
3730	DBUG_ASSERT(!item->name.length \|\| item->name.length == strlen(item->name.str));
3731	sjm->sjm_table_cols.push_back(item, thd->mem_root);
3732	}
3733
3734	sjm->sjm_table_param.field_count= subq_select->item_list.elements;
3735	sjm->sjm_table_param.force_not_null_cols= TRUE;
3736
3737	if (!(sjm->table= create_tmp_table(thd, &sjm->sjm_table_param,
3738	sjm->sjm_table_cols, (ORDER*) `0`,
3739	TRUE / distinct /,
3740	`1`, /save_sum_fields/
3741	thd->variables.option_bits \| TMP_TABLE_ALL_COLUMNS,
3742	HA_POS_ERROR /rows_limit /,
3743	&sj_materialize_name)))
3744	DBUG_RETURN(TRUE); / purecov: inspected /
3745	sjm->table->map= emb_sj_nest->nested_join->used_tables;
3746	sjm->table->file->extra(HA_EXTRA_WRITE_CACHE);
3747	sjm->table->file->extra(HA_EXTRA_IGNORE_DUP_KEY);
3748
3749	tab->join->sj_tmp_tables.push_back(sjm->table, thd->mem_root);
3750	tab->join->sjm_info_list.push_back(sjm, thd->mem_root);
3751
3752	sjm->materialized= FALSE;
3753	sjm_tab->table= sjm->table;
3754	sjm->table->pos_in_table_list= emb_sj_nest;
3755
3756	DBUG_RETURN(FALSE);
3757	}
3758
3759	/**
3760	@retval
3761	FALSE ok
3762	TRUE error
3763	*/
3764
3765	bool setup_sj_materialization_part2(JOIN_TAB *sjm_tab)
3766	{
3767	DBUG_ENTER("setup_sj_materialization_part2");
3768	JOIN_TAB *tab= sjm_tab->bush_children->start;
3769	TABLE_LIST *emb_sj_nest= tab->table->pos_in_table_list->embedding;
3770	/ Walk out of outer join nests until we reach the semi-join nest we're in /
3771	while (!emb_sj_nest->sj_mat_info)
3772	emb_sj_nest= emb_sj_nest->embedding;
3773	SJ_MATERIALIZATION_INFO *sjm= emb_sj_nest->sj_mat_info;
3774	THD *thd= tab->join->thd;
3775	uint i;
3776
3777	if (!sjm->is_sj_scan)
3778	{
3779	KEY tmp_key; /* The only index on the temporary table. /
3780	uint tmp_key_parts; / Number of keyparts in tmp_key. /
3781	tmp_key= sjm->table->key_info;
3782	tmp_key_parts= tmp_key->user_defined_key_parts;
3783
3784	/*
3785	Create/initialize everything we will need to index lookups into the
3786	temptable.
3787	*/
3788	TABLE_REF *tab_ref;
3789	tab_ref= &sjm_tab->ref;
3790	tab_ref->key= `0`; / The only temp table index. /
3791	tab_ref->key_length= tmp_key->key_length;
3792	if (!(tab_ref->key_buff=
3793	(uchar) thd->calloc(ALIGN_SIZE(tmp_key->key_length) `2`)) \|\|
3794	!(tab_ref->key_copy=
3795	(store_key) thd->alloc((sizeof*(store_key) *
3796	(tmp_key_parts + `1`)))) \|\|
3797	!(tab_ref->items=
3798	(Item) thd->alloc(sizeof*(Item) * tmp_key_parts)))
3799	DBUG_RETURN(TRUE); / purecov: inspected /
3800
3801	tab_ref->key_buff2=tab_ref->key_buff+ALIGN_SIZE(tmp_key->key_length);
3802	tab_ref->key_err=`1`;
3803	tab_ref->null_rejecting= `1`;
3804	tab_ref->disable_cache= FALSE;
3805
3806	KEY_PART_INFO *cur_key_part= tmp_key->key_part;
3807	store_key **ref_key= tab_ref->key_copy;
3808	uchar *cur_ref_buff= tab_ref->key_buff;
3809
3810	for (i= `0`; i < tmp_key_parts; i++, cur_key_part++, ref_key++)
3811	{
3812	tab_ref->items[i]= emb_sj_nest->sj_subq_pred->left_expr->element_index(i);
3813	int null_count= MY_TEST(cur_key_part->field->real_maybe_null());
3814	ref_key= new* store_key_item (thd, cur_key_part->field,
3815	/ TODO:*
3816	the NULL byte is taken into account in
3817	cur_key_part->store_length, so instead of
3818	cur_ref_buff + MY_TEST(maybe_null), we could
3819	use that information instead.
3820	*/
3821	cur_ref_buff + null_count,
3822	null_count ? cur_ref_buff : `0`,
3823	cur_key_part->length, tab_ref->items[i],
3824	FALSE);
3825	if (!*ref_key)
3826	DBUG_RETURN(TRUE);
3827	cur_ref_buff+= cur_key_part->store_length;
3828	}
3829	ref_key= NULL; /* End marker. /
3830
3831	/*
3832	We don't ever have guarded conditions for SJM tables, but code at SQL
3833	layer depends on cond_guards array being alloced.
3834	*/
3835	if (!(tab_ref->cond_guards= (bool) thd->calloc(sizeof*(uint)*tmp_key_parts)))
3836	{
3837	DBUG_RETURN(TRUE);
3838	}
3839
3840	tab_ref->key_err= `1`;
3841	tab_ref->key_parts= tmp_key_parts;
3842	sjm->tab_ref= tab_ref;
3843
3844	/*
3845	Remove the injected semi-join IN-equalities from join_tab conds. This
3846	needs to be done because the IN-equalities refer to columns of
3847	sj-inner tables which are not available after the materialization
3848	has been finished.
3849	*/
3850	for (i= `0`; i < sjm->tables; i++)
3851	{
3852	if (remove_sj_conds(thd, &tab[i].select_cond) \|\|
3853	(tab[i].select && remove_sj_conds(thd, &tab[i].select->cond)))
3854	DBUG_RETURN(TRUE);
3855	}
3856	if (!(sjm->in_equality= create_subq_in_equalities(thd, sjm,
3857	emb_sj_nest->sj_subq_pred)))
3858	DBUG_RETURN(TRUE); / purecov: inspected /
3859	sjm_tab->type= JT_EQ_REF;
3860	sjm_tab->select_cond= sjm->in_equality;
3861	}
3862	else
3863	{
3864	/*
3865	We'll be doing full scan of the temptable.
3866	Setup copying of temptable columns back to the record buffers
3867	for their source tables. We need this because IN-equalities
3868	refer to the original tables.
3869
3870	EXAMPLE
3871
3872	Consider the query:
3873	SELECT FROM ot WHERE ot.col1 IN (SELECT it.col2 FROM it)*
3874
3875	Suppose it's executed with SJ-Materialization-scan. We choose to do scan
3876	if we can't do the lookup, i.e. the join order is (it, ot). The plan
3877	would look as follows:
3878
3879	table access method condition
3880	it materialize+scan -
3881	ot (whatever) ot1.col1=it.col2 (C2)
3882
3883	The condition C2 refers to current row of table it. The problem is
3884	that by the time we evaluate C2, we would have finished with scanning
3885	it itself and will be scanning the temptable.
3886
3887	At the moment, our solution is to copy back: when we get the next
3888	temptable record, we copy its columns to their corresponding columns
3889	in the record buffers for the source tables.
3890	*/
3891	if (!(sjm->copy_field= new Copy_field[sjm->sjm_table_cols.elements]))
3892	DBUG_RETURN(TRUE);
3893
3894	//it.rewind();
3895	Ref_ptr_array p_items= emb_sj_nest->sj_subq_pred->unit->first_select()->ref_pointer_array;
3896	for (uint i=`0`; i < sjm->sjm_table_cols.elements; i++)
3897	{
3898	bool dummy;
3899	Item_equal *item_eq;
3900	//Item item= (it++)->real_item();*
3901	Item *item= p_items [i]->real_item();
3902	DBUG_ASSERT(item->type() == Item::FIELD_ITEM);
3903	Field copy_to= ((Item_field)item)->field;
3904	/*
3905	Tricks with Item_equal are due to the following: suppose we have a
3906	query:
3907
3908	... WHERE cond(ot.col) AND ot.col IN (SELECT it2.col FROM it1,it2
3909	WHERE it1.col= it2.col)
3910	then equality propagation will create an
3911
3912	Item_equal(it1.col, it2.col, ot.col)
3913
3914	then substitute_for_best_equal_field() will change the conditions
3915	according to the join order:
3916
3917	table \| attached condition
3918	------+--------------------
3919	it1 \|
3920	it2 \| it1.col=it2.col
3921	ot \| cond(it1.col)
3922
3923	although we've originally had "SELECT it2.col", conditions attached
3924	to subsequent outer tables will refer to it1.col, so SJM-Scan will
3925	need to unpack data to there.
3926	That is, if an element from subquery's select list participates in
3927	equality propagation, then we need to unpack it to the first
3928	element equality propagation member that refers to table that is
3929	within the subquery.
3930	*/
3931	item_eq= find_item_equal(tab->join->cond_equal, copy_to, &dummy);
3932
3933	if (item_eq)
3934	{
3935	List_iterator<Item> it(item_eq->equal_items);
3936	/ We're interested in field items only /
3937	if (item_eq->get_const())
3938	it ++;
3939	Item *item;
3940	while ((item= it ++))
3941	{
3942	if (!(item->used_tables() & ~emb_sj_nest->sj_inner_tables))
3943	{
3944	DBUG_ASSERT(item->real_item()->type() == Item::FIELD_ITEM);
3945	copy_to= ((Item_field *) (item->real_item()))->field;
3946	break;
3947	}
3948	}
3949	}
3950	sjm->copy_field[i].set(copy_to, sjm->table->field[i], FALSE);
3951	/ The write_set for source tables must be set up to allow the copying /
3952	bitmap_set_bit(copy_to->table->write_set, copy_to->field_index);
3953	}
3954	sjm_tab->type= JT_ALL;
3955
3956	/ Initialize full scan /
3957	sjm_tab->read_first_record= join_read_record_no_init;
3958	sjm_tab->read_record.copy_field= sjm->copy_field;
3959	sjm_tab->read_record.copy_field_end= sjm->copy_field +
3960	sjm->sjm_table_cols.elements;
3961	sjm_tab->read_record.read_record_func= rr_sequential_and_unpack;
3962	}
3963
3964	sjm_tab->bush_children->end[-`1`].next_select= end_sj_materialize;
3965
3966	DBUG_RETURN(FALSE);
3967	}
3968
3969
3970
3971	/*
3972	Create subquery IN-equalities assuming use of materialization strategy
3973
3974	SYNOPSIS
3975	create_subq_in_equalities()
3976	thd Thread handle
3977	sjm Semi-join materialization structure
3978	subq_pred The subquery predicate
3979
3980	DESCRIPTION
3981	Create subquery IN-equality predicates. That is, for a subquery
3982
3983	(oe1, oe2, ...) IN (SELECT ie1, ie2, ... FROM ...)
3984
3985	create "oe1=ie1 AND ie1=ie2 AND ..." expression, such that ie1, ie2, ..
3986	refer to the columns of the table that's used to materialize the
3987	subquery.
3988
3989	RETURN
3990	Created condition
3991	*/
3992
3993	static Item create_subq_in_equalities(THD thd, SJ_MATERIALIZATION_INFO *sjm,
3994	Item_in_subselect *subq_pred)
3995	{
3996	Item *res= NULL;
3997	if (subq_pred->left_expr->cols() == `1`)
3998	{
3999	if (!(res= new (thd->mem_root) Item_func_eq (thd, subq_pred->left_expr,
4000	new (thd->mem_root) Item_field (thd, sjm->table->field[`0`]))))
4001	return NULL; / purecov: inspected /
4002	}
4003	else
4004	{
4005	Item *conj;
4006	for (uint i= `0`; i < subq_pred->left_expr->cols(); i++)
4007	{
4008	if (!(conj= new (thd->mem_root) Item_func_eq (thd, subq_pred->left_expr->element_index(i),
4009	new (thd->mem_root) Item_field (thd, sjm->table->field[i]))) \|\|
4010	!(res= and_items(thd, res, conj)))
4011	return NULL; / purecov: inspected /
4012	}
4013	}
4014	if (res->fix_fields(thd, &res))
4015	return NULL; / purecov: inspected /
4016	return res;
4017	}
4018
4019
4020	/**
4021	@retval
4022	0 ok
4023	1 error
4024	*/
4025
4026	static bool remove_sj_conds(THD thd, Item *tree)
4027	{
4028	if (*tree)
4029	{
4030	if (is_cond_sj_in_equality(*tree))
4031	{
4032	*tree= NULL;
4033	return `0`;
4034	}
4035	else if ((*tree)->type() == Item::COND_ITEM)
4036	{
4037	Item *item;
4038	List_iterator<Item> li((((Item_cond)*tree)->argument_list()));
4039	while ((item= li ++))
4040	{
4041	if (is_cond_sj_in_equality(item))
4042	{
4043	Item_int tmp= new* (thd->mem_root) Item_int (thd, `1`);
4044	if (!tmp)
4045	return `1`;
4046	li.replace(tmp);
4047	}
4048	}
4049	}
4050	}
4051	return `0`;
4052	}
4053
4054
4055	/ Check if given Item was injected by semi-join equality /
4056	static bool is_cond_sj_in_equality(Item *item)
4057	{
4058	if (item->type() == Item::FUNC_ITEM &&
4059	((Item_func*)item)->functype()== Item_func::EQ_FUNC)
4060	{
4061	Item_func_eq item_eq= (Item_func_eq)item;
4062	return MY_TEST(item_eq->in_equality_no != UINT_MAX);
4063	}
4064	return FALSE;
4065	}
4066
4067
4068	/*
4069	Create a temporary table to weed out duplicate rowid combinations
4070
4071	SYNOPSIS
4072
4073	create_sj_weedout_tmp_table()
4074	thd Thread handle
4075
4076	DESCRIPTION
4077	Create a temporary table to weed out duplicate rowid combinations. The
4078	table has a single column that is a concatenation of all rowids in the
4079	combination.
4080
4081	Depending on the needed length, there are two cases:
4082
4083	1. When the length of the column < max_key_length:
4084
4085	CREATE TABLE tmp (col VARBINARY(n) NOT NULL, UNIQUE KEY(col));
4086
4087	2. Otherwise (not a valid SQL syntax but internally supported):
4088
4089	CREATE TABLE tmp (col VARBINARY NOT NULL, UNIQUE CONSTRAINT(col));
4090
4091	The code in this function was produced by extraction of relevant parts
4092	from create_tmp_table().
4093
4094	RETURN
4095	created table
4096	NULL on error
4097	*/
4098
4099	bool
4100	SJ_TMP_TABLE::create_sj_weedout_tmp_table(THD *thd)
4101	{
4102	MEM_ROOT *mem_root_save, own_root;
4103	TABLE *table;
4104	TABLE_SHARE *share;
4105	uint temp_pool_slot=MY_BIT_NONE;
4106	char *tmpname,path[FN_REFLEN];
4107	Field **reg_field;
4108	KEY_PART_INFO *key_part_info;
4109	KEY *keyinfo;
4110	uchar *group_buff;
4111	uchar *bitmaps;
4112	uint *blob_field;
4113	bool using_unique_constraint=FALSE;
4114	bool use_packed_rows= FALSE;
4115	Field field, key_field;
4116	uint null_pack_length, null_count;
4117	uchar *null_flags;
4118	uchar *pos;
4119	DBUG_ENTER("create_sj_weedout_tmp_table");
4120	DBUG_ASSERT(!is_degenerate);
4121
4122	tmp_table= NULL;
4123	uint uniq_tuple_length_arg= rowid_len + null_bytes;
4124	/*
4125	STEP 1: Get temporary table name
4126	*/
4127	if (use_temp_pool && !(test_flags & TEST_KEEP_TMP_TABLES))
4128	temp_pool_slot = bitmap_lock_set_next(&temp_pool);
4129
4130	if (temp_pool_slot != MY_BIT_NONE) // we got a slot
4131	sprintf(path, "%s_%lx_%i", tmp_file_prefix,
4132	current_pid, temp_pool_slot);
4133	else
4134	{
4135	/ if we run out of slots or we are not using tempool /
4136	sprintf(path,"%s%lx_%lx_%x", tmp_file_prefix,current_pid,
4137	(ulong) thd->thread_id, thd->tmp_table++);
4138	}
4139	fn_format(path, path, mysql_tmpdir, "", MY_REPLACE_EXT\|MY_UNPACK_FILENAME);
4140
4141	/ STEP 2: Figure if we'll be using a key or blob+constraint /
4142	/ it always has my_charset_bin, so mbmaxlen==1 /
4143	if (uniq_tuple_length_arg >= CONVERT_IF_BIGGER_TO_BLOB)
4144	using_unique_constraint= TRUE;
4145
4146	/ STEP 3: Allocate memory for temptable description /
4147	init_sql_alloc(&own_root, "SJ_TMP_TABLE",
4148	TABLE_ALLOC_BLOCK_SIZE, `0`, MYF(MY_THREAD_SPECIFIC));
4149	if (!multi_alloc_root(&own_root,
4150	&table, sizeof(*table),
4151	&share, sizeof(*share),
4152	&reg_field, sizeof(Field) (`1`+`1`),
4153	&blob_field, sizeof(uint)*`2`,
4154	&keyinfo, sizeof(*keyinfo),
4155	&key_part_info, sizeof(key_part_info) `2`,
4156	&start_recinfo,
4157	sizeof(recinfo)(`1`*`2`+`4`),
4158	&tmpname, (uint) strlen(path)+`1`,
4159	&group_buff, (!using_unique_constraint ?
4160	uniq_tuple_length_arg : `0`),
4161	&bitmaps, bitmap_buffer_size(`1`)*`6`,
4162	NullS))
4163	{
4164	if (temp_pool_slot != MY_BIT_NONE)
4165	bitmap_lock_clear_bit(&temp_pool, temp_pool_slot);
4166	DBUG_RETURN(TRUE);
4167	}
4168	strmov(tmpname,path);
4169
4170
4171	/ STEP 4: Create TABLE description /
4172	bzero((char) table,sizeof(table));
4173	bzero((char) reg_field,sizeof(Field)*`2`);
4174
4175	table->mem_root = own_root;
4176	mem_root_save= thd->mem_root;
4177	thd->mem_root= &table->mem_root;
4178
4179	table->field=reg_field;
4180	table->alias.set("weedout-tmp", sizeof("weedout-tmp")-`1`,
4181	table_alias_charset);
4182	table->reginfo.lock_type=TL_WRITE; / Will be updated /
4183	table->db_stat=HA_OPEN_KEYFILE;
4184	table->map=`1`;
4185	table->temp_pool_slot = temp_pool_slot;
4186	table->copy_blobs= `1`;
4187	table->in_use= thd;
4188	table->quick_keys.init();
4189	table->covering_keys.init();
4190	table->keys_in_use_for_query.init();
4191
4192	table->s= share;
4193	init_tmp_table_share(thd, share, "", `0`, tmpname, tmpname);
4194	share->blob_field= blob_field;
4195	share->table_charset= NULL;
4196	share->primary_key= MAX_KEY; // Indicate no primary key
4197	share->keys_for_keyread.init();
4198	share->keys_in_use.init();
4199
4200	/ Create the field /
4201	{
4202	LEX_CSTRING field_name= {STRING_WITH_LEN("rowids") };
4203	/*
4204	For the sake of uniformity, always use Field_varstring (altough we could
4205	use Field_string for shorter keys)
4206	*/
4207	field= new Field_varstring (uniq_tuple_length_arg, FALSE, &field_name,
4208	share, &my_charset_bin);
4209	if (!field)
4210	DBUG_RETURN(`0`);
4211	field->table= table;
4212	field->key_start.init(`0`);
4213	field->part_of_key.init(`0`);
4214	field->part_of_sortkey.init(`0`);
4215	field->unireg_check= Field::NONE;
4216	field->flags= (NOT_NULL_FLAG \| BINARY_FLAG \| NO_DEFAULT_VALUE_FLAG);
4217	field->reset_fields();
4218	field->init(table);
4219	field->orig_table= NULL;
4220
4221	field->field_index= `0`;
4222
4223	*(reg_field++)= field;
4224	*blob_field= `0`;
4225	*reg_field= `0`;
4226
4227	share->fields= `1`;
4228	share->blob_fields= `0`;
4229	}
4230
4231	uint reclength= field->pack_length();
4232	if (using_unique_constraint)
4233	{
4234	share->db_plugin= ha_lock_engine(`0`, TMP_ENGINE_HTON);
4235	table->file= get_new_handler(share, &table->mem_root,
4236	share->db_type());
4237	}
4238	else
4239	{
4240	share->db_plugin= ha_lock_engine(`0`, heap_hton);
4241	table->file= get_new_handler(share, &table->mem_root,
4242	share->db_type());
4243	DBUG_ASSERT(!table->file \|\| uniq_tuple_length_arg <= table->file->max_key_length());
4244	}
4245	if (!table->file)
4246	goto err;
4247
4248	if (table->file->set_ha_share_ref(&share->ha_share))
4249	{
4250	delete table->file;
4251	goto err;
4252	}
4253
4254	null_count=`1`;
4255
4256	null_pack_length= `1`;
4257	reclength += null_pack_length;
4258
4259	share->reclength= reclength;
4260	{
4261	uint alloc_length=ALIGN_SIZE(share->reclength + MI_UNIQUE_HASH_LENGTH+`1`);
4262	share->rec_buff_length= alloc_length;
4263	if (!(table->record[`0`]= (uchar*)
4264	alloc_root(&table->mem_root, alloc_length*`3`)))
4265	goto err;
4266	table->record[`1`]= table->record[`0`]+alloc_length;
4267	share->default_values= table->record[`1`]+alloc_length;
4268	}
4269	setup_tmp_table_column_bitmaps(table, bitmaps);
4270
4271	recinfo= start_recinfo;
4272	null_flags=(uchar*) table->record[`0`];
4273	pos=table->record[`0`]+ null_pack_length;
4274	if (null_pack_length)
4275	{
4276	bzero((uchar) recinfo,sizeof(recinfo));
4277	recinfo->type=FIELD_NORMAL;
4278	recinfo->length=null_pack_length;
4279	recinfo++;
4280	bfill(null_flags,null_pack_length,`255`); // Set null fields
4281
4282	table->null_flags= (uchar*) table->record[`0`];
4283	share->null_fields= null_count;
4284	share->null_bytes= null_pack_length;
4285	}
4286	null_count=`1`;
4287
4288	{
4289	//Field field= reg_field;
4290	uint length;
4291	bzero((uchar) recinfo,sizeof(recinfo));
4292	field->move_field(pos,(uchar*) `0`,`0`);
4293
4294	field->reset();
4295	/*
4296	Test if there is a default field value. The test for ->ptr is to skip
4297	'offset' fields generated by initalize_tables
4298	*/
4299	// Initialize the table field:
4300	bzero(field->ptr, field->pack_length());
4301
4302	length=field->pack_length();
4303	pos+= length;
4304
4305	/ Make entry for create table /
4306	recinfo->length=length;
4307	if (field->flags & BLOB_FLAG)
4308	recinfo->type= FIELD_BLOB;
4309	else if (use_packed_rows &&
4310	field->real_type() == MYSQL_TYPE_STRING &&
4311	length >= MIN_STRING_LENGTH_TO_PACK_ROWS)
4312	recinfo->type=FIELD_SKIP_ENDSPACE;
4313	else
4314	recinfo->type=FIELD_NORMAL;
4315
4316	field->set_table_name(&table->alias);
4317	}
4318
4319	if (thd->variables.tmp_memory_table_size == ~ (ulonglong) `0`) // No limit
4320	share->max_rows= ~(ha_rows) `0`;
4321	else
4322	share->max_rows= (ha_rows) (((share->db_type() == heap_hton) ?
4323	MY_MIN(thd->variables.tmp_memory_table_size,
4324	thd->variables.max_heap_table_size) :
4325	thd->variables.tmp_memory_table_size) /
4326	share->reclength);
4327	set_if_bigger(share->max_rows,`1`); // For dummy start options
4328
4329
4330	//// keyinfo= param->keyinfo;
4331	if (TRUE)
4332	{
4333	DBUG_PRINT("info",("Creating group key in temporary table"));
4334	share->keys=`1`;
4335	share->uniques= MY_TEST(using_unique_constraint);
4336	table->key_info=keyinfo;
4337	keyinfo->key_part=key_part_info;
4338	keyinfo->flags=HA_NOSAME;
4339	keyinfo->usable_key_parts= keyinfo->user_defined_key_parts= `1`;
4340	keyinfo->key_length=`0`;
4341	keyinfo->rec_per_key=`0`;
4342	keyinfo->algorithm= HA_KEY_ALG_UNDEF;
4343	keyinfo->name = weedout_key;
4344	{
4345	key_part_info->null_bit=`0`;
4346	key_part_info->field= field;
4347	key_part_info->offset= field->offset(table->record[`0`]);
4348	key_part_info->length= (uint16) field->key_length();
4349	key_part_info->type= (uint8) field->key_type();
4350	key_part_info->key_type = FIELDFLAG_BINARY;
4351	if (!using_unique_constraint)
4352	{
4353	if (!(key_field= field->new_key_field(thd->mem_root, table,
4354	group_buff,
4355	key_part_info->length,
4356	field->null_ptr,
4357	field->null_bit)))
4358	goto err;
4359	}
4360	keyinfo->key_length+= key_part_info->length;
4361	}
4362	}
4363
4364	if (unlikely(thd->is_fatal_error)) // If end of memory
4365	goto err;
4366	share->db_record_offset= `1`;
4367	table->no_rows= `1`; // We don't need the data
4368
4369	// recinfo must point after last field
4370	recinfo++;
4371	if (share->db_type() == TMP_ENGINE_HTON)
4372	{
4373	if (unlikely(create_internal_tmp_table(table, keyinfo, start_recinfo,
4374	&recinfo, `0`)))
4375	goto err;
4376	}
4377	if (unlikely(open_tmp_table(table)))
4378	goto err;
4379
4380	thd->mem_root= mem_root_save;
4381	tmp_table= table;
4382	DBUG_RETURN(FALSE);
4383
4384	err:
4385	thd->mem_root= mem_root_save;
4386	free_tmp_table(thd,table); / purecov: inspected /
4387	if (temp_pool_slot != MY_BIT_NONE)
4388	bitmap_lock_clear_bit(&temp_pool, temp_pool_slot);
4389	DBUG_RETURN(TRUE); / purecov: inspected /
4390	}
4391
4392
4393	/*
4394	SemiJoinDuplicateElimination: Reset the temporary table
4395	*/
4396
4397	int SJ_TMP_TABLE::sj_weedout_delete_rows()
4398	{
4399	DBUG_ENTER("SJ_TMP_TABLE::sj_weedout_delete_rows");
4400	if (tmp_table)
4401	{
4402	int rc= tmp_table->file->ha_delete_all_rows();
4403	DBUG_RETURN(rc);
4404	}
4405	have_degenerate_row= FALSE;
4406	DBUG_RETURN(`0`);
4407	}
4408
4409
4410	/*
4411	SemiJoinDuplicateElimination: Weed out duplicate row combinations
4412
4413	SYNPOSIS
4414	sj_weedout_check_row()
4415	thd Thread handle
4416
4417	DESCRIPTION
4418	Try storing current record combination of outer tables (i.e. their
4419	rowids) in the temporary table. This records the fact that we've seen
4420	this record combination and also tells us if we've seen it before.
4421
4422	RETURN
4423	-1 Error
4424	1 The row combination is a duplicate (discard it)
4425	0 The row combination is not a duplicate (continue)
4426	*/
4427
4428	int SJ_TMP_TABLE::sj_weedout_check_row(THD *thd)
4429	{
4430	int error;
4431	SJ_TMP_TABLE::TAB *tab= tabs;
4432	SJ_TMP_TABLE::TAB *tab_end= tabs_end;
4433	uchar *ptr;
4434	uchar *nulls_ptr;
4435
4436	DBUG_ENTER("SJ_TMP_TABLE::sj_weedout_check_row");
4437
4438	if (is_degenerate)
4439	{
4440	if (have_degenerate_row)
4441	DBUG_RETURN(`1`);
4442
4443	have_degenerate_row= TRUE;
4444	DBUG_RETURN(`0`);
4445	}
4446
4447	ptr= tmp_table->record[`0`] + `1`;
4448
4449	/ Put the the rowids tuple into table->record[0]: /
4450
4451	// 1. Store the length
4452	if (((Field_varstring*)(tmp_table->field[`0`]))->length_bytes == `1`)
4453	{
4454	*ptr= (uchar)(rowid_len + null_bytes);
4455	ptr++;
4456	}
4457	else
4458	{
4459	int2store(ptr, rowid_len + null_bytes);
4460	ptr += `2`;
4461	}
4462
4463	nulls_ptr= ptr;
4464	// 2. Zero the null bytes
4465	if (null_bytes)
4466	{
4467	bzero(ptr, null_bytes);
4468	ptr += null_bytes;
4469	}
4470
4471	// 3. Put the rowids
4472	for (uint i=`0`; tab != tab_end; tab++, i++)
4473	{
4474	handler *h= tab->join_tab->table->file;
4475	if (tab->join_tab->table->maybe_null && tab->join_tab->table->null_row)
4476	{
4477	/ It's a NULL-complemented row /
4478	*(nulls_ptr + tab->null_byte) \|= tab->null_bit;
4479	bzero(ptr + tab->rowid_offset, h->ref_length);
4480	}
4481	else
4482	{
4483	/ Copy the rowid value /
4484	memcpy(ptr + tab->rowid_offset, h->ref, h->ref_length);
4485	}
4486	}
4487
4488	error= tmp_table->file->ha_write_tmp_row(tmp_table->record[`0`]);
4489	if (unlikely(error))
4490	{
4491	/ create_internal_tmp_table_from_heap will generate error if needed /
4492	if (!tmp_table->file->is_fatal_error(error, HA_CHECK_DUP))
4493	DBUG_RETURN(`1`); / Duplicate /
4494
4495	bool is_duplicate;
4496	if (create_internal_tmp_table_from_heap(thd, tmp_table, start_recinfo,
4497	&recinfo, error, `1`, &is_duplicate))
4498	DBUG_RETURN(-`1`);
4499	if (is_duplicate)
4500	DBUG_RETURN(`1`);
4501	}
4502	DBUG_RETURN(`0`);
4503	}
4504
4505
4506	int init_dups_weedout(JOIN join, uint first_table, int* first_fanout_table, uint n_tables)
4507	{
4508	THD *thd= join->thd;
4509	DBUG_ENTER("init_dups_weedout");
4510	SJ_TMP_TABLE::TAB sjtabs[MAX_TABLES];
4511	SJ_TMP_TABLE::TAB *last_tab= sjtabs;
4512	uint jt_rowid_offset= `0`; // # tuple bytes are already occupied (w/o NULL bytes)
4513	uint jt_null_bits= `0`; // # null bits in tuple bytes
4514	/*
4515	Walk through the range and remember
4516	- tables that need their rowids to be put into temptable
4517	- the last outer table
4518	*/
4519	for (JOIN_TAB *j=join->join_tab + first_table;
4520	j < join->join_tab + first_table + n_tables; j++)
4521	{
4522	if (sj_table_is_included(join, j))
4523	{
4524	last_tab->join_tab= j;
4525	last_tab->rowid_offset= jt_rowid_offset;
4526	jt_rowid_offset += j->table->file->ref_length;
4527	if (j->table->maybe_null)
4528	{
4529	last_tab->null_byte= jt_null_bits / `8`;
4530	last_tab->null_bit= jt_null_bits++;
4531	}
4532	last_tab++;
4533	j->table->prepare_for_position();
4534	j->keep_current_rowid= TRUE;
4535	}
4536	}
4537
4538	SJ_TMP_TABLE *sjtbl;
4539	if (jt_rowid_offset) / Temptable has at least one rowid /
4540	{
4541	size_t tabs_size= (last_tab - sjtabs) * sizeof(SJ_TMP_TABLE::TAB);
4542	if (!(sjtbl= (SJ_TMP_TABLE)thd->alloc(sizeof*(SJ_TMP_TABLE))) \|\|
4543	!(sjtbl->tabs= (SJ_TMP_TABLE::TAB*) thd->alloc(tabs_size)))
4544	DBUG_RETURN(TRUE); / purecov: inspected /
4545	memcpy(sjtbl->tabs, sjtabs, tabs_size);
4546	sjtbl->is_degenerate= FALSE;
4547	sjtbl->tabs_end= sjtbl->tabs + (last_tab - sjtabs);
4548	sjtbl->rowid_len= jt_rowid_offset;
4549	sjtbl->null_bits= jt_null_bits;
4550	sjtbl->null_bytes= (jt_null_bits + `7`)/`8`;
4551	if (sjtbl->create_sj_weedout_tmp_table(thd))
4552	DBUG_RETURN(TRUE);
4553	join->sj_tmp_tables.push_back(sjtbl->tmp_table, thd->mem_root);
4554	}
4555	else
4556	{
4557	/*
4558	This is a special case where the entire subquery predicate does
4559	not depend on anything at all, ie this is
4560	WHERE const IN (uncorrelated select)
4561	*/
4562	if (!(sjtbl= (SJ_TMP_TABLE)thd->alloc(sizeof*(SJ_TMP_TABLE))))
4563	DBUG_RETURN(TRUE); / purecov: inspected /
4564	sjtbl->tmp_table= NULL;
4565	sjtbl->is_degenerate= TRUE;
4566	sjtbl->have_degenerate_row= FALSE;
4567	}
4568
4569	sjtbl->next_flush_table= join->join_tab[first_table].flush_weedout_table;
4570	join->join_tab[first_table].flush_weedout_table= sjtbl;
4571	join->join_tab[first_fanout_table].first_weedout_table= sjtbl;
4572	join->join_tab[first_table + n_tables - `1`].check_weed_out_table= sjtbl;
4573	DBUG_RETURN(`0`);
4574	}
4575
4576
4577	/*
4578	@brief
4579	Set up semi-join Loose Scan strategy for execution
4580
4581	@detail
4582	Other strategies are done in setup_semijoin_dups_elimination(),
4583	however, we need to set up Loose Scan earlier, before make_join_select is
4584	called. This is to prevent make_join_select() from switching full index
4585	scans into quick selects (which will break Loose Scan access).
4586
4587	@return
4588	0 OK
4589	1 Error
4590	*/
4591
4592	int setup_semijoin_loosescan(JOIN *join)
4593	{
4594	uint i;
4595	DBUG_ENTER("setup_semijoin_loosescan");
4596
4597	POSITION *pos= join->best_positions + join->const_tables;
4598	for (i= join->const_tables ; i < join->top_join_tab_count; )
4599	{
4600	JOIN_TAB *tab=join->join_tab + i;
4601	switch (pos->sj_strategy) {
4602	case SJ_OPT_MATERIALIZE:
4603	case SJ_OPT_MATERIALIZE_SCAN:
4604	i+= `1`; / join tabs are embedded in the nest /
4605	pos += pos->n_sj_tables;
4606	break;
4607	case SJ_OPT_LOOSE_SCAN:
4608	{
4609	/ We jump from the last table to the first one /
4610	tab->loosescan_match_tab= tab + pos->n_sj_tables - `1`;
4611
4612	/ LooseScan requires records to be produced in order /
4613	if (tab->select && tab->select->quick)
4614	tab->select->quick->need_sorted_output();
4615
4616	for (uint j= i; j < i + pos->n_sj_tables; j++)
4617	join->join_tab[j].inside_loosescan_range= TRUE;
4618
4619	/ Calculate key length /
4620	uint keylen= `0`;
4621	uint keyno= pos->loosescan_picker.loosescan_key;
4622	for (uint kp=`0`; kp < pos->loosescan_picker.loosescan_parts; kp++)
4623	keylen += tab->table->key_info[keyno].key_part[kp].store_length;
4624
4625	tab->loosescan_key= keyno;
4626	tab->loosescan_key_len= keylen;
4627	if (pos->n_sj_tables > `1`)
4628	tab[pos->n_sj_tables - `1`].do_firstmatch= tab;
4629	i+= pos->n_sj_tables;
4630	pos+= pos->n_sj_tables;
4631	break;
4632	}
4633	default:
4634	{
4635	i++;
4636	pos++;
4637	break;
4638	}
4639	}
4640	}
4641	DBUG_RETURN(FALSE);
4642	}
4643
4644
4645	/*
4646	Setup the strategies to eliminate semi-join duplicates.
4647
4648	SYNOPSIS
4649	setup_semijoin_dups_elimination()
4650	join Join to process
4651	options Join options (needed to see if join buffering will be
4652	used or not)
4653	no_jbuf_after Another bit of information re where join buffering will
4654	be used.
4655
4656	DESCRIPTION
4657	Setup the strategies to eliminate semi-join duplicates. ATM there are 4
4658	strategies:
4659
4660	1. DuplicateWeedout (use of temptable to remove duplicates based on rowids
4661	of row combinations)
4662	2. FirstMatch (pick only the 1st matching row combination of inner tables)
4663	3. LooseScan (scanning the sj-inner table in a way that groups duplicates
4664	together and picking the 1st one)
4665	4. SJ-Materialization.
4666
4667	The join order has "duplicate-generating ranges", and every range is
4668	served by one strategy or a combination of FirstMatch with with some
4669	other strategy.
4670
4671	"Duplicate-generating range" is defined as a range within the join order
4672	that contains all of the inner tables of a semi-join. All ranges must be
4673	disjoint, if tables of several semi-joins are interleaved, then the ranges
4674	are joined together, which is equivalent to converting
4675	SELECT ... WHERE oe1 IN (SELECT ie1 ...) AND oe2 IN (SELECT ie2 )
4676	to
4677	SELECT ... WHERE (oe1, oe2) IN (SELECT ie1, ie2 ... ...)
4678	.
4679
4680	Applicability conditions are as follows:
4681
4682	DuplicateWeedout strategy
4683	~~~~~~~~~~~~~~~~~~~~~~~~~
4684
4685	(ot\|nt)* [ it ((it\|ot\|nt)* (it\|ot))] (nt)*
4686	+------+ +=========================+ +---+
4687	(1) (2) (3)
4688
4689	(1) - Prefix of OuterTables (those that participate in
4690	IN-equality and/or are correlated with subquery) and outer
4691	Non-correlated tables.
4692	(2) - The handled range. The range starts with the first sj-inner
4693	table, and covers all sj-inner and outer tables
4694	Within the range, Inner, Outer, outer non-correlated tables
4695	may follow in any order.
4696	(3) - The suffix of outer non-correlated tables.
4697
4698	FirstMatch strategy
4699	~~~~~~~~~~~~~~~~~~~
4700
4701	(ot\|nt)* [ it ((it\|nt)* it) ] (nt)*
4702	+------+ +==================+ +---+
4703	(1) (2) (3)
4704
4705	(1) - Prefix of outer and non-correlated tables
4706	(2) - The handled range, which may contain only inner and
4707	non-correlated tables.
4708	(3) - The suffix of outer non-correlated tables.
4709
4710	LooseScan strategy
4711	~~~~~~~~~~~~~~~~~~
4712
4713	(ot\|ct\|nt) [ loosescan_tbl (ot\|nt\|it)* it ] (ot\|nt)*
4714	+--------+ +===========+ +=============+ +------+
4715	(1) (2) (3) (4)
4716
4717	(1) - Prefix that may contain any outer tables. The prefix must contain
4718	all the non-trivially correlated outer tables. (non-trivially means
4719	that the correlation is not just through the IN-equality).
4720
4721	(2) - Inner table for which the LooseScan scan is performed.
4722
4723	(3) - The remainder of the duplicate-generating range. It is served by
4724	application of FirstMatch strategy, with the exception that
4725	outer IN-correlated tables are considered to be non-correlated.
4726
4727	(4) - THe suffix of outer and outer non-correlated tables.
4728
4729
4730	The choice between the strategies is made by the join optimizer (see
4731	advance_sj_state() and fix_semijoin_strategies_for_picked_join_order()).
4732	This function sets up all fields/structures/etc needed for execution except
4733	for setup/initialization of semi-join materialization which is done in
4734	setup_sj_materialization() (todo: can't we move that to here also?)
4735
4736	RETURN
4737	FALSE OK
4738	TRUE Out of memory error
4739	*/
4740
4741	int setup_semijoin_dups_elimination(JOIN *join, ulonglong options,
4742	uint no_jbuf_after)
4743	{
4744	uint i;
4745	DBUG_ENTER("setup_semijoin_dups_elimination");
4746
4747	join->complex_firstmatch_tables= table_map(`0`);
4748
4749	POSITION *pos= join->best_positions + join->const_tables;
4750	for (i= join->const_tables ; i < join->top_join_tab_count; )
4751	{
4752	JOIN_TAB *tab=join->join_tab + i;
4753	switch (pos->sj_strategy) {
4754	case SJ_OPT_MATERIALIZE:
4755	case SJ_OPT_MATERIALIZE_SCAN:
4756	/ Do nothing /
4757	i+= `1`;// It used to be pos->n_sj_tables, but now they are embedded in a nest
4758	pos += pos->n_sj_tables;
4759	break;
4760	case SJ_OPT_LOOSE_SCAN:
4761	{
4762	/ Setup already handled by setup_semijoin_loosescan /
4763	i+= pos->n_sj_tables;
4764	pos+= pos->n_sj_tables;
4765	break;
4766	}
4767	case SJ_OPT_DUPS_WEEDOUT:
4768	{
4769	/*
4770	Check for join buffering. If there is one, move the first table
4771	forwards, but do not destroy other duplicate elimination methods.
4772	*/
4773	uint first_table= i;
4774
4775	uint join_cache_level= join->thd->variables.join_cache_level;
4776	for (uint j= i; j < i + pos->n_sj_tables; j++)
4777	{
4778	/*
4779	When we'll properly take join buffering into account during
4780	join optimization, the below check should be changed to
4781	"if (join->best_positions[j].use_join_buffer &&
4782	j <= no_jbuf_after)".
4783	For now, use a rough criteria:
4784	*/
4785	JOIN_TAB *js_tab=join->join_tab + j;
4786	if (j != join->const_tables && js_tab->use_quick != `2` &&
4787	j <= no_jbuf_after &&
4788	((js_tab->type == JT_ALL && join_cache_level != `0`) \|\|
4789	(join_cache_level > `2` && (js_tab->type == JT_REF \|\|
4790	js_tab->type == JT_EQ_REF))))
4791	{
4792	/ Looks like we'll be using join buffer /
4793	first_table= join->const_tables;
4794	/*
4795	Make sure that possible sorting of rows from the head table
4796	is not to be employed.
4797	*/
4798	if (join->get_sort_by_join_tab())
4799	{
4800	join->simple_order= `0`;
4801	join->simple_group= `0`;
4802	join->need_tmp= join->test_if_need_tmp_table();
4803	}
4804	break;
4805	}
4806	}
4807
4808	init_dups_weedout(join, first_table, i, i + pos->n_sj_tables - first_table);
4809	i+= pos->n_sj_tables;
4810	pos+= pos->n_sj_tables;
4811	break;
4812	}
4813	case SJ_OPT_FIRST_MATCH:
4814	{
4815	JOIN_TAB *j;
4816	JOIN_TAB *jump_to= tab-`1`;
4817
4818	bool complex_range= FALSE;
4819	table_map tables_in_range= table_map(`0`);
4820
4821	for (j= tab; j != tab + pos->n_sj_tables; j++)
4822	{
4823	tables_in_range \|= j->table->map;
4824	if (!j->emb_sj_nest)
4825	{
4826	/*
4827	Got a table that's not within any semi-join nest. This is a case
4828	like this:
4829
4830	SELECT FROM ot1, nt1 WHERE ot1.col IN (SELECT expr FROM it1, it2)*
4831
4832	with a join order of
4833
4834	+----- FirstMatch range ----+
4835	\| \|
4836	ot1 it1 nt1 nt2 it2 it3 ...
4837	\| ^
4838	\| +-------- 'j' points here
4839	+------------- SJ_OPT_FIRST_MATCH was set for this table as
4840	it's the first one that produces duplicates
4841
4842	*/
4843	DBUG_ASSERT(j != tab); / table ntX must have an itX before it /
4844
4845	/*
4846	If the table right before us is an inner table (like it1 in the
4847	picture), it should be set to jump back to previous outer-table
4848	*/
4849	if (j[-`1`].emb_sj_nest)
4850	j[-`1`].do_firstmatch= jump_to;
4851
4852	jump_to= j; / Jump back to us /
4853	complex_range= TRUE;
4854	}
4855	else
4856	{
4857	j->first_sj_inner_tab= tab;
4858	j->last_sj_inner_tab= tab + pos->n_sj_tables - `1`;
4859	}
4860	}
4861	j[-`1`].do_firstmatch= jump_to;
4862	i+= pos->n_sj_tables;
4863	pos+= pos->n_sj_tables;
4864
4865	if (complex_range)
4866	join->complex_firstmatch_tables\|= tables_in_range;
4867	break;
4868	}
4869	case SJ_OPT_NONE:
4870	i++;
4871	pos++;
4872	break;
4873	}
4874	}
4875	DBUG_RETURN(FALSE);
4876	}
4877
4878
4879	/*
4880	Destroy all temporary tables created by NL-semijoin runtime
4881	*/
4882
4883	void destroy_sj_tmp_tables(JOIN *join)
4884	{
4885	List_iterator<TABLE> it(join->sj_tmp_tables);
4886	TABLE *table;
4887	while ((table= it ++))
4888	{
4889	/*
4890	SJ-Materialization tables are initialized for either sequential reading
4891	or index lookup, DuplicateWeedout tables are not initialized for read
4892	(we only write to them), so need to call ha_index_or_rnd_end.
4893	*/
4894	table->file->ha_index_or_rnd_end();
4895	free_tmp_table(join->thd, table);
4896	}
4897	join->sj_tmp_tables.empty();
4898	join->sjm_info_list.empty();
4899	}
4900
4901
4902	/*
4903	Remove all records from all temp tables used by NL-semijoin runtime
4904
4905	SYNOPSIS
4906	clear_sj_tmp_tables()
4907	join The join to remove tables for
4908
4909	DESCRIPTION
4910	Remove all records from all temp tables used by NL-semijoin runtime. This
4911	must be done before every join re-execution.
4912	*/
4913
4914	int clear_sj_tmp_tables(JOIN *join)
4915	{
4916	int res;
4917	List_iterator<TABLE> it(join->sj_tmp_tables);
4918	TABLE *table;
4919	while ((table= it ++))
4920	{
4921	if ((res= table->file->ha_delete_all_rows()))
4922	return res; / purecov: inspected /
4923	}
4924
4925	SJ_MATERIALIZATION_INFO *sjm;
4926	List_iterator<SJ_MATERIALIZATION_INFO> it2(join->sjm_info_list);
4927	while ((sjm= it2 ++))
4928	{
4929	sjm->materialized= FALSE;
4930	}
4931	return `0`;
4932	}
4933
4934
4935	/*
4936	Check if the table's rowid is included in the temptable
4937
4938	SYNOPSIS
4939	sj_table_is_included()
4940	join The join
4941	join_tab The table to be checked
4942
4943	DESCRIPTION
4944	SemiJoinDuplicateElimination: check the table's rowid should be included
4945	in the temptable. This is so if
4946
4947	1. The table is not embedded within some semi-join nest
4948	2. The has been pulled out of a semi-join nest, or
4949
4950	3. The table is functionally dependent on some previous table
4951
4952	[4. This is also true for constant tables that can't be
4953	NULL-complemented but this function is not called for such tables]
4954
4955	RETURN
4956	TRUE - Include table's rowid
4957	FALSE - Don't
4958	*/
4959
4960	static bool sj_table_is_included(JOIN join, JOIN_TAB join_tab)
4961	{
4962	if (join_tab->emb_sj_nest)
4963	return FALSE;
4964
4965	/ Check if this table is functionally dependent on the tables that*
4966	are within the same outer join nest
4967	*/
4968	TABLE_LIST *embedding= join_tab->table->pos_in_table_list->embedding;
4969	if (join_tab->type == JT_EQ_REF)
4970	{
4971	table_map depends_on= `0`;
4972	uint idx;
4973
4974	for (uint kp= `0`; kp < join_tab->ref.key_parts; kp++)
4975	depends_on \|= join_tab->ref.items[kp]->used_tables();
4976
4977	Table_map_iterator it(depends_on & ~PSEUDO_TABLE_BITS);
4978	while ((idx= it.next_bit())!=Table_map_iterator::BITMAP_END)
4979	{
4980	JOIN_TAB *ref_tab= join->map2table[idx];
4981	if (embedding != ref_tab->table->pos_in_table_list->embedding)
4982	return TRUE;
4983	}
4984	/ Ok, functionally dependent /
4985	return FALSE;
4986	}
4987	/ Not functionally dependent => need to include/
4988	return TRUE;
4989	}
4990
4991
4992	/*
4993	Index lookup-based subquery: save some flags for EXPLAIN output
4994
4995	SYNOPSIS
4996	save_index_subquery_explain_info()
4997	join_tab Subquery's join tab (there is only one as index lookup is
4998	only used for subqueries that are single-table SELECTs)
4999	where Subquery's WHERE clause
5000
5001	DESCRIPTION
5002	For index lookup-based subquery (i.e. one executed with
5003	subselect_uniquesubquery_engine or subselect_indexsubquery_engine),
5004	check its EXPLAIN output row should contain
5005	"Using index" (TAB_INFO_FULL_SCAN_ON_NULL)
5006	"Using Where" (TAB_INFO_USING_WHERE)
5007	"Full scan on NULL key" (TAB_INFO_FULL_SCAN_ON_NULL)
5008	and set appropriate flags in join_tab->packed_info.
5009	*/
5010
5011	static void save_index_subquery_explain_info(JOIN_TAB join_tab, Item where)
5012	{
5013	join_tab->packed_info= TAB_INFO_HAVE_VALUE;
5014	if (join_tab->table->covering_keys.is_set(join_tab->ref.key))
5015	join_tab->packed_info \|= TAB_INFO_USING_INDEX;
5016	if (where)
5017	join_tab->packed_info \|= TAB_INFO_USING_WHERE;
5018	for (uint i = `0`; i < join_tab->ref.key_parts; i++)
5019	{
5020	if (join_tab->ref.cond_guards[i])
5021	{
5022	join_tab->packed_info \|= TAB_INFO_FULL_SCAN_ON_NULL;
5023	break;
5024	}
5025	}
5026	}
5027
5028
5029	/*
5030	Check if the join can be rewritten to [unique_]indexsubquery_engine
5031
5032	DESCRIPTION
5033	Check if the join can be changed into [unique_]indexsubquery_engine.
5034
5035	The check is done after join optimization, the idea is that if the join
5036	has only one table and uses a [eq_]ref access generated from subselect's
5037	IN-equality then we replace it with a subselect_indexsubquery_engine or a
5038	subselect_uniquesubquery_engine.
5039
5040	RETURN
5041	0 - Ok, rewrite done (stop join optimization and return)
5042	1 - Fatal error (stop join optimization and return)
5043	-1 - No rewrite performed, continue with join optimization
5044	*/
5045
5046	int rewrite_to_index_subquery_engine(JOIN *join)
5047	{
5048	THD *thd= join->thd;
5049	JOIN_TAB* join_tab=join->join_tab;
5050	SELECT_LEX_UNIT *unit= join->unit;
5051	DBUG_ENTER("rewrite_to_index_subquery_engine");
5052
5053	/*
5054	is this simple IN subquery?
5055	*/
5056	/ TODO: In order to use these more efficient subquery engines in more cases,*
5057	the following problems need to be solved:
5058	- the code that removes GROUP BY (group_list), also adds an ORDER BY
5059	(order), thus GROUP BY queries (almost?) never pass through this branch.
5060	Solution: remove the test below '!join->order', because we remove the
5061	ORDER clase for subqueries anyway.
5062	- in order to set a more efficient engine, the optimizer needs to both
5063	decide to remove GROUP BY, and* select one of the JT_[EQ_]REF[_OR_NULL]*
5064	access methods, and* loose scan should be more expensive or*
5065	inapliccable. When is that possible?
5066	- Consider expanding the applicability of this rewrite for loose scan
5067	for group by queries.
5068	*/
5069	if (!join->group_list && !join->order &&
5070	join->unit->item &&
5071	join->unit->item->substype() == Item_subselect::IN_SUBS &&
5072	join->table_count == `1` && join->conds &&
5073	!join->unit->is_unit_op())
5074	{
5075	if (!join->having)
5076	{
5077	Item *where= join->conds;
5078	if (join_tab[`0`].type == JT_EQ_REF &&
5079	join_tab[`0`].ref.items[`0`]->name.str == in_left_expr_name.str)
5080	{
5081	remove_subq_pushed_predicates(join, &where);
5082	save_index_subquery_explain_info(join_tab, where);
5083	join_tab[`0`].type= JT_UNIQUE_SUBQUERY;
5084	join->error= `0`;
5085	DBUG_RETURN(unit->item->
5086	change_engine(new
5087	subselect_uniquesubquery_engine (thd,
5088	join_tab,
5089	unit->item,
5090	where)));
5091	}
5092	else if (join_tab[`0`].type == JT_REF &&
5093	join_tab[`0`].ref.items[`0`]->name.str == in_left_expr_name.str)
5094	{
5095	remove_subq_pushed_predicates(join, &where);
5096	save_index_subquery_explain_info(join_tab, where);
5097	join_tab[`0`].type= JT_INDEX_SUBQUERY;
5098	join->error= `0`;
5099	DBUG_RETURN(unit->item->
5100	change_engine(new
5101	subselect_indexsubquery_engine (thd,
5102	join_tab,
5103	unit->item,
5104	where,
5105	NULL,
5106	`0`)));
5107	}
5108	} else if (join_tab[`0`].type == JT_REF_OR_NULL &&
5109	join_tab[`0`].ref.items[`0`]->name.str == in_left_expr_name.str &&
5110	join->having->name.str == in_having_cond.str)
5111	{
5112	join_tab[`0`].type= JT_INDEX_SUBQUERY;
5113	join->error= `0`;
5114	join->conds= remove_additional_cond(join->conds);
5115	save_index_subquery_explain_info(join_tab, join->conds);
5116	DBUG_RETURN(unit->item->
5117	change_engine(new subselect_indexsubquery_engine (thd,
5118	join_tab,
5119	unit->item,
5120	join->conds,
5121	join->having,
5122	`1`)));
5123	}
5124	}
5125
5126	DBUG_RETURN(-`1`); / Haven't done the rewrite /
5127	}
5128
5129
5130	/**
5131	Remove additional condition inserted by IN/ALL/ANY transformation.
5132
5133	@param conds condition for processing
5134
5135	@return
5136	new conditions
5137	*/
5138
5139	static Item remove_additional_cond(Item conds)
5140	{
5141	if (conds->name.str == in_additional_cond.str)
5142	return `0`;
5143	if (conds->type() == Item::COND_ITEM)
5144	{
5145	Item_cond cnd= (Item_cond) conds;
5146	List_iterator<Item> li(*(cnd->argument_list()));
5147	Item *item;
5148	while ((item= li ++))
5149	{
5150	if (item->name.str == in_additional_cond.str)
5151	{
5152	li.remove();
5153	if (cnd->argument_list()->elements == `1`)
5154	return cnd->argument_list()->head();
5155	return conds;
5156	}
5157	}
5158	}
5159	return conds;
5160	}
5161
5162
5163	/*
5164	Remove the predicates pushed down into the subquery
5165
5166	SYNOPSIS
5167	remove_subq_pushed_predicates()
5168	where IN Must be NULL
5169	OUT The remaining WHERE condition, or NULL
5170
5171	DESCRIPTION
5172	Given that this join will be executed using (unique\|index)_subquery,
5173	without "checking NULL", remove the predicates that were pushed down
5174	into the subquery.
5175
5176	If the subquery compares scalar values, we can remove the condition that
5177	was wrapped into trig_cond (it will be checked when needed by the subquery
5178	engine)
5179
5180	If the subquery compares row values, we need to keep the wrapped
5181	equalities in the WHERE clause: when the left (outer) tuple has both NULL
5182	and non-NULL values, we'll do a full table scan and will rely on the
5183	equalities corresponding to non-NULL parts of left tuple to filter out
5184	non-matching records.
5185
5186	TODO: We can remove the equalities that will be guaranteed to be true by the
5187	fact that subquery engine will be using index lookup. This must be done only
5188	for cases where there are no conversion errors of significance, e.g. 257
5189	that is searched in a byte. But this requires homogenization of the return
5190	codes of all Field::store() methods.*
5191	*/
5192
5193	static void remove_subq_pushed_predicates(JOIN join, Item *where)
5194	{
5195	if (join->conds->type() == Item::FUNC_ITEM &&
5196	((Item_func *)join->conds)->functype() == Item_func::EQ_FUNC &&
5197	((Item_func *)join->conds)->arguments()[`0`]->type() == Item::REF_ITEM &&
5198	((Item_func *)join->conds)->arguments()[`1`]->type() == Item::FIELD_ITEM &&
5199	test_if_ref (join->conds,
5200	(Item_field )((Item_func )join->conds)->arguments()[`1`],
5201	((Item_func *)join->conds)->arguments()[`0`]))
5202	{
5203	*where= `0`;
5204	return;
5205	}
5206	}
5207
5208
5209
5210
5211	/**
5212	Optimize all subqueries of a query that were not flattened into a semijoin.
5213
5214	@details
5215	Optimize all immediate children subqueries of a query.
5216
5217	This phase must be called after substitute_for_best_equal_field() because
5218	that function may replace items with other items from a multiple equality,
5219	and we need to reference the correct items in the index access method of the
5220	IN predicate.
5221
5222	@return Operation status
5223	@retval FALSE success.
5224	@retval TRUE error occurred.
5225	*/
5226
5227	bool JOIN::optimize_unflattened_subqueries()
5228	{
5229	return select_lex->optimize_unflattened_subqueries(false);
5230	}
5231
5232	/**
5233	Optimize all constant subqueries of a query that were not flattened into
5234	a semijoin.
5235
5236	@details
5237	Similar to other constant conditions, constant subqueries can be used in
5238	various constant optimizations. Having optimized constant subqueries before
5239	these constant optimizations, makes it possible to estimate if a subquery
5240	is "cheap" enough to be executed during the optimization phase.
5241
5242	Constant subqueries can be optimized and evaluated independent of the outer
5243	query, therefore if const_only = true, this method can be called early in
5244	the optimization phase of the outer query.
5245
5246	@return Operation status
5247	@retval FALSE success.
5248	@retval TRUE error occurred.
5249	*/
5250
5251	bool JOIN::optimize_constant_subqueries()
5252	{
5253	ulonglong save_options= select_lex->options;
5254	bool res;
5255	/*
5256	Constant subqueries may be executed during the optimization phase.
5257	In EXPLAIN mode the optimizer doesn't initialize many of the data structures
5258	needed for execution. In order to make it possible to execute subqueries
5259	during optimization, constant subqueries must be optimized for execution,
5260	not for EXPLAIN.
5261	*/
5262	select_lex->options&= ~SELECT_DESCRIBE;
5263	res= select_lex->optimize_unflattened_subqueries(true);
5264	select_lex->options= save_options;
5265	return res;
5266	}
5267
5268
5269	/*
5270	Join tab execution startup function.
5271
5272	SYNOPSIS
5273	join_tab_execution_startup()
5274	tab Join tab to perform startup actions for
5275
5276	DESCRIPTION
5277	Join tab execution startup function. This is different from
5278	tab->read_first_record in the regard that this has actions that are to be
5279	done once per join execution.
5280
5281	Currently there are only two possible startup functions, so we have them
5282	both here inside if (...) branches. In future we could switch to function
5283	pointers.
5284
5285	TODO: consider moving this together with JOIN_TAB::preread_init
5286
5287	RETURN
5288	NESTED_LOOP_OK - OK
5289	NESTED_LOOP_ERROR\| NESTED_LOOP_KILLED - Error, abort the join execution
5290	*/
5291
5292	enum_nested_loop_state join_tab_execution_startup(JOIN_TAB *tab)
5293	{
5294	Item_in_subselect *in_subs;
5295	DBUG_ENTER("join_tab_execution_startup");
5296
5297	if (tab->table->pos_in_table_list &&
5298	(in_subs= tab->table->pos_in_table_list->jtbm_subselect))
5299	{
5300	/ It's a non-merged SJM nest /
5301	DBUG_ASSERT(in_subs->engine->engine_type() ==
5302	subselect_engine::HASH_SJ_ENGINE);
5303	subselect_hash_sj_engine *hash_sj_engine=
5304	((subselect_hash_sj_engine*)in_subs->engine);
5305	if (!hash_sj_engine->is_materialized)
5306	{
5307	hash_sj_engine->materialize_join->exec();
5308	hash_sj_engine->is_materialized= TRUE;
5309
5310	if (unlikely(hash_sj_engine->materialize_join->error) \|\|
5311	unlikely(tab->join->thd->is_fatal_error))
5312	DBUG_RETURN(NESTED_LOOP_ERROR);
5313	}
5314	}
5315	else if (tab->bush_children)
5316	{
5317	/ It's a merged SJM nest /
5318	enum_nested_loop_state rc;
5319	SJ_MATERIALIZATION_INFO *sjm= tab->bush_children->start->emb_sj_nest->sj_mat_info;
5320
5321	if (!sjm->materialized)
5322	{
5323	JOIN *join= tab->join;
5324	JOIN_TAB *join_tab= tab->bush_children->start;
5325	JOIN_TAB *save_return_tab= join->return_tab;
5326	/*
5327	Now run the join for the inner tables. The first call is to run the
5328	join, the second one is to signal EOF (this is essential for some
5329	join strategies, e.g. it will make join buffering flush the records)
5330	*/
5331	if ((rc= sub_select(join, join_tab, FALSE/* no EOF */)) < `0` \|\|
5332	(rc= sub_select(join, join_tab, TRUE/* now EOF */)) < `0`)
5333	{
5334	join->return_tab= save_return_tab;
5335	DBUG_RETURN(rc); / it's NESTED_LOOP_(ERROR\|KILLED)/
5336	}
5337	join->return_tab= save_return_tab;
5338	sjm->materialized= TRUE;
5339	}
5340	}
5341
5342	DBUG_RETURN(NESTED_LOOP_OK);
5343	}
5344
5345
5346	/*
5347	Create a dummy temporary table, useful only for the sake of having a
5348	TABLE object with map,tablenr and maybe_null properties.*
5349
5350	This is used by non-mergeable semi-join materilization code to handle
5351	degenerate cases where materialized subquery produced "Impossible WHERE"
5352	and thus wasn't materialized.
5353	*/
5354
5355	TABLE create_dummy_tmp_table(THD thd)
5356	{
5357	DBUG_ENTER("create_dummy_tmp_table");
5358	TABLE *table;
5359	TMP_TABLE_PARAM sjm_table_param;
5360	sjm_table_param.init();
5361	sjm_table_param.field_count= `1`;
5362	List<Item> sjm_table_cols;
5363	const LEX_CSTRING dummy_name= { STRING_WITH_LEN("dummy") };
5364	Item column_item= new* (thd->mem_root) Item_int (thd, `1`);
5365	if (!column_item)
5366	DBUG_RETURN(NULL);
5367
5368	sjm_table_cols.push_back(column_item, thd->mem_root);
5369	if (!(table= create_tmp_table(thd, &sjm_table_param,
5370	sjm_table_cols, (ORDER*) `0`,
5371	TRUE / distinct /,
5372	`1`, /save_sum_fields/
5373	thd->variables.option_bits \|
5374	TMP_TABLE_ALL_COLUMNS,
5375	HA_POS_ERROR /rows_limit /,
5376	&dummy_name, TRUE / Do not open /)))
5377	{
5378	DBUG_RETURN(NULL);
5379	}
5380	DBUG_RETURN(table);
5381	}
5382
5383
5384	/*
5385	A class that is used to catch one single tuple that is sent to the join
5386	output, and save it in Item_cache element(s).
5387
5388	It is very similar to select_singlerow_subselect but doesn't require a
5389	Item_singlerow_subselect item.
5390	*/
5391
5392	class select_value_catcher :public select_subselect
5393	{
5394	public:
5395	select_value_catcher(THD thd_arg, Item_subselect item_arg):
5396	select_subselect (thd_arg, item_arg)
5397	{}
5398	int send_data(List<Item> &items);
5399	int setup(List<Item> *items);
5400	bool assigned; / TRUE <=> we've caught a value /
5401	uint n_elements; / How many elements we get /
5402	Item_cache *row; /* Array of cache elements /
5403	};
5404
5405
5406	int select_value_catcher::setup(List<Item> *items)
5407	{
5408	assigned= FALSE;
5409	n_elements= items->elements;
5410
5411	if (!(row= (Item_cache) thd->alloc(sizeof*(Item_cache) * n_elements)))
5412	return TRUE;
5413
5414	Item *sel_item;
5415	List_iterator<Item> li(*items);
5416	for (uint i= `0`; (sel_item= li ++); i++)
5417	{
5418	if (!(row[i]= sel_item->get_cache(thd)))
5419	return TRUE;
5420	row[i]->setup(thd, sel_item);
5421	}
5422	return FALSE;
5423	}
5424
5425
5426	int select_value_catcher::send_data(List<Item> &items)
5427	{
5428	DBUG_ENTER("select_value_catcher::send_data");
5429	DBUG_ASSERT(!assigned);
5430	DBUG_ASSERT(items.elements == n_elements);
5431
5432	if (unit->offset_limit_cnt)
5433	{ // Using limit offset,count
5434	unit->offset_limit_cnt--;
5435	DBUG_RETURN(`0`);
5436	}
5437
5438	Item *val_item;
5439	List_iterator_fast<Item> li(items);
5440	for (uint i= `0`; (val_item= li ++); i++)
5441	{
5442	row[i]->store(val_item);
5443	row[i]->cache_value();
5444	}
5445	assigned= TRUE;
5446	DBUG_RETURN(`0`);
5447	}
5448
5449
5450	/*
5451	Setup JTBM join tabs for execution
5452	*/
5453
5454	bool setup_jtbm_semi_joins(JOIN join, List<TABLE_LIST> join_list,
5455	Item **join_where)
5456	{
5457	TABLE_LIST *table;
5458	NESTED_JOIN *nested_join;
5459	List_iterator<TABLE_LIST> li(*join_list);
5460	THD *thd= join->thd;
5461	DBUG_ENTER("setup_jtbm_semi_joins");
5462
5463	while ((table= li ++))
5464	{
5465	Item_in_subselect *item;
5466
5467	if ((item= table->jtbm_subselect))
5468	{
5469	Item_in_subselect *subq_pred= item;
5470	double rows;
5471	double read_time;
5472
5473	/*
5474	Perform optimization of the subquery, so that we know estmated
5475	- cost of materialization process
5476	- how many records will be in the materialized temp.table
5477	*/
5478	if (subq_pred->optimize(&rows, &read_time))
5479	DBUG_RETURN(TRUE);
5480
5481	subq_pred->jtbm_read_time= read_time;
5482	subq_pred->jtbm_record_count=rows;
5483	JOIN *subq_join= subq_pred->unit->first_select()->join;
5484
5485	if (!subq_join->tables_list \|\| !subq_join->table_count)
5486	{
5487	/*
5488	A special case; subquery's join is degenerate, and it either produces
5489	0 or 1 record. Examples of both cases:
5490
5491	select from ot where col in (select ... from it where 2>3)*
5492	select from ot where col in (select MY_MIN(it.key) from it)*
5493
5494	in this case, the subquery predicate has not been setup for
5495	materialization. In particular, there is no materialized temp.table.
5496	We'll now need to
5497	1. Check whether 1 or 0 records are produced, setup this as a
5498	constant join tab.
5499	2. Create a dummy temporary table, because all of the join
5500	optimization code relies on TABLE object being present (here we
5501	follow a bad tradition started by derived tables)
5502	*/
5503	DBUG_ASSERT(subq_pred->engine->engine_type() ==
5504	subselect_engine::SINGLE_SELECT_ENGINE);
5505	subselect_single_select_engine *engine=
5506	(subselect_single_select_engine*)subq_pred->engine;
5507	select_value_catcher *new_sink;
5508	if (!(new_sink=
5509	new (thd->mem_root) select_value_catcher (thd, subq_pred)))
5510	DBUG_RETURN(TRUE);
5511	if (new_sink->setup(&engine->select_lex->join->fields_list) \|\|
5512	engine->select_lex->join->change_result(new_sink, NULL) \|\|
5513	engine->exec())
5514	{
5515	DBUG_RETURN(TRUE);
5516	}
5517	subq_pred->is_jtbm_const_tab= TRUE;
5518
5519	if (new_sink->assigned)
5520	{
5521	subq_pred->jtbm_const_row_found= TRUE;
5522	/*
5523	Subselect produced one row, which is saved in new_sink->row.
5524	Inject "left_expr[i] == row[i] equalities into parent's WHERE.
5525	*/
5526	Item *eq_cond;
5527	for (uint i= `0`; i < subq_pred->left_expr->cols(); i++)
5528	{
5529	eq_cond= new (thd->mem_root)
5530	Item_func_eq (thd, subq_pred->left_expr->element_index(i),
5531	new_sink->row[i]);
5532	if (!eq_cond)
5533	DBUG_RETURN(`1`);
5534
5535	if (!((join_where)= and_items(thd, join_where, eq_cond)) \|\|
5536	(*join_where)->fix_fields(thd, join_where))
5537	DBUG_RETURN(`1`);
5538	}
5539	}
5540	else
5541	{
5542	/ Subselect produced no rows. Just set the flag, /
5543	subq_pred->jtbm_const_row_found= FALSE;
5544	}
5545
5546	/ Set up a dummy TABLE, optimizer code needs JOIN_TABs to have TABLE /*
5547	TABLE *dummy_table;
5548	if (!(dummy_table= create_dummy_tmp_table(thd)))
5549	DBUG_RETURN(`1`);
5550	table->table= dummy_table;
5551	table->table->pos_in_table_list= table;
5552	/*
5553	Note: the table created above may be freed by:
5554	1. JOIN_TAB::cleanup(), when the parent join is a regular join.
5555	2. cleanup_empty_jtbm_semi_joins(), when the parent join is a
5556	degenerate join (e.g. one with "Impossible where").
5557	*/
5558	setup_table_map(table->table, table, table->jtbm_table_no);
5559	}
5560	else
5561	{
5562	DBUG_ASSERT(subq_pred->test_set_strategy(SUBS_MATERIALIZATION));
5563	subq_pred->is_jtbm_const_tab= FALSE;
5564	subselect_hash_sj_engine *hash_sj_engine=
5565	((subselect_hash_sj_engine*)item->engine);
5566
5567	table->table= hash_sj_engine->tmp_table;
5568	table->table->pos_in_table_list= table;
5569
5570	setup_table_map(table->table, table, table->jtbm_table_no);
5571
5572	Item *sj_conds= hash_sj_engine->semi_join_conds;
5573
5574	(join_where)= and_items(thd, join_where, sj_conds);
5575	if (!(*join_where)->fixed)
5576	(*join_where)->fix_fields(thd, join_where);
5577	}
5578	table->table->maybe_null= MY_TEST(join->mixed_implicit_grouping);
5579	}
5580
5581	if ((nested_join= table->nested_join))
5582	{
5583	if (setup_jtbm_semi_joins(join, &nested_join->join_list, join_where))
5584	DBUG_RETURN(TRUE);
5585	}
5586	}
5587	DBUG_RETURN(FALSE);
5588	}
5589
5590
5591	/*
5592	Cleanup non-merged semi-joins (JBMs) that have empty.
5593
5594	This function is to cleanups for a special case:
5595	Consider a query like
5596
5597	select from t1 where 1=2 AND t1.col IN (select max(..) ... having 1=2)*
5598
5599	For this query, optimization of subquery will short-circuit, and
5600	setup_jtbm_semi_joins() will call create_dummy_tmp_table() so that we have
5601	empty, constant temp.table to stand in as materialized temp. table.
5602
5603	Now, suppose that the upper join is also found to be degenerate. In that
5604	case, no JOIN_TAB array will be produced, and hence, JOIN::cleanup() will
5605	have a problem with cleaning up empty JTBMs (non-empty ones are cleaned up
5606	through Item::cleanup() calls).
5607	*/
5608
5609	void cleanup_empty_jtbm_semi_joins(JOIN join, List<TABLE_LIST> join_list)
5610	{
5611	List_iterator<TABLE_LIST> li(*join_list);
5612	TABLE_LIST *table;
5613	while ((table= li ++))
5614	{
5615	if ((table->jtbm_subselect && table->jtbm_subselect->is_jtbm_const_tab))
5616	{
5617	if (table->table)
5618	{
5619	free_tmp_table(join->thd, table->table);
5620	table->table= NULL;
5621	}
5622	}
5623	else if (table->nested_join && table->sj_subq_pred)
5624	{
5625	cleanup_empty_jtbm_semi_joins(join, &table->nested_join->join_list);
5626	}
5627	}
5628	}
5629
5630
5631	/**
5632	Choose an optimal strategy to execute an IN/ALL/ANY subquery predicate
5633	based on cost.
5634
5635	@param join_tables the set of tables joined in the subquery
5636
5637	@notes
5638	The method chooses between the materialization and IN=>EXISTS rewrite
5639	strategies for the execution of a non-flattened subquery IN predicate.
5640	The cost-based decision is made as follows:
5641
5642	1. compute materialize_strategy_cost based on the unmodified subquery
5643	2. reoptimize the subquery taking into account the IN-EXISTS predicates
5644	3. compute in_exists_strategy_cost based on the reoptimized plan
5645	4. compare and set the cheaper strategy
5646	if (materialize_strategy_cost >= in_exists_strategy_cost)
5647	in_strategy = MATERIALIZATION
5648	else
5649	in_strategy = IN_TO_EXISTS
5650	5. if in_strategy = MATERIALIZATION and it is not possible to initialize it
5651	revert to IN_TO_EXISTS
5652	6. if (in_strategy == MATERIALIZATION)
5653	revert the subquery plan to the original one before reoptimizing
5654	else
5655	inject the IN=>EXISTS predicates into the new EXISTS subquery plan
5656
5657	The implementation itself is a bit more complicated because it takes into
5658	account two more factors:
5659	- whether the user allowed both strategies through an optimizer_switch, and
5660	- if materialization was the cheaper strategy, whether it can be executed
5661	or not.
5662
5663	@retval FALSE success.
5664	@retval TRUE error occurred.
5665	*/
5666
5667	bool JOIN::choose_subquery_plan(table_map join_tables)
5668	{
5669	enum_reopt_result reopt_result= REOPT_NONE;
5670	Item_in_subselect *in_subs;
5671
5672	/*
5673	IN/ALL/ANY optimizations are not applicable for so called fake select
5674	(this select exists only to filter results of union if it is needed).
5675	*/
5676	if (select_lex == select_lex->master_unit()->fake_select_lex)
5677	return `0`;
5678
5679	if (is_in_subquery())
5680	{
5681	in_subs= (Item_in_subselect*) unit->item;
5682	if (in_subs->create_in_to_exists_cond(this))
5683	return true;
5684	}
5685	else
5686	return false;
5687
5688	/ A strategy must be chosen earlier. /
5689	DBUG_ASSERT(in_subs->has_strategy());
5690	DBUG_ASSERT(in_to_exists_where \|\| in_to_exists_having);
5691	DBUG_ASSERT(!in_to_exists_where \|\| in_to_exists_where->fixed);
5692	DBUG_ASSERT(!in_to_exists_having \|\| in_to_exists_having->fixed);
5693
5694	/ The original QEP of the subquery. /
5695	Join_plan_state save_qep(table_count);
5696
5697	/*
5698	Compute and compare the costs of materialization and in-exists if both
5699	strategies are possible and allowed by the user (checked during the prepare
5700	phase.
5701	*/
5702	if (in_subs->test_strategy(SUBS_MATERIALIZATION) &&
5703	in_subs->test_strategy(SUBS_IN_TO_EXISTS))
5704	{
5705	JOIN *outer_join;
5706	JOIN inner_join= this*;
5707	/ Number of unique value combinations filtered by the IN predicate. /
5708	double outer_lookup_keys;
5709	/ Cost and row count of the unmodified subquery. /
5710	double inner_read_time_1, inner_record_count_1;
5711	/ Cost of the subquery with injected IN-EXISTS predicates. /
5712	double inner_read_time_2;
5713	/ The cost to compute IN via materialization. /
5714	double materialize_strategy_cost;
5715	/ The cost of the IN->EXISTS strategy. /
5716	double in_exists_strategy_cost;
5717	double dummy;
5718
5719	/*
5720	A. Estimate the number of rows of the outer table that will be filtered
5721	by the IN predicate.
5722	*/
5723	outer_join= unit->outer_select() ? unit->outer_select()->join : NULL;
5724	/*
5725	Get the cost of the outer join if:
5726	(1) It has at least one table, and
5727	(2) It has been already optimized (if there is no join_tab, then the
5728	outer join has not been optimized yet).
5729	*/
5730	if (outer_join && outer_join->table_count > `0` && // (1)
5731	outer_join->join_tab && // (2)
5732	!in_subs->const_item())
5733	{
5734	/*
5735	TODO:
5736	Currently outer_lookup_keys is computed as the number of rows in
5737	the partial join including the JOIN_TAB where the IN predicate is
5738	pushed to. In the general case this is a gross overestimate because
5739	due to caching we are interested only in the number of unique keys.
5740	The search key may be formed by columns from much fewer than all
5741	tables in the partial join. Example:
5742	select from t1, t2 where t1.c1 = t2.key AND t2.c2 IN (select ...);*
5743	If the join order: t1, t2, the number of unique lookup keys is ~ to
5744	the number of unique values t2.c2 in the partial join t1 join t2.
5745	*/
5746	outer_join->get_partial_cost_and_fanout(in_subs->get_join_tab_idx(),
5747	table_map(-`1`),
5748	&dummy,
5749	&outer_lookup_keys);
5750	}
5751	else
5752	{
5753	/*
5754	TODO: outer_join can be NULL for DELETE statements.
5755	How to compute its cost?
5756	*/
5757	outer_lookup_keys= `1`;
5758	}
5759
5760	/*
5761	B. Estimate the cost and number of records of the subquery both
5762	unmodified, and with injected IN->EXISTS predicates.
5763	*/
5764	inner_read_time_1= inner_join->best_read;
5765	inner_record_count_1= inner_join->join_record_count;
5766
5767	if (in_to_exists_where && const_tables != table_count)
5768	{
5769	/*
5770	Re-optimize and cost the subquery taking into account the IN-EXISTS
5771	conditions.
5772	*/
5773	reopt_result= reoptimize(in_to_exists_where, join_tables, &save_qep);
5774	if (reopt_result == REOPT_ERROR)
5775	return TRUE;
5776
5777	/ Get the cost of the modified IN-EXISTS plan. /
5778	inner_read_time_2= inner_join->best_read;
5779
5780	}
5781	else
5782	{
5783	/ Reoptimization would not produce any better plan. /
5784	inner_read_time_2= inner_read_time_1;
5785	}
5786
5787	/*
5788	C. Compute execution costs.
5789	*/
5790	/ C.1 Compute the cost of the materialization strategy. /
5791	//uint rowlen= get_tmp_table_rec_length(unit->first_select()->item_list);
5792	uint rowlen= get_tmp_table_rec_length(ref_ptrs,
5793	select_lex->item_list.elements);
5794	/ The cost of writing one row into the temporary table. /
5795	double write_cost= get_tmp_table_write_cost(thd, inner_record_count_1,
5796	rowlen);
5797	/ The cost of a lookup into the unique index of the materialized table. /
5798	double lookup_cost= get_tmp_table_lookup_cost(thd, inner_record_count_1,
5799	rowlen);
5800	/*
5801	The cost of executing the subquery and storing its result in an indexed
5802	temporary table.
5803	*/
5804	double materialization_cost= inner_read_time_1 +
5805	write_cost * inner_record_count_1;
5806
5807	materialize_strategy_cost= materialization_cost +
5808	outer_lookup_keys * lookup_cost;
5809
5810	/ C.2 Compute the cost of the IN=>EXISTS strategy. /
5811	in_exists_strategy_cost= outer_lookup_keys * inner_read_time_2;
5812
5813	/ C.3 Compare the costs and choose the cheaper strategy. /
5814	if (materialize_strategy_cost >= in_exists_strategy_cost)
5815	in_subs->set_strategy(SUBS_IN_TO_EXISTS);
5816	else
5817	in_subs->set_strategy(SUBS_MATERIALIZATION);
5818
5819	DBUG_PRINT("info",
5820	("mat_strategy_cost: %.2f, mat_cost: %.2f, write_cost: %.2f, lookup_cost: %.2f",
5821	materialize_strategy_cost, materialization_cost, write_cost, lookup_cost));
5822	DBUG_PRINT("info",
5823	("inx_strategy_cost: %.2f, inner_read_time_2: %.2f",
5824	in_exists_strategy_cost, inner_read_time_2));
5825	DBUG_PRINT("info",("outer_lookup_keys: %.2f", outer_lookup_keys));
5826	}
5827
5828	/*
5829	If (1) materialization is a possible strategy based on semantic analysis
5830	during the prepare phase, then if
5831	(2) it is more expensive than the IN->EXISTS transformation, and
5832	(3) it is not possible to create usable indexes for the materialization
5833	strategy,
5834	fall back to IN->EXISTS.
5835	otherwise
5836	use materialization.
5837	*/
5838	if (in_subs->test_strategy(SUBS_MATERIALIZATION) &&
5839	in_subs->setup_mat_engine())
5840	{
5841	/*
5842	If materialization was the cheaper or the only user-selected strategy,
5843	but it is not possible to execute it due to limitations in the
5844	implementation, fall back to IN-TO-EXISTS.
5845	*/
5846	in_subs->set_strategy(SUBS_IN_TO_EXISTS);
5847	}
5848
5849	if (in_subs->test_strategy(SUBS_MATERIALIZATION))
5850	{
5851	/ Restore the original query plan used for materialization. /
5852	if (reopt_result == REOPT_NEW_PLAN)
5853	restore_query_plan(&save_qep);
5854
5855	in_subs->unit->uncacheable&= ~UNCACHEABLE_DEPENDENT_INJECTED;
5856	select_lex->uncacheable&= ~UNCACHEABLE_DEPENDENT_INJECTED;
5857
5858	/*
5859	Reset the "LIMIT 1" set in Item_exists_subselect::fix_length_and_dec.
5860	TODO:
5861	Currently we set the subquery LIMIT to infinity, and this is correct
5862	because we forbid at parse time LIMIT inside IN subqueries (see
5863	Item_in_subselect::test_limit). However, once we allow this, here
5864	we should set the correct limit if given in the query.
5865	*/
5866	in_subs->unit->global_parameters()->select_limit= NULL;
5867	in_subs->unit->set_limit(unit->global_parameters());
5868	/*
5869	Set the limit of this JOIN object as well, because normally its being
5870	set in the beginning of JOIN::optimize, which was already done.
5871	*/
5872	select_limit= in_subs->unit->select_limit_cnt;
5873	}
5874	else if (in_subs->test_strategy(SUBS_IN_TO_EXISTS))
5875	{
5876	if (reopt_result == REOPT_NONE && in_to_exists_where &&
5877	const_tables != table_count)
5878	{
5879	/*
5880	The subquery was not reoptimized with the newly injected IN-EXISTS
5881	conditions either because the user allowed only the IN-EXISTS strategy,
5882	or because materialization was not possible based on semantic analysis.
5883	*/
5884	reopt_result= reoptimize(in_to_exists_where, join_tables, NULL);
5885	if (reopt_result == REOPT_ERROR)
5886	return TRUE;
5887	}
5888
5889	if (in_subs->inject_in_to_exists_cond(this))
5890	return TRUE;
5891	/*
5892	If the injected predicate is correlated the IN->EXISTS transformation
5893	make the subquery dependent.
5894	*/
5895	if ((in_to_exists_where &&
5896	in_to_exists_where->used_tables() & OUTER_REF_TABLE_BIT) \|\|
5897	(in_to_exists_having &&
5898	in_to_exists_having->used_tables() & OUTER_REF_TABLE_BIT))
5899	{
5900	in_subs->unit->uncacheable\|= UNCACHEABLE_DEPENDENT_INJECTED;
5901	select_lex->uncacheable\|= UNCACHEABLE_DEPENDENT_INJECTED;
5902	}
5903	select_limit= `1`;
5904	}
5905	else
5906	DBUG_ASSERT(FALSE);
5907
5908	return FALSE;
5909	}
5910
5911
5912	/**
5913	Choose a query plan for a table-less subquery.
5914
5915	@notes
5916
5917	@retval FALSE success.
5918	@retval TRUE error occurred.
5919	*/
5920
5921	bool JOIN::choose_tableless_subquery_plan()
5922	{
5923	DBUG_ASSERT(!tables_list \|\| !table_count);
5924	if (unit->item)
5925	{
5926	DBUG_ASSERT(unit->item->type() == Item::SUBSELECT_ITEM);
5927	Item_subselect *subs_predicate= unit->item;
5928
5929	/*
5930	If the optimizer determined that his query has an empty result,
5931	in most cases the subquery predicate is a known constant value -
5932	either of TRUE, FALSE or NULL. The implementation of
5933	Item_subselect::no_rows_in_result() determines which one.
5934	*/
5935	if (zero_result_cause)
5936	{
5937	if (!implicit_grouping)
5938	{
5939	/*
5940	Both group by queries and non-group by queries without aggregate
5941	functions produce empty subquery result. There is no need to further
5942	rewrite the subquery because it will not be executed at all.
5943	*/
5944	return FALSE;
5945	}
5946
5947	/ @todo*
5948	A further optimization is possible when a non-group query with
5949	MIN/MAX/COUNT is optimized by opt_sum_query. Then, if there are
5950	only MIN/MAX functions over an empty result set, the subquery
5951	result is a NULL value/row, thus the value of subs_predicate is
5952	NULL.
5953	*/
5954	}
5955
5956	/*
5957	For IN subqueries, use IN->EXISTS transfomation, unless the subquery
5958	has been converted to a JTBM semi-join. In that case, just leave
5959	everything as-is, setup_jtbm_semi_joins() has special handling for cases
5960	like this.
5961	*/
5962	if (subs_predicate->is_in_predicate() &&
5963	!(subs_predicate->substype() == Item_subselect::IN_SUBS &&
5964	((Item_in_subselect*)subs_predicate)->is_jtbm_merged))
5965	{
5966	Item_in_subselect *in_subs;
5967	in_subs= (Item_in_subselect*) subs_predicate;
5968	in_subs->set_strategy(SUBS_IN_TO_EXISTS);
5969	if (in_subs->create_in_to_exists_cond(this) \|\|
5970	in_subs->inject_in_to_exists_cond(this))
5971	return TRUE;
5972	tmp_having= having;
5973	}
5974	}
5975	exec_const_cond= conds;
5976	return FALSE;
5977	}
5978

Browse the source code of MariaDB/sql/opt_subselect.cc