1/****************************************************************************
2**
3** Copyright (C) 2016 The Qt Company Ltd.
4** Copyright (C) 2016 Intel Corporation.
5** Contact: https://www.qt.io/licensing/
6**
7** This file is part of the QtCore module of the Qt Toolkit.
8**
9** $QT_BEGIN_LICENSE:LGPL$
10** Commercial License Usage
11** Licensees holding valid commercial Qt licenses may use this file in
12** accordance with the commercial license agreement provided with the
13** Software or, alternatively, in accordance with the terms contained in
14** a written agreement between you and The Qt Company. For licensing terms
15** and conditions see https://www.qt.io/terms-conditions. For further
16** information use the contact form at https://www.qt.io/contact-us.
17**
18** GNU Lesser General Public License Usage
19** Alternatively, this file may be used under the terms of the GNU Lesser
20** General Public License version 3 as published by the Free Software
21** Foundation and appearing in the file LICENSE.LGPL3 included in the
22** packaging of this file. Please review the following information to
23** ensure the GNU Lesser General Public License version 3 requirements
24** will be met: https://www.gnu.org/licenses/lgpl-3.0.html.
25**
26** GNU General Public License Usage
27** Alternatively, this file may be used under the terms of the GNU
28** General Public License version 2.0 or (at your option) the GNU General
29** Public license version 3 or any later version approved by the KDE Free
30** Qt Foundation. The licenses are as published by the Free Software
31** Foundation and appearing in the file LICENSE.GPL2 and LICENSE.GPL3
32** included in the packaging of this file. Please review the following
33** information to ensure the GNU General Public License requirements will
34** be met: https://www.gnu.org/licenses/gpl-2.0.html and
35** https://www.gnu.org/licenses/gpl-3.0.html.
36**
37** $QT_END_LICENSE$
38**
39****************************************************************************/
40
41/*!
42 \class QUrl
43 \inmodule QtCore
44
45 \brief The QUrl class provides a convenient interface for working
46 with URLs.
47
48 \reentrant
49 \ingroup io
50 \ingroup network
51 \ingroup shared
52
53
54 It can parse and construct URLs in both encoded and unencoded
55 form. QUrl also has support for internationalized domain names
56 (IDNs).
57
58 The most common way to use QUrl is to initialize it via the
59 constructor by passing a QString. Otherwise, setUrl() can also
60 be used.
61
62 URLs can be represented in two forms: encoded or unencoded. The
63 unencoded representation is suitable for showing to users, but
64 the encoded representation is typically what you would send to
65 a web server. For example, the unencoded URL
66 "http://bühler.example.com/List of applicants.xml"
67 would be sent to the server as
68 "http://xn--bhler-kva.example.com/List%20of%20applicants.xml".
69
70 A URL can also be constructed piece by piece by calling
71 setScheme(), setUserName(), setPassword(), setHost(), setPort(),
72 setPath(), setQuery() and setFragment(). Some convenience
73 functions are also available: setAuthority() sets the user name,
74 password, host and port. setUserInfo() sets the user name and
75 password at once.
76
    Call isValid() to check if the URL is valid. This can be done at any point
    while constructing a URL. If isValid() returns \c false, you should
    clear() the URL before proceeding, or start over by parsing a new URL with
    setUrl().
81
82 Constructing a query is particularly convenient through the use of the \l
83 QUrlQuery class and its methods QUrlQuery::setQueryItems(),
84 QUrlQuery::addQueryItem() and QUrlQuery::removeQueryItem(). Use
85 QUrlQuery::setQueryDelimiters() to customize the delimiters used for
86 generating the query string.
87
88 For the convenience of generating encoded URL strings or query
89 strings, there are two static functions called
90 fromPercentEncoding() and toPercentEncoding() which deal with
91 percent encoding and decoding of QString objects.
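
    As a rough illustration of what percent-encoding does, here is a minimal
    sketch in standard C++. It is not the implementation behind
    toPercentEncoding(); the helper names \c isUnreserved() and
    \c percentEncode() are hypothetical, and the sketch simply encodes every
    byte of a UTF-8 string except the RFC 3986 unreserved characters:

```cpp
#include <cassert>
#include <string>

// Returns true for the RFC 3986 "unreserved" characters:
// ALPHA / DIGIT / "-" / "." / "_" / "~".
static bool isUnreserved(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
        || (c >= '0' && c <= '9')
        || c == '-' || c == '.' || c == '_' || c == '~';
}

// Hypothetical helper, not part of Qt: percent-encodes every byte of a
// UTF-8 string that is not an unreserved character, using uppercase hex.
std::string percentEncode(const std::string &utf8)
{
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : utf8) {
        if (isUnreserved(c)) {
            out += char(c);
        } else {
            out += '%';
            out += hex[c >> 4];
            out += hex[c & 0x0F];
        }
    }
    return out;
}
```

    With this helper, "List of applicants.xml" becomes
    "List%20of%20applicants.xml", and the two UTF-8 bytes of U+00E9 become
    "%C3%A9".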
92
93 fromLocalFile() constructs a QUrl by parsing a local
94 file path. toLocalFile() converts a URL to a local file path.
95
    The human readable representation of the URL is fetched with
    toString(). This representation is appropriate for displaying a
    URL to a user in unencoded form. The encoded form, as returned by
    toEncoded(), is meant for internal use and for passing to web
    servers, mail clients and so on. Both forms are technically correct
    and represent the same URL unambiguously -- in fact, passing either
    form to QUrl's constructor or to setUrl() will yield the same QUrl
    object.
104
105 QUrl conforms to the URI specification from
106 \l{RFC 3986} (Uniform Resource Identifier: Generic Syntax), and includes
107 scheme extensions from \l{RFC 1738} (Uniform Resource Locators). Case
108 folding rules in QUrl conform to \l{RFC 3491} (Nameprep: A Stringprep
109 Profile for Internationalized Domain Names (IDN)). It is also compatible with the
110 \l{http://freedesktop.org/wiki/Specifications/file-uri-spec/}{file URI specification}
111 from freedesktop.org, provided that the locale encodes file names using
112 UTF-8 (required by IDN).
113
114 \section2 Relative URLs vs Relative Paths
115
116 Calling isRelative() will return whether or not the URL is relative.
117 A relative URL has no \l {scheme}. For example:
118
119 \snippet code/src_corelib_io_qurl.cpp 8
120
121 Notice that a URL can be absolute while containing a relative path, and
122 vice versa:
123
124 \snippet code/src_corelib_io_qurl.cpp 9
125
126 A relative URL can be resolved by passing it as an argument to resolved(),
127 which returns an absolute URL. isParentOf() is used for determining whether
128 one URL is a parent of another.
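
    The scheme test behind the relative/absolute distinction can be sketched
    in standard C++. The helpers \c schemeLength() and \c looksRelative() are
    hypothetical and follow the scheme grammar of RFC 3986; they are an
    illustration only, not how isRelative() is implemented:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, not Qt API: returns the length of a leading
// RFC 3986 scheme (ALPHA followed by ALPHA / DIGIT / "+" / "-" / "."
// and terminated by ':'), or 0 if the text does not start with one.
std::size_t schemeLength(const std::string &url)
{
    auto isAlpha = [](char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    };
    auto isDigit = [](char c) { return c >= '0' && c <= '9'; };

    if (url.empty() || !isAlpha(url[0]))
        return 0;
    for (std::size_t i = 1; i < url.size(); ++i) {
        const char c = url[i];
        if (c == ':')
            return i;
        if (!isAlpha(c) && !isDigit(c) && c != '+' && c != '-' && c != '.')
            return 0;
    }
    return 0; // no ':' found, so there is no scheme
}

// A URL reference without a scheme is treated as relative in this sketch.
bool looksRelative(const std::string &url)
{
    return schemeLength(url) == 0;
}
```

    Note that the path does not enter into the test: "/absolute/path" has an
    absolute path but no scheme, so the sketch reports it as relative, while
    "mailto:user@example.com" has a scheme and is therefore not relative.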
129
130 \section2 Error checking
131
    QUrl is capable of detecting many errors in URLs while parsing them or when
    components of the URL are set with individual setter methods (like
    setScheme(), setHost() or setPath()). If the parsing or setter function is
    successful, any previously recorded error conditions will be discarded.
136
    By default, QUrl setter methods operate in QUrl::TolerantMode, which means
    they accept some common mistakes and misrepresentations of data. An
    alternate method of parsing is QUrl::StrictMode, which applies further
    checks. See QUrl::ParsingMode for a description of the differences between
    the parsing modes.
142
143 QUrl only checks for conformance with the URL specification. It does not
144 try to verify that high-level protocol URLs are in the format they are
145 expected to be by handlers elsewhere. For example, the following URIs are
146 all considered valid by QUrl, even if they do not make sense when used:
147
148 \list
149 \li "http:/filename.html"
150 \li "mailto://example.com"
151 \endlist
152
153 When the parser encounters an error, it signals the event by making
154 isValid() return false and toString() / toEncoded() return an empty string.
155 If it is necessary to show the user the reason why the URL failed to parse,
156 the error condition can be obtained from QUrl by calling errorString().
157 Note that this message is highly technical and may not make sense to
158 end-users.
159
160 QUrl is capable of recording only one error condition. If more than one
161 error is found, it is undefined which error is reported.
162
163 \section2 Character Conversions
164
165 Follow these rules to avoid erroneous character conversion when
166 dealing with URLs and strings:
167
168 \list
169 \li When creating a QString to contain a URL from a QByteArray or a
170 char*, always use QString::fromUtf8().
171 \endlist
172*/
173
174/*!
175 \enum QUrl::ParsingMode
176
177 The parsing mode controls the way QUrl parses strings.
178
179 \value TolerantMode QUrl will try to correct some common errors in URLs.
180 This mode is useful for parsing URLs coming from sources
181 not known to be strictly standards-conforming.
182
183 \value StrictMode Only valid URLs are accepted. This mode is useful for
184 general URL validation.
185
186 \value DecodedMode QUrl will interpret the URL component in the fully-decoded form,
187 where percent characters stand for themselves, not as the beginning
188 of a percent-encoded sequence. This mode is only valid for the
189 setters setting components of a URL; it is not permitted in
190 the QUrl constructor, in fromEncoded() or in setUrl().
191 For more information on this mode, see the documentation for
192 \l {QUrl::ComponentFormattingOption}{QUrl::FullyDecoded}.
193
    In TolerantMode, the parser has the following behavior:
195
196 \list
197
198 \li Spaces and "%20": unencoded space characters will be accepted and will
199 be treated as equivalent to "%20".
200
201 \li Single "%" characters: Any occurrences of a percent character "%" not
202 followed by exactly two hexadecimal characters (e.g., "13% coverage.html")
203 will be replaced by "%25". Note that one lone "%" character will trigger
204 the correction mode for all percent characters.
205
206 \li Reserved and unreserved characters: An encoded URL should only
207 contain a few characters as literals; all other characters should
208 be percent-encoded. In TolerantMode, these characters will be
209 accepted if they are found in the URL:
210 space / double-quote / "<" / ">" / "\" /
211 "^" / "`" / "{" / "|" / "}"
212 Those same characters can be decoded again by passing QUrl::DecodeReserved
213 to toString() or toEncoded(). In the getters of individual components,
214 those characters are often returned in decoded form.
215
216 \endlist
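
    The correction of lone percent characters described above can be sketched
    as follows. This is a simplified standalone illustration in standard C++,
    not the parser's actual code; the function name \c correctLonePercents()
    is hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns true for the ASCII characters 0-9, A-F and a-f.
static bool isHexDigit(char c)
{
    return (c >= '0' && c <= '9') || (c >= 'A' && c <= 'F')
        || (c >= 'a' && c <= 'f');
}

// Hypothetical helper: if any '%' is not followed by two hexadecimal
// digits, every '%' in the input is rewritten as "%25"; otherwise the
// input is returned unchanged.
std::string correctLonePercents(const std::string &input)
{
    bool foundLonePercent = false;
    for (std::size_t i = 0; i < input.size() && !foundLonePercent; ++i) {
        if (input[i] != '%')
            continue;
        foundLonePercent = i + 2 >= input.size()
                || !isHexDigit(input[i + 1]) || !isHexDigit(input[i + 2]);
    }
    if (!foundLonePercent)
        return input;

    std::string output;
    for (char c : input) {
        if (c == '%')
            output += "%25";
        else
            output += c;
    }
    return output;
}
```

    In this sketch "13% coverage.html" becomes "13%25 coverage.html", while
    "a%20b" is left alone. A string such as "a%20b%" is rewritten to
    "a%2520b%25", because the single lone "%" switches the correction on for
    every percent character.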
217
218 When in StrictMode, if a parsing error is found, isValid() will return \c
219 false and errorString() will return a message describing the error.
220 If more than one error is detected, it is undefined which error gets
221 reported.
222
223 Note that TolerantMode is not usually enough for parsing user input, which
224 often contains more errors and expectations than the parser can deal with.
225 When dealing with data coming directly from the user -- as opposed to data
226 coming from data-transfer sources, such as other programs -- it is
227 recommended to use fromUserInput().
228
229 \sa fromUserInput(), setUrl(), toString(), toEncoded(), QUrl::FormattingOptions
230*/
231
232/*!
233 \enum QUrl::UrlFormattingOption
234
235 The formatting options define how the URL is formatted when written out
236 as text.
237
238 \value None The format of the URL is unchanged.
239 \value RemoveScheme The scheme is removed from the URL.
240 \value RemovePassword Any password in the URL is removed.
241 \value RemoveUserInfo Any user information in the URL is removed.
242 \value RemovePort Any specified port is removed from the URL.
    \value RemoveAuthority  The authority (user info, host and port) is removed from the URL.
244 \value RemovePath The URL's path is removed, leaving only the scheme,
245 host address, and port (if present).
246 \value RemoveQuery The query part of the URL (following a '?' character)
247 is removed.
    \value RemoveFragment   The fragment part of the URL (following a '#' character)
                            is removed.
249 \value RemoveFilename The filename (i.e. everything after the last '/' in the path) is removed.
250 The trailing '/' is kept, unless StripTrailingSlash is set.
251 Only valid if RemovePath is not set.
252 \value PreferLocalFile If the URL is a local file according to isLocalFile()
253 and contains no query or fragment, a local file path is returned.
254 \value StripTrailingSlash The trailing slash is removed from the path, if one is present.
255 \value NormalizePathSegments Modifies the path to remove redundant directory separators,
256 and to resolve "."s and ".."s (as far as possible). For non-local paths, adjacent
257 slashes are preserved.
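
    The path-related options RemoveFilename and StripTrailingSlash can be
    pictured with a small standalone sketch in standard C++. The helper names
    are hypothetical, and keeping a lone "/" intact is an assumption of the
    sketch rather than documented behavior:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper mirroring the RemoveFilename description: drops
// everything after the last '/' and keeps that slash.
std::string removeFilename(const std::string &path)
{
    const std::size_t slash = path.rfind('/');
    if (slash == std::string::npos)
        return std::string(); // assumption: no slash means nothing remains
    return path.substr(0, slash + 1);
}

// Hypothetical helper mirroring the StripTrailingSlash description.
// Assumption: a path consisting only of "/" is left untouched.
std::string stripTrailingSlash(std::string path)
{
    if (path.size() > 1 && path.back() == '/')
        path.pop_back();
    return path;
}
```

    Applying both helpers in sequence to "/home/user/file.txt" first gives
    "/home/user/" and then "/home/user", which matches the way the two
    options are described as combining.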
258
    Note that the case folding rules in \l{RFC 3491}{Nameprep}, which QUrl
    conforms to, require host names to always be converted to lower case,
    regardless of the QUrl::FormattingOptions used.
262
263 The options from QUrl::ComponentFormattingOptions are also possible.
264
265 \sa QUrl::ComponentFormattingOptions
266*/
267
268/*!
269 \enum QUrl::ComponentFormattingOption
270 \since 5.0
271
    The component formatting options define how the components of a URL will
    be formatted when written out as text. They can be combined with the
    options from QUrl::FormattingOptions when used in toString() and
    toEncoded().
276
277 \value PrettyDecoded The component is returned in a "pretty form", with
278 most percent-encoded characters decoded. The exact
279 behavior of PrettyDecoded varies from component to
280 component and may also change from Qt release to Qt
281 release. This is the default.
282
283 \value EncodeSpaces Leave space characters in their encoded form ("%20").
284
285 \value EncodeUnicode Leave non-US-ASCII characters encoded in their UTF-8
286 percent-encoded form (e.g., "%C3%A9" for the U+00E9
287 codepoint, LATIN SMALL LETTER E WITH ACUTE).
288
    \value EncodeDelimiters Leave certain delimiters in their encoded form, as
                            would appear in the URL when the full URL is
                            represented as text. The delimiters affected
                            by this option change from component to component.
                            This flag has no effect in toString() or toEncoded().
294
295 \value EncodeReserved Leave US-ASCII characters not permitted in the URL by
296 the specification in their encoded form. This is the
297 default on toString() and toEncoded().
298
299 \value DecodeReserved Decode the US-ASCII characters that the URL specification
300 does not allow to appear in the URL. This is the
301 default on the getters of individual components.
302
    \value FullyEncoded     Leave all characters in their properly-encoded form,
                            as this component would appear as part of a URL. When
                            used with toString(), this produces a fully-compliant
                            URL in QString form, exactly equal to the result of
                            toEncoded().
308
309 \value FullyDecoded Attempt to decode as much as possible. For individual
310 components of the URL, this decodes every percent
311 encoding sequence, including control characters (U+0000
312 to U+001F) and UTF-8 sequences found in percent-encoded form.
                            Use of this mode may cause data loss; see below for more information.
314
    The values of EncodeReserved and DecodeReserved should not be used together
    in one call. The behavior is undefined if that happens. They are provided
    as separate values because the behavior of the "pretty mode" with regard
    to reserved characters is different on certain components and especially on
    the full URL.
320
321 \section2 Full decoding
322
323 The FullyDecoded mode is similar to the behavior of the functions returning
324 QString in Qt 4.x, in that every character represents itself and never has
325 any special meaning. This is true even for the percent character ('%'),
326 which should be interpreted to mean a literal percent, not the beginning of
327 a percent-encoded sequence. The same actual character, in all other
328 decoding modes, is represented by the sequence "%25".
329
330 Whenever re-applying data obtained with QUrl::FullyDecoded into a QUrl,
331 care must be taken to use the QUrl::DecodedMode parameter to the setters
332 (like setPath() and setUserName()). Failure to do so may cause
333 re-interpretation of the percent character ('%') as the beginning of a
334 percent-encoded sequence.
335
336 This mode is quite useful when portions of a URL are used in a non-URL
337 context. For example, to extract the username, password or file paths in an
338 FTP client application, the FullyDecoded mode should be used.
339
340 This mode should be used with care, since there are two conditions that
341 cannot be reliably represented in the returned QString. They are:
342
343 \list
344 \li \b{Non-UTF-8 sequences:} URLs may contain sequences of
345 percent-encoded characters that do not form valid UTF-8 sequences. Since
346 URLs need to be decoded using UTF-8, any decoder failure will result in
347 the QString containing one or more replacement characters where the
348 sequence existed.
349
350 \li \b{Encoded delimiters:} URLs are also allowed to make a distinction
351 between a delimiter found in its literal form and its equivalent in
352 percent-encoded form. This is most commonly found in the query, but is
353 permitted in most parts of the URL.
354 \endlist
355
356 The following example illustrates the problem:
357
358 \snippet code/src_corelib_io_qurl.cpp 10
359
    If the two URLs were used via HTTP GET, the interpretation by the web
    server would probably be different. In the first case, it would interpret
    the query as one parameter, with a key of "q" and value "a+=b&c". In the
    second case, it would probably interpret it as two parameters, one with a
    key of "q" and value "a =b", and the second with a key "c" and no value.
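
    The loss of information caused by full decoding can also be seen outside
    of QUrl. The following is a minimal sketch in standard C++ with
    hypothetical helper names (\c hexValue() and \c fullyDecode()); it decodes
    every well-formed percent-encoded sequence and is not the code QUrl itself
    uses:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns the value of a hexadecimal digit, or -1 if c is not one.
static int hexValue(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    if (c >= 'A' && c <= 'F')
        return c - 'A' + 10;
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    return -1;
}

// Hypothetical helper: decodes every well-formed "%XX" sequence and copies
// anything else (including malformed sequences) through unchanged.
std::string fullyDecode(const std::string &encoded)
{
    std::string out;
    for (std::size_t i = 0; i < encoded.size(); ++i) {
        if (encoded[i] == '%' && i + 2 < encoded.size()) {
            const int hi = hexValue(encoded[i + 1]);
            const int lo = hexValue(encoded[i + 2]);
            if (hi >= 0 && lo >= 0) {
                out += char(hi * 16 + lo);
                i += 2; // skip the two hex digits just consumed
                continue;
            }
        }
        out += encoded[i];
    }
    return out;
}
```

    Both "q=a%2B%3Db%26c" and "q=a+=b&c" decode to the same string in this
    sketch, so the decoded text no longer records which delimiters were
    originally percent-encoded.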
365
366 \sa QUrl::FormattingOptions
367*/
368
369/*!
370 \enum QUrl::UserInputResolutionOption
371 \since 5.4
372
    The user input resolution options define how fromUserInput() should
    interpret strings that could either be a relative path or the short
    form of an HTTP URL. For instance, \c{file.pl} can be either a local file
    or the URL \c{http://file.pl}.
377
    \value DefaultResolution  The default resolution mechanism is to check
                              whether a local file exists, in the working
                              directory given to fromUserInput(), and only
                              return a local path in that case. Otherwise a URL
                              is assumed.
383 \value AssumeLocalFile This option makes fromUserInput() always return
384 a local path unless the input contains a scheme, such as
385 \c{http://file.pl}. This is useful for applications
386 such as text editors, which are able to create
387 the file if it doesn't exist.
388
389 \sa fromUserInput()
390*/
391
392/*!
393 \fn QUrl::QUrl(QUrl &&other)
394
395 Move-constructs a QUrl instance, making it point at the same
396 object that \a other was pointing to.
397
398 \since 5.2
399*/
400
401/*!
402 \fn QUrl &QUrl::operator=(QUrl &&other)
403
404 Move-assigns \a other to this QUrl instance.
405
406 \since 5.2
407*/
408
409#include "qurl.h"
410#include "qurl_p.h"
411#include "qplatformdefs.h"
412#include "qstring.h"
413#include "qstringlist.h"
414#include "qdebug.h"
415#include "qhash.h"
416#include "qdir.h" // for QDir::fromNativeSeparators
417#include "qdatastream.h"
418#include "private/qipaddress_p.h"
419#include "qurlquery.h"
420#include "private/qdir_p.h"
421#include <private/qmemory_p.h>
422
423QT_BEGIN_NAMESPACE
424
425// in qstring.cpp:
426void qt_from_latin1(char16_t *dst, const char *str, size_t size) noexcept;
427
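inline static bool isHex(char c)
{
    // Test digits before case folding: OR-ing 0x20 first would map the
    // control characters 0x10-0x19 onto '0'-'9'.
    if (c >= '0' && c <= '9')
        return true;
    c |= 0x20;
    return c >= 'a' && c <= 'f';
}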
433
434static inline QString ftpScheme()
435{
436 return QStringLiteral("ftp");
437}
438
439static inline QString fileScheme()
440{
441 return QStringLiteral("file");
442}
443
444static inline QString webDavScheme()
445{
446 return QStringLiteral("webdavs");
447}
448
449static inline QString webDavSslTag()
450{
451 return QStringLiteral("@SSL");
452}
453
454class QUrlPrivate
455{
456public:
457 enum Section : uchar {
458 Scheme = 0x01,
459 UserName = 0x02,
460 Password = 0x04,
461 UserInfo = UserName | Password,
462 Host = 0x08,
463 Port = 0x10,
464 Authority = UserInfo | Host | Port,
465 Path = 0x20,
466 Hierarchy = Authority | Path,
467 Query = 0x40,
468 Fragment = 0x80,
469 FullUrl = 0xff
470 };
471
472 enum Flags : uchar {
473 IsLocalFile = 0x01
474 };
475
476 enum ErrorCode {
477 // the high byte of the error code matches the Section
478 // the first item in each value must be the generic "Invalid xxx Error"
479 InvalidSchemeError = Scheme << 8,
480
481 InvalidUserNameError = UserName << 8,
482
483 InvalidPasswordError = Password << 8,
484
485 InvalidRegNameError = Host << 8,
486 InvalidIPv4AddressError,
487 InvalidIPv6AddressError,
488 InvalidCharacterInIPv6Error,
489 InvalidIPvFutureError,
490 HostMissingEndBracket,
491
492 InvalidPortError = Port << 8,
493 PortEmptyError,
494
495 InvalidPathError = Path << 8,
496
497 InvalidQueryError = Query << 8,
498
499 InvalidFragmentError = Fragment << 8,
500
501 // the following three cases are only possible in combination with
502 // presence/absence of the path, authority and scheme. See validityError().
503 AuthorityPresentAndPathIsRelative = Authority << 8 | Path << 8 | 0x10000,
504 AuthorityAbsentAndPathIsDoubleSlash,
505 RelativeUrlPathContainsColonBeforeSlash = Scheme << 8 | Authority << 8 | Path << 8 | 0x10000,
506
507 NoError = 0
508 };
509
510 struct Error {
511 QString source;
512 ErrorCode code;
513 int position;
514 };
515
516 QUrlPrivate();
517 QUrlPrivate(const QUrlPrivate &copy);
518 ~QUrlPrivate();
519
520 void parse(const QString &url, QUrl::ParsingMode parsingMode);
521 bool isEmpty() const
522 { return sectionIsPresent == 0 && port == -1 && path.isEmpty(); }
523
524 std::unique_ptr<Error> cloneError() const;
525 void clearError();
526 void setError(ErrorCode errorCode, const QString &source, int supplement = -1);
527 ErrorCode validityError(QString *source = nullptr, int *position = nullptr) const;
528 bool validateComponent(Section section, const QString &input, int begin, int end);
529 bool validateComponent(Section section, const QString &input)
530 { return validateComponent(section, input, 0, uint(input.length())); }
531
532 // no QString scheme() const;
533 void appendAuthority(QString &appendTo, QUrl::FormattingOptions options, Section appendingTo) const;
534 void appendUserInfo(QString &appendTo, QUrl::FormattingOptions options, Section appendingTo) const;
535 void appendUserName(QString &appendTo, QUrl::FormattingOptions options) const;
536 void appendPassword(QString &appendTo, QUrl::FormattingOptions options) const;
537 void appendHost(QString &appendTo, QUrl::FormattingOptions options) const;
538 void appendPath(QString &appendTo, QUrl::FormattingOptions options, Section appendingTo) const;
539 void appendQuery(QString &appendTo, QUrl::FormattingOptions options, Section appendingTo) const;
540 void appendFragment(QString &appendTo, QUrl::FormattingOptions options, Section appendingTo) const;
541
542 // the "end" parameters are like STL iterators: they point to one past the last valid element
543 bool setScheme(const QString &value, int len, bool doSetError);
544 void setAuthority(const QString &auth, int from, int end, QUrl::ParsingMode mode);
545 void setUserInfo(const QString &userInfo, int from, int end);
546 void setUserName(const QString &value, int from, int end);
547 void setPassword(const QString &value, int from, int end);
548 bool setHost(const QString &value, int from, int end, QUrl::ParsingMode mode);
549 void setPath(const QString &value, int from, int end);
550 void setQuery(const QString &value, int from, int end);
551 void setFragment(const QString &value, int from, int end);
552
553 inline bool hasScheme() const { return sectionIsPresent & Scheme; }
554 inline bool hasAuthority() const { return sectionIsPresent & Authority; }
555 inline bool hasUserInfo() const { return sectionIsPresent & UserInfo; }
556 inline bool hasUserName() const { return sectionIsPresent & UserName; }
557 inline bool hasPassword() const { return sectionIsPresent & Password; }
558 inline bool hasHost() const { return sectionIsPresent & Host; }
559 inline bool hasPort() const { return port != -1; }
560 inline bool hasPath() const { return !path.isEmpty(); }
561 inline bool hasQuery() const { return sectionIsPresent & Query; }
562 inline bool hasFragment() const { return sectionIsPresent & Fragment; }
563
564 inline bool isLocalFile() const { return flags & IsLocalFile; }
565 QString toLocalFile(QUrl::FormattingOptions options) const;
566
567 QString mergePaths(const QString &relativePath) const;
568
569 QAtomicInt ref;
570 int port;
571
572 QString scheme;
573 QString userName;
574 QString password;
575 QString host;
576 QString path;
577 QString query;
578 QString fragment;
579
580 std::unique_ptr<Error> error;
581
582 // not used for:
583 // - Port (port == -1 means absence)
584 // - Path (there's no path delimiter, so we optimize its use out of existence)
585 // Schemes are never supposed to be empty, but we keep the flag anyway
586 uchar sectionIsPresent;
587 uchar flags;
588
589 // 32-bit: 2 bytes tail padding available
590 // 64-bit: 6 bytes tail padding available
591};
592
593inline QUrlPrivate::QUrlPrivate()
594 : ref(1), port(-1),
595 sectionIsPresent(0),
596 flags(0)
597{
598}
599
600inline QUrlPrivate::QUrlPrivate(const QUrlPrivate &copy)
601 : ref(1), port(copy.port),
602 scheme(copy.scheme),
603 userName(copy.userName),
604 password(copy.password),
605 host(copy.host),
606 path(copy.path),
607 query(copy.query),
608 fragment(copy.fragment),
609 error(copy.cloneError()),
610 sectionIsPresent(copy.sectionIsPresent),
611 flags(copy.flags)
612{
613}
614
615inline QUrlPrivate::~QUrlPrivate()
616 = default;
617
618std::unique_ptr<QUrlPrivate::Error> QUrlPrivate::cloneError() const
619{
620 return error ? qt_make_unique<Error>(*error) : nullptr;
621}
622
623inline void QUrlPrivate::clearError()
624{
625 error.reset();
626}
627
628inline void QUrlPrivate::setError(ErrorCode errorCode, const QString &source, int supplement)
629{
630 if (error) {
631 // don't overwrite an error set in a previous section during parsing
632 return;
633 }
634 error = qt_make_unique<Error>();
635 error->code = errorCode;
636 error->source = source;
637 error->position = supplement;
638}
639
640// From RFC 3986, Appendix A Collected ABNF for URI
641// URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
642//[...]
643// scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
644//
645// authority = [ userinfo "@" ] host [ ":" port ]
646// userinfo = *( unreserved / pct-encoded / sub-delims / ":" )
647// host = IP-literal / IPv4address / reg-name
648// port = *DIGIT
649//[...]
650// reg-name = *( unreserved / pct-encoded / sub-delims )
651//[..]
652// pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
653//
654// query = *( pchar / "/" / "?" )
655//
656// fragment = *( pchar / "/" / "?" )
657//
658// pct-encoded = "%" HEXDIG HEXDIG
659//
660// unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
661// reserved = gen-delims / sub-delims
662// gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@"
663// sub-delims = "!" / "$" / "&" / "'" / "(" / ")"
664// / "*" / "+" / "," / ";" / "="
665// the path component has a complex ABNF that basically boils down to
666// slash-separated segments of "pchar"
667
// The above is the strict definition of the URL components and we mostly
// adhere to it, with a few exceptions. QUrl obeys the following behavior:
670// - percent-encoding sequences always use uppercase HEXDIG;
671// - unreserved characters are *always* decoded, no exceptions;
672// - the space character and bytes with the high bit set are controlled by
673// the EncodeSpaces and EncodeUnicode bits;
674// - control characters, the percent sign itself, and bytes with the high
675// bit set that don't form valid UTF-8 sequences are always encoded,
676// except in FullyDecoded mode;
677// - sub-delims are always left alone, except in FullyDecoded mode;
//  - gen-delims change behavior depending on which section of the URL (or
//    the entire URL) we're looking at; see below;
//  - characters not mentioned above, like "<", and ">", are usually
//    decoded in individual sections of the URL, but encoded when the full
//    URL is put together (this may change depending on the subjective
//    definition of "pretty").
684//
685// The behavior for the delimiters bears some explanation. The spec says in
686// section 2.2:
687// URIs that differ in the replacement of a reserved character with its
688// corresponding percent-encoded octet are not equivalent.
689// (note: QUrl API mistakenly uses the "reserved" term, so we will refer to
690// them here as "delimiters").
691//
692// For that reason, we cannot encode delimiters found in decoded form and we
693// cannot decode the ones found in encoded form if that would change the
694// interpretation. Conversely, we *can* perform the transformation if it would
695// not change the interpretation. From the last component of a URL to the first,
696// here are the gen-delims we can unambiguously transform when the field is
697// taken in isolation:
698// - fragment: none, since it's the last
699// - query: "#" is unambiguous
700// - path: "#" and "?" are unambiguous
701// - host: completely special but never ambiguous, see setHost() below.
702// - password: the "#", "?", "/", "[", "]" and "@" characters are unambiguous
703// - username: the "#", "?", "/", "[", "]", "@", and ":" characters are unambiguous
704// - scheme: doesn't accept any delimiter, see setScheme() below.
705//
// Internally, QUrl stores each component in the format that corresponds to the
// default mode (PrettyDecoded). It deviates from the "strict" FullyEncoded
// mode in the following ways:
709// - spaces are decoded
710// - valid UTF-8 sequences are decoded
711// - gen-delims that can be unambiguously transformed are decoded
712// - characters controlled by DecodeReserved are often decoded, though this behavior
713// can change depending on the subjective definition of "pretty"
714//
715// Note that the list of gen-delims that we can transform is different for the
716// user info (user name + password) and the authority (user info + host +
717// port).
718
719
720// list the recoding table modifications to be used with the recodeFromUser and
721// appendToUser functions, according to the rules above. Spaces and UTF-8
722// sequences are handled outside the tables.
723
724// the encodedXXX tables are run with the delimiters set to "leave" by default;
725// the decodedXXX tables are run with the delimiters set to "decode" by default
726// (except for the query, which doesn't use these functions)
727
728namespace {
729template <typename T> constexpr ushort decode(T x) noexcept { return ushort(x); }
730template <typename T> constexpr ushort leave(T x) noexcept { return ushort(0x100 | x); }
731template <typename T> constexpr ushort encode(T x) noexcept { return ushort(0x200 | x); }
732}
733
734static const ushort userNameInIsolation[] = {
735 decode(':'), // 0
736 decode('@'), // 1
737 decode(']'), // 2
738 decode('['), // 3
739 decode('/'), // 4
740 decode('?'), // 5
741 decode('#'), // 6
742
743 decode('"'), // 7
744 decode('<'),
745 decode('>'),
746 decode('^'),
747 decode('\\'),
748 decode('|'),
749 decode('{'),
750 decode('}'),
751 0
752};
753static const ushort * const passwordInIsolation = userNameInIsolation + 1;
754static const ushort * const pathInIsolation = userNameInIsolation + 5;
755static const ushort * const queryInIsolation = userNameInIsolation + 6;
756static const ushort * const fragmentInIsolation = userNameInIsolation + 7;
757
758static const ushort userNameInUserInfo[] = {
759 encode(':'), // 0
760 decode('@'), // 1
761 decode(']'), // 2
762 decode('['), // 3
763 decode('/'), // 4
764 decode('?'), // 5
765 decode('#'), // 6
766
767 decode('"'), // 7
768 decode('<'),
769 decode('>'),
770 decode('^'),
771 decode('\\'),
772 decode('|'),
773 decode('{'),
774 decode('}'),
775 0
776};
777static const ushort * const passwordInUserInfo = userNameInUserInfo + 1;
778
779static const ushort userNameInAuthority[] = {
780 encode(':'), // 0
781 encode('@'), // 1
782 encode(']'), // 2
783 encode('['), // 3
784 decode('/'), // 4
785 decode('?'), // 5
786 decode('#'), // 6
787
788 decode('"'), // 7
789 decode('<'),
790 decode('>'),
791 decode('^'),
792 decode('\\'),
793 decode('|'),
794 decode('{'),
795 decode('}'),
796 0
797};
static const ushort * const passwordInAuthority = userNameInAuthority + 1;

static const ushort userNameInUrl[] = {
    encode(':'), // 0
    encode('@'), // 1
    encode(']'), // 2
    encode('['), // 3
    encode('/'), // 4
    encode('?'), // 5
    encode('#'), // 6

    // no need to list encode(x) for the other characters
    0
};
static const ushort * const passwordInUrl = userNameInUrl + 1;
static const ushort * const pathInUrl = userNameInUrl + 5;
static const ushort * const queryInUrl = userNameInUrl + 6;
static const ushort * const fragmentInUrl = userNameInUrl + 6;

static inline void parseDecodedComponent(QString &data)
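
/*
    Illustration only, not compiled as part of this file. A minimal,
    Qt-free sketch of how the action tables above can be read. It assumes
    that decode(x) stores the bare character (its definition sits just
    before this excerpt) and that a consumer scans a table until the
    terminating 0. The names Action, decodeCh, encodeCh, userNameTable,
    passwordTable and lookupAction are hypothetical; the real consumer is
    qt_urlRecode(), whose internals are not shown here and may differ.

```cpp
// Action codes live in the high byte, the character in the low byte.
enum Action { Decode = 0x000, Leave = 0x100, Encode = 0x200, NoEntry = -1 };

constexpr unsigned short decodeCh(char c) { return static_cast<unsigned short>(c); }
constexpr unsigned short encodeCh(char c) { return static_cast<unsigned short>(0x200 | c); }

// A shortened, zero-terminated table in the style of userNameInUserInfo.
static const unsigned short userNameTable[] = {
    encodeCh(':'), // 0
    decodeCh('@'), // 1
    decodeCh('/'), // 2
    0
};
// The password table reuses the same storage, skipping the ':' entry.
static const unsigned short *const passwordTable = userNameTable + 1;

// Returns the action stored for c, or NoEntry if the table does not list it.
int lookupAction(const unsigned short *table, char c)
{
    for (; *table != 0; ++table) {
        if ((*table & 0xFF) == static_cast<unsigned char>(c))
            return *table & 0xFF00;
    }
    return NoEntry;
}
```

    Because passwordTable starts one entry into userNameTable, the ':'
    entry is simply absent for passwords. This mirrors how
    passwordInIsolation is defined as userNameInIsolation + 1.
*/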
{
    data.replace(QLatin1Char('%'), QLatin1String("%25"));
}

static inline QString
recodeFromUser(const QString &input, const ushort *actions, int from, int to)
{
    QString output;
    const QChar *begin = input.constData() + from;
    const QChar *end = input.constData() + to;
    if (qt_urlRecode(output, QStringView{begin, end}, {}, actions))
        return output;

    return input.mid(from, to - from);
}

// appendXXXX functions: copy from the internal form to the external, user form.
// the internal value is stored in its PrettyDecoded form, so that case is easy.
static inline void appendToUser(QString &appendTo, QStringView value, QUrl::FormattingOptions options,
                                const ushort *actions)
{
    // Test ComponentFormattingOptions, ignore FormattingOptions.
    if ((options & 0xFFFF0000) == QUrl::PrettyDecoded) {
        appendTo += value;
        return;
    }

    if (!qt_urlRecode(appendTo, value, options, actions))
        appendTo += value;
}

inline void QUrlPrivate::appendAuthority(QString &appendTo, QUrl::FormattingOptions options, Section appendingTo) const
{
    if ((options & QUrl::RemoveUserInfo) != QUrl::RemoveUserInfo) {
        appendUserInfo(appendTo, options, appendingTo);

        // add '@' only if we added anything
        if (hasUserName() || (hasPassword() && (options & QUrl::RemovePassword) == 0))
            appendTo += QLatin1Char('@');
    }
    appendHost(appendTo, options);
    if (!(options & QUrl::RemovePort) && port != -1)
        appendTo += QLatin1Char(':') + QString::number(port);
}

inline void QUrlPrivate::appendUserInfo(QString &appendTo, QUrl::FormattingOptions options, Section appendingTo) const
{
    if (Q_LIKELY(!hasUserInfo()))
        return;

    const ushort *userNameActions;
    const ushort *passwordActions;
    if (options & QUrl::EncodeDelimiters) {
        userNameActions = userNameInUrl;
        passwordActions = passwordInUrl;
    } else {
        switch (appendingTo) {
        case UserInfo:
            userNameActions = userNameInUserInfo;
            passwordActions = passwordInUserInfo;
            break;

        case Authority:
            userNameActions = userNameInAuthority;
            passwordActions = passwordInAuthority;
            break;

        case FullUrl:
            userNameActions = userNameInUrl;
            passwordActions = passwordInUrl;
            break;

        default:
            // can't happen
            Q_UNREACHABLE();
            break;
        }
    }

    if (!qt_urlRecode(appendTo, userName, options, userNameActions))
        appendTo += userName;
    if (options & QUrl::RemovePassword || !hasPassword()) {
        return;
    } else {
        appendTo += QLatin1Char(':');
        if (!qt_urlRecode(appendTo, password, options, passwordActions))
            appendTo += password;
    }
}

inline void QUrlPrivate::appendUserName(QString &appendTo, QUrl::FormattingOptions options) const
{
    // only called from QUrl::userName()
    appendToUser(appendTo, userName, options,
                 options & QUrl::EncodeDelimiters ? userNameInUrl : userNameInIsolation);
}

inline void QUrlPrivate::appendPassword(QString &appendTo, QUrl::FormattingOptions options) const
{
    // only called from QUrl::password()
    appendToUser(appendTo, password, options,
                 options & QUrl::EncodeDelimiters ? passwordInUrl : passwordInIsolation);
}

inline void QUrlPrivate::appendPath(QString &appendTo, QUrl::FormattingOptions options, Section appendingTo) const
{
    QString thePath = path;
    if (options & QUrl::NormalizePathSegments) {
        thePath = qt_normalizePathSegments(path, isLocalFile() ? QDirPrivate::DefaultNormalization : QDirPrivate::RemotePath);
    }

    QStringView thePathView(thePath);
    if (options & QUrl::RemoveFilename) {
        const int slash = path.lastIndexOf(QLatin1Char('/'));
        if (slash == -1)
            return;
        thePathView = QStringView{path}.left(slash + 1);
    }
    // check if we need to remove trailing slashes
    if (options & QUrl::StripTrailingSlash) {
        while (thePathView.length() > 1 && thePathView.endsWith(QLatin1Char('/')))
            thePathView.chop(1);
    }

    appendToUser(appendTo, thePathView, options,
                 appendingTo == FullUrl || options & QUrl::EncodeDelimiters ? pathInUrl : pathInIsolation);
}

inline void QUrlPrivate::appendFragment(QString &appendTo, QUrl::FormattingOptions options, Section appendingTo) const
{
    appendToUser(appendTo, fragment, options,
                 options & QUrl::EncodeDelimiters ? fragmentInUrl :
                 appendingTo == FullUrl ? nullptr : fragmentInIsolation);
}

inline void QUrlPrivate::appendQuery(QString &appendTo, QUrl::FormattingOptions options, Section appendingTo) const
{
    appendToUser(appendTo, query, options,
                 appendingTo == FullUrl || options & QUrl::EncodeDelimiters ? queryInUrl : queryInIsolation);
}

// setXXX functions

inline bool QUrlPrivate::setScheme(const QString &value, int len, bool doSetError)
{
    // schemes are strictly RFC-compliant:
    //    scheme        = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
    // we also lowercase the scheme

    // schemes in URLs are not allowed to be empty, but they can be in
    // "Relative URIs" which QUrl also supports. QUrl::setScheme does
    // not call us with len == 0, so this can only be from parse()
    scheme.clear();
    if (len == 0)
        return false;

    sectionIsPresent |= Scheme;

    // validate it:
    int needsLowercasing = -1;
    const ushort *p = value.utf16();
    for (int i = 0; i < len; ++i) {
        if (p[i] >= 'a' && p[i] <= 'z')
            continue;
        if (p[i] >= 'A' && p[i] <= 'Z') {
            needsLowercasing = i;
            continue;
        }
        if (i) {
            if (p[i] >= '0' && p[i] <= '9')
                continue;
            if (p[i] == '+' || p[i] == '-' || p[i] == '.')
                continue;
        }

        // found something else
        // don't call setError needlessly:
        // if we've been called from parse(), it will try to recover
        if (doSetError)
            setError(InvalidSchemeError, value, i);
        return false;
    }

    scheme = value.left(len);

    if (needsLowercasing != -1) {
        // schemes are ASCII only, so we don't need the full Unicode toLower
        QChar *schemeData = scheme.data(); // force detaching here
        for (int i = needsLowercasing; i >= 0; --i) {
            ushort c = schemeData[i].unicode();
            if (c >= 'A' && c <= 'Z')
                schemeData[i] = QChar(c + 0x20);
        }
    }

    // did we set to the file protocol?
    if (scheme == fileScheme()
#ifdef Q_OS_WIN
        || scheme == webDavScheme()
#endif
       ) {
        flags |= IsLocalFile;
    } else {
        flags &= ~IsLocalFile;
    }
    return true;
}

inline void QUrlPrivate::setAuthority(const QString &auth, int from, int end, QUrl::ParsingMode mode)
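
/*
    Illustration only, not compiled as part of this file. A minimal sketch
    of the scheme rule enforced by setScheme() above, written against
    std::string instead of QString so that it stands alone. The function
    name normalizeScheme and the convention of returning an empty string
    on failure are assumptions made for this example only.

```cpp
#include <cstddef>
#include <string>

// Returns the lowercased scheme if s matches
//   ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
// and an empty string otherwise.
std::string normalizeScheme(const std::string &s)
{
    std::string out = s;
    for (std::size_t i = 0; i < out.size(); ++i) {
        const char c = out[i];
        if (c >= 'a' && c <= 'z')
            continue;
        if (c >= 'A' && c <= 'Z') {
            out[i] = static_cast<char>(c + 0x20); // ASCII-only lowercasing
            continue;
        }
        // digits and "+", "-", "." are valid everywhere except position 0
        if (i > 0 && ((c >= '0' && c <= '9') || c == '+' || c == '-' || c == '.'))
            continue;
        return std::string(); // invalid character
    }
    return out;
}
```

    Unlike setScheme(), the sketch lowercases while validating rather than
    in a second pass; the accepted set of strings is intended to be the same.
*/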
{
    sectionIsPresent &= ~Authority;
    sectionIsPresent |= Host;
    port = -1;

    // we never actually _loop_
    while (from != end) {
        int userInfoIndex = auth.indexOf(QLatin1Char('@'), from);
        if (uint(userInfoIndex) < uint(end)) {
            setUserInfo(auth, from, userInfoIndex);
            if (mode == QUrl::StrictMode && !validateComponent(UserInfo, auth, from, userInfoIndex))
                break;
            from = userInfoIndex + 1;
        }

        int colonIndex = auth.lastIndexOf(QLatin1Char(':'), end - 1);
        if (colonIndex < from)
            colonIndex = -1;

        if (uint(colonIndex) < uint(end)) {
            if (auth.at(from).unicode() == '[') {
                // check if colonIndex isn't inside the "[...]" part
                int closingBracket = auth.indexOf(QLatin1Char(']'), from);
                if (uint(closingBracket) > uint(colonIndex))
                    colonIndex = -1;
            }
        }

        if (uint(colonIndex) < uint(end) - 1) {
            // found a colon with digits after it
            unsigned long x = 0;
            for (int i = colonIndex + 1; i < end; ++i) {
                ushort c = auth.at(i).unicode();
                if (c >= '0' && c <= '9') {
                    x *= 10;
                    x += c - '0';
                } else {
                    x = ulong(-1); // x != ushort(x)
                    break;
                }
            }
            if (x == ushort(x)) {
                port = ushort(x);
            } else {
                setError(InvalidPortError, auth, colonIndex + 1);
                if (mode == QUrl::StrictMode)
                    break;
            }
        }

        setHost(auth, from, qMin<uint>(end, colonIndex), mode);
        if (mode == QUrl::StrictMode && !validateComponent(Host, auth, from, qMin<uint>(end, colonIndex))) {
            // clear host too
            sectionIsPresent &= ~Authority;
            break;
        }

        // success
        return;
    }
    // clear all sections but host
    sectionIsPresent &= ~Authority | Host;
    userName.clear();
    password.clear();
    host.clear();
    port = -1;
}

inline void QUrlPrivate::setUserInfo(const QString &userInfo, int from, int end)
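
/*
    Illustration only, not compiled as part of this file. A minimal sketch
    of the port check performed inside setAuthority() above: only decimal
    digits are accepted and the value must fit in 16 bits. The name
    parsePort and the use of -1 for "no valid port" are assumptions made
    for this example. The sketch also bails out as soon as the value
    exceeds 0xFFFF, which is a choice of the sketch and not something the
    code above does.

```cpp
#include <string>

// Returns the port number in [0, 65535], or -1 if digits is empty,
// contains a non-digit, or is out of range.
int parsePort(const std::string &digits)
{
    if (digits.empty())
        return -1;
    unsigned long x = 0;
    for (char c : digits) {
        if (c < '0' || c > '9')
            return -1;
        x = x * 10 + static_cast<unsigned long>(c - '0');
        if (x > 0xFFFFul)
            return -1; // stop early so x can never overflow
    }
    return static_cast<int>(x);
}
```
*/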
{
    int delimIndex = userInfo.indexOf(QLatin1Char(':'), from);
    setUserName(userInfo, from, qMin<uint>(delimIndex, end));

    if (uint(delimIndex) >= uint(end)) {
        password.clear();
        sectionIsPresent &= ~Password;
    } else {
        setPassword(userInfo, delimIndex + 1, end);
    }
}

inline void QUrlPrivate::setUserName(const QString &value, int from, int end)
{
    sectionIsPresent |= UserName;
    userName = recodeFromUser(value, userNameInIsolation, from, end);
}

inline void QUrlPrivate::setPassword(const QString &value, int from, int end)
{
    sectionIsPresent |= Password;
    password = recodeFromUser(value, passwordInIsolation, from, end);
}

inline void QUrlPrivate::setPath(const QString &value, int from, int end)
{
    // sectionIsPresent |= Path; // not used, save some cycles
    path = recodeFromUser(value, pathInIsolation, from, end);
}

inline void QUrlPrivate::setFragment(const QString &value, int from, int end)
{
    sectionIsPresent |= Fragment;
    fragment = recodeFromUser(value, fragmentInIsolation, from, end);
}

inline void QUrlPrivate::setQuery(const QString &value, int from, int iend)
{
    sectionIsPresent |= Query;
    query = recodeFromUser(value, queryInIsolation, from, iend);
}

// Host handling
// The RFC says the host is:
//    host          = IP-literal / IPv4address / reg-name
//    IP-literal    = "[" ( IPv6address / IPvFuture  ) "]"
//    IPvFuture     = "v" 1*HEXDIG "." 1*( unreserved / sub-delims / ":" )
//  [a strict definition of IPv6Address and IPv4Address]
//     reg-name      = *( unreserved / pct-encoded / sub-delims )
//
// We deviate from the standard in all but IPvFuture. For IPvFuture we accept
// and store only exactly what the RFC says we should. No percent-encoding is
// permitted in this field, so Unicode characters and space aren't either.
//
// For IPv4 addresses, we accept broken addresses like inet_aton does (that is,
// fewer than three dots). However, we correct the address to the proper form
// and store the corrected address. After correction, we comply with the RFC
// and the address is composed exclusively of unreserved characters.
//
// For IPv6 addresses, we accept addresses including trailing (embedded) IPv4
// addresses, the so-called v4-compat and v4-mapped addresses. We also store
// those addresses like that in the hostname field, which violates the spec.
// IPv6 hosts are stored with the square brackets in the QString. They also
// require no transformation of any kind.
//
// As for registered names, it's the other way around: we accept only valid
// hostnames as specified by STD 3 and IDNA. That means everything we accept is
// valid in the RFC definition above, but there are many valid reg-names
// according to the RFC that we do not accept in the name of security. Since we
// do accept IDNA, reg-names are subject to ACE encoding and decoding, which is
// specified by the DecodeUnicode flag. The hostname is stored in its Unicode form.

inline void QUrlPrivate::appendHost(QString &appendTo, QUrl::FormattingOptions options) const
{
    if (host.isEmpty())
        return;
    if (host.at(0).unicode() == '[') {
        // IPv6 addresses might contain a zone-id which needs to be recoded
        if (options != 0)
            if (qt_urlRecode(appendTo, host, options, nullptr))
                return;
        appendTo += host;
    } else {
        // this is either an IPv4Address or a reg-name
        // if it is a reg-name, it is already stored in Unicode form
        if (options & QUrl::EncodeUnicode && !(options & 0x4000000))
            appendTo += qt_ACE_do(host, ToAceOnly, AllowLeadingDot);
        else
            appendTo += host;
    }
}

// the whole IPvFuture is passed and parsed here, including brackets;
// returns null if the parsing was successful, or the QChar of the first failure
static const QChar *parseIpFuture(QString &host, const QChar *begin, const QChar *end, QUrl::ParsingMode mode)
{
    //    IPvFuture     = "v" 1*HEXDIG "." 1*( unreserved / sub-delims / ":" )
    static const char acceptable[] =
            "!$&'()*+,;=" // sub-delims
            ":"           // ":"
            "-._~";       // unreserved

    // the brackets and the "v" have been checked
    const QChar *const origBegin = begin;
    if (begin[3].unicode() != '.')
        return &begin[3];
    if ((begin[2].unicode() >= 'A' && begin[2].unicode() <= 'F') ||
            (begin[2].unicode() >= 'a' && begin[2].unicode() <= 'f') ||
            (begin[2].unicode() >= '0' && begin[2].unicode() <= '9')) {
        // this is so unlikely that we'll just go down the slow path
        // decode the whole string, skipping the "[vH." and "]" which we already know to be there
        host += QStringView(begin, 4);

        // uppercase the version, if necessary
        if (begin[2].unicode() >= 'a')
            host[host.length() - 2] = QChar{begin[2].unicode() - 0x20};

        begin += 4;
        --end;

        QString decoded;
        if (mode == QUrl::TolerantMode && qt_urlRecode(decoded, QStringView{begin, end}, QUrl::FullyDecoded, nullptr)) {
            begin = decoded.constBegin();
            end = decoded.constEnd();
        }

        for ( ; begin != end; ++begin) {
            if (begin->unicode() >= 'A' && begin->unicode() <= 'Z')
                host += *begin;
            else if (begin->unicode() >= 'a' && begin->unicode() <= 'z')
                host += *begin;
            else if (begin->unicode() >= '0' && begin->unicode() <= '9')
                host += *begin;
            else if (begin->unicode() < 0x80 && strchr(acceptable, begin->unicode()) != nullptr)
                host += *begin;
            else
                return decoded.isEmpty() ? begin : &origBegin[2];
        }
        host += QLatin1Char(']');
        return nullptr;
    }
    return &origBegin[2];
}

// ONLY the IPv6 address is parsed here, WITHOUT the brackets
static const QChar *parseIp6(QString &host, const QChar *begin, const QChar *end, QUrl::ParsingMode mode)
{
    // ### Update to use QStringView once QStringView::indexOf and QStringView::lastIndexOf exist
    QString decoded;
    if (mode == QUrl::TolerantMode) {
        // this struct is kept in automatic storage because it's only 4 bytes
        const ushort decodeColon[] = { decode(':'), 0 };
        if (qt_urlRecode(decoded, QStringView{begin, end}, QUrl::ComponentFormattingOption::PrettyDecoded, decodeColon) == 0)
            decoded = QString(begin, end-begin);
    } else {
        decoded = QString(begin, end-begin);
    }

    const QLatin1String zoneIdIdentifier("%25");
    QIPAddressUtils::IPv6Address address;
    QString zoneId;

    const QChar *endBeforeZoneId = decoded.constEnd();

    int zoneIdPosition = decoded.indexOf(zoneIdIdentifier);
    if ((zoneIdPosition != -1) && (decoded.lastIndexOf(zoneIdIdentifier) == zoneIdPosition)) {
        zoneId = decoded.mid(zoneIdPosition + zoneIdIdentifier.size());
        endBeforeZoneId = decoded.constBegin() + zoneIdPosition;

        if (zoneId.isEmpty())
            return end;
    }

    const QChar *ret = QIPAddressUtils::parseIp6(address, decoded.constBegin(), endBeforeZoneId);
    if (ret)
        return begin + (ret - decoded.constBegin());

    host.reserve(host.size() + (decoded.constEnd() - decoded.constBegin()));
    host += QLatin1Char('[');
    QIPAddressUtils::toString(host, address);

    if (!zoneId.isEmpty()) {
        host += zoneIdIdentifier;
        host += zoneId;
    }
    host += QLatin1Char(']');
    return nullptr;
}

inline bool QUrlPrivate::setHost(const QString &value, int from, int iend, QUrl::ParsingMode mode)
{
    const QChar *begin = value.constData() + from;
    const QChar *end = value.constData() + iend;

    const int len = end - begin;
    host.clear();
    sectionIsPresent |= Host;
    if (len == 0)
        return true;

    if (begin[0].unicode() == '[') {
        // IPv6Address or IPvFuture
        // smallest IPv6 address is      "[::]"   (len = 4)
        // smallest IPvFuture address is "[v7.X]" (len = 6)
        if (end[-1].unicode() != ']') {
            setError(HostMissingEndBracket, value);
            return false;
        }

        if (len > 5 && begin[1].unicode() == 'v') {
            const QChar *c = parseIpFuture(host, begin, end, mode);
            if (c)
                setError(InvalidIPvFutureError, value, c - value.constData());
            return !c;
        } else if (begin[1].unicode() == 'v') {
            setError(InvalidIPvFutureError, value, from);
        }

        const QChar *c = parseIp6(host, begin + 1, end - 1, mode);
        if (!c)
            return true;

        if (c == end - 1)
            setError(InvalidIPv6AddressError, value, from);
        else
            setError(InvalidCharacterInIPv6Error, value, c - value.constData());
        return false;
    }

    // check if it's an IPv4 address
    QIPAddressUtils::IPv4Address ip4;
    if (QIPAddressUtils::parseIp4(ip4, begin, end)) {
        // yes, it was
        QIPAddressUtils::toString(host, ip4);
        return true;
    }

    // This is probably a reg-name.
    // But it can also be an encoded string that, when decoded becomes one
    // of the types above.
    //
    // Two types of encoding are possible:
    //  percent encoding (e.g., "%31%30%2E%30%2E%30%2E%31" -> "10.0.0.1")
    //  Unicode encoding (some non-ASCII characters case-fold to digits
    //                    when nameprepping is done)
    //
    // The qt_ACE_do function below applies nameprepping and the STD3 check.
    // That means a Unicode string may become an IPv4 address, but it cannot
    // produce a '[' or a '%'.

    // check for percent-encoding first
    QString s;
    if (mode == QUrl::TolerantMode && qt_urlRecode(s, QStringView{begin, end}, { }, nullptr)) {
        // something was decoded
        // anything encoded left?
        int pos = s.indexOf(QChar(0x25)); // '%'
        if (pos != -1) {
            setError(InvalidRegNameError, s, pos);
            return false;
        }

        // recurse
        return setHost(s, 0, s.length(), QUrl::StrictMode);
    }

    s = qt_ACE_do(QStringView(begin, len), NormalizeAce, ForbidLeadingDot);
    if (s.isEmpty()) {
        setError(InvalidRegNameError, value);
        return false;
    }

    // check IPv4 again
    if (QIPAddressUtils::parseIp4(ip4, s.constBegin(), s.constEnd())) {
        QIPAddressUtils::toString(host, ip4);
    } else {
        host = s;
    }
    return true;
}

inline void QUrlPrivate::parse(const QString &url, QUrl::ParsingMode parsingMode)
{
    //   URI-reference = URI / relative-ref
    //   URI           = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
    //   relative-ref  = relative-part [ "?" query ] [ "#" fragment ]
    //   hier-part     = "//" authority path-abempty
    //                 /  other path types
    //   relative-part = "//" authority path-abempty
    //                 /  other path types here

    sectionIsPresent = 0;
    flags = 0;
    clearError();

    // find the important delimiters
    int colon = -1;
    int question = -1;
    int hash = -1;
    const int len = url.length();
    const QChar *const begin = url.constData();
    const ushort *const data = reinterpret_cast<const ushort *>(begin);

    for (int i = 0; i < len; ++i) {
        uint uc = data[i];
        if (uc == '#' && hash == -1) {
            hash = i;

            // nothing more to be found
            break;
        }

        if (question == -1) {
            if (uc == ':' && colon == -1)
                colon = i;
            else if (uc == '?')
                question = i;
        }
    }

    // check if we have a scheme
    int hierStart;
    if (colon != -1 && setScheme(url, colon, /* don't set error */ false)) {
        hierStart = colon + 1;
    } else {
        // recover from a failed scheme: it might not have been a scheme at all
        scheme.clear();
        sectionIsPresent = 0;
        hierStart = 0;
    }

    int pathStart;
    int hierEnd = qMin<uint>(qMin<uint>(question, hash), len);
    if (hierEnd - hierStart >= 2 && data[hierStart] == '/' && data[hierStart + 1] == '/') {
        // we have an authority, it ends at the first slash after these
        int authorityEnd = hierEnd;
        for (int i = hierStart + 2; i < authorityEnd ; ++i) {
            if (data[i] == '/') {
                authorityEnd = i;
                break;
            }
        }

        setAuthority(url, hierStart + 2, authorityEnd, parsingMode);

        // even if we failed to set the authority properly, let's try to recover
        pathStart = authorityEnd;
        setPath(url, pathStart, hierEnd);
    } else {
        userName.clear();
        password.clear();
        host.clear();
        port = -1;
        pathStart = hierStart;

        if (hierStart < hierEnd)
            setPath(url, hierStart, hierEnd);
        else
            path.clear();
    }

    if (uint(question) < uint(hash))
        setQuery(url, question + 1, qMin<uint>(hash, len));

    if (hash != -1)
        setFragment(url, hash + 1, len);

    if (error || parsingMode == QUrl::TolerantMode)
        return;

    // The parsing so far was partially tolerant of errors, except for the
    // scheme parser (which is always strict) and the authority (which was
    // executed in strict mode).
    // If we haven't found any errors so far, continue the strict-mode parsing
    // from the path component onwards.

    if (!validateComponent(Path, url, pathStart, hierEnd))
        return;
    if (uint(question) < uint(hash) && !validateComponent(Query, url, question + 1, qMin<uint>(hash, len)))
        return;
    if (hash != -1)
        validateComponent(Fragment, url, hash + 1, len);
}

QString QUrlPrivate::toLocalFile(QUrl::FormattingOptions options) const
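
/*
    Illustration only, not compiled as part of this file. A minimal sketch
    of the single-pass delimiter scan at the top of parse() above, using
    std::string instead of QString. The struct Delimiters and the function
    findDelimiters are hypothetical names introduced for this example.

```cpp
#include <string>

// Positions of the first ':', '?' and '#', or -1 when absent.
struct Delimiters {
    int colon = -1;
    int question = -1;
    int hash = -1;
};

Delimiters findDelimiters(const std::string &url)
{
    Delimiters d;
    for (int i = 0; i < static_cast<int>(url.size()); ++i) {
        const char c = url[i];
        if (c == '#') {
            d.hash = i;
            break; // everything after '#' is fragment, nothing more to find
        }
        if (d.question == -1) {
            // ':' only counts before the first '?'
            if (c == ':' && d.colon == -1)
                d.colon = i;
            else if (c == '?')
                d.question = i;
        }
    }
    return d;
}
```

    A ':' or '?' that appears after the first '?' or '#' is deliberately
    ignored, since by then it belongs to the query or fragment.
*/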
{
    QString tmp;
    QString ourPath;
    appendPath(ourPath, options, QUrlPrivate::Path);

    // magic for shared drive on windows
    if (!host.isEmpty()) {
        tmp = QLatin1String("//") + host;
#ifdef Q_OS_WIN // QTBUG-42346, WebDAV is visible as local file on Windows only.
        if (scheme == webDavScheme())
            tmp += webDavSslTag();
#endif
        if (!ourPath.isEmpty() && !ourPath.startsWith(QLatin1Char('/')))
            tmp += QLatin1Char('/');
        tmp += ourPath;
    } else {
        tmp = ourPath;
#ifdef Q_OS_WIN
        // magic for drives on windows
        if (ourPath.length() > 2 && ourPath.at(0) == QLatin1Char('/') && ourPath.at(2) == QLatin1Char(':'))
            tmp.remove(0, 1);
#endif
    }
    return tmp;
}

/*
    From http://www.ietf.org/rfc/rfc3986.txt, 5.2.3: Merge paths

    Returns a merge of the current path with the relative path passed
    as argument.

    Note: \a relativePath is relative (does not start with '/').
*/
inline QString QUrlPrivate::mergePaths(const QString &relativePath) const
{
    // If the base URI has a defined authority component and an empty
    // path, then return a string consisting of "/" concatenated with
    // the reference's path; otherwise,
    if (!host.isEmpty() && path.isEmpty())
        return QLatin1Char('/') + relativePath;

    // Return a string consisting of the reference's path component
    // appended to all but the last segment of the base URI's path
    // (i.e., excluding any characters after the right-most "/" in the
    // base URI path, or excluding the entire base URI path if it does
    // not contain any "/" characters).
    QString newPath;
    if (!path.contains(QLatin1Char('/')))
        newPath = relativePath;
    else
        newPath = QStringView{path}.left(path.lastIndexOf(QLatin1Char('/')) + 1) + relativePath;

    return newPath;
}

/*
    From http://www.ietf.org/rfc/rfc3986.txt, 5.2.4: Remove dot segments

    Removes unnecessary ../ and ./ from the path. Used for normalizing
    the URL.
*/
static void removeDotsFromPath(QString *path)
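
/*
    Illustration only, not compiled as part of this file. A minimal sketch
    of the merge rule from RFC 3986 section 5.2.3 as implemented by
    mergePaths() above, using std::string. The free function mergePathsSketch
    and its explicit baseHasAuthority parameter are assumptions of this
    example; the member function above uses a non-empty host as its test.

```cpp
#include <cstddef>
#include <string>

std::string mergePathsSketch(bool baseHasAuthority, const std::string &basePath,
                             const std::string &relativePath)
{
    // Authority present and empty base path: result is "/" + reference path.
    if (baseHasAuthority && basePath.empty())
        return "/" + relativePath;

    // Otherwise keep everything up to and including the right-most "/" of
    // the base path, or nothing at all if the base path has no "/".
    const std::size_t slash = basePath.rfind('/');
    if (slash == std::string::npos)
        return relativePath;
    return basePath.substr(0, slash + 1) + relativePath;
}
```
*/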
{
    // The input buffer is initialized with the now-appended path
    // components and the output buffer is initialized to the empty
    // string.
    QChar *out = path->data();
    const QChar *in = out;
    const QChar *end = out + path->size();

    // If the input buffer consists only of
    // "." or "..", then remove that from the input
    // buffer;
    if (path->size() == 1 && in[0].unicode() == '.')
        ++in;
    else if (path->size() == 2 && in[0].unicode() == '.' && in[1].unicode() == '.')
        in += 2;
    // While the input buffer is not empty, loop:
    while (in < end) {

        // otherwise, if the input buffer begins with a prefix of "../" or "./",
        // then remove that prefix from the input buffer;
        if (path->size() >= 2 && in[0].unicode() == '.' && in[1].unicode() == '/')
            in += 2;
        else if (path->size() >= 3 && in[0].unicode() == '.'
                 && in[1].unicode() == '.' && in[2].unicode() == '/')
            in += 3;

        // otherwise, if the input buffer begins with a prefix of
        // "/./" or "/.", where "." is a complete path segment,
        // then replace that prefix with "/" in the input buffer;
        if (in <= end - 3 && in[0].unicode() == '/' && in[1].unicode() == '.'
                && in[2].unicode() == '/') {
            in += 2;
            continue;
        } else if (in == end - 2 && in[0].unicode() == '/' && in[1].unicode() == '.') {
            *out++ = QLatin1Char('/');
            in += 2;
            break;
        }

        // otherwise, if the input buffer begins with a prefix
        // of "/../" or "/..", where ".." is a complete path
        // segment, then replace that prefix with "/" in the
        // input buffer and remove the last segment and its
        // preceding "/" (if any) from the output buffer;
        if (in <= end - 4 && in[0].unicode() == '/' && in[1].unicode() == '.'
                && in[2].unicode() == '.' && in[3].unicode() == '/') {
            while (out > path->constData() && (--out)->unicode() != '/')
                ;
            if (out == path->constData() && out->unicode() != '/')
                ++in;
            in += 3;
            continue;
        } else if (in == end - 3 && in[0].unicode() == '/' && in[1].unicode() == '.'
                   && in[2].unicode() == '.') {
            while (out > path->constData() && (--out)->unicode() != '/')
                ;
            if (out->unicode() == '/')
                ++out;
            in += 3;
            break;
        }

        // otherwise move the first path segment in
        // the input buffer to the end of the output
        // buffer, including the initial "/" character
        // (if any) and any subsequent characters up
        // to, but not including, the next "/"
        // character or the end of the input buffer.
        *out++ = *in++;
        while (in < end && in->unicode() != '/')
            *out++ = *in++;
    }
    path->truncate(out - path->constData());
}

inline QUrlPrivate::ErrorCode QUrlPrivate::validityError(QString *source, int *position) const
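
/*
    Illustration only, not compiled as part of this file. A minimal sketch
    of the "remove dot segments" algorithm as worded in RFC 3986 section
    5.2.4, with separate input and output buffers held in std::string.
    It is not a transcription of removeDotsFromPath() above, which works
    in place on a single QString buffer and may differ in corner cases.
    The names startsWith and removeDotSegments are hypothetical.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

static bool startsWith(const std::string &s, const char *prefix)
{
    return s.compare(0, std::strlen(prefix), prefix) == 0;
}

std::string removeDotSegments(std::string in)
{
    std::string out;
    while (!in.empty()) {
        if (startsWith(in, "../")) {                 // step A
            in.erase(0, 3);
        } else if (startsWith(in, "./")) {           // step A
            in.erase(0, 2);
        } else if (startsWith(in, "/./")) {          // step B: "/./" becomes "/"
            in.erase(0, 2);
        } else if (in == "/.") {                     // step B: "/." becomes "/"
            in = "/";
        } else if (startsWith(in, "/../") || in == "/..") {   // step C
            in = (in == "/..") ? std::string("/") : in.substr(3);
            // drop the last output segment together with its leading "/"
            const std::size_t slash = out.rfind('/');
            out.erase(slash == std::string::npos ? 0 : slash);
        } else if (in == "." || in == "..") {        // step D
            in.clear();
        } else {                                     // step E: move one segment
            const std::size_t next = in.find('/', 1);
            out += in.substr(0, next);
            in.erase(0, next);
        }
    }
    return out;
}
```

    The two inputs used as examples in the RFC, "/a/b/c/./../../g" and
    "mid/content=5/../6", are a convenient first check for any variant of
    this routine.
*/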
{
    Q_ASSERT(!source == !position);
    if (error) {
        if (source) {
            *source = error->source;
            *position = error->position;
        }
        return error->code;
    }

    // There are three more cases of invalid URLs that QUrl recognizes and they
    // are only possible with constructed URLs (setXXX methods), not with
    // parsing. Therefore, they are tested here.
    //
    // Two cases are a non-empty path that doesn't start with a slash and:
    //  - with an authority
    //  - without an authority, without scheme but the path with a colon before
    //    the first slash
    // The third case is an empty authority and a non-empty path that starts
    // with "//".
    // Those cases are considered invalid because toString() would produce a URL
    // that wouldn't be parsed back to the same QUrl.

    if (path.isEmpty())
        return NoError;
    if (path.at(0) == QLatin1Char('/')) {
        if (hasAuthority() || path.length() == 1 || path.at(1) != QLatin1Char('/'))
            return NoError;
        if (source) {
            *source = path;
            *position = 0;
        }
        return AuthorityAbsentAndPathIsDoubleSlash;
    }

    if (sectionIsPresent & QUrlPrivate::Host) {
        if (source) {
            *source = path;
            *position = 0;
        }
        return AuthorityPresentAndPathIsRelative;
    }
    if (sectionIsPresent & QUrlPrivate::Scheme)
        return NoError;

    // check for a path of "text:text/"
    for (int i = 0; i < path.length(); ++i) {
        ushort c = path.at(i).unicode();
        if (c == '/') {
            // found the slash before the colon
            return NoError;
        }
        if (c == ':') {
            // found the colon before the slash, it's invalid
            if (source) {
                *source = path;
                *position = i;
            }
            return RelativeUrlPathContainsColonBeforeSlash;
        }
    }
    return NoError;
}

bool QUrlPrivate::validateComponent(QUrlPrivate::Section section, const QString &input,
                                    int begin, int end)
{
    // What we need to look out for, that the regular parser tolerates:
    //  - percent signs not followed by two hex digits
    //  - forbidden characters, which should always appear encoded
    //    '"' / '<' / '>' / '\' / '^' / '`' / '{' / '|' / '}' / DEL
    //    control characters
    //  - delimiters not allowed in certain positions
    //    . scheme: parser is already strict
    //    . user info: gen-delims except ":" disallowed ("/" / "?" / "#" / "[" / "]" / "@")
    //    . host: parser is stricter than the standard
    //    . port: parser is stricter than the standard
    //    . path: all delimiters allowed
    //    . fragment: all delimiters allowed
    //    . query: all delimiters allowed
    static const char forbidden[] = "\"<>\\^`{|}\x7F";
    static const char forbiddenUserInfo[] = ":/?#[]@";

    Q_ASSERT(section != Authority && section != Hierarchy && section != FullUrl);

    const ushort *const data = reinterpret_cast<const ushort *>(input.constData());
    for (uint i = uint(begin); i < uint(end); ++i) {
        uint uc = data[i];
        if (uc >= 0x80)
            continue;

        bool error = false;
        if ((uc == '%' && (uint(end) < i + 2 || !isHex(data[i + 1]) || !isHex(data[i + 2])))
                || uc <= 0x20 || strchr(forbidden, uc)) {
            // found an error
            error = true;
        } else if (section & UserInfo) {
            if (section == UserInfo && strchr(forbiddenUserInfo + 1, uc))
                error = true;
            else if (section != UserInfo && strchr(forbiddenUserInfo, uc))
                error = true;
        }

        if (!error)
            continue;

        ErrorCode errorCode = ErrorCode(int(section) << 8);
        if (section == UserInfo) {
            // is it the user name or the password?
            errorCode = InvalidUserNameError;
            for (uint j = uint(begin); j < i; ++j)
                if (data[j] == ':') {
                    errorCode = InvalidPasswordError;
                    break;
                }
        }

        setError(errorCode, input, i);
        return false;
    }

    // no errors
    return true;
}

#if 0
inline void QUrlPrivate::validate() const
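
/*
    Illustration only, not compiled as part of this file. A minimal sketch
    of the first strict-mode rule listed in validateComponent() above: a
    percent sign must be followed by two hexadecimal digits. It works on a
    whole std::string rather than a [begin, end) range of a QString, and
    the names isHexDigit and firstBadPercent are hypothetical.

```cpp
#include <cstddef>
#include <string>

static bool isHexDigit(char c)
{
    return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
}

// Returns the index of the first '%' that is not followed by two hex
// digits, or -1 if every percent sign starts a well-formed escape.
int firstBadPercent(const std::string &s)
{
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (s[i] != '%')
            continue;
        if (i + 2 >= s.size() || !isHexDigit(s[i + 1]) || !isHexDigit(s[i + 2]))
            return static_cast<int>(i);
    }
    return -1;
}
```

    Returning the offending index mirrors how the function above passes the
    position of the error to setError().
*/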
1746{
1747 QUrlPrivate *that = (QUrlPrivate *)this;
1748 that->encodedOriginal = that->toEncoded(); // may detach
1749 parse(ParseOnly);
1750
1751 QURL_SETFLAG(that->stateFlags, Validated);
1752
1753 if (!isValid)
1754 return;
1755
1756 QString auth = authority(); // causes the non-encoded forms to be valid
1757
1758 // authority() calls canonicalHost() which sets this
1759 if (!isHostValid)
1760 return;
1761
1762 if (scheme == QLatin1String("mailto")) {
1763 if (!host.isEmpty() || port != -1 || !userName.isEmpty() || !password.isEmpty()) {
1764 that->isValid = false;
1765 that->errorInfo.setParams(0, QT_TRANSLATE_NOOP(QUrl, "expected empty host, username, "
1766 "port and password"),
1767 0, 0);
1768 }
1769 } else if (scheme == ftpScheme() || scheme == httpScheme()) {
1770 if (host.isEmpty() && !(path.isEmpty() && encodedPath.isEmpty())) {
1771 that->isValid = false;
1772 that->errorInfo.setParams(0, QT_TRANSLATE_NOOP(QUrl, "the host is empty, but not the path"),
1773 0, 0);
1774 }
1775 }
1776}
1777#endif
1778
1779/*!
1780 \macro QT_NO_URL_CAST_FROM_STRING
1781 \relates QUrl
1782
1783 Disables automatic conversions from QString (or char *) to QUrl.
1784
1785 Compiling your code with this define is useful when you have a lot of
1786 code that uses QString for file names and you wish to convert it to
1787 use QUrl for network transparency. In any code that uses QUrl, it can
1788 help avoid missing QUrl::resolved() calls, and other misuses of
1789 QString to QUrl conversions.
1790
1791 \oldcode
1792 url = filename; // probably not what you want
1793 \newcode
1794 url = QUrl::fromLocalFile(filename);
1795 url = baseurl.resolved(QUrl(filename));
1796 \endcode
1797
1798 \sa QT_NO_CAST_FROM_ASCII
1799*/
1800
1801
1802/*!
1803 Constructs a URL by parsing \a url. QUrl will automatically percent encode
1804 all characters that are not allowed in a URL and decode the percent-encoded
1805 sequences that represent an unreserved character (letters, digits, hyphens,
1806 underscores, dots and tildes). All other characters are left in their
1807 original forms.
1808
1809 Parses the \a url using the parser mode \a parsingMode. In TolerantMode
1810 (the default), QUrl will correct certain mistakes, notably the presence of
1811 a percent character ('%') not followed by two hexadecimal digits, and it
1812 will accept any character in any position. In StrictMode, encoding mistakes
1813 will not be tolerated and QUrl will also check that certain forbidden
1814 characters are not present in unencoded form. If an error is detected in
1815 StrictMode, isValid() will return false. The parsing mode DecodedMode is not
1816 permitted in this context.
1817
1818 Example:
1819
1820 \snippet code/src_corelib_io_qurl.cpp 0
1821
1822 To construct a URL from an encoded string, you can also use fromEncoded():
1823
1824 \snippet code/src_corelib_io_qurl.cpp 1
1825
1826 Both functions are equivalent and, in Qt 5, both functions accept encoded
1827 data. Usually, the choice of the QUrl constructor or setUrl() versus
1828 fromEncoded() will depend on the source data: the constructor and setUrl()
1829 take a QString, whereas fromEncoded() takes a QByteArray.
1830
1831 \sa setUrl(), fromEncoded(), TolerantMode
1832*/
1833QUrl::QUrl(const QString &url, ParsingMode parsingMode) : d(nullptr)
1834{
1835 setUrl(url, parsingMode);
1836}
1837
1838/*!
1839 Constructs an empty QUrl object.
1840*/
1841QUrl::QUrl() : d(nullptr)
1842{
1843}
1844
1845/*!
1846 Constructs a copy of \a other.
1847*/
1848QUrl::QUrl(const QUrl &other) : d(other.d)
1849{
1850 if (d)
1851 d->ref.ref();
1852}
1853
1854/*!
1855 Destructor; called immediately before the object is deleted.
1856*/
1857QUrl::~QUrl()
1858{
1859 if (d && !d->ref.deref())
1860 delete d;
1861}
1862
1863/*!
1864 Returns \c true if the URL is non-empty and valid; otherwise returns \c false.
1865
1866 The URL is run through a conformance test. Every part of the URL
1867 must conform to the standard encoding rules of the URI standard
1868 for the URL to be reported as valid.
1869
1870 \snippet code/src_corelib_io_qurl.cpp 2
1871*/
1872bool QUrl::isValid() const
1873{
1874 if (isEmpty()) {
1875 // also catches d == nullptr
1876 return false;
1877 }
1878 return d->validityError() == QUrlPrivate::NoError;
1879}
1880
1881/*!
1882 Returns \c true if the URL has no data; otherwise returns \c false.
1883
1884 \sa clear()
1885*/
1886bool QUrl::isEmpty() const
1887{
1888 if (!d) return true;
1889 return d->isEmpty();
1890}
1891
1892/*!
1893 Resets the content of the QUrl. After calling this function, the
1894 QUrl is equal to one that has been constructed with the default
1895 empty constructor.
1896
1897 \sa isEmpty()
1898*/
1899void QUrl::clear()
1900{
1901 if (d && !d->ref.deref())
1902 delete d;
1903 d = nullptr;
1904}
1905
1906/*!
1907 Parses \a url and sets this object to that value. QUrl will automatically
1908 percent encode all characters that are not allowed in a URL and decode the
1909 percent-encoded sequences that represent an unreserved character (letters,
1910 digits, hyphens, underscores, dots and tildes). All other characters are
1911 left in their original forms.
1912
1913 Parses the \a url using the parser mode \a parsingMode. In TolerantMode
1914 (the default), QUrl will correct certain mistakes, notably the presence of
1915 a percent character ('%') not followed by two hexadecimal digits, and it
1916 will accept any character in any position. In StrictMode, encoding mistakes
1917 will not be tolerated and QUrl will also check that certain forbidden
1918 characters are not present in unencoded form. If an error is detected in
1919 StrictMode, isValid() will return false. The parsing mode DecodedMode is
1920 not permitted in this context and will produce a run-time warning.
1921
1922 \sa url(), toString()
1923*/
1924void QUrl::setUrl(const QString &url, ParsingMode parsingMode)
1925{
1926 if (parsingMode == DecodedMode) {
1927 qWarning("QUrl: QUrl::DecodedMode is not permitted when parsing a full URL");
1928 } else {
1929 detach();
1930 d->parse(url, parsingMode);
1931 }
1932}
1933
1934/*!
1935 Sets the scheme of the URL to \a scheme. As a scheme can only
1936 contain ASCII characters, no conversion or decoding is done on the
1937 input. It must also start with an ASCII letter.
1938
1939 The scheme describes the type (or protocol) of the URL. It's
1940 represented by one or more ASCII characters at the start of the URL.
1941
1942 A scheme is strictly \l {http://www.ietf.org/rfc/rfc3986.txt} {RFC 3986}-compliant:
1943 \tt {scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )}
1944
1945 The following example shows a URL where the scheme is "ftp":
1946
1947 \image qurl-authority2.png
1948
1949 To set the scheme, the following call is used:
1950 \snippet code/src_corelib_io_qurl.cpp 11
1951
1952 The scheme can also be empty, in which case the URL is interpreted
1953 as relative.
1954
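    As a rough illustration of the grammar above, the following standalone
    sketch checks a candidate scheme using only the standard library. It is
    not the check QUrl performs internally, and \c isValidScheme is a
    hypothetical helper name. Unlike setScheme(), it rejects the empty string
    rather than treating it as "no scheme".

    ```cpp
    #include <string>

    // Hypothetical helper: true if s matches
    //   ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
    // with ALPHA and DIGIT restricted to ASCII.
    bool isValidScheme(const std::string &s)
    {
        auto isAlpha = [](char c) {
            return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
        };
        auto isDigit = [](char c) { return c >= '0' && c <= '9'; };

        if (s.empty() || !isAlpha(s[0]))
            return false;                 // must start with an ASCII letter
        for (char c : s) {
            if (!isAlpha(c) && !isDigit(c) && c != '+' && c != '-' && c != '.')
                return false;             // character outside the grammar
        }
        return true;
    }
    ```

    With this sketch, "ftp" and "svn+ssh" are accepted, while "1http" and
    "ht tp" are rejected.
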
1955 \sa scheme(), isRelative()
1956*/
1957void QUrl::setScheme(const QString &scheme)
1958{
1959 detach();
1960 d->clearError();
1961 if (scheme.isEmpty()) {
1962 // an empty scheme makes the URL relative: drop the scheme section
1963 d->sectionIsPresent &= ~QUrlPrivate::Scheme;
1964 d->flags &= ~QUrlPrivate::IsLocalFile;
1965 d->scheme.clear();
1966 } else {
1967 d->setScheme(scheme, scheme.length(), /* do set error */ true);
1968 }
1969}
1970
1971/*!
1972 Returns the scheme of the URL. If an empty string is returned,
1973 this means the scheme is undefined and the URL is then relative.
1974
1975 The scheme can only contain US-ASCII letters, digits, '+', '-' and '.', which
1976 means it cannot contain any character that would otherwise require encoding.
1977 Additionally, schemes are always returned in lowercase form.
1978
1979 \sa setScheme(), isRelative()
1980*/
1981QString QUrl::scheme() const
1982{
1983 if (!d) return QString();
1984
1985 return d->scheme;
1986}
1987
1988/*!
1989 Sets the authority of the URL to \a authority.
1990
1991 The authority of a URL is the combination of user info, a host
1992 name and a port. All of these elements are optional; an empty
1993 authority is therefore valid.
1994
1995 The user info and host are separated by a '@', and the host and
1996 port are separated by a ':'. If the user info is empty, the '@'
1997 must be omitted, although a stray ':' is permitted if the port is
1998 empty.
1999
2000 The following example shows a valid authority string:
2001
2002 \image qurl-authority.png
2003
2004 The \a authority data is interpreted according to \a mode: in StrictMode,
2005 any '%' characters must be followed by exactly two hexadecimal characters
2006 and some characters (including space) are not allowed in undecoded form. In
2007 TolerantMode (the default), all characters are accepted in undecoded form
2008 and the tolerant parser will correct stray '%' not followed by two hex
2009 characters.
2010
2011 This function does not allow \a mode to be QUrl::DecodedMode. To set fully
2012 decoded data, call setUserName(), setPassword(), setHost() and setPort()
2013 individually.
2014
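    The layout described above (user info, '@', host, ':', port) can be
    illustrated with a small standalone splitter. This is a sketch under
    simplifying assumptions, not QUrl's parser: \c AuthorityParts and
    \c splitAuthority are hypothetical names, no validation or
    percent-decoding is performed, and the port is kept as text.

    ```cpp
    #include <cstddef>
    #include <string>

    // Hypothetical container for the three pieces of an authority.
    struct AuthorityParts {
        std::string userInfo;   // empty when there is no '@'
        std::string host;       // may be a bracketed IPv6 literal
        std::string port;       // text after the ':'; empty when absent
    };

    AuthorityParts splitAuthority(const std::string &authority)
    {
        AuthorityParts parts;
        std::string hostPort = authority;

        // Everything before the first '@' is the user info.
        const std::size_t at = authority.find('@');
        if (at != std::string::npos) {
            parts.userInfo = authority.substr(0, at);
            hostPort = authority.substr(at + 1);
        }

        // Colons inside "[...]" belong to the address, so only look for the
        // port delimiter from the closing bracket onwards.
        const std::size_t bracket = hostPort.rfind(']');
        const std::size_t from = (bracket == std::string::npos) ? 0 : bracket;
        const std::size_t colon = hostPort.find(':', from);

        if (colon != std::string::npos) {
            parts.host = hostPort.substr(0, colon);
            parts.port = hostPort.substr(colon + 1);   // may be empty
        } else {
            parts.host = hostPort;
        }
        return parts;
    }
    ```

    For "user:pw@example.com:8080" the sketch yields the user info "user:pw",
    the host "example.com" and the port "8080". A stray trailing ':' simply
    produces an empty port.
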
2015 \sa setUserInfo(), setHost(), setPort()
2016*/
2017void QUrl::setAuthority(const QString &authority, ParsingMode mode)
2018{
2019 detach();
2020 d->clearError();
2021
2022 if (mode == DecodedMode) {
2023 qWarning("QUrl::setAuthority(): QUrl::DecodedMode is not permitted in this function");
2024 return;
2025 }
2026
2027 d->setAuthority(authority, 0, authority.length(), mode);
2028 if (authority.isNull()) {
2029 // QUrlPrivate::setAuthority cleared almost everything
2030 // but it leaves the Host bit set
2031 d->sectionIsPresent &= ~QUrlPrivate::Authority;
2032 }
2033}
2034
2035/*!
2036 Returns the authority of the URL if it is defined; otherwise
2037 an empty string is returned.
2038
2039 This function returns an unambiguous value, which may contain some
2040 characters still percent-encoded, plus some control sequences not
2041 representable in decoded form in QString.
2042
2043 The \a options argument controls how to format the authority component. The
2044 value of QUrl::FullyDecoded is not permitted in this function. If you need
2045 to obtain fully decoded data, call userName(), password(), host() and
2046 port() individually.
2047
2048 \sa setAuthority(), userInfo(), userName(), password(), host(), port()
2049*/
2050QString QUrl::authority(ComponentFormattingOptions options) const
2051{
2052 QString result;
2053 if (!d)
2054 return result;
2055
2056 if (options == QUrl::FullyDecoded) {
2057 qWarning("QUrl::authority(): QUrl::FullyDecoded is not permitted in this function");
2058 return result;
2059 }
2060
2061 d->appendAuthority(result, options, QUrlPrivate::Authority);
2062 return result;
2063}
2064
2065/*!
2066 Sets the user info of the URL to \a userInfo. The user info is an
2067 optional part of the authority of the URL, as described in
2068 setAuthority().
2069
2070 The user info consists of a user name and optionally a password,
2071 separated by a ':'. If the password is empty, the colon must be
2072 omitted. The following example shows a valid user info string:
2073
2074 \image qurl-authority3.png
2075
2076 The \a userInfo data is interpreted according to \a mode: in StrictMode,
2077 any '%' characters must be followed by exactly two hexadecimal characters
2078 and some characters (including space) are not allowed in undecoded form. In
2079 TolerantMode (the default), all characters are accepted in undecoded form
2080 and the tolerant parser will correct stray '%' not followed by two hex
2081 characters.
2082
2083 This function does not allow \a mode to be QUrl::DecodedMode. To set fully
2084 decoded data, call setUserName() and setPassword() individually.
2085
2086 \sa userInfo(), setUserName(), setPassword(), setAuthority()
2087*/
2088void QUrl::setUserInfo(const QString &userInfo, ParsingMode mode)
2089{
2090 detach();
2091 d->clearError();
2092 QString trimmed = userInfo.trimmed();
2093 if (mode == DecodedMode) {
2094 qWarning("QUrl::setUserInfo(): QUrl::DecodedMode is not permitted in this function");
2095 return;
2096 }
2097
2098 d->setUserInfo(trimmed, 0, trimmed.length());
2099 if (userInfo.isNull()) {
2100 // QUrlPrivate::setUserInfo cleared almost everything
2101 // but it leaves the UserName bit set
2102 d->sectionIsPresent &= ~QUrlPrivate::UserInfo;
2103 } else if (mode == StrictMode && !d->validateComponent(QUrlPrivate::UserInfo, userInfo)) {
2104 d->sectionIsPresent &= ~QUrlPrivate::UserInfo;
2105 d->userName.clear();
2106 d->password.clear();
2107 }
2108}
2109
2110/*!
2111 Returns the user info of the URL, or an empty string if the user
2112 info is undefined.
2113
2114 This function returns an unambiguous value, which may contain some
2115 characters still percent-encoded, plus some control sequences not
2116 representable in decoded form in QString.
2117
2118 The \a options argument controls how to format the user info component. The
2119 value of QUrl::FullyDecoded is not permitted in this function. If you need
2120 to obtain fully decoded data, call userName() and password() individually.
2121
2122 \sa setUserInfo(), userName(), password(), authority()
2123*/
2124QString QUrl::userInfo(ComponentFormattingOptions options) const
2125{
2126 QString result;
2127 if (!d)
2128 return result;
2129
2130 if (options == QUrl::FullyDecoded) {
2131 qWarning("QUrl::userInfo(): QUrl::FullyDecoded is not permitted in this function");
2132 return result;
2133 }
2134
2135 d->appendUserInfo(result, options, QUrlPrivate::UserInfo);
2136 return result;
2137}
2138
2139/*!
2140 Sets the URL's user name to \a userName. The \a userName is part
2141 of the user info element in the authority of the URL, as described
2142 in setUserInfo().
2143
2144 The \a userName data is interpreted according to \a mode: in StrictMode,
2145 any '%' characters must be followed by exactly two hexadecimal characters
2146 and some characters (including space) are not allowed in undecoded form. In
2147 TolerantMode (the default), all characters are accepted in undecoded form
2148 and the tolerant parser will correct stray '%' not followed by two hex
2149 characters. In DecodedMode, a '%' stands for itself, so percent-encoded
2150 sequences cannot be expressed.
2151
2152 QUrl::DecodedMode should be used when setting the user name from a data
2153 source which is not a URL, such as a password dialog shown to the user or
2154 with a user name obtained by calling userName() with the QUrl::FullyDecoded
2155 formatting option.
2156
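    One way to make a literal '%' stand for itself is to escape it before the
    tolerant rules are applied. The sketch below shows that general idea with
    the standard library only. It is an assumption about the technique, not a
    description of how QUrl implements DecodedMode, and \c escapePercent is a
    hypothetical helper name.

    ```cpp
    #include <string>

    // Hypothetical helper: turns fully decoded text into a form in which
    // every '%' is written as "%25", so no sequence is read as an escape.
    std::string escapePercent(const std::string &decoded)
    {
        std::string out;
        out.reserve(decoded.size());
        for (char c : decoded) {
            if (c == '%')
                out += "%25";   // the percent sign itself, percent-encoded
            else
                out += c;       // every other character is copied as-is
        }
        return out;
    }
    ```

    With this sketch, a user name typed as "a%20b" becomes "a%2520b", so it
    keeps its five literal characters instead of turning into "a b".
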
2157 \sa userName(), setUserInfo()
2158*/
2159void QUrl::setUserName(const QString &userName, ParsingMode mode)
2160{
2161 detach();
2162 d->clearError();
2163
2164 QString data = userName;
2165 if (mode == DecodedMode) {
2166 parseDecodedComponent(data);
2167 mode = TolerantMode;
2168 }
2169
2170 d->setUserName(data, 0, data.length());
2171 if (userName.isNull())
2172 d->sectionIsPresent &= ~QUrlPrivate::UserName;
2173 else if (mode == StrictMode && !d->validateComponent(QUrlPrivate::UserName, userName))
2174 d->userName.clear();
2175}
2176
2177/*!
2178 Returns the user name of the URL if it is defined; otherwise
2179 an empty string is returned.
2180
2181 The \a options argument controls how to format the user name component. All
2182 values produce an unambiguous result. With QUrl::FullyDecoded, all
2183 percent-encoded sequences are decoded; otherwise, the returned value may
2184 contain some percent-encoded sequences for some control sequences not
2185 representable in decoded form in QString.
2186
2187 Note that QUrl::FullyDecoded may cause data loss if those non-representable
2188 sequences are present. It is recommended to use that value when the result
2189 will be used in a non-URL context, such as setting it on a QAuthenticator or
2190 negotiating a login.
2191
2192 \sa setUserName(), userInfo()
2193*/
2194QString QUrl::userName(ComponentFormattingOptions options) const
2195{
2196 QString result;
2197 if (d)
2198 d->appendUserName(result, options);
2199 return result;
2200}
2201
2202/*!
2203 Sets the URL's password to \a password. The \a password is part of
2204 the user info element in the authority of the URL, as described in
2205 setUserInfo().
2206
2207 The \a password data is interpreted according to \a mode: in StrictMode,
2208 any '%' characters must be followed by exactly two hexadecimal characters
2209 and some characters (including space) are not allowed in undecoded form. In
2210 TolerantMode, all characters are accepted in undecoded form and the
2211 tolerant parser will correct stray '%' not followed by two hex characters.
2212 In DecodedMode, a '%' stands for itself, so percent-encoded sequences cannot
2213 be expressed.
2214
2215 QUrl::DecodedMode should be used when setting the password from a data
2216 source which is not a URL, such as a password dialog shown to the user or
2217 with a password obtained by calling password() with the QUrl::FullyDecoded
2218 formatting option.
2219
2220 \sa password(), setUserInfo()
2221*/
2222void QUrl::setPassword(const QString &password, ParsingMode mode)
2223{
2224 detach();
2225 d->clearError();
2226
2227 QString data = password;
2228 if (mode == DecodedMode) {
2229 parseDecodedComponent(data);
2230 mode = TolerantMode;
2231 }
2232
2233 d->setPassword(data, 0, data.length());
2234 if (password.isNull())
2235 d->sectionIsPresent &= ~QUrlPrivate::Password;
2236 else if (mode == StrictMode && !d->validateComponent(QUrlPrivate::Password, password))
2237 d->password.clear();
2238}
2239
2240/*!
2241 Returns the password of the URL if it is defined; otherwise
2242 an empty string is returned.
2243
2244 The \a options argument controls how to format the password component. All
2245 values produce an unambiguous result. With QUrl::FullyDecoded, all
2246 percent-encoded sequences are decoded; otherwise, the returned value may
2247 contain some percent-encoded sequences for some control sequences not
2248 representable in decoded form in QString.
2249
2250 Note that QUrl::FullyDecoded may cause data loss if those non-representable
2251 sequences are present. It is recommended to use that value when the result
2252 will be used in a non-URL context, such as setting it on a QAuthenticator or
2253 negotiating a login.
2254
2255 \sa setPassword()
2256*/
2257QString QUrl::password(ComponentFormattingOptions options) const
2258{
2259 QString result;
2260 if (d)
2261 d->appendPassword(result, options);
2262 return result;
2263}
2264
2265/*!
2266 Sets the host of the URL to \a host. The host is part of the
2267 authority.
2268
2269 The \a host data is interpreted according to \a mode: in StrictMode,
2270 any '%' characters must be followed by exactly two hexadecimal characters
2271 and some characters (including space) are not allowed in undecoded form. In
2272 TolerantMode, all characters are accepted in undecoded form and the
2273 tolerant parser will correct stray '%' not followed by two hex characters.
2274 In DecodedMode, a '%' stands for itself, so percent-encoded sequences cannot
2275 be expressed.
2276
2277 Note that, in all cases, the result of the parsing must be a valid hostname
2278 according to STD 3 rules, as modified by the Internationalized Resource
2279 Identifiers specification (RFC 3987). Invalid hostnames are not permitted
2280 and will cause isValid() to become false.
2281
2282 \sa host(), setAuthority()
2283*/
2284void QUrl::setHost(const QString &host, ParsingMode mode)
2285{
2286 detach();
2287 d->clearError();
2288
2289 QString data = host;
2290 if (mode == DecodedMode) {
2291 parseDecodedComponent(data);
2292 mode = TolerantMode;
2293 }
2294
2295 if (d->setHost(data, 0, data.length(), mode)) {
2296 if (host.isNull())
2297 d->sectionIsPresent &= ~QUrlPrivate::Host;
2298 } else if (!data.startsWith(QLatin1Char('['))) {
2299 // setHost failed, it might be IPv6 or IPvFuture in need of bracketing
2300 Q_ASSERT(d->error);
2301
2302 data.prepend(QLatin1Char('['));
2303 data.append(QLatin1Char(']'));
2304 if (!d->setHost(data, 0, data.length(), mode)) {
2305 // failed again
2306 if (data.contains(QLatin1Char(':'))) {
2307 // source data contains ':', so it's an IPv6 error
2308 d->error->code = QUrlPrivate::InvalidIPv6AddressError;
2309 }
2310 } else {
2311 // succeeded
2312 d->clearError();
2313 }
2314 }
2315}
2316
2317/*!
2318 Returns the host of the URL if it is defined; otherwise
2319 an empty string is returned.
2320
2321 The \a options argument controls how the hostname will be formatted. The
2322 QUrl::EncodeUnicode option will cause this function to return the hostname
2323 in the ASCII-Compatible Encoding (ACE) form, which is suitable for use in
2324 channels that are not 8-bit clean or that require the legacy hostname (such
2325 as DNS requests or in HTTP request headers). If that flag is not present,
2326 this function returns the International Domain Name (IDN) in Unicode form,
2327 according to the list of permissible top-level domains (see
2328 idnWhitelist()).
2329
2330 All other flags are ignored. Host names cannot contain control or percent
2331 characters, so the returned value can be considered fully decoded.
2332
2333 \sa setHost(), idnWhitelist(), setIdnWhitelist(), authority()
2334*/
2335QString QUrl::host(ComponentFormattingOptions options) const
2336{
2337 QString result;
2338 if (d) {
2339 d->appendHost(result, options);
2340 if (result.startsWith(QLatin1Char('[')))
2341 result = result.mid(1, result.length() - 2);
2342 }
2343 return result;
2344}
2345
2346/*!
2347 Sets the port of the URL to \a port. The port is part of the
2348 authority of the URL, as described in setAuthority().
2349
2350 \a port must be between 0 and 65535 inclusive. Setting the
2351 port to -1 indicates that the port is unspecified.
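
    The range rule above, together with the fallback used by port(), can be
    sketched as two small free functions. The names \c normalizePort and
    \c effectivePort are hypothetical. The sketch only restates the documented
    behavior and does not record an error the way setPort() does.

    ```cpp
    // Hypothetical helper: out-of-range values collapse to -1 (unspecified).
    int normalizePort(int port)
    {
        return (port < -1 || port > 65535) ? -1 : port;
    }

    // Hypothetical helper: an unspecified port falls back to the default.
    int effectivePort(int storedPort, int defaultPort)
    {
        return storedPort == -1 ? defaultPort : storedPort;
    }
    ```

    For instance, a stored value of 70000 normalizes to -1, and looking it up
    with a default of 80 then yields 80.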
2352*/
2353void QUrl::setPort(int port)
2354{
2355 detach();
2356 d->clearError();
2357
2358 if (port < -1 || port > 65535) {
2359 d->setError(QUrlPrivate::InvalidPortError, QString::number(port), 0);
2360 port = -1;
2361 }
2362
2363 d->port = port;
2364 if (port != -1)
2365 d->sectionIsPresent |= QUrlPrivate::Host;
2366}
2367
2368/*!
2369 \since 4.1
2370
2371 Returns the port of the URL, or \a defaultPort if the port is
2372 unspecified.
2373
2374 Example:
2375
2376 \snippet code/src_corelib_io_qurl.cpp 3
2377*/
2378int QUrl::port(int defaultPort) const
2379{
2380 if (!d) return defaultPort;
2381 return d->port == -1 ? defaultPort : d->port;
2382}
2383
2384/*!
2385 Sets the path of the URL to \a path. The path is the part of the
2386 URL that comes after the authority but before the query string.
2387
2388 \image qurl-ftppath.png
2389
2390 For non-hierarchical schemes, the path will be everything
2391 following the scheme declaration, as in the following example:
2392
2393 \image qurl-mailtopath.png
2394
2395 The \a path data is interpreted according to \a mode: in StrictMode,
2396 any '%' characters must be followed by exactly two hexadecimal characters
2397 and some characters (including space) are not allowed in undecoded form. In
2398 TolerantMode, all characters are accepted in undecoded form and the
2399 tolerant parser will correct stray '%' not followed by two hex characters.
2400 In DecodedMode, a '%' stands for itself, so percent-encoded sequences cannot
2401 be expressed.
2402
2403 QUrl::DecodedMode should be used when setting the path from a data source
2404 which is not a URL, such as a dialog shown to the user or with a path
2405 obtained by calling path() with the QUrl::FullyDecoded formatting option.
2406
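    The kind of check StrictMode applies to a path can be sketched as below.
    The forbidden set is modelled on the one in validateComponent() in this
    file. Everything else is a simplification: \c firstStrictModeError is a
    hypothetical name, the input is a byte string rather than a QString, and
    no error object is produced.

    ```cpp
    #include <cstddef>
    #include <string>

    namespace {
    bool isHexDigit(char c)
    {
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f')
            || (c >= 'A' && c <= 'F');
    }
    }

    // Hypothetical helper: index of the first character a strict check
    // would reject, or -1 if the whole string passes.
    int firstStrictModeError(const std::string &s)
    {
        static const std::string forbidden = "\"<>\\^`{|}\x7F";
        for (std::size_t i = 0; i < s.size(); ++i) {
            const unsigned char uc = static_cast<unsigned char>(s[i]);
            if (uc >= 0x80)
                continue;   // non-ASCII bytes are not examined here

            // A '%' must be followed by two hexadecimal digits.
            const bool badPercent = uc == '%'
                && (i + 2 >= s.size() || !isHexDigit(s[i + 1])
                    || !isHexDigit(s[i + 2]));

            // Space, control characters and the forbidden set are rejected.
            if (badPercent || uc <= 0x20
                || forbidden.find(static_cast<char>(uc)) != std::string::npos)
                return static_cast<int>(i);
        }
        return -1;
    }
    ```

    With this sketch, "/a%20b" passes, while "/a b", "/a%2" and "/a|b" are all
    rejected at index 2.
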
2407 \sa path()
2408*/
2409void QUrl::setPath(const QString &path, ParsingMode mode)
2410{
2411 detach();
2412 d->clearError();
2413
2414 QString data = path;
2415 if (mode == DecodedMode) {
2416 parseDecodedComponent(data);
2417 mode = TolerantMode;
2418 }
2419
2420 d->setPath(data, 0, data.length());
2421
2422 // optimized out, since there is no path delimiter
2423// if (path.isNull())
2424// d->sectionIsPresent &= ~QUrlPrivate::Path;
2425// else
2426 if (mode == StrictMode && !d->validateComponent(QUrlPrivate::Path, path))
2427 d->path.clear();
2428}
2429
2430/*!
2431 Returns the path of the URL.
2432
2433 \snippet code/src_corelib_io_qurl.cpp 12
2434
2435 The \a options argument controls how to format the path component. All
2436 values produce an unambiguous result. With QUrl::FullyDecoded, all
2437 percent-encoded sequences are decoded; otherwise, the returned value may
2438 contain some percent-encoded sequences for some control sequences not
2439 representable in decoded form in QString.
2440
2441 Note that QUrl::FullyDecoded may cause data loss if those non-representable
2442 sequences are present. It is recommended to use that value when the result
2443 will be used in a non-URL context, such as sending to an FTP server.
2444
2445 An example of data loss is when you have non-Unicode percent-encoded sequences
2446 and use FullyDecoded (the default):
2447
2448 \snippet code/src_corelib_io_qurl.cpp 13
2449
2450 In this example, there will be some level of data loss because the \c %FF cannot
2451 be converted.
2452
2453 Data loss can also occur when the path contains sub-delimiters (such as \c +):
2454
2455 \snippet code/src_corelib_io_qurl.cpp 14
2456
2457 Other decoding examples:
2458
2459 \snippet code/src_corelib_io_qurl.cpp 15
2460
2461 \sa setPath()
2462*/
2463QString QUrl::path(ComponentFormattingOptions options) const
2464{
2465 QString result;
2466 if (d)
2467 d->appendPath(result, options, QUrlPrivate::Path);
2468 return result;
2469}
2470
2471/*!
2472 \since 5.2
2473
2474 Returns the name of the file, excluding the directory path.
2475
2476 Note that, if this QUrl object is given a path ending in a slash, the name of the file is considered empty.
2477
2478 If the path doesn't contain any slash, it is fully returned as the fileName.
2479
2480 Example:
2481
2482 \snippet code/src_corelib_io_qurl.cpp 7
2483
2484 The \a options argument controls how to format the file name component. All
2485 values produce an unambiguous result. With QUrl::FullyDecoded, all
2486 percent-encoded sequences are decoded; otherwise, the returned value may
2487 contain some percent-encoded sequences for some control sequences not
2488 representable in decoded form in QString.
2489
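    The two rules above (a trailing slash gives an empty name, and a path
    without any slash is returned whole) can be sketched without Qt types.
    \c fileNameOf is a hypothetical name and operates on an already formatted
    path string.

    ```cpp
    #include <cstddef>
    #include <string>

    // Hypothetical helper: everything after the last '/', or the whole
    // path when it contains no '/'.
    std::string fileNameOf(const std::string &path)
    {
        const std::size_t slash = path.rfind('/');
        if (slash == std::string::npos)
            return path;                 // no directory part at all
        return path.substr(slash + 1);   // empty if the path ends in '/'
    }
    ```

    For example, "/home/user/file.txt" gives "file.txt", "/home/user/" gives
    an empty string, and "file.txt" is returned unchanged.
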
2490 \sa path()
2491*/
2492QString QUrl::fileName(ComponentFormattingOptions options) const
2493{
2494 const QString ourPath = path(options);
2495 const int slash = ourPath.lastIndexOf(QLatin1Char('/'));
2496 if (slash == -1)
2497 return ourPath;
2498 return ourPath.mid(slash + 1);
2499}
2500
2501/*!
2502 \since 4.2
2503
2504 Returns \c true if this URL contains a query (that is, if a '?' was seen in it).
2505
2506 \sa setQuery(), query(), hasFragment()
2507*/
2508bool QUrl::hasQuery() const
2509{
2510 if (!d) return false;
2511 return d->hasQuery();
2512}
2513
2514/*!
2515 Sets the query string of the URL to \a query.
2516
2517 This function is useful if you need to pass a query string that
2518 does not fit into the key-value pattern, or that uses a different
2519 scheme for encoding special characters than what is suggested by
2520 QUrl.
2521
2522 Passing a value of QString() to \a query (a null QString) unsets
2523 the query completely. However, passing a value of QString("")
2524 will set the query to an empty value, as if the original URL
2525 had a lone "?".
2526
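    The difference between an unset query and an empty one can be modelled
    with \c std::optional (C++17). The sketch below is only an analogy for the
    null versus empty QString distinction. \c withQuery is a hypothetical
    helper and the URL is treated as plain text.

    ```cpp
    #include <optional>
    #include <string>

    // Hypothetical helper: appends the query to a URL given as text.
    // An absent query adds nothing; a present but empty one adds a lone "?".
    std::string withQuery(const std::string &base,
                          const std::optional<std::string> &query)
    {
        if (!query)
            return base;             // unset: no '?' at all
        return base + "?" + *query;  // set: '?' appears even if empty
    }
    ```

    With a base of "http://example.com/p", an absent query leaves the text
    unchanged, an empty query yields "http://example.com/p?", and "a=b" yields
    "http://example.com/p?a=b".
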
2527 The \a query data is interpreted according to \a mode: in StrictMode,
2528 any '%' characters must be followed by exactly two hexadecimal characters
2529 and some characters (including space) are not allowed in undecoded form. In
2530 TolerantMode, all characters are accepted in undecoded form and the
2531 tolerant parser will correct stray '%' not followed by two hex characters.
2532 In DecodedMode, a '%' stands for itself, so percent-encoded sequences cannot
2533 be expressed.
2534
2535 Query strings often contain percent-encoded sequences, so use of
2536 DecodedMode is discouraged. One special sequence to be aware of is that of
2537 the plus character ('+'). QUrl does not convert spaces to plus characters,
2538 even though HTML forms posted by web browsers do. In order to represent an
2539 actual plus character in a query, the sequence "%2B" is usually used. This
2540 function will leave "%2B" sequences untouched in TolerantMode or
2541 StrictMode.
2542
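    When a query is assembled by hand, a literal plus sign in a value can be
    written as "%2B" before the string is handed to setQuery(). The sketch
    below is one hypothetical way to do that. \c encodeQueryValue is not a Qt
    function, and it percent-encodes every byte outside the unreserved set,
    which is more aggressive than QUrl itself requires.

    ```cpp
    #include <string>

    // Hypothetical helper: percent-encodes a single query value so that
    // '+' becomes "%2B" and a space becomes "%20".
    std::string encodeQueryValue(const std::string &value)
    {
        static const char hex[] = "0123456789ABCDEF";
        std::string out;
        for (unsigned char c : value) {
            const bool unreserved = (c >= 'a' && c <= 'z')
                || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')
                || c == '-' || c == '.' || c == '_' || c == '~';
            if (unreserved) {
                out += static_cast<char>(c);
            } else {
                out += '%';
                out += hex[c >> 4];     // high nibble
                out += hex[c & 0x0F];   // low nibble
            }
        }
        return out;
    }
    ```

    For example, the value "a+b c" is encoded as "a%2Bb%20c", which can then
    be joined with its key and passed to setQuery() in TolerantMode or
    StrictMode.
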
2543 \sa query(), hasQuery()
2544*/
2545void QUrl::setQuery(const QString &query, ParsingMode mode)
2546{
2547 detach();
2548 d->clearError();
2549
2550 QString data = query;
2551 if (mode == DecodedMode) {
2552 parseDecodedComponent(data);
2553 mode = TolerantMode;
2554 }
2555
2556 d->setQuery(data, 0, data.length());
2557 if (query.isNull())
2558 d->sectionIsPresent &= ~QUrlPrivate::Query;
2559 else if (mode == StrictMode && !d->validateComponent(QUrlPrivate::Query, query))
2560 d->query.clear();
2561}
2562
2563/*!
2564 \overload
2565 \since 5.0
2566 Sets the query string of the URL to \a query.
2567
2568 This function reconstructs the query string from the QUrlQuery object and
2569 sets it on this QUrl object. This function does not have parsing parameters
2570 because the QUrlQuery contains data that is already parsed.
2571
2572 \sa query(), hasQuery()
2573*/
2574void QUrl::setQuery(const QUrlQuery &query)
2575{
2576 detach();
2577 d->clearError();
2578
2579 // we know the data is in the right format
2580 d->query = query.toString();
2581 if (query.isEmpty())
2582 d->sectionIsPresent &= ~QUrlPrivate::Query;
2583 else
2584 d->sectionIsPresent |= QUrlPrivate::Query;
2585}
2586
2587/*!
2588 Returns the query string of the URL if there's a query string, or an empty
2589 result if not. To determine if the parsed URL contained a query string, use
2590 hasQuery().
2591
2592 The \a options argument controls how to format the query component. All
2593 values produce an unambiguous result. With QUrl::FullyDecoded, all
2594 percent-encoded sequences are decoded; otherwise, the returned value may
2595 contain some percent-encoded sequences for some control sequences not
2596 representable in decoded form in QString.
2597
2598 Note that use of QUrl::FullyDecoded in queries is discouraged, as queries
2599 often contain data that is supposed to remain percent-encoded, including
2600 the use of the "%2B" sequence to represent a plus character ('+').
2601
2602 \sa setQuery(), hasQuery()
2603*/
2604QString QUrl::query(ComponentFormattingOptions options) const
2605{
2606 QString result;
2607 if (d) {
2608 d->appendQuery(result, options, QUrlPrivate::Query);
2609 if (d->hasQuery() && result.isNull())
2610 result.detach();
2611 }
2612 return result;
2613}
2614
2615/*!
2616 Sets the fragment of the URL to \a fragment. The fragment is the
2617 last part of the URL, represented by a '#' followed by a string of
2618 characters. It is typically used in HTTP for referring to a
2619 certain link or point on a page:
2620
2621 \image qurl-fragment.png
2622
2623 The fragment is sometimes also referred to as the URL "reference".
2624
2625 Passing an argument of QString() (a null QString) will unset the fragment.
2626 Passing an argument of QString("") (an empty but not null QString) will set the
2627 fragment to an empty string (as if the original URL had a lone "#").
2628
2629 The \a fragment data is interpreted according to \a mode: in StrictMode,
2630 any '%' characters must be followed by exactly two hexadecimal characters
2631 and some characters (including space) are not allowed in undecoded form. In
2632 TolerantMode, all characters are accepted in undecoded form and the
2633 tolerant parser will correct stray '%' not followed by two hex characters.
2634 In DecodedMode, a '%' stands for itself, so percent-encoded sequences cannot
2635 be expressed.
2636
2637 QUrl::DecodedMode should be used when setting the fragment from a data
2638 source which is not a URL or with a fragment obtained by calling
2639 fragment() with the QUrl::FullyDecoded formatting option.
2640
2641 \sa fragment(), hasFragment()
2642*/
2643void QUrl::setFragment(const QString &fragment, ParsingMode mode)
2644{
2645 detach();
2646 d->clearError();
2647
2648 QString data = fragment;
2649 if (mode == DecodedMode) {
2650 parseDecodedComponent(data);
2651 mode = TolerantMode;
2652 }
2653
2654 d->setFragment(data, 0, data.length());
2655 if (fragment.isNull())
2656 d->sectionIsPresent &= ~QUrlPrivate::Fragment;
2657 else if (mode == StrictMode && !d->validateComponent(QUrlPrivate::Fragment, fragment))
2658 d->fragment.clear();
2659}
2660
2661/*!
2662 Returns the fragment of the URL. To determine if the parsed URL contained a
2663 fragment, use hasFragment().
2664
2665 The \a options argument controls how to format the fragment component. All
2666 values produce an unambiguous result. With QUrl::FullyDecoded, all
2667 percent-encoded sequences are decoded; otherwise, the returned value may
2668 contain some percent-encoded sequences for some control sequences not
2669 representable in decoded form in QString.
2670
2671 Note that QUrl::FullyDecoded may cause data loss if those non-representable
2672 sequences are present. It is recommended to use that value when the result
2673 will be used in a non-URL context.
2674
2675 \sa setFragment(), hasFragment()
2676*/
2677QString QUrl::fragment(ComponentFormattingOptions options) const
2678{
2679 QString result;
2680 if (d) {
2681 d->appendFragment(result, options, QUrlPrivate::Fragment);
2682 if (d->hasFragment() && result.isNull())
2683 result.detach();
2684 }
2685 return result;
2686}
2687
2688/*!
2689 \since 4.2
2690
2691 Returns \c true if this URL contains a fragment (i.e., if # was seen on it).
2692
2693 \sa fragment(), setFragment()
2694*/
2695bool QUrl::hasFragment() const
2696{
2697 if (!d) return false;
2698 return d->hasFragment();
2699}
2700
2701/*!
2702 Returns the result of the merge of this URL with \a relative. This
2703 URL is used as a base to convert \a relative to an absolute URL.
2704
2705 If \a relative is not a relative URL, this function will return \a
2706 relative directly. Otherwise, the paths of the two URLs are
2707 merged, and the new URL returned has the scheme and authority of
2708 the base URL, but with the merged path, as in the following
2709 example:
2710
2711 \snippet code/src_corelib_io_qurl.cpp 5
2712
2713 Calling resolved() with ".." returns a QUrl whose directory is
2714 one level higher than the original. Similarly, calling resolved()
2715 with "../.." removes two levels from the path. If \a relative is
2716 "/", the path becomes "/".
2717
2718 \sa isRelative()
2719*/
2720QUrl QUrl::resolved(const QUrl &relative) const
2721{
2722 if (!d) return relative;
2723 if (!relative.d) return *this;
2724
2725 QUrl t;
2726 if (!relative.d->scheme.isEmpty()) {
2727 t = relative;
2728 t.detach();
2729 } else {
2730 if (relative.d->hasAuthority()) {
2731 t = relative;
2732 t.detach();
2733 } else {
2734 t.d = new QUrlPrivate;
2735
2736 // copy the authority
2737 t.d->userName = d->userName;
2738 t.d->password = d->password;
2739 t.d->host = d->host;
2740 t.d->port = d->port;
2741 t.d->sectionIsPresent = d->sectionIsPresent & QUrlPrivate::Authority;
2742
2743 if (relative.d->path.isEmpty()) {
2744 t.d->path = d->path;
2745 if (relative.d->hasQuery()) {
2746 t.d->query = relative.d->query;
2747 t.d->sectionIsPresent |= QUrlPrivate::Query;
2748 } else if (d->hasQuery()) {
2749 t.d->query = d->query;
2750 t.d->sectionIsPresent |= QUrlPrivate::Query;
2751 }
2752 } else {
2753 t.d->path = relative.d->path.startsWith(QLatin1Char('/'))
2754 ? relative.d->path
2755 : d->mergePaths(relative.d->path);
2756 if (relative.d->hasQuery()) {
2757 t.d->query = relative.d->query;
2758 t.d->sectionIsPresent |= QUrlPrivate::Query;
2759 }
2760 }
2761 }
2762 t.d->scheme = d->scheme;
2763 if (d->hasScheme())
2764 t.d->sectionIsPresent |= QUrlPrivate::Scheme;
2765 else
2766 t.d->sectionIsPresent &= ~QUrlPrivate::Scheme;
2767 t.d->flags |= d->flags & QUrlPrivate::IsLocalFile;
2768 }
2769 t.d->fragment = relative.d->fragment;
2770 if (relative.d->hasFragment())
2771 t.d->sectionIsPresent |= QUrlPrivate::Fragment;
2772 else
2773 t.d->sectionIsPresent &= ~QUrlPrivate::Fragment;
2774
2775 removeDotsFromPath(&t.d->path);
2776
2777#if defined(QURL_DEBUG)
2778 qDebug("QUrl(\"%ls\").resolved(\"%ls\") = \"%ls\"",
2779 qUtf16Printable(url()),
2780 qUtf16Printable(relative.url()),
2781 qUtf16Printable(t.url()));
2782#endif
2783 return t;
2784}
2785
2786/*!
    Returns \c true if the URL is relative; otherwise returns \c false. A URL is
    a relative reference if its scheme is undefined; this function is therefore
    equivalent to calling scheme().isEmpty().
2790
2791 Relative references are defined in RFC 3986 section 4.2.
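
    For illustration, assuming a few hypothetical example URLs:

    \code
        QUrl("images/logo.png").isRelative();              // true: no scheme
        QUrl("/images/logo.png").isRelative();             // true: absolute path, but still no scheme
        QUrl("http://example.com/logo.png").isRelative();  // false: the scheme is "http"
    \endcode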
2792
2793 \sa {Relative URLs vs Relative Paths}
2794*/
2795bool QUrl::isRelative() const
2796{
2797 if (!d) return true;
2798 return !d->hasScheme();
2799}
2800
2801/*!
2802 Returns a string representation of the URL. The output can be customized by
2803 passing flags with \a options. The option QUrl::FullyDecoded is not
2804 permitted in this function since it would generate ambiguous data.
2805
2806 The resulting QString can be passed back to a QUrl later on.
2807
2808 Synonym for toString(options).
2809
2810 \sa FormattingOptions, toEncoded(), toString()
2811*/
2812QString QUrl::url(FormattingOptions options) const
2813{
2814 return toString(options);
2815}
2816
2817/*!
2818 Returns a string representation of the URL. The output can be customized by
2819 passing flags with \a options. The option QUrl::FullyDecoded is not
2820 permitted in this function since it would generate ambiguous data.
2821
2822 The default formatting option is \l{QUrl::FormattingOptions}{PrettyDecoded}.
2823
2824 \sa FormattingOptions, url(), setUrl()
2825*/
2826QString QUrl::toString(FormattingOptions options) const
2827{
2828 QString url;
2829 if (!isValid()) {
2830 // also catches isEmpty()
2831 return url;
2832 }
2833 if ((options & QUrl::FullyDecoded) == QUrl::FullyDecoded) {
2834 qWarning("QUrl: QUrl::FullyDecoded is not permitted when reconstructing the full URL");
2835 options &= ~QUrl::FullyDecoded;
2836 //options |= QUrl::PrettyDecoded; // no-op, value is 0
2837 }
2838
2839 // return just the path if:
2840 // - QUrl::PreferLocalFile is passed
2841 // - QUrl::RemovePath isn't passed (rather stupid if the user did...)
2842 // - there's no query or fragment to return
2843 // that is, either they aren't present, or we're removing them
2844 // - it's a local file
2845 if (options.testFlag(QUrl::PreferLocalFile) && !options.testFlag(QUrl::RemovePath)
2846 && (!d->hasQuery() || options.testFlag(QUrl::RemoveQuery))
2847 && (!d->hasFragment() || options.testFlag(QUrl::RemoveFragment))
2848 && isLocalFile()) {
2849 url = d->toLocalFile(options | QUrl::FullyDecoded);
2850 return url;
2851 }
2852
2853 // for the full URL, we consider that the reserved characters are prettier if encoded
2854 if (options & DecodeReserved)
2855 options &= ~EncodeReserved;
2856 else
2857 options |= EncodeReserved;
2858
2859 if (!(options & QUrl::RemoveScheme) && d->hasScheme())
2860 url += d->scheme + QLatin1Char(':');
2861
2862 bool pathIsAbsolute = d->path.startsWith(QLatin1Char('/'));
2863 if (!((options & QUrl::RemoveAuthority) == QUrl::RemoveAuthority) && d->hasAuthority()) {
2864 url += QLatin1String("//");
2865 d->appendAuthority(url, options, QUrlPrivate::FullUrl);
2866 } else if (isLocalFile() && pathIsAbsolute) {
2867 // Comply with the XDG file URI spec, which requires triple slashes.
2868 url += QLatin1String("//");
2869 }
2870
2871 if (!(options & QUrl::RemovePath))
2872 d->appendPath(url, options, QUrlPrivate::FullUrl);
2873
2874 if (!(options & QUrl::RemoveQuery) && d->hasQuery()) {
2875 url += QLatin1Char('?');
2876 d->appendQuery(url, options, QUrlPrivate::FullUrl);
2877 }
2878 if (!(options & QUrl::RemoveFragment) && d->hasFragment()) {
2879 url += QLatin1Char('#');
2880 d->appendFragment(url, options, QUrlPrivate::FullUrl);
2881 }
2882
2883 return url;
2884}
2885
2886/*!
2887 \since 5.0
2888
2889 Returns a human-displayable string representation of the URL.
2890 The output can be customized by passing flags with \a options.
2891 The option RemovePassword is always enabled, since passwords
2892 should never be shown back to users.
2893
2894 With the default options, the resulting QString can be passed back
2895 to a QUrl later on, but any password that was present initially will
2896 be lost.
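
    A short sketch with the default options (the credentials and host below
    are made up for the example):

    \code
        QUrl url("http://user:secret@example.com/");
        QString shown = url.toDisplayString();
        // shown is "http://user@example.com/": the password has been dropped
    \endcode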
2897
2898 \sa FormattingOptions, toEncoded(), toString()
2899*/
2900
2901QString QUrl::toDisplayString(FormattingOptions options) const
2902{
2903 return toString(options | RemovePassword);
2904}
2905
2906/*!
2907 \since 5.2
2908
2909 Returns an adjusted version of the URL.
2910 The output can be customized by passing flags with \a options.
2911
2912 The encoding options from QUrl::ComponentFormattingOption don't make
2913 much sense for this method, nor does QUrl::PreferLocalFile.
2914
2915 This is always equivalent to QUrl(url.toString(options)).
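
    As a rough illustration (the URL is an assumed example):

    \code
        QUrl url("http://example.com/dir/file.html?q=1#top");

        QUrl noExtras = url.adjusted(QUrl::RemoveQuery | QUrl::RemoveFragment);
        // noExtras is QUrl("http://example.com/dir/file.html")

        QUrl directory = url.adjusted(QUrl::RemoveFilename);
        // directory is QUrl("http://example.com/dir/?q=1#top")
    \endcode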
2916
2917 \sa FormattingOptions, toEncoded(), toString()
2918*/
2919QUrl QUrl::adjusted(QUrl::FormattingOptions options) const
2920{
2921 if (!isValid()) {
2922 // also catches isEmpty()
2923 return QUrl();
2924 }
2925 QUrl that = *this;
2926 if (options & RemoveScheme)
2927 that.setScheme(QString());
2928 if ((options & RemoveAuthority) == RemoveAuthority) {
2929 that.setAuthority(QString());
2930 } else {
2931 if ((options & RemoveUserInfo) == RemoveUserInfo)
2932 that.setUserInfo(QString());
2933 else if (options & RemovePassword)
2934 that.setPassword(QString());
2935 if (options & RemovePort)
2936 that.setPort(-1);
2937 }
2938 if (options & RemoveQuery)
2939 that.setQuery(QString());
2940 if (options & RemoveFragment)
2941 that.setFragment(QString());
2942 if (options & RemovePath) {
2943 that.setPath(QString());
2944 } else if (options & (StripTrailingSlash | RemoveFilename | NormalizePathSegments)) {
2945 that.detach();
2946 QString path;
2947 d->appendPath(path, options | FullyEncoded, QUrlPrivate::Path);
2948 that.d->setPath(path, 0, path.length());
2949 }
2950 return that;
2951}
2952
2953/*!
2954 Returns the encoded representation of the URL if it's valid;
2955 otherwise an empty QByteArray is returned. The output can be
2956 customized by passing flags with \a options.
2957
2958 The user info, path and fragment are all converted to UTF-8, and
2959 all non-ASCII characters are then percent encoded. The host name
2960 is encoded using Punycode.
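
    For instance, with an assumed example URL whose path contains the
    non-ASCII character U+00E9 (given here as UTF-8 bytes):

    \code
        QUrl url(QString::fromUtf8("http://example.com/caf\xC3\xA9"));
        QByteArray encoded = url.toEncoded();
        // encoded is "http://example.com/caf%C3%A9"
    \endcode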
2961*/
2962QByteArray QUrl::toEncoded(FormattingOptions options) const
2963{
2964 options &= ~(FullyDecoded | FullyEncoded);
2965 return toString(options | FullyEncoded).toLatin1();
2966}
2967
2968/*!
2969 \fn QUrl QUrl::fromEncoded(const QByteArray &input, ParsingMode parsingMode)
2970
2971 Parses \a input and returns the corresponding QUrl. \a input is
2972 assumed to be in encoded form, containing only ASCII characters.
2973
2974 Parses the URL using \a parsingMode. See setUrl() for more information on
2975 this parameter. QUrl::DecodedMode is not permitted in this context.
2976
2977 \sa toEncoded(), setUrl()
2978*/
2979QUrl QUrl::fromEncoded(const QByteArray &input, ParsingMode mode)
2980{
2981 return QUrl(QString::fromUtf8(input.constData(), input.size()), mode);
2982}
2983
2984/*!
    Returns a decoded copy of \a input. \a input is first decoded from
    percent encoding, then converted from UTF-8 to Unicode.
2987
2988 \note Given invalid input (such as a string containing the sequence "%G5",
2989 which is not a valid hexadecimal number) the output will be invalid as
2990 well. As an example: the sequence "%G5" could be decoded to 'W'.
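
    A minimal sketch with valid input (the strings are illustrative only):

    \code
        QString name = QUrl::fromPercentEncoding("%7Euser%20name");
        // name is "~user name"

        QString word = QUrl::fromPercentEncoding("caf%C3%A9");
        // word is "caf" followed by U+00E9, decoded from the UTF-8 bytes C3 A9
    \endcode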
2991*/
2992QString QUrl::fromPercentEncoding(const QByteArray &input)
2993{
2994 QByteArray ba = QByteArray::fromPercentEncoding(input);
2995 return QString::fromUtf8(ba, ba.size());
2996}
2997
2998/*!
    Returns an encoded copy of \a input. \a input is first converted
    to UTF-8, and all ASCII characters that are not in the unreserved group
3001 are percent encoded. To prevent characters from being percent encoded
3002 pass them to \a exclude. To force characters to be percent encoded pass
3003 them to \a include.
3004
3005 Unreserved is defined as:
3006 \tt {ALPHA / DIGIT / "-" / "." / "_" / "~"}
3007
3008 \snippet code/src_corelib_io_qurl.cpp 6
3009*/
3010QByteArray QUrl::toPercentEncoding(const QString &input, const QByteArray &exclude, const QByteArray &include)
3011{
3012 return input.toUtf8().toPercentEncoding(exclude, include);
3013}
3014
3015/*!
3016 \since 4.2
3017
3018 Returns the Unicode form of the given domain name
3019 \a domain, which is encoded in the ASCII Compatible Encoding (ACE).
3020 The result of this function is considered equivalent to \a domain.
3021
3022 If the value in \a domain cannot be encoded, it will be converted
3023 to QString and returned.
3024
3025 The ASCII Compatible Encoding (ACE) is defined by RFC 3490, RFC 3491
3026 and RFC 3492. It is part of the Internationalizing Domain Names in
3027 Applications (IDNA) specification, which allows for domain names
3028 (like \c "example.com") to be written using international
3029 characters.
3030*/
3031QString QUrl::fromAce(const QByteArray &domain)
3032{
3033 QVarLengthArray<char16_t> buffer;
3034 buffer.resize(domain.size());
3035 qt_from_latin1(buffer.data(), domain.data(), domain.size());
3036 return qt_ACE_do(QStringView{buffer.data(), buffer.size()},
3037 NormalizeAce, ForbidLeadingDot /*FIXME: make configurable*/);
3038}
3039
3040/*!
3041 \since 4.2
3042
3043 Returns the ASCII Compatible Encoding of the given domain name \a domain.
3044 The result of this function is considered equivalent to \a domain.
3045
3046 The ASCII-Compatible Encoding (ACE) is defined by RFC 3490, RFC 3491
3047 and RFC 3492. It is part of the Internationalizing Domain Names in
3048 Applications (IDNA) specification, which allows for domain names
3049 (like \c "example.com") to be written using international
3050 characters.
3051
3052 This function returns an empty QByteArray if \a domain is not a valid
3053 hostname. Note, in particular, that IPv6 literals are not valid domain
3054 names.
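
    As an illustration, using a commonly cited internationalized name (the
    domain is only an example):

    \code
        QByteArray ace = QUrl::toAce(QStringLiteral("b\u00FCcher.de"));
        // ace is "xn--bcher-kva.de"
    \endcode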
3055*/
3056QByteArray QUrl::toAce(const QString &domain)
3057{
3058 return qt_ACE_do(domain, ToAceOnly, ForbidLeadingDot /*FIXME: make configurable*/).toLatin1();
3059}
3060
3061/*!
3062 \internal
3063
3064 Returns \c true if this URL is "less than" the given \a url. This
3065 provides a means of ordering URLs.
3066*/
3067bool QUrl::operator <(const QUrl &url) const
3068{
3069 if (!d || !url.d) {
3070 bool thisIsEmpty = !d || d->isEmpty();
3071 bool thatIsEmpty = !url.d || url.d->isEmpty();
3072
3073 // sort an empty URL first
3074 return thisIsEmpty && !thatIsEmpty;
3075 }
3076
3077 int cmp;
3078 cmp = d->scheme.compare(url.d->scheme);
3079 if (cmp != 0)
3080 return cmp < 0;
3081
3082 cmp = d->userName.compare(url.d->userName);
3083 if (cmp != 0)
3084 return cmp < 0;
3085
3086 cmp = d->password.compare(url.d->password);
3087 if (cmp != 0)
3088 return cmp < 0;
3089
3090 cmp = d->host.compare(url.d->host);
3091 if (cmp != 0)
3092 return cmp < 0;
3093
3094 if (d->port != url.d->port)
3095 return d->port < url.d->port;
3096
3097 cmp = d->path.compare(url.d->path);
3098 if (cmp != 0)
3099 return cmp < 0;
3100
3101 if (d->hasQuery() != url.d->hasQuery())
3102 return url.d->hasQuery();
3103
3104 cmp = d->query.compare(url.d->query);
3105 if (cmp != 0)
3106 return cmp < 0;
3107
3108 if (d->hasFragment() != url.d->hasFragment())
3109 return url.d->hasFragment();
3110
3111 cmp = d->fragment.compare(url.d->fragment);
3112 return cmp < 0;
3113}
3114
3115/*!
3116 Returns \c true if this URL and the given \a url are equal;
3117 otherwise returns \c false.
3118*/
3119bool QUrl::operator ==(const QUrl &url) const
3120{
3121 if (!d && !url.d)
3122 return true;
3123 if (!d)
3124 return url.d->isEmpty();
3125 if (!url.d)
3126 return d->isEmpty();
3127
3128 // First, compare which sections are present, since it speeds up the
3129 // processing considerably. We just have to ignore the host-is-present flag
3130 // for local files (the "file" protocol), due to the requirements of the
3131 // XDG file URI specification.
3132 int mask = QUrlPrivate::FullUrl;
3133 if (isLocalFile())
3134 mask &= ~QUrlPrivate::Host;
3135 return (d->sectionIsPresent & mask) == (url.d->sectionIsPresent & mask) &&
3136 d->scheme == url.d->scheme &&
3137 d->userName == url.d->userName &&
3138 d->password == url.d->password &&
3139 d->host == url.d->host &&
3140 d->port == url.d->port &&
3141 d->path == url.d->path &&
3142 d->query == url.d->query &&
3143 d->fragment == url.d->fragment;
3144}
3145
3146/*!
3147 \since 5.2
3148
3149 Returns \c true if this URL and the given \a url are equal after
3150 applying \a options to both; otherwise returns \c false.
3151
    This is equivalent to calling adjusted(options) on both URLs
    and comparing the resulting URLs, but faster.
3154
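    For example, in this sketch (with assumed example URLs) two URLs that
    differ only by a trailing slash compare unequal but match once
    QUrl::StripTrailingSlash is applied:

    \code
        QUrl a("http://example.com/docs/");
        QUrl b("http://example.com/docs");

        bool equal = (a == b);                                   // false
        bool same = a.matches(b, QUrl::StripTrailingSlash);      // true
    \endcode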
3155*/
3156bool QUrl::matches(const QUrl &url, FormattingOptions options) const
3157{
3158 if (!d && !url.d)
3159 return true;
3160 if (!d)
3161 return url.d->isEmpty();
3162 if (!url.d)
3163 return d->isEmpty();
3164
3165 // First, compare which sections are present, since it speeds up the
3166 // processing considerably. We just have to ignore the host-is-present flag
3167 // for local files (the "file" protocol), due to the requirements of the
3168 // XDG file URI specification.
3169 int mask = QUrlPrivate::FullUrl;
3170 if (isLocalFile())
3171 mask &= ~QUrlPrivate::Host;
3172
3173 if (options.testFlag(QUrl::RemoveScheme))
3174 mask &= ~QUrlPrivate::Scheme;
3175 else if (d->scheme != url.d->scheme)
3176 return false;
3177
3178 if (options.testFlag(QUrl::RemovePassword))
3179 mask &= ~QUrlPrivate::Password;
3180 else if (d->password != url.d->password)
3181 return false;
3182
3183 if (options.testFlag(QUrl::RemoveUserInfo))
3184 mask &= ~QUrlPrivate::UserName;
3185 else if (d->userName != url.d->userName)
3186 return false;
3187
3188 if (options.testFlag(QUrl::RemovePort))
3189 mask &= ~QUrlPrivate::Port;
3190 else if (d->port != url.d->port)
3191 return false;
3192
3193 if (options.testFlag(QUrl::RemoveAuthority))
3194 mask &= ~QUrlPrivate::Host;
3195 else if (d->host != url.d->host)
3196 return false;
3197
3198 if (options.testFlag(QUrl::RemoveQuery))
3199 mask &= ~QUrlPrivate::Query;
3200 else if (d->query != url.d->query)
3201 return false;
3202
3203 if (options.testFlag(QUrl::RemoveFragment))
3204 mask &= ~QUrlPrivate::Fragment;
3205 else if (d->fragment != url.d->fragment)
3206 return false;
3207
3208 if ((d->sectionIsPresent & mask) != (url.d->sectionIsPresent & mask))
3209 return false;
3210
3211 if (options.testFlag(QUrl::RemovePath))
3212 return true;
3213
3214 // Compare paths, after applying path-related options
3215 QString path1;
3216 d->appendPath(path1, options, QUrlPrivate::Path);
3217 QString path2;
3218 url.d->appendPath(path2, options, QUrlPrivate::Path);
3219 return path1 == path2;
3220}
3221
3222/*!
3223 Returns \c true if this URL and the given \a url are not equal;
3224 otherwise returns \c false.
3225*/
3226bool QUrl::operator !=(const QUrl &url) const
3227{
3228 return !(*this == url);
3229}
3230
3231/*!
3232 Assigns the specified \a url to this object.
3233*/
3234QUrl &QUrl::operator =(const QUrl &url)
3235{
3236 if (!d) {
3237 if (url.d) {
3238 url.d->ref.ref();
3239 d = url.d;
3240 }
3241 } else {
3242 if (url.d)
3243 qAtomicAssign(d, url.d);
3244 else
3245 clear();
3246 }
3247 return *this;
3248}
3249
3250/*!
3251 Assigns the specified \a url to this object.
3252*/
3253QUrl &QUrl::operator =(const QString &url)
3254{
3255 if (url.isEmpty()) {
3256 clear();
3257 } else {
3258 detach();
3259 d->parse(url, TolerantMode);
3260 }
3261 return *this;
3262}
3263
3264/*!
3265 \fn void QUrl::swap(QUrl &other)
3266 \since 4.8
3267
3268 Swaps URL \a other with this URL. This operation is very
3269 fast and never fails.
3270*/
3271
3272/*!
3273 \internal
3274
3275 Forces a detach.
3276*/
3277void QUrl::detach()
3278{
3279 if (!d)
3280 d = new QUrlPrivate;
3281 else
3282 qAtomicDetach(d);
3283}
3284
3285/*!
3286 \internal
3287*/
3288bool QUrl::isDetached() const
3289{
3290 return !d || d->ref.loadRelaxed() == 1;
3291}
3292
3293
3294/*!
3295 Returns a QUrl representation of \a localFile, interpreted as a local
3296 file. This function accepts paths separated by slashes as well as the
3297 native separator for this platform.
3298
3299 This function also accepts paths with a doubled leading slash (or
3300 backslash) to indicate a remote file, as in
3301 "//servername/path/to/file.txt". Note that only certain platforms can
3302 actually open this file using QFile::open().
3303
3304 An empty \a localFile leads to an empty URL (since Qt 5.4).
3305
3306 \snippet code/src_corelib_io_qurl.cpp 16
3307
    In the first line of the snippet above, a file URL is constructed from a
    local, relative path. A file URL with a relative path only makes sense
    if there is a base URL to resolve it against. For example:
3311
3312 \snippet code/src_corelib_io_qurl.cpp 17
3313
3314 To resolve such a URL, it's necessary to remove the scheme beforehand:
3315
3316 \snippet code/src_corelib_io_qurl.cpp 18
3317
3318 For this reason, it is better to use a relative URL (that is, no scheme)
3319 for relative file paths:
3320
3321 \snippet code/src_corelib_io_qurl.cpp 19
3322
3323 \sa toLocalFile(), isLocalFile(), QDir::toNativeSeparators()
3324*/
3325QUrl QUrl::fromLocalFile(const QString &localFile)
3326{
3327 QUrl url;
3328 if (localFile.isEmpty())
3329 return url;
3330 QString scheme = fileScheme();
3331 QString deslashified = QDir::fromNativeSeparators(localFile);
3332
3333 // magic for drives on windows
3334 if (deslashified.length() > 1 && deslashified.at(1) == QLatin1Char(':') && deslashified.at(0) != QLatin1Char('/')) {
3335 deslashified.prepend(QLatin1Char('/'));
3336 } else if (deslashified.startsWith(QLatin1String("//"))) {
3337 // magic for shared drive on windows
3338 int indexOfPath = deslashified.indexOf(QLatin1Char('/'), 2);
3339 QStringView hostSpec = QStringView{deslashified}.mid(2, indexOfPath - 2);
3340 // Check for Windows-specific WebDAV specification: "//host@SSL/path".
3341 if (hostSpec.endsWith(webDavSslTag(), Qt::CaseInsensitive)) {
3342 hostSpec.truncate(hostSpec.size() - 4);
3343 scheme = webDavScheme();
3344 }
3345
3346 // hosts can't be IPv6 addresses without [], so we can use QUrlPrivate::setHost
3347 url.detach();
3348 if (!url.d->setHost(hostSpec.toString(), 0, hostSpec.size(), StrictMode)) {
3349 if (url.d->error->code != QUrlPrivate::InvalidRegNameError)
3350 return url;
3351
3352 // Path hostname is not a valid URL host, so set it entirely in the path
3353 // (by leaving deslashified unchanged)
3354 } else if (indexOfPath > 2) {
3355 deslashified = deslashified.right(deslashified.length() - indexOfPath);
3356 } else {
3357 deslashified.clear();
3358 }
3359 }
3360
3361 url.setScheme(scheme);
3362 url.setPath(deslashified, DecodedMode);
3363 return url;
3364}
3365
3366/*!
3367 Returns the path of this URL formatted as a local file path. The path
3368 returned will use forward slashes, even if it was originally created
3369 from one with backslashes.
3370
3371 If this URL contains a non-empty hostname, it will be encoded in the
3372 returned value in the form found on SMB networks (for example,
3373 "//servername/path/to/file.txt").
3374
3375 \snippet code/src_corelib_io_qurl.cpp 20
3376
3377 Note: if the path component of this URL contains a non-UTF-8 binary
3378 sequence (such as %80), the behaviour of this function is undefined.
3379
3380 \sa fromLocalFile(), isLocalFile()
3381*/
3382QString QUrl::toLocalFile() const
3383{
3384 // the call to isLocalFile() also ensures that we're parsed
3385 if (!isLocalFile())
3386 return QString();
3387
3388 return d->toLocalFile(QUrl::FullyDecoded);
3389}
3390
3391/*!
3392 \since 4.8
3393 Returns \c true if this URL is pointing to a local file path. A URL is a
3394 local file path if the scheme is "file".
3395
3396 Note that this function considers URLs with hostnames to be local file
3397 paths, even if the eventual file path cannot be opened with
3398 QFile::open().
3399
3400 \sa fromLocalFile(), toLocalFile()
3401*/
3402bool QUrl::isLocalFile() const
3403{
3404 return d && d->isLocalFile();
3405}
3406
3407/*!
3408 Returns \c true if this URL is a parent of \a childUrl. \a childUrl is a child
3409 of this URL if the two URLs share the same scheme and authority,
3410 and this URL's path is a parent of the path of \a childUrl.
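
    A brief sketch of what this means in practice (the URLs are assumed
    examples):

    \code
        QUrl base("http://example.com/docs");

        base.isParentOf(QUrl("http://example.com/docs/index.html"));  // true
        base.isParentOf(QUrl("http://example.com/docs-old/x"));       // false: "/docs-old" only shares a prefix
        base.isParentOf(QUrl("http://example.com/docs"));             // false: a URL is not its own parent
    \endcode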
3411*/
3412bool QUrl::isParentOf(const QUrl &childUrl) const
3413{
3414 QString childPath = childUrl.path();
3415
3416 if (!d)
3417 return ((childUrl.scheme().isEmpty())
3418 && (childUrl.authority().isEmpty())
3419 && childPath.length() > 0 && childPath.at(0) == QLatin1Char('/'));
3420
3421 QString ourPath = path();
3422
3423 return ((childUrl.scheme().isEmpty() || d->scheme == childUrl.scheme())
3424 && (childUrl.authority().isEmpty() || authority() == childUrl.authority())
3425 && childPath.startsWith(ourPath)
3426 && ((ourPath.endsWith(QLatin1Char('/')) && childPath.length() > ourPath.length())
3427 || (!ourPath.endsWith(QLatin1Char('/'))
3428 && childPath.length() > ourPath.length() && childPath.at(ourPath.length()) == QLatin1Char('/'))));
3429}
3430
3431
3432#ifndef QT_NO_DATASTREAM
3433/*! \relates QUrl
3434
3435 Writes url \a url to the stream \a out and returns a reference
3436 to the stream.
3437
3438 \sa{Serializing Qt Data Types}{Format of the QDataStream operators}
3439*/
3440QDataStream &operator<<(QDataStream &out, const QUrl &url)
3441{
3442 QByteArray u;
3443 if (url.isValid())
3444 u = url.toEncoded();
3445 out << u;
3446 return out;
3447}
3448
3449/*! \relates QUrl
3450
3451 Reads a url into \a url from the stream \a in and returns a
3452 reference to the stream.
3453
3454 \sa{Serializing Qt Data Types}{Format of the QDataStream operators}
3455*/
3456QDataStream &operator>>(QDataStream &in, QUrl &url)
3457{
3458 QByteArray u;
3459 in >> u;
3460 url.setUrl(QString::fromLatin1(u));
3461 return in;
3462}
3463#endif // QT_NO_DATASTREAM
3464
3465#ifndef QT_NO_DEBUG_STREAM
3466QDebug operator<<(QDebug d, const QUrl &url)
3467{
3468 QDebugStateSaver saver(d);
3469 d.nospace() << "QUrl(" << url.toDisplayString() << ')';
3470 return d;
3471}
3472#endif
3473
3474static QString errorMessage(QUrlPrivate::ErrorCode errorCode, const QString &errorSource, int errorPosition)
3475{
3476 QChar c = uint(errorPosition) < uint(errorSource.length()) ?
3477 errorSource.at(errorPosition) : QChar(QChar::Null);
3478
3479 switch (errorCode) {
3480 case QUrlPrivate::NoError:
3481 Q_ASSERT_X(false, "QUrl::errorString",
3482 "Impossible: QUrl::errorString should have treated this condition");
3483 Q_UNREACHABLE();
3484 return QString();
3485
3486 case QUrlPrivate::InvalidSchemeError: {
3487 auto msg = QLatin1String("Invalid scheme (character '%1' not permitted)");
3488 return msg.arg(c);
3489 }
3490
3491 case QUrlPrivate::InvalidUserNameError:
3492 return QLatin1String("Invalid user name (character '%1' not permitted)")
3493 .arg(c);
3494
3495 case QUrlPrivate::InvalidPasswordError:
3496 return QLatin1String("Invalid password (character '%1' not permitted)")
3497 .arg(c);
3498
3499 case QUrlPrivate::InvalidRegNameError:
3500 if (errorPosition != -1)
3501 return QLatin1String("Invalid hostname (character '%1' not permitted)")
3502 .arg(c);
3503 else
3504 return QStringLiteral("Invalid hostname (contains invalid characters)");
3505 case QUrlPrivate::InvalidIPv4AddressError:
3506 return QString(); // doesn't happen yet
3507 case QUrlPrivate::InvalidIPv6AddressError:
3508 return QStringLiteral("Invalid IPv6 address");
3509 case QUrlPrivate::InvalidCharacterInIPv6Error:
3510 return QLatin1String("Invalid IPv6 address (character '%1' not permitted)").arg(c);
3511 case QUrlPrivate::InvalidIPvFutureError:
3512 return QLatin1String("Invalid IPvFuture address (character '%1' not permitted)").arg(c);
3513 case QUrlPrivate::HostMissingEndBracket:
3514 return QStringLiteral("Expected ']' to match '[' in hostname");
3515
3516 case QUrlPrivate::InvalidPortError:
3517 return QStringLiteral("Invalid port or port number out of range");
3518 case QUrlPrivate::PortEmptyError:
3519 return QStringLiteral("Port field was empty");
3520
3521 case QUrlPrivate::InvalidPathError:
3522 return QLatin1String("Invalid path (character '%1' not permitted)")
3523 .arg(c);
3524
3525 case QUrlPrivate::InvalidQueryError:
3526 return QLatin1String("Invalid query (character '%1' not permitted)")
3527 .arg(c);
3528
3529 case QUrlPrivate::InvalidFragmentError:
3530 return QLatin1String("Invalid fragment (character '%1' not permitted)")
3531 .arg(c);
3532
3533 case QUrlPrivate::AuthorityPresentAndPathIsRelative:
3534 return QStringLiteral("Path component is relative and authority is present");
3535 case QUrlPrivate::AuthorityAbsentAndPathIsDoubleSlash:
3536 return QStringLiteral("Path component starts with '//' and authority is absent");
3537 case QUrlPrivate::RelativeUrlPathContainsColonBeforeSlash:
3538 return QStringLiteral("Relative URL's path component contains ':' before any '/'");
3539 }
3540
3541 Q_ASSERT_X(false, "QUrl::errorString", "Cannot happen, unknown error");
3542 Q_UNREACHABLE();
3543 return QString();
3544}
3545
3546static inline void appendComponentIfPresent(QString &msg, bool present, const char *componentName,
3547 const QString &component)
3548{
3549 if (present) {
3550 msg += QLatin1String(componentName);
3551 msg += QLatin1Char('"');
3552 msg += component;
3553 msg += QLatin1String("\",");
3554 }
3555}
3556
3557/*!
3558 \since 4.2
3559
3560 Returns an error message if the last operation that modified this QUrl
3561 object ran into a parsing error. If no error was detected, this function
3562 returns an empty string and isValid() returns \c true.
3563
3564 The error message returned by this function is technical in nature and may
3565 not be understood by end users. It is mostly useful to developers trying to
3566 understand why QUrl will not accept some input.
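
    A typical use might look like the following sketch. The input is an
    assumed example with a non-numeric port, and the exact wording of the
    message is not guaranteed:

    \code
        QUrl url("http://example.com:port/");
        if (!url.isValid())
            qWarning() << "Rejected URL:" << url.errorString();
    \endcode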
3567
3568 \sa QUrl::ParsingMode
3569*/
3570QString QUrl::errorString() const
3571{
3572 QString msg;
3573 if (!d)
3574 return msg;
3575
3576 QString errorSource;
3577 int errorPosition = 0;
3578 QUrlPrivate::ErrorCode errorCode = d->validityError(&errorSource, &errorPosition);
3579 if (errorCode == QUrlPrivate::NoError)
3580 return msg;
3581
3582 msg += errorMessage(errorCode, errorSource, errorPosition);
3583 msg += QLatin1String("; source was \"");
3584 msg += errorSource;
3585 msg += QLatin1String("\";");
3586 appendComponentIfPresent(msg, d->sectionIsPresent & QUrlPrivate::Scheme,
3587 " scheme = ", d->scheme);
3588 appendComponentIfPresent(msg, d->sectionIsPresent & QUrlPrivate::UserInfo,
3589 " userinfo = ", userInfo());
3590 appendComponentIfPresent(msg, d->sectionIsPresent & QUrlPrivate::Host,
3591 " host = ", d->host);
3592 appendComponentIfPresent(msg, d->port != -1,
3593 " port = ", QString::number(d->port));
3594 appendComponentIfPresent(msg, !d->path.isEmpty(),
3595 " path = ", d->path);
3596 appendComponentIfPresent(msg, d->sectionIsPresent & QUrlPrivate::Query,
3597 " query = ", d->query);
3598 appendComponentIfPresent(msg, d->sectionIsPresent & QUrlPrivate::Fragment,
3599 " fragment = ", d->fragment);
3600 if (msg.endsWith(QLatin1Char(',')))
3601 msg.chop(1);
3602 return msg;
3603}
3604
3605/*!
3606 \since 5.1
3607
3608 Converts a list of \a urls into a list of QString objects, using toString(\a options).
3609*/
3610QStringList QUrl::toStringList(const QList<QUrl> &urls, FormattingOptions options)
3611{
3612 QStringList lst;
3613 lst.reserve(urls.size());
3614 for (const QUrl &url : urls)
3615 lst.append(url.toString(options));
3616 return lst;
3617
3618}
3619
3620/*!
3621 \since 5.1
3622
    Converts a list of strings representing \a urls into a list of URLs, using QUrl(str, \a mode).
    Note that this means all strings must be URLs, not, for instance, local paths.
3625*/
3626QList<QUrl> QUrl::fromStringList(const QStringList &urls, ParsingMode mode)
3627{
3628 QList<QUrl> lst;
3629 lst.reserve(urls.size());
3630 for (const QString &str : urls)
3631 lst.append(QUrl(str, mode));
3632 return lst;
3633}
3634
3635/*!
3636 \typedef QUrl::DataPtr
3637 \internal
3638*/
3639
3640/*!
3641 \fn DataPtr &QUrl::data_ptr()
3642 \internal
3643*/
3644
3645/*!
3646 Returns the hash value for the \a url. If specified, \a seed is used to
3647 initialize the hash.
3648
3649 \relates QHash
3650 \since 5.0
3651*/
size_t qHash(const QUrl &url, size_t seed) noexcept
{
    if (!url.d)
        return qHash(-1, seed); // the hash of an unset port (-1)

    return qHash(url.d->scheme) ^
        qHash(url.d->userName) ^
        qHash(url.d->password) ^
        qHash(url.d->host) ^
        qHash(url.d->port, seed) ^
        qHash(url.d->path) ^
        qHash(url.d->query) ^
        qHash(url.d->fragment);
}

static QUrl adjustFtpPath(QUrl url)
{
    if (url.scheme() == ftpScheme()) {
        QString path = url.path(QUrl::PrettyDecoded);
        if (path.startsWith(QLatin1String("//")))
            url.setPath(QLatin1String("/%2F") + QStringView{path}.mid(2), QUrl::TolerantMode);
    }
    return url;
}

static bool isIp6(const QString &text)
{
    QIPAddressUtils::IPv6Address address;
    return !text.isEmpty() && QIPAddressUtils::parseIp6(address, text.begin(), text.end()) == nullptr;
}

/*!
    Returns a valid URL from a user supplied \a userInput string if one can be
    deduced. If that is not possible, an invalid QUrl() is returned.

    This allows the user to input a URL or a local file path in the form of a plain
    string. This string can be manually typed into a location bar, obtained from
    the clipboard, or passed in via command line arguments.

    When the string is not already a valid URL, a best guess is performed,
    making various assumptions.

    If the string corresponds to a valid file path on the system,
    a file:// URL is constructed, using QUrl::fromLocalFile().

    Otherwise, an attempt is made to turn the string into an
    http:// or ftp:// URL; ftp:// is chosen if the string starts with
    'ftp'. The result is then passed through QUrl's tolerant parser. On
    success, a valid QUrl is returned; otherwise, a QUrl() is returned.

    \section1 Examples:

    \list
    \li qt-project.org becomes http://qt-project.org
    \li ftp.qt-project.org becomes ftp://ftp.qt-project.org
    \li hostname becomes http://hostname
    \li /home/user/test.html becomes file:///home/user/test.html
    \endlist

    In order to be able to handle relative paths, this method takes an optional
    \a workingDirectory path. This is especially useful when handling command
    line arguments.
    If \a workingDirectory is empty, no handling of relative paths will be done.

    By default, an input string that looks like a relative path will only be treated
    as such if the file actually exists in the given working directory.
    If the application can handle files that don't exist yet, it should pass the
    flag AssumeLocalFile in \a options.

    \since 5.4
*/
QUrl QUrl::fromUserInput(const QString &userInput, const QString &workingDirectory,
                         UserInputResolutionOptions options)
{
    QString trimmedString = userInput.trimmed();

    if (trimmedString.isEmpty())
        return QUrl();

    // Check for IPv6 addresses, since a path starting with ":" is absolute (a resource)
    // and IPv6 addresses can start with "c:" too
    if (isIp6(trimmedString)) {
        QUrl url;
        url.setHost(trimmedString);
        url.setScheme(QStringLiteral("http"));
        return url;
    }

    const QUrl url = QUrl(trimmedString, QUrl::TolerantMode);

    // Check for a relative path
    if (!workingDirectory.isEmpty()) {
        const QFileInfo fileInfo(QDir(workingDirectory), userInput);
        if (fileInfo.exists())
            return QUrl::fromLocalFile(fileInfo.absoluteFilePath());

        // Check both QUrl::isRelative (to detect full URLs) and QDir::isAbsolutePath (since on Windows drive letters can be interpreted as schemes)
        if ((options & AssumeLocalFile) && url.isRelative() && !QDir::isAbsolutePath(userInput))
            return QUrl::fromLocalFile(fileInfo.absoluteFilePath());
    }

    // Check first for files, since on Windows drive letters can be interpreted as schemes
    if (QDir::isAbsolutePath(trimmedString))
        return QUrl::fromLocalFile(trimmedString);

    QUrl urlPrepended = QUrl(QLatin1String("http://") + trimmedString, QUrl::TolerantMode);

    // Check the most common case of a valid URL with a scheme.
    // To handle the host:port case, where the host would be interpreted as the scheme,
    // we check whether prepending a scheme yields a port.
    if (url.isValid()
        && !url.scheme().isEmpty()
        && urlPrepended.port() == -1)
        return adjustFtpPath(url);

    // Else, try the prepended one and adjust the scheme from the host name
    if (urlPrepended.isValid() && (!urlPrepended.host().isEmpty() || !urlPrepended.path().isEmpty())) {
        int dotIndex = trimmedString.indexOf(QLatin1Char('.'));
        const QStringView hostscheme = QStringView{trimmedString}.left(dotIndex);
        if (hostscheme.compare(ftpScheme(), Qt::CaseInsensitive) == 0)
            urlPrepended.setScheme(ftpScheme());
        return adjustFtpPath(urlPrepended);
    }

    return QUrl();
}

QT_END_NAMESPACE