|
| static boolean | isHttp (URI uri) |
| |
| static void | findLinks (URI base, String html, Collection< URI > links) |
| |
| static URI | toAbsolute (URI base, String href) |
| |
| static URI | clean (URI uri) |
| |
| static URI | toUri (String link) throws URISyntaxException |
| |
| static ArrayList< URI > | listUris (URI base, String html) |
| |
| static HashSet< URI > | uniqueUris (URI base, String html) |
| |
|
| static final Pattern | URI_PARTS = Pattern.compile("^(?:([^:]*):)?([^#]+)?(?:#(.*))?$") |
| |
Finds HTTP(S) URLs from the anchor tags within HTML code.
- Author
- CS 272 Software Development (University of San Francisco)
-
Ravneet Singh Bhatia
- Version
- Spring 2024
◆ clean()
| static URI edu.usfca.cs272.LinkFinder.clean |
( |
URI | uri | ) |
|
|
static |
Normalizes a URI and removes its fragment. For non-opaque hierarchical URIs, also sets the path to / if it is missing.
- Parameters
-
- Returns
- the cleaned URI
- See also
- URI::normalize()
-
URI::isOpaque()
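The cleaning steps described above can be sketched roughly as follows. This is a hypothetical stand-in, not the LinkFinder source: the class name `CleanSketch` and the approach of rebuilding the URI from its raw components are assumptions made for illustration.

```java
import java.net.URI;

/** Hypothetical sketch for illustration; not the actual LinkFinder source. */
class CleanSketch {
    /** Normalizes the URI, drops the fragment, and defaults an empty path to "/". */
    static URI clean(URI uri) {
        URI normalized = uri.normalize();
        StringBuilder text = new StringBuilder();

        if (normalized.getScheme() != null) {
            text.append(normalized.getScheme()).append(':');
        }

        if (normalized.isOpaque()) {
            // Opaque URIs (such as mailto:) have no path to default.
            text.append(normalized.getRawSchemeSpecificPart());
            return URI.create(text.toString());
        }

        if (normalized.getRawAuthority() != null) {
            text.append("//").append(normalized.getRawAuthority());
        }

        // Use the raw (still encoded) components so escapes are preserved.
        String path = normalized.getRawPath();
        text.append(path == null || path.isEmpty() ? "/" : path);

        if (normalized.getRawQuery() != null) {
            text.append('?').append(normalized.getRawQuery());
        }

        // The fragment is never appended, which removes it.
        return URI.create(text.toString());
    }
}
```

Under these assumptions, `https://example.com/a/../b?x=1#top` becomes `https://example.com/b?x=1`, and `http://example.com` becomes `http://example.com/`.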
◆ findLinks()
| static void edu.usfca.cs272.LinkFinder.findLinks |
( |
URI | base, |
|
|
String | html, |
|
|
Collection< URI > | links ) |
|
static |
Finds all the valid HTTP(S) links in the HREF attribute of the anchor tags in the provided HTML. The links will be converted to an absolute URI using the base URI and cleaned.
Any links that do not use the HTTP(S) protocol, or that cannot be parsed for any reason, will not be included.
- Parameters
-
| base | the base URI used to convert to absolute URIs |
| html | the raw HTML associated with the base URI |
| links | the data structure to store found HTTP(S) links |
- See also
- Pattern::compile(String)
-
Matcher::find()
-
Matcher::group(int)
-
#toAbsolute(URI, String)
-
#isHttp(URI)
-
#clean(URI)
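A minimal sketch of how the Pattern and Matcher calls listed above might fit together. The regular expression, the class name `FindLinksSketch`, and the inline resolve/filter/strip logic are assumptions for illustration; the sketch only handles double-quoted href values and does not default an empty path the way clean() is described to.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Collection;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch for illustration; not the actual LinkFinder source. */
class FindLinksSketch {
    // Assumed pattern: an anchor tag containing a double-quoted href attribute.
    private static final Pattern ANCHOR_HREF =
            Pattern.compile("(?is)<a\\s[^>]*?href\\s*=\\s*\"([^\"]*)\"");

    static void findLinks(URI base, String html, Collection<URI> links) {
        Matcher matcher = ANCHOR_HREF.matcher(html);

        while (matcher.find()) {
            try {
                // Group 1 holds the href text; resolve it against the base.
                URI absolute = base.resolve(new URI(matcher.group(1).trim())).normalize();
                String scheme = absolute.getScheme();

                if ("http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme)) {
                    // Drop the fragment, if any, before storing the link.
                    String text = absolute.toString();
                    int hash = text.indexOf('#');
                    links.add(hash < 0 ? absolute : new URI(text.substring(0, hash)));
                }
            }
            catch (URISyntaxException e) {
                // Links that cannot be parsed are skipped.
            }
        }
    }
}
```

Passing the collection in as a parameter lets the caller decide whether duplicates and ordering are kept, which is how a list-returning and a set-returning method can share one finder.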
◆ isHttp()
| static boolean edu.usfca.cs272.LinkFinder.isHttp |
( |
URI | uri | ) |
|
|
static |
Determines whether the URI provided uses the HTTP or HTTPS protocol or scheme (case-insensitive).
- Parameters
-
- Returns
- true if the URI uses the HTTP or HTTPS protocol (or scheme)
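One way such a check could look, as a sketch only. The class name `IsHttpSketch` is hypothetical, and treating a missing scheme (a relative URI) as not HTTP is an assumption.

```java
import java.net.URI;

/** Hypothetical sketch for illustration; not the actual LinkFinder source. */
class IsHttpSketch {
    static boolean isHttp(URI uri) {
        // getScheme() returns null for relative URIs and keeps the original case.
        String scheme = uri.getScheme();
        return scheme != null && scheme.matches("(?i)https?");
    }
}
```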
◆ listUris()
| static ArrayList< URI > edu.usfca.cs272.LinkFinder.listUris |
( |
URI | base, |
|
|
String | html ) |
|
static |
Returns a list of all the valid HTTP(S) URIs found in the HREF attribute of the anchor tags in the provided HTML.
- Parameters
-
| base | the base URI used to convert relative links to absolute URIs |
| html | the raw HTML associated with the base URI |
- Returns
- list of all valid HTTP(S) links in the order they were found
- See also
- #findLinks(URI, String, Collection)
◆ toAbsolute()
| static URI edu.usfca.cs272.LinkFinder.toAbsolute |
( |
URI | base, |
|
|
String | href ) |
|
static |
Attempts to create a normalized absolute URI from the provided base URI and link text, removing the fragment component if one is present. Returns null if the conversion fails for any reason.
- Parameters
-
| base | the base URI the link text was found on |
| href | the link text (usually from an anchor tag href attribute) |
- Returns
- the normalized absolute URI or null
- See also
- #clean(URI)
-
#toUri(String)
-
URI::resolve(URI)
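The resolve-then-clean flow can be illustrated with a short sketch. The class name `ToAbsoluteSketch` is hypothetical, and the sketch is simplified on purpose: it parses the href with the plain `URI(String)` constructor and omits the encoding fallback that toUri() describes, so links with spaces return null here.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical sketch for illustration; not the actual LinkFinder source. */
class ToAbsoluteSketch {
    static URI toAbsolute(URI base, String href) {
        try {
            // Resolve the (possibly relative) link against the base URI.
            URI resolved = base.resolve(new URI(href.trim())).normalize();

            // Remove the fragment, if present, from the resolved text.
            String text = resolved.toString();
            int hash = text.indexOf('#');
            return hash < 0 ? resolved : new URI(text.substring(0, hash));
        }
        catch (URISyntaxException e) {
            // Conversion failed; signal that with null as documented.
            return null;
        }
    }
}
```

For example, resolving `../img/logo.png#x` against `https://example.com/docs/index.html` yields `https://example.com/img/logo.png` in this sketch.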
◆ toUri()
| static URI edu.usfca.cs272.LinkFinder.toUri |
( |
String | link | ) |
throws URISyntaxException |
|
static |
Attempts to create a URI from the provided text. If the first attempt fails, encodes the link text and tries again. Especially useful for relative links that contain spaces or other special characters.
- Parameters
-
| link | the text value to convert to URI |
- Returns
- the converted URI
- Exceptions
-
| URISyntaxException | if unable to create |
- See also
- URI::URI(String)
-
URI::URI(String, String, String)
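A sketch of the try-then-encode idea using the two constructors listed above. Splitting the text with the documented URI_PARTS pattern before calling the three-argument constructor is an assumption about how the pieces connect, and the class name `ToUriSketch` is hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch for illustration; not the actual LinkFinder source. */
class ToUriSketch {
    // Same pattern as the documented URI_PARTS field.
    static final Pattern URI_PARTS =
            Pattern.compile("^(?:([^:]*):)?([^#]+)?(?:#(.*))?$");

    static URI toUri(String link) throws URISyntaxException {
        try {
            // First attempt: the text is already a valid URI.
            return new URI(link);
        }
        catch (URISyntaxException e) {
            Matcher matcher = URI_PARTS.matcher(link);

            if (!matcher.matches()) {
                throw e;
            }

            // Second attempt: this constructor quotes illegal characters
            // in the scheme-specific part and fragment.
            return new URI(matcher.group(1), matcher.group(2), matcher.group(3));
        }
    }
}
```

In this sketch, `my file.html` fails the first attempt because of the space and comes back from the fallback as `my%20file.html`.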
◆ uniqueUris()
| static HashSet< URI > edu.usfca.cs272.LinkFinder.uniqueUris |
( |
URI | base, |
|
|
String | html ) |
|
static |
Returns a set of all the unique valid HTTP(S) URIs found in the HREF attribute of the anchor tags in the provided HTML.
- Parameters
-
| base | the base URI used to convert relative URIs to absolute |
| html | the raw HTML associated with the base URI |
- Returns
- set of all valid and unique HTTP(S) links found
- See also
- #findLinks(URI, String, Collection)
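Since listUris() and uniqueUris() both reference findLinks(URI, String, Collection), one plausible arrangement is that each creates its own collection and delegates. The sketch below assumes that design; the class name `UriCollectionsSketch` is hypothetical and its `findLinks` is a deliberately simplified stand-in that skips the HTTP(S) filtering and cleaning described elsewhere.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch for illustration; not the actual LinkFinder source. */
class UriCollectionsSketch {
    /** Simplified stand-in: resolves every double-quoted href against the base. */
    static void findLinks(URI base, String html, Collection<URI> links) {
        Matcher matcher = Pattern.compile("(?i)href\\s*=\\s*\"([^\"]*)\"").matcher(html);

        while (matcher.find()) {
            try {
                links.add(base.resolve(matcher.group(1)));
            }
            catch (IllegalArgumentException e) {
                // Unparseable links are skipped.
            }
        }
    }

    /** Keeps duplicates and preserves the order in which links were found. */
    static ArrayList<URI> listUris(URI base, String html) {
        ArrayList<URI> links = new ArrayList<>();
        findLinks(base, html, links);
        return links;
    }

    /** Drops duplicates; iteration order is not guaranteed. */
    static HashSet<URI> uniqueUris(URI base, String html) {
        HashSet<URI> links = new HashSet<>();
        findLinks(base, html, links);
        return links;
    }
}
```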
◆ URI_PARTS
| final Pattern edu.usfca.cs272.LinkFinder.URI_PARTS = Pattern.compile("^(?:([^:]*):)?([^#]+)?(?:#(.*))?$") |
|
static |
Regular expression to break URI into high level component parts in the form:
[scheme:]scheme-specific-part[#fragment]
Group 1 is the scheme without the : colon symbol, group 2 is the scheme-specific part, and group 3 is the fragment without the # hash symbol. Any group may be null if that part is absent.
Warning: Does not validate the URI.
- See also
- URI
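A short usage sketch of the pattern and its three groups. The class name `UriPartsDemo` and the helper `split` are hypothetical and exist only to show which text lands in which group.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical demo for illustration; not part of LinkFinder. */
class UriPartsDemo {
    // Same pattern as the documented URI_PARTS field.
    static final Pattern URI_PARTS =
            Pattern.compile("^(?:([^:]*):)?([^#]+)?(?:#(.*))?$");

    /** Returns {scheme, scheme-specific part, fragment}, or null if there is no match. */
    static String[] split(String text) {
        Matcher matcher = URI_PARTS.matcher(text);

        if (!matcher.matches()) {
            return null;
        }

        // Any of the three groups may be null when that part is absent.
        return new String[] { matcher.group(1), matcher.group(2), matcher.group(3) };
    }
}
```

For `https://example.com/a?b=c#top` the groups are `https`, `//example.com/a?b=c`, and `top`. For `#top` only group 3 is set. Because the pattern does not validate, a relative link such as `a/b:c` is split at the colon as if `a/b` were a scheme.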
The documentation for this class was generated from the following file:
- C:/USFCS/CS272/projects/project-Ravneetsb/src/main/java/edu/usfca/cs272/LinkFinder.java