VIBE v5.1.6
Search Engine
edu.usfca.cs272.LinkFinder Class Reference

Static Public Member Functions

static boolean isHttp(URI uri)
static void findLinks(URI base, String html, Collection<URI> links)
static URI toAbsolute(URI base, String href)
static URI clean(URI uri)
static URI toUri(String link) throws URISyntaxException
static ArrayList<URI> listUris(URI base, String html)
static HashSet<URI> uniqueUris(URI base, String html)

Static Public Attributes

static final Pattern URI_PARTS = Pattern.compile("^(?:([^:]*):)?([^#]+)?(?:#(.*))?$")

Detailed Description

Finds HTTP(S) URLs from the anchor tags within HTML code.

Author
CS 272 Software Development (University of San Francisco)
Ravneet Singh Bhatia
Version
Spring 2024

Member Function Documentation

◆ clean()

static URI edu.usfca.cs272.LinkFinder.clean(URI uri)

Normalizes the URI and removes its fragment. For non-opaque (hierarchical) URIs, it also sets the path to / if the path is missing.

Parameters
uri: the URI to clean
Returns
the cleaned URI
See also
URI::normalize()
URI::isOpaque()
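The cleaning step can be sketched as follows. This is a minimal illustration, not LinkFinder's actual code: the class name CleanSketch is hypothetical, and the sketch assumes the URI is rebuilt from its raw (still percent-encoded) components so that escaped characters are not decoded and re-encoded along the way.

```java
import java.net.URI;

final class CleanSketch {
  /**
   * Normalizes the URI, drops any fragment, and (for non-opaque URIs)
   * defaults a missing path to "/". Hypothetical sketch, not LinkFinder's code.
   */
  static URI clean(URI uri) {
    URI normalized = uri.normalize(); // removes "." and ".." path segments
    StringBuilder text = new StringBuilder();

    if (normalized.getScheme() != null) {
      text.append(normalized.getScheme()).append(':');
    }

    if (normalized.isOpaque()) {
      // Opaque URIs such as mailto:me@example.com have no path to fix.
      text.append(normalized.getRawSchemeSpecificPart());
    } else {
      if (normalized.getRawAuthority() != null) {
        text.append("//").append(normalized.getRawAuthority());
      }
      String path = normalized.getRawPath();
      text.append(path == null || path.isEmpty() ? "/" : path);
      if (normalized.getRawQuery() != null) {
        text.append('?').append(normalized.getRawQuery());
      }
    }

    // The fragment is deliberately never appended.
    return URI.create(text.toString());
  }

  public static void main(String[] args) {
    System.out.println(clean(URI.create("http://example.com/a/./b/../c#top"))); // http://example.com/a/c
    System.out.println(clean(URI.create("https://example.com?q=1#results")));   // https://example.com/?q=1
  }
}
```

The default-to-/ rule is meant for absolute URIs, such as those produced after resolving against a base; applying it to a relative reference would change its meaning.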

◆ findLinks()

static void edu.usfca.cs272.LinkFinder.findLinks(URI base, String html, Collection<URI> links)

Finds all valid HTTP(S) links in the HREF attributes of the anchor tags in the provided HTML. Each link is converted to an absolute URI using the base URI and then cleaned.

Links that do not use the HTTP(S) protocol, or that cannot be parsed for any reason, are not included.

Parameters
base: the base URI used to convert relative links to absolute URIs
html: the raw HTML associated with the base URI
links: the collection in which to store the HTTP(S) links found
See also
Pattern::compile(String)
Matcher::find()
Matcher::group(int)
toAbsolute(URI, String)
isHttp(URI)
clean(URI)
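A self-contained sketch of the find, resolve, and filter pipeline described above. Assumptions: the HREF pattern is hypothetical and only recognizes double- or single-quoted href values (not unquoted ones), and the private helpers are simplified stand-ins for toAbsolute, isHttp, and clean rather than the documented implementations. The class name FindLinksSketch is also hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FindLinksSketch {
  // Hypothetical pattern: an <a ...> tag with a double- or single-quoted href value.
  private static final Pattern HREF =
      Pattern.compile("(?i)<a\\s[^>]*?\\bhref\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)')");

  static void findLinks(URI base, String html, Collection<URI> links) {
    Matcher matcher = HREF.matcher(html);
    while (matcher.find()) {
      // Group 1 holds a double-quoted value, group 2 a single-quoted one.
      String href = matcher.group(1) != null ? matcher.group(1) : matcher.group(2);
      URI absolute = toAbsolute(base, href);
      if (absolute != null && isHttp(absolute)) {
        links.add(absolute);
      }
    }
  }

  // Simplified stand-in for toAbsolute + clean: resolve, normalize, drop the fragment.
  private static URI toAbsolute(URI base, String href) {
    try {
      URI resolved = base.resolve(new URI(href.trim())).normalize();
      String text = resolved.toString();
      int hash = text.indexOf('#');
      return URI.create(hash < 0 ? text : text.substring(0, hash));
    } catch (URISyntaxException | IllegalArgumentException e) {
      return null; // unparseable links are skipped
    }
  }

  private static boolean isHttp(URI uri) {
    String scheme = uri.getScheme();
    return scheme != null
        && (scheme.equalsIgnoreCase("http") || scheme.equalsIgnoreCase("https"));
  }

  public static void main(String[] args) {
    URI base = URI.create("https://example.com/docs/index.html");
    String html = "<a href=\"/about#team\">About</a> <A HREF='https://Example.com/x'>X</A>"
        + " <a href=\"mailto:me@example.com\">Mail</a> <a name=\"top\">Top</a>";
    List<URI> links = new ArrayList<>();
    findLinks(base, html, links);
    System.out.println(links); // [https://example.com/about, https://Example.com/x]
  }
}
```

Because findLinks writes into any Collection, the same loop can fill a list (order and duplicates preserved) or a set (duplicates removed), which appears to be how listUris and uniqueUris differ.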

◆ isHttp()

static boolean edu.usfca.cs272.LinkFinder.isHttp(URI uri)

Determines whether the URI provided uses the HTTP or HTTPS protocol or scheme (case-insensitive).

Parameters
uri: the URI to check
Returns
true if the URI uses the HTTP or HTTPS protocol (or scheme)
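A minimal sketch of the scheme check, assuming a null scheme (as in an unresolved relative link) should count as not HTTP. The class name IsHttpSketch is hypothetical.

```java
import java.net.URI;

final class IsHttpSketch {
  /** True only when the scheme is http or https, ignoring case. */
  static boolean isHttp(URI uri) {
    String scheme = uri.getScheme(); // null for relative references like "/index.html"
    return scheme != null
        && (scheme.equalsIgnoreCase("http") || scheme.equalsIgnoreCase("https"));
  }

  public static void main(String[] args) {
    System.out.println(isHttp(URI.create("HTTPS://example.com/"))); // true
    System.out.println(isHttp(URI.create("ftp://example.com/")));   // false
    System.out.println(isHttp(URI.create("/index.html")));          // false
  }
}
```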

◆ listUris()

static ArrayList<URI> edu.usfca.cs272.LinkFinder.listUris(URI base, String html)

Returns a list of all the valid HTTP(S) URIs found in the HREF attribute of the anchor tags in the provided HTML.

Parameters
base: the base URI used to convert relative links to absolute URIs
html: the raw HTML associated with the base URI
Returns
list of all valid HTTP(S) links in the order they were found
See also
findLinks(URI, String, Collection)
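One plausible shape for listUris, given that its documentation points to findLinks: pass a fresh ArrayList to the finder and return it. ListUrisSketch and its condensed findLinks (double-quoted href values only) are hypothetical stand-ins so the example runs on its own.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ListUrisSketch {
  // Hypothetical pattern: only double-quoted href values are recognized here.
  private static final Pattern HREF =
      Pattern.compile("(?i)<a\\s[^>]*?\\bhref\\s*=\\s*\"([^\"]*)\"");

  /** Condensed stand-in for findLinks: resolve, drop the fragment, keep HTTP(S). */
  static void findLinks(URI base, String html, Collection<URI> links) {
    Matcher matcher = HREF.matcher(html);
    while (matcher.find()) {
      try {
        String text = base.resolve(new URI(matcher.group(1).trim())).normalize().toString();
        int hash = text.indexOf('#');
        URI link = URI.create(hash < 0 ? text : text.substring(0, hash));
        String scheme = link.getScheme();
        if ("http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme)) {
          links.add(link);
        }
      } catch (URISyntaxException | IllegalArgumentException e) {
        // skip links that cannot be parsed
      }
    }
  }

  /** Every valid HTTP(S) link in document order, duplicates included. */
  static ArrayList<URI> listUris(URI base, String html) {
    ArrayList<URI> links = new ArrayList<>();
    findLinks(base, html, links);
    return links;
  }

  public static void main(String[] args) {
    URI base = URI.create("https://example.com/guide/");
    String html = "<a href=\"b.html\">B</a> <a href=\"a.html\">A</a> <a href=\"b.html#top\">B again</a>";
    System.out.println(listUris(base, html));
    // [https://example.com/guide/b.html, https://example.com/guide/a.html, https://example.com/guide/b.html]
  }
}
```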

◆ toAbsolute()

static URI edu.usfca.cs272.LinkFinder.toAbsolute(URI base, String href)

Attempts to create a normalized absolute URI from the provided base URI and link text, removing the fragment component if one is present. Returns null if the conversion fails for any reason.

Parameters
base: the base URI the link text was found on
href: the link text (usually from an anchor tag href attribute)
Returns
the normalized absolute URI or null
See also
clean(URI)
toUri(String)
URI::resolve(URI)
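A sketch of the resolve-then-clean sequence, assuming new URI(String) in place of toUri(String) and a plain fragment strip in place of clean(URI); the class name ToAbsoluteSketch is hypothetical. Any parse failure yields null, matching the documented contract.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class ToAbsoluteSketch {
  /**
   * Resolves href against base, normalizes it, and drops any fragment.
   * Returns null if the link text cannot be parsed. Hypothetical sketch.
   */
  static URI toAbsolute(URI base, String href) {
    try {
      URI relative = new URI(href.trim()); // stand-in for toUri(String)
      URI resolved = base.resolve(relative).normalize();
      String text = resolved.toString();
      int hash = text.indexOf('#'); // stand-in for clean(URI): only the fragment is removed
      return new URI(hash < 0 ? text : text.substring(0, hash));
    } catch (URISyntaxException e) {
      return null;
    }
  }

  public static void main(String[] args) {
    URI base = URI.create("https://example.com/a/b/index.html");
    System.out.println(toAbsolute(base, "../c/page.html#top")); // https://example.com/a/c/page.html
    System.out.println(toAbsolute(base, "bad link"));           // null
  }
}
```

Stripping at the first # is safe on a URI's string form because an unescaped # can only appear there as the fragment delimiter.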

◆ toUri()

static URI edu.usfca.cs272.LinkFinder.toUri(String link) throws URISyntaxException

Attempts to create a URI from the provided text; if that initially fails, encodes the link text and tries again. This is especially useful for relative links that contain spaces or other special characters.

Parameters
link: the text value to convert to a URI
Returns
the converted URI
Exceptions
URISyntaxException: if unable to create the URI
See also
URI::URI(String)
URI::URI(String, String, String)
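Given the two constructors listed above and the URI_PARTS pattern, the fallback might look like the sketch below: try the text as-is, then split it into scheme, scheme-specific part, and fragment and let the three-argument constructor quote illegal characters. This is an assumption about the approach, not the class's confirmed code; ToUriSketch is a hypothetical name.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ToUriSketch {
  // Same expression as LinkFinder.URI_PARTS: [scheme:]scheme-specific-part[#fragment]
  static final Pattern URI_PARTS = Pattern.compile("^(?:([^:]*):)?([^#]+)?(?:#(.*))?$");

  /** Tries the text as-is first; on failure, splits it and lets URI quote illegal characters. */
  static URI toUri(String link) throws URISyntaxException {
    try {
      return new URI(link); // requires illegal characters to be encoded already
    } catch (URISyntaxException e) {
      Matcher parts = URI_PARTS.matcher(link);
      if (!parts.matches()) {
        throw e;
      }
      // The three-argument constructor percent-encodes illegal characters.
      return new URI(parts.group(1), parts.group(2), parts.group(3));
    }
  }

  public static void main(String[] args) throws URISyntaxException {
    System.out.println(toUri("https://example.com/a?b=c")); // https://example.com/a?b=c
    System.out.println(toUri("/my page.html"));              // /my%20page.html
    System.out.println(toUri("page two.html#sec 1"));        // page%20two.html#sec%201
  }
}
```

One caveat of this approach: the multi-argument URI constructors always quote the % character, so text that mixes already-escaped sequences with raw spaces (for example a%20b c) comes back double-encoded.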

◆ uniqueUris()

static HashSet<URI> edu.usfca.cs272.LinkFinder.uniqueUris(URI base, String html)

Returns a set of all the unique valid HTTP(S) URIs found in the HREF attribute of the anchor tags in the provided HTML.

Parameters
base: the base URI used to convert relative URIs to absolute URIs
html: the raw HTML associated with the base URI
Returns
set of all valid and unique HTTP(S) links found
See also
findLinks(URI, String, Collection)
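The set-based variant can be sketched the same way. The example mainly shows a consequence of cleaning: links that differ only by fragment resolve to the same URI and collapse into one entry. UniqueUrisSketch and its condensed findLinks (double-quoted href values only) are hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Collection;
import java.util.HashSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class UniqueUrisSketch {
  // Hypothetical pattern: only double-quoted href values are recognized here.
  private static final Pattern HREF =
      Pattern.compile("(?i)<a\\s[^>]*?\\bhref\\s*=\\s*\"([^\"]*)\"");

  /** Condensed stand-in for findLinks: resolve, drop the fragment, keep HTTP(S). */
  static void findLinks(URI base, String html, Collection<URI> links) {
    Matcher matcher = HREF.matcher(html);
    while (matcher.find()) {
      try {
        String text = base.resolve(new URI(matcher.group(1).trim())).normalize().toString();
        int hash = text.indexOf('#');
        URI link = URI.create(hash < 0 ? text : text.substring(0, hash));
        String scheme = link.getScheme();
        if ("http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme)) {
          links.add(link);
        }
      } catch (URISyntaxException | IllegalArgumentException e) {
        // skip links that cannot be parsed
      }
    }
  }

  /** Every distinct valid HTTP(S) link; iteration order is unspecified. */
  static HashSet<URI> uniqueUris(URI base, String html) {
    HashSet<URI> links = new HashSet<>();
    findLinks(base, html, links);
    return links;
  }

  public static void main(String[] args) {
    URI base = URI.create("https://example.com/faq.html");
    String html = "<a href=\"#q1\">Q1</a> <a href=\"#q2\">Q2</a> <a href=\"faq.html\">FAQ</a>";
    System.out.println(uniqueUris(base, html)); // [https://example.com/faq.html]
  }
}
```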

Member Data Documentation

◆ URI_PARTS

static final Pattern edu.usfca.cs272.LinkFinder.URI_PARTS = Pattern.compile("^(?:([^:]*):)?([^#]+)?(?:#(.*))?$")

Regular expression that breaks a URI into its high-level component parts, in the form:

[scheme:]scheme-specific-part[#fragment]

Group 1 is the scheme without the : colon, group 2 is the scheme-specific part, and group 3 is the fragment without the # hash symbol. Any group may be null.

Warning: Does not validate the URI.

See also
URI
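A small demonstration of what the three groups capture, using the same expression. UriPartsDemo and its split helper are hypothetical; the last input shows the warning in practice, since a colon in a relative path is mistaken for a scheme delimiter.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class UriPartsDemo {
  static final Pattern URI_PARTS = Pattern.compile("^(?:([^:]*):)?([^#]+)?(?:#(.*))?$");

  /** Returns {scheme, scheme-specific part, fragment}; any entry may be null. */
  static String[] split(String text) {
    Matcher matcher = URI_PARTS.matcher(text);
    if (!matcher.matches()) {
      return null; // possible when a line terminator appears inside the fragment
    }
    return new String[] {matcher.group(1), matcher.group(2), matcher.group(3)};
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString(split("https://example.com/a b#top")));
    // [https, //example.com/a b, top]
    System.out.println(Arrays.toString(split("/docs/index.html")));
    // [null, /docs/index.html, null]
    System.out.println(Arrays.toString(split("notes/a:b.html")));
    // [notes/a, b.html, null]  (not validated: the colon is taken as a scheme delimiter)
  }
}
```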

The documentation for this class was generated from the following file: