
Extract url components

Extract components from a URL.

Take http://www.cwi.nl:80/%7Eguido/Python.html;a=2;b=3?c=4,2&d=e#anchor as an example URL. Its components are the following:

  • scheme: URL scheme specifier (http)
  • domain: Network location part (www.cwi.nl:80)
  • path: Hierarchical path (/%7Eguido/Python.html)
  • params: Parameters for last path element (a=2;b=3)
  • query: Query component (c=4,2&d=e)
  • fragment: Fragment identifier (anchor)

For more information about these components, see the description of URL parsing in the documentation for Python's urllib.parse module.
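The breakdown above can be reproduced with Python's standard library. The following is a minimal sketch, assuming the step splits URLs the same way urllib.parse.urlparse does; the `components` dictionary is only an illustration and not part of the step itself.

```python
from urllib.parse import urlparse

url = "http://www.cwi.nl:80/%7Eguido/Python.html;a=2;b=3?c=4,2&d=e#anchor"
parts = urlparse(url)

# urlparse names the network location "netloc"; this step calls it "domain".
components = {
    "scheme": parts.scheme,
    "domain": parts.netloc,
    "path": parts.path,
    "params": parts.params,
    "query": parts.query,
    "fragment": parts.fragment,
}

for name, value in components.items():
    print(f"{name}: {value}")
```

Note that the port (80) stays inside the network location, and that the percent-encoded path (%7E) is not decoded.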

Usage


The following are the step's expected inputs and outputs and their specific types.

Step signature
extract_url_components(urls: url, {"param": value}) -> (
    scheme: category,
    domain: category,
    path: category,
    params: category,
    query: category,
    fragment: category
)

where the object {"param": value} is optional in most cases and, if present, may contain any of the parameters described in the Parameters section below.
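To make the shape of the signature concrete (one column of URLs in, six columns out), here is a rough sketch in plain Python. The function name `extract_columns` and the use of lists as columns are assumptions made for illustration; the step's real implementation is not shown in this document.

```python
from urllib.parse import urlparse

# Output column names, in the order used by the step signature.
FIELDS = ("scheme", "domain", "path", "params", "query", "fragment")

def extract_columns(urls):
    """Split each URL and return one list per output column (hypothetical helper)."""
    columns = {name: [] for name in FIELDS}
    for url in urls:
        parts = urlparse(url)
        # The step's "domain" corresponds to urlparse's "netloc".
        values = (parts.scheme, parts.netloc, parts.path,
                  parts.params, parts.query, parts.fragment)
        for name, value in zip(FIELDS, values):
            columns[name].append(value)
    return columns

result = extract_columns([
    "https://example.com/a/b?x=1#top",
    "http://www.cwi.nl:80/%7Eguido/Python.html;a=2;b=3?c=4,2&d=e#anchor",
])
```

Each output list has one entry per input URL, so row i of every output column describes row i of the input.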

Example

Use http as the default scheme.

Example call (in recipe editor)
extract_url_components(ds.urls, {
  "default_scheme": "http"
}) -> (
  ds.scheme,
  ds.domain,
  ds.path,
  ds.params,
  ds.query,
  ds.fragment
)

Inputs


urls: column:url

The list of URLs you wish to decompose.

Outputs


scheme: column:category

URL scheme specifier (http).


domain: column:category

Network location part (www.cwi.nl:80).


path: column:category

Hierarchical path (/%7Eguido/Python.html).


params: column:category

Parameters for last path element (a=2;b=3).


query: column:category

Query component (c=4,2&d=e).


fragment: column:category

Fragment identifier, i.e. the part after the # sign (anchor).

Parameters


default_scheme: string | null = "http"

Default URL scheme. The scheme (http, https, ...) to prepend to URLs that don't have one. Use null if no scheme should be added.
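A default scheme matters because a URL without one is not split into a network location and a path by Python's urlparse: everything ends up in the path. The sketch below illustrates the idea. The helper `with_default_scheme` and its scheme-detection regex are assumptions for illustration, not the step's actual logic.

```python
import re
from urllib.parse import urlparse

# A scheme is a letter followed by letters, digits, "+", "." or "-", then "://".
SCHEME_RE = re.compile(r"^[A-Za-z][A-Za-z0-9+.\-]*://")

def with_default_scheme(url, default_scheme="http"):
    """Prepend default_scheme to URLs that lack one (hypothetical helper)."""
    if default_scheme is None or SCHEME_RE.match(url):
        return url  # nothing to add
    return f"{default_scheme}://{url}"

bare = "www.cwi.nl/index.html"

# Without a scheme, urlparse treats the whole string as a path.
without = urlparse(with_default_scheme(bare, None))

# With the default scheme added, the domain is recognised.
with_http = urlparse(with_default_scheme(bare, "http"))
```

Here `without.netloc` is empty and `without.path` holds the whole string, while `with_http.netloc` is www.cwi.nl and `with_http.path` is /index.html. URLs that already carry a scheme are returned unchanged.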
