Fetch url content

NLP · text

Fetch the main text from a web URL, and return its title, author, content, excerpt and domain.

Example

Since the step has no configuration parameters, a call only needs the input column and the five output columns:

fetch_url_content(ds.article_url) -> (
  ds.article_title,
  ds.article_author,
  ds.article_content,
  ds.article_excerpt,
  ds.article_domain
)

Usage

The following are the step's expected inputs and outputs and their specific types.

fetch_url_content(urls: url) -> (
    title: text,
    author: category,
    content: text,
    excerpt: text,
    domain: url
)
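
To make the `title` and `author` outputs more concrete, here is a minimal Python sketch of pulling those two fields out of an HTML string with the standard library. It is only an illustration: the `ArticleMetaParser` class and `extract_meta` helper are hypothetical names, and the step's actual extraction logic is not documented here and is likely more sophisticated.

```python
from html.parser import HTMLParser


class ArticleMetaParser(HTMLParser):
    """Collects the <title> text and the <meta name="author"> value (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.author = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attr_map.get("name") or "").lower() == "author":
            self.author = attr_map.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only accumulate text that appears inside <title>...</title>.
        if self._in_title:
            self.title += data


def extract_meta(html: str) -> dict:
    """Hypothetical helper: return the title and author found in an HTML document."""
    parser = ArticleMetaParser()
    parser.feed(html)
    parser.close()
    return {"title": parser.title.strip(), "author": parser.author}


page = (
    "<html><head><title> Hello World </title>"
    '<meta name="author" content="Jane Doe"></head>'
    "<body><p>Body text.</p></body></html>"
)
meta = extract_meta(page)
```

Pages without a `<title>` tag or an author meta tag simply yield empty strings in this sketch; how the step itself represents missing values is not specified above.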

Inputs


urls: column:url

A column of URLs linking to articles, blog posts or webpages.

Outputs


title: column:text

A text column containing the extracted article's title.


author: column:category

A categorical column containing the extracted article's author.


content: column:text

A text column containing the extracted article's main text.


excerpt: column:text

A text column containing a summary of the extracted article.


domain: column:url

A column containing only the domain of each original URL.
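
As a rough illustration of what "only the domain" means, the following Python sketch derives the host from each URL using the standard library. The `extract_domain` helper is a hypothetical name, and details such as whether the step keeps or strips a leading `www.` are assumptions, not documented behaviour.

```python
from urllib.parse import urlparse


def extract_domain(url: str) -> str:
    """Return the host part of a URL, or an empty string if none can be found."""
    # urlparse only fills in the network location when a scheme or a leading
    # "//" is present, so prepend "//" to bare inputs like "example.org/about".
    parsed = urlparse(url if "://" in url else "//" + url)
    return parsed.hostname or ""


urls = ["https://www.example.com/blog/post?id=1", "example.org/about"]
domains = [extract_domain(u) for u in urls]
```

In this sketch the path, query string and scheme are discarded, leaving `www.example.com` and `example.org` for the two inputs.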