wget2 2.2.1
Robots Exclusion file parser

Data Structures

struct wget_robots_st

Macros

#define parse_record_field(d, f)

Functions

int wget_robots_parse (wget_robots **_robots, const char *data, const char *client)
void wget_robots_free (wget_robots **robots)
int wget_robots_get_path_count (wget_robots *robots)
wget_string * wget_robots_get_path (wget_robots *robots, int index)
int wget_robots_get_sitemap_count (wget_robots *robots)
const char * wget_robots_get_sitemap (wget_robots *robots, int index)

Detailed Description

These functions parse a Robots Exclusion Standard file (robots.txt) into a data structure for easy access.

Macro Definition Documentation

◆ parse_record_field

#define parse_record_field(d, f)
Value:
parse_record_field(d, f, sizeof(f) - 1)
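The macro supplies the third argument as sizeof(f) - 1, which for a string literal is its length without the terminating 0-byte, so callers only pass the field name. This only works when f is a string literal or char array; for a plain char * pointer, sizeof would yield the pointer size. Because a macro name is not replaced again while its own expansion is rescanned (C11 6.10.3.4), the expansion calls the real three-argument function of the same name. The self-contained sketch below shows the technique; the body of the three-argument matcher is a hypothetical stand-in, since this page does not document libwget's own function.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical three-argument matcher (not libwget's code): succeeds if
 * *data starts with `field` (compared case-insensitively), followed by
 * optional blanks and a ':'. On success, *data is advanced past the ':'
 * and any blanks after it, and 1 is returned; otherwise 0 is returned
 * and *data is left unchanged. */
static int parse_record_field(const char **data, const char *field, size_t field_length)
{
	const char *p = *data;

	/* Stops at the first mismatch, so it never reads past a 0-byte. */
	for (size_t i = 0; i < field_length; i++, p++) {
		if (tolower((unsigned char) *p) != tolower((unsigned char) field[i]))
			return 0;
	}
	while (*p == ' ' || *p == '\t')
		p++;
	if (*p != ':')
		return 0;
	p++;
	while (*p == ' ' || *p == '\t')
		p++;

	*data = p;
	return 1;
}

/* Same shape as the documented macro. It must follow the function
 * definition, otherwise the definition itself would be macro-expanded.
 * The inner name is not re-expanded, so the function above is called. */
#define parse_record_field(d, f) parse_record_field(d, f, sizeof(f) - 1)
```

A call such as parse_record_field(&p, "Disallow") therefore becomes parse_record_field(&p, "Disallow", 8), with the length computed at compile time.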

Function Documentation

◆ wget_robots_parse()

int wget_robots_parse (wget_robots **_robots, const char *data, const char *client)
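Parameters
[out] _robots: Receives a pointer to the newly allocated wget_robots structure
[in] data: Memory with the robots.txt content (terminated by a 0-byte)
[in] client: Name of the client / user-agent
Returns
WGET_E_SUCCESS on success, in which case *_robots points to the parsed wget_robots structure; a WGET_E_* error code otherwise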

The function parses the robots.txt data according to https://www.robotstxt.org/orig.html#format. The resulting wget_robots structure, returned via _robots, contains a list of the disallowed paths and a list of the sitemap URLs.

The wget_robots structure must be freed by calling wget_robots_free().
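To illustrate the format rules of the linked specification, here is a minimal, self-contained sketch that collects the Disallow values of the records applying to a given client. It is not libwget's implementation. The names span, scan and collect_disallows, the fixed-size output array, and the simplified agent matching (a case-insensitive equality test instead of the substring match the specification recommends) are assumptions made for this example. As the specification describes, records are separated by blank lines, comment-only lines are dropped without ending a record, an empty Disallow value allows everything, and the `*` record applies only when no other record names the client.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical (pointer, length) slice; values point into the input. */
typedef struct {
	const char *p;
	size_t len;
} span;

static int is_blank(char c)
{
	return c == ' ' || c == '\t' || c == '\r';
}

/* Case-insensitive comparison of the slice [s, s+len) with string z. */
static int span_ieq(const char *s, size_t len, const char *z)
{
	if (strlen(z) != len)
		return 0;
	for (size_t i = 0; i < len; i++)
		if (tolower((unsigned char) s[i]) != tolower((unsigned char) z[i]))
			return 0;
	return 1;
}

/* One pass: collect Disallow values of records with a User-agent value
 * equal to `agent` (case-insensitive); *matched is set if any exists. */
static int scan(const char *data, const char *agent, span *out, int max, int *matched)
{
	int n = 0, applies = 0, prev_was_agent = 0;

	while (*data) {
		const char *eol = strchr(data, '\n');
		const char *next = eol ? eol + 1 : data + strlen(data);
		const char *e = eol ? eol : next;
		const char *hash = memchr(data, '#', (size_t) (e - data));
		const char *s = data;

		data = next;
		if (hash)
			e = hash;                 /* strip the comment */
		while (s < e && is_blank(*s))
			s++;
		while (e > s && is_blank(e[-1]))
			e--;

		if (s == e) {
			if (!hash) {              /* blank line: record boundary */
				applies = 0;
				prev_was_agent = 0;
			}
			continue;                 /* comment-only lines are ignored */
		}

		const char *colon = memchr(s, ':', (size_t) (e - s));
		if (!colon)
			continue;                 /* not a field line: ignore */

		const char *fe = colon, *v = colon + 1;
		while (fe > s && is_blank(fe[-1]))
			fe--;
		while (v < e && is_blank(*v))
			v++;

		if (span_ieq(s, (size_t) (fe - s), "user-agent")) {
			if (!prev_was_agent)
				applies = 0;          /* a new record starts */
			if (span_ieq(v, (size_t) (e - v), agent))
				applies = *matched = 1;
			prev_was_agent = 1;
		} else {
			prev_was_agent = 0;
			/* An empty Disallow value allows everything: nothing to store. */
			if (applies && v < e && n < max && span_ieq(s, (size_t) (fe - s), "disallow")) {
				out[n].p = v;
				out[n].len = (size_t) (e - v);
				n++;
			}
		}
	}
	return n;
}

/* Collect the Disallow values that apply to `client`; if no record names
 * the client, fall back to the default "*" record. Returns the count. */
int collect_disallows(const char *data, const char *client, span *out, int max)
{
	int matched = 0;
	int n = scan(data, client, out, max, &matched);

	if (!matched)
		n = scan(data, "*", out, max, &matched);
	return n;
}
```

Doing two passes keeps the fallback rule simple: the `*` record is consulted only after a full pass has shown that no record names the client.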

◆ wget_robots_free()

void wget_robots_free (wget_robots **robots)
Parameters
[in,out] robots: Pointer to a pointer to the wget_robots structure

wget_robots_free() frees a wget_robots structure previously allocated by wget_robots_parse().
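Taking a pointer to the pointer lets a destructor reset the caller's variable, a common C idiom that avoids dangling pointers and makes a repeated call harmless. The sketch below shows that idiom on a hypothetical demo_robots type that is not libwget's wget_robots. Whether wget_robots_free() also resets *robots to NULL is an assumption here: the [in,out] direction suggests it, but this page does not state it.

```c
#include <stdlib.h>

/* Hypothetical stand-in type; not libwget's wget_robots. */
typedef struct {
	char **paths;
	int npaths;
} demo_robots;

/* Frees *robots (if any) and resets the caller's pointer to NULL, so a
 * second call, or a call with a NULL argument, does nothing. */
void demo_robots_free(demo_robots **robots)
{
	if (robots && *robots) {
		for (int i = 0; i < (*robots)->npaths; i++)
			free((*robots)->paths[i]);
		free((*robots)->paths);
		free(*robots);
		*robots = NULL;
	}
}
```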

◆ wget_robots_get_path_count()

int wget_robots_get_path_count (wget_robots *robots)
Parameters
robots: Pointer to an instance of wget_robots
Returns
The number of disallowed paths stored in robots

◆ wget_robots_get_path()

wget_string * wget_robots_get_path (wget_robots *robots, int index)
Parameters
robots: Pointer to an instance of wget_robots
index: Index of the wanted path
Returns
The path at index, or NULL if there is none
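The count and index accessors are meant to be used together, typically to check a URL path against the stored entries; the original specification says that any URL starting with a Disallow value is not to be retrieved. The self-contained sketch below assumes each entry is a (pointer, length) slice, modeled on but not taken from libwget's wget_string, and uses a hypothetical helper path_is_disallowed over a plain array instead of the accessor functions. Because such slices need not be 0-terminated, they are compared with strncmp limited to the stored length.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical (pointer, length) slice, modeled on wget_string. */
typedef struct {
	const char *p;
	size_t len;
} slice;

/* Returns 1 if `path` starts with any of the n non-empty entries. The loop
 * mirrors the count/index pattern of wget_robots_get_path_count() and
 * wget_robots_get_path(), here over a plain array. */
int path_is_disallowed(const char *path, const slice *entries, int n)
{
	for (int i = 0; i < n; i++) {
		const slice *e = &entries[i];

		/* strncmp reads at most e->len bytes of e->p. */
		if (e->len > 0 && strncmp(path, e->p, e->len) == 0)
			return 1;
	}
	return 0;
}
```

To print such a slice, use printf("%.*s\n", (int) e->len, e->p), which also respects the length instead of relying on a 0-byte.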

◆ wget_robots_get_sitemap_count()

int wget_robots_get_sitemap_count (wget_robots *robots)
Parameters
robots: Pointer to an instance of wget_robots
Returns
The number of sitemap URLs stored in robots

◆ wget_robots_get_sitemap()

const char * wget_robots_get_sitemap (wget_robots *robots, int index)
Parameters
robots: Pointer to an instance of wget_robots
index: Index of the wanted sitemap URL
Returns
The sitemap URL at index, or NULL if there is none
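In the sitemaps.org protocol, the Sitemap directive is independent of the User-agent records, so a parser can collect it from anywhere in the file. Below is a minimal, self-contained sketch of such a collector. The function collect_sitemaps and the url_slice type are hypothetical and not part of libwget (wget_robots_get_sitemap() itself returns a const char *). For brevity, the sketch requires the ':' to follow the field name directly and does not strip trailing '#' comments.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical (pointer, length) slice into the robots.txt text. */
typedef struct {
	const char *p;
	size_t len;
} url_slice;

/* Collects the values of all "Sitemap:" lines (field name compared
 * case-insensitively), regardless of the surrounding records.
 * Stores at most max entries in out and returns how many were stored. */
int collect_sitemaps(const char *data, url_slice *out, int max)
{
	static const char field[] = "sitemap:";
	int n = 0;

	while (*data && n < max) {
		const char *eol = strchr(data, '\n');
		const char *next = eol ? eol + 1 : data + strlen(data);
		const char *s = data, *e = eol ? eol : next;
		size_t i;

		data = next;
		while (s < e && (*s == ' ' || *s == '\t'))
			s++;
		for (i = 0; i < sizeof(field) - 1 && s + i < e; i++)
			if (tolower((unsigned char) s[i]) != field[i])
				break;
		if (i != sizeof(field) - 1)
			continue;                  /* not a Sitemap line */

		s += i;
		while (s < e && (*s == ' ' || *s == '\t'))
			s++;
		while (e > s && isspace((unsigned char) e[-1]))
			e--;                       /* drop trailing blanks and '\r' */
		if (s < e) {
			out[n].p = s;
			out[n].len = (size_t) (e - s);
			n++;
		}
	}
	return n;
}
```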