Majestic Developer | DownloadBackLinks

DownloadBackLinks

Status

Command status:	Active
Supported by OpenApps API:	Yes
Supported by Internal/Reseller API:	Yes
Possibly queued processing:	Yes (Always)

Purpose

This command allows for large scale backlinks retrieval beyond the maximum of 50000 links supported by GetBackLinkData.

This command will generate data equivalent to the data obtained through the download interface for Advanced Reports on the Majestic site.

Due to the high volume of data to process, this command will always work asynchronously, meaning it doesn't return the data instantly but instead requests the generation of a .GZ compressed CSV file which can either be uploaded to an FTP server on completion, or downloaded from the location specified in the corresponding GetDownloadsList command.

The GetDownloadsList command can also be polled to monitor the status of the download, although if you have a internet-facing application you may find the "NotifyURL" parameter a more suitable way to be informed when your download has completed.

Resources consumed

Resource	Description
AnalysisResUnits	This resource will be decreased by an amount dependent on the size of the analysed item. The precise amount is detailed in the response to GetIndexItemInfo under the data column DownloadBacklinksAnalysisResUnitsCost.
RetrievalResUnits	This resource will be decreased by the number of rows of data returned by this command.

Parameters

Parameter	Description
cmd	Required: must be set to: DownloadBackLinks
datasource	Optional - defaults to historic Either: "fresh" - to query against Fresh Index, or "historic" - to query against Historic Index.
Query	Required: index item that is being queried using same convention as for GetIndexItemInfo, examples: http://www.example.com - URL www.example.com - subdomain example.com - root domain
SkipIfAnalysisCostGreaterThan	Analysis will be automatically aborted for any index items that have a greater analysis cost than specified. This parameter prevents accidental attempts to download backlinks for very large items, such as google.com. Important: This parameter defaults to 1000000. If you intend to get backlinks for items with a higher cost than this default, you will need to provide a maximum acceptable cost as a value for SkipIfAnalysisCostGreaterThan. Default: 1000000
NotifyURL	Important: notification URL must be accessible from outside your intranet - do not specify internal servers that can't be assessed from our servers, if you do then you will never get the notification you expect! Optional: if specified this URL (with HTTP/HTTPS protocols only) will be requested to notify you that the download has been fully prepared. The URL you provide can contain query string parameters to help you identify the download request that was made, we will also substitute (if they are present) the following macro variables (case sensitive): %%DOWNLOAD_FILE%% - will be changed to the download filename %%DOWNLOAD_FILE_LOCATION%% - will be changed to the download filename location using PublicDownloadLocation variable below Your notify URL should respond with single piece of data: OK (2 characters - no HTML) to indicate that you have successfully received this notification. Any other response (including server failures on your side) will be treated as an error. In the case of an error, the notification URL will be called again a number of times using exponential backoff before failing. Example of notification url: http://www.example.com/mjseonotify.php (this URL doesn't actually exist!)
UploadToFTP	Important: this FTP URL must be accessible from outside your intranet - do not specify internal servers that can't be assessed from our side: if you do then you will never get the upload! Optional: if not specified prepared file with backlinks will be uploaded to majestic.com, this behaviour can be overridden by specifying your own FTP server using appropriate URI format such as: ftp://username:password@yourftpserver.com/public_html/uploads/ with the relative path of the upload directory or ftp://username:password@yourftpserver.com//home/user/public_html/uploads/ with the absolute path of the upload directory. Make sure that the user specified is allowed to write files in the directory that you designated for these uploads. If a trailing filename is specified, this will be prepended to the output filename when it is upload to your server. If not specified the upload will be made to www.majestic.com from where you will be able to download the file.
PublicDownloadLocation	Optional: by default, the prepared backlinks file will be available from http://downloads.majestic.com (note: directory listing is denied by design there). This parameters allows you to provide an alternative FTP location's public URL.

Filtering rules

Filtering rules can be provided to specify which data should be analysed. For example, a particular analysis request may wish to focus on backlinks marked as "nofollow" or exclude such links from any analysis. Filtering rules can be combined, ie: only analyse backlinks found in a particular time period, excluding any links marked as "nofollow".

Rule name	Description
Target URLs filtering rules
URLs	One or more Target URLs delimited by CR LF - only meaningful for domain level analysis, this will force analysis to run only on backlinks pointing to these URLs. Exact matching of URLs will be performed.
IncludeMatchedURLs	Comma delimited list of strings that will be required to be matched in Target URLs in order for them to be analysed.
ExcludeMatchedURLs	Same as IncludeMatchedURLs but matched Target URLs will be excluded.
Source URL filtering rules
FlagIncludeDeleted (also works old flag: FlagIncludeOldCrawl) FlagIncludeNoFollow FlagIncludeRedirect FlagIncludeImage FlagIncludeFrame FlagIncludeAltText FlagIncludeMention	Includes into analysis source URLs that had at least one of the following flags set: Deleted (also works OldCrawl) - links that were present on a page but after recrawl found to be deleted (removed) NoFollow - links marked with "nofollow" Redirect - links that were redirecting Image - links that were image based rather than text Frame - source URL was used in a frameset (useful to find if someone frames content) AltText - text used in title attribute of A tag Mentions - text mentions of a domain or link See more about Source Flags: http://www.majestic.com/glossary.php#SourceFlags These are set by setting the parameter to 1 eg "FlagIncludeNoFollow=1"
FlagExcludeDeleted (also works old flag: FlagExcludeOldCrawl) FlagExcludeNoFollow FlagExcludeRedirect FlagExcludeImage FlagExcludeFrame FlagExcludeAltText FlagExcludeMention	Same as FlagInclude* filtering rules, only setting them will result in exclusion of source URLs with specified flags set. Recommended use: exclude non-rank passing backlinks such as those marked with Deleted, NoFollow, Mention, AltText, Frame, Redirect flags. These are set by setting the parameter to 1 eg "FlagExcludeNoFollow=1"
IncludeMatchedRefDomains	If set then only backlinks for matching referring domains will be retrieved. Referring domains can be comma delimited to provide multiple referring domains of interest. Example of valid referring domains: example.com,example.net It's possible that there will be no backlinks for specified referring domains in which case empty file (with headers) will be produced.
ExcludeMatchedRefDomains	If set then backlinks for matching referring domains will be excluded. Referring domains can be comma delimited to provide multiple referring domains to be excluded. Example of valid referring domains: example.com,example.net It's possible that there will be no backlinks returned when these referring domains are excluded.
IncludeMatchedRefURLs	If set then only backlinks containing the specified text in the source URL will be returned, multiple text can be specified in a comma separated list Example of valid text: page,folder It's possible that there will be no backlinks for specified text in which case empty file (with headers) will be produced.
ExcludeMatchedRefURLs	Same as IncludeMatchedRefURLs but matched URLs will be excluded from the analysis.
IncludeExactAnchorText	If set then only backlinks with exactly matching (lower cased) anchor text will be selected. Example: "yahoo" (without quotes) - this will only match anchor texts that are exactly "yahoo", it won't match "yahoo!" or anything different. Use \| (pipe) to separate multiple anchor texts, though this is not recommended.
ExcludeExactAnchorText	Same as IncludeExactAnchorText but matched anchor text items will be excluded from analysis.
IncludeContainsAnchorText	If set then only backlinks with anchor text containing (lower cased) specified anchor texts. Example: yahoo This will match anchor texts that contain that word, ie: "yahoo!", "click here to go yahoo" etc Use \| (pipe) to separate multiple anchor texts, though this is not recommended.
ExcludeExactAnchorText	Same as IncludeExactAnchorText but matched anchor text items will be excluded from analysis.
IncludeContainsAnchorText	If set then only backlinks with anchor text containing (lower cased) specified anchor texts. Example: yahoo This will match anchor texts that contain that word, ie: "yahoo!", "click here to go yahoo" etc Use \| (pipe) to separate multiple anchor texts, though this is not recommended.
ExcludeContainsAnchorText	Same as IncludeContainsAnchorText but matched anchor text items will be excluded from analysis.
IncludeMatchedRefTLDs	Comma delimited list of TLDs (Top Level Domains) that will be included in analysis, ie: edu,gov - note there is no . in front of TLD.
ExcludeMatchedRefTLDs	Same as IncludeMatchedRefTLDs but matched referring TLDs will be excluded from analysis.
EnableBackLinkDateRange	If set to 1 then Last Crawled date range analysis will be enabled.
IncludeBackLinksDateFrom	Date in format of DD/MM/YYYY (ie: 05/02/2008 for 5 February 2008) - links from Source URLs with Last Crawled date starting at that moment will be included in analysis. Requires EnableBackLinkDateRange to be set to 1.
IncludeBackLinksDateTo	Same as IncludeBackLinksDateFrom only this is the cut off Last Crawled date for backlinks analysis by date. Requires EnableBackLinkDateRange to be set to 1.
EnableBackLinkFFDateRange	If set to 1 then First Found date range analysis will be enabled.
IncludeBackLinksFFDateFrom	Date in format of DD/MM/YYYY (ie: 05/02/2008 for 5 February 2008) - links from Source URLs with First Found date starting at that moment will be included in analysis. Note: Fresh Index only includes data going back 30 days and it can only see when link was first found during that date range effectively making it impossible to say if the same link wasn't found 3 years ago. It is recommended to use historical index for first found dates. Requires EnableBackLinkFFDateRange to be set to 1.
IncludeBackLinksFFDateTo	Same as IncludeBackLinksFFDateFrom only this is the cut off First Found date for backlinks analysis by date. Requires EnableBackLinkFFDateRange to be set to 1.
IncludeMatchedIPs	Includes links from Source URLs with matched IP addresses of the resolved referring domains that matched comma delimited list of IPs. Prefix matching is used, you can match subnets by leaving out the end of the IP, ie: 212.34.4.
ExcludeMatchedIPs	Same as IncludeMatchedIPs but matched referring TLDs will be excluded from analysis.
IncludeMatchedGeoCountry	Includes backlinks referring domains of which were geo-located in specified countries. Use two letter country codes or NA for not geo-located IPs or unresolved domains. List of country codes: http://www.maxmind.com/app/iso3166
ExcludeMatchedGeoCountry	Same as IncludeMatchedGeoCountry but matched referring TLDs will be excluded from analysis.
MinRefACRank	If specified only source URLs with ACRank higher than specified will be considered. Acceptable values: 0-15
MinCitationFlow	If specified, only source URLs with a citation flow greater than the value provided will be considered 0 - 100 (Default 0)
MaxCitationFlow	If specified, only source URLs with a citation flow less than the value provided will be considered 0 - 100 (Default 100)
MinTrustFlow	If specified, only source URLs with a trust flow greater than the value provided will be considered 0 - 100 (Default 0)
MaxTrustFlow	If specified, only source URLs with a trust flow less than the value provided will be considered 0 - 100 (Default 100)
CitationTrustDeltaChoice	Enable/Disable the Citation / Trust flow delta filtering -1 : Off 0 : Minimum 1 : Maximum (Default 0)
CitationTrustDeltaValue	Apply filtering on trust/citation flows based on relative ratios to one another. For example, only consider URLs with a trust flow 50% higher than their citation flow. 0 - 100 (Default 0)
UsePrefixScan	This parameter indicates that you would like links to any page under the specified prefix eg setting item=http://www.site.com/folder/ and UsePrefixScan=1 will return any links to any page on www.site.com under /folder/ Default: 0
CreateBackLinkCountsOnly	Adding this parameter with any value will create a list of known URLs within the query scope, summarised by the number of backlinks to them. eg CreateBackLinkCountsOnly=1 or CreateBackLinkCountsOnly=0 Default: parameter not sent
MinBackLinks	Used in conjunction with CreateBackLinkCountsOnly to specify the minimum number of external backlinks a URL should have to be included in the report. 0 will include all known URLs within the scope of the report, 1 will include all URLs with 1 or more external backlinks, etc. Default: 0
UniqueBackLinksOnly	When set to 1 it will limit number of backlinks from the same source pointing to the same target URL. When used with FlagExcludeMention=1 it will produce same number of rows in file as External Backlinks count reported by GetIndexItemInfo Default: 0

Sample query and response

This is a protocol-level example query that uses a special URL that was overridden to have zero cost of analysis (you will need to use your own API_KEY to analyse other urls):

XML response:

<?xml version="1.0" encoding="UTF-8"?>
<Result Code="QueuedForProcessing" ErrorMessage="" FullError="">
 <GlobalVars DownloadJobID="FC19FC1BE7EC6FF277078A2073FE9562" IndexBuildDate="2017-09-04 13:42:54" IndexType="0" JobID="FC19FC1BE7EC6FF277078A2073FE9562" ReportName="DownloadBackLinks" ServerBuild="2017-10-12 15:22:34" ServerName="HUMMERR" ServerVersion="1.0.6494.25877" UniqueIndexID="20170904134254-HISTORICAL" UserID="895472" />
</Result>

JSON response:

{
"Code": "QueuedForProcessing",
"ErrorMessage": "",
"FullError": "",
"DownloadJobID": "CB8C36874702C69A767AC086AFD01B1B",
"IndexBuildDate": "2017-09-04 13:42:54",
"IndexType": 0,
"JobID": "CB8C36874702C69A767AC086AFD01B1B",
"ReportName": "DownloadBackLinks",
"ServerBuild": "2017-10-12 15:22:34",
"ServerName": "HUMMERR",
"ServerVersion": "1.0.6494.25877",
"UniqueIndexID": "20170904134254-HISTORICAL",
"UserID": 895472
}

This response indicates that the request was queued for asynchronous processing. Note: JobID value is returned in this case. This response indicates that this request has already been requested recently and data files were prepared for it. To see details on finding these files, please see the documentation regarding GetDownloadsList. Sample check query for this particular request data files (note: DownloadJobID value will be different - take it from XML returned by previous call):

Returned values

Return value	Description
Global variables
Code	Code indicating if this command successfully completed.
ErrorMessage	A message explaining the error. This will be blank if the code is "OK".
FullError	A verbose explanation of the error.
DownloadJobID	Unique identifier for this download.
IndexBuildDate	The date the index was built.
IndexType	The index queried (0=Fresh, 1=Historic).
JobID	Unique identifier for this download.
ReportName	Name of the report these files are downloaded for.
ServerBuild	Date/time on which the server was built.
ServerName	Name of the server where the file was stored.
ServerVersion	The version of the server when the file was deleted.
UniqueIndexID	Unique identifier for the index storing the deleted item.
UserID	ID of the user making the request.

Related commands

To see details on finding the files created by running DownloadBackLinks, please see the documentation regarding GetDownloadsList.

To see details on deleting this job, please see the documentation regarding DeleteDownloads.

To see details on how to obtain the cost of analysis, please see the documentation regarding GetIndexItemInfo.

Common problems

Problem	Solution
Making requests to very large domains such as google.com uses up resources very quickly.	Use SkipIfAnalysisCostGreaterThan parameter to avoid analysing too large domains and/or call GetIndexItemInfo command to get analysis cost first.
Repeated calls to the same domain with slightly varied parameters can quickly use up available resources.	Call this command once, then manipulate the data on your end.
Calls to this command yield few to no results.	Ensure that your parameters aren't too narrow. In particular, check date range filtering: please consider that our main index is not updated every day.

Majestic Developer Home