FreeBSD Sitemap XML Checker With NPM and Go

If you want to create a sitemap file, you must follow the Sitemap protocol. The format is XML: all data values in a sitemap must be entity-escaped, and the file itself must be UTF-8 encoded. A sitemap starts with the opening <urlset> tag and ends with the closing </urlset> tag.

The main function of a sitemap is to tell search engines such as Google which pages on your site should be crawled and to hint at how often they should be revisited. If the sitemap you write is malformed or invalid, don't expect search engines to use it when crawling your website content.

The sitemap is a crucial part of the main structure of a website. If you want your website to rank highly on the SERP, a sitemap can contribute to the SEO strategy you implement, because your categories, posts, pages, images and tags are all listed in the sitemap.

In this article we will create a sitemap URL checker application using Go and NPM. Everything in this article was written and tested on FreeBSD 13.2.


1. Sitemap Checker With Go

In this first part we will create a sitemap checker with the Go programming language. With Go, creating a sitemap checker is easy, and the result is very helpful for finding broken or invalid URLs.

To start with, we will create a working directory called "FreeBSD_sitemap_checker". This directory will eventually hold two folders, "build" and "sitemap-checker", which contain the NPM and Go sitemap checker files. For now, create the working directory and the "sitemap-checker" folder with the commands below.
root@ns3:~ # cd /var
root@ns3:/var # mkdir -p FreeBSD_sitemap_checker
root@ns3:/var # cd FreeBSD_sitemap_checker
root@ns3:/var/FreeBSD_sitemap_checker # mkdir -p sitemap-checker
root@ns3:/var/FreeBSD_sitemap_checker #

To create a sitemap checker with Go, we will use the "/var/FreeBSD_sitemap_checker/sitemap-checker" folder. Create three files in that folder by running the following commands.
root@ns3:/var/FreeBSD_sitemap_checker # cd sitemap-checker
root@ns3:/var/FreeBSD_sitemap_checker/sitemap-checker # touch main.go sitemap.go sitemap_index.go
Fill each file with the script found at the link listed below its path.

/var/FreeBSD_sitemap_checker/sitemap-checker/main.go

https://raw.githubusercontent.com/unixwinbsd/FreeBSD_sitemap_checker/main/sitemap-checker/main.go

/var/FreeBSD_sitemap_checker/sitemap-checker/sitemap.go

https://raw.githubusercontent.com/unixwinbsd/FreeBSD_sitemap_checker/main/sitemap-checker/sitemap.go

/var/FreeBSD_sitemap_checker/sitemap-checker/sitemap_index.go

https://raw.githubusercontent.com/unixwinbsd/FreeBSD_sitemap_checker/main/sitemap-checker/sitemap_index.go
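The linked files are not reproduced here. As a rough illustration of the kind of work a sitemap checker has to do, the following sketch uses encoding/xml to tell a plain sitemap (<urlset>) from a sitemap index (<sitemapindex>) and to collect the <loc> values. It is an assumption about the general approach, not the repository's code; the classify function and the type names are hypothetical.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// entry holds the <loc> child shared by <url> and <sitemap> elements.
type entry struct {
	Loc string `xml:"loc"`
}

// document accepts either root element; XMLName records which one was found.
type document struct {
	XMLName  xml.Name
	URLs     []entry `xml:"url"`
	Sitemaps []entry `xml:"sitemap"`
}

// classify (hypothetical helper) returns the root element name and its locations.
func classify(data []byte) (string, []string, error) {
	var doc document
	if err := xml.Unmarshal(data, &doc); err != nil {
		return "", nil, err
	}
	var entries []entry
	switch doc.XMLName.Local {
	case "urlset":
		entries = doc.URLs
	case "sitemapindex":
		entries = doc.Sitemaps
	default:
		return "", nil, fmt.Errorf("unexpected root element <%s>", doc.XMLName.Local)
	}
	locs := make([]string, 0, len(entries))
	for _, e := range entries {
		locs = append(locs, e.Loc)
	}
	return doc.XMLName.Local, locs, nil
}

func main() {
	sample := []byte(`<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about</loc></url>
</urlset>`)
	kind, locs, err := classify(sample)
	if err != nil {
		fmt.Println("invalid sitemap:", err)
		return
	}
	fmt.Println(kind, locs)
}
```

A sitemap index would then be handled by fetching each returned location and classifying it again, while the locations of a plain sitemap are the page URLs to check.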

Before you can build these three files, initialize a Go module in the folder with the commands below.
root@ns3:/var/FreeBSD_sitemap_checker/sitemap-checker # go mod init sitemap-checker
root@ns3:/var/FreeBSD_sitemap_checker/sitemap-checker # go mod tidy
After that, run the "go build" command to produce the binary file that we will later use to run the application.
root@ns3:/var/FreeBSD_sitemap_checker/sitemap-checker # go build -v ./...
The command above produces a binary file called "sitemap-checker". Run it with the -uri flag pointing to the sitemap you want to check and the -out flag naming the output file.
root@ns3:/var/FreeBSD_sitemap_checker/sitemap-checker # ./sitemap-checker -uri=https://www.unixwinbsd.site/sitemap.xml -out=output.xml
You can see the results in the image below.




2. Sitemap Checker With NPM

After you have successfully created a sitemap checker in Go, we continue with a sitemap checker built with NPM. This checker is written in JavaScript and runs on Node.js, so on FreeBSD you have to install Node and NPM first. The commands below install both.
root@ns3:~ # pkg install npm-node21-10.2.5
root@ns3:~ # pkg install node21-21.4.0_1
For the NPM sitemap checker, we will create a folder named "build". Run the following commands to create the folder.
root@ns3:~ # cd /var/FreeBSD_sitemap_checker
root@ns3:/var/FreeBSD_sitemap_checker # mkdir -p build
root@ns3:/var/FreeBSD_sitemap_checker # cd build
root@ns3:/var/FreeBSD_sitemap_checker/build #
In the /var/FreeBSD_sitemap_checker/build folder, create a file called "main.js".
root@ns3:/var/FreeBSD_sitemap_checker/build # touch main.js
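In the "/var/FreeBSD_sitemap_checker/build/main.js" file, type the script below. It is the ES5 JavaScript that the TypeScript compiler generates from the main.ts file shown later in this article, which is why it starts with the __awaiter and __generator helper functions.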

#!/usr/bin/env node
var __awaiter = (this && this.__awaiter) || function (thisArg, _arguments, P, generator) {
    function adopt(value) { return value instanceof P ? value : new P(function (resolve) { resolve(value); }); }
    return new (P || (P = Promise))(function (resolve, reject) {
        function fulfilled(value) { try { step(generator.next(value)); } catch (e) { reject(e); } }
        function rejected(value) { try { step(generator["throw"](value)); } catch (e) { reject(e); } }
        function step(result) { result.done ? resolve(result.value) : adopt(result.value).then(fulfilled, rejected); }
        step((generator = generator.apply(thisArg, _arguments || [])).next());
    });
};
var __generator = (this && this.__generator) || function (thisArg, body) {
    var _ = { label: 0, sent: function() { if (t[0] & 1) throw t[1]; return t[1]; }, trys: [], ops: [] }, f, y, t, g;
    return g = { next: verb(0), "throw": verb(1), "return": verb(2) }, typeof Symbol === "function" && (g[Symbol.iterator] = function() { return this; }), g;
    function verb(n) { return function (v) { return step([n, v]); }; }
    function step(op) {
        if (f) throw new TypeError("Generator is already executing.");
        while (_) try {
            if (f = 1, y && (t = op[0] & 2 ? y["return"] : op[0] ? y["throw"] || ((t = y["return"]) && t.call(y), 0) : y.next) && !(t = t.call(y, op[1])).done) return t;
            if (y = 0, t) op = [op[0] & 2, t.value];
            switch (op[0]) {
                case 0: case 1: t = op; break;
                case 4: _.label++; return { value: op[1], done: false };
                case 5: _.label++; y = op[1]; op = [0]; continue;
                case 7: op = _.ops.pop(); _.trys.pop(); continue;
                default:
                    if (!(t = _.trys, t = t.length > 0 && t[t.length - 1]) && (op[0] === 6 || op[0] === 2)) { _ = 0; continue; }
                    if (op[0] === 3 && (!t || (op[1] > t[0] && op[1] < t[3]))) { _.label = op[1]; break; }
                    if (op[0] === 6 && _.label < t[1]) { _.label = t[1]; t = op; break; }
                    if (t && _.label < t[2]) { _.label = t[2]; _.ops.push(op); break; }
                    if (t[2]) _.ops.pop();
                    _.trys.pop(); continue;
            }
            op = body.call(thisArg, _);
        } catch (e) { op = [6, e]; y = 0; } finally { f = t = 0; }
        if (op[0] & 5) throw op[1]; return { value: op[0] ? op[1] : void 0, done: true };
    }
};
var _this = this;
var chalk = require('chalk');
var axios = require('axios').default;
var parseString = require('xml2js').parseString;
var yargs = require("yargs");
var options = yargs
    .usage("Usage: [options] <URL>")
    .option('t', { alias: 'timeout', describe: "Timeout in seconds for a single URL", type: "number", default: 10 })
    .option('maxPerSitemap', { describe: "Maximum number of URLs to fetch from a single sitemap (default -1: visit all URLs)", type: "number", default: -1 })
    .demandCommand(1)
    .argv;
var log = console.log;
var d = function () {
    var args = [];
    for (var _i = 0; _i < arguments.length; _i++) {
        args[_i] = arguments[_i];
    }
    log.apply(void 0, args);
};
var dd = function () {
    var args = [];
    for (var _i = 0; _i < arguments.length; _i++) {
        args[_i] = arguments[_i];
    }
    log.apply(void 0, args);
    process.exit();
};
if (isNaN(options.timeout)) {
    log("Invalid value for timeout: " + options.timeout);
    process.exit(1);
}
if (isNaN(options.maxPerSitemap)) {
    log("Invalid value for maxPerSitemap: " + options.maxPerSitemap);
    process.exit(1);
}
axios.defaults.timeout = options.timeout * 1000;
var parseXml = function (xml) {
    return new Promise(function (resolve, reject) {
        parseString(xml, { explicitArray: false }, function (err, ok) {
            if (err)
                return reject(err);
            return resolve(ok);
        });
    });
};
var isTimeoutError = function (err) {
    return err.code === 'ECONNABORTED' && err.message.indexOf('timeout') !== -1;
};
var headUrl = function (url) { return __awaiter(_this, void 0, void 0, function () {
    var result, err_1, statusOk, statusMsg;
    return __generator(this, function (_a) {
        switch (_a.label) {
            case 0:
                result = null;
                _a.label = 1;
            case 1:
                _a.trys.push([1, 3, , 4]);
                return [4 /*yield*/, axios.head(url)];
            case 2:
                result = _a.sent();
                return [3 /*break*/, 4];
            case 3:
                err_1 = _a.sent();
                result = null;
                return [3 /*break*/, 4];
            case 4:
                statusOk = result && result.status === 200;
                statusMsg = statusOk ? chalk.green('200 OK') : chalk.red('HTTP NOT OK');
                log("URL: " + url + " " + statusMsg);
                return [2 /*return*/];
        }
    });
}); };
var parseSitemap = function (url) { return __awaiter(_this, void 0, void 0, function () {
    var result, hasTimedOut, err_2, statusOk, response, data, err_3, isSitemap, isSitemapList, dataOk, statusMsg, msg, xmlValidMsg, type, urls, _i, urls_1, url_1, urls, _a, urls_2, url_2;
    return __generator(this, function (_b) {
        switch (_b.label) {
            case 0:
                hasTimedOut = false;
                _b.label = 1;
            case 1:
                _b.trys.push([1, 3, , 4]);
                return [4 /*yield*/, axios.get(url)];
            case 2:
                result = _b.sent();
                return [3 /*break*/, 4];
            case 3:
                err_2 = _b.sent();
                hasTimedOut = isTimeoutError(err_2);
                result = null;
                return [3 /*break*/, 4];
            case 4:
                statusOk = result && result.status === 200;
                response = result && result.data;
                _b.label = 5;
            case 5:
                _b.trys.push([5, 7, , 8]);
                return [4 /*yield*/, parseXml(response)];
            case 6:
                data = _b.sent();
                return [3 /*break*/, 8];
            case 7:
                err_3 = _b.sent();
                data = null;
                return [3 /*break*/, 8];
            case 8:
                isSitemap = data && data['urlset'] && data['urlset']['url'];
                isSitemapList = data && data['sitemapindex'] && data['sitemapindex']['sitemap'];
                dataOk = isSitemap || isSitemapList;
                if (statusOk) {
                    statusMsg = chalk.green('Fetched');
                }
                else {
                    msg = 'Not Fetched';
                    if (hasTimedOut) {
                        msg += ' (Timeout)';
                    }
                    statusMsg = chalk.red(msg);
                }
                xmlValidMsg = '';
                if (statusOk) {
                    if (dataOk) {
                        xmlValidMsg = chalk.green('Valid XML');
                    }
                    else {
                        xmlValidMsg = chalk.red('Invalid XML');
                    }
                }
                type = 'URL';
                if (isSitemapList) {
                    type = 'SITEMAP LIST';
                }
                else if (isSitemap) {
                    type = 'SITEMAP';
                }
                type = chalk.blue(type);
                log(type + ": " + url + " " + statusMsg + " " + xmlValidMsg);
                if (!statusOk) return [3 /*break*/, 17];
                if (!isSitemapList) return [3 /*break*/, 13];
                urls = data['sitemapindex']['sitemap'].map(function (el) { return el['loc']; });
                _i = 0, urls_1 = urls;
                _b.label = 9;
            case 9:
                if (!(_i < urls_1.length)) return [3 /*break*/, 12];
                url_1 = urls_1[_i];
                return [4 /*yield*/, parseSitemap(url_1)];
            case 10:
                _b.sent();
                _b.label = 11;
            case 11:
                _i++;
                return [3 /*break*/, 9];
            case 12: return [3 /*break*/, 17];
            case 13:
                if (!isSitemap) return [3 /*break*/, 17];
                urls = data['urlset']['url'].map(function (el) { return el['loc']; });
                if (options.maxPerSitemap > -1) {
                    urls = urls.slice(0, options.maxPerSitemap);
                }
                _a = 0, urls_2 = urls;
                _b.label = 14;
            case 14:
                if (!(_a < urls_2.length)) return [3 /*break*/, 17];
                url_2 = urls_2[_a];
                return [4 /*yield*/, headUrl(url_2)];
            case 15:
                _b.sent();
                _b.label = 16;
            case 16:
                _a++;
                return [3 /*break*/, 14];
            case 17: return [2 /*return*/];
        }
    });
}); };
(function () { return __awaiter(_this, void 0, void 0, function () {
    var url;
    return __generator(this, function (_a) {
        url = options._[0];
        parseSitemap(url);
        return [2 /*return*/];
    });
}); })();


Next, go back to the working directory and create the files .editorconfig, .gitignore, main.ts, package.json and tsconfig.json.
root@ns3:/var/FreeBSD_sitemap_checker/build # cd ..
root@ns3:/var/FreeBSD_sitemap_checker # touch .editorconfig .gitignore main.ts package.json tsconfig.json
In your "/var/FreeBSD_sitemap_checker/.editorconfig" file, add the content below.

root = true

[*]
charset = utf-8
indent_style = space
end_of_line = lf
insert_final_newline = true
trim_trailing_whitespace = true

[*.{js,ts}]
indent_style = space
indent_size = 2

[**.jsx]
indent_style = space
indent_size = 2
quote_type = single
space_after_anon_function = true
curly_bracket_next_line = true
brace_style = end-expand

[**.css]
indent_style = space
indent_size = 4

[**.html]
indent_style = space
indent_size = 4
max_char = 78
brace_style = expand

[node_modules/**.js]
codepaint = false

[*.json]
indent_style = space
indent_size = 2

In your "/var/FreeBSD_sitemap_checker/.gitignore" file, add the content below.

# See http://help.github.com/ignore-files/ for more about ignoring files.

# compiled output
/dist
/tmp
/out-tsc

# Runtime data
pids
*.pid
*.seed
*.pid.lock

# Directory for instrumented libs generated by jscoverage/JSCover
lib-cov

# Coverage directory used by tools like istanbul
coverage

# nyc test coverage
.nyc_output

# Grunt intermediate storage (http://gruntjs.com/creating-plugins#storing-task-files)
.grunt

# Bower dependency directory (https://bower.io/)
bower_components

# node-waf configuration
.lock-wscript

# IDEs and editors
.idea
.project
.classpath
.c9/
*.launch
.settings/
*.sublime-workspace

# IDE - VSCode
.vscode/*
!.vscode/settings.json
!.vscode/tasks.json
!.vscode/launch.json
!.vscode/extensions.json

# misc
.sass-cache
connect.lock
typings

# Logs
logs
*.log
npm-debug.log*
yarn-debug.log*
yarn-error.log*


# Dependency directories
node_modules/
jspm_packages/

# Optional npm cache directory
.npm

# Optional eslint cache
.eslintcache

# Optional REPL history
.node_repl_history

# Output of 'npm pack'
*.tgz

# Yarn Integrity file
.yarn-integrity

# dotenv environment variables file
.env

# next.js build output
.next

# Lerna
lerna-debug.log

# System Files
.DS_Store
Thumbs.db

In your "/var/FreeBSD_sitemap_checker/package.json" file, add the content below.

{
  "devDependencies": {
    "@types/chalk": "^2.2.0",
    "@types/node": "^15.3.0",
    "@types/yargs": "^16.0.1"
  },
  "dependencies": {
    "@types/xml2js": "^0.4.8",
    "axios": "^1.6.7",
    "chalk": "^4.1.1",
    "xml2js": "^0.6.2",
    "yargs": "^17.0.1"
  },
  "name": "sitemap-check",
  "version": "0.0.2",
  "description": "A tool to parse and check sitemap URLs for problems such as broken XML or 404 errors",
  "main": "main.js",
  "scripts": {
    "test": "echo \"Error: no test specified\""
  },
  "bin": {
    "sitemap-check": "./build/main.js"
  },
  "files": [
    ".build/*"
  ],
  "keywords": [
    "sitemap",
    "sitemap-check"
  ],
  "author": "Ahmet Kun",
  "license": "ISC",
  "repository": {
    "type": "git",
    "url": "git+https://github.com/unixwinbsd/FreeBSD_sitemap_checker.git"
  },
  "bugs": {
    "url": "https://github.com/unixwinbsd/FreeBSD_sitemap_checker.git/issues"
  },
  "homepage": "https://github.com/unixwinbsd/FreeBSD_sitemap_checker#readme"
}

In your "/var/FreeBSD_sitemap_checker/tsconfig.json" file, add the content below.

{
    "compilerOptions": {
      "target": "es5",
      "module": "commonjs",
      "outDir": "build"
    }
  }

In your "/var/FreeBSD_sitemap_checker/main.ts" file, type the script below. This is the TypeScript source of the checker.

#!/usr/bin/env node

const chalk = require('chalk');
const axios = require('axios').default;
const parseString = require('xml2js').parseString;
const yargs = require("yargs");

const options = yargs
 .usage("Usage: [options] <URL>")
 .option('t', { alias: 'timeout', describe: "Timeout in seconds for a single URL", type: "number", default: 10 })
 .option('maxPerSitemap', { describe: "Maximum number of URLs to fetch from a single sitemap (default -1: visit all URLs)", type: "number", default: -1 })
 .demandCommand(1)
 .argv;

const log = console.log;
const d = (...args) => {
  log(...args);
};

const dd = (...args) => {
  log(...args);
  process.exit();
};

if (isNaN(options.timeout)) {
  log(`Invalid value for timeout: ${options.timeout}`);
  process.exit(1);
}
if (isNaN(options.maxPerSitemap)) {
  log(`Invalid value for maxPerSitemap: ${options.maxPerSitemap}`);
  process.exit(1);
}
axios.defaults.timeout = options.timeout * 1000;

const parseXml = (xml: string) => {
    return new Promise((resolve, reject) => {
        parseString(xml, {explicitArray: false}, function (err, ok) {
            if (err) return reject(err);
            return resolve(ok);
        });
    });
};

const isTimeoutError = err => {
  return err.code === 'ECONNABORTED' && err.message.indexOf('timeout')!== -1;
};

const headUrl = async (url: string) => {
  let result = null;
  try {
    result = await axios.head(url);
  } catch (err) {
    result = null;
  }
  const statusOk = result && result.status === 200;
  const statusMsg = statusOk ? chalk.green('200 OK') : chalk.red('HTTP NOT OK');
  log(`URL: ${url} ${statusMsg}`);
};

const parseSitemap = async (url: string) => {
  let result,
    hasTimedOut = false;
  try {
    result = await axios.get(url);
  } catch (err) {
    hasTimedOut = isTimeoutError(err);
    result = null;
  }
  const statusOk = result && result.status === 200;
  const response = result && result.data;
  let data;
  try {
    data = await parseXml(response);
  } catch (err) {
    data = null;
  }
  const isSitemap = data && data['urlset'] && data['urlset']['url'];
  const isSitemapList = data && data['sitemapindex'] && data['sitemapindex']['sitemap'];
  const dataOk = isSitemap || isSitemapList;
  let statusMsg : string;
  if (statusOk) {
    statusMsg = chalk.green('Fetched');
  } else {
    let msg = 'Not Fetched';
    if (hasTimedOut) {
      msg += ' (Timeout)';
    }
    statusMsg = chalk.red(msg);
  }
  let xmlValidMsg = '';
  if (statusOk) {
    if (dataOk) {
      xmlValidMsg = chalk.green('Valid XML');
    } else {
      xmlValidMsg = chalk.red('Invalid XML');
    }
  }
  let type = 'URL';
  if (isSitemapList) {
    type = 'SITEMAP LIST';
  } else if (isSitemap) {
    type = 'SITEMAP';
  }
  type = chalk.blue(type);
  log(`${type}: ${url} ${statusMsg} ${xmlValidMsg}`);
  if (statusOk) {
    if (isSitemapList) {
      const urls = data['sitemapindex']['sitemap'].map(el => el['loc']);
      for (const url of urls) {
        await parseSitemap(url);
      }
    } else if (isSitemap) {
      let urls = data['urlset']['url'].map(el => el['loc']);
      if (options.maxPerSitemap > -1) {
        urls = urls.slice(0, options.maxPerSitemap);
      }
      for (const url of urls) {
        await headUrl(url);
      }
    }
  }
};

(async () => {
  const url = options._[0];
  parseSitemap(url);
})();

Run the "npm install" command with the --global flag to install the package and make the sitemap-check command available system-wide.
root@ns3:/var/FreeBSD_sitemap_checker # npm install sitemap-check --global
After that, you can run the application by passing it the URL of the sitemap you want to check.
root@ns3:/var/FreeBSD_sitemap_checker # sitemap-check https://www.unixwinbsd.site/sitemap.xml
You can see the test results in the image below.



The complete scripts are available at "https://github.com/unixwinbsd/FreeBSD_sitemap_checker.git".

Having a sitemap makes SEO work much easier. Sitemaps are an important part of SEO, so we need to make sure they are well-formed and valid. A valid sitemap can also help speed up indexing through Google Search Console.
Iwan Setiawan

