xidel: forgot to put a description

Add cli tag
Some uniformization
2023-10-29 23:01:13 +01:00 · 2023-10-29 22:51:03 +01:00 · 2023-10-29 22:36:26 +01:00 · 2023-10-29 22:32:41 +01:00 · 2023-10-29 22:27:28 +01:00 · 2023-10-29 22:15:05 +01:00
10 changed files with 104 additions and 7 deletions
--- a/content/cheat-sheets/git-fr.md
+++ b/content/cheat-sheets/git-fr.md
@@ -4,7 +4,7 @@ Date: 2019-04-22 17:00
 Modified: 2023-05-14 20:00+02:00
 Author: Fabrice
 Category: antisèches
-Tags: git, termtosvg
+Tags: git, termtosvg, cli
 Slug: git-tricks
 Header_Cover: ../images/covers/water.jpg
 Summary: Une compilation de commandes git que j’utilise ponctuellement
--- a/content/cheat-sheets/git.md
+++ b/content/cheat-sheets/git.md
@@ -4,7 +4,7 @@ Date: 2019-04-22 17:00
 Modified: 2023-05-14 20:00+2:00
 Author: Fabrice
 Category: cheat sheets
-Tags: git, termtosvg
+Tags: git, termtosvg, cli
 Slug: git-tricks
 Header_Cover: images/covers/water.jpg
 Summary: A compilation of some `git` tricks I keep forgetting.
--- a/content/cheat-sheets/wget-fr.md
+++ b/content/cheat-sheets/wget-fr.md
@@ -3,7 +3,7 @@ Title: wget/curl
 Date: 2022-07-25 13:45 CEST
 Author: Fabrice
 Category: cheat sheets
-Tags: wget, curl
+Tags: wget, curl, cli
 Slug: wget-curl
 Header_Cover: ../images/covers/speedboat.jpg
 Summary: Quelques commandes wget et curl utiles dans la vie de tous les jours.
--- a/content/cheat-sheets/wget.md
+++ b/content/cheat-sheets/wget.md
@@ -3,7 +3,7 @@ Title: wget/curl
 Date: 2022-07-25 13:45 CEST
 Author: Fabrice
 Category: cheat sheets
-Tags: wget, curl
+Tags: wget, curl, cli
 Slug: wget-curl
 Header_Cover: images/covers/speedboat.jpg
 Summary: Some useful wget and curl commands, such as downloading a repository.
--- a/content/images/covers/aix-en-provence.jpg
+++ b/content/images/covers/aix-en-provence.jpg
--- a/content/images/covers/orgue.jpg
+++ b/content/images/covers/orgue.jpg
--- a/content/software/nvim-latex.md
+++ b/content/software/nvim-latex.md
@@ -1,7 +1,7 @@
 ---
 Title: Neovim as a LaTex Development Environment
 Date: 2023-10-14 12:00:00+0200
-Date: 2023-10-14 17:00:00+0200
+Modified: 2023-10-14 17:00:00+0200
 Lang: en
 Author: Fabrice
 Category: software
--- a/content/software/pass-fr.md
+++ b/content/software/pass-fr.md
@@ -5,7 +5,7 @@ Modified: 2019-04-24 11:12
 Lang: fr
 Author: Fabrice
 Category: programmes
-Tags: pass, git
+Tags: pass, git, cli
 Slug: password-store 
 Header_Cover: ../images/covers/clovers.jpg
 Summary: Un gestionnaire de mots de passe simple qui repose sur gpg, et synchronisé via git.
--- a/content/software/pass.md
+++ b/content/software/pass.md
@@ -4,7 +4,7 @@ Date: 2019-04-22 19:00
 Modified: 2019-04-23 14:24
 Author: Fabrice
 Category: software
-Tags: pass, git
+Tags: pass, git, cli
 Slug: password-store 
 Header_Cover: images/covers/clovers.jpg
 Summary: A simple password manager that relies on gpg, and synchronized with git.
--- a/content/software/xidel.md
+++ b/content/software/xidel.md
@@ -0,0 +1,97 @@
+---
+Title: Manipulate XML/HTML with Xidel
+Date: 2023-10-29 22:00
+Lang: en
+Author: Fabrice
+Category: software
+Tags: xidel, html, xml, cli
+Slug: xidel
+table-of-contents: false
+Header_Cover: ../images/covers/aix-en-provence.jpg
+Summary: An example-based approach on how to easily parse XML/HTML files and stubs with Xidel.
+---
+
+You may know [jq](https://jqlang.github.io/jq/), which processes
+[JSON](https://www.json.org/json-en.html) files on the command line. At some point I
+was looking for an equally simple Swiss Army knife for
+[XML](https://www.w3.org/XML/)/[HTML](https://html.spec.whatwg.org/multipage/),
+mostly for simple uses that don't require me to resort to a full-fledged
+scripting language such as [Python](https://python.org) or to dabble in [regular
+expressions](https://en.wikipedia.org/wiki/Regular_expression) that will never
+work because of a carriage return in an unexpected place. Guess what? It exists!
+
+This tool is [Xidel](https://www.videlibri.de/xidel.html). It is a bit more than
+that, as it can also download files, which enables extra features such as
+navigating a site by following specific links. You can find out more in the
+[list of examples](https://www.videlibri.de/xidel.html#examples) on the
+project website, which is a nice introduction to the possibilities of the tool.
+
+However, I mainly use it for simple cases, where I mix and match the best of
+both worlds: a graphical client (such as
+[Firefox](https://www.mozilla.org/en-US/firefox/new/)) and a CLI tool, which in
+this case is Xidel.
+
+To illustrate this, let us look at a simple use case where filtering by hand
+would be tedious. Assume that we want the list of URLs of the PDF versions of
+Victor Hugo's novels in French from Wikisource, where available.
+
+We start from this page: <https://fr.wikisource.org/wiki/Auteur:Victor_Hugo>,
+which lists what is available on <https://fr.wikisource.org>.
+
+Now, we can simply select the “Romans” section as is and copy it. You can
+check that you do have the HTML in your clipboard by typing
+`wl-paste -t text/html` on Wayland or `xclip -selection clipboard -o -t
+text/html` on X11 if you have xclip installed. In the following we will assume a
+Wayland environment with
+[wl-clipboard](https://github.com/bugaevc/wl-clipboard), but it should also work
+with `xclip` (not tested, please let me know how it behaves).
+
+That's a good start, but we still need to filter and parse it. We can start with a
+simple test:
+
+```bash
+wl-paste -t text/html | xidel -e '//a/@href'
+```
+
+This shows the target of each link in our selection. To explain the
+syntax, the `-e` option tells `xidel` what to extract from its input. Its
+argument is either a
+[template](https://benibela.de/documentation/internettools/extendedhtmlparser.THtmlTemplateParser.html)
+or an [XPath](https://en.wikipedia.org/wiki/XPath) expression evaluated against
+the [DOM](https://en.wikipedia.org/wiki/Document_Object_Model) tree. In the
+above example we used the latter, to select every anchor (`//a`) and then its
+`href` attribute with `@href`.
+From there we can see that the PDF versions contain the string… “pdf”.
+Another nice part of XPath is that we can filter using
+functions:
+
+```bash
+wl-paste -t text/html | xidel -e '//a/@href[contains(., "pdf")]'
+```
+
+The last bit of magic here is the dot, which refers to the “value” of the
+current item. I’m not that familiar with the subtleties, so you can
+refer to this Stack Overflow [short answer](https://stackoverflow.com/a/38240971)
+or to the longer answer just above it for more details.
+
+You can also change how the filtering is done. For instance, if the text of the
+anchors you are targeting contains “Download”, you can obtain the links with:
+```bash
+wl-paste -t text/html | xidel -e '//a[contains(., "Download")]/@href'
+```
+
+If you want strict equality, for instance because there are both “Download PDF”
+and “Download epub” links:
+
+```bash
+wl-paste -t text/html | xidel -e '//a[text()="Download PDF"]/@href'
+```
+
+To go further, you can also pass HTTP headers and cookies to `xidel` via the
+`--header/-H` and `--load-cookies` options respectively. It is also possible to
+use the `--follow/-f` option to hop to the pages that match (using the same
+syntax as above) to obtain a link from them… or even download it directly with
+the `--download` option, and so on.
+
+In this blog post we only looked at a local copy of content pre-filtered with
+your web browser, but the possibilities are endless!
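
Review note on the clipboard step in `content/software/xidel.md`: the article picks between `wl-paste` and `xclip` by hand. The choice could be wrapped in a small helper, sketched below. The function names `paste_html_cmd` and `paste_html` are hypothetical, and the sketch assumes that `WAYLAND_DISPLAY` is set in a Wayland session and empty or unset under X11. The `xclip` branch only reuses the command quoted in the article and is just as untested.

```bash
#!/bin/sh
# Print the command that dumps the clipboard as HTML for the current session.
# Assumption: WAYLAND_DISPLAY is non-empty under Wayland, empty or unset on X11.
paste_html_cmd() {
    if [ -n "${WAYLAND_DISPLAY:-}" ]; then
        echo "wl-paste -t text/html"
    else
        echo "xclip -selection clipboard -o -t text/html"
    fi
}

# Run the selected command. The substitution is deliberately left unquoted so
# that the shell splits it into the program name and its arguments.
paste_html() {
    $(paste_html_cmd)
}
```

With this in place, the article's first test would read `paste_html | xidel -e '//a/@href'`, whatever the display server.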
Author	SHA1	Message	Date
Fabrice Mouhartem	04f8994933	xidel: forgot to put a description	2023-10-29 23:01:13 +01:00
Fabrice Mouhartem	6a2670c94d	Add cli tag	2023-10-29 22:51:03 +01:00
Fabrice Mouhartem	7e05fb1432	Some uniformization	2023-10-29 22:36:26 +01:00
Fabrice Mouhartem	3912a2a74c	Add cover image for xidel	2023-10-29 22:32:41 +01:00
Fabrice Mouhartem	1aff1e7336	Orgue cover image	2023-10-29 22:27:28 +01:00
Fabrice Mouhartem	ecd50343a4	Duplicate date	2023-10-29 22:15:05 +01:00
Fabrice Mouhartem	d79c9e9d47	+ CLI tag	2023-10-29 22:13:57 +01:00
Fabrice Mouhartem	6cf9edd955	New article about xidel	2023-10-29 22:11:40 +01:00