libhtml – simple HTML parsing library

DESCRIPTION

libhtml is a minimal, open source (ISC-licensed) C library for parsing, serialising, and manipulating HTML 4.01 Strict and XHTML 1.0 Strict documents. It may suit you if you care about strict correctness of both input and output data.

Why? The predominant open source HTML parser, libxml, is enormous and complicated. For our needs, we wanted a small, strongly validating parser that focusses only on strict HTML.

libhtml is a member of the BSD.lv Project.

SOURCES

The sources build and install correctly on OpenBSD, NetBSD, and GNU/Linux, and have been tested variously on generic i386, AMD64, and DEC Alpha hardware. The current version is 0.3.0.

Download

Current source: libhtml.tar.gz (md5)
Archived source: archive/
On-line source: cvsweb

Note: this library is under heavy development! Please contact the author if you wish to use it.

DOCUMENTATION

The manual is generated automatically and refers to the current snapshot. Examples are distributed with the source package.

html(3): simple HTML parsing library
test.c: example interfacing utility

CONTACT

For all issues related to libhtml, contact Kristaps Dzonsons, kris...@bsd.lv.

NEWS

24-06-2010: version 0.3.0

Added hcache functionality (see the html manual), which deprecates htmlc. Hcaches are extremely useful: they allow, for example, pages in multiple languages or different styles to be consolidated under uniform numeric identifiers.

22-04-2010: version 0.2.14

Added the hnode_alloc_chars function and changed the behaviour of hnode_alloc_text.

17-03-2010: version 0.2.13

Fixed segmentation fault when validating comment nodes.

See cvsweb for historical notes.

Copyright © 2009, 2010 Kristaps Dzonsons, $Date: 2010/06/24 12:54:03 $