Aspose.HTML for Java is an HTML manipulation API for editing and generating HTML within Java applications. The API lets you add, delete, and replace nodes, extract CSS, and navigate a document in multiple ways.
Vendor
Aspose
Company Website
Aspose.HTML for Java is an advanced HTML processing and rendering API that enables developers to manipulate, convert, parse, and generate HTML documents programmatically within Java applications. The API supports loading HTML, XHTML, MHTML, Markdown, EPUB, and SVG files and enables editing operations such as adding, deleting, and replacing nodes, extracting CSS, and navigating documents via XPath or CSS selectors. Aspose.HTML for Java includes a rendering engine that converts HTML content into PDF, XPS, and raster image formats (JPEG, PNG, BMP, TIFF), using high‑fidelity layout processing. It supports configuring page setup for rendering, inserting custom headers and footers, and configuring document sandboxing. The library also offers scripting support that allows DOM manipulation through JavaScript. Additional capabilities include extracting images and SVG from websites, saving content from URLs, checking website accessibility, processing EPUB and MHTML archives, and converting between Markdown and HTML in both directions. Aspose.HTML for Java is designed for enterprise-grade automation scenarios such as document processing pipelines, reporting systems, content migration, web archiving, and digital publishing.
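As a rough illustration of the add, replace, and delete node operations described above, the sketch below uses the JDK's standard org.w3c.dom API rather than Aspose.HTML itself. The class and method names (DomEditSketch, edit) are hypothetical, and the input must be well-formed XHTML because the JDK parser is an XML parser. Consult the Aspose.HTML documentation for the product's actual classes.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: JDK DOM stand-in, not the Aspose.HTML API.
final class DomEditSketch {

    /** Parses well-formed XHTML, then adds, replaces, and deletes nodes. */
    static String edit(String xhtml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            Element body = (Element) doc.getElementsByTagName("body").item(0);

            // Add: append a new paragraph to the end of <body>.
            Element added = doc.createElement("p");
            added.setTextContent("Added paragraph");
            body.appendChild(added);

            // Replace: swap the first <h1> for an <h2> carrying the same text.
            Node h1 = doc.getElementsByTagName("h1").item(0);
            if (h1 != null) {
                Element h2 = doc.createElement("h2");
                h2.setTextContent(h1.getTextContent());
                h1.getParentNode().replaceChild(h2, h1);
            }

            // Delete: remove the first <span>, if there is one.
            Node span = doc.getElementsByTagName("span").item(0);
            if (span != null) {
                span.getParentNode().removeChild(span);
            }

            // Serialize the edited tree back to markup.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.METHOD, "xml");
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse or serialize the document", e);
        }
    }

    public static void main(String[] args) {
        String page = "<html><body><h1>Title</h1><span>remove me</span></body></html>";
        System.out.println(edit(page));
    }
}
```

Running main on the sample markup yields a document whose body contains an h2 with the original heading text and the appended paragraph, with the span removed.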
Features
HTML Creation, Editing & Navigation
- Create HTML pages from scratch or load existing files.
- Load HTML from files, URLs, streams, or strings.
- Insert, replace, or delete HTML nodes.
- Extract CSS styling information.
- Navigate nodes via XPath, DOM traversal, or CSS selector queries.
- Edit and manipulate XHTML, MHTML, Markdown, EPUB, and SVG content.
- Implement W3C specifications, HTML DOM, CSS, and scripting with JavaScript integration.
- Configure sandbox mode to influence document rendering behavior.
Format Conversion & Rendering
Convert HTML and related formats to:
- PDF, XPS
- JPEG, PNG, BMP, TIFF, GIF
- Markdown (MD)
- MHTML → PDF/JPEG/PNG, EPUB → PDF/JPG, SVG → PDF/PNG
- HTML to HTML conversions with transformations
- High‑fidelity rendering engine ensures accurate layout and text flow.
Markdown Support
- Convert HTML to Markdown.
- Convert Markdown to HTML.
- Supports authentic Markdown, GitLab Flavored Markdown, and extensible rule configuration.
EPUB & MHTML Processing
- Load and convert EPUB to PDF, XPS, and image formats such as JPG.
- Load and convert MHTML to PDF and raster formats.
- Ideal for digital publishing and content archiving.
Web Content Extraction
- Extract images from websites.
- Extract SVG files from a website.
- Save files directly from URL.
- Extract all anchor or node types using selector queries.
- Check web accessibility to ensure compliance.
Image & Raster Rendering
- Convert HTML pages to TIFF, BMP, PNG, JPEG with high fidelity.
- Customize raster output (compression, page settings, resolution).
Node Navigation Example (Official)
Extract anchor nodes using querySelectorAll:
- Load website HTML via URL.
- Retrieve all anchor (<a>) elements.
- Print text and href values.
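The three steps above can be sketched as follows. This is a minimal illustration using only the JDK's built-in XML parser and XPath, not the Aspose.HTML API: it parses an inline, well-formed XHTML string instead of fetching a URL, and the XPath expression //a stands in for a querySelectorAll("a") call. The class name AnchorExtractSketch, the method extractAnchors, and the sample markup are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: JDK XPath stand-in, not the Aspose.HTML API.
final class AnchorExtractSketch {

    /** Returns one "text -> href" entry per <a> element, in document order. */
    static List<String> extractAnchors(String xhtml) {
        try {
            // Step 1: load the markup (from a string here, rather than a URL).
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));

            // Step 2: retrieve every <a> element anywhere in the document.
            NodeList anchors = (NodeList) XPathFactory.newInstance()
                    .newXPath()
                    .evaluate("//a", doc, XPathConstants.NODESET);

            // Step 3: collect the link text and href value of each anchor.
            List<String> result = new ArrayList<>();
            for (int i = 0; i < anchors.getLength(); i++) {
                Element a = (Element) anchors.item(i);
                result.add(a.getTextContent().trim() + " -> " + a.getAttribute("href"));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse or query the document", e);
        }
    }

    public static void main(String[] args) {
        String page = "<html><body><a href=\"https://example.com/docs\">Docs</a>"
                + "<p><a href=\"/blog\">Blog</a></p></body></html>";
        extractAnchors(page).forEach(System.out::println);
    }
}
```

For the sample markup, main prints "Docs -> https://example.com/docs" followed by "Blog -> /blog". Real-world HTML is often not well-formed XML, which is one reason a dedicated HTML parser with CSS selector support is preferable to this stand-in.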
Benefits
- Eliminates dependency on external browsers or rendering engines.
- Provides a unified toolset for converting and manipulating HTML content.
- Ensures consistent, accurate document rendering across output formats.
- Streamlines data extraction, document generation, and content transformation workflows.
- Ideal for enterprise applications requiring automated HTML-to-document processing.
- Fully Java‑based; runs on any platform that supports a JVM.
- Supports dynamic HTML manipulation via integrated JavaScript scripting.
- Highly scalable for batch conversions and server‑side workflows.