
API Reference

The HTML library provides the following core components:

| Component | Description | Documentation |
|---|---|---|
| Package Functions | Convenience functions for one-off calls | Functions |
| Processor | Reusable processor instance with resource reuse and caching | Processor |
| Config | Configuration struct and presets | Config |
| Output Formats | Markdown and JSON output | Output Formats |
| Link Extraction | Standalone link extraction API | Link Extraction |
| Batch Processing | Concurrent batch extraction | Batch Processing |
| Interfaces | Extractor, StatsProvider, etc. | Interfaces |
| Types | Result, ImageInfo, etc. | Types |
| Constants & Errors | Defaults and sentinel errors | Constants & Errors |
| Audit System | Audit pipeline and sinks | Audit System |

API Overview

Two Calling Modes

```text
┌─────────────────────────────────────────┐
│        Package Functions (Convenience)  │
│  html.Extract(data) → *Result, error    │
│  Uses sync.Pool internally              │
└─────────────────────────────────────────┘

┌─────────────────────────────────────────┐
│        Processor (Instance Mode)        │
│  p, _ := html.New(cfg)                  │
│  defer p.Close()                        │
│  result, err := p.Extract(data)         │
│  ✓ Cache reuse  ✓ Statistics  ✓ Audit   │
└─────────────────────────────────────────┘
```
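The difference between the two modes can be illustrated with a small, self-contained sketch. It does not import github.com/cybergodev/html: Result, Processor, NewProcessor, Stats and the whitespace-trimming "extraction" are hypothetical stand-ins, and pooling whole processors is an assumption about what the library's sync.Pool actually holds. The point it shows is why only instance mode can offer cache reuse and statistics: the caller keeps the same object across calls, whereas a pooled object is borrowed for a single call and may be discarded by the garbage collector at any time.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"sync"
)

// Result is a hypothetical stand-in for the library's *Result type.
type Result struct {
	Text string
}

// Processor is a hypothetical instance-mode processor. It keeps a result
// cache and call statistics for as long as the caller holds it.
type Processor struct {
	mu    sync.Mutex
	cache map[string]*Result // set to nil by Close
	calls int
	hits  int
}

// NewProcessor stands in for html.New(cfg); configuration is omitted here.
func NewProcessor() *Processor {
	return &Processor{cache: make(map[string]*Result)}
}

// Extract returns the cached result when the same input was seen before.
func (p *Processor) Extract(data []byte) (*Result, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.cache == nil {
		return nil, errors.New("processor is closed")
	}
	p.calls++
	key := string(data)
	if r, ok := p.cache[key]; ok {
		p.hits++
		return r, nil
	}
	// Placeholder for real HTML extraction (hypothetical).
	r := &Result{Text: strings.TrimSpace(key)}
	p.cache[key] = r
	return r, nil
}

// Stats reports how many calls were made and how many hit the cache.
func (p *Processor) Stats() (calls, hits int) {
	p.mu.Lock()
	defer p.mu.Unlock()
	return p.calls, p.hits
}

// Close releases the cache; later calls to Extract return an error.
func (p *Processor) Close() error {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.cache = nil
	return nil
}

// pool backs the package-level convenience function.
var pool = sync.Pool{New: func() any { return NewProcessor() }}

// Extract is the convenience mode: borrow a pooled processor for one call
// and put it back. Its cache and stats are never exposed to the caller.
func Extract(data []byte) (*Result, error) {
	p := pool.Get().(*Processor)
	defer pool.Put(p)
	return p.Extract(data)
}

func main() {
	r, _ := Extract([]byte("  one-off call  "))
	fmt.Println(r.Text)

	p := NewProcessor()
	defer p.Close()
	for i := 0; i < 3; i++ {
		p.Extract([]byte("<p>same page</p>"))
	}
	calls, hits := p.Stats()
	fmt.Println(calls, hits)
}
```

Running it prints `one-off call` and then `3 2` (three calls, two cache hits). The package-level function is the simpler choice for occasional calls; holding a processor pays off when one instance handles many documents and you want its cache and counters.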

Function Naming Convention

| Pattern | Naming | Example |
|---|---|---|
| Basic | `Extract*` | `Extract`, `ExtractText` |
| From file | `Extract*FromFile` | `ExtractFromFile` |
| With context | `Extract*WithContext` | `ExtractWithContext` |
| From file + context | `Extract*FromFileWithContext` | `ExtractFromFileWithContext` |

Module Information

  • Module path: github.com/cybergodev/html
  • Go version: 1.25+
  • Dependencies: golang.org/x/net, golang.org/x/text