Abstract:
This report analyzes the automatic filling of text in micro-content web pages. Using page-region techniques together with the structural features of HTML, it establishes a text extraction method based on page regions and auto fill (RAF) that extracts the text of micro-content web pages automatically. Experiments performed with the extraction tool indicate that the method extracts the text of such pages effectively and accurately.