Christoph Thiede uploaded a new version of HelpSystem-Core to project The Inbox: http://source.squeak.org/inbox/HelpSystem-Core-ct.129.mcz
==================== Summary ====================
Name: HelpSystem-Core-ct.129 Author: ct Time: 2 March 2020, 10:32:56.341949 am UUID: 65f1da58-ae12-c14a-a12c-bc9a3b7a08b3 Ancestors: HelpSystem-Core-mt.119
Improves parsing of html help topics
- Detect relative links and convert them to absolute version - Add support for a cleanseBlock that will be applied to the html body source - Trim leading and trailing blanks from the text - Cache contents
Small refactoring:
- Remove unnecessary duplicate parse logic from #subtopics - Again in #subtopics, don't pass the result of [self fooBlock] but the instance variable fooBlock instead. Don't manifest default values ...
=============== Diff against HelpSystem-Core-mt.119 ===============
Item was changed: AbstractHelpTopic subclass: #HtmlHelpTopic + instanceVariableNames: 'url level selectBlock convertBlock cleanseBlock document contents subtopicUrls subtopics' - instanceVariableNames: 'url document selectBlock convertBlock subtopicUrls subtopics level' classVariableNames: '' poolDictionaries: '' category: 'HelpSystem-Core-Model'!
Item was added: + ----- Method: HtmlHelpTopic>>cleanseBlock (in category 'accessing') ----- + cleanseBlock + "Answer the block that will be applied to the HTML body source in order to filter relevant information." + + ^ cleanseBlock ifNil: [ [:contents | contents] ]!
Item was added: + ----- Method: HtmlHelpTopic>>cleanseBlock: (in category 'accessing') ----- + cleanseBlock: aBlock + "Indicate the block that will be applied to the HTML body source in order to filter relevant information." + + cleanseBlock := aBlock.!
Item was changed: ----- Method: HtmlHelpTopic>>contents (in category 'accessing') ----- contents
+ | start end source text rootUrl | + contents ifNotNil: [^ contents]. + + start := self document findString: '<body'. - | start end | - start := (self document findString: '<body'). start := (self document findString: '>' startingAt: start) + 1. end := self document findString: '</body>' startingAt: start. start > end ifTrue: [^ self document]. + source := self document copyFrom: start to: end - 1. + source := self cleanseBlock value: source. + text := (source copyReplaceAll: String cr with: '<br>') + asTextFromHtml. + + "Convert relative URLs (https://www.w3.org/TR/WD-html40-970917/htmlweb.html#h-5.1.2)" + rootUrl := url readStream in: [:urlStream | + | host scheme | + scheme := urlStream upToAll: '://'. + host := urlStream upTo: $/. + scheme , '://' , host]. + (text runs gather: #yourself) withoutDuplicates + select: [:attribute | attribute isKindOf: TextURL] + thenDo: [:attribute | + (attribute info beginsWith: '..') + ifTrue: [attribute url: self url , (attribute info skip: 2)]. + (attribute info beginsWith: '/') + ifTrue: [attribute url: rootUrl , attribute info]]. + + ^ contents := text withBlanksTrimmed! - ^ ((self document copyFrom: start to: end - 1) - copyReplaceAll: String cr with: '<br>') - asTextFromHtml!
Item was changed: ----- Method: HtmlHelpTopic>>subtopics (in category 'accessing') ----- subtopics
- | start end urls | subtopics ifNotNil: [^ subtopics]. - urls := OrderedCollection new. - - start := self document findString: '<a '. - [start > 0] whileTrue: [ - start := self document findString: 'href' startingAt: start. - start := (self document findString: '"' startingAt: start) + 1. - end := self document findString: '"' startingAt: start. - urls addIfNotPresent: (self document copyFrom: start to: end - 1). - start := self document findString: '<a ' startingAt: start.]. - subtopics := (self subtopicUrls collect: [:aUrl | self class new level: self level + 1; url: aUrl; + selectBlock: selectBlock; + convertBlock: convertBlock; + cleanseBlock: cleanseBlock]). - selectBlock: self selectBlock; - convertBlock: self convertBlock]). Project current uiProcess == Processor activeProcess ifTrue: [self fetchSubtopics]. ^ subtopics!
squeak-dev@lists.squeakfoundation.org