0.0.0-20240228044302-56ad08b2fa1c
Repository: https://github.com/parsiya/parsia-code.git
# README
Gophercises - 4 - Link
Problem
Solutions
Lessons Learned
/x/net/html
- Read the package example: https://godoc.org/golang.org/x/net/html
- Token struct:

      type Token struct {
          Type     TokenType
          DataAtom atom.Atom
          Data     string
          Attr     []Attribute
      }

  - `Type` tells us what kind of token it is. The important ones for this exercise are:
    - `StartTagToken`: `<a href>`
    - `EndTagToken`: `</a>`
    - `TextToken`: Text in between. Using text nodes will skip other elements inside the link.
  - `Data` contains the data in the node.
    - Anchor tags: `a`.
    - Text nodes: The actual text of the node.
- Attribute is of type:

      type Attribute struct { Namespace, Key, Val string }

  - `Key` is the name of the attribute and `Val` is the value.
  - `<a href="example.net">`: `Key` = `href` and `Val` = `example.net`.
Parse
Parse is easy.
- Go through the nodes. If you reach a start anchor tag, set the `capturing` flag to start capturing. Store the `href`.
- While capturing, add the text of every text node (trim all white space but add a space between nodes).
- After reaching the end anchor tag, stop capturing and store the link.
- Add link to the links slice.
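The steps above can be sketched roughly as follows. This is not the code from `main.go`: to stay free of external dependencies it uses small stand-in types that mirror the `Token` and `Attribute` fields, and the `Link` type and `extractLinks` function are names invented for the example. In a real program the tokens would come from the x/net/html tokenizer.

```go
package main

import (
	"fmt"
	"strings"
)

// Stand-ins for the x/net/html types (assumptions for illustration only).
type TokenType int

const (
	TextToken TokenType = iota
	StartTagToken
	EndTagToken
)

type Attribute struct{ Key, Val string }

type Token struct {
	Type TokenType
	Data string
	Attr []Attribute
}

// Link is a hypothetical result type: the href plus the collected text.
type Link struct{ Href, Text string }

// extractLinks walks the tokens using a capturing flag.
func extractLinks(tokens []Token) []Link {
	var links []Link
	var cur Link
	var parts []string
	capturing := false
	for _, t := range tokens {
		switch {
		case t.Type == StartTagToken && t.Data == "a" && !capturing:
			// Start capturing and store the href.
			capturing = true
			cur, parts = Link{}, nil
			for _, a := range t.Attr {
				if a.Key == "href" {
					cur.Href = a.Val
				}
			}
		case t.Type == TextToken && capturing:
			// Trim each text node; a single space is added when joining.
			if s := strings.TrimSpace(t.Data); s != "" {
				parts = append(parts, s)
			}
		case t.Type == EndTagToken && t.Data == "a" && capturing:
			// Stop capturing and add the link to the slice.
			capturing = false
			cur.Text = strings.Join(parts, " ")
			links = append(links, cur)
		}
	}
	return links
}

func main() {
	tokens := []Token{
		{Type: StartTagToken, Data: "a", Attr: []Attribute{{"href", "example.net"}}},
		{Type: TextToken, Data: "  Visit "},
		{Type: StartTagToken, Data: "strong"},
		{Type: TextToken, Data: "Example"},
		{Type: EndTagToken, Data: "strong"},
		{Type: EndTagToken, Data: "a"},
	}
	for _, l := range extractLinks(tokens) {
		fmt.Printf("%s -> %q\n", l.Href, l.Text)
	}
}
```

Because only text tokens are collected, the `<strong>` element inside the link is skipped while its text is kept. In this sketch a start anchor tag seen while already capturing is ignored, so its text ends up in the parent link.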
Issues:
- Nested links are ignored. Child links are not stored and their text is stored as part of the parent link.
  - For an example run `go run main.go -f ex5.html`.
strings.Builder
    var sb strings.Builder     // Create the builder.
    sb.WriteString("whatever") // Write to it. We can use fmt.Sprintf as param too.
    return sb.String()         // Get the final string.
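A slightly fuller sketch of the same idea, tied back to the link text: the hypothetical `joinText` helper below trims each text node and joins the non-empty ones with a single space using a `strings.Builder`. The function name and sample input are assumptions made for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// joinText is a hypothetical helper that trims every node's text and
// joins the non-empty results with one space between them.
func joinText(nodes []string) string {
	var sb strings.Builder
	for _, n := range nodes {
		n = strings.TrimSpace(n)
		if n == "" {
			continue
		}
		if sb.Len() > 0 {
			sb.WriteByte(' ') // separator between nodes
		}
		sb.WriteString(n)
	}
	return sb.String()
}

func main() {
	fmt.Println(joinText([]string{"  Visit ", "\n", "Example"}))

	// A Builder is also an io.Writer, so fmt.Fprintf can write into it.
	var sb strings.Builder
	fmt.Fprintf(&sb, "%d links", 3)
	fmt.Println(sb.String())
}
```

Writing into one builder avoids allocating a new string on every concatenation, which matters when a link contains many text nodes.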