I’m parsing a web page written in Spanish with Scrapy. The problem is

Asked: May 18, 2026

I’m parsing a web page written in Spanish with Scrapy. The problem is that I can’t save the text because of the wrong encoding.

This is the parse function:

def parse(self, response):
    hxs = HtmlXPathSelector(response)
    # Ex: [u' Sustancia mineral, m\xe1s o menos dura y compacta, que no es terrosa ni de aspecto met\xe1lico.']
    text = hxs.select('//text()').extract()
    s = "".join(text)
    db = dbf.Dbf("test.dbf", new=True)
    db.addField(
        ("WORD", "C", 25),
        ("DATA", "M", 15000),  # Memo field
    )
    rec = db.newRecord()
    rec["WORD"] = "Stone"
    rec["DATA"] = s
    rec.store()
    db.close()

When I try to save it to a database (a DBF file) I get an ASCII “ordinal not in range(128)” error. I tried decoding and encoding with ‘utf-8’ and ‘latin1’, but without success.

Edit:

To save the db I’m using dbfpy. I added the dbf saving code in the parse function above.

This is the error message:

Traceback (most recent call last):
  File "/usr/lib/python2.6/dist-packages/twisted/internet/base.py", line 1179, in mainLoop
    self.runUntilCurrent()
  File "/usr/lib/python2.6/dist-packages/twisted/internet/base.py", line 778, in runUntilCurrent
    call.func(*call.args, **call.kw)
  File "/usr/lib/python2.6/dist-packages/twisted/internet/defer.py", line 280, in callback
    self._startRunCallbacks(result)
  File "/usr/lib/python2.6/dist-packages/twisted/internet/defer.py", line 354, in _startRunCallbacks
    self._runCallbacks()
--- <exception caught here> ---
  File "/usr/lib/python2.6/dist-packages/twisted/internet/defer.py", line 371, in _runCallbacks
    self.result = callback(self.result, *args, **kw)
  File "/home/katy/Dropbox/proyectos/rae/rae/spiders/rae_spider.py", line 54, in parse
    rec.store()
  File "/home/katy/Dropbox/proyectos/rae/rae/spiders/record.py", line 211, in store
    self.dbf.append(self)
  File "/home/katy/Dropbox/proyectos/rae/rae/spiders/dbf.py", line 214, in append
    record._write()
  File "/home/katy/Dropbox/proyectos/rae/rae/spiders/record.py", line 173, in _write
    self.dbf.stream.write(self.toString())
  File "/home/katy/Dropbox/proyectos/rae/rae/spiders/record.py", line 223, in toString
    for (_def, _dat) in izip(self.dbf.header.fields, self.fieldData)
  File "/home/katy/Dropbox/proyectos/rae/rae/spiders/fields.py", line 215, in encodeValue
    return str(value)[:self.length].ljust(self.length)
exceptions.UnicodeEncodeError: 'ascii' codec can't encode character u'\xf1' in position 18: ordinal not in range(128)


1 Answer

Editorial Team · Answer 1 · 2026-05-18T04:36:14+00:00

Please remember that DBF files don’t support Unicode at all.
I also suggest using Ethan Furman’s dbf package (link in another answer).
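Because the file format stores plain bytes, the text has to be encoded into a single-byte codepage before it reaches the record. The sketch below reproduces the ASCII failure from the traceback and shows one way around it. The helper name `to_dbf_bytes`, the choice of `cp1252`, and the field width are illustrative assumptions, not part of dbfpy or the dbf package.

```python
# A minimal sketch, not dbfpy's or dbf's API: encode text explicitly
# instead of relying on an implicit ASCII conversion.

def to_dbf_bytes(text, codepage="cp1252", length=None):
    """Encode text into a single-byte codepage (hypothetical helper).

    Characters the codepage cannot represent are replaced with '?'.
    If length is given, the result is truncated or space-padded to it,
    mirroring what a fixed-width character field expects.
    """
    raw = text.encode(codepage, "replace")
    if length is not None:
        raw = raw[:length].ljust(length)
    return raw


sample = u"met\xe1lico, espa\xf1ol"

# Encoding with ASCII fails on the accented characters, which is the
# same class of error shown in the traceback.
try:
    sample.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# An explicit codepage succeeds and yields bytes ready to be stored.
encoded = to_dbf_bytes(sample, length=25)
```

In the Python 2 code from the question, assigning an already encoded byte string (for example `s.encode("cp1252")`) rather than the unicode object should avoid the implicit ASCII encoding inside `str(value)`, provided the chosen codepage matches what the DBF reader expects.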

You can simply use ‘table = dbf.Table('filename')’ and let the package guess the actual table type.

Here is an example of usage with an encoding other than cp437:

#!/usr/bin/env python
# coding: koi8-r
import dbf
text = 'текст в koi8-r'
table = dbf.Table(':memory:', ['test M'], 128, False, False, True, False, 'dbf', 'koi8-r')
record = table.append()
record.test = text
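The example above uses koi8-r because its text is Russian; Spanish text needs a codepage that covers its accented letters. As a rough sketch, the helper below tries a list of candidate codepages and returns the first one that can encode the text without loss. The name `pick_codepage` and the candidate list are assumptions made for illustration, and the result still has to be a codepage that the dbf package and the DBF consumer actually support.

```python
# A minimal sketch using only Python's built-in codecs; the helper and
# its default candidate list are hypothetical, not part of any dbf library.

def pick_codepage(text, candidates=("ascii", "cp437", "cp850", "cp1252")):
    """Return the first candidate codec that encodes text losslessly.

    Returns None when no candidate can represent every character.
    """
    for name in candidates:
        try:
            text.encode(name)
        except UnicodeEncodeError:
            continue  # this codepage lacks at least one character
        return name
    return None


lowercase_accent = pick_codepage(u"m\xe1s o menos dura")
uppercase_accent = pick_codepage(u"\xc1rbol")
```

Lowercase accented letters such as á and ñ exist in cp437, but the uppercase Á does not, so text containing it falls through to a later candidate.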

Please note the following about version 0.87.14 and the ‘dbf’ table type:

With version 0.87.14 of the dbf package you may hit the exception ‘TypeError: ord() expected a character…’ at “…/site-packages/dbf/tables.py”, line 686.

Only the ‘dbf’ table type is affected by this typo!

DISCLAIMER: I don’t know the truly correct values to use here, so don’t blame me if this “fix” causes incompatibilities.

You can replace the values '' with '\0' (at least) on lines 490 and 491 to make this test work.
