[Solved] What is the Python version of this?


Before getting into any of this: As chepner pointed out in a comment, this input looks like, and therefore is probably intended to be, JSON. Which means you shouldn’t be parsing it with regular expressions; just parse it as JSON:

>>> import json
>>> s = '''{"text": "Love this series!\ufeff", "time": "Hace 11 horas", "author": "HasRah", "cid": "UgyvXmvSiMjuDrOQn-l4AaABAg"}'''
>>> obj = json.loads(s)
>>> obj['author']
'HasRah'

Actually, it’s not clear whether your input is a JSON file (a file containing one JSON text), or a JSONlines file (a file containing a bunch of lines, each of which is a JSON text with no embedded newlines).[1]

For the former, you want to parse it like this:

obj = json.load(st)

For the latter, you want to loop over the lines, and parse each one like this:

for line in st:
    obj = json.loads(line)

… or, alternatively, you can get a JSONlines library off PyPI.
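To make the difference between the two layouts concrete, here is a minimal sketch. The sample data and the io.StringIO objects standing in for the open file st are assumptions for illustration; only the json.load and json.loads calls come from the answer above. Skipping blank lines is an extra precaution, not something the format requires.

```python
import io
import json

# Hypothetical sample data; io.StringIO stands in for the open file `st`.
single_doc = io.StringIO('{"author": "HasRah", "text": "Love this series!"}')
json_lines = io.StringIO(
    '{"author": "HasRah", "text": "Love this series!"}\n'
    '\n'
    '{"author": "someone_else", "text": "Me too"}\n'
)

# Case 1: the whole file is a single JSON text.
obj = json.load(single_doc)

# Case 2: one JSON text per line. Blank lines are skipped so that
# json.loads is never handed an empty string.
authors = []
for line in json_lines:
    if line.strip():
        authors.append(json.loads(line)["author"])
```

If json.load fails on your real file with an "Extra data" error, that is a strong hint that you have the second layout rather than the first.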


But meanwhile, if you want to understand what’s wrong with your code:

The error message is telling you the problem, although maybe not in the user-friendliest way:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/re.py", line 148, in search
    return _compile(pattern, flags).search(string)
TypeError: expected string or bytes-like object

As the docs for search make clear:

re.search(pattern, string, flags=0)

Scan through string looking for the first location where the regular expression pattern produces a match, and return a corresponding MatchObject instance…

You haven’t passed it a string; you’ve passed it a list of strings. Returning a list is the whole point of readlines, after all.
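The failure is easy to reproduce in isolation. In this sketch, io.StringIO and the sample text are stand-ins for the real file (assumptions, not the asker's data): readlines() hands back a list, and re.search rejects it.

```python
import io
import re

# Hypothetical stand-in for the open file `st`.
st = io.StringIO('"author": "HasRah"\n"author": "someone_else"\n')

lines = st.readlines()  # a list with one string per line

try:
    re.search('"author": "(.*)"', lines)
    raised = False
except TypeError:
    # re.search only accepts a str (or bytes-like) subject, not a list.
    raised = True
```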


There are two obvious fixes here.

First, you could read the whole file into a single string, instead of reading it into a list of strings:

s = st.read()
u = re.search('%s(.*)%s' % (start, end), s).group(1)

Alternatively, you could loop over the lines, trying to match each one. And, if you do this, you still don’t need readlines(), because a file is already an iterable of lines:

for line in st:
    u = re.search('%s(.*)%s' % (start, end), line).group(1)
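The two fixes are not interchangeable: searching the whole file gives you only the first match, while the loop gives you one result per line. A small sketch of the difference, where the start and end markers and the sample text are hypothetical values chosen for illustration:

```python
import re

# Hypothetical markers and sample text.
start, end = '"author": "', '"'
text = '"author": "HasRah"\n"author": "someone_else"\n'
pattern = '%s(.*)%s' % (start, end)

# Whole-file search: only the first match in the text is returned.
# By default `.` does not match a newline, so the match stays on one line.
first = re.search(pattern, text).group(1)

# Line-by-line search: one result per line.
per_line = [re.search(pattern, line).group(1) for line in text.splitlines()]
```

Which one you want depends on whether the file holds one value or many.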

While we’re at it, if any of your lines don’t match the pattern, this is going to raise an AttributeError. After all, search returns None if there’s no match, but then you’re going to try to call None.group(1).

There are two obvious fixes here as well.

You could handle that error:

try:
    u = re.search('%s(.*)%s' % (start, end), line).group(1)
except AttributeError:
    pass

… or you could check whether you got a match:

m = re.search('%s(.*)%s' % (start, end), line)
if m:
    u = m.group(1)
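Putting the loop and the match check together might look like the sketch below. The function name extract_between and the sample lines are hypothetical; re.escape is an addition that is not in the code above, included on the assumption that start and end are meant as literal text rather than as regex fragments.

```python
import re


def extract_between(lines, start, end):
    """Hypothetical helper: collect the text between start and end on each matching line."""
    # re.escape treats the markers as literal text (an assumption about intent).
    pattern = re.compile('%s(.*)%s' % (re.escape(start), re.escape(end)))
    results = []
    for line in lines:
        m = pattern.search(line)
        if m:  # lines without a match are skipped instead of raising
            results.append(m.group(1))
    return results


found = extract_between(
    ['"author": "HasRah"', 'no markers here', '"author": "someone_else"'],
    '"author": "',
    '"',
)
```

Because any iterable of strings works for lines, you could pass the open file itself.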

1. In fact, there are at least two other formats that are nearly, but not quite, identical to JSONlines. I think that if you only care about reading, not creating files, and you don’t have any numbers, you can parse all of them with a loop around json.loads or with a JSONlines library. But if you know who created the file, and know that they intended it to be, say, NDJ rather than JSONlines, you should read the docs on NDJ, or get a library made for NDJ, rather than just trusting that some guy on the internet thinks it’s OK to treat it as JSONlines.
