While it is still unclear what you mean by a “partial query”, the code below can handle it: you only need to define what a partial query is in the function `filter_out_common_queries`. E.g. if you are looking for an exact match of the query in search.txt, you could replace `# add your logic here` by `return [' '.join(querylist), ]`.
```python
import datetime as dt
from collections import defaultdict

def filter_out_common_queries(querylist):
    # add your logic here
    return querylist

queries_time = defaultdict(list)  # personally, I'd use 'set' as the default factory
with open('log.txt') as f:
    for line in f:
        fields = [x.strip() for x in line.split(',')]
        timestamp = dt.datetime.strptime(fields[0], "%H:%M:%S")
        queries_time[fields[1]].append(timestamp)

with open('search.txt') as inputf, open('search_output.txt', 'w') as outputf:
    for line in inputf:
        fields = [x.strip() for x in line.split(',')]
        timestamp = dt.datetime.strptime(fields[0], "%H:%M:%S")
        # "adidas watches for men" -> "adidas" "watches" "for" "men".
        # "for" is a very generic keyword; you would do well to filter these out.
        queries = filter_out_common_queries(fields[1].split())
        results = []
        for q in queries:
            poss_timestamps = queries_time[q]
            for ts in poss_timestamps:
                # keep q if it was logged within the 15 seconds up to this search
                if timestamp - dt.timedelta(seconds=15) <= ts <= timestamp:
                    results.append(q)
        outputf.write(line.strip() + " - {}\n".format(results))
```
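The comment in the loop suggests filtering out generic keywords such as "for". A minimal sketch of such a filter, assuming a hand-picked stop-word set (`STOPWORDS` is hypothetical and not part of the code above), could look like this:

```python
# Hypothetical stop-word list; extend it to suit your data.
STOPWORDS = {"for", "and", "the", "of", "a", "an", "with"}

def filter_out_common_queries(querylist):
    """Drop generic words so they are not looked up in the log on their own."""
    return [word for word in querylist if word.lower() not in STOPWORDS]

print(filter_out_common_queries("adidas watches for men".split()))
# ['adidas', 'watches', 'men']
```

The comparison is case-insensitive, so "For" at the start of a query is dropped as well.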
Output based on your input data:

```
19:00:15 , mouse , FALSE - []
19:00:15 , branded luggage bags and trolley , TRUE - []
19:00:15 , Leather shoes for men , FALSE - []
19:00:15 , printers , TRUE - []
19:00:16 , adidas watches for men , TRUE - ['adidas', 'adidas', 'adidas', 'adidas', 'adidas', 'adidas']
19:00:16 , Mobile Charger Stand/Holder black , FALSE - ['black']
19:00:16 , watches for men , TRUE - []
```
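The repeated 'adidas' entries appear because `results.append(q)` runs once for every logged timestamp of that word that falls inside the 15-second window. If you only want each word reported once, you can use `set` as the default factory (as the comment on `queries_time` suggests) and collect the matches in a set. The sketch below works on in-memory lines instead of log.txt; the sample lines and the helper name `matches_in_window` are hypothetical:

```python
import datetime as dt
from collections import defaultdict

# Hypothetical log lines in the same "HH:MM:SS , query" format as log.txt
log_lines = [
    "19:00:05 , adidas",
    "19:00:10 , adidas",
    "19:00:12 , black",
    "18:59:00 , adidas",  # outside the 15-second window used below
]

# set as the default factory: identical timestamps are stored only once
queries_time = defaultdict(set)
for line in log_lines:
    stamp, query = [x.strip() for x in line.split(',')]
    queries_time[query].add(dt.datetime.strptime(stamp, "%H:%M:%S"))

def matches_in_window(timestamp, words, window=dt.timedelta(seconds=15)):
    """Return each word logged within `window` before `timestamp`, at most once."""
    found = set()
    for word in words:
        # .get() avoids adding empty entries to the defaultdict for unknown words
        if any(timestamp - window <= ts <= timestamp for ts in queries_time.get(word, ())):
            found.add(word)
    return sorted(found)

t = dt.datetime.strptime("19:00:16", "%H:%M:%S")
print(matches_in_window(t, "adidas watches for men".split()))  # ['adidas']
```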
Note that a match for ‘black’ in “Mobile Charger Stand/Holder black” was found. That’s because the code above looks up each word of the query separately.
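To see the difference between the default per-word lookup and the exact-match replacement mentioned at the top (`return [' '.join(querylist), ]`), here is a small comparison. It assumes the log contains the single keyword 'black'; the `log_queries` set and the helper names are hypothetical:

```python
# Hypothetical set of queries seen in log.txt
log_queries = {"black", "adidas"}

def per_word(querylist):
    # Default behaviour of filter_out_common_queries: every word is looked up on its own
    return querylist

def exact_match(querylist):
    # Exact-match variant: look up the whole query as a single string
    return [' '.join(querylist), ]

query = "Mobile Charger Stand/Holder black".split()
print([q for q in per_word(query) if q in log_queries])     # ['black']
print([q for q in exact_match(query) if q in log_queries])  # []
```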
Edit: To implement your comment, you would redefine `filter_out_common_queries` like so:

```python
def filter_out_common_queries(querylist):
    basequery = ' '.join(querylist)
    querylist = []
    for n in range(2, len(basequery) + 1):
        querylist.append(basequery[:n])
    return querylist
```
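To check what this version produces, you can call it on a short query. It slices the joined string character by character, so the "partial queries" are all prefixes of at least two characters, spaces included. The list comprehension below is meant as an equivalent rewrite of the loop above:

```python
def filter_out_common_queries(querylist):
    basequery = ' '.join(querylist)
    # Every prefix of the joined query, from 2 characters up to the full string
    return [basequery[:n] for n in range(2, len(basequery) + 1)]

print(filter_out_common_queries(["mouse"]))  # ['mo', 'mou', 'mous', 'mouse']
```

With "adidas watches for men", the prefix 'adidas' is among the results, so it would still match a log entry for 'adidas'.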