TextBlob
Simple, Pythonic text processing: sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more.
TextBlob: Simplified Text Processing — TextBlob 0.10.0-dev documentation
I have been using TextBlob, a package for Python (https://pypi.python.org/pypi/textblob), for translating articles into different languages.
After reading the docs, I learned that TextBlob makes use of Google Translate. Since Google Translate is not a free service, I wanted to know whether there is any usage limit on translating articles through TextBlob.
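For reference, a minimal sketch of the translate call being asked about, based on the quickstart usage (translate() sends the text to Google's translation endpoint under the hood; in newer TextBlob releases it has been deprecated, so availability depends on the installed version):
from textblob import TextBlob
blob = TextBlob("Simple is better than complex.")
# requires network access and is subject to Google's own rate limiting
print(blob.translate(to="es"))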
Source: (StackOverflow)
I am working on a text search project and using TextBlob to search for sentences in a text.
TextBlob pulls all the sentences containing the keywords efficiently. However, for effective research I also want to pull out the sentence before and the sentence after each match, which I have been unable to figure out.
Below is the code I am using:
def extract_sents(text, word):
    search_words = set(word.split(','))
    sents = ''.join([s.lower() for s in text])
    blob = TextBlob(sents)
    matches = [str(s) for s in blob.sentences if search_words & set(s.words)]
    print(search_words)
    print(matches)
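One way to also capture the neighbouring sentences is to iterate over blob.sentences by index instead of filtering directly; a minimal sketch under the same assumptions as the code above (names are illustrative):
from textblob import TextBlob

def extract_sents_with_context(text, word):
    search_words = set(word.split(','))
    sentences = TextBlob(text.lower()).sentences
    matches = []
    for i, s in enumerate(sentences):
        if search_words & set(s.words):
            # keep the previous and next sentence when they exist
            start = max(i - 1, 0)
            end = min(i + 2, len(sentences))
            matches.append(' '.join(str(x) for x in sentences[start:end]))
    return matches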
Source: (StackOverflow)
I am using TextBlob's NaiveBayesClassifier for text analysis according to themes that I have chosen.
The data is huge (about 3,000 entries).
Though I was able to get a result, I'm not able to save it for future use without calling that function again and waiting hours until the processing completes.
I tried pickling by the following method
ab = NaiveBayesClassifier(data)
import pickle
object = ab
file = open('f.obj','w') #tried to use 'a' in place of 'w' ie. append
pickle.dump(object,file)
and I got an error, which is as follows:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Python27\lib\pickle.py", line 1370, in dump
Pickler(file, protocol).dump(obj)
File "C:\Python27\lib\pickle.py", line 224, in dump
self.save(obj)
File "C:\Python27\lib\pickle.py", line 331, in save
self.save_reduce(obj=obj, *rv)
File "C:\Python27\lib\pickle.py", line 419, in save_reduce
save(state)
File "C:\Python27\lib\pickle.py", line 286, in save
f(self, obj) # Call unbound method with explicit self
File "C:\Python27\lib\pickle.py", line 649, in save_dict
self._batch_setitems(obj.iteritems())
File "C:\Python27\lib\pickle.py", line 663, in _batch_setitems
save(v)
File "C:\Python27\lib\pickle.py", line 286, in save
f(self, obj) # Call unbound method with explicit self
File "C:\Python27\lib\pickle.py", line 600, in save_list
self._batch_appends(iter(obj))
File "C:\Python27\lib\pickle.py", line 615, in _batch_appends
save(x)
File "C:\Python27\lib\pickle.py", line 286, in save
f(self, obj) # Call unbound method with explicit self
File "C:\Python27\lib\pickle.py", line 562, in save_tuple
save(element)
File "C:\Python27\lib\pickle.py", line 286, in save
f(self, obj) # Call unbound method with explicit self
File "C:\Python27\lib\pickle.py", line 649, in save_dict
self._batch_setitems(obj.iteritems())
File "C:\Python27\lib\pickle.py", line 662, in _batch_setitems
save(k)
File "C:\Python27\lib\pickle.py", line 286, in save
f(self, obj) # Call unbound method with explicit self
File "C:\Python27\lib\pickle.py", line 501, in save_unicode
self.memoize(obj)
File "C:\Python27\lib\pickle.py", line 247, in memoize
self.memo[id(obj)] = memo_len, obj
MemoryError
I also tried with sPickle but it also resulted in errors such as:
#saving object with function sPickle.s_dump
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Python27\lib\site-packages\sPickle.py", line 22, in s_dump
for elt in iterable_to_pickle:
TypeError: 'NaiveBayesClassifier' object is not iterable
#saving object with function sPickle.s_dump_elt
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Python27\lib\site-packages\sPickle.py", line 28, in s_dump_elt
pickled_elt_str = dumps(elt_to_pickle)
MemoryError: out of memory
Can anyone tell me what I have to do to save the object?
Or is there any way to save the results of the classifier for future use?
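As an aside, pickle files are normally opened in binary mode and written with a binary protocol; a minimal sketch of that pattern (it will not by itself cure a MemoryError, which usually means the trained classifier simply does not fit in the available memory):
import pickle

ab = NaiveBayesClassifier(data)
with open('f.obj', 'wb') as f:        # binary mode, not 'w'
    pickle.dump(ab, f, protocol=2)    # binary protocol is smaller and faster

with open('f.obj', 'rb') as f:        # reload later without retraining
    ab = pickle.load(f)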
Source: (StackOverflow)
My goal is to create a system that can take any random text, extract its sentences, remove punctuation, and then, on one of the bare sentences, randomly replace NN- or VB-tagged words with their meronym, holonym, or synonym, as well as with a similar word from a WordNet synset. There is a lot of work ahead, but I have a problem at the very beginning.
For this I use the pattern and TextBlob packages. This is what I have done so far...
from pattern.web import URL, plaintext
from pattern.text import tokenize
from pattern.text.en import wordnet
from textblob import TextBlob
import string
s = URL('http://www.fangraphs.com/blogs/the-fringe-five-baseballs-most-compelling-fringe-prospects-35/#more-157570').download()
s = plaintext(s, keep=[])
secam = (tokenize(s, punctuation=""))
simica = secam[15].strip(string.punctuation)
simica = simica.replace(",", "")
simica = TextBlob(simica)
simicaTg = simica.words
synsimica = wordnet.synsets(simicaTg[3])[0]
djidja = synsimica.hyponyms()
Now everything works the way I want, but when I try to extract, say, a hyponym from this djidja variable, it proves impossible, since it is a Synset object and I can't manipulate it in any way.
Any idea how to extract the actual word reported in the hyponyms list (e.g. print(djidja[2]) displays Synset(u'bowler'), so how do I extract only 'bowler' from it)?
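If NLTK's WordNet interface is an option, the plain word forms can be read from a synset's lemmas; a minimal sketch (in NLTK 3.x lemmas() and name() are methods, older releases expose them as properties; pattern's own Synset carries similar data but under different attribute names):
from nltk.corpus import wordnet as wn

syn = wn.synsets('bowler')[0]
for hypo in syn.hyponyms():
    # each Synset holds Lemma objects; name() is the bare word form
    print([lemma.name() for lemma in hypo.lemmas()])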
Source: (StackOverflow)
I am trying to install this package: https://pypi.python.org/pypi/textblob-aptagger, and it says to use this command, but I do not know where to run it (the command line and the Python console do not work):
$ pip install -U textblob-aptagger
I installed easy_install and pip using exe files from
http://www.lfd.uci.edu/~gohlke/pythonlibs/
So when I use the command:
$ pip install -U textblob-aptagger
in the Python console I get this error:
File "<console>", line 1
$ pip install -U textblob-aptagger
^
SyntaxError: invalid syntax
Where should I use this installation command?
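For what it's worth, the leading $ is just the shell prompt; the command is meant for the operating-system command prompt (cmd.exe on Windows), not the Python console. A minimal sketch of driving the same install from inside Python instead, in case the command prompt is inconvenient:
import sys
import subprocess

# equivalent of running "pip install -U textblob-aptagger" at the command prompt
subprocess.check_call([sys.executable, "-m", "pip", "install", "-U", "textblob-aptagger"])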
Source: (StackOverflow)
I am trying to implement a Naive Bayes algorithm to read tweets from a CSV file and classify them into categories I define (for example: tech, science, politics).
I want to use NLTK's Naive Bayes classification algorithm, but the example is not anywhere close to what I need to do.
One of my biggest confusions is: how do we improve the classification accuracy of NB?
I am hoping to get some guidance on the detailed steps I need to take to do the classification.
- Do I have to create separate CSV files for each category, where I manually put the tweets?
- How do I train the algorithm if I do the above, and how does the algorithm test?
I have been researching online and found some brief examples like TextBlob, which makes use of NLTK's NB algorithm to do sentiment classification of tweets. It is simple to understand but difficult to tweak for beginners.
http://stevenloria.com/how-to-build-a-text-classification-system-with-python-and-textblob/
In his example from the link above, how does he implement the test when he has already put the sentiment next to the tweets? I thought that, to test, we should hide the second element of each pair.
train = [
    ('I love this sandwich.', 'pos'),
    ('This is an amazing place!', 'pos'),
    ('I feel very good about these beers.', 'pos'),
    ('This is my best work.', 'pos'),
    ("What an awesome view", 'pos'),
    ('I do not like this restaurant', 'neg'),
    ('I am tired of this stuff.', 'neg'),
    ("I can't deal with this", 'neg'),
    ('He is my sworn enemy!', 'neg'),
    ('My boss is horrible.', 'neg')
]
test = [
    ('The beer was good.', 'pos'),
    ('I do not enjoy my job', 'neg'),
    ("I ain't feeling dandy today.", 'neg'),
    ("I feel amazing!", 'pos'),
    ('Gary is a friend of mine.', 'pos'),
    ("I can't believe I'm doing this.", 'neg')
]
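For reference, a minimal sketch of how train/test pairs like these are typically used with TextBlob's classifier: it learns only from train, and the labels in test are used purely to score its predictions:
from textblob.classifiers import NaiveBayesClassifier

cl = NaiveBayesClassifier(train)                    # fit on the labelled training pairs
print(cl.classify("This is an amazing library!"))   # predict a label for unseen text
print(cl.accuracy(test))                            # compare predictions against the held-out labels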
Source: (StackOverflow)
I'm writing a Python command line utility that involves converting a string into a TextBlob, which is part of a natural language processing module. Importing the module is very slow, ~300 ms on my system. For speediness, I created a memoized function that converts text to a TextBlob only the first time the function is called. Importantly, if I run my script over the same text twice, I want to avoid reimporting TextBlob and recomputing the blob, instead pulling it from the cache. That's all done and works fine, except, for some reason, the function is still very slow. In fact, it's as slow as it was before. I think this must be because the module is getting reimported even though the function is memoized and the import statement happens inside the memoized function.
The goal here is to fix the following code so that the memoized runs are as speedy as they ought to be, given that the result does not need to be recomputed.
Here's a minimal example of the core code:
@memoize
def make_blob(text):
    import textblob
    return textblob.TextBlob(text)

if __name__ == '__main__':
    make_blob("hello")
And here's the memoization decorator:
import os
import shelve
import functools
import inspect

def memoize(f):
    """Cache results of computations on disk in a directory called 'cache'."""
    path_of_this_file = os.path.dirname(os.path.realpath(__file__))
    cache_dirname = os.path.join(path_of_this_file, "cache")
    if not os.path.isdir(cache_dirname):
        os.mkdir(cache_dirname)
    cache_filename = f.__module__ + "." + f.__name__
    cachepath = os.path.join(cache_dirname, cache_filename)
    try:
        cache = shelve.open(cachepath, protocol=2)
    except:
        print 'Could not open cache file %s, maybe name collision' % cachepath
        cache = None

    @functools.wraps(f)
    def wrapped(*args, **kwargs):
        argdict = {}
        # handle instance methods
        if hasattr(f, '__self__'):
            args = args[1:]
        tempargdict = inspect.getcallargs(f, *args, **kwargs)
        for k, v in tempargdict.iteritems():
            argdict[k] = v
        key = str(hash(frozenset(argdict.items())))
        try:
            return cache[key]
        except KeyError:
            value = f(*args, **kwargs)
            cache[key] = value
            cache.sync()
            return value
        except TypeError:
            call_to = f.__module__ + '.' + f.__name__
            print ('Warning: could not disk cache call to '
                   '%s; it probably has unhashable args' % call_to)
            return f(*args, **kwargs)
    return wrapped
And here's a demonstration that the memoization doesn't currently save any time:
❯ time python test.py
python test.py 0.33s user 0.11s system 100% cpu 0.437 total
~/Desktop
❯ time python test.py
python test.py 0.33s user 0.11s system 100% cpu 0.436 total
This is happening even though the function is correctly being memoized (print statements put inside the memoized function only give output the first time the script is run).
I've put everything together into a GitHub Gist in case it's helpful.
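One thing worth ruling out with a minimal timing sketch (assumed to be pasted into the same script): unpickling a cached TextBlob still has to import textblob to rebuild the object, so the ~300 ms import cost can reappear even on cache hits.
import time

t0 = time.time()
import textblob                      # time the import on its own
t1 = time.time()
blob = make_blob("hello")            # cache hit after the first run
t2 = time.time()
print("import: %.3fs, make_blob: %.3fs" % (t1 - t0, t2 - t1))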
Source: (StackOverflow)
Hello, it's my first question, so my deepest apologies if I didn't follow a rule or so; I'll try my best to make it as clear as possible.
I am trying to use TextBlob with a text file as input. All the examples I found online take input in this sense:
wiki = TextBlob("Python is a high-level, general-purpose programming language.")
wiki.tags
whereas I tried something like this:
from textblob import TextBlob
file=open("1.txt");
t=file.read();
print(type(t))
bobo = TextBlob(t)
bobo.tags
I hope nothing's wrong, thanks a lot.
Edit: the code I tried did not work.
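A minimal sketch of the same idea that actually displays the tags (it assumes the corpora TextBlob relies on have been fetched once with python -m textblob.download_corpora; without them the tagger raises an error about missing corpora):
from textblob import TextBlob

with open("1.txt") as f:
    text = f.read()

blob = TextBlob(text)
# a bare "bobo.tags" in a script evaluates the property but prints nothing
print(blob.tags)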
Source: (StackOverflow)
I am currently working on PySpark for NLP processing and similar tasks, and I am using the TextBlob Python library.
Normally, in standalone mode, it is easy to install external Python libraries. In cluster mode I am facing problems installing these libraries on the worker nodes remotely; I cannot access each and every worker machine to install them on the Python path.
I tried to use the SparkContext pyFiles option to ship .zip files, but the problem is that these Python packages need to be installed on the worker machines.
Could anyone let me know the different ways of doing this so that TextBlob is available on the Python path?
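If installing on every worker is not possible, one workaround worth sketching is shipping a zip of the pure-Python package with the SparkContext so the executors can import it from the zip (this only helps for packages without compiled extensions, any corpora the library expects must still exist on the workers, and the zip path below is illustrative):
from pyspark import SparkContext

sc = SparkContext(appName="nlp")
sc.addPyFile("/path/to/textblob.zip")      # makes the zipped package importable on the executors

def polarity(text):
    from textblob import TextBlob          # imported on the worker, after the zip has been shipped
    return TextBlob(text).sentiment.polarity

print(sc.parallelize(["great stuff", "terrible stuff"]).map(polarity).collect())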
Source: (StackOverflow)
I've recently come across TextBlob, which seems like a very neat natural language processing library: http://textblob.readthedocs.org/en/dev/quickstart.html
However, I am concerned because it seems to act like a regular Python string. I have a very large text file, and, for example, calling blob.correct() on a very modest amount of text takes a very long time. Any feedback on how TextBlob scales, or any alternatives for natural language parsing?
Source: (StackOverflow)
How does TextBlob calculate polarity in sentiment analysis? What logic does it follow, and can we change the logic?
Thank you.
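For context, TextBlob's default analyzer (PatternAnalyzer, built on the pattern library's hand-scored lexicon) roughly averages the polarity of the words it finds, with rules for negation and intensifiers, and it can be swapped for a corpus-trained NaiveBayesAnalyzer; a minimal sketch of both:
from textblob import TextBlob
from textblob.sentiments import NaiveBayesAnalyzer

text = "The movie was not that great."
# default: lexicon-based PatternAnalyzer, polarity in [-1.0, 1.0]
print(TextBlob(text).sentiment)
# alternative: Naive Bayes model trained on a movie-review corpus (needs the NLTK movie_reviews data)
print(TextBlob(text, analyzer=NaiveBayesAnalyzer()).sentiment)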
Source: (StackOverflow)
I have been trying to install Python TextBlob, but I am getting this error:
Now downloading textblob packages
[localhost] run: python -m textblob.download_corpora
[localhost] out: /home/naren/VirtualEnvironment/bin/python: No module named textblob
[localhost] out:
Fatal error: run() received nonzero return code 1 while executing!
Requested: python -m textblob.download_corpora
Executed: /bin/bash -l -c "cd /home/naren/VirtualEnvironment && source bin/activate && python -m textblob.download_corpora"
Aborting.
Disconnecting from localhost... done.
run() received nonzero return code 1 while executing!
Source: (StackOverflow)
For a project for my lab, I'm analyzing Twitter data. The tweets we've captured all contain the word 'sex'; that's the keyword we filtered the TwitterStreamer to capture on.
I converted the CSV where all of the tweet data (JSON meta tags) is housed into a pandas DataFrame and saved the 'text' column to isolate the tweet text.
import pandas as pd
import csv
df = pd.read_csv('tweets_hiv.csv')
saved_column4 = df.text
print saved_column4
Out comes the correct output:
0 Some example tweet text
1 Oh hey look more tweet text @things I hate #stuff
...a bunch more lines
Name: text, Length: 8540, dtype: object
But, when I try this
from textblob import TextBlob
tweetstr = str(saved_column4)
tweets = TextBlob(tweetstr).upper()
print tweets.words.count('sex', case_sensitive=False)
My output is 22.
There should be at least as many occurrences of the word 'sex' as there are lines in the CSV, and likely more. I can't figure out what's happening here. Is TextBlob not handling the dtype: object column correctly?
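One thing to check: str() on a pandas Series returns the truncated preview shown above (note the elided middle and the Length: 8540 footer), not all 8,540 tweets, which alone would explain a low count. A minimal sketch that counts over the whole column instead:
from textblob import TextBlob

# join every tweet into one string rather than stringifying the Series preview
all_text = ' '.join(df['text'].astype(str))
print(TextBlob(all_text).words.count('sex', case_sensitive=False))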
Source: (StackOverflow)
I have this code. I have two features. How do I train the two features together?
from textblob import TextBlob, Word, Blobber
from textblob.classifiers import NaiveBayesClassifier
from textblob.taggers import NLTKTagger
import re
import nltk
def get_word_before_you_feature(mystring):
    keyword = 'you'
    before_keyword, keyword, after_keyword = mystring.partition(keyword)
    before_keyword = before_keyword.rsplit(None, 1)[-1]
    return {'word_before_you': before_keyword}

def get_word_after_you_feature(mystring):
    keyword = 'you'
    before_keyword, keyword, after_keyword = mystring.partition(keyword)
    after_keyword = after_keyword.split(None, 1)[0]
    return {'word_after_you': after_keyword}

classifier = nltk.NaiveBayesClassifier.train(train)
lang_detector = NaiveBayesClassifier(train, feature_extractor=get_word_after_you_feature)
lang_detector = NaiveBayesClassifier(train, feature_extractor=get_word_before_you_feature)
print(lang_detector.accuracy(test))
print(lang_detector.show_informative_features(5))
This is the output I get.
word_before_you = 'do' refere : generi = 2.2 : 1.0
word_before_you = 'when' generi : refere = 1.1 : 1.0
It only seems to use the last feature. How do I get the classifier to train on both features instead of just one?
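One common approach is to return both features from a single extractor, since each NaiveBayesClassifier(...) call above builds a separate model and the second assignment simply replaces the first; a minimal sketch, assuming the same (not shown) train and test lists:
def get_words_around_you(mystring):
    before, keyword, after = mystring.partition('you')
    return {
        # fall back to '' when 'you' starts or ends the text, or is absent
        'word_before_you': before.rsplit(None, 1)[-1] if before.strip() else '',
        'word_after_you': after.split(None, 1)[0] if after.strip() else '',
    }

lang_detector = NaiveBayesClassifier(train, feature_extractor=get_words_around_you)
print(lang_detector.accuracy(test))
print(lang_detector.show_informative_features(5))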
Source: (StackOverflow)
I'm trying to run through the TextBlob tutorial in Windows (using Git Bash shell) with Python 3.3.
I've installed textblob and nltk, as well as any dependencies.
The Python code is:
from text.blob import TextBlob
wiki = TextBlob("Python is a high-level, general-purpose programming language.")
tags = wiki.tags
I'm getting the following error
Traceback (most recent call last):
File "textblob.py", line 4, in <module>
tags = wiki.tags
File "c:\Python33\lib\site-packages\text\decorators.py", line 18, in __get__
value = obj.__dict__[self.func.__name__] = self.func(obj)
File "c:\Python33\lib\site-packages\text\blob.py", line 357, in pos_tags
for word, t in self.pos_tagger.tag(self.raw)
File "c:\Python33\lib\site-packages\text\taggers.py", line 40, in tag
return pattern_tag(sentence, tokenize)
File "c:\Python33\lib\site-packages\text\en.py", line 115, in tag
for sentence in parse(s, tokenize, True, False, False, False, encoding).split():
File "c:\Python33\lib\site-packages\text\en.py", line 99, in parse
return parser.parse(unicode(s), *args, **kwargs)
File "c:\Python33\lib\site-packages\text\text.py", line 1213, in parse
s[i] = self.find_tags(s[i], **kwargs)
File "c:\Python33\lib\site-packages\text\en.py", line 49, in find_tags
return _Parser.find_tags(self, tokens, **kwargs)
File "c:\Python33\lib\site-packages\text\text.py", line 1161, in find_tags
map = kwargs.get( "map", None))
File "c:\Python33\lib\site-packages\text\text.py", line 967, in find_tags
tagged.append([token, lexicon.get(token, i==0 and lexicon.get(token.lower()) or None)])
File "c:\Python33\lib\site-packages\text\text.py", line 98, in get
return self._lazy("get", *args)
File "c:\Python33\lib\site-packages\text\text.py", line 79, in _lazy
self.load()
File "c:\Python33\lib\site-packages\text\text.py", line 367, in load
dict.update(self, (x.split(" ")[:2] for x in _read(self._path) if x.strip()))
File "c:\Python33\lib\site-packages\text\text.py", line 367, in <genexpr>
dict.update(self, (x.split(" ")[:2] for x in _read(self._path) if x.strip()))
File "c:\Python33\lib\site-packages\text\text.py", line 346, in _read
for line in f:
File "c:\Python33\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 16: character maps to <undefined>
Any idea what is wrong here? Adding a 'u' before the string didn't help.
Source: (StackOverflow)