Data Science CS109 Homework 1

1. Importing Python and its third-party libraries

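Installing Python and pip can fail in many ways. Before building Python, install openssl, openssl-devel, libxml, and libxml-devel; only then install setuptools and pip, in that order. pip is Python's package manager: it downloads and installs most third-party Python packages automatically, which is very convenient.
After several days of tinkering, my conclusion is that Python 3 support is still immature: third-party packages were very hard to install and kept failing, so for now Python 2.7 is the better choice for learning machine learning.
The Data Science instructors give a detailed list of the required third-party libraries in the homework: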

#IPython is what you are using now to run the notebook
import IPython
print "IPython version:      %6.6s (need at least 1.0)" % IPython.__version__

# Numpy is a library for working with Arrays
import numpy as np
print "Numpy version:        %6.6s (need at least 1.7.1)" % np.__version__

# SciPy implements many different numerical algorithms
import scipy as sp
print "SciPy version:        %6.6s (need at least 0.12.0)" % sp.__version__

# Pandas makes working with data tables easier
import pandas as pd
print "Pandas version:       %6.6s (need at least 0.11.0)" % pd.__version__

# Module for plotting
import matplotlib
print "Matplotlib version:   %6.6s (need at least 1.2.1)" % matplotlib.__version__

# SciKit Learn implements several Machine Learning algorithms
import sklearn
print "Scikit-Learn version: %6.6s (need at least 0.13.1)" % sklearn.__version__

# Requests is a library for getting data from the Web
import requests
print "requests version:     %6.6s (need at least 1.2.3)" % requests.__version__

# Networkx is a library for working with networks
import networkx as nx
print "NetworkX version:     %6.6s (need at least 1.7)" % nx.__version__

#BeautifulSoup is a library to parse HTML and XML documents
import BeautifulSoup
print "BeautifulSoup version:%6.6s (need at least 3.2)" % BeautifulSoup.__version__

#MrJob is a library to run map reduce jobs on Amazon's computers
import mrjob
print "Mr Job version:       %6.6s (need at least 0.4)" % mrjob.__version__

#Pattern has lots of tools for working with data from the internet
import pattern
print "Pattern version:      %6.6s (need at least 2.6)" % pattern.__version__

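(1) matplotlib caused the most trouble here. It depends on freetype and libpng, and it also needs a C++ compiler (yum install gcc-c++).
(2) pandas depends on Cython (yum install cython). The machine froze during installation, and I have not yet worked out why.
(3) BeautifulSoup 4 is just bs4: the import name is bs4, not BeautifulSoup.
(4) pattern fails during installation because its setup.py uses the Python 2.x print statement while the interpreter is Python 3.x. Every print statement in setup.py has to be changed to a print() call.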
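
The notes above can be checked in one pass with a small script that reports which libraries import cleanly. This is only a minimal sketch: the helper names installed_version and load_beautifulsoup are hypothetical, the module list is copied from the homework listing, and the code is written to run on both Python 2.7 and Python 3.

```python
from __future__ import print_function  # keeps print() working on Python 2.7 too

import importlib


def installed_version(module_name):
    """Return the module's __version__ as a string, 'unknown' if it has none,
    or None if the module cannot be imported at all."""
    try:
        module = importlib.import_module(module_name)
    except Exception:
        # Any import failure counts as "not usable", including the
        # SyntaxError raised by Python 2-only code under Python 3.
        return None
    return str(getattr(module, "__version__", "unknown"))


def load_beautifulsoup():
    """Return BeautifulSoup under whichever name is installed:
    bs4 (version 4) first, then BeautifulSoup (version 3), else None."""
    for name in ("bs4", "BeautifulSoup"):
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    return None


# Import names taken from the homework list (assumed unchanged).
REQUIRED = ["IPython", "numpy", "scipy", "pandas", "matplotlib", "sklearn",
            "requests", "networkx", "mrjob", "pattern"]


def main():
    for name in REQUIRED:
        version = installed_version(name)
        print("%-14s %s" % (name, version if version else "NOT INSTALLED"))
    soup = load_beautifulsoup()
    print("%-14s %s" % ("BeautifulSoup", soup.__name__ if soup else "NOT INSTALLED"))


if __name__ == "__main__":
    main()
```

Running it before and after each yum or pip step shows which packages are still missing, without stopping at the first failed import the way the homework listing does.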

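The above is my hard-won record of what went wrong. Most of the errors seem to come down to version mismatches. I am not very familiar with Linux, so I could only feel my way forward slowly. I did manage to install every third-party package on my own computer; the lab machine still produces other strange errors, which I will keep working on. Now on to some simple code.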

2. Code example

#this line prepares IPython for working with matplotlib
%matplotlib inline  

# this actually imports matplotlib
import matplotlib.pyplot as plt  

x = np.linspace(0, 10, 30)  #array of 30 points from 0 to 10
y = np.sin(x)
z = y + np.random.normal(size=30) * .2
plt.plot(x, y, 'ro-', label='A sine wave')
plt.plot(x, z, 'b-', label='Noisy sine')
plt.legend(loc = 'lower right')
plt.xlabel("X axis")
plt.ylabel("Y axis")  

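Problems:
(1) No module named _tkinter
Solution:
Install tcl-devel and tk-devel, then recompile Python.
(2) no display name and no $DISPLAY environment variable
Solution:
Add matplotlib.use('Agg') at the top of the file, before matplotlib.pyplot is imported.
Alternatively, locate the configuration file by running import matplotlib and then matplotlib.matplotlib_fname() (on Ubuntu the file is '/etc/matplotlibrc').
Once you have found matplotlibrc, change backend from TkAgg to Agg.
With either approach, show() can no longer display the figure. I do not yet know the underlying fix and will keep looking.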
(3)libpng16.so.16: cannot open shared object file: No such file or directory
[root@root python]
Solution:
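
For problem (2) above, one possible workaround is to select the backend at run time and write the figure to a file instead of calling show(). The sketch below is an assumption-laden illustration, not the course's method: choose_backend and save_sine_plot are hypothetical helper names, and the DISPLAY check only makes sense on a Linux/X11 machine.

```python
import os


def choose_backend(environ):
    """Pick 'Agg' (renders to files only) when no X display is advertised,
    otherwise 'TkAgg'. Assumes a Linux/X11 machine."""
    return "TkAgg" if environ.get("DISPLAY") else "Agg"


def save_sine_plot(path, backend="Agg"):
    """Draw the sine example with the given backend and write it to `path`."""
    import matplotlib
    # Select the backend before pyplot is imported; older matplotlib
    # releases ignore a later call.
    matplotlib.use(backend)
    import matplotlib.pyplot as plt
    import numpy as np

    x = np.linspace(0, 10, 30)  # array of 30 points from 0 to 10
    plt.plot(x, np.sin(x), "ro-", label="A sine wave")
    plt.legend(loc="lower right")
    plt.savefig(path)  # needs no display, unlike plt.show()
    plt.close()


# Example call (writes sine.png into the current directory):
# save_sine_plot("sine.png", choose_backend(os.environ))
```

The saved file can then be copied off the server and opened locally, which sidesteps the missing display rather than fixing it.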
