Category Archives: Programming

PyCon 2015: Neural Nets for Newbies

The ideas and methods behind neural nets (NNs) have been around for a long time, but in the last decade-plus we have finally started to reap significant benefits, and this is just the beginning. This post provides an overview of my recent PyCon talk in Montreal, which is a neural net primer of sorts. The video is below, my slides are on SpeakerDeck, and I have a repo on GitHub named Neural Nets for Newbies.

There is too much to cover to fully explain neural nets here; thus, the post and the talk provide a framework for starting to understand them. If you want to learn more, there are plenty of resources, some listed in my deck, to dive into.

What are they?
Machine learning is a set of algorithms for classification and prediction, and artificial neural nets are part of the machine learning space. At their core, neural nets are an algorithm, which means an equation to help abstract and find patterns in data; technically it’s a combination of equations.

The structure is modeled after our brains. Before you get all excited about robots that can think like us (side note: that idea has been around since BC), the reality is that we still don’t fully understand how the human brain functions. Neural nets only loosely mimic brain functionality. For the enthusiasts out there, yes, there are many researchers focused on creating a closer biological representation that acts like our brain. The bottom line is we aren’t there yet.

The algorithm, structure and many of the ideas around neural net functionality have been around for a while; several of them date back to the 1950s. Neural nets have been applied in commercial solutions as far back as 1959 (reducing phone line echoes), but we really hadn’t seen significant value until recently. The key reasons are that our computational power (processing speed and memory capacity) and access to useful data (amount of stored data) have improved significantly in the last decade alone.

Why should I care?
Because NNs have driven technical advancements in areas like:

  • Natural Language Processing (Search & Sentiment)
  • Speech Recognition (Siri)
  • Computer Vision & Facial Recognition (Automatic Image Tagging)
  • Robotics (Automated Car)
  • Recommender Systems (Amazon)
  • Ad Placement

Some of you may roll your eyes at these advancements and complain about how limited Siri is in its interactions. Need I remind you that we weren’t talking to our machines at the beginning of this century (or at least it wasn’t common). Hell, we didn’t have iPods at the beginning of this century, if you remember what those are. I too fall into the sci-fi trap: I’ve seen it or read about it, so when we actually experience real advancements, they seem boring and behind the times. Yeah, get over that.

All the areas I mentioned above still have plenty of room for growth, and there are definitely other areas I haven’t listed, especially in scientific fields. One of the reasons neural nets have had such an impressive impact is how they handle more data, especially data that has layers of complexity. This doesn’t mean that neural nets should be used for all problems; they are overkill in many situations. I cannot stress enough that not every problem is a nail for the NN hammer.

If you have a good problem and want to apply NNs, it’s important to understand how they work.

Ok, so how do they work?
Out of the gate: if you want to get serious about applying NNs, you will need to embrace math no matter how much you don’t like it. Below I’ve given you some fundamentals around the math and the structure to get you started.

Basic Structure
Our brains are made up of neurons and synapses, and based on our interactions, certain neurons fire and send signals to other neurons for data processing/interpretation. There is much more complex stuff going on in our brains than just that, but at a high level, that is the structure the neural net models.

NNs at a minimum have three layers: input, hidden, output.

  • Input = data
    • Data that is broken up into consumable information
    • Data can be pre-processed or raw
    • Bias and noise are applied sometimes
  • Hidden = processing units (aka, does math)
    • Made up of neurons
    • A neuron determines if it will be active (math under equation section)
    • Typically there are multiple neurons in a hidden layer (can be thousands or more depending on the data used and the objective; the largest nets have billions of connections)
  • Output = results
    • One node per class, and there may be just one or many
    • A net to classify pictures as dogs or cats has two output nodes, one per class
    • A net to classify handwritten digits 0-9 has ten output nodes

You can have more than one hidden layer in a neural net, and when you start adding hidden layers, each layer’s outputs serve as inputs to the next layer based on where it sits in the structure.

Basic Equation
Each neuron represents an equation: it takes in a set of inputs, multiplies them by weights, combines the data and then applies an activation function to determine if the neuron is active. A neuron is known as a processing unit because it computes on the data to determine its response.

  • Inputs = input layer data in numerical format
  • Weights = coefficients (also known as theta)
    • Specialize each neuron to handle the problem (dataset) you are working with
    • Can initialize randomly
    • One way to initialize is to create a distribution of the existing data set and randomly sample that distribution
    • Often weights are represented as values between -1 and 1
  • Bias = can be included as an input or used as a threshold to compare data after the activation function is applied
  • Activation Function = data transformation to determine if the neuron will send a signal
    • Also known as the energy function
    • There are many different equations that can be used, and it depends on the problem and data you are working with
    • Example equations: sigmoid/logistic, step/binary threshold, linear, rectified linear (combines binary threshold & linear), …
  • Output(s) = each node results in a binary, percentage or number range

In equation form: output = activation( Σ (weight × input) + bias )

Each neuron is unique from other neurons in a hidden layer based on the weights applied. Neurons can also be unique in their inputs and outputs. There are many hyperparameters you can tweak for one single neuron, let alone the whole structure, to improve its performance. What makes neural nets powerful is the combination of linear and nonlinear functions in the equation.
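
To make this concrete, below is a minimal sketch of a single neuron in Python with NumPy, using a sigmoid activation. The inputs, weights and bias are made-up numbers for illustration, not values from any real model.

import numpy as np

def sigmoid(z):
    # Squashes any value into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

inputs = np.array([0.5, -1.2, 0.3])    # input layer data in numerical format
weights = np.array([0.8, -0.4, 0.1])   # coefficients, here between -1 and 1
bias = 0.2                             # bias included as an extra input term

# Combine the data and apply the activation function.
activation = sigmoid(np.dot(inputs, weights) + bias)
print(activation)  # ~0.75, i.e. how strongly this neuron fires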

Optimization
When applying a neural net, you need to put effort into optimizing the model so it produces the results you are targeting.

The breakthroughs in neural nets are largely in the area of supervised learning. Supervised learning means you have a dataset labeled with the results you expect. The data is used to train the model so you can make sure it functions as needed. Cross validation is a technique typically used in supervised learning where you split the dataset into a training set to build the model and a test set for validation. Note, there are areas of neural net research that explore unlabeled data, but that is too much to cover in this post.
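
As a quick illustration, here is what that split typically looks like with scikit-learn (listed under the Python packages below). The data is a random stand-in, and the module path varies by scikit-learn version.

import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(100, 5)        # 100 samples with 5 features (stand-in data)
y = np.random.randint(0, 2, 100)  # binary labels

# Hold out 20% of the labeled data as the test set for validation.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)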

In order to optimize, you start out with a structure and probably randomized weights on each neuron in the hidden layer(s). You run your labeled data through the structure and get results at the end. Then you compare those results to the real labels using a loss function to help define the error value. The loss function transforms the comparison so it becomes a kind of compass when going back to optimize the weights on each neuron.

The optimization method (aka back propagation or backprop) is a way of taking the derivative of the loss function and applying it to the weights throughout the model. This method can change all weights on every neuron and because of the way the method works, it does not change the weights equally. You want shifts that vary across weights because each neuron is unique.

  • Error = difference between NN results to the real labels
  • Loss Function = calculates the error (also referred to as the cost function)
    • There are many different equations that are used, and it depends on the problem and data you are working with
    • Example equations: mean squared error, negative log likelihood, cross entropy, hinge, …
  • Regularization = noise applied in the loss function to prevent overfitting
  • Optimization Method = learning method to tune weights
    • There are many different equations that are used, and it depends on the problem and data you are working with
    • Example equations: stochastic gradient descent, Adagrad (J Duchi), Adadelta (M Zeiler), RMSprop (T. Tieleman), …
  • Learning Rate = how much to change the weights on each update; sometimes part of the optimization algorithm

Backprop in essence wiggles (to quote Karpathy) the weights a little each time you run the data through the model during training. You keep running the data through and adjusting the weights until the error stops changing. Hopefully it’s as low as you need it to be for the problem. And if it’s not, you may want to investigate other model structure modifications.
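
Here is a toy sketch of that loop for a single sigmoid neuron trained with mean squared error and plain gradient descent. A real net repeats this across all layers via backprop; the data is a made-up OR problem, not anything from the talk.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])  # labeled training data
y = np.array([0., 1., 1., 1.])                          # real labels (OR)
weights = np.random.uniform(-1, 1, 2)
bias = 0.0
learning_rate = 0.5

for epoch in range(2000):
    preds = sigmoid(X.dot(weights) + bias)  # run data through the model
    error = preds - y                       # compare results to real labels
    loss = np.mean(error ** 2)              # mean squared error
    # Derivative of the loss with respect to the weights (the backprop step).
    grad = error * preds * (1 - preds)
    weights -= learning_rate * X.T.dot(grad) / len(y)  # wiggle the weights
    bias -= learning_rate * grad.mean()

print(loss)  # should be small once the error stops changing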

Note that reducing the error rate is a common model objective, but it is not always the objective. For the sake of simplicity, that’s our focus right now.

Validation / Testing
Once you’ve stopped training your model, you can run the test data set through it to see how it performs. If the error rate is horrible, then you may have overfit, or there could be a number of other issues to consider. Error rate and other standard validation approaches can be used to check how your model is performing.
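
Continuing the train/test split sketch from the optimization section, checking performance on the held-out data might look like this (the logistic regression is just a stand-in classifier):

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Fit on the training set, then measure the error rate on the test set.
model = LogisticRegression().fit(X_train, y_train)
test_error = 1.0 - accuracy_score(y_test, model.predict(X_test))
print(test_error)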

Structure Types
I’ve given you a basic structure for how a neural net connects, but it’s important to understand there are variations on that structure that are better for different types of problems. Example types include:

  • Feed Forward (FFN) = basic structure; passes data forward through the structure in the order of connections
    • There are no loops
    • Data moves in one direction
    • Key Applications: financial prediction, image compression, medical diagnosis and protein structure prediction
  • Recurrent (RNN) = depending on when a neuron fires, data can be looped back earlier in the net structure as input
    • Data can become input to the same neuron, other neurons in that layer or neurons in a hidden layer prior to that layer
    • Operates on linear progression of time
    • Good for supervised learning in discrete time settings
    • Key Applications: sentiment analysis, speech recognition, NLP
  • Convolutional (CNN) = uses a mixture of hidden layer types (e.g. pooling, convolutional, etc.)
    • Best structure for scaling
    • Inspired by biological processes and variant of multilayer perceptrons
    • Key Applications: computer vision, image & video recognition
  • Other types to checkout:
    • Recursive (RNN) = related to Recurrent but based on structure vs time
    • Restricted Boltzmann Machine (RBM) = 1st neural net to demonstrate learning of latent / hidden variables
    • Autoencoder (Auto) = RBM variant
    • Denoising Autoencoder (DAE)
    • Deep Belief Networks (DBN)

Neural nets can get complex in their structure and combined equations. It can be tricky and time-consuming to develop a useful model, and confusing to know where to start. Thanks to extensive research, there are already pre-baked templates for certain types of problems that you can adapt instead of starting from scratch.

There are a couple other points to note about neural nets to point you in the right direction when developing and deploying.

Systems Engineering
In order to run a neural net on problems like those mentioned above, it’s important to understand certain systems engineering concepts.

The main one to spend time on is graphical processing units (GPUs). These chips play a key role in reducing the time (latency) it takes to develop NNs. You want every advantage you can get in reducing the time it takes to build a neural net.

GPUs are highly optimized for computation compared to CPUs, which is why they are popular in gaming and research. Granted, there are advances going on in CPUs that some argue are making them function more like GPUs. The bottom line: spend some time learning about GPUs and try running an NN on one.
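
For example, with Theano (discussed under Python packages below) you can check which device it is compiled against; pointing it at the GPU is typically done through the THEANO_FLAGS environment variable. Treat this as a sketch, since flag names vary by version.

# e.g. run as: THEANO_FLAGS=device=gpu,floatX=float32 python my_net.py
import theano
print(theano.config.device)  # 'cpu' or 'gpu' depending on your setup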

I listed a few other topics in my talk that you should research further to go above and beyond single server computation of a neural net.

  • Distributed Computing
  • High-Performance Computing

Note, if you go down the distributed path, you are starting to get into sharing the data across nodes or splitting the model, which can be extremely tricky. Try sticking to a single server for as long as possible: you can’t beat that latency, and with where technology is, you should be able to do a lot with one computer, especially when starting out. Only go distributed when the data and problem are complex enough that they can’t be contained on one server.

Python Packages
There are many Python packages you can use to get started with building neural nets, and some will automate most of the process to get you off the ground faster. Below is a list of the ones I’ve come across so far.

  • Theano
  • Machine Learning Packages
    • Graphlab
    • PyLearn2
    • Lasagne
    • Kayak
    • Blocks
    • OpenDeep
    • PyBrain
    • Keras
    • Sklearn
  • Packages based in C with Python Bindings
    • Caffe
    • CXXNet
    • FANN2
  • GUI with Python API
    • MetaMind

I highly recommend that you spend time exploring Theano because it’s well documented, gives you the best exposure to and control of the math and structure, and is regularly applied to solve real-world problems. Many of the machine learning packages are built on top of it. The machine learning packages vary in how easy they are to use, and some integrate easily with GPUs.

MNIST Code Example
For the example in the talk, I used the MNIST (Mixed National Institute of Standards and Technology) dataset, which is the “hello world” of neural nets. It’s a dataset of handwritten digits as grayscale images (28 x 28 pixels).

  • Structure can be as simple as 784 inputs, 1000 hidden units, 10 outputs with at least 794K connections
  • Based on Yann LeCun’s work at AT&T with LeNet in the 1990s

For reference, I’ve pulled MNIST examples for some of the Python packages into a GitHub repository as mentioned above; you can also find it here: github.com/nyghtowl/Neural_Net_Newbies.
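
To give a flavor of what those examples look like, here is a rough MNIST sketch using Keras (one of the packages above) with the 784-1000-10 structure described earlier. Exact API names vary by Keras version, so treat it as a sketch rather than a drop-in script.

from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense
from keras.utils import to_categorical

# Load the 28 x 28 grayscale digits and flatten each image to 784 inputs.
(X_train, y_train), (X_test, y_test) = mnist.load_data()
X_train = X_train.reshape(-1, 784).astype("float32") / 255.0
X_test = X_test.reshape(-1, 784).astype("float32") / 255.0
y_train = to_categorical(y_train, 10)  # ten output nodes, one per digit
y_test = to_categorical(y_test, 10)

# 784 inputs -> 1000 hidden units -> 10 outputs.
model = Sequential([
    Dense(1000, activation="sigmoid", input_shape=(784,)),
    Dense(10, activation="softmax"),
])
model.compile(loss="categorical_crossentropy", optimizer="sgd", metrics=["accuracy"])
model.fit(X_train, y_train, epochs=5, batch_size=128)
print(model.evaluate(X_test, y_test))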

What’s next for NN?
Neural nets will continue to play a significant role in advancements in all the areas I’ve mentioned, especially natural language processing and computer vision. The real key value of neural nets is automatic feature engineering, and we will continue to see them applied to help identify features, especially as richer datasets for certain problems are captured.

Additionally, combining neural net structures, as well as combining other machine learning models with NNs, will help drive these advancements. Some great research came out last fall around combining CNNs with RNNs to apply sentence-long descriptions to images.

The long-term value a number of experts point to is the potential impact on unlabeled data: finding patterns in data that we have no knowledge of, or data we’ve labeled with our own set of biases. These types of patterns will drive advancements that may very well be akin to what we read in sci-fi, as well as things we really haven’t thought of yet.

The reality is that NNs are the algorithms with the most potential to create greater intelligence in our machines. Having technology that can reason and come up with new ideas is very possible when NNs are factored in.

Last thoughts…
If you want to get serious about researching neural nets, spend time studying linear algebra (matrix math), calculus (derivatives), existing neural net research and systems engineering (esp. GPUs and distributed systems). The slides I posted have a number of references, and there are many other resources online. There are many great talks coming out of conferences that can help you tap into the latest progress. Most importantly, code and practice applying neural nets. The best way to learn is by doing.

Targeting Email with Random Forest at Change.org

Last fall, a couple of my colleagues (Kristiane Skiolmen, Scott Lau) and I presented Change’s machine learning email optimization approach as a lecture in Stanford’s Human Computer Interaction Seminar for CS grad students.

The video gives an overview of how Change.org uses email to drive petition engagement, from the business and social perspectives down to the specific technical optimization we made. It starts with an overview of Change and examples of petitions that have literally improved and saved lives.

As of the date of the video, here are some stats we presented:

  • 77M total users globally
  • 1.2M users visiting the site daily
  • 450M signatures total
  • 10K declared victories in over 120 countries

Our most successful channel for engaging users to sign petitions is email. It’s not an ideal channel; we know that and want to change it. Still, since it drives the most response at this time, we took steps to optimize that channel with machine learning. I’m sharing this video and a little about the project so you can see a real-world application of machine learning. Below are a couple of summary points from the video.

We have an email team that specializes in helping put petitions in front of users who would connect with them. We have groups that define certain petitions to showcase every week through email, and the email team was using a cause (topic) filtering model to determine which petitions to send users. It was a manual process of tagging petitions to causes and matching them against our user base, which had been grouped by causes based on the petitions they had signed.

There are a lot of limitations to this approach, from scaling with data size to adapting culturally and internationally. Another challenge with the manual approach was that some causes had much smaller audiences and lower response rates; thus, certain petitions were doomed to fall short of signatures because their cause had a smaller audience.

Our data team built a model to help improve email targeting. Basically, we identified over 500 features (e.g. number of petitions signed in the past) that were predictive of signatures, and we tried out a couple of classification algorithms to come up with a predictive model. The accuracy scores were pretty close across the models we investigated, so we went with a random forest algorithm because we didn’t need to binarize our data, our data is unbalanced (which random forest handles well), and it was the most transparent for feature inspection if we wanted to dig into the results.

Here is how it works: each time the email team gets a set of petitions to showcase, they send emails to a sample set of users. Based on the signature response to one petition, a random forest model is trained, and then all users are run through the model to predict their signature response to that petition. A random forest model is built for each petition the email team showcases that week, and we run signature predictions on all users for each of the showcased petitions. Each model produces a probability of signature response per user; our program then sorts the probabilities and identifies the petition with the highest predicted response for each user (filtering out ones the user has already received by email). The email team gets back a list of users per petition to send their showcased petitions to for that week.
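
In spirit, the per-petition scoring looks something like the scikit-learn sketch below. The data, feature count and variable names are hypothetical stand-ins, not our production code.

import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical stand-ins: rows are users, columns are the 500+ features
# (e.g. number of petitions signed in the past).
X_sample = np.random.rand(10000, 500)      # users in the sample send
y_sample = np.random.randint(0, 2, 10000)  # did the sampled user sign?
X_all_users = np.random.rand(100000, 500)  # the full user base

# One model per showcased petition, trained on the sample send.
model = RandomForestClassifier(n_estimators=100)
model.fit(X_sample, y_sample)

# Probability of signing for every user, then rank users by it.
sign_prob = model.predict_proba(X_all_users)[:, 1]
ranked_users = np.argsort(sign_prob)[::-1]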

In the video, I go into more detail around how a random forest works as well as the way it was implemented. Also, Scott provides an overview of how we used Amazon Web Services to implement this data product.

Note, there are other ways to approach this problem, but for what we needed, this solution increased our sign-to-send rate by 30%, which is substantial. On one petition, for example, we would have had a 4% signature response out of a pool of 2M people to email, but our new approach with machine learning enabled us to target 5M users with a 16% signature response rate.

As mentioned, I don’t see email as the best communication channel, and even though we can and will improve on our current solution, we are working to incorporate more effective means of engagement.

Graphlab & ODBC

For those out there working with Dato (GraphLab) and trying to set up an ODBC connection to pull data straight into an SFrame, here are some tips I’ve learned from troubleshooting.

What is ODBC?
Open Database Connectivity is a middleware API that helps standardize and simplify access to database management systems.

Connection Pointers:
There are a number of links on ODBC setup, but it was a little tricky to get it to work with GraphLab, Linux and OSX, and GraphLab’s documentation is a little sparse in that area right now.

Linux
This is one of the links I found helpful for setting up on a Linux machine. The following are the steps I used:

  • wget http://yum.postgresql.org/[version #]/redhat/rhel-[version #]/pgdg-centos-[OS type & #].noarch.rpm
  • Use the package version link from http://yum.postgresql.org/ in the wget command above to pull the rpm file you need. Note, you are setting up the Postgres yum repo on your computer so you can yum install the Postgres ODBC packages afterward
  • rpm -ivh ./pgdg-[OS type & #].noarch.rpm
  • yum install postgresql[version #]-odbc.[version #]
  • yum install postgresql[version #]-odbc-debuginfo.[version #]
  • yum install unixODBC

In the yum install portion, you can combine the packages on one line, separated by spaces. You may need to sudo install depending on the role you are logged in as and your permissions; best practice is to avoid using sudo.

Now that you have the packages installed, update the odbcinst.ini file, which should be in the /etc/ directory. Sample file contents include:

[PostgreSQL]
Description = ODBC for PostgreSQL
Driver = /usr/pgsql-[version #]/lib/psqlodbc.so
Setup = /usr/lib64/libodbcpsqlS.so
Driver64 = /usr/pgsql-[version #]/lib/psqlodbcw.so
Setup64 = /usr/lib64/libodbcpsqlS.so.2.0.0
Database = [database name]
Server = [server address; for Redshift it will look like: …redshift.amazonaws.com]
Port = [port for your setup something like 5432 or 5439]
FileUsage = 1

Settings above can vary. Definitely read up on the options and how they relate to your connection setup.

OSX
This was a little trickier because the documentation wasn’t as clear. I ended up using the Homebrew package manager, and the following steps worked.

  • brew update
  • brew install unixodbc
  • brew install psqlodbc

Next, set up odbc.ini, which should be under the /usr/local/Cellar/unixodbc/[version #]/etc/ directory. Sample file contents include:

[Postgres_db]
Description = ODBC for PostgreSQL
Driver = PostgreSQL
Database = [database name]
Server = [server address; for Redshift it will look like: …redshift.amazonaws.com]
Port = [port for your setup]
Protocol = [protocol for your setup]
Debug = 1

Then set up odbcinst.ini, which should also be under the /usr/local/Cellar/unixodbc/[version #]/etc/ directory. Sample file contents include:

[PostgreSQL]
Description = PostgreSQL ODBC driver
Driver = /usr/local/Cellar/psqlodbc/[version #]/lib/psqlodbcw.so
Setup = /usr/local/Cellar/unixodbc/[version #]/lib/libodbc.2.dylib
Debug = 0
CommLog = 1
UsageCount = 1

The sticky part of getting GraphLab’s ODBC connection to work was that I needed path variables pointing to the ODBC config files. Thankfully I got this idea from this Stackoverflow post. So in .bash_profile (which should be in your home directory – use ~/ to get there) add the following:

export ODBCINI=/usr/local/Cellar/unixodbc/[version #]/etc/odbc.ini
export ODBCSYSINI=/usr/local/Cellar/unixodbc/[version #]/etc/

As with Linux, the setup will vary based on your configuration needs. If at first you don’t succeed, keep researching how to adjust.

GraphLab / Data
At this point you can go into a Python or IPython kernel and try:

  • import graphlab
  • graphlab.connect_odbc("Driver=PostgreSQL;Server=[server address like above];Database=[database name];UID=[username];PWD=[password]")

For some reason, even though the parameters in the connection string are defined in the odbcinst.ini config files, GraphLab complains that the string is missing data without them. Specifically, you need to include Driver, Server, Database, UID and PWD. It’s good security practice to pass in your password at least as a variable that comes from a config file and/or the environment.
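
For example, here is a minimal sketch of building the connection string from environment variables so the password stays out of the code. The variable names are my own, not anything GraphLab requires.

import os
import graphlab

conn_str = "Driver=PostgreSQL;Server={0};Database={1};UID={2};PWD={3}".format(
    os.environ["DB_SERVER"],    # hypothetical environment variable names
    os.environ["DB_NAME"],
    os.environ["DB_USER"],
    os.environ["DB_PASSWORD"],
)
db = graphlab.connect_odbc(conn_str)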

Once the ODBC connection worked, it made the data product run much more effectively. I’m able to pull the data directly into the package that builds the model, and I stripped out an extra step that previously queried the data into intermediate storage before loading it into the package that trains the model. There are other tools coming into wider use that cut to the chase on data processing and machine learning; Spark is one I’m especially interested in and will try to write about in the future.

What to Do When You are Hacked

I am someone who practices pretty decent technical security hygiene, but I had my Yahoo account hacked this week (despite using two-factor authentication). This post focuses on sharing what I did to deal with the attack, what I think went wrong, and some steps and resources you can use for security.

Summary of attack…

The attacker logged in at 4:57PM PST and, from 4:58PM to 5:05PM, sent off 28 emails with about 5 recipients each. I learned about the attack from a friend at 5:08PM, then logged in and changed my password at 5:10PM. The attacker appeared to pull email addresses from my account (I’m guessing from my contacts and sent folders) to use in the spam emails.

Having someone access the account like that left me feeling completely violated. I’ve seen many friends and family get hacked, and when it happened to me, I was left wondering what to do now, where the attacker got in and what else was compromised. I thought a number of other things too, but those are a little off topic for this post.

What I did to deal with the attack…

  • Password: Immediately I logged into my account and changed my password. Thankfully it hadn’t been changed. I highly recommend that you use as long a password as you can get away with, because that is going to be your best password defense. To be clear, I don’t mean 6 characters; I mean over 15.
  • Two-Factor Authentication: I checked whether two-factor/second sign-in verification was still on for the email account. It is a pain to use, but I can’t recommend enough implementing it on any site that allows it. Two-factor is basically a way for sites to provide extra security by requiring another form of identity verification in addition to a password. Usually a site will text a code to your mobile that you have to enter into the site before it grants access.
  • Router: I reset the home wireless router. Since I didn’t know where the attack came from, I was concerned someone had hacked the router and was sniffing data off it, and/or that there was malicious software on my computer. So I got hold of a secondary computer that I hadn’t logged into my Yahoo account with and plugged directly into the router with a LAN cable. After pressing the reset button, I updated the router’s software. I set the router so it was hidden from broadcasting its name (disabled SSID broadcast). The reset also required that I reset the router ID and password.
  • Research: While the router was restarting, I used my phone to find a couple of sources recommending what to do in this situation.
  • OS (Operating System): I updated my Mac’s OS, which is your best defense for resolving security weaknesses. I also made sure any other devices in the house had an updated OS.
  • Antivirus & Anti-spyware Software: I loaded and ran virus-checking software on my Mac. I found a free tool called ClamXav through a quick search, and I can’t say whether it’s better, the best, or even a good idea. I’ve also read all the literature about how Macs don’t get viruses and make attacks very difficult. At the time, I just wanted to check the computer for viruses for my peace of mind; nothing is 100% foolproof, and third-party software and applications on the computer are not always as secure as Mac software.
  • Yahoo Mail Security: I went back into Yahoo and made sure the security questions hadn’t been changed (attackers tend to do this to get back in later) and that two-factor sign-in would only use my mobile phone to verify my login. I also changed the sign-in settings to automatically log out any open sessions every day, and set it so password-recovery help only goes through my cell phone. Additionally, I checked that no changes were made to my personal information, like unknown phone numbers or email addresses added to my account; attackers sometimes change the personal information to their own. Lastly, I de-authorized apps that had been granted access to the Yahoo mail. All of this can be found under Account Info or Mail Options, which are in a drop-down list under the gear icon in the top right of a Yahoo mail screen.
  • Yahoo / Contact ISP: I called and emailed Yahoo to alert them to the hack. It took about 20 to 30 minutes to get someone on the phone. The customer service rep had difficulty recommending how to deal with the issue and was very unsure about her answers to my questions. After I asked multiple times in different ways, she did confirm that the attacker would definitely be logged out after I reset my password (I had originally asked if they could force a logout of everyone in the account, but it wasn’t something she said she could initiate from her side). When I asked for information on what the attacker accessed, she told me to file a police report and fax it to Yahoo legal to get that information. There was a list of the most recent login attempts under my account information; that is where I found when the attacker logged in and their IP address. But it did not show what the attacker looked at, and it’s unfortunate that I would need a police report to get that information.
  • Other Account Security: I logged into my most sensitive accounts (mostly financial) to confirm I had changed all my email addresses away from Yahoo, and that I had the tightest security in place on them. Note, you have to be careful with any accounts linked to your email that have emailed you passwords (some actually do that) and/or send you password change notices; if any of that was floating around in my email, it was fair game. I went through the rest of my accounts to check whether I had reused something similar to my previous Yahoo password and changed them where necessary. I had already started using different passwords for different accounts. Still, I (like many) was lazy about it at times, and it’s hard to keep track of all those passwords.
  • Warning Email: The 28 spam messages were in my sent box, so I was able to see who received them. Using Gmail, I sent an email to all those contacts warning them about the spam and letting them know I would no longer use Yahoo to email them.

After all that, I finally logged out of all my accounts and got some rest. The next day I logged into Yahoo a couple of times to check the log files and make sure no further strange activity was occurring.

What I did probably seems like overkill to some, but I had been using two-factor and this was my first experience getting hacked (that I was aware of); thus, I really didn’t know where the weakness was, and I needed overkill.

What I think went wrong…

Thankfully my mentor helped me narrow down where I think the attack came from: my Yahoo SSL (Secure Sockets Layer) setting was not automatically turned on. I had thought it was, but for some reason, that is not a required setting on Yahoo. SSL is a way of securing message transmission on the Internet; typically you can see you are using it when HTTPS appears in the browser address bar.

So here’s what I think happened (I can’t completely prove it, but it seems like the best answer now). I worked out of a coffee shop the other day and accessed the shop’s open wireless. Granted, this can be dangerous for many reasons, and I typically don’t do it. When I do, I usually log out of my sensitive accounts (like email and definitely financial) and turn off any apps outside the browser that automatically run updates (Dropbox or Evernote).

Still, I am one of those people who will have a zillion browser windows and tabs open. So I think I left a Yahoo session open somewhere in my tabs, and it automatically checked for updates. When it did, someone may have been sniffing that router (using a software tool that captures data passed through the router) and picked up my Yahoo session token. At that point they could have used it to access my account until I changed the password.

What you can do…

I’m going to list a few things you can do (beyond what I mentioned above) and resources you can use to help with security. Note, there are many other resources online that can provide help. You don’t have to do everything recommended here; do what works best for you, and just know that no security setup is perfect.

  • If you are still using Yahoo, go into Mail Options and make sure that SSL is turned on, and take any of the steps I mentioned above regarding changing your Yahoo settings.

    (Screenshot: the Yahoo SSL checkbox in Mail Options)

  • Make sure your computer has a firewall that is turned on and antivirus software (esp. if it’s not a Mac). Make sure it doesn’t accept Bluetooth connections you are not aware of, and that you are backing up your data.
  • Load and use HTTPS Everywhere with Firefox and Chrome. The EFF (Electronic Frontier Foundation) provides the plugin to help encrypt your communications with many major websites, making your browsing more secure. It basically pushes a site to use SSL if it’s available. It is not perfect (it apparently didn’t get Yahoo to switch to SSL), but on the whole it is a good plugin to have to improve your security.
  • Secure your wireless router. There are several sites out there that give you information on how to secure it, like this site. There is a debate about how useful it is to turn off SSID (service set identifier) broadcast; I subscribe to the perspective of why make it easier for people to find it, so I stopped it from being broadcast. Also, I highly recommend changing the name of your wireless router (SSID) to something unique; it shows you are not a novice.
  • Consider using a VPN (Virtual Private Network) when logging into a wireless connection that you are unsure of.
  • And if you get hacked, this is the site I used to help guide some of the steps I took when addressing the hack. There are many other resources online that can help.

Really, we can only be so secure, especially with how sophisticated technology is getting. Take steps to protect your information where it seems reasonable, and if you are hacked, go a little beyond just changing the password.

Now what with Yahoo…

I opened the Yahoo account around 2002/2003, and I had used it for everything. A couple of months ago, someone I completely respect in the tech industry convinced me to move to Gmail because Yahoo is perceived as an email dinosaur. I was reluctant to switch because moving email is very time-consuming and basically a pain. Also, I wanted to stick by Yahoo because I had used it for so long, and a part of me wanted to support it since Marissa Mayer took over as CEO.

Still, I thankfully took the advice and had already started making the change. Despite that, there were still over 10 years’ worth of information stored in my Yahoo account, and I hadn’t finished the move. I can say the hack definitely motivated me to wrap up the switch quickly.

Even though I was the one who used an unsecured wireless network, I do find fault with Yahoo for not having SSL automatically turned on, in addition to their poor performance and response in addressing the hack. They can and should do better, and that’s the reason they have lost me as a customer.

HB Quick Tips

Below are some quick tips for those considering something like Hackbright Academy. If you have time beforehand or want some preliminary building blocks, check out the links and pointers below.

  1. Keyboard Shortcuts:
  • Learn keyboard shortcuts (before starting any programming bootcamp)
  • If you are not used to it, it does suck royally to learn
  • Push through it and practice, practice, practice!
  • The less you use the mouse, the more credible you are as a developer
  2. Ask for Help Early and Often:
  • Leverage your classmates
  • Leverage Google, Stack Overflow, Reddit, etc…
  • Leverage your network and online social tools
  3. Start Reviewing:
  4. Check out HTML, CSS & JavaScript
  5. Git with Git:
  • Read about git
  • Read about GitHub
  • Want more?…
  • Load git on your home computer
  • Set up a GitHub account
  6. Balance is Key:
  • As Cynthia would say, “take breaks!”
  • Make sure to keep up a workout regime – it’s good for the brain
  • Eat healthy to keep the energy up
  7. Check out Zach Holman’s deck (it’s 5 min)
  8. Additional Resources:

The feeling when you figure out the code solution to the problem is my favorite part of coding…so far.

Me