Bikeshare Project Analysis

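Introduction: With the rise of bike sharing, this project explores data from the bike-share systems of three major US cities: Chicago, New York City, and Washington, D.C. The goal is to give bike-share companies key information, such as which start station is the most popular and which trip is the most popular, to help them decide where to deploy bikes.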

I. Analysis Steps

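Write code that imports the data and answers interesting questions by computing descriptive statistics.
Write a script that takes raw input and creates an interactive experience in the terminal to present these statistics.
Pose the questions
Build the terminal application script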
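The first step (importing the data and filtering it by month and day) can be sketched on a small in-memory sample. This is a minimal sketch, assuming a CSV with the same `Start Time` column as the project files; `SAMPLE_CSV` and `load_and_filter` are hypothetical names used only for illustration, and the sample rows are made up.

```python
import io

import pandas as pd

# A tiny hypothetical sample in the same shape as the project CSVs
# (only the columns this sketch needs; the real files have more).
SAMPLE_CSV = """Start Time,Start Station,End Station,Trip Duration
2017-01-02 09:07:57,A,B,300
2017-01-03 09:15:00,A,C,600
2017-02-06 17:30:00,B,A,450
"""


def load_and_filter(csv_text, month="all", day="all"):
    """Parse the CSV text and keep only rows matching month/day (or 'all')."""
    df = pd.read_csv(io.StringIO(csv_text))
    df["Start Time"] = pd.to_datetime(df["Start Time"])
    # Derive the filter columns from the parsed timestamps.
    df["month"] = df["Start Time"].dt.month
    df["day_of_week"] = df["Start Time"].dt.day_name()
    months = ["january", "february", "march", "april", "may", "june"]
    if month != "all":
        # Month names map to 1-based month numbers.
        df = df[df["month"] == months.index(month) + 1]
    if day != "all":
        # day_name() returns capitalized names such as "Monday".
        df = df[df["day_of_week"] == day.title()]
    return df


january = load_and_filter(SAMPLE_CSV, month="january")
print(len(january))  # 2
mondays = load_and_filter(SAMPLE_CSV, day="monday")
print(mondays["Start Station"].tolist())  # ['A', 'B']
```

Reading from `io.StringIO` keeps the sketch self-contained; in the project itself the same logic runs on `pd.read_csv(CITY_DATA[city])`.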

II. Questions

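Which month is the most common in the start time (the Start Time column)?
Which day of the week (e.g. Monday, Tuesday) is the most common in the start time?
Which hour of the day is the most common in the start time?
What is the total trip duration (Trip Duration), and what is the average trip duration?
Which start station (Start Station) is the most popular, and which end station (End Station) is the most popular?
Which trip is the most popular (i.e., which combination of start station and end station occurs most often)?
How many users are there of each user type?
How many users are there of each gender?
What are the earliest, the most recent, and the most common birth years?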
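Each question above maps to a one-line pandas operation (`mode`, `sum`, `mean`, `value_counts`, `min`, `max`). A minimal sketch on a hypothetical four-row sample is shown below; `trips` and `answer_questions` are illustrative names. Gender is left out because, as the `try`/`except` in `user_stats` suggests, not every city file has that column; when present it works exactly like `User Type`.

```python
import pandas as pd

# Hypothetical sample trips; column names follow the project CSVs.
trips = pd.DataFrame({
    "Start Time": pd.to_datetime([
        "2017-01-02 09:07:57", "2017-01-03 09:15:00",
        "2017-02-06 17:30:00", "2017-01-09 09:45:00",
    ]),
    "Start Station": ["A", "A", "B", "A"],
    "End Station": ["B", "C", "A", "B"],
    "Trip Duration": [300, 600, 450, 150],
    "User Type": ["Subscriber", "Customer", "Subscriber", "Subscriber"],
    "Birth Year": [1985.0, 1990.0, None, 1985.0],  # None becomes NaN
})


def answer_questions(df):
    """Return one answer per question in the list above (gender omitted)."""
    start = df["Start Time"]
    # Most frequent (start, end) pair: count rows per pair, take the largest.
    pair_counts = df.groupby(["Start Station", "End Station"]).size()
    years = df["Birth Year"]  # min/max/mode all skip NaN by default
    return {
        "common_month": start.dt.month.mode()[0],
        "common_day": start.dt.day_name().mode()[0],
        "common_hour": start.dt.hour.mode()[0],
        "total_duration": df["Trip Duration"].sum(),
        "mean_duration": df["Trip Duration"].mean(),
        "common_start": df["Start Station"].mode()[0],
        "common_end": df["End Station"].mode()[0],
        "common_trip": pair_counts.idxmax(),
        "user_types": df["User Type"].value_counts().to_dict(),
        "birth_years": (years.min(), years.max(), years.mode()[0]),
    }


for key, value in answer_questions(trips).items():
    print(key, value)
```

Grouping on the two station columns returns the trip as a `(start, end)` tuple, which avoids the ambiguity of gluing two station names into one string.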

III. Code Implementation

Language: Python
Editor: PyCharm

import time
import pandas as pd
import numpy as np


CITY_DATA = { 'chicago': 'chicago.csv',
              'new york city': 'new_york_city.csv',
              'washington': 'washington.csv' }

def get_filters():
    """
    Asks user to specify a city, month, and day to analyze.

    Returns:
        (str) city - name of the city to analyze
        (str) month - name of the month to filter by, or "all" to apply no month filter
        (str) day - name of the day of week to filter by, or "all" to apply no day filter
    """
    print('Hello! Let\'s explore some US bikeshare data!')
    # get user input for city (chicago, new york city, washington). HINT: Use a while loop to handle invalid inputs
    city = input("Which city do you want to analyze? Enter: chicago, new york city, washington\n").strip().lower()
    while city not in CITY_DATA:
        # re-prompt (and normalize case) until the answer is a known city
        city = input('Invalid input.\nWould you like to see data for chicago, '
                     'new york city, or washington?\n').strip().lower()

    # get user input for month (all, january, february, ... , june)
    months = ['all', 'january', 'february', 'march', 'april', 'may', 'june']
    month = input("Which month do you want to analyze? Enter: all, january, february, "
                  "march, april, may, june\n").strip().lower()
    while month not in months:
        month = input('Invalid input.\nWhich month do you want to analyze? Enter: all, january, '
                      'february, march, april, may, june\n').strip().lower()

    # get user input for day of week (all, monday, tuesday, ... sunday)
    days = ['all', 'monday','tuesday','wednesday','thursday','friday','saturday','sunday']
    day = input("Which day of week do you want to analyze? Enter: all, monday, tuesday, "
                "wednesday, thursday, friday, saturday, sunday\n").strip().lower()
    while day not in days:
        day = input("Invalid input.\nWhich day of week do you want to analyze? Enter: all, monday, "
                    "tuesday, wednesday, thursday, friday, saturday, sunday\n").strip().lower()

    print('-'*40)
    return city, month, day


def load_data(city, month, day):
    """
    Loads data for the specified city and filters by month and day if applicable.

    Args:
        (str) city - name of the city to analyze
        (str) month - name of the month to filter by, or "all" to apply no month filter
        (str) day - name of the day of week to filter by, or "all" to apply no day filter
    Returns:
        df - Pandas DataFrame containing city data filtered by month and day
    """
    # load data file into a dataframe
    df = pd.read_csv(CITY_DATA[city])

    # convert the Start Time column to datetime
    df['Start Time'] = pd.to_datetime(df['Start Time'])

    # extract month and day of week from Start Time to create new columns
    df['month'] = df['Start Time'].dt.month
    # dt.weekday_name was removed in pandas 1.0; dt.day_name() returns e.g. 'Monday'
    df['day_of_week'] = df['Start Time'].dt.day_name()

    # filter by month if applicable
    if month != 'all':
        # use the index of the months list to get the corresponding int
        months = ['january', 'february', 'march', 'april', 'may', 'june']
        month = months.index(month) + 1

        # filter by month to create the new dataframe
        df = df[df['month'] == month]

    # filter by day of week if applicable
    if day != 'all':
        # filter by day of week to create the new dataframe
        df = df[df['day_of_week'] == day.title()]
    return df


def time_stats(df):
    """Displays statistics on the most frequent times of travel."""

    print('\nCalculating The Most Frequent Times of Travel...\n')
    start_time = time.time()

    # display the most common month
    common_month = df['month'].mode()[0]
    print('The most common month: ', common_month)

    # display the most common day of week
    common_day_of_week = df['day_of_week'].mode()[0]
    print('The most common day of week: ', common_day_of_week)

    # display the most common start hour (computed without adding a column to df)
    common_start_hour = df['Start Time'].dt.hour.mode()[0]
    print('The most common start hour: ', common_start_hour)


    print("\nThis took %s seconds." % (time.time() - start_time))
    print('-'*40)


def station_stats(df):
    """Displays statistics on the most popular stations and trip."""

    print('\nCalculating The Most Popular Stations and Trip...\n')
    start_time = time.time()

    # display most commonly used start station
    common_start_station = df['Start Station'].mode()[0]
    print('The most commonly used start station: ', common_start_station)

    # display most commonly used end station
    common_end_station = df['End Station'].mode()[0]
    print('The most commonly used end station: ', common_end_station)

    # display most frequent combination of start station and end station trip
    # (a separator keeps the two station names distinguishable)
    trip = df['Start Station'] + ' -> ' + df['End Station']
    frequent_trip = trip.mode()[0]
    print('The most frequent trip: ', frequent_trip)

    print("\nThis took %s seconds." % (time.time() - start_time))
    print('-'*40)


def trip_duration_stats(df):
    """Displays statistics on the total and average trip duration."""

    print('\nCalculating Trip Duration...\n')
    start_time = time.time()

    # display total travel time
    total_travel_time = df['Trip Duration'].sum()
    print('The total travel time: ', total_travel_time)

    # display mean travel time
    mean_travel_time = df['Trip Duration'].mean()
    print('The mean travel time: ', mean_travel_time)

    print("\nThis took %s seconds." % (time.time() - start_time))
    print('-'*40)


def user_stats(df):
    """Displays statistics on bikeshare users."""

    print('\nCalculating User Stats...\n')
    start_time = time.time()

    # Display counts of user types
    count_user_types = df['User Type'].value_counts()
    print('Counts of user types: ', count_user_types)

    # Display counts of gender
    try:
        count_gender = df['Gender'].value_counts()
        print('Counts of gender: ', count_gender)
    except KeyError:
        print('Counts of gender: sorry, no gender data is available for this city.')

    # Display earliest, most recent, and most common year of birth
    try:
        earliest_birth = df['Birth Year'].min()
        most_recent_birth = df['Birth Year'].max()
        most_common_birth = df['Birth Year'].mode()[0]
        print('Earliest year of birth:', earliest_birth)
        print('Most recent year of birth:', most_recent_birth)
        print('Most common year of birth:', most_common_birth)
    except KeyError:
        print('Sorry, no birth year data is available for this city.')

    print("\nThis took %s seconds." % (time.time() - start_time))
    print('-'*40)


def main():
    while True:
        city, month, day = get_filters()
        df = load_data(city, month, day)

        time_stats(df)
        station_stats(df)
        trip_duration_stats(df)
        user_stats(df)

        restart = input('\nWould you like to restart? Enter yes or no.\n')
        if restart.lower() != 'yes':
            break


if __name__ == "__main__":
    main()
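The three input loops in `get_filters` repeat the same pattern: prompt, normalize, re-prompt until the answer is valid. One possible refactor, offered here as an assumption rather than the author's code, is a hypothetical helper `ask_choice` whose `input_fn` parameter lets the loop be exercised without a real terminal.

```python
def ask_choice(prompt, options, input_fn=input):
    """Keep asking until the answer (case-insensitive) is one of options.

    input_fn is injectable so the loop can be tested without a terminal.
    """
    options = [option.lower() for option in options]
    hint = ", ".join(options)
    answer = input_fn(f"{prompt} ({hint})\n").strip().lower()
    while answer not in options:
        answer = input_fn(f"Invalid input. Please choose one of: {hint}\n").strip().lower()
    return answer


# Simulate a user who mistypes once, then answers correctly.
scripted = iter(["New Youk City", "  New York City "])
city = ask_choice("Which city?", ["chicago", "new york city", "washington"],
                  input_fn=lambda _prompt: next(scripted))
print(city)  # new york city
```

Inside `get_filters`, a call such as `ask_choice("Which city?", list(CITY_DATA))` would replace the first loop, and the month and day loops would pass their own option lists.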

IV. Interactive Experience

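The file is a script that takes raw input and creates an interactive experience in the terminal to answer questions about the dataset.
Enter the city, month, and day you want to look at: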

[Screenshot: input prompts]

The resulting answers:

[Screenshot: computed statistics]

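P.S. The script can still be improved; this is only a simple version. Visualization could also be added to the script, so that entering the data you need automatically generates the charts you want, which would be extremely convenient!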
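As one dependency-free way to add the visualization mentioned above, a count Series can be drawn as a text bar chart directly in the terminal. This is a sketch under assumptions: `text_bar_chart` is a hypothetical helper, and the hour values are made up; in the script the input would be something like `df['Start Time'].dt.hour.value_counts().sort_index()`.

```python
import pandas as pd


def text_bar_chart(counts, width=20, char="#"):
    """Render a non-empty pandas Series of counts as one text bar per label."""
    peak = counts.max()
    lines = []
    for label, value in counts.items():
        # Scale each bar relative to the largest count.
        bar = char * int(round(value / peak * width))
        lines.append(f"{label!s:>4} | {bar} {value}")
    return lines


# Hypothetical start hours standing in for df['Start Time'].dt.hour.
hours = pd.Series([8, 8, 9, 17, 17, 17, 18])
for line in text_bar_chart(hours.value_counts().sort_index(), width=9):
    print(line)
```

A plotting library such as matplotlib would produce real charts; the text version keeps the script runnable in a plain terminal with no extra installs.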